#fleet

billcobbler

03/16/2021, 12:35 AM
We're running into some weird behavior after upgrading from 3.5 to 3.8, related to the s/kolide/fleet/ rename in distributed query names. We have distributed_tls_max_attempts set to 3, but osquery clients continue to send results for the old distributed queries. This causes repeated 500s, with Fleet logging errors indicating the hosts are still using the old query names. Some example error logs:
failed to ingest result: unknown query prefix: kolide_detail_query_osquery_flags
failed to ingest result: unknown query prefix: kolide_label_query_9
failed to ingest result: unknown query prefix: kolide_detail_query_uptime
As far as I can tell from looking at how Fleet generates host detail queries, this might just be an issue with osquery not honoring the max attempts setting. Has anyone else run into this, or does anyone know for a fact that the max attempts setting actually works?
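To illustrate why results for the old names get rejected, here is a minimal Go sketch of prefix-based result dispatch. It is not Fleet's actual code: the constants, `classify`, and `normalizeLegacy` are hypothetical names, and the `fleet_` prefixes are assumed from the s/kolide/fleet/ rename described above. The `normalizeLegacy` step is only a thought experiment about tolerating old names, not something Fleet is known to do.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical prefixes, assumed from the s/kolide/fleet/ rename.
const (
	detailPrefix  = "fleet_detail_query_"
	labelPrefix   = "fleet_label_query_"
	legacyPrefix  = "kolide_"
	currentPrefix = "fleet_"
)

// classify reports the query kind and the remainder of the name,
// or an error when the name carries no recognized prefix.
func classify(name string) (kind, rest string, err error) {
	switch {
	case strings.HasPrefix(name, detailPrefix):
		return "detail", strings.TrimPrefix(name, detailPrefix), nil
	case strings.HasPrefix(name, labelPrefix):
		return "label", strings.TrimPrefix(name, labelPrefix), nil
	default:
		return "", "", fmt.Errorf("unknown query prefix: %s", name)
	}
}

// normalizeLegacy rewrites a leading "kolide_" to "fleet_" and leaves
// every other name unchanged.
func normalizeLegacy(name string) string {
	if strings.HasPrefix(name, legacyPrefix) {
		return currentPrefix + strings.TrimPrefix(name, legacyPrefix)
	}
	return name
}

func main() {
	for _, name := range []string{"kolide_detail_query_uptime", "kolide_label_query_9"} {
		// Strict dispatch rejects the old name outright.
		if _, _, err := classify(name); err != nil {
			fmt.Println("strict:", err)
		}
		// Normalizing first lets the same result be classified.
		kind, rest, _ := classify(normalizeLegacy(name))
		fmt.Println("tolerant:", kind, rest)
	}
}
```

Under these assumptions, a host that keeps replaying a result for an old query name will hit the error branch on every attempt, which would match the repeated 500s.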

zwass

03/16/2021, 12:39 AM
I cannot confirm whether the max attempts setting works. Are the errors definitely coming from the same hosts? Are all Fleet servers definitely running 3.8?

billcobbler

03/16/2021, 12:58 AM
All the Fleet servers are actually running 3.9 now (recently upgraded). There are 26 unique IPs in x_for_ip_addr (out of ~3k hosts), but those might just be NAT'd IPs or something. This happened in our server fleet environment, but the behavior fell off after a couple of days. However, our workstation fleet environment continues to experience the issue, and I don't see any downward trend in the error rate.