We re running into a weird behavior after upgrading to 3 5 t osquery #fleet

billcobbler

03/16/2021, 12:35 AM

We're running into a weird behavior after upgrading from 3.5 to 3.8, related to the s/kolide/fleet/ rename in distributed query names. We have distributed_tls_max_attempts set to 3, but osquery clients continue to send old distributed query results, which results in repeated 500s, with Fleet logging error messages indicating the hosts are still using the old query names. Some example error logs:

failed to ingest result: unknown query prefix: kolide_detail_query_osquery_flags
failed to ingest result: unknown query prefix: kolide_label_query_9
failed to ingest result: unknown query prefix: kolide_detail_query_uptime

As far as I can tell after looking at how Fleet generates host detail queries, this might just be an issue with osquery not honoring the max attempts setting. Has anyone else run into this, or does anyone know for a fact that the max attempts setting actually works?

zwass

03/16/2021, 12:39 AM

I cannot confirm whether the max attempts setting works. Are the errors definitely coming from the same hosts? Are all Fleet servers definitely running 3.8?

billcobbler

03/16/2021, 12:58 AM

All the Fleet servers are actually running 3.9 now (recently upgraded). There are 26 unique IPs in x_for_ip_addr (out of ~3k hosts), but that might just be NAT'd IPs or something. This happened on our server fleet environment, but the behavior fell off after a couple of days. However, our workstation fleet environment continues to experience the issue, and I don't see any downward trend in the error rate.
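
In case it helps anyone checking the same thing, here is a minimal sketch of how the errors above could be tallied by legacy query family and query name. It assumes only the error text quoted earlier in this thread; the function name tally_legacy_prefixes and the sample list are hypothetical, and real Fleet log lines may carry extra fields around the message.

```python
import re
from collections import Counter

# Matches the error text quoted in this thread. Anything else about the
# log line layout (timestamps, JSON wrapping, etc.) is an assumption.
PATTERN = re.compile(r"unknown query prefix: (kolide_[a-z]+_query)_(\S+)")


def tally_legacy_prefixes(lines):
    """Count unknown-prefix errors per legacy query family and per full query name."""
    by_family = Counter()
    by_name = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            family, suffix = match.group(1), match.group(2)
            by_family[family] += 1
            by_name[f"{family}_{suffix}"] += 1
    return by_family, by_name


# Sample input: the three error lines quoted earlier in the thread.
sample = [
    "failed to ingest result: unknown query prefix: kolide_detail_query_osquery_flags",
    "failed to ingest result: unknown query prefix: kolide_label_query_9",
    "failed to ingest result: unknown query prefix: kolide_detail_query_uptime",
]

if __name__ == "__main__":
    families, names = tally_legacy_prefixes(sample)
    for family, count in families.most_common():
        print(family, count)
```

Running this over the logs for successive days would give a rough picture of whether the count is falling off, though it cannot by itself tell apart hosts that share a NAT'd IP.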
