# fleet
o
Hi everyone 😁, I'm having a weird issue with osquery-Fleet communication. I'm using Fleet 4.60.1 and osquery 5.13.1. I know that osquery sends a POST request to `/api/v1/osquery/distributed/read` to check in for distributed queries, and sends the results as a POST request to `/api/v1/osquery/distributed/write`. I'm running my agent with `--verbose` and `--tls_dump` to watch the communication. I see a 'read' request and the received query to execute, then I see the 'write' request with the query's final results, but the weird part is that the next time the agent sends a 'read' request it gets the same query back, as if it never returned its results (sometimes this happens more than twice). Can someone please help? I have no clue what's wrong, and I didn't change any configuration (it just started happening today) 🙏
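For reference, here's roughly how I'm launching the agent (hostname and secret path are placeholders, not my real setup):

```
# Placeholder Fleet server and enroll secret path
osqueryd \
  --tls_hostname=fleet.example.com \
  --enroll_secret_path=/etc/osquery/secret \
  --enroll_tls_endpoint=/api/v1/osquery/enroll \
  --config_plugin=tls \
  --config_tls_endpoint=/api/v1/osquery/config \
  --distributed_plugin=tls \
  --distributed_tls_read_endpoint=/api/v1/osquery/distributed/read \
  --distributed_tls_write_endpoint=/api/v1/osquery/distributed/write \
  --verbose \
  --tls_dump
```

`--tls_dump` prints the raw request/response bodies to stderr, which is how I'm seeing the read/write traffic.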
k
Hi @Ortal Kombat! Are you seeing any errors in the Fleet server logs around query ingestion? Is this happening for all queries, or a particular query/type of query (detail queries/policies/live queries)?
o
Hi @Kathy Satterlee! I think I figured out why this happens. The `distributed_interval` was set to 10. When the agent got a query, it executed it right away and sent the results to `/api/v1/osquery/distributed/write`, but my Fleet server only processed them more than 10 seconds later, so the agent got the same query again the next time it polled `/api/v1/osquery/distributed/read`. I have a lot of agents (around 8,000), and I think Fleet ingested the results too late because of high load. Is there any way to stop some functions and queries from running regularly? For example, I saw that when the agent reports all the software on the machine it sends a HUGE JSON to Fleet (and from 8,000 hosts that probably causes a lot of traffic). I also increased the `distributed_interval` to 120, but the problem still occurs sometimes. Thanks!
u
If a host has already picked up a distributed query once, it shouldn't pick it up again even if it checks in before sending results. It sounds like there may be some slowness between Fleet and Redis/MySQL; in the Fleet server logs, that often manifests as "context cancelled" or "i/o timeout" errors. I'd definitely take a look at the reference architecture to make sure you're properly scaled for the number of hosts you're enrolling. There are a few other things you can do to spread out the load a bit. For instance, you could stretch out the intervals at which host data are collected.
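As a sketch (the exact interval values here are just examples, tune them for your environment), you can stretch those intervals through Fleet's agent options:

```
apiVersion: v1
kind: config
spec:
  agent_options:
    config:
      options:
        # how often hosts poll for distributed queries (seconds)
        distributed_interval: 120
        # how often hosts ship scheduled query results
        logger_tls_period: 60
        # how often hosts check for an updated config
        config_tls_refresh: 300
```

Save that as a YAML file and apply it with `fleetctl apply -f agent-options.yml`.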
o
@Kathy Satterlee Thanks, I followed your suggestions and it all runs smoother now. Looks like adjusting the intervals to spread the load really helped. 🙏
k
Awesome!