I wonder if anyone can give some advice on how to ...
# kolide
r
I wonder if anyone can give some advice on how to debug queries that hang? We’re managing ~1000 hosts in a Fleet instance, and occasionally queries will run on most but hang on a small subset. In a test last week we targeted 127 hosts, and 125 returned, but 2 hung for a long time and we eventually cancelled. When a query is stuck like this what’s the best way to investigate the cause?
f
I usually start debugging queries by running them on my own device with osqueryi, to see whether something is causing them to time out. Can you share the queries in question here in Slack so we can take a look?
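A rough sketch of that local check in Python, in case it helps: it runs one query through osqueryi with a wall-clock limit and reports whether it finished, errored, or timed out. It assumes osqueryi is on the PATH and that passing the query as an argument together with `--json` prints the rows as a JSON array. The function name `run_query` and the 30-second default are made up for illustration.

```python
import json
import subprocess
import time


def run_query(query, argv=("osqueryi", "--json"), timeout_s=30):
    """Run one query via a local osqueryi process and report how it ended.

    `argv` is the command prefix; the query is appended as the last argument.
    Assumption: the binary prints a JSON array of rows on stdout.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(
            list(argv) + [query],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed once this expires
        )
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "seconds": time.monotonic() - start, "rows": None}
    except FileNotFoundError:
        return {"status": "binary-not-found", "seconds": 0.0, "rows": None}

    elapsed = time.monotonic() - start
    if proc.returncode != 0:
        # Keep stderr: a syntax error and a crash look very different here.
        return {"status": "error", "seconds": elapsed, "rows": None,
                "stderr": proc.stderr.strip()}
    try:
        rows = json.loads(proc.stdout or "[]")
    except json.JSONDecodeError:
        return {"status": "unparseable", "seconds": elapsed, "rows": None}
    return {"status": "ok", "seconds": elapsed, "rows": rows}
```

Calling something like `run_query("SELECT * FROM os_version;", timeout_s=10)` on one of the hosts that hung, and again on one that returned, should show whether the query itself is slow there or whether the problem is somewhere between the host and Fleet.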
s
Looking at the launcher logs is a good first pass.
r
Thanks - I’ll try with osqueryi on the devices that didn’t return, to see if that helps shed any light.
We’re not using Launcher so I don’t have any log from that to look at.
s
What's in the osquery logs? Can't even tell if it got the query...
r
let me double check 🙂
so, the /var/log/osquery/ directory is empty on the client
the team member of mine who was actually exploring this isn’t online for me to get more information, but if there’s client-side logging I can enable I can roll that change out?
s
How osquery logs is also OS-dependent. Results are presumably going to Fleet, but the osquery logs themselves might be going out via stdout/stderr.
Anyhow, not much to debug without logs.
r
yeah, sadly no journalctl on these hosts, they’re running 14.04 😞
s
You should be able to configure basic logging there too.
r
so at the moment it’s set to log only to Fleet - via the tls plugin, so I guess I can enable syslog on the clients too for debugging
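A minimal sketch of that change, assuming the clients are configured through an osquery flagfile that already contains a `--logger_plugin=tls` line (if the flags are set some other way, this does not apply as written). osquery accepts a comma-separated list for `--logger_plugin`, so the helper below just appends `syslog` to the existing value. The name `add_logger_plugin` is hypothetical, and the helper works on the file's text so the flagfile path stays up to whatever rolls the change out.

```python
def add_logger_plugin(flagfile_text, plugin="syslog"):
    """Return flagfile text with `plugin` added to the --logger_plugin list.

    Assumption: one flag per line, written as --logger_plugin=a,b with no quoting.
    Raises ValueError if there is no --logger_plugin line, so an implicit
    default is never silently overridden.
    """
    prefix = "--logger_plugin="
    lines = flagfile_text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith(prefix):
            plugins = [p for p in stripped[len(prefix):].split(",") if p]
            if plugin not in plugins:
                plugins.append(plugin)  # keep tls first, add syslog after it
            lines[i] = prefix + ",".join(plugins)
            return "\n".join(lines) + "\n"
    raise ValueError("no --logger_plugin line found; set it explicitly")
```

With `tls,syslog` in place, results still go to Fleet and a copy goes to the local syslog daemon, wherever that is configured to write on these hosts. The change only takes effect once osqueryd is restarted.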
s
I haven't looked at this recently. But those logs may show if the query was received.
r
okidoki
thanks for the tips, I’ll try a couple of these later in the week and come back
s
Really what I'm saying here is that you need logs to even start debugging. Did it hang? Was it never received? Was something dropped? Did something crash?