# fleet
w
I saw there is a --timeout flag in fleetctl, which is a nice feature. Is there anything similar in the web UI? I've been running a simple query over 1389 hosts, and in under one minute 1386 of them returned results. After that the query seems to run forever (I waited a few minutes before stopping it). What's the core of this problem? Could this have been mitigated in Fleet or osquery?
Looking at the browser debug tools once it 'hangs', the WebSocket receives a single 'h' character every 25 seconds.
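A lone 'h' arriving at a fixed interval is what a SockJS-style heartbeat frame looks like, so the socket is probably still alive and simply has no new results to deliver. The sketch below assumes SockJS framing ('o' open, 'h' heartbeat, 'a' message array, 'c' close); that the UI's socket uses this framing is an assumption, and classify_frame is a hypothetical helper, not part of Fleet.

```python
import json

def classify_frame(frame):
    """Classify a raw frame, assuming SockJS-style framing (an assumption here)."""
    if frame == "o":
        return ("open", None)
    if frame == "h":
        # Keep-alive only: carries no query results.
        return ("heartbeat", None)
    if frame.startswith("a"):
        # 'a' is followed by a JSON array of message strings.
        return ("messages", json.loads(frame[1:]))
    if frame.startswith("c"):
        # 'c' is followed by a JSON array: [code, reason].
        return ("close", json.loads(frame[1:]))
    return ("unknown", frame)

heartbeat = classify_frame("h")
messages = classify_frame('a["hello"]')
```

If every frame after the first minute classifies as a heartbeat, the connection is healthy and the UI is just waiting on hosts that never answered.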
z
Are you concerned that this is causing a performance impact on Fleet and/or osquery clients? With the --timeout flag in fleetctl, I figure that's useful because you might be running the query with a shell script or some other automation. I don't see the same use case in the UI. Just trying to understand your motivation here.
w
No, I am mostly concerned that my query never ends 🙂 And I don't even know why, or which hosts fail.
z
Yeah it is a little strange. Seems you've got 1 host that didn't respond. If a host responds with an error it will be counted as responded, so it seems this one host appears "online" to Fleet but is not actually responding to the query (or possibly there is some error in recording the response).
w
Yeah, I need to diff all the hosts in Fleet against those that are sending a response. Maybe I can find the one that has issues.
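The diff described here comes down to a set difference. A minimal sketch, assuming the two hostname lists have already been exported somehow (the full host list from Fleet and the hosts that appear in the live query results); the hostnames and the find_silent_hosts helper are made up for illustration.

```python
def find_silent_hosts(all_hosts, responded_hosts):
    """Return the hosts that are known to Fleet but sent no response."""
    return sorted(set(all_hosts) - set(responded_hosts))

# Hypothetical data: in practice these lists come from your own exports.
all_hosts = ["web-01", "web-02", "db-01", "db-02"]
responded_hosts = ["web-01", "db-01", "db-02"]

silent_hosts = find_silent_hosts(all_hosts, responded_hosts)
print(silent_hosts)
```

Using sets keeps the comparison cheap even for a few thousand hosts, and sorting the result makes repeated runs easy to compare.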
z
Please let me know if you find anything.
w
I found the offending host. It doesn't seem to be getting the distributed query,
even when I restart osqueryd.
I'll debug on the osqueryd side.
z
--tls_dump can usually help with such things.
w
It receives 13 distributed queries
even though nothing is running.
Something is wrong.
I checked another host and it correctly receives an empty hash:
queries: {}
z
Does that host respond with results to those queries?
w
It executes those queries, but they were already run in the past.
They seem to be stuck on the Fleet side.
I mean the osqueryd log says "executing distributed query ...."
z
Do you see logs in --tls_dump indicating that the host actually makes the request with the results back to Fleet?
w
it should send to /write ?
z
Yes
w
No, I don't see that info.
I'll check another host.
I need to leave now; I'll come back to this tomorrow morning.
Thanks for the help.
So the issue is still there. Should I file a GitHub issue?
z
Have you identified whether osquery is sending the response to Fleet?
w
Yeah, I am not seeing osquery sending to /distributed/write
How does Fleet work in terms of distributed queries? Would it queue the queries at the /read endpoint for each host that it didn't get a reply from? I am trying to understand why there are so many queries pending for that host. If that's the case, I guess it's a ticket for osquery?
z
You probably have label and detail queries that Fleet is trying to run. Fleet decides whether to run those based on when the last results were written. If that host is not actually writing results then Fleet would send those queries each time. Your live queries should only be sent to the host while they are "running".
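The behaviour described above can be sketched roughly as follows. This is an illustration of the stated logic only, not Fleet's actual code: queries_to_send, the refresh interval, and the query names are all hypothetical.

```python
from datetime import datetime, timedelta

def queries_to_send(last_results_written, now, refresh_interval,
                    recurring_queries, running_live_queries):
    """Pick the queries to hand a host on one check-in (illustrative only)."""
    # Live queries are included only while they are running.
    queries = dict(running_live_queries)
    # Label/detail queries are re-sent until results are actually written back.
    if last_results_written is None or now - last_results_written >= refresh_interval:
        queries.update(recurring_queries)
    return queries

now = datetime(2021, 1, 1, 12, 0, 0)
interval = timedelta(hours=1)  # hypothetical refresh interval
recurring = {"detail_os_version": "SELECT * FROM os_version;"}  # hypothetical

# A host that never writes results gets the recurring queries on every check-in.
stuck_host = queries_to_send(None, now, interval, recurring, {})
# A host that wrote results recently, with no live query running, gets nothing.
healthy_host = queries_to_send(now - timedelta(minutes=5), now, interval, recurring, {})
```

Under this model, a host that reads queries but never writes results would see the same backlog on every read, which is consistent with the 13 pending queries observed earlier.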
n
Hi @Wojtek, did you end up finding the reason for a host appearing "online" in Fleet but not responding (hanging) to the live query? I think a similar problem was raised in this issue on GitHub.
w
Hi @Noah Talerman. I didn't have time to dig too much. The only thing I found out was that restarting osquery on the affected host helped. If I get more of these issues I will dig deeper and let you know.