#fleet

Wojtek

12/16/2020, 11:20 AM
I saw there is a --timeout flag in fleetctl, which is a nice feature. Is there anything similar in the web UI? I ran a simple query over 1389 hosts, and within one minute 1386 of them returned results. After that the query seems to run forever (I waited a few minutes before stopping it). What's the root cause of this problem? Could this have been mitigated in Fleet or osquery?
11:24 AM
Looking at the browser debug tools once it 'hangs', the WebSocket is receiving a single 'h' character every 25 seconds.
11:27 AM

zwass

12/16/2020, 5:24 PM
Are you concerned that this is causing a performance impact on Fleet and/or osquery clients? I figure the --timeout flag in fleetctl is useful because you might be running the query with a shell script or some other automation. I don't see the same use case in the UI. Just trying to understand your motivation here.

Wojtek

12/16/2020, 5:31 PM
No, I am mostly concerned that my query never ends 🙂 And I don't even know why, or which hosts fail.

zwass

12/16/2020, 5:36 PM
Yeah it is a little strange. Seems you've got 1 host that didn't respond. If a host responds with an error it will be counted as responded, so it seems this one host appears "online" to Fleet but is not actually responding to the query (or possibly there is some error in recording the response).
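The behaviour described here, a live query that only finishes once every targeted host has responded, can be bounded with a deadline. The following is a minimal sketch of that idea, not the way fleetctl or Fleet implements it. `wait_for_live_query` and the `poll_responses` callable are hypothetical names; `poll_responses` is assumed to return the IDs of hosts that have answered so far, with error responses counting as answers.

```python
import time


def wait_for_live_query(targeted, poll_responses, timeout_s,
                        poll_interval_s=1.0,
                        clock=time.monotonic, sleep=time.sleep):
    """Collect responses until all targeted hosts answered or the deadline passed.

    Returns a (responded, missing) pair of sets. `poll_responses` is a
    hypothetical callable returning host IDs that have answered so far.
    """
    targeted = set(targeted)
    responded = set()
    deadline = clock() + timeout_s
    while True:
        # Ignore answers from hosts that were not part of the target set.
        responded |= set(poll_responses()) & targeted
        if responded == targeted or clock() >= deadline:
            break
        sleep(poll_interval_s)
    return responded, targeted - responded
```

With a deadline in place, one silent host ends up in `missing` instead of keeping the query open indefinitely, which also answers the question of which host failed.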

Wojtek

12/16/2020, 5:43 PM
Yeah, I need to diff the list of all hosts in Fleet against those that are sending a response. Maybe I can find the one that has issues.
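The diff mentioned above amounts to a set difference. A small sketch, assuming the two host lists have already been exported somehow (for example, one hostname per line); the function name and sample hostnames are made up for illustration.

```python
def hosts_without_response(all_hosts, responded_hosts):
    """Return targeted hosts that never sent a result, sorted by name."""
    # Normalise case and whitespace so "Web-01 " and "web-01" compare equal.
    targeted = {h.strip().lower() for h in all_hosts if h.strip()}
    responded = {h.strip().lower() for h in responded_hosts if h.strip()}
    return sorted(targeted - responded)


if __name__ == "__main__":
    # Illustrative data only, not taken from the thread.
    all_hosts = ["web-01", "web-02", "db-01"]
    responded = ["web-01", "db-01"]
    print(hosts_without_response(all_hosts, responded))
```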

zwass

12/16/2020, 5:49 PM
Please let me know if you find anything.

Wojtek

12/16/2020, 5:59 PM
I found the offending host. It doesn't seem to be getting the distributed query,
5:59 PM
even when I restart osqueryd.
5:59 PM
I'll debug on the osqueryd side.

zwass

12/16/2020, 6:00 PM
--tls_dump can usually help with such things.

Wojtek

12/16/2020, 6:12 PM
it receives 13 distributed queries
6:12 PM
even though nothing is running
6:12 PM
something is wrong
6:13 PM
I checked another host and it receives an empty hash
6:13 PM
queries: {}
6:13 PM
correctly

zwass

12/16/2020, 6:13 PM
Does that host respond with results to those queries?

Wojtek

12/16/2020, 6:16 PM
it executes those queries, but they were already run in the past
6:16 PM
they seem to be stuck on the fleet side
6:16 PM
I mean the osqueryd log says "executing distributed query ...."

zwass

12/16/2020, 6:22 PM
Do you see logs in the --tls_dump output indicating that the host actually makes the request with the results back to Fleet?
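One way to answer this from a captured osqueryd log is to count how often the read and write endpoints show up. This is a rough sketch that assumes the captured output contains the request URIs (for example, when osqueryd also runs with verbose logging); the exact log format is not taken from this thread, so the code only does a plain substring match. `count_distributed_requests` is a hypothetical helper.

```python
def count_distributed_requests(log_lines):
    """Count log lines that mention the distributed read and write endpoints.

    Assumption: request URIs appear somewhere in the captured output, so a
    substring match is enough and no particular line format is required.
    """
    counts = {"read": 0, "write": 0}
    for line in log_lines:
        if "distributed/read" in line:
            counts["read"] += 1
        if "distributed/write" in line:
            counts["write"] += 1
    return counts
```

A host that shows many reads and zero writes matches the symptom discussed here: it fetches queries but never posts results back.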

Wojtek

12/16/2020, 6:28 PM
Should it send to /write?

zwass

12/16/2020, 6:29 PM
Yes

Wojtek

12/16/2020, 6:30 PM
no, I don't see that info
6:30 PM
I'll check another host
6:32 PM
I need to leave now, I'll come back to this tomorrow morning
6:32 PM
thanks for the help
🍻 1
1:38 PM
So the issue is still there. Should I file a GitHub issue?

zwass

12/17/2020, 4:19 PM
Have you identified whether osquery is sending the response to Fleet?

Wojtek

12/18/2020, 8:51 AM
Yeah, I am not seeing osquery sending to /distributed/write
8:53 AM
How does Fleet handle distributed queries? Would it queue queries at the /read endpoint for each host it didn't get a reply from? I am trying to understand why there are so many queries pending for that host. If that's the case, I guess it's a ticket for osquery?

zwass

12/18/2020, 1:59 PM
You probably have label and detail queries that Fleet is trying to run. Fleet decides whether to run those based on when the last results were written. If that host is not actually writing results then Fleet would send those queries each time. Your live queries should only be sent to the host while they are "running".
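The explanation above can be modelled in a few lines. This is a simplified illustration of the described behaviour, not Fleet's actual code: `queries_for_host`, the `refresh_interval` parameter, and the query names in the usage example are all hypothetical.

```python
from datetime import datetime, timedelta


def queries_for_host(last_written, now, refresh_interval, live_queries_running):
    """Return the query names to hand to a host on its next read.

    last_written maps a label/detail query name to the time its results were
    last written, or None if the host has never written results for it.
    """
    due = [
        name
        for name, written_at in sorted(last_written.items())
        if written_at is None or now - written_at >= refresh_interval
    ]
    # Live queries are only included while they are running.
    return due + sorted(live_queries_running)


if __name__ == "__main__":
    now = datetime(2020, 12, 18, 9, 0)
    stuck_host = {"detail_os": None, "label_1": now - timedelta(days=2)}
    # A host that never writes results gets the same queries on every read.
    print(queries_for_host(stuck_host, now, timedelta(hours=1), []))
```

Under this model, a host that reads but never writes keeps its label and detail queries permanently due, which would explain a steady backlog like the 13 queries seen earlier.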

Noah Talerman

02/11/2021, 6:20 PM
Hi @Wojtek, did you end up finding the reason for a host appearing “online” in Fleet but not responding (hanging) to the live query? I think a similar problem was raised in this issue on GitHub.

Wojtek

02/17/2021, 10:15 AM
Hi @Noah Talerman. I didn't have time to dig much. The only thing I found was that restarting osquery on the affected host helped. If I see more of these issues I will dig deeper and let you know.
🍻 1
👍 1