# kolide
e
Hi Folks. I am running into the strangest issue with Fleet. My distributed queries sometimes execute and sometimes do not. I am monitoring logs both on the Fleet server and the osquery node. I see in the Fleet server that a NewDistributedQueryCampaign is scheduled successfully. On the osquery node, I am monitoring with `--verbose` and `--tls_dump` to see all data. I see that the osquery node is polling the read endpoint (`api/v1/osquery/distributed/read`) with the correct node key. It usually just gets back an empty `queries:` response. Sometimes, however, it gets the proper query and runs it! It seems to run it about 10% of the time. Also notable is that creating packs/scheduling queries works 100% of the time. My Fleet server is deployed behind an HAProxy LB. The LB uses its own certificates (signed, wildcard), and my Fleet server uses self-signed certs. The osquery node is using the public key `pem` file for the LB cert, and it enrolls properly. Anyone have any ideas?
s
Is HAProxy caching anything?
I think it’s a GET request, so the response may be cached with an empty queries object from when osqueryd checked in and there were no distributed queries to run
Edit: looks like it’s a POST request, so less likely to be cached, but still something to check
e
interesting... i'll look into that
Woah, I just found something crazy. When I'm logged in to the Fleet Web UI without going through the load balancer, so connecting to the UI directly at the server, my distributed queries work 100% of the time! That holds even when the osquery nodes are connected through the LB!
It seems that when scheduling queries through the LB endpoint, the queries don't actually get scheduled
z
Are you referring to scheduled queries (in packs) or live queries?
e
live queries
z
Does your LB support websockets? We try to support live queries either way, but that could potentially be an issue.
e
I'm not sure, I will check. Does it normally use websockets? Can I explicitly disable websockets?
z
Normally it pushes results over a websocket. I don't think you can explicitly disable them.
e
The osquery agent or the fleet server pushes results via websocket? I see some stuff in the `LiveQuery` method in `server/service/client_live_query.go` but I'm having trouble figuring out what the websocket is doing
ohh it's so the Web UI can interact with the fleet server backend, right?
z
yes
e
Interesting, when not using the LB, my queries work but I see the following error in the Chrome Console:
WebSocket connection to 'wss://my-kolide-svc.my-company.com/api/v1/kolide/results/207/efwfuaj3/websocket' failed: WebSocket is closed before the connection is established.
However I see in the network tab that the websocket performed the content download and it took 2.26 ms. When executing the live query through the LB, I see in the network tab that the websocket is initiated, but it just hangs there.
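One way to take the browser out of the picture is to send the WebSocket opening handshake by hand and see whether the endpoint answers with `101 Switching Protocols`. The following is a minimal sketch using only the Python standard library. The helper names (`expected_accept`, `build_handshake`, `upgrade_succeeded`, `probe`) are made up for illustration and are not part of Fleet or osquery. The host and path would be taken from the console error above, and it is an assumption that the results path is still valid when probed outside an active live-query session.

```python
import base64
import hashlib
import os
import socket
import ssl

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(key: str) -> str:
    """Return the Sec-WebSocket-Accept value a compliant server sends for key."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_handshake(host: str, path: str, key: str) -> bytes:
    """Build the HTTP/1.1 upgrade request that opens a WebSocket."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def upgrade_succeeded(response_head: bytes, key: str) -> bool:
    """True if the response is a 101 with the matching accept header."""
    text = response_head.decode("iso-8859-1")
    status_line, _, rest = text.partition("\r\n")
    parts = status_line.split(" ", 2)
    if len(parts) < 2 or parts[1] != "101":
        return False
    headers = {}
    for line in rest.split("\r\n"):
        if not line:
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers.get("sec-websocket-accept") == expected_accept(key)


def probe(host: str, path: str, port: int = 443,
          timeout: float = 5.0, verify: bool = True) -> bool:
    """Attempt the upgrade over TLS; a hang surfaces as a timeout exception."""
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    ctx = ssl.create_default_context()
    if not verify:
        # Only for endpoints with self-signed certs, e.g. a direct connection.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(build_handshake(host, path, key))
            data = b""
            while b"\r\n\r\n" not in data:
                chunk = tls.recv(4096)
                if not chunk:
                    break
                data += chunk
    return upgrade_succeeded(data, key)
```

Running `probe(...)` once against the LB hostname and once directly against the Fleet server (with `verify=False` for the self-signed cert) would show whether the upgrade is being dropped at the LB. A timeout or a non-101 status through the LB, alongside a successful upgrade on the direct connection, would point at the LB's handling of the `Upgrade` and `Connection` headers or its timeouts.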