Hey all, just updated Fleet to 4.17.0 and having a problem wi… (osquery #fleet)


Ari Weinberg

07/12/2022, 8:45 PM

Hey all, just updated Fleet to 4.17.0 and am having a problem with distributed queries. I'm getting the following error:


{
  "component": "http",
  "err": "error in query ingestion",
  "ingestion-err": "campaign waiting for listener (please retry)",
  "ip_addr": "ENDPOINT-IP:41730",
  "level": "error",
  "method": "POST",
  "took": "1.136788ms",
  "ts": "2022-07-12T20:38:50.060101107Z",
  "uri": "/api/v1/osquery/distributed/write",
  "x_for_ip_addr": ""
}

Also getting (although not sure it's related):


{
  "component": "http",
  "err": "read auth token: reading from websocket: sockjs: session not in open state",
  "msg": "failed to read auth token",
  "ts": "2022-07-12T20:37:57.77330272Z"
}

The problem appears to be the agent talking back to the Fleet server: in debug mode I can see the query being run on the agent, but it seems to fail when posting back the results. The agent is vanilla osquery 5.1.0. This only started after I updated from Fleet 4.9.1 a few minutes ago.

Kathy Satterlee

07/12/2022, 9:27 PM

Moving the conversation into a thread. Sorry about that! Are you using a proxy?

Ari Weinberg

07/12/2022, 9:27 PM

Nope

Ari Weinberg

07/12/2022, 9:28 PM

Tracert shows 1 direct hop from the agent to the server

Ari Weinberg

07/12/2022, 9:28 PM

server is running in docker though

Ari Weinberg

07/12/2022, 9:36 PM

Did the osquery api endpoint change?

Kathy Satterlee

07/12/2022, 9:52 PM

There were some changes in 4.13.2 related to WebSockets. We seem to be seeing these issues when WebSocket traffic isn't allowed, because the SockJS fallback isn't a reliable workaround. Can you tell me more about your Fleet/MySQL/Redis setup? I can reach out to the team and see what they suggest we poke at.
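Why would a browser-side WebSocket problem break result ingestion from agents? A simplified, hypothetical model, inferred from the error strings in this thread rather than taken from Fleet's source: live-query results are published to a per-campaign channel (Fleet uses Redis for live queries, and Redis PUBLISH returns how many subscribers received a message). If no UI session is subscribed, the server has nowhere to deliver the row, so it either asks the agent to retry (campaign still waiting) or rejects it (campaign stopped). The names resultBus and ingestResult are illustrative.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// campaignStatus is a hypothetical stand-in for a live-query campaign's state.
type campaignStatus int

const (
	campaignWaiting campaignStatus = iota // created; no UI session attached yet
	campaignRunning                       // a UI session is receiving results
	campaignStopped                       // the UI went away or the query ended
)

// resultBus keeps one set of subscriber channels per campaign.
type resultBus struct {
	mu   sync.Mutex
	subs map[int][]chan string
}

func newResultBus() *resultBus {
	return &resultBus{subs: make(map[int][]chan string)}
}

// subscribe plays the role of the browser's WebSocket attaching to a campaign.
func (b *resultBus) subscribe(campaignID int) <-chan string {
	b.mu.Lock()
	defer b.mu.Unlock()
	ch := make(chan string, 16)
	b.subs[campaignID] = append(b.subs[campaignID], ch)
	return ch
}

// publish returns how many subscribers received the row, like Redis PUBLISH.
func (b *resultBus) publish(campaignID int, row string) int {
	b.mu.Lock()
	defer b.mu.Unlock()
	delivered := 0
	for _, ch := range b.subs[campaignID] {
		select {
		case ch <- row:
			delivered++
		default: // subscriber buffer full; this sketch drops the row
		}
	}
	return delivered
}

// ingestResult models handling one row posted by an agent. The error strings
// match the ingestion-err values quoted in this thread; the control flow is
// an assumption.
func ingestResult(bus *resultBus, status campaignStatus, campaignID int, row string) error {
	if bus.publish(campaignID, row) > 0 {
		return nil
	}
	if status == campaignWaiting {
		return errors.New("campaign waiting for listener (please retry)")
	}
	return errors.New("campaign stopped")
}

func main() {
	bus := newResultBus()
	row := `{"version":"5.1.0"}`
	// No browser session yet, e.g. because a proxy blocked the WebSocket.
	fmt.Println(ingestResult(bus, campaignWaiting, 42, row))
	// After the UI subscribes, the same row is delivered.
	results := bus.subscribe(42)
	fmt.Println(ingestResult(bus, campaignRunning, 42, row))
	fmt.Println(<-results)
}
```

In this model, anything that prevents the UI from holding its subscription open (a proxy dropping the WebSocket upgrade, an unreliable SockJS fallback) surfaces on the server as an agent-side ingestion error, even though the agent did nothing wrong.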

Ari Weinberg

07/12/2022, 9:53 PM

all on a single server spun up with docker-compose

Kathy Satterlee

07/12/2022, 10:05 PM

Would you mind sharing your compose file?

Kathy Satterlee

07/12/2022, 10:05 PM

With any sensitive data removed, of course.

Ari Weinberg

07/12/2022, 10:27 PM

I dm'd it to you

Ari Weinberg

07/12/2022, 10:27 PM

Thanks so much for your help!

Jason Cetina

07/12/2022, 10:33 PM

Erm, I'm not sure if this is related or not, but we updated to 4.1.7 yesterday and all our hosts show as offline in Fleet.

Jason Cetina

07/12/2022, 10:33 PM

just happened an hour ago

Kathy Satterlee

07/12/2022, 11:04 PM

@Jason Cetina Can you check your logs and see if you're seeing similar errors?

Kathy Satterlee

07/12/2022, 11:05 PM

Thanks, @Ari Weinberg. I'll bring this to the team and see if we can sort out what's up. I may not get a response until tomorrow, but I'll update this thread as soon as I do.

zwass

07/12/2022, 11:09 PM

@Jason Cetina what version did you upgrade from? We have seen some issues with load balancers using deprecated SSL configurations that have been removed from support in the Go stdlib HTTP libraries we use.

Jason Cetina

07/12/2022, 11:10 PM

we upgraded from 4.1.6

zwass

07/12/2022, 11:10 PM

When you say 4.1.7 and 4.1.6, do you mean 4.17.0 and 4.16.0?

Jason Cetina

07/12/2022, 11:11 PM

Sorry, I'm a dummy. Yes, 4.17.

Jason Cetina

07/12/2022, 11:11 PM

from 4.16.0 ->4.17.0

zwass

07/12/2022, 11:12 PM

@Ari Weinberg can you open your browser devtools on the live query page and see if you are getting any errors in the network tab or the JS console?

Jason Cetina

07/12/2022, 11:12 PM

@Kathy Satterlee fleet logs or osquery logs? do we need debugging set to true?

zwass

07/12/2022, 11:12 PM

@Jason Cetina can you look at your Fleet server logs for any errors?

Jason Cetina

07/12/2022, 11:12 PM

@zwass do I need debugging on or no?

zwass

07/12/2022, 11:13 PM

should not need it, but let's see what is in your logs

Jason Cetina

07/12/2022, 11:13 PM

just to be clear, the weird part is that this is only for part of our fleet

Jason Cetina

07/12/2022, 11:13 PM

and to a different endpoint (internal vs external vip), but we've made no serious changes to any config

Jason Cetina

07/12/2022, 11:14 PM

anyway, i'm looking for errors

zwass

07/12/2022, 11:14 PM

that sounds very likely to be related to the LB configuration

Jason Cetina

07/12/2022, 11:15 PM

i think so, too. I will dig again.

Jason Cetina

07/12/2022, 11:15 PM

nothing in logs

Jason Cetina

07/12/2022, 11:20 PM

@zwass looks like our lb team rotated a cert roughly in this timeframe.

Jason Cetina

07/12/2022, 11:20 PM

our cert I should say

Jason Cetina

07/12/2022, 11:21 PM

anyway, not your problem anymore. Sorry for the noise.

zwass

07/12/2022, 11:25 PM

Glad to hear it!

Ari Weinberg

07/12/2022, 11:37 PM

@zwass

zwass

07/12/2022, 11:38 PM

You don't have any kind of load balancer or anything? Looks like the LB websockets issue we commonly see.

Ari Weinberg

07/12/2022, 11:38 PM

I don't think so

zwass

07/12/2022, 11:39 PM

So you have Fleet running on a server with docker-compose -- are you connecting directly to that server?

Ari Weinberg

07/12/2022, 11:39 PM

Where would the proxy cause an issue? between the server and agent? or between the client (web UI) and server?

zwass

07/12/2022, 11:39 PM

web UI and server

Ari Weinberg

07/12/2022, 11:39 PM

Ahhh. There is a proxy there

zwass

07/12/2022, 11:40 PM

Typically googling "<name of proxy> websocket configuration" is the best way to address this

Ari Weinberg

07/12/2022, 11:40 PM

Will do. Thanks so much!!!!

Kathy Satterlee

07/12/2022, 11:40 PM

Thanks for the assist, @zwass !

Ari Weinberg

07/12/2022, 11:42 PM

Confirmed that's the problem by bypassing the proxy and going directly to the server. @zwass @Kathy Satterlee Thanks so much for all your help!


zwass

07/12/2022, 11:43 PM

Glad to hear it!

Kathy Satterlee

07/12/2022, 11:43 PM

Awesome news!

Jason Cetina

07/12/2022, 11:50 PM

@Kathy Satterlee @zwass, just to close the loop here: we didn't have osqueryd configured to use the OS-maintained cert bundle in /etc/ssl/certs/ca-certificates.crt. The root CA changed for this endpoint, so everything turfed when it got rotated. Not sure how/why it was set up that way. Anyway, it's fixed now.


zwass

07/12/2022, 11:50 PM

Sweet! Thank you 🙂

Jason Cetina

07/12/2022, 11:51 PM

Again, sorry for the noise.

Kathy Satterlee

07/12/2022, 11:51 PM

Thanks for the info! And for reaching out :)


Ari Weinberg

07/15/2022, 3:13 PM

Sooooo, hate to re-open this, but I fixed the websocket issue, and I'm still getting no results... I'm getting the same error on the fleet server as before:


{
  "component": "http",
  "err": "error in query ingestion",
  "ingestion-err": "campaign stopped",
  "ip_addr": "AGENT-IP:50524",
  "level": "error",
  "method": "POST",
  "took": "2.406993ms",
  "ts": "2022-07-15T15:11:55.292524773Z",
  "uri": "/api/v1/osquery/distributed/write",
  "x_for_ip_addr": ""
}

Getting the following in the console:

Ari Weinberg

07/15/2022, 3:13 PM

Any tips?

Ari Weinberg

07/15/2022, 3:14 PM

The query is coming through and executing on the agent, but results are not being returned
