I noticed that I am unable to open couple of hosts within fl osquery #fleet

Daniel Weeber

12/29/2021, 8:06 PM

I noticed that I am unable to open a couple of hosts within fleetdm. It shows “Unable to load host. Please try again”, though the hosts show as online. Live queries are also not working on them. I had a look at the reverse proxy logs and found a bazillion HTTP 500 responses for POST to /api/v1/osquery/distributed/write. But some other hosts are working fine, live queries included, and they are located on the same network as the ones which are not working, so I don't suspect any networking/firewalling/proxy issue.

zwass

12/29/2021, 8:13 PM

Hmm, is there any more information about the 500 errors in the Fleet server logs? Which version of Fleet are you running? Can you run osqueryd manually on one of the affected hosts with

--verbose --tls_dump

and see if there's any information about what the actual error is?

Daniel Weeber

12/29/2021, 8:16 PM

Anything in particular I should look for?

Daniel Weeber

12/29/2021, 8:17 PM

Only thing I notice so far is

"error": "internal error: load host additional: load pack stats: timestamp: 2021-12-29T21:16:45+01:00: sql: Scan error on column index 10, name \"last_executed\": unsupported Scan, storing driver.Value type \u003cnil\u003e into type *time.Time"
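For context, the wording of this error is what Go's database/sql package reports when a NULL column is scanned into a plain *time.Time destination. The following is only a minimal sketch of the general pattern for tolerating a NULL timestamp, using the standard library's sql.NullTime. The helper name lastExecuted is hypothetical, and nothing here is taken from Fleet's code or its actual fix.

```go
package main

import (
	"database/sql"
	"fmt"
	"time"
)

// lastExecuted is a hypothetical helper (not from Fleet). It takes the raw
// value a driver hands to Scan (nil for SQL NULL) and returns the timestamp
// plus a flag saying whether the column actually held a value.
func lastExecuted(raw interface{}) (time.Time, bool, error) {
	var nt sql.NullTime
	// Unlike a *time.Time destination, sql.NullTime accepts nil
	// and records it as Valid == false.
	if err := nt.Scan(raw); err != nil {
		return time.Time{}, false, err
	}
	return nt.Time, nt.Valid, nil
}

func main() {
	// A NULL last_executed arrives from the driver as nil.
	_, ok, err := lastExecuted(nil)
	fmt.Println(ok, err)

	// A populated column arrives as a time.Time.
	t, ok, err := lastExecuted(time.Date(2021, 12, 29, 21, 16, 45, 0, time.UTC))
	fmt.Println(t.Year(), ok, err)
}
```

In a real query the same type would be passed to rows.Scan as the destination for the nullable column.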

zwass

12/29/2021, 8:23 PM

Which Fleet version is this? IIRC we fixed this issue or something similar recently.

Daniel Weeber

12/29/2021, 8:28 PM

4.6.2. Will upgrade to 4.7.0 now.

zwass

12/29/2021, 8:30 PM

Please lmk if that resolves it. I think we may have fixed this issue in 4.7.0.

Daniel Weeber

12/29/2021, 8:34 PM

Working! The host is opening up again in the fleetdm GUI and is also executing live queries… But it's painfully slow. “get openssl versions” is taking like 1-2 mins.

Daniel Weeber

12/29/2021, 8:34 PM

Exact query is

SELECT name AS name, version AS version, 'deb_packages' AS source FROM deb_packages WHERE name LIKE 'openssl%'
UNION
SELECT name AS name, version AS version, 'apt_sources' AS source FROM apt_sources WHERE name LIKE 'openssl%'
UNION
SELECT name AS name, version AS version, 'rpm_packages' AS source FROM rpm_packages WHERE name LIKE 'openssl%';

Daniel Weeber

12/29/2021, 8:39 PM

just tried via osqueryi on local machine.. result is instant

Daniel Weeber

12/29/2021, 8:44 PM

Even with distributed_interval=1 it's slow af

zwass

12/29/2021, 8:45 PM

Also instant with

sudo osqueryi

on local machine?

zwass

12/29/2021, 8:45 PM

Are other hosts also slow? Or just the previously affected ones?

Daniel Weeber

12/29/2021, 8:45 PM

yes. same query on target machine is instant via osqueryi

Daniel Weeber

12/29/2021, 8:46 PM

Nope, presumably all of them.

zwass

12/29/2021, 8:47 PM

How's the CPU doing on your Fleet servers/Redis/MySQL?

Daniel Weeber

12/29/2021, 8:48 PM

Seems bored

Daniel Weeber

12/29/2021, 8:49 PM

The web interface of fleetdm is really fast

Daniel Weeber

12/29/2021, 8:51 PM

Querying Linux, Windows, and Mac hosts (my MacBook) is painfully slow; I just tested all 3. The Mac seems a bit faster, but not by much: about 75 seconds for

SELECT * FROM system_info;

Tomas Touceda

01/03/2022, 11:38 AM

hi there! is that CPU graph for the fleet instance? could you tell me a bit more about your setup? eg: what redis and mysql are you using, how is CPU/memory in there

Daniel Weeber

01/07/2022, 5:54 PM

Sorry for the late reply. It's a VM on a big virtualization cluster, flash-only SAN, and so on. Redis and MySQL are local. Everything is blazing fast except executing queries on hosts, even on hosts in the same subnet as the Fleet server. I just checked again after my last message (did not use Fleet over Christmas/New Year's): same problem, but a bit different. If I execute a query (“get openssl versions”) for the first time, it takes about 10 seconds. Slow… but could be okay? If I then click on “run again” right afterwards, it takes over 100 seconds.

Daniel Weeber

01/07/2022, 5:56 PM

and yes, this was the CPU graph of the fleet vm

zwass

01/07/2022, 5:57 PM

Did we already look at the configured

distributed_interval

for the hosts? You can see it on the host detail page.

Daniel Weeber

01/07/2022, 5:58 PM

Distributed interval: 1 min

zwass

01/07/2022, 5:59 PM

With that distributed interval, up to 60s would be totally normal (depending on where in the interval the host was when the live query started)
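The point about "where in the interval the host was" can be sketched with a little arithmetic. This is a deliberately simplified model, assuming a host polls for distributed queries on a fixed interval and ignoring query execution and result delivery time. The function name waitUntilNextPoll is made up for illustration and is not part of osquery or Fleet.

```go
package main

import "fmt"

// waitUntilNextPoll returns how many seconds a newly started live query
// waits before the host's next check-in, given the polling interval and
// the seconds elapsed since the host's last check-in.
// Simplified model: fixed interval, no network or execution time.
func waitUntilNextPoll(intervalSec, sinceLastPollSec int) int {
	return intervalSec - (sinceLastPollSec % intervalSec)
}

func main() {
	// Same moment in time, two different interval settings.
	for _, interval := range []int{60, 10} {
		fmt.Printf("interval=%ds, 15s since last check-in: wait %ds\n",
			interval, waitUntilNextPoll(interval, 15))
	}
	// Worst case: the query starts right after a check-in.
	fmt.Printf("interval=60s, just checked in: wait %ds\n",
		waitUntilNextPoll(60, 0))
}
```

Under this model the wait is anywhere from almost zero up to the full interval, so a 60 second setting can account for delays of up to a minute on its own.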

Daniel Weeber

01/07/2022, 5:59 PM

But I have “--distributed_interval=10” in my osquery.flags

zwass

01/07/2022, 5:59 PM

Maybe you have it set differently in your "agent options" in Fleet?

Daniel Weeber

01/07/2022, 6:01 PM

You’re right. wow… Just remove that line and apply? Or do I have to restart the remote osquery agents?

Daniel Weeber

01/07/2022, 6:06 PM

Just removed the global agent config.. still 1min

Daniel Weeber

01/07/2022, 6:13 PM

Okay, after removing the line, restarting the remote osqueryd, and refetching, it's now showing 10 sec

Daniel Weeber

01/07/2022, 6:13 PM

Thank you!

zwass

01/07/2022, 11:32 PM

Sounds like things are working as expected now? Or are you still seeing unexpectedly long delays?
