# general
e
Is there any performance issue in 4.5.1? I'm running queries on Kolide against 4.5.1 hosts and they don't return anything until I restart the service on each host. Even an osquery_info query takes forever and returns nothing.
s
I don’t think I’ve seen or heard about any issues with 4.5.1 or 4.5.0. Do you have more information? How did 4.5.1 get installed? (If there’s any of Kolide’s update mechanism here, this may belong on #kolide)
e
Via Kolide launcher with autoupdate enabled
It seems that after running a query on the windows_eventlog table (without WHERE channel), subsequent queries start to fail or take too long
a
Hello Esteban! Could you please share with us the query you are using? (cc @Akshay Kumar)
s
Are you using Kolide’s update servers, or your own? I only pushed 4.5.1 to stable 3 hours ago….
And to clarify a bit here: the nodes update, are running 4.5.1, and then a query to the eventlog table renders them unresponsive?
That table is new to 4.5.0, isn’t it?
a
I suspect that the table doesn't play well without a WHERE clause. It will pull in too much data, causing the watcher to kill osquery
Regardless, I think the table should reject queries without a `WHERE channel =` clause and also set a configurable limit on the number of events that can be read from the event source
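To make that proposal concrete, here is a rough sketch in Python. It is not how osquery implements the table (osquery is C++ and a table would inspect its query constraints rather than the SQL text). The names `check_eventlog_query`, `cap_events`, and `DEFAULT_MAX_EVENTS` are hypothetical and exist only for this illustration.

```python
import re
from itertools import islice

# Hypothetical cap; the real default, if any, is not specified in this thread.
DEFAULT_MAX_EVENTS = 1000


def check_eventlog_query(sql):
    """Return None if the query looks acceptable, otherwise a rejection reason."""
    lowered = sql.lower()
    if "windows_eventlog" not in lowered:
        return None  # not our table, nothing to enforce
    where = re.search(r"\bwhere\b(.*)", lowered, re.S)
    if where is None:
        return "windows_eventlog requires a WHERE clause"
    # Accept an equality constraint on either channel or xpath.
    if not re.search(r"\b(channel|xpath)\s*=", where.group(1)):
        return "windows_eventlog requires channel = ... or xpath = ..."
    return None


def cap_events(events, max_events=DEFAULT_MAX_EVENTS):
    """Yield at most max_events items from an event source."""
    return islice(events, max_events)
```

With this sketch, `select * from windows_eventlog` is rejected, while `select * from windows_eventlog where channel = 'Security'` passes and would then be bounded by the cap.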
s
Yeah, that’s a good guess.
Pretty sure I called out something like that in the PR. It’s one of those boil-the-ocean sorts of things
e
I'm only running `select * from windows_eventlog`. After that I run any other query, like `select * from osquery_info`, and everything stops working
I'm using Kolide's servers. I've set up an installer with Kolide launcher and package builder, with the update channel on stable. The thing I noticed is that I configured the osquery version as 4.5.0 and the update channel as stable, and when the package finishes installing it autoupdates to 4.5.1
s
I think the Kolide part isn’t a factor here. But yes, if you point launcher’s autoupdate at our servers, you will get what we release as stable. Doesn’t matter what you build the package with. It will downgrade or upgrade as needed.
I suspect `select * from windows_eventlog` is the issue. That attempts to read most events into RAM, and likely causes issues. Though I thought it had a required `channel` and `xpath` (though it’s a bit weird and complicated)
I’m not sure if it got documented well. You could take a look at the discussion and code in https://github.com/osquery/osquery/pull/6563 where it merged
There may well be a bug in that table
e
I've just tested it with a `WHERE` clause and it "crashes". I usually do it that way and it works fine, but I've never tested it with `xpath`
Maybe querying by channel and event ID?
OK, filtering by channel and eventid also kills osquery on the host, apparently. It's a Windows host, so I don't know a proper way to debug it.
Limiting to 3 or 5 rows also works
s
Hrm. You said you don’t have access to a machine to debug this locally?
I’m breaking out some test cases…
e
No, I do have access. I just don't know how to debug the service on the Windows machine
s
What happens if you run osquery on the command line?
Testing in my environment, I cannot replicate this. If I run `select * from windows_eventlog`, I get back an error. My device is not hung. Though if I query for `select * from windows_eventlog where channel = "Security"` it’s taking a long time to return these results.
I suspect this is not an osquery bug. There may or may not be a bug somewhere in the Kolide stack. Or this might be expected behavior around a table with this many rows. Not sure yet
Might be a launcher bug in how this stuff is marshalled and sent over the wire. Feel free to open a bug in the launcher repo, though I don’t know if I can prioritize it
e
Yes, the same thing happened to me yesterday. I will try it on the host's CLI. Is there any way to open a CLI for hosts installed with package builder? Or are you talking about the fleetctl CLI?
s
Neither. Running osquery from powershell is sometimes a good way to debug
e
Understood
Is the query without a WHERE clause returning something for you?
a
The `windows_eventlog` query without a WHERE clause should return an error log. It requires a channel or xpath to query the events. Also, if you are querying the events without other constraints, it may take longer depending on the number of events in the `security` or `Application` channel.
e
Querying with a WHERE clause also hangs subsequent queries, and the query takes too long to return anything
s
I think I replicated what you’re seeing. My guess is there’s something weird in launcher about marshalling that much data. As said, feel free to open an issue in the launcher repo, though I don’t know if I can prioritize it
Actually, I’ll open one
a
The query with a WHERE clause may take time because of the large number of event logs in a specific channel. I am planning to add a `max_windows_eventlog_events` flag with the query, as suggested by @alessandrogario. This will reduce the query time.
s
I don’t think it’s query time. I think it’s data returned
a
Is it the size of the data, or is some specific event data causing the problem?
e
For example, querying with limit 3 or 5 works fine.
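One way to tell query time apart from result size is to run the same query locally, outside launcher, and record both. The sketch below is only an illustration under some assumptions: `osqueryi` is on the PATH (on a launcher install the binary may live elsewhere), and the helper names `build_eventlog_query`, `summarize`, and `run_locally` are made up for this example. It relies on `osqueryi --json "<query>"`, which prints the rows as a JSON array.

```python
import json
import subprocess
import time


def build_eventlog_query(channel, eventid=None, limit=None):
    """Build a windows_eventlog query constrained by channel, optional eventid and LIMIT."""
    # Validate instead of escaping: this is a debugging helper, not a query builder.
    stripped = channel.replace("-", "").replace("/", "").replace(" ", "")
    if not stripped.isalnum():
        raise ValueError("unexpected characters in channel name")
    sql = f"SELECT * FROM windows_eventlog WHERE channel = '{channel}'"
    if eventid is not None:
        sql += f" AND eventid = {int(eventid)}"
    if limit is not None:
        sql += f" LIMIT {int(limit)}"
    return sql


def summarize(raw_json, elapsed_seconds):
    """Report row count, payload size in bytes, and elapsed time for one run."""
    rows = json.loads(raw_json)
    return {
        "rows": len(rows),
        "bytes": len(raw_json.encode("utf-8")),
        "seconds": round(elapsed_seconds, 2),
    }


def run_locally(sql, osqueryi="osqueryi"):
    """Run the query through osqueryi and summarize the result (path is an assumption)."""
    start = time.monotonic()
    result = subprocess.run(
        [osqueryi, "--json", sql], capture_output=True, text=True, check=True
    )
    return summarize(result.stdout, time.monotonic() - start)
```

Comparing `run_locally(build_eventlog_query("Security", limit=5))` with the same call without `limit` should show whether the unbounded query is slow on its own or simply returns a very large payload.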