I’m seeing some odd behaviour with (Windows) event-based queries and I need some help troubleshooting.
I have a Fleet manager with one Windows host. If I enable a pack with 2 generic (SELECT * …) event-based queries (windows_events schema), osquery sometimes starts increasing its memory usage without stopping.
I’ve tested it in osquery 4.9.0 and pre-release 5.1.0.
In osquery 4.9.0 it consumes memory without limit. In 5.1.0 the graph shows similar behaviour (memory piling up), with the difference that the watchdog does its job and stops the worker when it reaches the configured memory limit.
I’d like help checking whether this memory increase is caused by some kind of leak, or whether I’m missing something and it can be solved just by optimizing the osquery configuration/queries (and how).
(The graph shows Private Bytes in KB over time)
12/22/2021, 7:46 PM
Hello! In osquery versions < 5.0.1 the watchdog did not take private bytes (live memory + paged out) into account, only live memory, so that’s the reason for the difference.
As for the increased memory usage, it depends heavily on how many events osquery is handling.
One way to see whether the memory increase stops at a certain point is to disable the watchdog, or you can progressively increase the memory limit.
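A minimal sketch of how such a series of test runs could be planned. It assumes the flags meant here are osquery's `--disable_watchdog` and `--watchdog_memory_limit` (value in MB); the helper name `watchdog_trials`, the starting limit, and the doubling factor are hypothetical choices for illustration, not recommendations from this thread.

```python
def watchdog_trials(start_mb, steps, factor=2):
    """Build flag lists for successive osqueryd test runs.

    The first entry disables the watchdog entirely; the following
    entries keep it enabled but raise the memory limit each time.
    Assumption: --watchdog_memory_limit takes a value in MB.
    """
    trials = [["--disable_watchdog=true"]]
    limit = start_mb
    for _ in range(steps):
        trials.append([f"--watchdog_memory_limit={limit}"])
        limit *= factor
    return trials


# Print one hypothetical command line per trial run.
for flags in watchdog_trials(start_mb=512, steps=3):
    print("osqueryd", *flags)
```

Running each trial for the same duration and comparing the Private Bytes graphs would show whether memory levels off before the limit is hit.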
12/22/2021, 8:03 PM
In this case, the event rate is a little under 200/sec.
I’ll try running it without the watchdog to check whether memory stabilizes at some point, and I’ll post the results here. The thing is that the behaviour only seems to occur every now and then, which is why I’m somewhat clueless…
Anyway, is there any way to optimize memory usage for event-based queries?
I mean, if consumption stabilizes at (for example) about 3 GB of memory, that is pretty high for some hosts.
12/22/2021, 8:56 PM
I’m less familiar with the event publishers on Windows, but I don’t see any flags that could be used to slow down/throttle them.
I think that would need some source code changes.
I’m making assumptions here, though. It depends on what is doing the allocation; if it’s simply due to the traffic, then a code change might be needed to introduce throttling when desired.
If the allocations happen when the query runs, that is, while the table rows are being prepared, then querying more often, so that there’s less data to process at once, can help.
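As a rough back-of-the-envelope sketch of why a shorter interval could help: assuming every event becomes one row and each scheduled run returns everything buffered since the previous run, the number of rows prepared per run scales linearly with the interval. The 200 events/sec figure comes from this thread; the function name and the intervals below are hypothetical.

```python
def rows_per_run(events_per_sec, interval_sec):
    """Estimate rows a scheduled event query has to prepare per run.

    Assumption: one row per event, and each run returns all events
    accumulated since the previous run.
    """
    return events_per_sec * interval_sec


# Compare a few hypothetical query intervals at roughly 200 events/sec.
for interval in (3600, 600, 60):
    print(f"interval={interval}s -> ~{rows_per_run(200, interval)} rows per run")
```

This only estimates row counts, not bytes, and it says nothing about memory held by the event publisher itself.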