Hi all, I'm reviewing the resource protection mechanisms available in 4.5.1. Given that osquery now sets CPUQuota to at most 20% of one CPU's time upon install, isn't there a conflict with the CPU watchdog's default config (25% CPU time for 9 secs) that would prevent it from firing and killing runaway queries?
10/22/2020, 8:51 PM
Yes and no. The watchdog kills the osquery process based on the number of cycles taken, not the percentage of CPU time, and it usually nets out to about 10 or 13%. It does not know the number of CPUs or what cycles mean in terms of load measurements.
It is very possible for the watchdog to still kill osquery when cgroups is limiting it to 20% CPU.
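To make the interaction concrete, here is a rough toy model in Python. It is only a sketch and not osquery's actual implementation: `watchdog_fires`, `apply_quota`, and the `sustained_checks` parameter are hypothetical names, and CPU use is modelled as a simple percentage per check rather than as cycles. It shows that a process capped at 20% can still trip a limit that nets out to about 10%, while a literal 25% limit would never be reached under the cap.

```python
def apply_quota(demand_percent, quota_percent):
    """Cap each sample of CPU demand at the cgroup quota (toy model)."""
    return [min(sample, quota_percent) for sample in demand_percent]


def watchdog_fires(usage_percent, limit_percent, sustained_checks):
    """Return True if usage stays above the limit for enough consecutive checks."""
    streak = 0
    for usage in usage_percent:
        # Reset the streak as soon as one check falls back under the limit.
        streak = streak + 1 if usage > limit_percent else 0
        if streak >= sustained_checks:
            return True
    return False


# A runaway query that would use a full core, held to 20% by CPUQuota.
capped = apply_quota([100] * 12, quota_percent=20)

# Hypothetical limits: ~10% (the effective figure mentioned above) vs. a literal 25%.
fires_at_10 = watchdog_fires(capped, limit_percent=10, sustained_checks=4)
fires_at_25 = watchdog_fires(capped, limit_percent=25, sustained_checks=4)
print(fires_at_10, fires_at_25)
```

In this model the capped process still exceeds the roughly 10% effective limit on every check, so the watchdog fires. Only a limit above the quota would be unreachable.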
Also, the goal of the watchdog is to keep osquery from having a negative impact on the system, not to prevent runaway queries.
Is there a specific problem you are encountering?
10/23/2020, 8:03 AM
Actually, it's not about a specific issue. I was just trying to understand how the different protection mechanisms fit together and came across the ambiguity in how the limits/thresholds are expressed. The phrasing of the documentation is a bit confusing in that respect. Thanks for the clarification!
10/23/2020, 3:13 PM
Ah, yes, the documentation should be improved. But you bring up a good point about how to deal with runaway queries. I was considering how to handle a query that gets stuck for some reason but does not use a lot of CPU. I think we can implement a timeout for queries to handle those cases. I am not sure whether that would actually help and fix problems, or just cause more debugging problems.
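As a rough illustration of the timeout idea, here is a generic wall-clock timeout in Python. It is not an osquery feature or API: `run_with_timeout` is a hypothetical helper, and the "stuck query" is simulated by a child process that just sleeps, so it uses almost no CPU and a CPU-based check would never notice it.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return ("ok", stdout) or ("timeout", None) if it runs too long."""
    try:
        # subprocess.run kills the child and waits for it when the timeout expires.
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
        return ("ok", done.stdout.strip())
    except subprocess.TimeoutExpired:
        return ("timeout", None)


# Stand-in for a query that is stuck but idle.
stuck = [sys.executable, "-c", "import time; time.sleep(30)"]
# Stand-in for a query that finishes normally.
quick = [sys.executable, "-c", "print('done')"]

print(run_with_timeout(stuck, timeout_s=0.5))
print(run_with_timeout(quick, timeout_s=10))
```

The debugging concern shows up even here: the caller learns only that the deadline passed, not why the work was stuck, so a real timeout would probably need to log which query was cancelled.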