Osquery uses basic SQL commands to leverage a relational data-model to describe a device.

osquery

<@U51GTKKCK>, that's correct, there was no "watchdog_utilization_limit" originally, it was just a "level" where the default was normal

<@U09M563C7> - interesting, so with 2.10, if I set that to `100`, what does that mean in terms of how long the worker can consume X% of the CPU before being killed?

Good question. It’s a bit obtuse, and I’ll apologize for that up front, but it’s the count of allowed CPU cycles between 3s intervals. 3s is what most programs use to calculate utilization, so we adopted that too.

I’d have to look and see what 100 is exactly, but iirc it’s about 65% of a core

Interesting, so would that mean 65% of a core for 12 seconds before watchdog kills the worker? I’m looking at <https://github.com/facebook/osquery/blob/2.10.0/osquery/core/watcher.cpp#L333-L346> but this logic has me horribly confused

yes, CPU utilization is a confusing topic
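
To make the question above concrete, here is a minimal Python sketch of one way a "limit per 3s interval, sustained for several intervals" check could work. It is not the osquery watchdog code: the names `core_utilization`, `first_violation`, `limit`, and `sustained` are all hypothetical, and whether 2.10 really requires several consecutive intervals is exactly what is being asked.

```python
from typing import Iterable, Optional

INTERVAL_SECONDS = 3.0  # sampling interval mentioned in the thread


def core_utilization(cpu_seconds_used: float,
                     interval: float = INTERVAL_SECONDS) -> float:
    """Fraction of one core used during the interval (1.0 == a full core)."""
    return cpu_seconds_used / interval


def first_violation(samples: Iterable[float],
                    limit: float,
                    sustained: int) -> Optional[int]:
    """Return the index of the interval at which the worker would be stopped.

    samples:   CPU seconds consumed in each successive interval.
    limit:     allowed fraction of a core per interval (hypothetical units,
               not the units of the real watchdog flag).
    sustained: consecutive over-limit intervals needed to trip (assumption).
    Returns None if the limit is never exceeded for long enough.
    """
    over = 0
    for i, cpu_seconds in enumerate(samples):
        if core_utilization(cpu_seconds) > limit:
            over += 1
            if over >= sustained:
                return i
        else:
            over = 0  # a quiet interval resets the streak
    return None
```

Under these assumptions, `sustained=4` with 3s intervals corresponds to the "12 seconds" reading: a worker pinned at a full core trips on its fourth consecutive busy interval, while a single quiet interval resets the count.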

maybe we can alter the code to fit your use case

Currently, `osqueryd` is having to deal with thousands of `connect` and `execve` syscalls for the process/socket_event tag auditing. However, watchdog keeps killing the worker since it spikes to 100% CPU usage for a while (the events come in waves). So, I’m trying to give it some breathing room to actually get through the audit events.

It’s not a _great_ fix, since it still causes the CPU spikes, but it’s better than the worker constantly spinning up, taking 100%, dying, and that process continuing forever

Maybe there’s a better way to pull off what I’m trying to accomplish. I tried setting the rate limit via `auditctl -r`, but it doesn’t seem to actually have an effect from what I can see.

The other idea I just had would be to use cgroups to restrict the CPU usage allowed by osquery. Although, I think the ‘root’ of the issue is the kernel audit rules attempting to get all of the `connect`/`bind`/`execve` syscall events.
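
For the cgroups idea, a rough Python sketch of the arithmetic involved, assuming the Linux CFS bandwidth controller: a quota of CPU time in microseconds per period, written to `cpu.cfs_quota_us` and `cpu.cfs_period_us` on cgroup v1, or as a single "quota period" line to `cpu.max` on cgroup v2. The function names are hypothetical, and nothing here is specific to osquery.

```python
DEFAULT_PERIOD_US = 100_000  # common CFS period: 100 ms


def cfs_quota_us(core_fraction: float,
                 period_us: int = DEFAULT_PERIOD_US) -> int:
    """CPU time (microseconds) allowed per period for a given share of a core.

    core_fraction: 0.5 means half of one core, 2.0 means two full cores.
    """
    if core_fraction <= 0:
        raise ValueError("core_fraction must be positive")
    return int(round(core_fraction * period_us))


def cpu_max_line(core_fraction: float,
                 period_us: int = DEFAULT_PERIOD_US) -> str:
    """Value in the cgroup v2 `cpu.max` format: '<quota> <period>'."""
    return f"{cfs_quota_us(core_fraction, period_us)} {period_us}"


# Example: cap a group at half a core.
# cpu_max_line(0.5) gives "50000 100000"; on cgroup v1 the same cap is
# cpu.cfs_quota_us=50000 with cpu.cfs_period_us=100000.
```

A cap like this throttles the process instead of killing it, so the CPU spikes flatten out but the audit events still have to be worked through.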

<@U0JA5UETV> and <@U6EFFT5FG> may have ideas for audit specifically

Yeah, I’m actually working with <@U0JA5UETV> to sort through some of the performance issues we’re seeing w/ audit & osquery.

<@U09M563C7> - Do you disable watchdog and rely on cgroups to restrict it entirely, or use them both?

If you’re testing audit, use the 3.0.0 flag

This is actually our production environment w/ 2.10, although I’m looking forward to the audit rewrite work as well