# general
t
In your situation I recommend setting a CPUQuota for the systemd service to limit osquery’s usage to something reasonable.
4.5.1 implements this but you can deploy the quota now in your configuration management system.
You can follow the example here https://github.com/osquery/osquery/pull/6644
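For anyone deploying the quota through configuration management, here is a minimal sketch of a helper that renders a systemd drop-in. The unit name `osqueryd.service`, the drop-in filename, and the 20% value are assumptions for illustration; `CPUQuota=` is a standard systemd resource-control setting, and the linked PR remains the authoritative example.

```python
from pathlib import Path

# Assumed locations; adjust to the unit name your package installs.
DROPIN_DIR = Path("/etc/systemd/system/osqueryd.service.d")
DROPIN_NAME = "10-cpuquota.conf"  # hypothetical filename


def render_dropin(cpu_quota_percent: int) -> str:
    """Return the text of a systemd drop-in that caps the service's CPU time."""
    if cpu_quota_percent <= 0:
        raise ValueError("CPU quota must be a positive percentage")
    # CPUQuota is relative to one CPU: 20% means a fifth of a single core.
    return f"[Service]\nCPUQuota={cpu_quota_percent}%\n"


def write_dropin(cpu_quota_percent: int, directory: Path = DROPIN_DIR) -> Path:
    """Write the drop-in and return its path (the default directory needs root)."""
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / DROPIN_NAME
    target.write_text(render_dropin(cpu_quota_percent))
    return target


if __name__ == "__main__":
    print(render_dropin(20), end="")
```

After writing the file, `systemctl daemon-reload` followed by a restart of the service is needed for the quota to take effect.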
p
Thanks!! Does this work like cgroups? From the docs, it looks so.
Although I need to check with the infra team. One pain with a corporate env is getting them to move their a** to a newer version of anything, more so when they use a golden image to spawn everything in the world.
z
I'm running into perf issues as well. Is this the preferred method of constraining resources, instead of the watchdog?
t
I recommend cpuquota in addition to using the watchdog.
p
Hey @theopolis, with a 20% CPU quota, there is a possibility that osquery is pegged at 20% usage and the cgroup counters would be happy. But then the watchdog would also see 100% usage, right? It would be getting the cgroup's view. In that case, the watchdog must either run at level -1 or not run at all. This poses a problem: if the watchdog is running at -1, the memory limit is 10000M, which can trigger the OOM killer. We would also have to apply a CPUQuota and MemoryLimit to the service and disable the watchdog, which could cause the service to exit. If the watchdog is running at any other level, restrictive or normal, osquery would get respawned due to the cgroup view of the CPU usage.
z
Is the implication that there'd be a race condition in which the osquery service would keep flapping because of how the watchdog interacts with the osquery daemon service?
t
I understand the theory but I’m not sure this occurs in practice. I’ve run a pegged osquery by stressing a system with 200 process starts per second and the default watchdog never killed the process. The process did hover at 20% of a core as expected.
z
Are logs generated whenever watchdog kills an osqueryd process, or is that verbosity I need to enable in the config?
p
@Zach Zeid If you run osqueryd with --verbose you can see the watchdog killing osquery.
Oct 06 05:09:23 <hostname> osqueryd[17003]: osqueryd worker (17769) stopping: Maximum sustainable CPU utilization limit exceeded: 12
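To spot these events after the fact, one option is to scan the service's log output for the worker-stopping message. The sketch below matches only the line format shown above; other osquery versions may word the message differently, and the function and field names are hypothetical.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Pattern based solely on the sample line above; treat it as an assumption.
WORKER_STOP = re.compile(
    r"osqueryd\[(?P<logger>\d+)\]: osqueryd worker \((?P<worker>\d+)\) "
    r"stopping: (?P<reason>.+)$"
)


class WatchdogStop(NamedTuple):
    logging_pid: int  # PID of the osqueryd process that wrote the line
    worker_pid: int   # PID of the worker being stopped
    reason: str


def find_watchdog_stops(lines: Iterable[str]) -> Iterator[WatchdogStop]:
    """Yield one record per log line that reports a worker being stopped."""
    for line in lines:
        match = WORKER_STOP.search(line.rstrip("\n"))
        if match:
            yield WatchdogStop(
                int(match["logger"]), int(match["worker"]), match["reason"]
            )


if __name__ == "__main__":
    sample = [
        "Oct 06 05:09:23 <hostname> osqueryd[17003]: osqueryd worker (17769) "
        "stopping: Maximum sustainable CPU utilization limit exceeded: 12",
    ]
    for stop in find_watchdog_stops(sample):
        print(stop)
```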
z
ok, but it'll only show up if I pass --verbose in the flags file or something?
p
@theopolis If that's the case, then it's good. PS: I was running osquery on a test box today and it dropped close to 15k logs per second. I was watching the dropped count in auditctl -s advance over a 10 minute period.
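A rough way to quantify that drop rate is to sample the audit status counter twice and divide the difference by the interval. The sketch assumes the `auditctl -s` output carries the drop count in a field named `lost`, written as `lost N` or `lost=N` (the layout varies between audit versions), and that it runs as root; the function names are hypothetical.

```python
import re
import subprocess
import time

# Accepts both "lost 123" and "lost=123"; the exact layout is an assumption.
LOST_FIELD = re.compile(r"\blost[ =](\d+)")


def parse_lost(status_text: str) -> int:
    """Extract the cumulative lost-event counter from `auditctl -s` output."""
    match = LOST_FIELD.search(status_text)
    if match is None:
        raise ValueError("no 'lost' field found in audit status output")
    return int(match.group(1))


def lost_per_second(first: int, second: int, interval_seconds: float) -> float:
    """Average number of audit events dropped per second between two samples."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return (second - first) / interval_seconds


def read_status() -> str:
    """Run `auditctl -s` (requires root) and return its output."""
    return subprocess.run(
        ["auditctl", "-s"], check=True, capture_output=True, text=True
    ).stdout


def measure_drop_rate(interval_seconds: float = 60.0) -> float:
    """Sample the counter twice, interval_seconds apart, and return the rate."""
    before = parse_lost(read_status())
    time.sleep(interval_seconds)
    after = parse_lost(read_status())
    return lost_per_second(before, after, interval_seconds)
```

Calling `measure_drop_rate(600)` would reproduce the 10 minute observation window described above.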
Yes Zach, either run it in debug mode or just enable logs in the config when running as a daemon:
"logger_min_status": 1
t
Yep, as you mentioned, there’s a lot of noise in audit, and the real solution is merging the BPF approach and making sure it is stable.
p
Would BPF be able to process the complex file path logic within the limit of 4096 instructions? I am assuming that we are not changing this limit.
z
looking at osquery_schedule, are the _time columns in nanoseconds?
and average_memory is in bytes?
or does it rely on time and everything is in seconds?
p
so afaict it's doing a diff between the cycles of when the query starts and ends, and adds that diff to a counter
but it still doesn't tell me if that diff is calculated in seconds or some other unit