
theopolis

10/06/2020, 5:09 PM
In your situation I recommend setting a CPUQuota for the systemd service to limit osquery’s usage to something reasonable.
osquery 4.5.1 implements this, but you can deploy the quota now in your configuration management system.
You can follow the example here https://github.com/osquery/osquery/pull/6644
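As a rough illustration of the suggestion above, here is a minimal sketch that renders a systemd drop-in setting `CPUQuota=` for the service. The unit name `osqueryd.service`, the drop-in file name, and the 20% value are assumptions for illustration, not taken from the linked PR.

```python
from pathlib import Path

# Assumed names: the unit name and drop-in location are illustrative only.
UNIT_NAME = "osqueryd.service"
DROPIN_DIR = Path("/etc/systemd/system") / f"{UNIT_NAME}.d"


def render_dropin(cpu_quota_percent: int) -> str:
    """Return the text of a systemd drop-in that caps CPU time for the unit."""
    if cpu_quota_percent <= 0:
        raise ValueError("cpu_quota_percent must be positive")
    # CPUQuota= is a percentage of one CPU; values above 100% span several cores.
    return f"[Service]\nCPUQuota={cpu_quota_percent}%\n"


if __name__ == "__main__":
    # Print instead of writing the file, so the text can be reviewed first.
    print(f"# {DROPIN_DIR / 'cpu-quota.conf'}")
    print(render_dropin(20), end="")
```

After placing a file like this, systemd needs `systemctl daemon-reload` and a service restart before the quota takes effect.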

Prateek Kumar Nischal

10/06/2020, 5:38 PM
Thanks! Does this work like cgroups? From the docs, it looks like it does.
I need to check with the infra team, though. One pain of a corporate environment is getting them to move their a** to a newer version of anything, even more so when they use a golden image to spawn everything in the world.

Zach Zeid

10/06/2020, 6:10 PM
I'm running into perf issues as well. Is this the preferred method for constraining resources, instead of `watchdog`?

theopolis

10/06/2020, 6:23 PM
I recommend CPUQuota in addition to using the watchdog.

Prateek Kumar Nischal

10/07/2020, 10:55 AM
Hey @theopolis, with a 20% CPU quota, it is possible that osquery is pegged at 20% usage and the cgroup counters would be happy. But then the watchdog would see 100% usage, right? It would be getting the cgroup's view. In that case, the watchdog must either run at level -1 or not run at all. That poses a problem: if the watchdog is running at -1, the memory limit is 10000M, which can trigger the OOM killer. We would also have to apply a `CPUQuota` and `MemoryLimit` to the service and disable the watchdog, which could cause the service to exit. If the watchdog is running at any other level, restrictive or normal, osquery would get respawned because of the cgroup view of the CPU usage.

Zach Zeid

10/07/2020, 12:40 PM
Is the implication that there'd be a race condition in which the osquery service would keep flapping because of how `watchdog` interacts with the osquery daemon service?

theopolis

10/07/2020, 12:46 PM
I understand the theory but I’m not sure this occurs in practice. I’ve run a pegged osquery by stressing a system with 200 process starts per second and the default watchdog never killed the process. The process did hover at 20% of a core as expected.

Zach Zeid

10/07/2020, 12:58 PM
Are logs generated whenever watchdog kills an osqueryd process, or is that verbosity I need to enable in the config?

Prateek Kumar Nischal

10/07/2020, 1:11 PM
@Zach Zeid If you run osqueryd with `--verbose`, you can see the watchdog killing osquery.
Oct 06 05:09:23 <hostname> osqueryd[17003]: osqueryd worker (17769) stopping: Maximum sustainable CPU utilization limit exceeded: 12
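For anyone who wants to count these events in their logs, here is a minimal sketch that extracts the worker PID and the reason from a line shaped like the one above. The pattern is inferred from this single sample, so other watchdog messages may not match it; the names `WatchdogStop` and `parse_watchdog_stop` are made up for illustration.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the one sample line in this thread (an assumption).
WATCHDOG_STOP = re.compile(
    r"osqueryd worker \((?P<pid>\d+)\) stopping: (?P<reason>.+)$"
)


class WatchdogStop(NamedTuple):
    pid: int
    reason: str


def parse_watchdog_stop(line: str) -> Optional[WatchdogStop]:
    """Return the worker PID and stop reason, or None if the line does not match."""
    match = WATCHDOG_STOP.search(line)
    if match is None:
        return None
    return WatchdogStop(int(match.group("pid")), match.group("reason").strip())


if __name__ == "__main__":
    sample = (
        "Oct 06 05:09:23 <hostname> osqueryd[17003]: osqueryd worker (17769) "
        "stopping: Maximum sustainable CPU utilization limit exceeded: 12"
    )
    print(parse_watchdog_stop(sample))
```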

Zach Zeid

10/07/2020, 1:12 PM
OK, but it'll only show up if I pass `--verbose` in the flags file or something?

Prateek Kumar Nischal

10/07/2020, 1:13 PM
@theopolis If that’s the case, then it’s good. PS: I was running osquery on a test box today and it dropped close to 15k logs per second. I was watching the dropped count in `auditctl -s` advance over a 10-minute period.
Yes Zach, either run it in debug mode or just enable logs in the config when running as a daemon.
"logger_min_status": 1

theopolis

10/07/2020, 1:19 PM
Yep, as you mentioned, there’s a lot of noise in audit, and the real solution is merging the BPF approach and making sure it is stable.
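The dropped-event rate mentioned earlier in the thread can be estimated from two `auditctl -s` samples taken some time apart. The sketch below assumes the status output contains a `lost` counter written as `lost N` or `lost=N`; that format and the helper names are assumptions, so check them against your own output.

```python
import re


def parse_lost(status_text: str) -> int:
    """Extract the 'lost' counter from auditctl -s style output (assumed format)."""
    match = re.search(r"\blost[ =](\d+)", status_text)
    if match is None:
        raise ValueError("no 'lost' counter found in status text")
    return int(match.group(1))


def lost_per_second(before: int, after: int, interval_seconds: float) -> float:
    """Average number of dropped audit events per second between two samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (after - before) / interval_seconds


if __name__ == "__main__":
    # Hypothetical samples taken 10 minutes (600 seconds) apart.
    first = parse_lost("enabled 1\nlost 0\nbacklog 0\n")
    second = parse_lost("enabled 1\nlost 9000000\nbacklog 0\n")
    print(lost_per_second(first, second, 600))
```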

Prateek Kumar Nischal

10/07/2020, 1:24 PM
Would BPF be able to process the complex file path logic within the limited 4096 instructions? I am assuming that we are not changing this limit.

Zach Zeid

10/07/2020, 4:02 PM
Looking at `osquery_schedule`, are the `_time` columns in nanoseconds, and is `average_memory` in bytes? Or does it rely on `time`, and everything is in seconds?

Prateek Kumar Nischal

10/07/2020, 7:56 PM
So AFAICT it's doing a diff between the cycles of when the query starts and ends, and adding that diff to a counter,
but it still doesn't tell me whether that diff is calculated in seconds or some other unit.