In your situation I recommend setting a CPUQuota for the sys osquery #general

theopolis

10/06/2020, 5:09 PM

In your situation I recommend setting a CPUQuota for the systemd service to limit osquery’s usage to something reasonable.

theopolis

10/06/2020, 5:09 PM

4.5.1 implements this but you can deploy the quota now in your configuration management system.

theopolis

10/06/2020, 5:11 PM

You can follow the example here https://github.com/osquery/osquery/pull/6644
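For anyone deploying the quota through configuration management before upgrading, here is a minimal sketch of rendering a systemd drop-in. It assumes the unit is named `osqueryd.service`; the file name `cpuquota.conf` and the helper names are made up for illustration, and the 20% figure is the one discussed later in this thread, not necessarily what the linked PR ships.

```python
from pathlib import Path

# Assumption: the packaged unit is named osqueryd.service, so systemd reads
# drop-ins from this directory. Adjust if your unit name differs.
DROP_IN_DIR = Path("/etc/systemd/system/osqueryd.service.d")


def render_cpu_quota_drop_in(percent: int) -> str:
    """Return drop-in text that caps the service at `percent` of one core."""
    if percent <= 0:
        raise ValueError("CPUQuota must be a positive percentage")
    return f"[Service]\nCPUQuota={percent}%\n"


def write_drop_in(percent: int, directory: Path = DROP_IN_DIR) -> Path:
    """Write the drop-in file and return its path."""
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / "cpuquota.conf"  # hypothetical file name
    target.write_text(render_cpu_quota_drop_in(percent))
    return target
```

After writing the file, `systemctl daemon-reload` followed by a restart of the service is needed for the quota to take effect.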

Prateek Kumar Nischal

10/06/2020, 5:38 PM

Thanks!! Does this work like cgroups? From the docs, it looks that way.
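On the cgroups question: systemd applies `CPUQuota` through the CPU controller of the service's cgroup, as an amount of CPU time allowed per scheduling period. The sketch below shows the arithmetic, assuming systemd's default 100 ms period; the function names are illustrative only.

```python
DEFAULT_PERIOD_US = 100_000  # assumed default CFS period (100 ms)


def cpu_quota_to_cfs(percent: float, period_us: int = DEFAULT_PERIOD_US) -> tuple[int, int]:
    """Return (quota_us, period_us): CPU time allowed per scheduling period."""
    if percent <= 0:
        raise ValueError("percent must be positive")
    return round(period_us * percent / 100), period_us


def cgroup_v2_cpu_max(percent: float) -> str:
    """Format the quota the way the cgroup v2 `cpu.max` file expresses it."""
    quota_us, period_us = cpu_quota_to_cfs(percent)
    return f"{quota_us} {period_us}"
```

With a 20% quota this works out to 20 ms of CPU time in every 100 ms window, so the process is throttled rather than killed when it hits the limit.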

Prateek Kumar Nischal

10/06/2020, 5:39 PM

Although I need to check with the infra team. One pain in a corporate environment is getting them to move their a** to a newer version of anything, even more so when they use a golden image to spawn everything in the world.

Zach Zeid

10/06/2020, 6:10 PM

I'm running into perf issues as well. Is this the preferred method for constraining resources instead of watchdog?

theopolis

10/06/2020, 6:23 PM

I recommend CPUQuota in addition to using the watchdog.

Prateek Kumar Nischal

10/07/2020, 10:55 AM

Hey @theopolis, with a 20% CPU quota, osquery could be pegged at 20% usage while the cgroup counters are satisfied. But wouldn't the watchdog then see 100% usage, since it gets the cgroup's view? In that case, the watchdog must either run at -1 or not run at all. That poses a problem: if the watchdog is running at -1, the memory limit is 10000M, which can trigger the OOM killer. We would then have to apply both a CPUQuota and a MemoryLimit to the service and disable the watchdog, which could cause the service to exit. If the watchdog is running at any other limit, restrictive or normal, osquery would get respawned because of the cgroup's view of the CPU usage.

Zach Zeid

10/07/2020, 12:40 PM

Is the implication that there'd be a race condition in which the osquery service would keep flapping because of how watchdog interacts with the osquery daemon service?

theopolis

10/07/2020, 12:46 PM

I understand the theory but I’m not sure this occurs in practice. I’ve run a pegged osquery by stressing a system with 200 process starts per second and the default watchdog never killed the process. The process did hover at 20% of a core as expected.

Zach Zeid

10/07/2020, 12:58 PM

Are logs generated whenever watchdog kills an osqueryd process, or is that verbosity I need to enable in the config?

Prateek Kumar Nischal

10/07/2020, 1:11 PM

@Zach Zeid If you run osqueryd with --verbose you can see the watchdog killing osquery.

Oct 06 05:09:23 <hostname> osqueryd[17003]: osqueryd worker (17769) stopping: Maximum sustainable CPU utilization limit exceeded: 12
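To count how often this happens, a log line like the one above can be matched with a small parser. This is only a sketch: the pattern is inferred from that single line, other watchdog messages may be worded differently, and the `WatchdogStop` type is a made-up name.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the one sample line in this thread (an assumption).
WATCHDOG_STOP = re.compile(
    r"osqueryd worker \((?P<pid>\d+)\) stopping: "
    r"(?P<reason>.+?)(?:: (?P<value>\d+))?$"
)


class WatchdogStop(NamedTuple):
    worker_pid: int
    reason: str
    value: Optional[int]  # trailing number, if the message carries one


def parse_watchdog_stop(line: str) -> Optional[WatchdogStop]:
    """Return the parsed stop event, or None if the line does not match."""
    match = WATCHDOG_STOP.search(line.rstrip())
    if match is None:
        return None
    value = match.group("value")
    return WatchdogStop(
        worker_pid=int(match.group("pid")),
        reason=match.group("reason"),
        value=int(value) if value is not None else None,
    )
```

Feeding it the journal output for the service and counting the non-`None` results gives a rough view of how often the worker is being restarted.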

Zach Zeid

10/07/2020, 1:12 PM

OK, but it'll only show up if I pass --verbose in the flags file or something?

Prateek Kumar Nischal

10/07/2020, 1:13 PM

@theopolis If that’s the case, then it’s good. PS: I was running osquery on a test box today and it dropped close to 15k logs per second. I was watching the dropped count in auditctl -s advance over a 10 minute period.
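A drop rate like this can be estimated from two `auditctl -s` snapshots. The sketch below assumes the output is one `key value` pair per line and that the dropped counter is the `lost` field; the function names are illustrative.

```python
def parse_audit_status(text: str) -> dict[str, str]:
    """Parse `auditctl -s` style output ("key value" per line) into a dict."""
    status: dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            status[parts[0]] = parts[1].strip()
    return status


def lost_per_second(before: str, after: str, elapsed_s: float) -> float:
    """Average number of lost audit events per second between two snapshots."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    lost_before = int(parse_audit_status(before)["lost"])
    lost_after = int(parse_audit_status(after)["lost"])
    return (lost_after - lost_before) / elapsed_s
```

Taking the two snapshots 600 seconds apart matches the 10 minute window described above.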

Prateek Kumar Nischal

10/07/2020, 1:14 PM

Yes Zach, either run it in debug mode or just enable logs in the config when running as a daemon.

"logger_min_status": 1

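For context on where that line goes, here is a small sketch that adds the setting to an osquery config loaded as a Python dict. It assumes the setting belongs under the top-level `options` object of the config file; the helper name is hypothetical.

```python
import json


def with_status_logging(config: dict, min_status: int = 1) -> dict:
    """Return a copy of the config with logger_min_status set under "options"."""
    updated = dict(config)
    options = dict(updated.get("options", {}))  # copy to avoid mutating the input
    options["logger_min_status"] = min_status
    updated["options"] = options
    return updated


def to_config_text(config: dict) -> str:
    """Serialize the config as indented JSON for writing back to disk."""
    return json.dumps(config, indent=2)
```

Existing keys such as `schedule` are left untouched, so the helper can be applied to a config that is already deployed.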
theopolis

10/07/2020, 1:19 PM

Yep, as you mentioned, there’s a lot of noise in audit, and the real solution is merging the BPF approach and making sure it is stable.

Prateek Kumar Nischal

10/07/2020, 1:24 PM

Would BPF be able to process the complex file path logic within the limited 4096 instructions? I am assuming that we are not changing this limit.

Zach Zeid

10/07/2020, 4:02 PM

Looking at osquery_schedule, are the _time columns in nanoseconds?

Zach Zeid

10/07/2020, 4:02 PM

and average_memory is in bytes?

Zach Zeid

10/07/2020, 5:28 PM

or does it rely on time and everything is in seconds?

Prateek Kumar Nischal

10/07/2020, 7:56 PM

https://github.com/osquery/osquery/blob/master/osquery/tables/utility/osquery.cpp#L251-L259 This might be a good place to start looking for it

Zach Zeid

10/07/2020, 8:06 PM

looks like it's seconds? https://github.com/osquery/osquery/blob/6d57dc8066031b3859a8e1da0627740150d5a24d/osquery/core/sql/query_performance.h

Zach Zeid

10/08/2020, 1:01 PM

so afaict it's doing a diff between the cycles of when the query starts and ends, and adds that diff to a counter

Zach Zeid

10/08/2020, 1:01 PM

but it still doesn't tell me if that diff is calculated in seconds or some other unit
