#general
Macear

02/12/2021, 2:13 AM
And I have another question: when I run a distributed query against the osquery_schedule table, I see that on some servers there are no recorded executions of the scheduled queries. On the other hand, I get query results via syslog, which means the queries actually executed. I also see many memory/CPU threshold violations and osquery workers stopping. I have only two scheduled queries, against the process_events and file_events tables. Why? What is the reason, and what should I look for?
zwass

02/12/2021, 2:16 AM
Do those queries show as denylisted in osquery_schedule? That seems likely given the watchdog issues you describe.
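A minimal sketch of that check, assuming the osquery_schedule columns name, interval, executions and denylisted. The SQL string is what one might send as a distributed query; the in-memory SQLite table and the row named hypothetical_slow_query are stand-ins invented here so the example runs without osquery.

```python
import sqlite3

# Query one might run as a distributed query. Column names are assumed
# from the osquery_schedule table; adjust them to your osquery version.
CHECK_SQL = (
    "SELECT name, interval, executions, denylisted "
    "FROM osquery_schedule "
    "WHERE denylisted = 1 OR executions = 0 "
    "ORDER BY name"
)


def suspicious_queries(conn):
    """Return names of scheduled queries that are denylisted or never ran."""
    return [row[0] for row in conn.execute(CHECK_SQL)]


# Stand-in for the real table so the sketch is runnable (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE osquery_schedule "
    "(name TEXT, interval INTEGER, executions INTEGER, denylisted INTEGER)"
)
conn.executemany(
    "INSERT INTO osquery_schedule VALUES (?, ?, ?, ?)",
    [
        ("process_events", 60, 0, 0),           # never recorded as executed
        ("file_events", 60, 12, 0),             # healthy
        ("hypothetical_slow_query", 60, 3, 1),  # denylisted by the watchdog
    ],
)

flagged = suspicious_queries(conn)
print(flagged)
```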
2:17 AM
What is the interval on those queries? Possibly setting a lower interval will mean the resource spikes at query time will be smaller and the queries will be able to complete.
Macear

02/12/2021, 2:48 AM
The intervals are 60s. They aren’t denylisted. I have default watchdog limits. Do I need to increase default limits? If so, are there any recommendations about what optimal common (more or less) thresholds are?
zwass

02/12/2021, 2:49 AM
It really depends on your environment and the observability <> performance tradeoffs you are willing to make. I discuss performance some in https://dactiv.llc/blog/osquery-performance-at-scale/ though you might already know most of what's in there.
2:51 AM
I would consider dropping the interval to 10s... 6x fewer events per query run could mean ~6x lower resource usage. Of course this will happen 6x more often. But the point is to smooth out the load spikes so that osquery stays under the watchdog limits.
2:51 AM
This strategy assumes you are using event-based tables and you have events_optimize turned on (the default).
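As a rough illustration of the 10s suggestion, here is a sketch that builds a schedule section for the two event tables mentioned in this thread. The query names and the `SELECT *` queries are assumptions for the example, not the actual configuration under discussion.

```python
import json


def build_schedule(interval_seconds):
    """Build an osquery-style schedule for the two event tables in this thread."""
    tables = ("process_events", "file_events")
    return {
        "schedule": {
            # One scheduled query per table; the query name equals the table name.
            table: {
                "query": f"SELECT * FROM {table};",
                "interval": interval_seconds,
            }
            for table in tables
        }
    }


config = build_schedule(10)
print(json.dumps(config, indent=2))
```

Generating the section from one interval value makes it easy to step the interval down gradually (60, 30, 10) and compare watchdog behaviour at each step.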
Macear

02/12/2021, 2:52 AM
Yes, events_optimize is on
2:55 AM
What does the CPU and memory utilisation of queries like “select * from process_events;” depend on? What do I need to take into consideration? Does the number of running processes affect it? I suppose so, but I would be more confident if you could confirm it.
2:57 AM
I also want to ask whether the query profiler (from the GitHub repo) shows resource usage for *_events queries.
zwass

02/12/2021, 2:58 AM
Keep in mind that any _events table will be constantly recording data into osquery's local store, and the data will be read from the store when the query is run on the schedule.
2:58 AM
This means that the longer the interval for an _events table, the more data is processed when the query runs. For a regular table all of the data is generated at query time, so the interval has no effect.
2:59 AM
The profiler won't be able to give you any helpful information on event-based tables because there won't be any events in the buffer when the profile is run. Essentially it will just look like the query is very efficient because it's not processing any data.
Macear

02/12/2021, 3:00 AM
Ok, that’s a good piece of information. Thanks. Would it affect performance if I set the interval to less than 60s? What is the minimum recommended interval setting?
zwass

02/12/2021, 3:03 AM
Let's say that your machine is generating 100 events per second. If you run the query every 10 seconds, the query processes 1000 events each time. If you run it every 60 seconds, it processes 6000 events each time. It's the same number of events overall, but the spike in resource usage is much smaller when the interval is lower.
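The arithmetic can be checked with a short sketch. The 100 events per second rate is the figure from the message above; the one-hour window is an assumption added here to show that the total stays the same.

```python
def events_per_run(events_per_second, interval_seconds):
    """Events buffered between two runs of a scheduled query."""
    return events_per_second * interval_seconds


def runs_per_hour(interval_seconds):
    """How many times the scheduled query fires in one hour."""
    return 3600 // interval_seconds


RATE = 100  # events per second, as in the example above

for interval in (10, 60):
    per_run = events_per_run(RATE, interval)
    total = per_run * runs_per_hour(interval)
    print(f"interval={interval}s per_run={per_run} total_per_hour={total}")
```

Both intervals process 360000 events per hour; only the size of each batch, and therefore the size of each spike, differs.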
Macear

02/12/2021, 3:03 AM
To clarify a little about my conf: 1) events_expiry is 1; 2) events_max is 50000
zwass

02/12/2021, 3:04 AM
I don't have any hard numbers but my instinct would be if you go below 5 seconds or so you will start to have enough overhead that it might not gain you much.
Macear

02/12/2021, 3:07 AM
Ok, so for now I will try decreasing the interval step by step and observe how it changes the resource usage on the servers. I will also try increasing the watchdog limits. Thanks a lot for such a quick reply to my initial question and for the further help 👍
zwass

02/12/2021, 3:09 AM
Yeah good luck! Let us know how it goes 🙂
Francisco Huerta

02/12/2021, 6:26 PM
@zwass @Macear interesting discussion. what would be your take on the config refresh interval? is 10s reasonable here as well, or too aggressive (i.e., does a configuration update request force a full download in each cycle, or only whenever there is a change in it)?
zwass

02/12/2021, 6:38 PM
For config refresh I would guide it by what your needs are. Do you need a new scheduled query or a change to a scheduled query to be picked up within 10 seconds by an online host? My feeling would be you probably don't. In most environments 10 minutes, an hour, or even a day could be just fine for those. You still have live queries if you need anything quickly.
6:39 PM
The config is going to be fully downloaded each cycle. It's just some JSON so this is not a crazy amount of data, but still this is network traffic you are generating and a slight bit of load on both the osqueryd client and the server.
Francisco Huerta

02/12/2021, 6:41 PM
Thanks much @zwass. Those were kind of my assumptions too, and indeed I don't think the config needs to be refreshed that quickly. I see the benefit of more frequent updates when packs are being fine-tuned, but otherwise 1h or more probably makes more sense.
zwass

02/12/2021, 7:03 PM
Yeah this makes sense to me.
Macear

02/15/2021, 5:31 PM
For now, some interim results. I’ve decreased the interval of the queries against *_events tables from 60s down to 10s. The number of hosts where limit violations are observed decreased by half. On several hosts the issue remains, and I noticed that 200 MB is too low a memory limit for them. I’m in the process of deploying a new osquery configuration with increased watchdog limits. Then I’ll see if it helps.
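A sketch of what raising the watchdog limits could look like as flagfile lines. The flag names watchdog_memory_limit and watchdog_utilization_limit are osquery command-line flags as I understand them; the values 350 and 20 are purely illustrative assumptions, not recommendations from this thread.

```python
def render_flagfile(memory_limit_mb, utilization_limit):
    """Render watchdog overrides as lines for an osquery flagfile."""
    flags = {
        "watchdog_memory_limit": memory_limit_mb,        # in MB
        "watchdog_utilization_limit": utilization_limit,  # CPU utilization limit
    }
    return "\n".join(f"--{name}={value}" for name, value in flags.items())


# Illustrative values only; tune them against what your hosts actually need.
flagfile_text = render_flagfile(350, 20)
print(flagfile_text)
```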