Divya
04/04/2022, 12:43 PM
Linda Zhou
04/04/2022, 3:46 PM
The events_expiry flag controls the lifetime of buffered events.
By default, it's set to 1 day. Have you invoked any further optimizations?
https://osquery.readthedocs.io/en/latest/development/pubsub-framework/#query-and-table-usage
Stefano Bonicatti
04/04/2022, 10:01 PM
The events_expiry value is used (in scheduled queries) only when it is higher than an internally calculated value. That calculation is done per evented table: it takes the intervals of all the scheduled queries on that table, picks the maximum interval, multiplies it by 3, and then rounds up to the next multiple of a minute.
If that calculated value is higher than events_expiry, then the calculated value is used; otherwise events_expiry is used.
Practical example:
You have 3 queries, 2 on hardware_events and 1 on process_events, and the query intervals are 30, 45 and 60 seconds respectively.
For hardware_events, between 30 and 45, 45 is chosen. Then 45 * 3 = 135. Then the next multiple of 60 is 180, so it is set to 180.
For the hardware_events table, any value of events_expiry lower than 180 won't be used.
For the process_events table the value ends up being the same because 60 * 3 = 180.
Finally you can see in the logs which one is used because you should have a message like this when the events_expiry value has not been used:
I0404 23:53:50.431721 865360 eventfactory.cpp:352] The minimum events expiration timeout for hardware_events has been adjusted: 180
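The calculation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic as explained in this thread, not osquery's actual code; the function names and the shape of the schedule dictionary are made up for the example.

```python
def minimum_events_expiry(intervals_by_table):
    """Per evented table: take the longest scheduled interval (seconds),
    multiply by 3, then round up to a multiple of 60.

    Hypothetical helper; mirrors the description in the thread only.
    """
    minimums = {}
    for table, intervals in intervals_by_table.items():
        longest = max(intervals)
        # Integer round-up to the next multiple of a minute.
        minimums[table] = (longest * 3 + 59) // 60 * 60
    return minimums


def effective_expiry(events_expiry, calculated_minimum):
    """The configured flag only wins if it is higher than the calculated value."""
    return max(events_expiry, calculated_minimum)


# The practical example from the thread: two queries on hardware_events
# (30s and 45s) and one on process_events (60s).
schedule = {"hardware_events": [30, 45], "process_events": [60]}
minimums = minimum_events_expiry(schedule)
print(minimums)  # {'hardware_events': 180, 'process_events': 180}

# An events_expiry of 120 would be overridden; 3600 would be kept.
print(effective_expiry(120, minimums["hardware_events"]))   # 180
print(effective_expiry(3600, minimums["hardware_events"]))  # 3600
```

The values 120 and 3600 are arbitrary example settings, chosen only to show one case on each side of the calculated minimum.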
Linda Zhou
04/05/2022, 1:52 AM
[...] events_expiry, can I raise a PR to update the docs?
Divya
04/05/2022, 5:02 AM
[...] events_expiry not get respected?
Another general question: I am also confused as to when to use events_max. From a few posts I have read, my understanding is that both flags should be used together for the expiry to work. Is this true?
Stefano Bonicatti
04/05/2022, 9:03 AM
[...] SELECT is only for scheduled queries.
An additional detail, though, is that expiration happens after all the queries against a table have been executed.
Scheduled queries run in an infinite loop, so every time all the queries against an evented table have been run, expiration of old events happens.
But there are also two other places where events are expired.
One is at osquery startup, the other is when new events are being added: every 256 event batches (more on batches later), expiration of old events is triggered.
Then there's events_optimize, which is described as "apply optimizations when SELECTing from events-based tables", enabled by default.
This is true, but it misses some details: again, it only works with scheduled queries, and the way it works is that for each scheduled query it keeps track of the event time of the most recent queried event, so that if that same scheduled query runs again, only newer events are returned.
This might be confused with event expiration, but the events are still there in the database, so if a different query on the same table runs, it will return "old" events again, once.
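To make the difference between events_optimize and expiration concrete, here is a toy Python model of the behaviour described above. It is a sketch under assumptions: the class, its method names, and the strict "newer than" comparison are invented for illustration and are not taken from osquery.

```python
from dataclasses import dataclass, field


@dataclass
class EventedTable:
    """Toy stand-in for an evented table's backing store (hypothetical)."""
    events: list = field(default_factory=list)      # (event_time, payload) tuples
    last_seen: dict = field(default_factory=dict)   # query name -> newest event time returned

    def add(self, event_time, payload):
        self.events.append((event_time, payload))

    def run_scheduled(self, query_name):
        """Return only events newer than what this query already saw.

        Nothing is deleted here: the events stay in the store.
        """
        cutoff = self.last_seen.get(query_name)
        rows = [e for e in self.events if cutoff is None or e[0] > cutoff]
        if rows:
            self.last_seen[query_name] = max(t for t, _ in rows)
        return rows


table = EventedTable()
table.add(100, "a")
table.add(110, "b")

first = table.run_scheduled("q1")    # both events
table.add(120, "c")
second = table.run_scheduled("q1")   # only the new event
other = table.run_scheduled("q2")    # a different query sees the "old" events once

print(len(first), len(second), len(other), len(table.events))  # 2 1 3 3
```

The last number printed is the point: after all three runs the store still holds every event, so this tracking is separate from expiration.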
For events_max, the default is actually 50k, and the unit is not events as in rows, but event batches.
Depending on how an event publisher has been written and on how many events it was collecting at a certain point in time, multiple events could be written, as an optimization, as a single batch.
So what that value exactly means is a bit variable today; it doesn't always map to 50k events.
Expiration of event batches that have gone beyond the limit happens at startup or when new events come (every 256).
[...] events_max which is not a must for event expiration to work.
But if you're using a distributed query mechanism then events expiration (of old events, or events that have gone beyond the max threshold) will happen only if events keep coming, or if you restart osquery.
So something like hardware_events might keep its events for a very long time, beyond what was configured in events_expiry.
[...] SELECT does work for non-scheduled queries, but if there's a scheduled query on the same table too, then expiration will happen only if that scheduled query runs.
Meaning that if you're in the shell, running queries against a table that also has a scheduled query, then, given that the scheduler does not run in the shell, events will never expire.
events_max is not needed for events expiration, it's just another way to control the amount of events in the database.
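The point that events_max counts batches rather than rows, and is only enforced every so many added batches, can be shown with a small toy model. Everything here is hypothetical (the BatchStore class, the check_every parameter, dropping the oldest batches first) and only follows the description in this thread; it is not how osquery is implemented.

```python
from collections import deque


class BatchStore:
    """Toy model: a cap on the number of batches, checked periodically."""

    def __init__(self, events_max, check_every=256):
        self.events_max = events_max      # limit on batches, not on events
        self.check_every = check_every    # trim only every N added batches
        self.batches = deque()
        self.added_since_check = 0

    def add_batch(self, events):
        self.batches.append(list(events))
        self.added_since_check += 1
        if self.added_since_check >= self.check_every:
            self.added_since_check = 0
            self.trim()

    def trim(self):
        # Assumption: the oldest batches are the ones dropped.
        while len(self.batches) > self.events_max:
            self.batches.popleft()

    def event_count(self):
        return sum(len(batch) for batch in self.batches)


# Small numbers so the effect is visible: cap of 2 batches, checked every 4 adds.
store = BatchStore(events_max=2, check_every=4)
for i in range(5):
    store.add_batch([f"event-{i}-{j}" for j in range(3)])

print(len(store.batches), store.event_count())  # 3 9
```

With a cap of 2 batches the store ends up holding 3 batches and 9 events: the cap was applied after the fourth add, the fifth add has not triggered a check yet, and each batch carries several events.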
As for the osquery version, there was an additional fix in 4.9.0 for a bug that was preventing expiration when new events came in.
Divya
04/05/2022, 10:24 AM
Stefano Bonicatti
04/05/2022, 10:53 AM
[...] SELECT not happening, especially if there's no scheduled query, sounds like a bug
Divya
04/05/2022, 11:03 AM
Stefano Bonicatti
04/05/2022, 12:12 PM
Divya
04/05/2022, 1:06 PM
Linda Zhou
04/05/2022, 7:41 PM
Stefano Bonicatti
04/05/2022, 7:45 PM
Divya
04/19/2022, 11:38 AM
Deepak
04/20/2022, 7:12 AM
Stefano Bonicatti
04/20/2022, 7:16 PM