I tried for a while say with a 15 sec pack refresh...
# core
m
I tried for a while say with a 15 sec pack refresh interval.. i could see events getting logged till i didn't stop osquery. Approximately 10-15 iterations of launching process that generates event and shows up in the log approx 15 sec later
j
Hi @manu, doing some more tests on my side, it seems (need some more testing to confirm) that disabling the flag
event_optimize
events start triggering again... in your test did you have 4.7.0 +
event_optimize
enabled? Also, do you (or anybody else) know what is the drawback of disabling
events_optimize
?
t
The drawback is that your scheduled queries may produce duplicate events. The
*_events
tables are "buffers" of events published by the operating system and the scheduled queries essentially list those buffers. When
--event_optimize
is enabled the "list" behavior turned into a "drain" behavior. The expectation is that a scheduled query will not return the same buffered event twice.
j
Will not setting
--events_expiry=1
give me a similar behavior?
m
Are u running with this much low
events_expiry
? I checked the logs you shared in github issue that seems to indicate the same. This might lead to older events from not being picked up in the queries being run.
Copy code
I0408 15:52:27.529646 20352 eventsubscriberplugin.cpp:438] Found 26 events for subscriber WindowsEventLogPublisher.windows_events
I0408 15:52:27.533550 20352 eventfactory.cpp:351] Subscriber expiration is too low: windows_events
Can you try setting this flag to a reasonable value based on the schedule interval of the queries in your pack and try out again ? Basically you want this value to be set appropriately based on the load of events on the system. https://osquery.readthedocs.io/en/stable/development/pubsub-framework/
Copy code
The buffered events will eventually expire! The --events_expiry flag controls the lifetime of buffered events. This is set to 1 day by default, this expiration occurs when events are selected from their subscriber table. For example: the process_events subscriber will buffer process starts until a query selects from this table. At that point all results will be returned and immediately after, any event that happened time-86400 seconds ago will be deleted. If you select from this table every second you will constantly see a window of 1 day's worth of process events.
j
I will give it a try increasing events_expiry and see what is the outcome. We configured it as 1 based in https://osquery.readthedocs.io/en/stable/deployment/process-auditing/#osquery-events-optimization
Maybe i misunderstood the text there, but setting expiry to 1 seemed to achieve what i wanted (no duplicate events)
t
I looked into this last night and I think there's a regression introduced during the events code refactoring, within the logic here: https://github.com/osquery/osquery/blob/master/osquery/events/eventsubscriberplugin.cpp#L234 My hypothesis is the optimization logic is being applied to the time between configuration updates, and instead it should be dynamically calculating the max delta between scheduled queries targeting subscribers.
@Juan Alvarez do you have multiple scheduled queries using the
windows_events
table?
j
I do. They are in the description of the github issue, and are the following:
Copy code
SELECT *, "events_windows." || lower(source) || "." || eventid as __devoSubTag FROM windows_events WHERE source IN ('Application', 'Setup', 'Security', 'System');
    SELECT *, "events_windows.powershell" || "." || eventid as __devoSubTag FROM windows_events WHERE source IN ('Microsoft-Windows-PowerShell/Operational', 'Microsoft-Windows-PowerShell/Admin');
    SELECT *, "events_windows.other_sources" || "." || eventid as __devoSubTag FROM windows_events WHERE source Not IN ('Application', 'Setup', 'Security', 'System', 'Microsoft-Windows-PowerShell/Operational', 'Microsoft-Windows-PowerShell/Admin');
Basically they just depend on the
source
field
t
I think there's a bug here, @alessandrogario for awareness. I can try to find time this weekend to look through how it was implemented before the refactors (the refactors were awesome and made the code a lot simpler).
Nvm, if this is a bug, it has always existed. I’m trying to write a small test case now.
Well, I created some small tests and double checked the logic in the code and I've changed my mind. I don't see a bug here.
j
Interesting, however 4.6.0 works like a charm with my actual settings while 4.7.0 does not. If I understood correctly, configuring events_expiry to 1 makes no sense when having events_optimize enabled isn’t it?
t
Did you re-test 4.6.0 and confirm that it still works when event_optimize is enabled?
Ok, I understand why you are not seeing duplicated results with
--events_optimize=false
. This is because the scheduler is treating the query like a normal differential query and it is only logging the "new" events. This works for you now, but in the background osquery is doing a lot of comparison work to determine what events it should log. This is going to cause osquery to use a lot of system resources after a while.
I am still investigating what might have caused you to miss events with
--events_optimize=true
(the default). I would like to remove this flag in the future and make this default behavior the only behavior.
j
@theopolis Thanks for the investigation. I may not have explained myself correctly, but in the 4.6.0 environment that we are using we have
events_optimize=true
and
events_expiry=1
. This has been working fine up until now. When i tried to test 4.7.0 i found out that i was missing almost all events with the same settings. When i was testing, i decided to set
events_optimize=false
and everything starting working properly. I understand that disabling
events_optimize
may not be the best idea, so i am testing now
events_optimize=true
with a bigger
events_expiry
as previously suggested. It seems to work, but i need to get some more testing done.
t
got it, thanks
j
I have been testing a bit more with
events_expiry=300
and
events_optimize=true
. It seems that this combination works properly for me. However, the docs are a bit confusing when comparing what i read in https://osquery.readthedocs.io/en/stable/deployment/process-auditing/#osquery-events-optimization or in https://osquery.readthedocs.io/en/stable/installation/cli-flags/#events-control-flags ... From what i understood about
events_optimize=true
in this thread and in https://osquery.readthedocs.io/en/stable/development/pubsub-framework/ , an event will be only returned once, so the value set in
events_expiry
should not actually have an impact on the query as the events are cleared inmediately as per the doc:
This allows each subscriber to return the exact window of the schedule and delete buffered events immediately. This saves the most memory and disk usage possible while still allowing flexible scheduling.
t
Right, it should not have an impact. My hypothesis is there is a bug within the events refactoring that landed between 4.6.0 and 4.7.0 and we need to find it 🙂
My challenge is that I have not been able to reproduce the scenario locally. What I need is concrete information "this event with this time x was missed" and "this event with this time y was not", along with a trace of when the scheduled queries are running. The important part is a log of the "ground truth" of what events are happening. My next step is writing some simulation code that publishes fake events and records that trace.
I am doing that while also comparing the unit tests to the implementation code and looking for parts of the code that are untested. This is where the bug is going to show up.
j
But, you mentioned that you wanted to have
events_optimize=true
as the only behavior. Would not that make the
events_expiry
flag obsolete?
I am mentioning it, bc if
events_expiry
should not be impacting the queries, just setting it to a different value than 1 is a valid workaround until that happens
anyway, im willing to get more info on what is going on and update it in the github issue. I will do later today hopefully.
I have added some more info in the ticket. Let me know whatever other info you are looking for.
t
I'm pretty sure I found the bug https://github.com/osquery/osquery/pull/7055
🍻 2