Anton Kuchin
07/07/2023, 4:11 PM
SIGSEGV in the osquery::EventSubscriberPlugin::generateRows() method, or more precisely in std::__1::__tree_min() called from it. So to me it looks like the map iterator got invalidated while we were traversing the context.event_index map.
I started looking for methods that could modify context.event_index and found several places:
1. erase in osquery::EventSubscriberPlugin::removeOverflowingEventBatches()
2. erase in osquery::EventSubscriberPlugin::expireEventBatches()
3. assignment in EventSubscriberPlugin::generateEventDataIndex()
In the first two we acquire a write lock on context.event_index_mutex to protect the map.
The third one has no locking at all, but as far as I understand it is called only once at setup, so it cannot conflict with other accesses.
Then with gdb I confirmed that generateRows() is called from the "SchedulerRunner" thread, while removeOverflowingEventBatches() and expireEventBatches() are called from another thread (it has no name, but it looks like it is responsible for event processing). This seems to be a recipe for disaster, because generateRows() can iterate the map while elements are being removed from it.
So my question is: do we need a read lock in generateRows() and the other methods that access context.event_index, or is there a more complex thread coordination mechanism that prevents such problems, in which case I should look for the root cause of the crash somewhere else?
Anton Kuchin
07/07/2023, 4:17 PM
Stefano Bonicatti
07/07/2023, 4:30 PM