# general
a
Hello, everyone! I'm new to osquery and have a question. I'm getting a SIGSEGV in the osquery::EventSubscriberPlugin::generateRows() method, more precisely in std::__1::__tree_min() called from it. To me it looks like a map iterator got invalidated while we were traversing the context.event_index map.
I started looking for methods that could modify context.event_index and found several places:
1. erase in osquery::EventSubscriberPlugin::removeOverflowingEventBatches()
2. erase in osquery::EventSubscriberPlugin::expireEventBatches()
3. assignment in EventSubscriberPlugin::generateEventDataIndex()
In the first two we acquire a write lock on context.event_index_mutex to protect the map, here and here. The third one has no locking at all, but as far as I understand it is called only once at setup and cannot conflict with other accesses.
Then with gdb I confirmed that generateRows() is called from the "SchedulerRunner" thread, while removeOverflowingEventBatches() and expireEventBatches() are called from another thread (it has no name but looks like it is responsible for event processing). This seems to be a recipe for disaster, because generateRows() can iterate the map while elements are being removed from it.
So my question is: do we need a read lock in generateRows() and the other methods that access context.event_index, or is there a more complex thread coordination mechanism that prevents such problems, so that I should look for the root cause of the crash somewhere else?
I'm using 5.2.3 but all links are to code in master because I didn't find any changes that could affect that behaviour.
s
good find, to me it seems that it has been forgotten, and a ReadLock is needed, so it's a bug.
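For readers following along, here is a minimal sketch of the reader/writer pattern being discussed. It is not osquery's code: the Context struct, collectKeys() and expireBefore() are hypothetical stand-ins, and it uses the standard std::shared_mutex rather than osquery's own lock types. It only illustrates the idea that the iterating side takes a shared (read) lock while the erasing side takes an exclusive (write) lock on the same mutex.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for the subscriber context; NOT osquery's real type.
struct Context {
  std::map<std::uint64_t, std::vector<std::string>> event_index;
  mutable std::shared_mutex event_index_mutex;
};

// Reader side (the role generateRows() plays in the discussion):
// hold a shared lock for the whole traversal so no writer can erase
// nodes while we are iterating.
std::vector<std::string> collectKeys(const Context& ctx,
                                     std::uint64_t start,
                                     std::uint64_t stop) {
  std::shared_lock<std::shared_mutex> lock(ctx.event_index_mutex);
  std::vector<std::string> keys;
  for (auto it = ctx.event_index.lower_bound(start);
       it != ctx.event_index.end() && it->first <= stop;
       ++it) {
    keys.insert(keys.end(), it->second.begin(), it->second.end());
  }
  return keys;
}

// Writer side (the role of the expire/overflow cleanup methods):
// hold an exclusive lock while erasing every entry with a key below cutoff.
std::size_t expireBefore(Context& ctx, std::uint64_t cutoff) {
  std::unique_lock<std::shared_mutex> lock(ctx.event_index_mutex);
  auto last = ctx.event_index.lower_bound(cutoff);
  auto removed = static_cast<std::size_t>(
      std::distance(ctx.event_index.begin(), last));
  ctx.event_index.erase(ctx.event_index.begin(), last);
  return removed;
}
```

With this shape, several readers can traverse the map concurrently, but an erase has to wait until all of them have released the shared lock, so iterators held by a reader are never invalidated mid-traversal.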
a
Created issue: #8076 And PR: #8077