HarlanF
11/04/2021, 8:31 PM
"Registering extension" line in the osqueryd.INFO file). Then, less than 24h later, 1,000 of the fleet start reporting "Error executing <pack>: no such table: <table>". An osqueryd restart fixes it. Ideas?
seph
.connect
osqueryi is a separate process and will spawn its own extensions.
HarlanF
11/05/2021, 1:41 AM
'ps auxww' output for it?
seph
Mike Myers
11/05/2021, 6:04 AM
HarlanF
11/05/2021, 3:25 PM
/opt/osquery/bin/osqueryd --flagfile /etc/osquery/osquery.flags --config_path /etc/osquery/osquery.conf
\_ /opt/osquery/bin/osqueryd
\_ .../bin/python3.8 /usr/lib/osquery/extension1.ext --socket /var/osquery/osquery.em --timeout 3 --interval 3
\_ .../bin/python3.8 /usr/lib/osquery/extension2.ext --socket /var/osquery/osquery.em --timeout 3 --interval 3
\_ .../bin/python3.8 /usr/lib/osquery/extension3.ext --socket /var/osquery/osquery.em --timeout 3 --interval 3
This is how the processes look when freshly started (per 'ps auxwf'), when all the extensions are working.
When some query (not an extension) hits the watchdog in our environment, the child process (line 2 above) has to die and get restarted. In our setup, that restart does not restart all the extension processes, and the tree ends up looking like this:
ORDERED CHRONOLOGICALLY:
/opt/osquery/bin/osqueryd --flagfile /etc/osquery/osquery.flags --config_path /etc/osquery/osquery.conf
\_ .../bin/python3.8 /usr/lib/osquery/extension1.ext --socket /var/osquery/osquery.em --timeout 3 --interval 3
\_ .../bin/python3.8 /usr/lib/osquery/extension2.ext --socket /var/osquery/osquery.em --timeout 3 --interval 3
\_ /opt/osquery/bin/osqueryd
\_ .../bin/python3.8 /usr/lib/osquery/extension3.ext --socket /var/osquery/osquery.em --timeout 3 --interval 3
At this point, extension 3 works and its start time matches the child daemon above it, but extensions 1 & 2 still have start times from the parent process above them. So in a watchdog situation, it appears that something hasn't managed to iterate through the extensions and restart them all.
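A rough sketch of how that start-time comparison could be scripted, in case it helps anyone check a host quickly. It assumes the process list comes from something like `ps -eo pid,ppid,etimes,args`, that the worker is the osqueryd whose parent is also an osqueryd, and that extensions can be recognized by a `.ext` argument. The helper names (`parse_ps`, `stale_extensions`) and the sample PIDs are made up for illustration, not anything osquery provides.

```python
from typing import List, NamedTuple


class Proc(NamedTuple):
    pid: int
    ppid: int
    etimes: int  # seconds elapsed since the process started
    args: str


def parse_ps(text: str) -> List[Proc]:
    """Parse output shaped like `ps -eo pid,ppid,etimes,args` (assumed format)."""
    procs = []
    for line in text.strip().splitlines():
        parts = line.split(None, 3)
        if len(parts) < 4 or not parts[0].isdigit():
            continue  # skip the header row and malformed lines
        procs.append(Proc(int(parts[0]), int(parts[1]), int(parts[2]), parts[3]))
    return procs


def stale_extensions(procs: List[Proc],
                     daemon: str = "/opt/osquery/bin/osqueryd",
                     ext_suffix: str = ".ext") -> List[Proc]:
    """Return extension processes that started before the current worker."""
    daemon_pids = {p.pid for p in procs if p.args.startswith(daemon)}
    # Assumption: the worker is the osqueryd whose parent is another osqueryd.
    workers = [p for p in procs
               if p.args.startswith(daemon) and p.ppid in daemon_pids]
    if not workers:
        return []
    worker = min(workers, key=lambda p: p.etimes)  # most recently started
    # Assumption: an extension has some argument ending in ".ext".
    exts = [p for p in procs
            if any(tok.endswith(ext_suffix) for tok in p.args.split())]
    # A larger elapsed time means the extension predates the worker.
    return [p for p in exts if p.etimes > worker.etimes]


# Illustrative data only, mirroring the second tree above.
SAMPLE = """
  PID  PPID ELAPSED COMMAND
  100     1   90000 /opt/osquery/bin/osqueryd --flagfile /etc/osquery/osquery.flags
  101   100   90000 python3.8 /usr/lib/osquery/extension1.ext --socket /var/osquery/osquery.em
  102   100   90000 python3.8 /usr/lib/osquery/extension2.ext --socket /var/osquery/osquery.em
  250   100     120 /opt/osquery/bin/osqueryd
  251   100     119 python3.8 /usr/lib/osquery/extension3.ext --socket /var/osquery/osquery.em
"""

stale = stale_extensions(parse_ps(SAMPLE))
print([p.pid for p in stale])
```

On the sample data this flags extensions 1 and 2 (the ones older than the worker) and leaves extension 3 alone. The comparison is strict, so an extension started in the same second as the worker is not flagged.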
If I look up the pid of the erring extensions immediately above, and kill one of them, something restarts it immediately, and it resumes working.
Mike Myers
11/06/2021, 1:04 AM