#general

HarlanF

11/04/2021, 8:31 PM
I've got three homegrown extensions deployed to a big fleet. They're used in some scheduled queries, and I can see that they all load initially (each with a "Registering extension" line in the osqueryd.INFO file). Then, less than 24h later, 1,000 hosts in the fleet start reporting "Error executing <pack>: no such table: <table>". An osqueryd restart fixes it. Ideas?
9:52 PM
Also, even when I'm getting that error emitted, I can go into osqueryi and access the table fine each time.

seph

11/05/2021, 12:07 AM
Unless you're using .connect, osqueryi is a separate process and will spawn its own extensions.
12:07 AM
Sounds like the extension is dying (or being killed). You could use the process table to examine it
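A rough sketch of that kind of check, in Python since the extensions here are Python-based. It assumes a Linux host with procps `ps` (for the `etimes` column) and assumes extension processes can be recognised by an argument ending in `.ext`. The function names are made up for illustration and are not part of osquery.

```python
import subprocess

PS_COLUMNS = "pid,ppid,rss,etimes,args"  # etimes = seconds since the process started


def parse_ps(text):
    """Parse the output of `ps -eo pid,ppid,rss,etimes,args` into a list of dicts."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # first line is the header
        parts = line.split(None, 4)
        if len(parts) < 5:
            continue  # no command line available; skip
        pid, ppid, rss, etimes, args = parts
        rows.append({
            "pid": int(pid),
            "ppid": int(ppid),
            "rss_kb": int(rss),
            "age_s": int(etimes),
            "args": args,
        })
    return rows


def extension_processes(rows, suffix=".ext"):
    """Keep processes that have an argument ending in `suffix` (assumed naming)."""
    return [r for r in rows if any(tok.endswith(suffix) for tok in r["args"].split())]


def list_extensions():
    """Take a live snapshot with ps and return the extension processes."""
    out = subprocess.run(
        ["ps", "-eo", PS_COLUMNS],
        capture_output=True, text=True, check=True,
    ).stdout
    return extension_processes(parse_ps(out))
```

Calling `list_extensions()` on an affected host and a healthy one, then comparing `rss_kb` and `age_s`, is one way to see whether an extension has died, ballooned in memory, or been running far longer than expected.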

HarlanF

11/05/2021, 1:41 AM
@seph, isn't there some sort of oversight of the extension process by osqueryd? Presuming there's still a process alive for the extension, is there anything I could/should look for in the 'ps auxww' output for it?

seph

11/05/2021, 3:42 AM
extensions run as their own process. You should be able to see them in the ps output.
3:43 AM
Is it alive? Is the memory utilization okay? I don’t have any idea about your extension, so this is first principles…
3:44 AM
I don’t offhand remember how osquery handles poorly performing extensions. Crashing is likely different than hanging.

Mike Myers

11/05/2021, 6:04 AM
if this is Windows, we just recently fixed a problem with extension loading https://github.com/osquery/osquery/issues/7324

HarlanF

11/05/2021, 3:25 PM
It's Linux, but I'll comb through some of the hosts that are erroring today, using the process table, and see if I can distinguish bad from good that way. Thanks, both.
9:58 PM
@seph, I think we're experiencing a bug with how osquery recovers from watchdog errors when there are multiple extensions.
10:25 PM
I think we're experiencing a bug in extension handling that's brought on by a watchdog action. Again, we have three extensions we've developed, each being a Python-based extension through Thrift. ORDERED CHRONOLOGICALLY:
/opt/osquery/bin/osqueryd --flagfile /etc/osquery/osquery.flags --config_path /etc/osquery/osquery.conf
 \_ /opt/osquery/bin/osqueryd
 \_ .../bin/python3.8 /usr/lib/osquery/extension1.ext --socket /var/osquery/osquery.em --timeout 3 --interval 3
 \_ .../bin/python3.8 /usr/lib/osquery/extension2.ext --socket /var/osquery/osquery.em --timeout 3 --interval 3
 \_ .../bin/python3.8 /usr/lib/osquery/extension3.ext --socket /var/osquery/osquery.em --timeout 3 --interval 3
This is how the processes look when freshly started (per 'ps auxwf'), when all the extensions are performant. When some query (not one against an extension) trips the watchdog in our environment, the child process (line 2 above) must die and get restarted. In our setup, it's not restarting all the extension processes, and it ends up looking like this: ORDERED CHRONOLOGICALLY:
/opt/osquery/bin/osqueryd --flagfile /etc/osquery/osquery.flags --config_path /etc/osquery/osquery.conf
 \_ .../bin/python3.8 /usr/lib/osquery/extension1.ext --socket /var/osquery/osquery.em --timeout 3 --interval 3
 \_ .../bin/python3.8 /usr/lib/osquery/extension2.ext --socket /var/osquery/osquery.em --timeout 3 --interval 3
 \_ /opt/osquery/bin/osqueryd
 \_ .../bin/python3.8 /usr/lib/osquery/extension3.ext --socket /var/osquery/osquery.em --timeout 3 --interval 3
At this point, extension 3 works and its start time matches the child daemon above it, but extensions 1 & 2 still have start times from the parent process above them. So in a watchdog situation, it appears something hasn't managed to iterate through the extensions and restart them all. If I look up the PID of one of the erring extensions immediately above and kill it, something restarts it immediately, and it resumes working.
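The start-time comparison described above can be sketched as a small Python heuristic. This is only an illustration under assumptions: each process is a dict with hypothetical keys `pid`, `ppid`, `age_s` (seconds since start) and `args` (command line), the worker is an osqueryd whose parent is also an osqueryd, and extensions are recognised by an argument ending in `.ext`. An extension that predates the current worker is only a suspect, not proof of a fault.

```python
def stale_extensions(procs, daemon_name="osqueryd", suffix=".ext", tolerance_s=5):
    """Return PIDs of extensions that started before the current worker did.

    `procs` is a list of dicts with keys pid, ppid, age_s and args.
    `tolerance_s` allows for small differences in start time.
    """
    by_pid = {p["pid"]: p for p in procs}

    def is_daemon(p):
        parts = p["args"].split()
        return bool(parts) and parts[0].endswith(daemon_name)

    def is_extension(p):
        return any(tok.endswith(suffix) for tok in p["args"].split())

    stale = set()
    for worker in procs:
        parent = by_pid.get(worker["ppid"])
        # A worker is an osqueryd process whose parent is also osqueryd.
        if parent is None or not is_daemon(worker) or not is_daemon(parent):
            continue
        for p in procs:
            is_sibling = p["ppid"] == parent["pid"]
            if is_sibling and is_extension(p) and p["age_s"] > worker["age_s"] + tolerance_s:
                stale.add(p["pid"])
    return sorted(stale)
```

On the second process listing above, a check like this would be expected to flag extensions 1 and 2 and leave extension 3 alone, which matches the extensions that resumed working after being killed by hand.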
12:47 AM
Ah, we're running Thrift 0.13.0, and we're going to try upgrading one box to Thrift 0.15.0; we'll know Monday, I suppose.

Mike Myers

11/06/2021, 1:04 AM
We're working on updating Thrift within osquery too. https://github.com/osquery/osquery/pull/7330/commits/ad4a128fb0eaaf811f80e2ee6a6243454da0ec2d Maybe you can build osquery from this branch to test if it resolves the issue you're seeing?
1:04 AM
Or one of us can provide a test build from that branch.