Title
#general
a

Ari Weinberg

04/20/2022, 6:25 PM
Hey all, I'm getting an issue where I'm getting upwards of 50% CPU usage by the osquery daemon on windows domain controllers. running the daemon in verbose, it happens right after the following messages (and hangs there while the CPU spikes):
I0420 11:22:13.001909  6160 eventsubscriberplugin.cpp:492] Found 15 events for subscriber WindowsEventLogPublisher.powershell_events
I0420 11:22:13.533206  6160 events.cpp:70] Skipping subscriber: powershell_events: Required publisher is disabled by configuration
After which it gets stopped by the watchdog, and the whole thing repeats a few seconds later. Any insights?
f

fritz

04/20/2022, 7:04 PM
@Ari Weinberg what's your query config look like?
7:05 PM
I know i've seen machines grind to a halt when querying seemingly innocuous tables (eg.
users
) which can have thousands of rows when the device in question is a domain controller
a

Ari Weinberg

04/20/2022, 7:05 PM
I had that problem already, and disabled all those queries
f

fritz

04/20/2022, 7:06 PM
👍 well at least i wasn't totally off the mark
a

Ari Weinberg

04/20/2022, 7:07 PM
Heres the current flags file:
# Configuration
--config_plugin=tls
--config_tls_endpoint=/api/v1/osquery/config
--config_refresh=60

# Enable watchdog to stop queries that use too much resources
--disable_watchdog=false
--watchdog_level=1

# Live query
--disable_distributed=false
--distributed_plugin=tls
--distributed_interval=10
--distributed_tls_max_attempts=3
--distributed_tls_read_endpoint=/api/v1/osquery/distributed/read
--distributed_tls_write_endpoint=/api/v1/osquery/distributed/write

# Logging
--logger_plugin=tls,filesystem
--logger_tls_endpoint=/api/v1/osquery/log
--logger_tls_period=10

# File carving
--disable_carver=false
--carver_start_endpoint=/api/v1/osquery/carve/begin
--carver_continue_endpoint=/api/v1/osquery/carve/block
--carver_block_size=2000000

# Windows Configuration
#--enable_powershell_events_subscriber
#--enable_windows_events_publisher
#--enable_windows_events_subscriber
#--windows_event_channels=System,Application,Setup,Security

# Disable problamatic table
--disable_tables=chrome_extensions,google_chrome_profiles,mdm,munki_info
f

fritz

04/20/2022, 7:08 PM
i wonder if you can slowly 50:50 your way through your scheduled query config to see where it is hanging
a

Ari Weinberg

04/20/2022, 7:08 PM
its getting the config from fleet:
config:
  options:
    disable_tables: 'munki_info,google_chrome_profiles,mdm'
    pack_delimiter: _
    logger_tls_period: 10
    distributed_plugin: tls
    disable_distributed: false
    logger_tls_endpoint: /api/v1/osquery/log
    distributed_interval: 10
    distributed_tls_max_attempts: 3
f

fritz

04/20/2022, 7:09 PM
the other option is run the queries as distributed or via osqueryi one by one and see which are not responding
a

Ari Weinberg

04/20/2022, 7:09 PM
But looking at the daemon in verbose mode, it doesnt look like its happening by a query
f

fritz

04/20/2022, 7:10 PM
do we know that the last message is related to the cause of the hang, or is it the last thing that is working successfully pre-hang
a

Ari Weinberg

04/20/2022, 7:12 PM
Here's the full loop that's happening:
I0420 12:09:09.901724  6804 eventsubscriberplugin.cpp:492] Found 15 events for subscriber WindowsEventLogPublisher.powershell_events
I0420 12:09:10.429062  6804 events.cpp:70] Skipping subscriber: powershell_events: Required publisher is disabled by configuration
I0420 12:10:04.662425  6288 config.cpp:1218] Refreshing configuration state
I0420 12:10:04.662425  6288 tls.cpp:255] TLS/HTTPS POST request to URI: <https://FLEET_URL/api/v1/osquery/config>
W0420 12:10:05.193722  6532 watcher.cpp:391] osqueryd worker (6908) stopping: Maximum sustainable CPU utilization limit exceeded: 57
I0420 12:10:05.224970  6532 watcher.cpp:656] osqueryd watcher (4528) executing worker (6712)
I0420 12:10:05.307375  6104 init.cpp:354] osquery worker initialized [watcher=4528]
I0420 12:10:05.307375  6104 dispatcher.cpp:78] Adding new service: WatcherWatcherRunner (00000216390B03F0) to thread: 6152 (000002163908E400) in process 6712
I0420 12:10:05.307375  6104 rocksdb.cpp:132] Opening RocksDB handle: \Program Files\osquery\osquery.db
I0420 12:10:05.635535  6104 dispatcher.cpp:78] Adding new service: ExtensionWatcher (000002163A95DEF0) to thread: 4888 (000002163AE92B40) in process 6712
I0420 12:10:05.635535  6104 dispatcher.cpp:78] Adding new service: ExtensionRunnerCore (000002163A94B260) to thread: 5196 (000002163AE92AE0) in process 6712
I0420 12:10:05.635535  6104 auto_constructed_tables.cpp:99] Removing stale ATC entries
I0420 12:10:05.635535  5196 interface.cpp:299] Extension manager service starting: \\.\pipe\osquery.em
I0420 12:10:05.635535  6104 dispatcher.cpp:78] Adding new service: ConfigRefreshRunner (000002163904C7F0) to thread: 6188 (000002163AF02500) in process 6712
I0420 12:10:05.635535  6104 tls.cpp:255] TLS/HTTPS POST request to URI: <https://FLEET_URL/api/v1/osquery/config>
I0420 12:10:10.446698  6104 eventfactory.cpp:156] Event publisher not enabled: ntfs_event_publisher: NTFS event publisher disabled via configuration
I0420 12:10:11.479414  6104 eventsubscriberplugin.cpp:492] Found 15 events for subscriber WindowsEventLogPublisher.powershell_events
I0420 12:10:11.932579  6104 events.cpp:70] Skipping subscriber: powershell_events: Required publisher is disabled by configuration
f

fritz

04/20/2022, 7:17 PM
i am admittedly out of my depth in terms of the cause here and any suggestions would be blind stabs
a

Ari Weinberg

04/20/2022, 7:20 PM
@fritz Really appreciate your help
f

fritz

04/20/2022, 7:21 PM
sorry i couldn't be of greater assistance, the folks over in #fleet will no doubt be a good resource though if you are having fleet related trouble
Stefano Bonicatti

Stefano Bonicatti

04/20/2022, 7:21 PM
Yeah I would maybe check how many events osquery is getting, looking at the
osquery_events
table, and or also at the size of the RocksDB database
7:23 PM
normally that's the second possible cause after a query that cases high CPU usage. Either a lot of events in general, so the publisher has to work a lot, and/or the RocksDB database that has a lot of data to compact, expire and so on.
a

Ari Weinberg

04/20/2022, 7:23 PM
I backed up and deleted the .db folder, and that seems to have solved it
7:24 PM
Whats the long term solution then to manage the db?
Stefano Bonicatti

Stefano Bonicatti

04/20/2022, 7:24 PM
Seeing you have some events active, would be useful to know the interval of your queries on evented tables
7:25 PM
normally it's better to have a low interval so that the number of keys in the RocksDB database doesn't grow too much. If they do, then there are more keys to expire when the query runs
a

Ari Weinberg

04/20/2022, 7:25 PM
In other words, you want to know how often im querying
events
tables?
Stefano Bonicatti

Stefano Bonicatti

04/20/2022, 7:25 PM
yeah
a

Ari Weinberg

04/20/2022, 7:25 PM
Im not using the eventing framework...
7:25 PM
This is just running on a DC, so there are a lot of logs
Stefano Bonicatti

Stefano Bonicatti

04/20/2022, 7:26 PM
oh my bad, I think I was a victim of word wrapping in the slim Slack thread, I see know that they are commented out.
a

Ari Weinberg

04/20/2022, 7:27 PM
Oh my bad on your bad, those were uncommented untill this morning, when I started troubleshooting
7:28 PM
but this still happened for a while after I commented those out (until i just deleted the db this morning)
Stefano Bonicatti

Stefano Bonicatti

04/20/2022, 7:28 PM
ah so, another thing to be careful about is that if you're not querying the evented tables but still receiving events, those will continuosly accumulate and then at osquery startup the expiration will start
a

Ari Weinberg

04/20/2022, 7:29 PM
What do you mean by experation?
Stefano Bonicatti

Stefano Bonicatti

04/20/2022, 7:29 PM
that been said, one thing you could check is the content of the db by starting osquery with your backed up db and pass
--database_dump
7:30 PM
events collected by osquery won't stay indefinitely in the database to be queried. There's an expiration that is calculated per table/subscriber and based on the interval of the scheduled queries that are done on them.
7:32 PM
also, what version of osquery are you using?
a

Ari Weinberg

04/20/2022, 7:33 PM
5.2.1.0
7:36 PM
what was
--database_dump
supposed to show me? I just got an infinite scroll
Stefano Bonicatti

Stefano Bonicatti

04/20/2022, 7:43 PM
What osquery was likely working on. If there are a lot of events, then it was likely working on expiring them. There also can be a lot of logs to be sent via TLS or another method, but normally that's not cause for high cpu usage.
a

Ari Weinberg

04/20/2022, 7:44 PM
So to be clear, keeping the events disabled should solve this in the long term?
7:45 PM
Becasue this all seems to come from the windowseventlogpublisher
Stefano Bonicatti

Stefano Bonicatti

04/20/2022, 7:46 PM
Well, it depends; it's not said that you can't use events, it depends on the rate of those and if your machine is capable, but most of all I would say the interval of the scheduled queries on those.
a

Ari Weinberg

04/20/2022, 7:47 PM
I dont think Im querying those tables at all, but ill check.
Stefano Bonicatti

Stefano Bonicatti

04/20/2022, 7:47 PM
If you're not using them, don't enable them. If you're using them, query the evented table often (less than a minute),
a

Ari Weinberg

04/20/2022, 7:47 PM
Do any of the query packs that come with osquery use those tables?
Stefano Bonicatti

Stefano Bonicatti

04/20/2022, 7:49 PM
Nope
a

Ari Weinberg

04/20/2022, 7:50 PM
Okay then im not using them, and im going to disable them.
Stefano Bonicatti

Stefano Bonicatti

04/20/2022, 7:50 PM
at least not Windows tables. There's one for
hardware_events
which is on Linux, and then the
osquery_events
table which tracks the status of publisher and subscribers but that's it
a

Ari Weinberg

04/20/2022, 7:50 PM
@Stefano Bonicatti Thanks so much for your prompt help and insight!