#windows

Juan Alvarez

12/22/2021, 11:48 AM
Hi, has anybody used osquery successfully on Windows servers with a high load of events? I am trying to set up osquery to capture all the windows_events and send them to our SIEM using the tls plugin, in some scenarios with around 2000 - 3000 events per second. I have noticed that osquery struggles to get that huge amount of events out. I am currently querying every 30 secs (tried 60 secs as well) and flushing every 10 seconds. I have also increased `logger_tls_max_lines` to the maximum (99999). Any suggested configuration or experience from somebody?
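A rough back-of-the-envelope check of the numbers in this message, as a small Python sketch. It assumes one log line per event and that each flush ships at most `logger_tls_max_lines` lines; both are simplifying assumptions, not a description of how osquery batches results internally. The function names are hypothetical.

```python
def rows_per_query(events_per_second: int, query_interval_s: int) -> int:
    """Events accumulated between two runs of the scheduled query."""
    return events_per_second * query_interval_s


def max_drain_rate(max_lines_per_flush: int, flush_period_s: int) -> float:
    """Upper bound on lines/second shipped if every flush is full."""
    return max_lines_per_flush / flush_period_s


if __name__ == "__main__":
    # Event rates and intervals taken from the message above.
    for eps in (2000, 3000):
        print(eps, rows_per_query(eps, 30), rows_per_query(eps, 60))
    # Flush every 10 seconds with logger_tls_max_lines = 99999.
    print(max_drain_rate(99999, 10))
```

Under these assumptions a 30-second interval at 3000 events per second yields 90,000 rows per query run, while the flush settings allow roughly 10,000 lines per second on paper, so the logger limits alone would not explain the backlog.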
Stefano Bonicatti

12/22/2021, 1:46 PM
Are you able to have direct access to one of those servers? With that amount of events and a default watchdog CPU or memory limit, I would expect the watchdog to keep killing osquery, which blocks its ability to flush the logs. One quick way to notice that (beyond looking at the log, which you are struggling to receive) is to check in Task Manager whether the PID of one of the `osqueryd` processes keeps changing. You can also verify how much CPU/memory it is using.
Juan Alvarez

12/22/2021, 2:20 PM
Yes, I should have mentioned that I disabled the watchdog for testing purposes (just to see if I am actually able to send the events).
2:21 PM
Now I have restarted in verbose mode, and I see some of these, which I have not seen before:
I1222 14:20:13.356295  2880 scheduler.cpp:176] Found results for query: pack/DevoEventsPack/all_windows_events
I1222 14:20:16.077103  2880 rocksdb.cpp:67] RocksDB: [WARN] [db\column_family.cc:924] [events] Stalling writes because we have 15 immutable memtables (waiting for flush), max_write_buffer_number is set to 16 rate 16777216
I1222 14:20:16.142108  2880 rocksdb.cpp:67] RocksDB: [WARN] [db\column_family.cc:924] [events] Stalling writes because we have 15 immutable memtables (waiting for flush), max_write_buffer_number is set to 16 rate 16777216
I1222 14:20:16.149109   112 rocksdb.cpp:67] RocksDB: [WARN] [db\column_family.cc:924] [events] Stalling writes because we have 15 immutable memtables (waiting for flush), max_write_buffer_number is set to 16 rate 13421772
I1222 14:20:16.712949  2880 rocksdb.cpp:67] RocksDB: [WARN] [db\column_family.cc:924] [events] Stalling writes because we have 15 immutable memtables (waiting for flush), max_write_buffer_number is set to 16 rate 16777216
I1222 14:20:16.775468  2880 rocksdb.cpp:67] RocksDB: [WARN] [db\column_family.cc:924] [events] Stalling writes because we have 15 immutable memtables (waiting for flush), max_write_buffer_number is set to 16 rate 16777216
I1222 14:20:16.806708  4772 rocksdb.cpp:67] RocksDB: [WARN] [db\column_family.cc:944] [events] Stalling writes because we have 20 level-0 files rate 13421772
I1222 14:20:16.869204  3528 rocksdb.cpp:67] RocksDB: [WARN] [db\column_family.cc:944] [events] Stalling writes because we have 21 level-0 files rate 10737417
I1222 14:20:16.900492  2880 rocksdb.cpp:67] RocksDB: [WARN] [db\column_family.cc:944] [events] Stalling writes because we have 21 level-0 files rate 8589933
I1222 14:20:16.916087  2268 rocksdb.cpp:67] RocksDB: [WARN] [db\column_family.cc:944] [events] Stalling writes because we have 22 level-0 files rate 6871946
I1222 14:20:16.962956   112 rocksdb.cpp:67] RocksDB: [WARN] [db\column_family.cc:944] [events] Stalling writes because we have 23 level-0 files rate 5497556
I1222 14:20:17.116748  2880 rocksdb.cpp:67] RocksDB: [WARN] [db\column_family.cc:944] [events] Stalling writes because we have 23 level-0 files rate 4398044
I1222 14:20:17.476115  2880 rocksdb.cpp:67] RocksDB: [WARN] [db\column_family.cc:944] [events] Stalling writes because we have 23 level-0 files rate 3518435
I1222 14:20:17.788628  2880 rocksdb.cpp:67] RocksDB: [WARN] [db\column_family.cc:944] [events] Stalling writes because we have 23 level-0 files rate 2814748
I1222 14:20:18.024206  2880 rocksdb.cpp:67] RocksDB: [WARN] [db\column_family.cc:924] [events] Stalling writes because we have 15 immutable memtables (waiting for flush), max_write_buffer_number is set to 16 rate 3940647
I1222 14:20:18.149199  2880 rocksdb.cpp:67] RocksDB: [WARN] [db\column_family.cc:924] [events] Stalling writes because we have 15 immutable memtables (waiting for flush), max_write_buffer_number is set to 16 rate 5516905
I1222 14:20:18.242956  2880 rocksdb.cpp:67] RocksDB: [WARN] [db\column_family.cc:924] [events] Stalling writes because we have 15 immutable memtables (waiting for flush), max_write_buffer_number is set to 16 rate 7723666
2:21 PM
🤔
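The stall warnings pasted above can be summarised with a short script, which makes the shrinking `rate` value easier to follow than scanning the raw log. This is a minimal sketch: the regular expression only covers the two stall reasons that appear in this excerpt, the helper names are hypothetical, and reading `rate` as RocksDB's delayed write rate in bytes per second is an assumption.

```python
import re

# Matches the two stall reasons seen in the log excerpt above.
STALL_RE = re.compile(
    r"Stalling writes because we have (?P<count>\d+) "
    r"(?P<kind>immutable memtables|level-0 files).*?rate (?P<rate>\d+)\s*$"
)


def parse_stalls(lines):
    """Return (kind, count, rate) tuples for every stall warning found."""
    stalls = []
    for line in lines:
        match = STALL_RE.search(line)
        if match:
            stalls.append((match["kind"], int(match["count"]), int(match["rate"])))
    return stalls


def lowest_rate(stalls):
    """Smallest rate reported, or None if there were no stalls."""
    return min((rate for _, _, rate in stalls), default=None)


if __name__ == "__main__":
    # Three lines copied from the excerpt; other lines are simply skipped.
    sample = [
        r"I1222 14:20:16.077103  2880 rocksdb.cpp:67] RocksDB: [WARN] [db\column_family.cc:924] [events] Stalling writes because we have 15 immutable memtables (waiting for flush), max_write_buffer_number is set to 16 rate 16777216",
        r"I1222 14:20:16.806708  4772 rocksdb.cpp:67] RocksDB: [WARN] [db\column_family.cc:944] [events] Stalling writes because we have 20 level-0 files rate 13421772",
        r"I1222 14:20:17.788628  2880 rocksdb.cpp:67] RocksDB: [WARN] [db\column_family.cc:944] [events] Stalling writes because we have 23 level-0 files rate 2814748",
    ]
    for kind, count, rate in parse_stalls(sample):
        print(f"{kind:20s} count={count:3d} rate={rate}")
    print("lowest rate:", lowest_rate(parse_stalls(sample)))
```

Run against the full verbose log, this would show how often each stall reason occurs and how far the rate is throttled before it starts to recover.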
2:37 PM
Also, I should mention that I am using osquery 4.9.0; not sure if there is any patch in 5.1 that might help, but I am going to give it a try.
2:53 PM
I am going to also try to increase this hidden flag:
HIDDEN_FLAG(int32, rocksdb_write_buffer, 16, "Max write buffer number");
Any idea if that might help get things unstuck?
Stefano Bonicatti

12/22/2021, 3:57 PM
I see; I haven’t personally encountered such a high amount of events or those messages yet, even though there [...]. RocksDB is on disk, so disk speed will affect how quickly osquery can store data.
3:58 PM
I know of others who have reported this in the past, like https://github.com/osquery/osquery/issues/5162. It’s something that needs a deeper look. I suspect that increasing that number will only improve the ability to handle spikes, as long as spikes are the problem; if the load is continuous, then the write rate has to increase.
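The difference between spikes and sustained load can be made concrete with a small sketch. It models the write buffers as a fixed capacity that fills at the incoming rate and drains at the flush rate. The function name, units, and numbers are made up for illustration; they are not measurements of osquery or RocksDB.

```python
import math


def seconds_until_stall(capacity, in_rate, out_rate):
    """Time until a buffer of `capacity` units fills, or math.inf if it never does."""
    surplus = in_rate - out_rate
    if surplus <= 0:
        # The buffer drains at least as fast as it fills.
        return math.inf
    return capacity / surplus


if __name__ == "__main__":
    # Hypothetical units: buffer space in events, rates in events/second.
    for capacity in (16_000, 32_000):
        print(capacity, seconds_until_stall(capacity, in_rate=3000, out_rate=1000))
    # Once the drain rate exceeds the incoming rate, the buffer never fills.
    print(seconds_until_stall(16_000, in_rate=3000, out_rate=3500))
```

In this toy model, doubling the capacity only doubles the time before the stall (8 seconds versus 16), which helps with a short spike but not with continuous overload; the stall goes away only when the drain rate exceeds the incoming rate.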
Juan Alvarez

12/22/2021, 4:08 PM
I see. This server is gathering data from other servers via WEF/WEC, which is why the amount of events is so high (the events are forwarded). What would you say is the number of events per second osquery can handle in a normal scenario?
4:09 PM
I am going to try putting a better disk in place and see if the situation improves. Any other recommendation to increase the write rate? Any parameter that can be tweaked in RocksDB?
Stefano Bonicatti

12/22/2021, 4:16 PM
Giving numbers is quite difficult because there are too many variables, from hardware to the nature of the events. I think I would expect maybe less than 1k events/s; as for parameters to change, you could try increasing `rocksdb_background_flushes` (this should translate roughly into how many threads are used to do the flushing).
4:21 PM
In any case, as I was saying, I haven’t personally encountered such a high amount of generated events; I would need to test and do some profiling. I suspect this requires code changes.
Juan Alvarez

12/22/2021, 7:19 PM
Thanks, I am going to test a bit with those; I'll see where I can get.
4:49 PM
Tried with a gp3 volume in AWS with 16000 IOPS and 1000 MiB/s throughput and still could not make it work. I have left a comment in https://github.com/osquery/osquery/issues/5162. I am not sure how common this scenario is (maybe it is not common to be so busy), but I still feel that with Windows events we always see problems when the load gets a little higher.
Stefano Bonicatti

12/23/2021, 4:55 PM
@Juan Alvarez Since you were using AWS, would you be able to also give us the specs of that VM? CPU/Memory/Disk. Thanks!
Juan Alvarez

12/23/2021, 4:59 PM
Sure:
• Instance type m5.xlarge
• 4 vCPUs / 16 GB RAM
• Disk: GP3 SSD
    ◦ Size: 40 GB
    ◦ IOPS: 16000 Max IOPS
    ◦ Throughput: 1000 MiB/s
Stefano Bonicatti

12/23/2021, 5:04 PM
Do you also know roughly what the CPU usage is before and after starting osquery?
Juan Alvarez

12/23/2021, 5:08 PM
Around 15% on the server where I am replicating the issue; only an event generator using https://github.com/andrewkroh/goeventgen is running to emulate the load.