# ebpf
c
I have an Osquery agent producing the following errors with eBPF enabled:
2022-03-23 07:53:27  Worker returned exit status
2022-03-23 07:53:08  Error logging the results of query: pack/test-pack/BPF_PROC_EVENTS: IOError: Bad file descriptor
2022-03-23 07:53:08  Error adding new results to database for query pack/test-pack/BPF_PROC_EVENTS: IOError: Bad file descriptor
2022-03-23 07:52:53  RocksDB: [ERROR] [db/db_impl/db_impl_compaction_flush.cc:2541] Waiting after background compaction error: IO error: While appending to file: /var/osquery/osquery.db/052008.sst: Bad file descriptor, Accumulated background error counts: 1
2022-03-23 07:52:53  RocksDB: [WARN] [db/error_handler.cc:334] Background IO error IO error: While appending to file: /var/osquery/osquery.db/052008.sst: Bad file descriptor
2022-03-23 07:52:53  RocksDB: [WARN] [db/db_impl/db_impl_compaction_flush.cc:3019] Compaction error: IO error: While appending to file: /var/osquery/osquery.db/052008.sst: Bad file descriptor
eBPF flags:
#### Process Auditing ####
--disable_events=false
--enable_bpf_events=true
--events_optimize=true
--events_expiry=3600
--events_max=200000
Pack queries:
SELECT * FROM bpf_socket_events WHERE local_port != 0;

SELECT * FROM bpf_process_events;
@sharvil
a
This is bad. Is this the latest version of osquery?
c
osqueryd --version
osqueryd version 5.1.0
@alessandrogario, any next steps you want me to take?
• Config dumps?
• Update to osquery v5.2.2?
• Other?
a
I am not really sure how to proceed; I don't think BPF by itself can break the database like that.
My guess is that BPF is tracing a lot of process activity, and the watchdog is killing the worker because of it.
After repeated kills, the database broke and stopped working.
If you can reproduce this with a new database, try disabling the watchdog;
if the error does not come back, then the worker being killed was the problem.
As an additional note, the BPF code does not access the database directly or use RocksDB in any way.
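A minimal flagfile sketch for that reproduction run, in the same style as the eBPF flags above. It assumes the standard osquery flags --disable_watchdog, --watchdog_level and --database_path; the database path is a hypothetical example, chosen only so the run starts from a new, empty database.

```
#### Watchdog test (reproduction run only) ####
# Turn the userland watchdog off for this run
--disable_watchdog=true
# Alternative: keep it enabled but set the performance limit level to -1 (off)
# --watchdog_level=-1
# Hypothetical fresh database location, so the run does not reuse the broken one
--database_path=/var/osquery/osquery-repro.db
```

Revert these flags after the test, since without the watchdog nothing caps osqueryd's CPU or memory use.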
c
The watchdog memory limit is set to 8 GB and the CPU limit is left at the default.
Could the watchdog CPU limit be causing this?
a
That is my guess, yes
How many processes are running on that machine?
c
Do you mean the number of processes on the whole system, or the osquery process count?
It’s a test machine with a lot of hardware resources, so overall system resources should not be the issue.
I will try disabling the watchdog and see if I can reproduce the error above.
a
How many processes are being created/closed, and how many files are being opened/closed?
Also, whether there are containers running on the host, and how many.
Mostly, I want to estimate how many events are hitting osquery.
Also, however powerful that machine is, the event stream is processed in order, on a single thread.
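For the first question (how many processes are being created), Linux keeps a cumulative fork counter in the `processes` line of /proc/stat (documented in proc(5)); sampling it twice gives a rough fork rate. This is a minimal sketch assuming a Linux host; the helper names are hypothetical and not part of osquery, and it does not measure file opens/closes or container counts.

```python
import time

STAT_PATH = "/proc/stat"  # Linux only; the "processes" line counts forks since boot


def parse_fork_count(stat_text: str) -> int:
    """Return the 'processes' counter from the text of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "processes":
            return int(fields[1])
    raise ValueError("no 'processes' line found")


def read_fork_count(path: str = STAT_PATH) -> int:
    with open(path) as f:
        return parse_fork_count(f.read())


def fork_rate(interval_s: float = 10.0, path: str = STAT_PATH) -> float:
    """Sample the counter twice and return forks per second over the interval."""
    start = read_fork_count(path)
    time.sleep(interval_s)
    end = read_fork_count(path)
    return (end - start) / interval_s
```

Calling fork_rate() a few times under normal load gives a rough lower bound on the process events the single BPF consumer has to keep up with; each new process usually also produces exec, open and close events on top of the fork.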
n
From the logs, it looks like the problem is not confined to RocksDB: some code is either closing the fd at random or closing the same fd twice.
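To illustrate why a double close can surface as RocksDB write errors: POSIX open() returns the lowest free descriptor number, so if a component opens a file between the first and the (buggy) second close(), the second close() silently closes that unrelated file, and its next write fails with EBADF ("Bad file descriptor"). A minimal Python sketch on a POSIX system; the file name victim.sst is only illustrative, and this shows the failure mode, not where the actual bug is.

```python
import errno
import os
import tempfile


def double_close_errno() -> int:
    """Close the same descriptor twice and return the errno of the second close."""
    fd = os.open(os.devnull, os.O_RDONLY)
    os.close(fd)
    try:
        os.close(fd)  # bug: fd is no longer open
    except OSError as exc:
        return exc.errno
    return 0


def stale_close_breaks_other_file() -> tuple:
    """Return (reused, write_errno) for a stale close hitting a reused fd number."""
    with tempfile.TemporaryDirectory() as tmp:
        victim_path = os.path.join(tmp, "victim.sst")
        stale_fd = os.open(os.devnull, os.O_RDONLY)
        os.close(stale_fd)  # legitimate first close
        # open() hands out the lowest free number, normally the one just freed
        victim_fd = os.open(victim_path, os.O_WRONLY | os.O_CREAT, 0o600)
        if victim_fd != stale_fd:
            os.close(victim_fd)
            return (False, 0)
        os.close(stale_fd)  # buggy second close: really closes victim_fd
        try:
            os.write(victim_fd, b"data")  # like RocksDB appending to its .sst file
        except OSError as exc:
            return (True, exc.errno)
        os.close(victim_fd)
        return (True, 0)
```

On Linux the second function normally returns (True, errno.EBADF): the code that owns the victim file did nothing wrong, yet its write fails.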
c
Sorry for the delayed response; it was a busy week. I have disabled the watchdog limits for now with -1.
@alessandrogario, as for your question about open files and processes, I am not sure; I can try to get some numbers. That said, these are production systems, so high volume is expected. Has osquery with eBPF been tested on production systems?
a
I think BPF in its current form is hard to use on heavily loaded machines.
To avoid external dependencies, as per the osquery guidelines, the BPF code has to trace every open/fork/exec/close/chdir/etc. call in order to work.
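To make that concrete: without an external source of process metadata, the only way to turn a relative path in an exec or open event into an absolute one is to replay every call that changes a process's cwd or fd table, strictly in order. The toy Python model below uses a hypothetical event format and handles only a handful of calls; osquery's real BPF code is more involved, so treat it as an illustration of the dependency, not the implementation.

```python
import posixpath
from dataclasses import dataclass, field


@dataclass
class ProcState:
    cwd: str
    fds: dict = field(default_factory=dict)  # fd number -> absolute path


class ShadowState:
    """Toy per-process state rebuilt only from a stream of syscall events."""

    def __init__(self, initial_cwds: dict):
        # initial_cwds: {pid: cwd} for processes that existed before tracing
        self.procs = {pid: ProcState(cwd) for pid, cwd in initial_cwds.items()}
        self.exec_log = []  # (pid, absolute path) for every exec seen

    def resolve(self, pid: int, path: str) -> str:
        # A relative path can only be made absolute with the cwd tracked so far
        return posixpath.normpath(posixpath.join(self.procs[pid].cwd, path))

    def handle(self, ev: dict) -> None:
        kind, pid = ev["type"], ev["pid"]
        if kind == "fork":
            parent = self.procs[pid]
            # The child inherits the parent's cwd and a copy of its fd table
            self.procs[ev["child"]] = ProcState(parent.cwd, dict(parent.fds))
        elif kind == "chdir":
            self.procs[pid].cwd = self.resolve(pid, ev["path"])
        elif kind == "open":
            self.procs[pid].fds[ev["fd"]] = self.resolve(pid, ev["path"])
        elif kind == "close":
            self.procs[pid].fds.pop(ev["fd"], None)
        elif kind == "exec":
            self.exec_log.append((pid, self.resolve(pid, ev["path"])))
        elif kind == "exit":
            self.procs.pop(pid, None)


def replay(initial_cwds: dict, events: list) -> ShadowState:
    """Apply events strictly in order, one at a time (a single consumer)."""
    state = ShadowState(initial_cwds)
    for ev in events:
        state.handle(ev)
    return state


# Hypothetical stream: pid 100 changes into /srv/app and forks 101,
# which then opens a file and execs a script using relative paths.
EXAMPLE_EVENTS = [
    {"type": "chdir", "pid": 100, "path": "/srv/app"},
    {"type": "fork", "pid": 100, "child": 101},
    {"type": "open", "pid": 101, "fd": 3, "path": "data/log.txt"},
    {"type": "exec", "pid": 101, "path": "./run.sh"},
]
```

replay({100: "/"}, EXAMPLE_EVENTS).exec_log is [(101, "/srv/app/run.sh")], but dropping only the first chdir event makes the same exec resolve to "/run.sh". A single missed event silently corrupts every later path, which is why nothing can be skipped even under heavy load.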
On those systems, Audit may be a better choice for now, since the kernel provides much more metadata than what can easily be gathered with BPF.
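A minimal flagfile sketch of that switch, mirroring the eBPF flags block above. It assumes the osquery Linux Audit flags --disable_audit, --audit_allow_config, --audit_persist and --audit_allow_sockets; check them against the flag documentation for the installed osquery version before rolling this out.

```
#### Process Auditing (Audit instead of BPF) ####
--disable_events=false
# Turn the BPF publisher off
--enable_bpf_events=false
# Let osquery own and configure the kernel Audit subsystem
--disable_audit=false
--audit_allow_config=true
--audit_persist=true
# Also collect network (socket) events
--audit_allow_sockets=true
--events_optimize=true
--events_expiry=3600
--events_max=200000
```

With this in place, the pack queries would read from process_events and socket_events instead of bpf_process_events and bpf_socket_events. A running auditd would likely conflict, since only one userland process can own the kernel audit socket.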