I have an Osquery agent producing the following errors with osquery #ebpf

CptOfEvilMinions

03/24/2022, 8:40 PM

I have an Osquery agent producing the following errors with eBPF enabled:

2022-03-23 07:53:27 Worker returned exit status
2022-03-23 07:53:08 Error logging the results of query: pack/test-pack/BPF_PROC_EVENTS: IOError: Bad file descriptor
2022-03-23 07:53:08 Error adding new results to database for query pack/test-pack/BPF_PROC_EVENTS: IOError: Bad file descriptor
2022-03-23 07:52:53 RocksDB: [ERROR] [db/db_impl/db_impl_compaction_flush.cc:2541] Waiting after background compaction error: IO error: While appending to file: /var/osquery/osquery.db/052008.sst: Bad file descriptor, Accumulated background error counts: 1
2022-03-23 07:52:53 RocksDB: [WARN] [db/error_handler.cc:334] Background IO error IO error: While appending to file: /var/osquery/osquery.db/052008.sst: Bad file descriptor
2022-03-23 07:52:53 RocksDB: [WARN] [db/db_impl/db_impl_compaction_flush.cc:3019] Compaction error: IO error: While appending to file: /var/osquery/osquery.db/052008.sst: Bad file descriptor

eBPF flags

#### Process Auditing ####
--disable_events=false
--enable_bpf_events=true
--events_optimize=true
--events_expiry=3600
--events_max=200000

Pack queries:

SELECT * FROM bpf_socket_events WHERE local_port != 0;

SELECT * FROM bpf_process_events;

CptOfEvilMinions

03/24/2022, 8:40 PM

@sharvil

alessandrogario

03/24/2022, 9:10 PM

This is bad. Is this the latest version of osquery?

CptOfEvilMinions

03/24/2022, 9:19 PM

osqueryd --version
osqueryd version 5.1.0

CptOfEvilMinions

03/28/2022, 5:17 PM

@alessandrogario any next steps you want me to take?
• Config dumps?
• Update to osquery v5.2.2?
• Other?

alessandrogario

03/28/2022, 5:29 PM

I am not really sure how to proceed, I don't think that bpf itself can break the database like that

alessandrogario

03/28/2022, 5:29 PM

my guess is that BPF is tracing a lot of process activity and the watchdog is causing the worker to get killed

alessandrogario

03/28/2022, 5:29 PM

and after repeated kills, the database just broke and stopped working

alessandrogario

03/28/2022, 5:40 PM

If you are able to reproduce this with a new database, you could try to disable the watchdog

alessandrogario

03/28/2022, 5:40 PM

if it doesn't happen again, then the worker being killed was the problem

alessandrogario

03/28/2022, 5:41 PM

as an additional note, the BPF code does not access the database directly or use rocksdb in any way
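For the watchdog test suggested above, a flagfile sketch could look like the following. `--watchdog_level`, `--disable_watchdog` and `--database_path` are existing osquery flags, but the database path shown here is a made-up example and not taken from this thread; treat the whole fragment as an assumption to adapt, not a verified fix.

```text
#### Watchdog test (sketch) ####
# Start from a fresh database so earlier corruption does not skew the test.
# The path below is hypothetical.
--database_path=/var/osquery/osquery-test.db

# Keep the watchdog process but turn off its performance limits.
--watchdog_level=-1

# Alternative: run without the watchdog process entirely.
# --disable_watchdog=true
```

If the "Bad file descriptor" errors do not come back with the limits off, that points at the worker being killed rather than at BPF itself.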

CptOfEvilMinions

03/28/2022, 6:44 PM

So the watchdog memory limit is set to 8 GB and the CPU limit is the default

CptOfEvilMinions

03/28/2022, 6:45 PM

Could the watchdog cpu limit be causing this?

alessandrogario

03/28/2022, 7:04 PM

That is my guess, yes

alessandrogario

03/28/2022, 7:04 PM

How many processes are running on that machine?

CptOfEvilMinions

03/29/2022, 5:22 PM

How many processes on the overall system or osquery proc count?

CptOfEvilMinions

03/29/2022, 5:23 PM

It’s a test machine with a lot of hardware resources, so overall system resources should not be the issue

CptOfEvilMinions

03/29/2022, 5:24 PM

I will try disabling the watchdog and see if I can reproduce the error above.

alessandrogario

03/29/2022, 5:25 PM

How many processes are being created/closed and how many files are being opened/closed

alessandrogario

03/29/2022, 5:25 PM

and/or, if there are containers running on the host and how many

alessandrogario

03/29/2022, 5:26 PM

mostly to estimate how many events are hitting osquery

alessandrogario

03/29/2022, 5:26 PM

also, as powerful as that machine can be, the event stream is processed in order and it's single threaded

👍 1
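One rough way to put a number on the process creation rate asked about above is to sample the `processes` counter in `/proc/stat` twice. This is a minimal sketch, assuming a Linux host; the function names are made up for illustration, and the counter may also include thread creation, so the result is better read as an upper bound than as an exact count of events reaching osquery.

```python
import time


def parse_fork_count(stat_text: str) -> int:
    """Return the cumulative fork count from the text of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "processes":
            return int(fields[1])
    raise ValueError("no 'processes' line found")


def forks_per_second(before: int, after: int, seconds: float) -> float:
    """Convert two counter samples into an average rate."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (after - before) / seconds


def sample_fork_rate(interval: float = 10.0, path: str = "/proc/stat") -> float:
    """Read the counter, wait `interval` seconds, read it again."""
    with open(path) as f:
        before = parse_fork_count(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_fork_count(f.read())
    return forks_per_second(before, after, interval)


if __name__ == "__main__":
    print(f"{sample_fork_rate():.1f} forks/sec")
```

Since the event stream is processed in order on a single thread, a sustained rate that is high compared to what one core can handle would be consistent with the backlog theory.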

npamnani

03/30/2022, 7:28 AM

From the logs, it looks like the problem is not confined to RocksDB: some code is randomly closing the fd, or there is a double close on an fd.
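To illustrate how a double close in one component can surface as "Bad file descriptor" in an unrelated one, here is a small sketch. It is not osquery code and does not show that this is what happened here; the file names and the function are hypothetical. It relies on the POSIX rule that `open` returns the lowest free descriptor number, so a stale number is often reused by the next file opened.

```python
import errno
import os
import tempfile


def stale_close_demo(workdir: str):
    """Return (fd_was_reused, errno_from_write_or_None)."""
    # Component A opens a file, remembers the fd number, and closes it.
    stale_fd = os.open(os.path.join(workdir, "a.log"),
                       os.O_WRONLY | os.O_CREAT, 0o600)
    os.close(stale_fd)

    # Component B (standing in for the database) opens its own file.
    # The lowest free number is handed out, so it usually gets the same one.
    db_fd = os.open(os.path.join(workdir, "b.sst"),
                    os.O_WRONLY | os.O_CREAT, 0o600)
    reused = db_fd == stale_fd

    if reused:
        # Bug: component A closes its stale number a second time,
        # which silently closes component B's file instead.
        os.close(stale_fd)

    write_errno = None
    try:
        os.write(db_fd, b"payload")
    except OSError as exc:
        write_errno = exc.errno
    finally:
        if not reused:
            os.close(db_fd)  # normal cleanup when no reuse happened
    return reused, write_errno


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        reused, err = stale_close_demo(tmp)
        print("fd reused:", reused, "write error:", errno.errorcode.get(err))
```

Component B never did anything wrong, yet its write is the one that fails, which is why this kind of bug shows up far from its cause.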

CptOfEvilMinions

04/06/2022, 5:09 PM

Sorry for the delayed response, it was a busy week. I currently have the watchdog limits disabled with -1

CptOfEvilMinions

04/06/2022, 5:11 PM

@alessandrogario as for your question about open files and procs, I am not sure. I can try and get some numbers. That being said, these are production systems so it’s expected to be high volume. Has Osquery + eBPF been tested on production systems?

alessandrogario

04/06/2022, 5:14 PM

I think bpf in its current form is hard to use on machines with high load

alessandrogario

04/06/2022, 5:15 PM

in order to avoid dependencies, as per the osquery guidelines, bpf has to trace all the open/fork/exec/close/chdir/etc in order to work

alessandrogario

04/06/2022, 5:16 PM

on those systems maybe audit could be a better idea for now, since the kernel provides a lot more metadata compared to what can be easily gathered with bpf
