Hello osquery team is there any documentation about the cons osquery #general

Robert Soulé

03/23/2024, 6:58 PM

Hello osquery team, is there any documentation about the consistency guarantees that osquery provides? I'm curious, for example, how osquery guards against "phantom inserts" while executing a query? I'd like to understand both what guarantees are provided and how they are ensured.

Stefano Bonicatti

03/24/2024, 2:39 PM

Hello @Robert Soulé, I'm not sure I fully understood your question, but osquery doesn't store data on the filesystem in a SQL database to be gathered later. The tables it presents are all virtual and, for the most part, they query the system via system APIs on the fly, transform the results into log format, and buffer those in RocksDB to be shipped later, or write them to the filesystem immediately if that kind of logger is active. I say "for the most part" because the evented tables use data (events) that the event "listener" (publisher) has previously saved in RocksDB. They are fundamentally streams.
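As a rough illustration of the flow described above (query the system on the fly, turn the result into a log line, buffer it, ship it later), here is a minimal Python sketch. It is not osquery code: the `pending_logs` deque is a hypothetical in-memory stand-in for the RocksDB buffer, and the function names and JSON field names are made up for the example.

```python
import json
import os
import time
from collections import deque

# Hypothetical in-memory stand-in for the RocksDB-backed log buffer.
pending_logs = deque()

def run_query(name, generate):
    """Call the data source right now and buffer the result as one log line."""
    rows = generate()  # nothing is read from a stored table; the system is asked directly
    line = json.dumps({"query": name, "collected_at": int(time.time()), "rows": rows})
    pending_logs.append(line)

def ship(send):
    """Send buffered lines oldest-first; a line is dropped only after send() returns."""
    sent = 0
    while pending_logs:
        send(pending_logs[0])
        pending_logs.popleft()
        sent += 1
    return sent

# Usage: one query run, then a later flush to a (hypothetical) log destination.
shipped = []
run_query("current_pid", lambda: [{"pid": os.getpid()}])
count = ship(shipped.append)
```

The point of the sketch is only that the rows exist for the first time when the query runs; what gets persisted is the resulting log line, not the table.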

Robert Soulé

03/24/2024, 3:13 PM

Hi @Stefano Bonicatti: Perhaps I am confused. In general, in databases, we can have concurrency problems. So, for example, if we have two transactions that access the process table: one is reading to say "what processes do we currently have running" and the other transaction (really, normal OS operation) updates the state to add/remove a process. I don't think osquery does anything like 2PL around kernel data structures. I was wondering if it does anything to guard against concurrency problems?

Robert Soulé

03/24/2024, 3:14 PM

For my question, I don't think it matters whether a table is virtual or not.

Stefano Bonicatti

03/25/2024, 9:56 AM

Ah I see, so no. osquery runs only in user space, so it doesn't have much control over those kinds of things. What the system APIs return is what it gets. It definitely happens that, after it has retrieved a list of pids, let's say, and is internally looping over that list to get more information, a process disappears or its data changes.
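The race described above can be sketched in a few lines of Python. This is a simulation under stated assumptions, not osquery's implementation: the `live` dictionary is a hypothetical stand-in for the kernel's process list, and `list_pids`, `read_name`, `generate_rows`, and `churn` are names invented for the example.

```python
# Hypothetical stand-in for the OS process list (pid -> name); not osquery code.
live = {1: "init", 42: "sshd", 77: "curl"}

def list_pids():
    """Step 1: take a listing of the pids that exist right now."""
    return sorted(live)

def read_name(pid):
    """Step 2: fetch details for one pid; fails if the process has exited."""
    if pid not in live:
        raise ProcessLookupError(pid)
    return live[pid]

def generate_rows(between_steps=lambda: None):
    pids = list_pids()
    between_steps()  # the system keeps running; no lock is held on the process list
    rows = []
    for pid in pids:
        try:
            rows.append({"pid": pid, "name": read_name(pid)})
        except ProcessLookupError:
            continue  # the process vanished between the listing and the detail read
    return rows

def churn():
    """Simulate normal OS activity: one process exits, another starts."""
    del live[77]
    live[99] = "bash"

rows = generate_rows(between_steps=churn)
```

In this run the result contains pids 1 and 42 only: pid 77 was listed but exited before its details were read, and pid 99 started after the listing, so it is missed. Neither outcome corresponds to a single consistent snapshot, which is the kind of anomaly the question was about.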

Stefano Bonicatti

03/25/2024, 9:57 AM

So it all depends on each table's implementation.

Robert Soulé

03/25/2024, 11:36 AM

I see. Thank you. As a follow-up, I didn't realize that osquery was using RocksDB. I thought it was only SQLite. Is there a document that describes the high-level design?

Stefano Bonicatti

03/25/2024, 12:56 PM

SQLite is just used to interpret the SQL queries and call the virtual tables' logic (basically, the system APIs). RocksDB is used to store some scheduler state and the denylisted queries, but also events yet to be queried and logs that are waiting to be sent.
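To make the "SQLite only interprets the SQL" idea concrete, here is a small Python sketch. It is an approximation under an important assumption: Python's standard `sqlite3` module does not expose SQLite's virtual table interface, so this example fills a throwaway in-memory table from live system calls on every query instead of registering a real virtual table. The table name `current_process` and the functions `generate_rows` and `query` are hypothetical.

```python
import os
import sqlite3

def generate_rows():
    """Hypothetical row generator: asks the OS for fresh data on every call."""
    return [{"pid": os.getpid(), "cwd": os.getcwd()}]

def query(sql):
    """Build the table from live data, let SQLite evaluate the SQL, discard everything."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE current_process (pid INTEGER, cwd TEXT)")
        conn.executemany(
            "INSERT INTO current_process VALUES (:pid, :cwd)", generate_rows()
        )
        return conn.execute(sql).fetchall()
    finally:
        conn.close()  # nothing persists between queries

result = query("SELECT pid FROM current_process")
```

SQLite contributes the parsing, filtering, and projection here; the data itself comes from whatever the system calls return at the moment the query runs.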

Robert Soulé

03/25/2024, 12:58 PM

OK. I see. Thank you

seph

03/25/2024, 7:02 PM

> So, for example, if we have two transactions that access the process table: one is reading to say “what processes do we currently have running” and the other transaction (really, normal OS operation) updates the state to add/remove a process. I don’t think osquery does anything like 2PL around kernel data structures. I was wondering if it does anything to guard against concurrency problems?

That’s not really how this works. There is no real database of processes that osquery accesses. If we ignore the evented tables, osquery is really just an API translation layer. There’s a virtual table, and its generate function basically just fetches the data via some API.

Robert Soulé

03/25/2024, 7:15 PM

Thanks, @seph. That helps.
