# general
r
Hello osquery team, is there any documentation about the consistency guarantees that osquery provides? I'm curious, for example, how osquery guards against "phantom inserts" while executing a query? I'd like to understand both what guarantees are provided and how they are ensured.
s
Hello @Robert Soulé, I'm not sure I fully understood your question, but osquery doesn't store data on the filesystem in a SQL database to be gathered later. The tables it presents are all virtual and, for the most part, they query the system via system APIs on the fly, transform the results into log format, and buffer them in RocksDB to be shipped later, or write them to the filesystem immediately if that kind of logger is active. I say "for the most part" because the evented tables use data (events) that the event "listener" (publisher) has previously saved in RocksDB. They are fundamentally streams.
r
Hi @Stefano Bonicatti: Perhaps I am confused. In general, in databases, we can have concurrency problems. So, for example, if we have two transactions that access the process table: one is reading to say "what processes do we currently have running" and the other transaction (really, normal OS operation) updates the state to add/remove a process. I don't think osquery does anything like 2PL around kernel data structures. I was wondering if it does anything to guard against concurrency problems?
For my question, I don't think it matters whether a table is virtual or not.
s
Ah I see, so no, osquery is user-space only, so it doesn't have much control over those kinds of things. What the system APIs return is what it gets. It definitely happens that when it retrieves a list of pids, let's say, and then internally loops over it to get more information, the actual process may have disappeared, or the data may have changed.
So it all depends on each table's implementation.
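To make the race described above concrete, here is a minimal sketch in Python. It is not osquery code (osquery tables are written in C++), and `snapshot_processes`, `list_pids`, and `read_details` are hypothetical names made up for this illustration. It assumes a table that first lists pids and then fetches details per pid, and it simply skips any pid that vanished in between.

```python
from typing import Callable, Dict, Iterable, List


def snapshot_processes(
    list_pids: Callable[[], Iterable[int]],
    read_details: Callable[[int], Dict[str, str]],
) -> List[Dict[str, str]]:
    """Two-step scan: list pids, then look each one up.

    Nothing locks the system between the two steps, so a process can
    exit after it was listed. Such pids are skipped, not reported.
    """
    rows = []
    for pid in list_pids():
        try:
            details = read_details(pid)
        except (FileNotFoundError, ProcessLookupError):
            # The process disappeared between listing and lookup.
            continue
        rows.append({"pid": str(pid), **details})
    return rows


# Simulated "system" so the race is reproducible (hypothetical data).
live = {1: "init", 42: "sshd", 99: "short-lived"}


def list_pids() -> List[int]:
    return list(live)  # step 1: the pid list is fixed at this moment


def read_details(pid: int) -> Dict[str, str]:
    if pid == 42:
        live.pop(99, None)  # simulate pid 99 exiting mid-scan
    if pid not in live:
        raise ProcessLookupError(pid)
    return {"name": live[pid]}


rows = snapshot_processes(list_pids, read_details)
print(rows)
```

The result is not a consistent snapshot of any single instant: pid 99 was present when the list was taken but is missing from the rows. Whether a real table skips, partially fills, or otherwise handles such entries is up to that table's implementation.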
r
I see. Thank you. As a follow-up, I didn't realize that osquery was using RocksDB. I thought it was only SQLite. Is there a document that describes the high-level design?
s
SQLite is just used to interpret the SQL queries and call the virtual tables' logic (basically, the system APIs). RocksDB is used to store some state about the scheduler and the denylisted queries, but also events yet to be queried and logs that are waiting to be sent.
r
OK. I see. Thank you
s
> So, for example, if we have two transactions that access the process table: one is reading to say “what processes do we currently have running” and the other transaction (really, normal OS operation) updates the state to add/remove a process. I don’t think osquery does anything like 2PL around kernel data structures. I was wondering if it does anything to guard against concurrency problems?
That’s not really how this works. There is no real database of processes that osquery accesses. If we ignore the evented tables, osquery is really just an API translation layer. There’s a virtual table, and its generate function basically just fetches the data via some API.
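As a rough illustration of the "API translation layer" idea, here is a sketch in Python, not osquery's actual C++ code. The names `generate_env`, `TABLES`, and `select_all` are hypothetical, and the environment-variable table is only a stand-in for a real system API. The point it tries to show is that each query calls the generate function again, so there is no stored table to lock or keep consistent.

```python
import os
from typing import Callable, Dict, List

Row = Dict[str, str]


def generate_env() -> List[Row]:
    """Hypothetical generate function: reads the live state at call time."""
    return [{"key": k, "value": v} for k, v in os.environ.items()]


# Hypothetical registry mapping table names to their generate functions.
TABLES: Dict[str, Callable[[], List[Row]]] = {"env": generate_env}


def select_all(table: str) -> List[Row]:
    # Nothing is cached: every query regenerates the rows from the API.
    return TABLES[table]()


os.environ["DEMO_FLAG"] = "before"
first = select_all("env")

os.environ["DEMO_FLAG"] = "after"  # the "system" changes between queries
second = select_all("env")
```

Here `first` and `second` differ for `DEMO_FLAG` even though no row was ever written to a table; each result only reflects what the underlying API returned at the time of the call.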
r
Thanks, @seph. That helps.