# core
Heads up: it looks like we’re waiting a little longer for 4.4.0, since we want https://github.com/osquery/osquery/pull/6280 to be solid. I fully support taking our time on that.
Alessandro said he should have some time tomorrow to give that some attention
We have finished the second pass on the PR, and here's a summary of what we changed:
1. The `keywords` column has been changed from BIGINT to TEXT. This is because SQLite does not support std::uint64_t, and Windows events sometimes return values that can't be represented with an std::int64_t
2. The tests have been changed: we exported several different event types from different event channels, and those samples are used to compare the output we get against the output we expect
3. The `data` column has been slightly changed; instead of adding the fields directly inside a single object (example: `{ "key": "value" }`), we now have a root "Data" object (example: `{ "Data": { "key": "value" } }`). This allows us to extend it with additional data from the Windows Event Log XML without breaking compatibility
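To make the shape change concrete, here's a small sketch of how a consumer could read either layout; the field names and the `event_fields` helper are made up for illustration, not part of the PR:

```python
import json

# Hypothetical rows showing the old flat layout and the new nested layout.
old_data = json.loads('{"TargetUserName": "alice"}')
new_data = json.loads('{"Data": {"TargetUserName": "alice"}}')

def event_fields(doc):
    # Prefer the root "Data" object (new format); fall back to treating
    # the whole document as the property bag (old format).
    return doc.get("Data", doc)

assert event_fields(old_data) == event_fields(new_data)
```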
Ha. I have also been thwarted by sqlite’s lack of uint64 support 😞
I don’t understand the justification on changing the data column. Not that I have a strong preference, but the justification seems weird.
It’s a json hash, what’s keeping you from extending it regardless of nesting level?
so the XML event has an EventData object, which is just a small part
I guess having `{data: {...}}` means you can add `{metadata: {}}` or something
and inside EventData, we have Data tags + other things
right now, we only have Data fields (like pre-refactor)
Cool. Makes sense. Thanks.
if we decide we want to add additional stuff to the data column, it's easier if we keep things separate
what do you think? did my reasoning make sense?
I am not sure if we want to add new stuff right now, i was just trying to keep the door open 🤔
Yes, makes sense. You’re not adding new data fields, you’re adding something next to data.
For these tables, I’ve struggled a bit with this. For runtime variable/unknown data fields, there’s no good translation to columns. I don’t like dealing with JSON data; it’s hard to merge/sort/join on it. I usually end up with an EAV model.
It’s a bit weird to work with, but it becomes really powerful in handling arbitrary data structures
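For anyone unfamiliar with the pattern, here's a minimal entity-attribute-value sketch using an in-memory SQLite database; the table and field names are illustrative, not anything from osquery:

```python
import sqlite3

# One row per (event, attribute) pair instead of a fixed-width schema.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE event_attrs (event_id INTEGER, attr TEXT, value TEXT)")
db.executemany(
    "INSERT INTO event_attrs VALUES (?, ?, ?)",
    [
        (1, "TargetUserName", "alice"),
        (1, "LogonType", "2"),
        (2, "TargetUserName", "bob"),
    ],
)

# Unlike a JSON blob, EAV rows can be filtered, sorted, and joined
# with plain SQL, whatever fields each event happens to carry.
rows = db.execute(
    "SELECT event_id FROM event_attrs "
    "WHERE attr = 'TargetUserName' AND value = 'alice'"
).fetchall()
print(rows)  # [(1,)]
```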
Yeah, I also thought having JSON in columns was weird
usually everyone tells me they love json in there because their log aggregators can easily handle it
Nice! The new features are good. They do break existing functionality though. I would recommend having FLAGs to enable this new functionality? Typically, a schema would add new columns rather than changing existing ones. Also, I think most servers flatten result structures; why not go with columns with a 'data_' prefix rather than adding the data object level?
(researchers have asked me for direct access to the data fields in the past, so they will be happy)
Disregard the part about flattening and using the data_ prefix... it doesn't work when tying to a fixed schema. If we had the event forwarding that bypassed the event cache though...
Yeah, the problem there is that they are property bags written in different formats. Sometimes it's an array, sometimes a key-value
I know it is a breaking change, but the old implementation had no tests and we had evidence that data was being silently lost
yeah, I did a POC at one point, and used data_0, data_1, etc. when it was an array
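A rough sketch of that flattening idea, under the assumption that each event's property bag is either a key-value map or an array (as described above); the `flatten` helper is hypothetical, not the POC's actual code:

```python
def flatten(bag):
    # Arrays get positional columns: data_0, data_1, ...
    if isinstance(bag, list):
        return {f"data_{i}": v for i, v in enumerate(bag)}
    # Key-value bags get a "data_" prefix per key.
    return {f"data_{k}": v for k, v in bag.items()}

assert flatten(["a", "b"]) == {"data_0": "a", "data_1": "b"}
assert flatten({"user": "alice"}) == {"data_user": "alice"}
```

The catch, as noted above, is that the resulting column set varies per event, which doesn't fit a fixed table schema.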
there was no way to keep the compatibility because we would have to reproduce the parsing bugs
and my reasoning was that if we have to break it, let's break it with a new format that is stable and can be extended
sqlite has some json functions. You can get direct access to the json subfields, but it’s pretty ugly.
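For example, with SQLite's `json_extract` (available in recent SQLite builds), you can pull a subfield out of the nested format; the document and field name here are made up to match the earlier examples:

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Reach into the root "Data" object with a JSON path expression.
row = db.execute(
    """SELECT json_extract('{"Data": {"TargetUserName": "alice"}}',
                           '$.Data.TargetUserName')"""
).fetchone()
print(row[0])  # alice
```

Workable, but the path syntax inside SQL is indeed pretty ugly compared to a real column.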
I think json -> SIEM is easy, but also feels very different. It’s more osquery-as-log-agent. I think there’s a lot of power lurking around sql
I’m pretty sure no one likes EAV. But it does handle all these cases 😉