Title
#core
theopolis

theopolis

06/15/2020, 11:38 PM
Heads up it looks we’re waiting a little longer for 4.4.0 due to wanting for https://github.com/osquery/osquery/pull/6280 to be solid. I fully support taking our time on that.
Mike Myers

Mike Myers

06/16/2020, 5:42 PM
Alessandro should have some time tomorrow to give that attention, he said
a

alessandrogario

06/18/2020, 1:09 PM
We have finished with the second pass on the PR, and here's a summary of what we have changed:1. The 'keywords' column has been changed from BIGINT to TEXT. This is because SQLite does not support std::uint64_t and Windows events sometimes return values that can't be represented with an std::int64_t 2. The tests have been changed: we have exported several different event types from different event channels. Those samples are used to compare the output we get with the output we expect 3. The
data
column has been slightly changed; instead of adding the fields directly inside a single object (example:
{ "key": "value"}
), we now have a root "Data" object (example:
{ "Data": { "key": "value" } }
). This allows us to extend this with additional data from the Windows Event Log XML without breaking compatibility
s

seph

06/18/2020, 2:21 PM
Ha. I have also been thwarted by sqlite’s lack of
uint64_t
2:36 PM
I don’t understand the justification on changing the data column. Not that I have a strong preference, but the justification seems weird.
2:36 PM
It’s a json hash, what’s keeping you from extending it regardless of nesting level?
a

alessandrogario

06/18/2020, 2:37 PM
so the XML event has an EventData object, which is just a small part
s

seph

06/18/2020, 2:37 PM
I guess having
{data: {...}}
means you can add
{metadata: {}}
or something
a

alessandrogario

06/18/2020, 2:37 PM
and inside EventData, we have Data tags + other things
2:37 PM
right now, we only have Data fields (like pre-refactor)
s

seph

06/18/2020, 2:37 PM
Cool. Makes sense. Thanks.
a

alessandrogario

06/18/2020, 2:38 PM
if we decide we want to add additional stuff the data column, it's easier if we keep things separat
2:38 PM
what do you think? did my reasoning make sense?
2:38 PM
I am not sure if we want to add new stuff right now, i was just trying to keep the door open 🤔
s

seph

06/18/2020, 2:38 PM
Yes, makes sense. You’re not adding new data fields, you’re adding something next to data.
2:39 PM
For the
launcher
tables, I’ve struggled a bit with this. For runtim e variable/unknown data fields, there’s no good translation to columns. I don’t like dealing with
json
data, it’s hard to merge/sort/join on it. I usually end up with an EAV model.
2:40 PM
It’s a bit weird to with, but it becomes really powerful in handling arbitrary data structures
a

alessandrogario

06/18/2020, 2:45 PM
Yeah, I also thought having JSON in colums to be weird
2:45 PM
usually everyone tells me they love json in there because their log aggregators can easily handle it
packetzero

packetzero

06/18/2020, 3:49 PM
Nice! The new features are good. They do break existing functionality though. I would recommend having FLAG's to enable this new functionality? Typically, a schema would have new added columns rather than changing existing? Also, I think most servers flatten result structures, why not go with columns with a 'data_' prefix rather than adding the data object level?
3:50 PM
(researchers have asked me for direct access to the data fields in the past, so they will be happy)
3:59 PM
Disregard the part about flattening and using data_ prefix... doesn't work when tying to a fixed schema. If we had the event forwarding that bypassed event cache though...
a

alessandrogario

06/18/2020, 4:43 PM
Yeah, the problem there is that they are property bags written in different formats. Sometimes it's an array, sometimes a key-value
4:43 PM
I know it is a breaking change, but the old implementation had no tests and we had evidence that data was being silently lost
packetzero

packetzero

06/18/2020, 4:44 PM
yeah, I did a POC at one point, and used data_0, data_1, etc. when it was an array
a

alessandrogario

06/18/2020, 4:44 PM
there was no way to keep the compatibility because we would have to reproduce the parsing bugs
4:45 PM
and the reasoning i made was that if we have to break it let's break it with a new format that is stable and can be extended
s

seph

06/18/2020, 5:15 PM
sqlite has some json functions. You can get direct access to the json subfields, but it’s pretty ugly.
5:16 PM
I think json -> SIEM is easy, but also feels very different. It’s more osquery-as-log-agent. I think there’s a lot of power lurking around sql
5:16 PM
I’m pretty sure no one likes EAV. But it does handle all these cases 😉