Heads up it looks we re waiting a little longer for 4 4 0 du osquery #core

Heads up it looks we’re waiting a little longer fo...

theopolis

06/15/2020, 11:38 PM

Heads up it looks we’re waiting a little longer for 4.4.0 due to wanting for https://github.com/osquery/osquery/pull/6280 to be solid. I fully support taking our time on that.

Mike Myers

06/16/2020, 5:42 PM

Alessandro should have some time tomorrow to give that attention, he said

alessandrogario

06/18/2020, 1:09 PM

We have finished with the second pass on the PR, and here's a summary of what we have changed: 1. The 'keywords' column has been changed from BIGINT to TEXT. This is because SQLite does not support std::uint64_t and Windows events sometimes return values that can't be represented with an std::int64_t 2. The tests have been changed: we have exported several different event types from different event channels. Those samples are used to compare the output we get with the output we expect 3. The

data

column has been slightly changed; instead of adding the fields directly inside a single object (example:

{ "key": "value"}

), we now have a root "Data" object (example:

{ "Data": { "key": "value" } }

). This allows us to extend this with additional data from the Windows Event Log XML without breaking compatibility

seph

06/18/2020, 2:21 PM

Ha. I have also been thwarted by sqlite’s lack of

uint64_t

😞 1

seph

06/18/2020, 2:36 PM

I don’t understand the justification on changing the data column. Not that I have a strong preference, but the justification seems weird.

seph

06/18/2020, 2:36 PM

It’s a json hash, what’s keeping you from extending it regardless of nesting level?

alessandrogario

06/18/2020, 2:37 PM

so the XML event has an EventData object, which is just a small part

seph

06/18/2020, 2:37 PM

I guess having

{data: {...}}

means you can add

{metadata: {}}

or something

alessandrogario

06/18/2020, 2:37 PM

and inside EventData, we have Data tags + other things

alessandrogario

06/18/2020, 2:37 PM

right now, we only have Data fields (like pre-refactor)

seph

06/18/2020, 2:37 PM

Cool. Makes sense. Thanks.

alessandrogario

06/18/2020, 2:38 PM

if we decide we want to add additional stuff the data column, it's easier if we keep things separat

alessandrogario

06/18/2020, 2:38 PM

what do you think? did my reasoning make sense?

alessandrogario

06/18/2020, 2:38 PM

I am not sure if we want to add new stuff right now, i was just trying to keep the door open 🤔

seph

06/18/2020, 2:38 PM

Yes, makes sense. You’re not adding new data fields, you’re adding something next to data.

seph

06/18/2020, 2:39 PM

For the

launcher

tables, I’ve struggled a bit with this. For runtim e variable/unknown data fields, there’s no good translation to columns. I don’t like dealing with

json

data, it’s hard to merge/sort/join on it. I usually end up with an EAV model.

seph

06/18/2020, 2:40 PM

It’s a bit weird to with, but it becomes really powerful in handling arbitrary data structures

alessandrogario

06/18/2020, 2:45 PM

Yeah, I also thought having JSON in colums to be weird

alessandrogario

06/18/2020, 2:45 PM

usually everyone tells me they love json in there because their log aggregators can easily handle it

packetzero

06/18/2020, 3:49 PM

Nice! The new features are good. They do break existing functionality though. I would recommend having FLAG's to enable this new functionality? Typically, a schema would have new added columns rather than changing existing? Also, I think most servers flatten result structures, why not go with columns with a 'data_' prefix rather than adding the data object level?

packetzero

06/18/2020, 3:50 PM

(researchers have asked me for direct access to the data fields in the past, so they will be happy)

packetzero

06/18/2020, 3:59 PM

Disregard the part about flattening and using data_ prefix... doesn't work when tying to a fixed schema. If we had the event forwarding that bypassed event cache though...

alessandrogario

06/18/2020, 4:43 PM

Yeah, the problem there is that they are property bags written in different formats. Sometimes it's an array, sometimes a key-value

alessandrogario

06/18/2020, 4:43 PM

I know it is a breaking change, but the old implementation had no tests and we had evidence that data was being silently lost

packetzero

06/18/2020, 4:44 PM

yeah, I did a POC at one point, and used data_0, data_1, etc. when it was an array

alessandrogario

06/18/2020, 4:44 PM

there was no way to keep the compatibility because we would have to reproduce the parsing bugs

alessandrogario

06/18/2020, 4:45 PM

and the reasoning i made was that if we have to break it let's break it with a new format that is stable and can be extended

seph

06/18/2020, 5:15 PM

sqlite has some json functions. You can get direct access to the json subfields, but it’s pretty ugly.

seph

06/18/2020, 5:16 PM

I think json -> SIEM is easy, but also feels very different. It’s more osquery-as-log-agent. I think there’s a lot of power lurking around sql

seph

06/18/2020, 5:16 PM

I’m pretty sure no one likes EAV. But it does handle all these cases 😉

2 Views

Open in Slack

Previous Next