We send our osquery data to Firehose which then is crawled b osquery #general

Johan Edholm

02/20/2019, 10:43 AM

We send our osquery data to Firehose, which is then crawled by Glue, and then we query it via Athena. Osquery sends up JSON, but that JSON is a bit different for each osquery task. This makes Athena angry, since Glue will crawl things and create tables based on how it thinks things look. For example, the "columns" key in the JSON differs, so the generated struct won't work for all results osquery sends up, and Athena throws an error. We have a workaround at the moment where we just handle "columns" as a string, but then we of course can't search in it the same way. Do you have any better solutions for this? Am I missing something, or how do people usually do this?

seph

02/20/2019, 4:52 PM

You can run osquery in snapshot mode, then you won’t get diffs. But, you will get more data.

seph

02/20/2019, 4:54 PM

You can write software to take the diffs and reconstruct state from them. I wrote https://github.com/directionless/osquery-host-tracker at my last job.
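
A minimal sketch of the idea of rebuilding state from diffs, not the approach used in the linked project. It assumes each log record carries the "name", "action" ("added" or "removed") and "columns" keys of osquery's differential result log; the function name `apply_diffs` is made up for illustration.

```python
import json


def apply_diffs(records):
    """Replay differential records in order and return the current rows per query.

    The result maps each query name to a set of rows, where every row is the
    "columns" object serialized as a canonical JSON string so it can be hashed.
    """
    state = {}
    for record in records:
        rows = state.setdefault(record["name"], set())
        # sort_keys makes the same row serialize identically every time
        row = json.dumps(record.get("columns", {}), sort_keys=True)
        action = record.get("action")
        if action == "added":
            rows.add(row)
        elif action == "removed":
            rows.discard(row)
        # other actions (for example snapshots) are ignored in this sketch
    return state
```

Records have to be replayed in the order they were logged, otherwise a "removed" can be applied before its matching "added".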

seph

02/20/2019, 4:54 PM

I don’t know those tools, but you might be able to write something to take a diff and convert it into an add or remove.

Johan Edholm

02/20/2019, 6:29 PM

It's not those kinds of diffs though. Since there are different queries, different data will be returned (e.g. the osquery "heartbeat" query vs. `select * from crontab` will have different-looking JSON).

Johan Edholm

02/20/2019, 6:29 PM

But thanks for the tips! Good to know those things either way

seph

02/20/2019, 6:51 PM

Yes, results are returned in the schema of the query

Johan Edholm

02/20/2019, 6:55 PM

Indeed! And that makes Athena/Glue sad. I haven't seen a way to direct different schemas to different Firehose streams, or some other way of having them nicely in Athena. The workaround we have now is to convert the "columns" to a string (in the Glue crawler config), but that limits your ability to filter etc. when working in Athena.

seph

02/20/2019, 7:33 PM

I don’t really know Athena or Glue.

seph

02/20/2019, 7:33 PM

If this were SQL, I’d suggest sending different queries to different tables.
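
One way the "different queries to different tables" idea could look, as a rough sketch only. It assumes newline-delimited osquery result records with a "name" key identifying the query; the function names and the `osquery/<name>/` prefix layout are hypothetical, and how the grouped records get written out (and crawled as separate tables) is left open.

```python
import json
from collections import defaultdict


def split_by_query_name(lines):
    """Group newline-delimited osquery result records by their query name."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        groups[record.get("name", "unknown")].append(record)
    return dict(groups)


def table_prefix(query_name):
    """Build a hypothetical per-query storage prefix, one per future table."""
    safe = "".join(c if c.isalnum() else "_" for c in query_name.lower())
    return f"osquery/{safe}/"
```

Since every record under one prefix comes from the same query, the "columns" struct should have a single shape per table.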

seph

02/20/2019, 7:34 PM

Or using something that supports a wide, sparse schema.
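
A small sketch of what a wide sparse schema could mean here: every row gets the union of all column names seen across queries, with nulls where a query does not return that column. The `to_wide_rows` name and the `col_` prefix are assumptions for illustration, and the record keys again follow osquery's result log format.

```python
def to_wide_rows(records):
    """Flatten records so every row has the same (sparse) set of keys."""
    # Union of every column name that appears in any record
    all_columns = sorted({key for r in records for key in r.get("columns", {})})
    rows = []
    for r in records:
        columns = r.get("columns", {})
        row = {"name": r.get("name"), "action": r.get("action")}
        for key in all_columns:
            # Prefix avoids clashes with top-level keys such as "name";
            # columns this query does not return stay None
            row[f"col_{key}"] = columns.get(key)
        rows.append(row)
    return rows
```

The trade-off is a table that grows a new column for each new osquery column name, with most cells empty.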

Alejandro

03/10/2020, 4:24 PM

Hi @Johan Edholm we are looking at POCing a similar solution with Athena and Glue, I wonder if you managed to get any further with this? And any learnings on the way? Thanks
