<@UNNGYHBE1> a few resources out there: <https://...
# general
z
Oh this is exactly what I was looking for thank you!
Is there a way to get the profiling tools without having to build osquery from source?
f
@Zach Zeid There is always the osqueryi `.timer ON` mode, which will tell you how long a given query took to run
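For reference, a session with timing enabled might look something like this (illustrative; `.timer on` is one of the sqlite-shell meta-commands osqueryi inherits, and the exact timing output format can vary by version):

```
$ osqueryi
osquery> .timer on
osquery> SELECT count(*) FROM processes;
...query results...
Run Time: real ... user ... sys ...
```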
z
that's something I use, but I want to doubly make sure queries don't affect CPU/memory before rolling out
z
IIRC you can point the profiler to an osqueryi binary? Or you can drop the binary where the profiler expects?
z
I don't understand? The profiler isn't there when installed through the package manager (yum).
So I'd have to find the profiler binary and drop it on the instance itself?
z
The profiler is a python script in the osquery repository https://github.com/osquery/osquery/blob/master/tools/analysis/profile.py
z
May I ask why it's not a part of the osquery package?
also facepalm I can just get `tools` onto my test instance
z
Good question... Maybe it should be.
z
s
I don’t think that script is polished enough to ship
z
That's fair.
s
A snippet from my response on github:
I think there’s also a fundamental issue where the profiler can only guess at likely CPU and memory consumption. Much depends on your fleet specifics. Things like the shard parameter, or creating test groups of hosts, are going to be a more real-world way to handle this.
z
I see, how does `shard` work if it's distributed among 100s of servers, each with a config of `shard: 1`?
z
~1% of those servers will schedule the query
z
Right. If that's in `osquery.conf`, how do the servers know they're a part of that 1%?
s
Standard distributed systems trickery. From the docs:
> The shard key works by hashing the hostname then taking the quotient 255 of the first byte. This allows us to select a deterministic ‘preview’ for the query, this helps when slow-rolling or testing new queries.
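The scheme the docs describe can be sketched roughly like this (a minimal illustration, not osquery's exact implementation; the hash function and the `<=` comparison here are assumptions):

```python
import hashlib

def shard_value(hostname: str) -> int:
    # Hash the hostname, take the first byte of the digest (0-255),
    # and scale it into a 0-100 range, per the scheme quoted above.
    # SHA-1 is an assumption; the real implementation may differ.
    first_byte = hashlib.sha1(hostname.encode()).digest()[0]
    return first_byte * 100 // 255

def schedules_query(hostname: str, shard: int) -> bool:
    # A host schedules the query when its deterministic value falls
    # at or below the configured shard percentage.
    return shard_value(hostname) <= shard

# With shard: 1, roughly 1% of a large fleet lands in the preview group,
# and the same hosts are selected on every run.
hosts = [f"host-{i}.example.com" for i in range(1000)]
selected = [h for h in hosts if schedules_query(h, 1)]
print(f"{len(selected)} of {len(hosts)} hosts would schedule the query")
```

Because the value is derived from the hostname rather than a random draw, every server evaluates the same config independently and arrives at the same answer, which is how the fleet "knows" its 1% without any coordination.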
z
IIRC they use a hash of the UUID or something like that. It looks random, but it's a deterministic process.
z
I got it, thanks