Fleet YAML questions <thread> (osquery #fleet)


nyanshak

03/08/2021, 10:18 PM

Fleet YAML questions <thread>

nyanshak

03/08/2021, 10:22 PM

Is it possible to target specific labels (or hosts, whatever targeting options) for specific queries within a pack, or is targeting always done at the pack level? I noticed on the query spec there's a `support` block in this doc: https://github.com/fleetdm/fleet/blob/master/docs/1-Using-Fleet/2-fleetctl-CLI.md. Is that new? I've been using `platform: darwin` (or windows, etc.). Are there docs on what kinds of things can go in the `support` block? Looks like I can see `osquery`, `platforms` (list), and `launcher`. Might make sense to support some sort of per-query targeting (perhaps in the `support` block?).

nyanshak

03/08/2021, 10:24 PM

For context, I have this use case (and a similar one). I want to set up these two sets of macOS hosts:
• 'ESF' group: criteria: has macOS version >= 10.15.0 AND has osquery version >= $unreleasedOsqueryVersion
• 'process_events' group: criteria: darwin platform and not in ESF group
Then I want to target an `es_process_events` query at one group and the `process_events` query at the other.
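A minimal sketch of those two groups as Fleet label specs, in the `apiVersion: v1` / `kind: label` format that `fleetctl apply` reads; a host joins a label when the label query returns at least one row. The label names are hypothetical. Because the osquery release that ships `es_process_events` wasn't known yet, this sketch swaps the osquery-version check for a different technique: it checks whether the table is registered in `osquery_registry`. That says nothing about whether the host is actually configured to publish Endpoint Security events, which is a separate question. Label queries run on the host, so one label can't reference another label's membership; the second label therefore repeats the first label's condition and negates it.

```yaml
# Hypothetical label names; membership = the query returns at least one row.
apiVersion: v1
kind: label
spec:
  name: ESF hosts
  description: macOS 10.15+ with an osquery build that registers es_process_events
  platform: darwin
  query: |
    SELECT 1 FROM os_version
    WHERE platform = 'darwin'
      AND (major > 10 OR (major = 10 AND minor >= 15))
      AND EXISTS (
        SELECT 1 FROM osquery_registry
        WHERE registry = 'table' AND name = 'es_process_events'
      );
---
apiVersion: v1
kind: label
spec:
  name: process_events hosts
  description: macOS hosts that do not match the ESF hosts condition
  platform: darwin
  query: |
    SELECT 1 FROM os_version
    WHERE platform = 'darwin'
      AND NOT (
        (major > 10 OR (major = 10 AND minor >= 15))
        AND EXISTS (
          SELECT 1 FROM osquery_registry
          WHERE registry = 'table' AND name = 'es_process_events'
        )
      );
```

The `platform = 'darwin'` check is repeated inside the SQL so that, even if the label's `platform` field were ignored, Windows or Linux hosts could not fall into the negated second label.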

nyanshak

03/08/2021, 10:26 PM

I imagine it wouldn't be difficult to do this with labels, but it would be nice if I didn't have to split them into separate packs. 🤷‍♀️ Not a huge deal, ultimately, but it would be a nicer experience to be able to target in more ways.

Noah Talerman

03/09/2021, 10:45 PM

Hi @nyanshak. (This is unrelated to the specifics of this discussion.) Your feedback and discussion threads are much appreciated, thanks!

nyanshak

03/09/2021, 10:45 PM

🤗

Noah Talerman

03/09/2021, 10:45 PM

> is targeting always done at the pack level

Yes, targeting is always done at the pack level.

Noah Talerman

03/09/2021, 10:47 PM

The query spec you found actually doesn’t reflect the query configuration options available in Fleet. The `support` block isn’t supported.


Noah Talerman

03/09/2021, 10:48 PM

Kind of a whoops in the docs, so I’m submitting a PR now to have the `support` block removed.

Noah Talerman

03/09/2021, 10:50 PM

Our guess is that the vision for query configs in Fleet included the `support` block, but the option was never actually implemented. That being said, I’d like to understand what you’re trying to accomplish by proposing the ability to target at the query level.

nyanshak

03/09/2021, 10:54 PM

I'll give one example, but I could think up a good handful of them given a few minutes. Imagine I have some pack called `macOS monitoring`. In this pack, I have a query called `process_events`. In osquery 4.8.0 (🤞), there will be a new table called `es_process_events` that collects process events from the Endpoint Security framework. If I want to switch to `es_process_events`, I need two things:
1. macOS is updated everywhere to 10.15+
2. macOS osquery clients are at 4.8.0+ (or whatever version this gets released in)
Logically, this query still belongs in the `macOS monitoring` pack, but I don't want to run both queries on `macOS` hosts, just the query that's supported.

nyanshak

03/09/2021, 10:56 PM

Yes, I understand that I could make individual packs targeting labels that I make for just these two queries. There is a workaround, but IMO a better experience would be to support more targeting options. It probably falls into the 'nice-to-have' bucket rather than 'must-have'.

Noah Talerman

03/09/2021, 11:02 PM

Got it. So the current workaround becomes a pain because you’re creating new packs and labels just to run two different queries on two groups of hosts. With better targeting options you would no longer need to do this. Just adjust some condition on the query that tells it which hosts to run against.

nyanshak

03/09/2021, 11:03 PM

nods yeah, exactly

nyanshak

03/09/2021, 11:03 PM

and then additionally, you may need to adjust any alerts on the original query (since you'll need to move the original query into a new pack)

nyanshak

03/09/2021, 11:04 PM

whereas with better targeting, you really only need to set up new detection rules around the new query, but can leave the original one alone

Noah Talerman

03/09/2021, 11:17 PM

> you’ll need to move the original query into a new pack

Why is this? Would you need to move the original query into a new pack because the pack now only targets a subset of the original set of hosts? In your example, you can’t just leave `macOS monitoring` as is because it no longer targets all `macOS` machines (just a subset of them).

nyanshak

03/09/2021, 11:19 PM

Right so, in the example:

```yaml
---
# pack definition (schematic)
queries:
  - queryA
  - queryB
  - query...Etcetera
  - process_events
```

Imagine this is the original pack and it's targeted to macOS hosts (all macOS hosts)
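For comparison, a sketch of what that original pack could look like as a Fleet pack spec, assuming the built-in `macOS` label covers all macOS hosts. The query names mirror the placeholders in the schematic above and the intervals are arbitrary. The per-query `platform: darwin` key (the option mentioned at the top of the thread) only filters by OS; there is no per-query label or OS-version condition, so targeting happens once, under `targets`, for every query in the pack.

```yaml
apiVersion: v1
kind: pack
spec:
  name: macOS monitoring
  targets:
    labels:
      - macOS            # assumed built-in label; applies to every query below
  queries:
    - query: queryA      # placeholder query names from the schematic above
      name: queryA
      interval: 3600
    - query: queryB
      name: queryB
      interval: 3600
    - query: process_events
      name: process_events
      interval: 60       # arbitrary interval for illustration
      platform: darwin   # OS filter only; no OS-version or label condition
```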

nyanshak

03/09/2021, 11:19 PM

To do the targeting, I now have to make 2 separate packs (macOS process_events pack, and macOS es_process_events pack) and move the original query into the new pack

nyanshak

03/09/2021, 11:19 PM

So that I can target it correctly
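A sketch of that split, assuming two hypothetical labels named `ESF hosts` and `process_events hosts` already exist. `process_events` has to leave `macOS monitoring`, otherwise it keeps running on every macOS host, so three packs now carry what used to be one.

```yaml
# macOS monitoring keeps queryA, queryB, ... but no longer lists process_events.
---
apiVersion: v1
kind: pack
spec:
  name: macOS process_events
  targets:
    labels:
      - process_events hosts   # hypothetical label: macOS hosts without ESF support
  queries:
    - query: process_events
      name: process_events
      interval: 60             # arbitrary interval for illustration
---
apiVersion: v1
kind: pack
spec:
  name: macOS es_process_events
  targets:
    labels:
      - ESF hosts              # hypothetical label: macOS 10.15+ with es_process_events
  queries:
    - query: es_process_events
      name: es_process_events
      interval: 60
```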

nyanshak

03/09/2021, 11:20 PM

Does that make sense?

Noah Talerman

03/09/2021, 11:27 PM

I’m getting there, I think 🙂 Why is it not sufficient in this scenario to create only 1 new pack and leave the original query in the old pack, then alter the targeting so that one set of hosts is targeted by the old pack (old query) and a different set of hosts by the new pack (new query)?

nyanshak

03/09/2021, 11:27 PM

only one query (process_events) is affected by this issue


nyanshak

03/09/2021, 11:28 PM

the rest of the queries still want to be targeted at all macOS hosts

nyanshak

03/09/2021, 11:28 PM

if we alter the original pack to shrink the criteria, then all the other queries in that pack won't get run on the other set of hosts

Noah Talerman

03/09/2021, 11:33 PM

Aha right. What if the 1 new pack mirrored the old pack (has all the same queries) except process_events is swapped with es_process_events. Do we encounter the same issue?

nyanshak

03/09/2021, 11:42 PM

This does kind of work, but it presents some style / best-practice issues (totally subjective / my opinion). One way to arrange config is to just have everything in one large file. This has... obvious problems, in that it can be tricky to navigate / make sense of one giant YAML file. There are several (IMO) better options. For example, you can split it up like this:

```
pack_a/
  queries.yml
  pack.yml
pack_b/
  queries.yml
  pack.yml
```

This isn't exactly what I do, but it's similar and helpful for illustrating. In this example, the queries for a given pack are stored in that pack's directory, so it's easy to reference the list of queries for a pack.
---
In the 'just create a new identical pack' scenario, you can no longer manage the queries the same way, because applying the queries under one pack will overwrite the query definitions in another pack, and you also now have to maintain two copies of the pack. You could just keep a comment in pack_b that says "hey, this query is defined in pack_a". Basically, organizationally it could be problematic.
---
And also, if I have another query that needs specific targeting, then instead of two packs I need at minimum four packs. The next split results in eight, then sixteen, etc. Using packs to get around query targeting creates exponential growth in the number of packs used.
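A hypothetical sketch of what the two files in `pack_a/` might contain, shown as one YAML stream for brevity. Query specs are applied by name, so if `pack_b/queries.yml` also defined `process_events`, whichever file was applied last would win; that is the overwrite problem described above. The SQL is illustrative only.

```yaml
# pack_a/queries.yml: query specs, identified globally by name
apiVersion: v1
kind: query
spec:
  name: process_events
  description: Process executions from the process_events table
  query: SELECT pid, path, cmdline, time FROM process_events;
---
# pack_a/pack.yml: schedules the queries above by name
apiVersion: v1
kind: pack
spec:
  name: pack_a
  targets:
    labels:
      - macOS                  # assumed built-in label
  queries:
    - query: process_events    # must match the query spec's name
      name: process_events
      interval: 60
```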

nyanshak

03/09/2021, 11:44 PM

I'm not sure what the right answer is exactly, but it's been a persistent frustration.

Noah Talerman

03/09/2021, 11:46 PM

> you now have to maintain two copies of the pack

Right, and as you said, this definitely doesn’t scale as you try to add more targeting by query.

Noah Talerman

03/10/2021, 12:01 AM

> I’m not sure what the right answer is exactly, but it’s been a persistent frustration.

Totally understand. And my questioning / lack of understanding doesn’t help on the frustration front. Your explanations/examples are immensely helpful for my understanding. I too want to reach a good answer for this frustration.

The method for arranging configurations you explained is very interesting. It seems logical to tie a specific pack to its set of queries.

At a high level, the issue we’ve been discussing seems to stem from new information (the `es_process_events` table) only being available on hosts that meet specific criteria. The flexibility needed to acquire this new info (new query) while still acquiring the rest of the info (the other queries) isn’t supported well by pack-level targeting. The new information is tied to a specific query.

^This is kind of a jumble and I’m typing as I think. Generally, I’m now curious if there are other ways to support query-level flexibility without having targets at the query level.

Noah Talerman

03/23/2021, 3:52 PM

> Logically, this query still belongs in `macOS monitoring` pack, but I don’t want to run both queries on `macOS` hosts, just the query that’s supported.

Hi @nyanshak, I’m revisiting this discussion. Why is it undesirable to run both queries on `macOS` hosts? Apologies if the answer to this question was already discussed.

nyanshak

03/23/2021, 3:54 PM

Every query that's run introduces overhead on the host and, for a large fleet, network / storage costs. Each query is:
• going to take up extra CPU & memory to run
• going to take up network bandwidth to send the data
• going to use more disk space if logging to disk, etc.
These can be expensive queries, so we definitely wouldn't want to run both queries on the same host.


nyanshak

03/23/2021, 3:55 PM

in this case, it's roughly the same data, but we'd prefer one source over the other if it's supported on that host

Noah Talerman

03/23/2021, 3:59 PM

> we’d prefer one source over the other if it’s supported on that host

Got it. Am I right in thinking that the ultimate motivation for this is similar to the performance/cost motivation discussed in this thread?

nyanshak

03/23/2021, 4:03 PM

Yes, so to paint a picture:
Story 1: A person at the company has a previous-generation computer, maybe not as powerful as what is currently provided, but not upgraded yet. Osquery doing extra things on this machine may cause them to notice things running slowly, battery drain, etc.
Story 2: A user is tethered to their phone on a limited data plan. Osquery sending duplicate data is wasting their data plan.
Story 3: A service is running osquery. The extra overhead from osquery causes the service to have to use larger instance sizes for their auto-scaling group. This causes the overall running cost of the service to go up.

nyanshak

03/23/2021, 4:03 PM

^ Those are the things that are important to me when I'm thinking about osquery performance & overhead.

Noah Talerman

03/23/2021, 5:24 PM

Awesome. These stories are helpful for my understanding of the goals you’re trying to accomplish by reducing osquery overhead. Are you currently recording/estimating osquery overhead & performance on a per-device basis? If yes, how?

nyanshak

03/23/2021, 5:26 PM

https://dactiv.llc/blog/osquery-performance-at-scale/ - to some extent, yes. We're recording results from the `osquery_schedule` table, which shows execution stats for each scheduled query. We then make dashboards based on this and use them to help tune poorly performing queries.
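A minimal sketch of how those `osquery_schedule` results could be collected through Fleet: a query spec plus a pack that snapshots it on every host. The names and the hourly interval are hypothetical, and the column list is assumed from the osquery schema (older osquery releases called `denylisted` `blacklisted`), so check it against the osquery version you actually deploy.

```yaml
apiVersion: v1
kind: query
spec:
  name: osquery_schedule_stats      # hypothetical name
  description: Per-query execution stats for everything osquery has scheduled
  query: |
    SELECT name, interval, executions, wall_time, user_time, system_time,
           average_memory, output_size, denylisted
    FROM osquery_schedule;
---
apiVersion: v1
kind: pack
spec:
  name: osquery_performance         # hypothetical name
  targets:
    labels:
      - All Hosts                   # assumed built-in label
  queries:
    - query: osquery_schedule_stats
      name: osquery_schedule_stats
      interval: 3600                # hypothetical; hourly
      snapshot: true                # log the full result set each run, not diffs
```

The counters in this table accumulate while the osquery process runs, so dashboards built on it typically compare successive snapshots rather than reading single values.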

nyanshak

03/23/2021, 5:26 PM

It's extremely similar to how Zach describes it in the linked presentation

Noah Talerman

03/23/2021, 5:49 PM

Cool! It seems like you’ve landed on a method for recording and tuning performance on a per-query basis. I’m curious if it would be helpful to record/display osquery performance on a per-host basis. For example, Fleet reveals information on CPU and memory usage for osquery on a particular host. In addition to having data that shows improved query performance, you also have data that can display osquery performance on a host and verify that you’ve cut down osquery’s overhead.

nyanshak

03/23/2021, 5:53 PM

Yeah definitely could see that being useful. I have looked at individual host data when there's some sort of problem but it's more ad-hoc / don't have some dashboard pre-defined currently.

nyanshak

03/23/2021, 5:53 PM

Mostly looking at data across some specific aggregation of hosts.

Noah Talerman

03/23/2021, 6:02 PM

Got it. So in the past you’ve looked at individual host data across a specific set of hosts.

Noah Talerman

03/23/2021, 6:02 PM

Mostly in ad-hoc investigation situations. Less often as a benchmark for assessing and improving osquery overhead

nyanshak

03/23/2021, 6:03 PM

Scenarios I've mostly used it for:
• Across all hosts, what are the most expensive queries (by average/max memory, system CPU time, user CPU time, denylisted queries)?
• Across X group of hosts (workstations or servers, for example), same question as above?

nyanshak

03/23/2021, 6:04 PM

A less common scenario is when a user reports something like "osquery seems to be using quite a lot of $resource on X host, or Y set of hosts"

nyanshak

03/23/2021, 6:04 PM

And digging into the data for those specifically reported instances

nyanshak

03/23/2021, 6:04 PM

But it's less common as it would usually only be reported on hosts that are hitting some very extreme edge case

Noah Talerman

03/23/2021, 6:17 PM

These scenarios seem aimed at attempting to answer the question (I may be oversimplifying):
• Which queries can we tune to reduce osquery overhead?
Do your current dashboards (using the `osquery_schedule` table) allow you to answer this 2nd question:
• What’s the measurable amount we’ve reduced osquery overhead on a host?
Is the answer to the 2nd question even valuable?

nyanshak

03/23/2021, 6:21 PM

Yeah, I don't think it's a perfectly solved problem yet. There's certainly room for improvement. Osquery by itself (running no queries) has little to no overhead. So yes, the focus is typically on finding the most expensive (by some definition of expensive) queries. Definitions I've found useful:
• memory usage (both avg & max)
• system / user CPU time
• # of results (for queries that might be returning super-noisy results / things we should be tuning to not return so many results)
• denylisted queries
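Most of those definitions map onto `osquery_schedule` columns. A hypothetical ad-hoc query that ranks one host's scheduled queries by CPU time per execution (in the same units osquery uses for `user_time`/`system_time`), with average memory, output size, and denylist status alongside. The `executions > 0` filter avoids dividing by zero for queries that have never run. A max-memory column doesn't appear in the column set assumed here, so fleet-wide averages and maxima would come from aggregating the logged results elsewhere.

```yaml
apiVersion: v1
kind: query
spec:
  name: most_expensive_scheduled_queries   # hypothetical name
  description: Rank this host's scheduled queries by CPU time per run
  query: |
    SELECT name,
           executions,
           (user_time + system_time) * 1.0 / executions AS cpu_time_per_run,
           average_memory,
           output_size,
           denylisted
    FROM osquery_schedule
    WHERE executions > 0
    ORDER BY cpu_time_per_run DESC
    LIMIT 10;
```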

nyanshak

03/23/2021, 6:22 PM

> What’s the measurable amount we’ve reduced osquery overhead on a host?

In our scenarios, we're not looking at an individual host except in the problematic case. The measure we'd use would be graphing osquery resource usage on the host over some time period (collecting CPU / memory stats, for example) and comparing one version of the query against another.
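A sketch of that per-host measurement, assuming a hypothetical label scoped to the host(s) under investigation: join `processes` to `osquery_info` on the pid to snapshot osquery's own memory and CPU counters at a short interval, then graph them before and after swapping one version of a query for another. `osquery_info.pid` should be the process answering the query, so a watchdog parent process would not be included; the CPU columns are cumulative, so the graph would plot deltas between snapshots.

```yaml
apiVersion: v1
kind: query
spec:
  name: osquery_self_usage          # hypothetical name
  description: Memory and CPU counters for the running osquery process
  query: |
    SELECT p.pid, p.name, p.resident_size, p.user_time, p.system_time
    FROM processes AS p
    JOIN osquery_info AS i ON p.pid = i.pid;
---
apiVersion: v1
kind: pack
spec:
  name: osquery_self_usage_investigation   # hypothetical name
  targets:
    labels:
      - osquery investigation hosts        # hypothetical label for the reported host(s)
  queries:
    - query: osquery_self_usage
      name: osquery_self_usage
      interval: 300                 # hypothetical; every 5 minutes for a time series
      snapshot: true
```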

nyanshak

03/23/2021, 6:22 PM

But yeah, this is not super easy
