#fleet
nyanshak

03/08/2021, 10:18 PM
Fleet YAML questions <thread>
10:22 PM
Is it possible to target specific labels (or hosts, whatever targeting options) for specific queries within a pack, or is targeting always done at the pack level? I noticed on the query spec, there's a `support` block in this doc: https://github.com/fleetdm/fleet/blob/master/docs/1-Using-Fleet/2-fleetctl-CLI.md. Is that new? I've been using `platform: darwin` (or windows, etc). Are there docs on what type of things can be in the support block? Looks like I can see `osquery`, `platforms` (list), and `launcher`. Might make sense to support some sort of per-query targeting (perhaps in the `support` block?)
10:24 PM
For context, I have this use case (and a similar one): I want to set up these two sets of macOS hosts:
• 'ESF' group: criteria: has macOS version >= 10.15.0 AND has osquery version >= $unreleasedOsqueryVersion
• 'process_events' group: criteria: darwin platform and not in ESF group
Then I want to target an es_process_events query to one group and the process_events query to the other.
10:26 PM
I imagine it wouldn't be difficult to do this with labels, but it would be nice if I didn't have to split them into separate packs. 🤷‍♀️ Not a huge deal, ultimately, but nicer experience to be able to target in more ways.
Noah Talerman

03/09/2021, 10:45 PM
Hi @nyanshak . (This is unrelated to the specifics of this discussion) Your feedback and discussion threads are much appreciated :thanks:
nyanshak

03/09/2021, 10:45 PM
🤗
Noah Talerman

03/09/2021, 10:45 PM
> is targeting always done at the pack level
Yes, targeting is always done at the pack level.
10:47 PM
The query spec you found actually doesn’t reflect the query configuration options available in Fleet. The `support` block isn’t supported.
10:48 PM
Kind of a whoops in the docs, so I’m submitting a PR now to have the `support` block removed.
10:50 PM
Our guess is that the vision for query configs in Fleet included the `support` block but the option was never actually implemented. That being said, I’d like to understand what you’re trying to accomplish by proposing the ability to target at the query level.
nyanshak

03/09/2021, 10:54 PM
I'll give one example, but I could think up a good handful of them given a few minutes. Imagine I have some pack called `macOS monitoring`. In this pack, I have a query called `process_events`. In osquery 4.8.0 (🤞), there will be a new table called `es_process_events` that collects process events from the Endpoint Security framework. If I want to switch to `es_process_events`, I need two things:
1. macOS is updated everywhere to 10.15+
2. macOS osquery clients are at 4.8.0+ (or whatever the version where this gets released).
Logically, this query still belongs in the `macOS monitoring` pack, but I don't want to run both queries on `macOS` hosts, just the query that's supported.
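A minimal sketch of the either/or choice described above, written as plain Python rather than Fleet configuration. The function and constant names are hypothetical, the 4.8.0 threshold is only the hoped-for release mentioned in the message (not a confirmed one), and version strings are assumed to be plain dotted numbers.

```python
from typing import Optional, Tuple

# Assumption: 4.8.0 is the hoped-for release named in the thread, not a confirmed one.
MIN_OSQUERY = "4.8.0"
MIN_MACOS = "10.15.0"


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '10.15' or '10.15.7' into a 3-part tuple so versions compare numerically."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (3 - len(parts))  # pad '10.15' to (10, 15, 0)
    return tuple(parts[:3])


def process_events_source(platform: str, os_version: str, osquery_version: str) -> Optional[str]:
    """Return the single table a host should be queried on, or None for non-macOS hosts."""
    if platform != "darwin":
        return None
    esf_capable = (
        parse_version(os_version) >= parse_version(MIN_MACOS)
        and parse_version(osquery_version) >= parse_version(MIN_OSQUERY)
    )
    # Each macOS host lands in exactly one group, so only one of the two queries runs on it.
    return "es_process_events" if esf_capable else "process_events"


if __name__ == "__main__":
    print(process_events_source("darwin", "11.2.3", "4.8.0"))
    print(process_events_source("darwin", "10.14.6", "4.8.0"))
```

In Fleet itself this decision would have to be expressed as label membership, since targeting is done at the pack level.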
10:56 PM
Yes, I understand that I could make individual packs targeting labels that I make for just these two queries. There is a workaround, but IMO better experience is to support more targeting options. It probably falls into the 'nice-to-have' bucket rather than 'must-have'.
Noah Talerman

03/09/2021, 11:02 PM
Got it. So the current workaround becomes a pain because you’re creating new packs and labels just to run two different queries on two groups of hosts. With better targeting options you would no longer need to do this. Just adjust some condition on the query that tells it which hosts to run against.
nyanshak

03/09/2021, 11:03 PM
:nods: yeah, exactly
11:03 PM
and then additionally, you may need to adjust any alerts on the original query (since you'll need to move the original query into a new pack)
11:04 PM
whereas with better targeting, you really only need to set up new detection rules around the new query, but can leave the original one alone
Noah Talerman

03/09/2021, 11:17 PM
> you’ll need to move the original query into a new pack
Why is this? Would you need to move the original query into a new pack because now the pack only targets a subset of the original set of hosts? In your example you can’t just leave `macOS monitoring` as is because it no longer targets all `macOS` machines (instead a subset of them).
nyanshak

03/09/2021, 11:19 PM
Right so, in the example:
---
pack definition
queries:
  - queryA
  - queryB
  - query...Etcetera
  - process_events
Imagine this is the original pack and it's targeted to macOS hosts (all macOS hosts)
11:19 PM
To do the targeting, I now have to make 2 separate packs (macOS process_events pack, and macOS es_process_events pack) and move the original query into the new pack
11:19 PM
So that I can target it correctly
11:20 PM
Does that make sense?
Noah Talerman

03/09/2021, 11:27 PM
I’m getting there, I think 🙂 Why is it not sufficient in this scenario to create only 1 new pack and leave the original query in the old pack? You would then alter the targeting so that one set of hosts is targeted by the old pack (old query) and a different set of hosts is targeted by the new pack (new query).
nyanshak

03/09/2021, 11:27 PM
only one query (process_events) is affected by this issue
11:28 PM
the rest of the queries still want to be targeted at all macOS hosts
11:28 PM
if we alter the original pack to shrink the criteria, then all the other queries in that pack won't get run on the other set of hosts
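A rough simulation of the point above, assuming a simplified model in which each pack targets exactly one label (real Fleet packs can target more). The label names, pack contents, and `queries_for_host` helper are all hypothetical.

```python
def queries_for_host(host_labels, packs):
    """Pack-level targeting: a host runs every query of every pack whose label it carries."""
    scheduled = set()
    for pack in packs:
        if pack["target_label"] in host_labels:
            scheduled.update(pack["queries"])
    return scheduled


OTHER_QUERIES = ["queryA", "queryB"]

# Option 1: shrink the original pack to non-ESF hosts and add one small pack for ESF hosts.
shrunk_packs = [
    {"target_label": "process_events hosts", "queries": OTHER_QUERIES + ["process_events"]},
    {"target_label": "ESF hosts", "queries": ["es_process_events"]},
]

# Option 2: keep the original pack on all macOS hosts and move process_events out of it.
three_packs = [
    {"target_label": "All macOS", "queries": OTHER_QUERIES},
    {"target_label": "process_events hosts", "queries": ["process_events"]},
    {"target_label": "ESF hosts", "queries": ["es_process_events"]},
]

esf_host = {"All macOS", "ESF hosts"}

if __name__ == "__main__":
    # With option 1 the ESF host loses queryA and queryB.
    print(sorted(queries_for_host(esf_host, shrunk_packs)))
    # With option 2 it keeps them, at the cost of moving process_events to a new pack.
    print(sorted(queries_for_host(esf_host, three_packs)))
```

Option 2 is the workaround described in the thread: the other queries keep running everywhere, but the original query has to move.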
Noah Talerman

03/09/2021, 11:33 PM
Aha right. What if the 1 new pack mirrored the old pack (has all the same queries) except process_events is swapped with es_process_events. Do we encounter the same issue?
nyanshak

03/09/2021, 11:42 PM
This does kind of work, but it presents some style / best practice issues (totally subjective / my opinion). One way to arrange config is to just have everything in one large file. This has... obvious problems, in that it can be tricky to navigate / make sense of one giant yaml file. There are several (IMO) better options. For example, you can split it into something like:
pack_a/
  queries.yml
  pack.yml
pack_b/
  queries.yml
  pack.yml
This isn't exactly what I do but it's similar and helpful for illustrating. In this example, the queries for a given pack are stored in the directory for a pack, so it's easy to reference the list of queries for a pack.
In the 'just create a new identical pack' scenario, you can no longer manage the queries the same way, because applying the queries under one pack will overwrite the query definitions in another pack, and also you now have to maintain two copies of the pack. You could just keep a comment in pack_b that says "hey, this query is defined in pack_a". Basically, organizationally it could be problematic.
And also, if I have another query that needs targeting specifically, now instead of two packs, I need at minimum four packs. Next split results in eight, then sixteen, etc. Using packs to get around query targeting will create an exponential growth in number of packs used.
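The pack-count growth described above can be sketched as a small enumeration. This assumes every either/or query pair forces a full mirrored copy of the pack; `mirrored_packs` and the second and third query pairs are hypothetical.

```python
from itertools import product


def mirrored_packs(base_queries, alternatives):
    """List every pack needed when each either/or query pair forces a full copy of the pack."""
    packs = []
    for choice in product(*alternatives):
        # One mirrored pack per combination of "old query or new query" decisions.
        packs.append(list(base_queries) + list(choice))
    return packs


base = ["queryA", "queryB"]
pairs = [("process_events", "es_process_events")]

if __name__ == "__main__":
    print(len(mirrored_packs(base, pairs)))  # 2 packs for one either/or pair
    pairs.append(("hypothetical_old_1", "hypothetical_new_1"))
    print(len(mirrored_packs(base, pairs)))  # 4 packs for two pairs
    pairs.append(("hypothetical_old_2", "hypothetical_new_2"))
    print(len(mirrored_packs(base, pairs)))  # 8 packs for three pairs
```

With n independently targeted queries the count is 2^n, which matches the two, four, eight, sixteen progression in the message.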
11:44 PM
I'm not sure what the right answer is exactly, but it's been a persistent frustration.
Noah Talerman

03/09/2021, 11:46 PM
> you now have to maintain two copies of the pack
Right and as you said this definitely doesn’t scale as you try to add more targeting by query.
12:01 AM
> I’m not sure what the right answer is exactly, but it’s been a persistent frustration.
Totally understand. And my questioning / lack of understanding doesn’t help on the frustration front. Your explanations/examples are immensely helpful for my understanding. I too want to reach a good answer for this frustration.
The method for arranging configurations you explained is very interesting. It seems logical to tie a specific pack to its set of queries.
At a high level, the issue we’ve been discussing seems to stem from new information (the es_process_events table) being available only on hosts that meet specific criteria. The flexibility necessary to acquire this new info (new query) while still acquiring the rest of the info (the other queries) isn’t supported well by pack-level targeting. The new information is tied to a specific query.
^This is kind of a jumble and I’m typing as I think. Generally, I’m now curious if there are other ways to support the query level of flexibility without having targets at the query level.
3:52 PM
> Logically, this query still belongs in `macOS monitoring` pack, but I don’t want to run both queries on `macOS` hosts, just the query that’s supported.
Hi @nyanshak I’m revisiting this discussion. Why is it undesirable to run both queries on `macOS` hosts? Apologies if the answer to this question was already discussed.
nyanshak

03/23/2021, 3:54 PM
Every query that's run introduces overhead on the host and, for a large fleet, network / storage costs. Each query is going to take up extra CPU & memory to run, take up network bandwidth to send the data, use more disk space if logging to disk, etc. These can be expensive queries, so we definitely wouldn't want to run both queries on the same host.
3:55 PM
in this case, it's roughly the same data, but we'd prefer one source over the other if it's supported on that host
Noah Talerman

03/23/2021, 3:59 PM
> we’d prefer one source over the other if it’s supported on that host
Got it. Am I right when thinking that the ultimate motivation for this is similar to the performance/cost motivation discussed in this thread?
nyanshak

03/23/2021, 4:03 PM
Yes, so to paint a picture:
• Story 1: A person at the company has a previous-generation computer, maybe not as powerful as what is currently provided, but not upgraded yet. Osquery doing extra things on this machine may cause them to notice things running slowly, battery drain, etc.
• Story 2: A user is tethered to their phone on a limited data plan. Osquery sending duplicate data is wasting their data plan.
• Story 3: A service is running osquery. The extra overhead from osquery causes the service to have to use larger instance sizes for their auto-scaling group. This causes the overall running cost of the service to go up.
4:03 PM
^ Those are the things that are important to me when I'm thinking about osquery performance & overhead.
Noah Talerman

03/23/2021, 5:24 PM
Awesome. These stories help me understand the goals you’re trying to accomplish by reducing osquery overhead. Are you currently recording/estimating osquery overhead & performance on a per-device basis? If yes, how?
nyanshak

03/23/2021, 5:26 PM
https://dactiv.llc/blog/osquery-performance-at-scale/ - to some extent, yes. We're recording results from the `osquery_schedule` table, which shows stats for each scheduled query execution. We then make dashboards based on this and use it to help tune poorly-performing queries.
5:26 PM
It's extremely similar to how Zach describes it in the linked presentation
Noah Talerman

03/23/2021, 5:49 PM
Cool! It seems like you’ve landed on a method for recording and tuning performance on a per query basis. I’m curious if it would be helpful to record/display osquery performance on a per host basis. For example, Fleet reveals information on CPU and memory usage for osquery on a particular host. In addition to having data that shows improved query performance, you also have data that can display osquery performance on a host and verify that you’ve cut down osquery’s overhead.
nyanshak

03/23/2021, 5:53 PM
Yeah definitely could see that being useful. I have looked at individual host data when there's some sort of problem but it's more ad-hoc / don't have some dashboard pre-defined currently.
5:53 PM
Mostly looking at data across some specific aggregation of hosts.
Noah Talerman

03/23/2021, 6:02 PM
Got it. So in the past you’ve looked at individual host data across a specific set of hosts.
6:02 PM
Mostly in ad-hoc investigation situations. Less often as a benchmark for assessing and improving osquery overhead
nyanshak

03/23/2021, 6:03 PM
Scenarios where I've mostly used it:
• Across all hosts, what are the most expensive queries (by average/max memory, system cpu time, user cpu time, denylisted queries)?
• Across X group of hosts (workstations, servers, for example), same question as above?
6:04 PM
A less common scenario is when a user reports something like "osquery seems to be using quite a lot of $resource on X host, or Y set of hosts"
6:04 PM
And digging into the data for those specifically reported instances
6:04 PM
But it's less common as it would usually only be reported on hosts that are hitting some very extreme edge case
Noah Talerman

03/23/2021, 6:17 PM
These scenarios seem aimed at answering the question (I may be oversimplifying):
• Which queries can we tune to reduce osquery overhead?
Do your current dashboards (using the `osquery_schedule` table) allow you to answer this 2nd question:
• What’s the measurable amount we’ve reduced osquery overhead on a host?
Is the answer to the 2nd question even valuable?
nyanshak

03/23/2021, 6:21 PM
Yeah I don't think it's a perfectly-solved problem yet. There's certainly room for improvement. Osquery by itself (running no queries) has little to no overhead. So yes, the focus is typically on finding the most expensive (by some definition of expensive) queries. Definitions I've found useful:
• memory usage (both avg & max)
• system / user cpu time
• # of results (for queries that might be returning super-noisy results / things we should be tuning to not return so many results)
• denylisted queries
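One way the "most expensive queries" ranking could be computed from collected `osquery_schedule` rows is sketched below. This is not the dashboard setup described in the thread: `rank_queries` is a hypothetical helper, the sample numbers are made up, and the column names (`name`, `average_memory`, `user_time`, `system_time`) reflect the `osquery_schedule` schema as I understand it, so check them against your osquery version.

```python
from collections import defaultdict


def rank_queries(rows, metric, top=3):
    """Average one osquery_schedule column per query name across hosts, highest first."""
    totals = defaultdict(lambda: [0.0, 0])  # name -> [running sum, row count]
    for row in rows:
        entry = totals[row["name"]]
        entry[0] += float(row[metric])  # osquery results arrive as strings
        entry[1] += 1
    averages = {name: total / count for name, (total, count) in totals.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:top]


# Made-up sample rows, one per (host, scheduled query), for illustration only.
sample_rows = [
    {"name": "process_events", "average_memory": "4000", "user_time": "120", "system_time": "40"},
    {"name": "process_events", "average_memory": "6000", "user_time": "180", "system_time": "60"},
    {"name": "queryA", "average_memory": "1000", "user_time": "10", "system_time": "5"},
]

if __name__ == "__main__":
    for name, value in rank_queries(sample_rows, "average_memory"):
        print(name, value)
```

Filtering `rows` to one group of hosts before calling the function gives the per-group view (workstations, servers, and so on).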
6:22 PM
> What’s the measurable amount we’ve reduced osquery overhead on a host?
In our scenarios, we're not looking at an individual host except in the problematic case. And the measure we'd use would be graphing osquery resource usage on the host over some time period to collect cpu / memory stats, for example, and comparing with one version of the query vs another version.
6:22 PM
But yeah, this is not super easy