# kolide
n
In `file-format.md` docs:
# "additional" information to collect from hosts along with the host
# details. This information will be updated at the same time as other host
# details and is returned by the API when host objects are returned. Users
# must take care to keep the data returned by these queries small in
# order to mitigate potential performance impacts on the Fleet server.
additional_queries:
  time: select * from time
  macs: select mac from interface_details
1. What is meant by "small" here? 2. How frequently do these queries get run by the hosts? Is there a way to specify intervals / reduce the intervals?
The reason I'm asking... I thought it might be useful to do something like this:
additional_queries:
  rpm_packages: SELECT name, version, ... FROM rpm_packages;
And then be able to look at, say, installed packages for a given host.
But I get the sense from the note of caution in the docs that this might be dangerous / a bad idea... So I'm wondering if it really is a bad idea to do so, and if it is, whether there's a better way to do this sort of thing in Fleet.
Edit: from reading code, I think the default interval is 1 hour. So I think I answered (2). And that this could be reduced further.
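(If I'm reading the config code right, something like this server config fragment would raise the interval; the exact setting name here is my guess from the code, not confirmed against the docs:)
Copy code
# Hypothetical Fleet server config sketch: additional queries appear to
# refresh on the same schedule as the other host details, so raising this
# would trade freshness for less write load on the hosts table.
osquery:
  detail_update_interval: 1h  # apparent default; could be raised to e.g. 24h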
s
That data is added to the host's row in the db. So there will be performance impacts.
n
nods yeah that makes sense. I'm wondering if maybe it would have less impact to separate them into their own table / API endpoint / allow those queries to be run less frequently (say, once per day or something) 🤷 Or just in general if there's a good way to collect this data / expose it in a consumable way in fleet, but without being prohibitively expensive in terms of perf impact.
s
I mean, you can read the PR; I think we talked about this there 🙂
n
this one? https://github.com/kolide/fleet/pull/2236 I've read it, just trying to get some advice about this use case. I think what I'm hearing is I need to be careful and do my own testing. Which, to be fair, I would do my own testing before rolling this out to a large group of hosts eventually 🤷
s
I don’t know fleet super well, but can you do this by adding a pack?
n
Yes, but then I'm having to go look through Splunk (in my env) and try to just get the last results for a host. And some hosts might be offline for quite a while, which means the search might need to go back a long time. If this was in fleet, I'd get it "for free"
I'm already collecting in a pack actually, just considering having it be here instead for better usability
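For comparison, the pack version I'm running now looks roughly like this (standard osquery pack format; the daily snapshot interval is the part I'd want Fleet's additional_queries to match):
Copy code
{
  "queries": {
    "rpm_packages": {
      "query": "SELECT name, version, release, arch FROM rpm_packages;",
      "interval": 86400,
      "snapshot": true,
      "description": "Daily snapshot of installed RPM packages"
    }
  }
}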
s
Yes, well…
I mean… I suspect fleet doesn’t handle this well. It’s one of the big differences between the SaaS and fleet. The SaaS has a lot more around the presentation of this sort of data
z
I would advise experimenting to see whether your DB can handle updates of such large data.
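e.g., assuming the results land in an `additional` column on the `hosts` table (table and column names are my guesses, not verified against the schema), something like this would show how much data each detail update rewrites per host:
Copy code
-- Hypothetical sanity check against Fleet's MySQL database:
-- estimates per-host payload size of the additional query results.
SELECT AVG(LENGTH(additional)) AS avg_bytes,
       MAX(LENGTH(additional)) AS max_bytes
FROM hosts;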