:wave: Questions about Performance Impact - in th...
# fleet
l
👋 Questions about Performance Impact
• In the Fleet web UI (screenshot), most of my queries show up as Undetermined. After I run a live query from the UI, they show a value (like Minimal or Excessive). May I know if that's intentional?
• What's the threshold for Minimal/Excessive performance impact? (e.g. how much CPU/memory usage is considered Excessive?)
Thanks in advance!
r
@Lichao Li Have these queries had a chance to run yet? The performance impact is reported with query results, so you won't see any data until after the query runs (either as a live query or once the scheduled interval has passed).
l
Have these queries had a chance to run yet?
I checked the UI again, and all the queries with 'Frequency' set have performance impact values. Otherwise I need to trigger a live query for a value to appear. Looks good.
Thanks for the doc link. It looks like Fleet's performance impact is determined by Duration (the time taken for a query to execute), which explains why, when Fleet reported a query as 'Minimal', osquery's profile.py reported the same query as 'highest impact'. Will you consider adding CPU/fd/memory usage to the performance impact calculation?
k
If that data were available from osquery, I could see Fleet pulling it in to the impact calculations. As it is now, the time is all that's recorded.
l
If that data were available from osquery
AFAIK that data is not available in the osquery db. You need to download the script, then run it against an osquery.conf (with the queries).
On the same topic (of CPU performance safety), I found 3 columns in the `osquery_schedule` table which could be relevant:
• `system_time`
• `user_time`
• `wall_time_ms`
`(system_time + user_time) * 100.0 / wall_time_ms` provides the average CPU utilization (%) of each query. Reference: how `top` calculates `%CPU`: https://man7.org/linux/man-pages/man1/top.1.html
%CPU  --  CPU Usage
           The task's share of the elapsed CPU time since the last screen
           update, expressed as a percentage of total CPU time.
Some caveats from a sample live query of `osquery_schedule` that I captured:
• Sometimes `wall_time_ms` is zero. It's good to check for nonzero before division.
• Sometimes stime and utime are 10ms each, and wall time is 1ms. The CPU utilization shows as 2000%, but those queries are fast and performant:
  ◦ When wall time (elapsed time) is smaller than stime/utime, and since we know `osqueryd` runs on a single core, we can probably ignore such cases (e.g. change the result to 0).
Let me know if I missed anything 🙏
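A minimal Python sketch of the calculation and the two caveats above, assuming `system_time`, `user_time`, and `wall_time_ms` are all reported in milliseconds. The function name `cpu_utilization_pct` is hypothetical and not part of osquery or Fleet:

```python
from typing import Optional


def cpu_utilization_pct(
    system_time: float, user_time: float, wall_time_ms: float
) -> Optional[float]:
    """Estimate a query's average CPU utilization (%) from osquery_schedule columns.

    Assumption: all three inputs are in milliseconds.
    """
    # Caveat 1: wall_time_ms is sometimes zero, so check before dividing.
    if wall_time_ms <= 0:
        return None

    cpu_ms = system_time + user_time

    # Caveat 2: when CPU time exceeds elapsed time (e.g. 10ms + 10ms vs 1ms),
    # the ratio is not meaningful for a fast query, so report 0 instead.
    if cpu_ms > wall_time_ms:
        return 0.0

    return cpu_ms * 100.0 / wall_time_ms
```

Returning `None` for a zero wall time keeps "no data" distinct from "0% CPU"; whether to clamp the second case to 0 or simply skip those rows is a judgment call.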
u
@Lichao Li (osquery) This would make an interesting and worthwhile feature request! If you'd like to submit one, the requests go straight to our Product team for review!