# fleet
m
i think i have an easy one today... finally got my first denylisted query on a host... so denylisted = 1... after 24 hours will that turn back to 0 on its own or is there a way to force that "bit" to reset back to 0?
k
Hi @mason kemmerer! I haven't found a method of manually removing a query from the denylist; it will roll back to 0 on its own once it expires.
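In the meantime, a rough way to see which scheduled queries are currently on the denylist on a given host is something like this (just a sketch against osquery's scheduler table):
-- Sketch: list scheduled queries currently marked as denylisted on this host
SELECT name, denylisted, last_executed
FROM osquery_schedule
WHERE denylisted = 1;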
m
2 questions... are there any Fleet-documented ways to measure, or potentially warn admins, that a query is non-performant BEFORE the watchdog service denylists it? I love the Performance Impact column in the Fleet web GUI, but I wish it had a better mathematical explanation of what defines the thresholds: minimal, considerable, excessive. I also wish there was a way to evaluate this with live queries before they go live (scheduled) to all hosts. Since starting this fleet-osquery endeavor I have had this query running every 5 minutes:
SELECT name, query, interval, executions, last_executed, denylisted, output_size,
  IFNULL(system_time / executions, 0) AS avg_sys_time,
  IFNULL(user_time / executions, 0) AS avg_usr_time,
  IFNULL(wall_time / executions, 0) AS avg_wall_time,
  ROUND(average_memory * 1e-6, 2) AS avg_mem_mb  -- convert average_memory (bytes) to MB
FROM osquery_schedule;
When reviewing the results of the above query in Splunk, the logs seemed to show that the denylisted query's avg sys, usr, and wall times and memory usage were all zero leading up to the query being denylisted on the host, which was a huge bummer. FWIW, I have the watchdog service set to the default settings (200 MB or 10% CPU for 12 secs), which also makes me wonder if there's something I misunderstood... is that 10% of the total CPU on the system, of what's available at query execution time, or even of a single core? Just trying to make sense of where my miss was so I can adjust my osquery and Splunk dashboards accordingly.
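(Side note: I can at least confirm what watchdog limits the host is actually running with by asking osquery itself — a quick sketch, assuming the values are exposed through osquery_flags:)
-- Sketch: show the effective watchdog settings on this host
SELECT name, value
FROM osquery_flags
WHERE name LIKE 'watchdog%';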
k
The challenge there is that a query being deny-listed doesn't necessarily mean that it was non-performant.
It just means that something triggered the watchdog when it was running - it could have been this query, but it could have been something else entirely.
We've got a ticket together to document the logic behind the performance metrics... let me grab the TL;DR
> If the median host runtime for the query is up to 2 seconds, we consider the impact "Minimal". Up to 4 seconds would be "Considerable". More than that is "Excessive". If it hasn't actually run successfully, it's "Undetermined".
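If you want a rough per-host read on those buckets before a query goes out broadly, something like this against osquery_schedule gets you in the ballpark. It's only a sketch: Fleet derives the buckets from the median runtime across hosts, whereas this looks at one host's average, and it assumes wall_time is reported in seconds.
-- Sketch: approximate the impact buckets per host from osquery_schedule
SELECT name,
  IFNULL(wall_time / executions, 0) AS avg_wall_time,
  CASE
    WHEN executions = 0 THEN 'Undetermined'
    WHEN wall_time / executions <= 2 THEN 'Minimal'
    WHEN wall_time / executions <= 4 THEN 'Considerable'
    ELSE 'Excessive'
  END AS approx_impact
FROM osquery_schedule;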
We've also recently started including live query runs in the performance metrics. So if you save a query and run it a few times, you'll start to see stats.
m
All of that is super helpful (i didn't know that things outside of osquery could trigger the watchdog), and thanks for explaining the impact definitions. Ah, allowing live queries to determine performance before scheduling to all hosts!?! 🤌
is that something coming in a later update of fleet or available in v4.43.0+?
k
It isn't that things outside of osquery might trigger the watchdog, it's just that this particular query might not be the thing that did.
m
ah so would you say it's more the cumulative effect of all the queries running on the host, and the watchdog just happened to pick this query to denylist?
sorry for all the followup questions, just trying to validate my understanding
k
Totally valid questions.
The watchdog is just looking at the overall CPU and memory use of osquery, so it triggers based on the combined effect of everything osquery is doing at the time. When it triggers, all queries that were running at that moment are added to the denylist, just in case one of them was the cause.
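If you want visibility into the overall footprint the watchdog is reacting to, one option is a small scheduled query against osquery's own process (a rough sketch — it joins processes to osquery_info to find the process answering the query):
-- Sketch: track osquery's own memory and CPU time, the numbers the watchdog watches
SELECT p.pid, p.name, p.resident_size, p.user_time, p.system_time
FROM processes p
JOIN osquery_info i ON p.pid = i.pid;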
m
ah i guess in this particular case just that 1 query was denylisted, from what i saw
k
That's the most likely scenario, but it could also be something else that was happening in the background. It's interesting that the user time and system time were zero before the query was denylisted. What's the actual query?
m
it was....
select * from process_memory_map;
Daily frequency
historically the performance impact reported by fleet across all hosts was minimal... i think that host was particularly busy, and i was disappointed I couldn't detect this query was problematic prior to it being denylisted, since it didn't seem this query was particularly non-performant.
if anything, i think what i'm learning is that it's not always the query itself that causes a denylist... but perhaps the condition of the host it's running on at query runtime?