# general
d
Hello all, I'm wondering if anything is logged regarding the specific metrics tripped when a query is denylisted. We wrote some intensive queries (hashing files recursively) specifically to get them denylisted so that we could make sure we were catching those in our Splunk env. Now that we have queries getting denied, I want to understand specifically why they were denied, and if that is logged somewhere, that's going to be the quickest way to find out. Or is the recommended way to do this to run the profiling script?
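For reference, the queries we scheduled look roughly like this (the path here is just an example; %% makes the file table recurse):
-- example of the kind of recursive hashing query we used to trip the watchdog
SELECT f.path, h.sha256
FROM file f
JOIN hash h ON h.path = f.path
WHERE f.path LIKE '/usr/%%';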
z
Schedule a query (or live query) to the osquery_schedule table and look at the denylisted column. Though I do see you say "why they were denied" and I don't have a good answer for that.
d
@zwass - that's how we are finding them; we have a pack that runs SELECT * FROM osquery_schedule WHERE denylisted='1'; and we have all of this in Splunk. I can see which hosts have denied queries and which queries were denied. I was just hoping something might get logged by the watchdog that said 'denied, over RAM allocation' or something like that.
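Something we could probably add to that pack is the runtime columns, so we at least see how heavy each query has been (exact column set may vary by osquery version):
-- per-query resource usage alongside the denylisted flag
SELECT name, executions, wall_time, user_time, system_time,
       average_memory, output_size, denylisted
FROM osquery_schedule;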
z
There is some logging around that (https://github.com/osquery/osquery/blob/master/osquery/core/watcher.cpp#L527-L537) but I'm not sure it will tell you which query was executing.
👍 1
d
Cool, thanks for that.
We really wanted an easy way to tell whether a denylisted query exceeded its CPU or RAM allocation. Might have to start playing around with the profiler script, I think, since we aren't even logging locally; we are currently only using the TLS logger. Or we can add the filesystem logger plugin, I guess... need to discuss with the team. Thanks again
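In the meantime, I guess we can at least pull the configured watchdog thresholds from each host to compare against, something like this (flag names as I understand them from the docs):
-- current logger and watchdog settings on the host
SELECT name, value
FROM osquery_flags
WHERE name IN ('logger_plugin', 'watchdog_level', 'watchdog_memory_limit', 'watchdog_utilization_limit');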
Actually, @zwass - can you confirm whether that watchdog logging should also be sent via the TLS logger, or would it only be written if the filesystem logger was enabled?
z
I do believe it should go to TLS but haven't looked at that in a while. Let me know if you can confirm.
d
@zwass, will do. We definitely haven't seen it in Splunk, but let me look directly in Fleet logs a bit later
z
This would be in the "status" logs. Are you also forwarding those to Splunk?
d
yep
and I looked at status logs for the hostnames / UUIDs with denied queries and didn't see anything, but maybe we have a field we need to extract
well, I'm not finding anything in the logs when I grep for deny, exceeded, or limits
z
Hmm, perhaps I'm wrong on that front. I could easily get nerd-sniped into digging into this for the next few hours, but I have other things I gotta get to!
d
haha. well, I see nothing in status.log about it
I think we'll enable the FS plugin on a couple of hosts and see if we can find it there
👍 1
@zwass - I'm thinking that making sure this logging is written via the TLS logger could be considered an issue, or at least an enhancement. Do you think it would get any traction if I opened a GitHub issue on it?
z
Yeah I think documenting it in an issue makes sense. As far as traction, not sure.
d
🙂
ok, thanks
@zwass, just thought I'd share what we found when looking into this deeper. I had one of our guys who knows C/C++ take a look at the code, and it turns out that it does a syslog() and then immediately makes a WARNING call to the osquery logging system. So, even though we haven't quite figured out why we don't see these in the /var/log/osquery/osquery.WARNING logs on fleet, we can still find the information we need in /var/log/messages on the hosts that have denylisted queries, so that's something. Once we enable the filesystem plugin on some test hosts, I'm guessing we'll start to see the data in the local osquery.WARNING files, though I still think there needs to be a way to get these written to the TLS logger as well.
z
Thank you for the update. I imagine this has something to do with the RocksDB connection. Maybe the worker is holding that connection and the watcher is the one logging the reason?
d
yes, I think it's definitely the parent process logging it (i.e. the watcher)
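If anyone wants to double-check which PID is which, something like this should show the worker as a child of the watcher (assuming the process is named osqueryd):
-- the watcher is the parent osqueryd process; the worker is its child
SELECT pid, parent, name, cmdline
FROM processes
WHERE name = 'osqueryd';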
also, unfortunately, this is right in the code:
// Since the watchdog cannot use the logger plugin the error message
// should be logged to stderr and to the system log.
Looks like we might not be able to use TLS to capture this, but I'm thinking the syslog logger plugin could be an option, though that's going to increase our Splunk usage a lot.
@zwass - looks like I'll probably have to change my GitHub issue to ask whether the watcher can be updated to write to logger plugins.