# general
s
question... running osquery 3.3.2 on several windows systems in a lab (win 7, server 2012/2016, win 10). using kolide on the frontend. if a system gets hit with a dozen or so queries, it eventually just hangs and never comes back with responses, kolide just sits there spinning waiting for it. i have to stop the service, taskkill the exe (since stopping the service doesn't always stop the osqueryd.exe), and restart the service. sometimes it's not responsive even after that, and i have to uninstall osquery and reinstall to get it to come back to life. anyone else have this happen on windows hosts?
c
Seems like there are a few odd issues with this one, but I can't say I've seen that one before. Just tried to repro on my Win10 laptop with no luck. If you start osqueryd in the foreground (from the exe, not via the service) with --verbose and --tls_dump you might get a better idea of what's going on
Specifically I would look to see what gets spit out in regards to queries being run/completed. It should show them getting picked up from kolide, run, and then completed by osqueryd
s
When you say “kolide on the front end” do you mean Kolide Fleet as the TLS endpoint, or Kolide Launcher as the endpoint agent?
osquery provides several routes to DoS’ing a machine. You may be hitting a bug in a table, or you may just be self-inflicting a DoS. Can you share the queries?
s
just saw this, thanks for responses. kolide on the frontend meaning fleet as the tls endpoint, not using launcher. most are just basic queries like select * from users, or listing files in a specific directory
s
Whatever is up, probably is not Kolide specific then.
And as represented, that doesn’t sound like it should be a DoS sort of query
Does running these from the osquery command line also exhibit the same behavior?
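One way to check this without Fleet in the loop is to run each query through the osquery shell under a timeout and see which ones hang. This is only a rough sketch: it assumes `osqueryi` is on the PATH, and the helper names (`run_with_timeout`, `osqueryi_cmd`) and the 30 second limit are made up for illustration.

```python
import subprocess
import time


def run_with_timeout(cmd, timeout_s=30.0):
    """Run cmd and return (status, elapsed_seconds, stdout).

    status is "ok", "error", "timeout", or "missing" (binary not found).
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left hanging.
        return "timeout", time.monotonic() - start, ""
    except FileNotFoundError:
        return "missing", time.monotonic() - start, ""
    status = "ok" if proc.returncode == 0 else "error"
    return status, time.monotonic() - start, proc.stdout


def osqueryi_cmd(sql, binary="osqueryi"):
    # Assumption: osqueryi is on PATH; --json prints the results as JSON and exits.
    return [binary, "--json", sql]


if __name__ == "__main__":
    # Swap in whichever queries were being sent from Fleet.
    queries = ["SELECT * FROM users;"]
    for sql in queries:
        status, elapsed, _ = run_with_timeout(osqueryi_cmd(sql))
        print(f"{status:8s} {elapsed:6.2f}s  {sql}")
```

If a query shows up as `timeout` here too, the problem is more likely in the table than in the TLS/distributed path.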
t
@shortstack I've noticed odd behavior trying to query the ntdomains table on devices that are not on a domain. Sometimes they hang for several minutes. I wonder if it is a specific query that causes it vs a large number of queries.
s
having more issues today, trying to get the sha256 of a particular file. ~10 people trying to query the same system with the same query, kolide brings nothing back from osquery. i have to stop osquery, kill the exe, delete everything from within osquery.db folder, delete the host from the kolide ui, and then restart osquery. this is on win 7 hosts.
c
eeeeeeeesh. Is there a chance the host is just super under-provisioned for mem/cpu? Generally speaking, osquery isn't really designed to be hit with the same query at exactly the same time by multiple people, but it's also a pretty big fail if that level of load causes it to fall over soo..
hrmm
I'd be really curious to see what the logs say when you run the commands I listed above and hammer on it with those queries again
s
it's not under load, no 😞 < 50% mem (system has 2 cpu/4gb mem) and low cpu utilization. nothing in the logs on the host itself (at all, logs dir is empty), just query logs on the kolide side that show that it was submitted
c
yeah, seems unlikely to be the issue. The reason I wonder what the output of the above is: it would show whether osquery is picking the queries up from kolide and just not returning them, or whether it's freezing before it ever gets to that point
s
How big is this file?
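To get a feel for whether file size alone could explain the delay, one option is to time a plain SHA-256 of the same file outside osquery. A minimal sketch, with hypothetical helper names (`sha256_file`, `timed_sha256`) and an arbitrary 1 MiB chunk size; it says nothing about how osquery itself hashes files.

```python
import hashlib
import time


def sha256_file(path, chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def timed_sha256(path):
    """Return (hex_digest, elapsed_seconds) for hashing the file at path."""
    start = time.monotonic()
    value = sha256_file(path)
    return value, time.monotonic() - start
```

If this finishes in well under a second on the Win 7 host, the size of the file is probably not what makes the query hang.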