@nyanshak Regarding the problem with querying lots of hosts, I've found my problem.
The problem was with Redis.
I noticed in the logs that Redis was dropping the client's subscription a few minutes after I launched a live query, and that's what caused it to get stuck.
I had to increase the "output buffer limit" in the Redis configuration and that fixed the problem.
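For anyone reading along, here is a minimal Python sketch of how that limit behaves. It assumes the setting in question is Redis's `client-output-buffer-limit` for the `pubsub` client class (the thread only calls it the "output buffer limit"). The helper names `parse_limits` and `would_disconnect` are made up for illustration, and the logic is a simplified model of the hard/soft limit rule, not Redis's actual implementation.

```python
from typing import Dict, NamedTuple


class BufferLimit(NamedTuple):
    hard_bytes: int    # disconnect as soon as the buffer reaches this size (0 = disabled)
    soft_bytes: int    # disconnect if the buffer stays at or above this size...
    soft_seconds: int  # ...for longer than this many seconds (0 bytes = disabled)


def parse_limits(value: str) -> Dict[str, BufferLimit]:
    """Parse a value shaped like the output of CONFIG GET client-output-buffer-limit.

    The value is a flat list of "<class> <hard> <soft> <seconds>" groups, in bytes.
    """
    tokens = value.split()
    if len(tokens) % 4 != 0:
        raise ValueError("expected groups of: class hard soft seconds")
    limits = {}
    for i in range(0, len(tokens), 4):
        client_class, hard, soft, seconds = tokens[i:i + 4]
        limits[client_class] = BufferLimit(int(hard), int(soft), int(seconds))
    return limits


def would_disconnect(limit: BufferLimit, buffered_bytes: int, seconds_over_soft: int) -> bool:
    """Simplified model: would Redis drop a client with this much pending output?"""
    if limit.hard_bytes and buffered_bytes >= limit.hard_bytes:
        return True
    if (limit.soft_bytes and buffered_bytes >= limit.soft_bytes
            and seconds_over_soft > limit.soft_seconds):
        return True
    return False


# Stock redis.conf defaults, in bytes (older Redis versions say "slave" instead of "replica").
DEFAULTS = "normal 0 0 0 replica 268435456 67108864 60 pubsub 33554432 8388608 60"

MB = 1024 * 1024
pubsub = parse_limits(DEFAULTS)["pubsub"]
print(would_disconnect(pubsub, 40 * MB, 0))   # over the 32 MB hard limit
print(would_disconnect(pubsub, 10 * MB, 30))  # over the soft limit, but only for 30 s
print(would_disconnect(pubsub, 10 * MB, 61))  # over the soft limit for more than 60 s
```

Under this model, a subscriber that falls behind on a large stream of live query results would be dropped once its pending output crosses the hard limit, or stays above the soft limit for too long, which would match the dropped subscription seen in the logs.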
Glad to hear you worked that out! How high is the throughput of results you are working with?
01/07/2019, 9:49 PM
I upgraded ElastiCache node type to try to get a larger buffer size for Redis, but it didn't help at all. In my case, it doesn't even really look like redis is doing anything, as the CPU / network traffic is so low.
Didn't fix my problem with querying
@daniel319b what redis version / other config do you have?
01/08/2019, 9:53 PM
@zwass hmm do you mean the throughput in redis?
@nyanshak Yeah I noticed that the CPU is very low on my redis as well.
I'm using 3 fleet servers behind an NLB, 1 MySQL server, latest version, 1 Redis server, latest version.
I played around with the SQL max connections; I think it's around 2000 now.
The Redis output buffer size limit is 512MB
Have you checked your Redis logs?
It's still not working as fluidly or fast as I'd like but it gets the job done
01/08/2019, 10:00 PM
I'm using elasticache and haven't checked redis logs, but I did bump the node size to where output buffer size was much larger (max size supported by amazon), but didn't see any improvement
@daniel319b how many hosts do you have connected?
and what are your settings for config_refresh and distributed_interval
01/08/2019, 10:04 PM
I have about 9K hosts
I use the default values.
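As a rough back-of-the-envelope check on why those two intervals matter at this scale, here is a small Python sketch of the steady-state request rate that periodic check-ins put on the Fleet servers. The 9,000 host count is from the thread; the interval values are purely illustrative and are not claimed to be the actual defaults, and `requests_per_second` is a hypothetical helper.

```python
def requests_per_second(hosts: int, interval_seconds: float) -> float:
    """Average request rate if every host checks in once per interval."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return hosts / interval_seconds


HOSTS = 9000  # host count mentioned in the thread

# Illustrative intervals only, not the real defaults.
for interval in (10, 60):
    rate = requests_per_second(HOSTS, interval)
    print(f"interval={interval}s -> {rate:.0f} requests/second on average")
```

This ignores jitter and request cost, so it only gives an order of magnitude for each endpoint.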
When you query, do you select a label or "All Hosts"? Because I've noticed that when I run a query on a label, it's slower.
01/08/2019, 10:06 PM
the query i'm running should be straightforward as well: select uuid from osquery_info;
01/08/2019, 10:10 PM
What's your SQL max conns?
And how much RAM/CPU do your servers have? Maybe you need to upgrade?
Oh never mind I saw the github issue
01/08/2019, 10:22 PM
Yeah... It's a db.m5.12xlarge and for fleet instances, c5.4xlarge