@nyanshak Regarding the problem with querying lots of hosts (osquery #kolide)

daniel319b

01/07/2019, 7:09 PM

@nyanshak Regarding the problem with querying lots of hosts, I've found my problem. The problem was with Redis. I noticed in the logs that Redis was dropping the client's subscription a few minutes after I launched a live query, and that's what caused it to get stuck. I had to increase the "output buffer limit" in the Redis configuration and that fixed the problem.

🤞 1
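
For anyone hitting the same symptom: the setting described above is most likely Redis's `client-output-buffer-limit` for the `pubsub` client class. That is an assumption, since the message does not name the directive. A minimal sketch of raising it in redis.conf follows; the values are purely illustrative and should be sized to your own result volume:

```
# redis.conf (sketch; values are illustrative, not taken from a real deployment)
# Format: client-output-buffer-limit <class> <hard limit> <soft limit> <soft seconds>
# A client is disconnected as soon as its output buffer reaches the hard limit,
# or if it stays above the soft limit for <soft seconds> in a row.
# The stock Redis default for pubsub clients is: 32mb 8mb 60
client-output-buffer-limit pubsub 512mb 128mb 60
```

On a managed service the same limits are usually exposed through the provider's parameter settings rather than a redis.conf file.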

zwass

01/07/2019, 9:12 PM

Glad to hear you worked that out! How high is the throughput of results you are working with?

nyanshak

01/07/2019, 9:49 PM

I upgraded ElastiCache node type to try to get a larger buffer size for Redis, but it didn't help at all. In my case, it doesn't even really look like redis is doing anything, as the CPU / network traffic is so low.

nyanshak

01/07/2019, 9:49 PM

Didn't fix my problem with querying

nyanshak

01/07/2019, 9:49 PM

@daniel319b what redis version / other config do you have?

nyanshak

01/07/2019, 10:00 PM

@zwass

daniel319b

01/08/2019, 9:53 PM

@zwass hmm do you mean the throughput in redis?

daniel319b

01/08/2019, 9:56 PM

@nyanshak Yeah, I noticed that the CPU is very low on my Redis as well. I'm using 3 Fleet servers behind an NLB, 1 MySQL server (latest version), and 1 Redis server (latest version). I played around with SQL max connections; I think it's around 2000 now. The Redis output buffer size limit is 512MB.

daniel319b

01/08/2019, 9:57 PM

Have you checked your Redis logs?

daniel319b

01/08/2019, 9:57 PM

It's still not working as fluidly or fast as I'd like but it gets the job done

nyanshak

01/08/2019, 10:00 PM

I'm using elasticache and haven't checked redis logs, but I did bump the node size to where output buffer size was much larger (max size supported by amazon), but didn't see any improvement

nyanshak

01/08/2019, 10:00 PM

@daniel319b how many hosts do you have connected?

nyanshak

01/08/2019, 10:01 PM

and what are your settings for config_refresh and distributed_interval

daniel319b

01/08/2019, 10:04 PM

I have about 9K hosts, and I use the default values. When you query, do you select a label or "All Hosts"? Because I've noticed that when I run a query on a label, it's slower.

nyanshak

01/08/2019, 10:06 PM

all hosts

nyanshak

01/08/2019, 10:06 PM

the query i'm running should be straightforward as well: select uuid from osquery_info;

daniel319b

01/08/2019, 10:10 PM

What's your SQL max conns? And how much RAM/CPU do your servers have? Maybe you need to upgrade?

daniel319b

01/08/2019, 10:12 PM

Oh never mind I saw the github issue

nyanshak

01/08/2019, 10:22 PM

Yeah... It's a db.m5.12xlarge and for fleet instances, c5.4xlarge

nyanshak

01/08/2019, 10:23 PM

I've tried max conns at 50, 256, 512, 2048, ...

nyanshak

01/08/2019, 10:24 PM

didn't seem to make much of a difference
