#fleet
Jocelyn Bothe

09/01/2021, 7:46 PM
is there any way to get all labels to show up under pack targets? the search times out, and I can't select my custom label as a pack target
Sarah Gillespie

09/02/2021, 10:56 PM
Hi Jocelyn, are you still experiencing this issue? If so, could you share what you are seeing in the browser network tab, along with any additional info that might help with debugging?
Jocelyn Bothe

09/07/2021, 3:56 PM
the search still times out, but I figured out a workaround: adding all the labels as targets reveals labels further down the list, and then I go back and remove the labels I don't want to target. It would be nice if it would display more than 6 labels at a time
3:56 PM
I'm assuming the timeout is from trying to regex match with 160k hosts
3:57 PM
there aren't any customizable timeout settings for database searches, which would also be nice to have for large deployments like ours
3:57 PM
also a way to shard across multiple database hosts
3:58 PM
our read replicas get no traffic, but our primary writer gets very hot
Tomas Touceda

09/07/2021, 5:59 PM
hi Jocelyn, I believe we addressed the problem of not all labels appearing in this PR: https://github.com/fleetdm/fleet/pull/1857 and also in a rework of that UX that we are doing. I recommend you try 4.3.0 (which will be released this week) and let us know if the issue persists
6:02 PM
"the search still times out"
do you happen to have the request that times out, with more details?
"our read replicas get no traffic, but our primary writer gets very hot"
we added read replica support very recently (in fact, it's not released yet), can you tell me a bit more about your setup?
Jocelyn Bothe

09/07/2021, 7:05 PM
we're using an AWS RDS aurora global cluster, using db.r5.8xlarge instances
7:05 PM
CPU for that is at 31%, and that's with all our queries currently off
7:05 PM
but with 160k hosts enrolled in FleetDM
7:06 PM
SelectThroughput is at ~30000 all the time
7:07 PM
doing something like adding a label takes that CPU up to 90+ and tears up the DB for 5-10 minutes
Tomas Touceda

09/07/2021, 7:17 PM
gotcha. You'll probably benefit a lot from the read replicas addition in 4.3.0. Unless you use something like proxysql and automatically redirect queries to one db host or another, fleet doesn't currently (4.2.4) support read replicas. What osquery.detail_update_interval do you have set up in fleet serve?
Jocelyn Bothe

09/07/2021, 7:19 PM
detail_update_interval: 1440m
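
[Editor's sketch] The idea of redirecting queries "to one db host or another" mentioned above can be illustrated with a few lines of Python. This is only a rough sketch of read/write splitting, not how proxysql or Fleet implement it. The class name QueryRouter, the host names, and the rule for spotting reads are all assumptions made for illustration.

```python
import itertools

# Statement keywords treated as reads in this sketch (an assumption, not a complete list).
READ_KEYWORDS = ("select", "show", "describe", "explain")


class QueryRouter:
    """Hypothetical router: reads rotate across replicas, everything else hits the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replicas are configured.
        self._replicas = itertools.cycle(replicas or [primary])

    def target_for(self, sql):
        normalized = sql.strip().lower()
        words = normalized.split(None, 1)
        first_word = words[0] if words else ""
        # Locking reads must run on the writer, so they are not sent to a replica.
        is_locking_read = "for update" in normalized
        if first_word in READ_KEYWORDS and not is_locking_read:
            return next(self._replicas)
        return self.primary


router = QueryRouter("primary", ["replica-1", "replica-2"])
print(router.target_for("SELECT 1"))  # first read goes to replica-1
print(router.target_for("UPDATE t SET x = 1"))  # writes go to the primary
```

A real proxy also has to deal with transactions and replication lag (a read issued right after a write may not see it on a replica), which this sketch ignores.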
Tomas Touceda

09/07/2021, 8:28 PM
sounds good. Could you tell me a bit more about the rest of the infrastructure? i.e. how many instances, size, etc
Jocelyn Bothe

09/08/2021, 4:58 PM
we run on c5.xlarge in two regions, 40 in one and 70 in the other, and we use kinesis firehose for logging, sending to an on-prem splunk.
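
[Editor's sketch] For a sense of scale, the numbers given in this thread (160k hosts, 40 + 70 instances, detail_update_interval of 1440m) can be turned into rough per-instance and per-second figures. This assumes hosts are spread evenly across instances and that detail refreshes are spread evenly over the interval, neither of which the thread confirms. It also covers only detail refreshes, not other check-in traffic.

```python
# Figures quoted in the thread.
HOSTS = 160_000
INSTANCES = 40 + 70                 # c5.xlarge instances across the two regions
DETAIL_UPDATE_INTERVAL_MIN = 1440   # detail_update_interval: 1440m

# Assumption: hosts are balanced evenly across all instances.
hosts_per_instance = HOSTS / INSTANCES

# Assumption: detail refreshes are spread uniformly over the interval.
interval_seconds = DETAIL_UPDATE_INTERVAL_MIN * 60
detail_updates_per_second = HOSTS / interval_seconds

print(f"hosts per instance: {hosts_per_instance:.0f}")
print(f"detail updates per second (fleet-wide): {detail_updates_per_second:.2f}")
```

Under those assumptions this works out to roughly 1,455 hosts per instance and a little under 2 detail refreshes per second across the whole deployment.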
Tomas Touceda

09/08/2021, 9:10 PM
that's very helpful, thank you. How did you end up with those numbers? Are you autoscaling based on CPU usage or something like that? or were those empirical numbers you arrived at as you scaled the number of hosts?
Jocelyn Bothe

09/09/2021, 2:21 PM
we've been running into an issue where periodically Fleet will run into some kind of problem, and eat 100% of the instance's memory. it's hard to troubleshoot, because once it happens, you can't even log on to the host. we added the AWS cloudwatch agent and started alarming on mem util, and found it stopped happening if we scaled out enough. I was able to capture a stack trace from a host I happened to already be on, but it's 30k lines long.
Aug  3 18:34:34 ip-10-12-24-164 fleet: fatal error: runtime: out of memory
2:26 PM
normal mem util is about 16%
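
[Editor's sketch] The memory-utilization alarm described above can be approximated on a Linux host by reading /proc/meminfo. The sketch below is not how the AWS CloudWatch agent computes its metric. The 80% threshold and the sample values are assumptions chosen for illustration (the sample is sized so that utilization comes out at the 16% quoted as normal).

```python
# Hypothetical alarm threshold; the thread does not say what value was used.
ALARM_THRESHOLD_PERCENT = 80.0

# Illustrative /proc/meminfo excerpt, not captured from a real host.
SAMPLE_MEMINFO = """\
MemTotal:        8000000 kB
MemFree:         5000000 kB
MemAvailable:    6720000 kB
"""


def mem_util_percent(meminfo_text):
    """Return used memory as a percentage, based on MemTotal and MemAvailable."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])  # values are reported in kB
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return 100.0 * (total - available) / total


def should_alarm(meminfo_text, threshold=ALARM_THRESHOLD_PERCENT):
    """True when memory utilization is at or above the threshold."""
    return mem_util_percent(meminfo_text) >= threshold


print(f"{mem_util_percent(SAMPLE_MEMINFO):.1f}% used, alarm: {should_alarm(SAMPLE_MEMINFO)}")
```

On a live host the same functions could be fed the contents of /proc/meminfo, for example from a periodic job that runs well before memory reaches the point where logging on is no longer possible.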