We're having to roll back FleetDM to an earlier version · osquery #fleet


Jocelyn Bothe

09/20/2021, 3:07 PM

We're having to roll back FleetDM to an earlier version (much earlier). The detail updates put so much load on our DB it becomes non-functional, even with our actual queries completely turned off. We're using the largest RDS instance AWS has, and it is just completely crushed. It's so bad, we're going to have to start exploring other central management solutions. 😞

Keith Swagler

09/20/2021, 3:38 PM

Which version were you on, and which version did you roll back to?

Tomas Touceda

09/20/2021, 3:54 PM

hi Jocelyn, I'm sorry to hear you're having so much trouble. Based on our previous conversation, we are going to implement a worker-style architecture that controls how many instances write to the DB, which will give setups like yours much finer-grained control over write load. We will try to have this ready for 4.4.0, which is expected to be released at the beginning of October.

Tomas Touceda

09/20/2021, 3:54 PM

I understand this doesn't help with your current situation, but I just wanted to let you know that we are working on these things.

Jocelyn Bothe

09/20/2021, 4:41 PM

we were on 4.3.0 and we're rolling back to 3.6.0.

zwass

09/20/2021, 4:51 PM

@Jocelyn Bothe can you confirm that 3.6.0 is stable? If so, we can look at what changed between those versions and address any issues that we identify.

zwass

09/20/2021, 4:57 PM

I also wonder what you have your distributed_interval set to. You'd want to use 60 or 360 seconds rather than the typical 10 if you're at 160k hosts.
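To see why a longer distributed_interval matters at this scale, here is a back-of-the-envelope calculation. It is a sketch that assumes every host polls Fleet once per interval and that polls are spread evenly over time, which real fleets only approximate; the 160k figure is the host count mentioned above.

```python
# Average distributed-query polling rate as a function of distributed_interval.
# Assumption: each host polls once per interval, evenly spread (real check-ins
# are burstier, so peaks will be higher than these averages).

HOSTS = 160_000  # host count mentioned in the thread


def polls_per_second(hosts: int, interval_s: int) -> float:
    """Average number of distributed-query polls per second across the fleet."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return hosts / interval_s


for interval in (10, 60, 360):
    print(f"distributed_interval={interval}s -> "
          f"{polls_per_second(HOSTS, interval):,.0f} polls/s")

# distributed_interval=10s -> 16,000 polls/s
# distributed_interval=60s -> 2,667 polls/s
# distributed_interval=360s -> 444 polls/s
```

Under this model, going from 10 to 60 seconds cuts the average polling rate by a factor of six.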

Jocelyn Bothe

09/20/2021, 5:09 PM

yup

```
--config_refresh=600
--config_accelerated_refresh=60
--distributed_interval=60
```

Jocelyn Bothe

09/20/2021, 5:10 PM

so you can see where the load on the DB writer is coming from:

Jocelyn Bothe

09/20/2021, 5:11 PM

(fwiw, the read replicas were all doing great, the issue is all with the writer)

Tomas Touceda

09/20/2021, 5:12 PM

makes sense. The plan is to allow much more granular control over writes. If you opt into it, writes will become eventually consistent rather than being applied immediately as they are today, but it will reduce the load on the writer.
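The general idea of a worker-style write path can be illustrated with a small Python sketch: request handlers enqueue updates and return immediately, and a fixed pool of workers drains the queue and writes in batches, so the number of concurrent writers hitting the DB is bounded while stored data may briefly lag (eventual consistency). This is a hypothetical illustration, not Fleet's implementation; HostUpdate, BatchWriter, and fake_write_batch are made-up names, and an in-memory list stands in for the database.

```python
# Hypothetical sketch of batched, worker-based writes (not Fleet's code).
import queue
import threading
from dataclasses import dataclass


@dataclass
class HostUpdate:
    host_id: int
    payload: str


class BatchWriter:
    def __init__(self, write_batch, num_workers: int = 2, batch_size: int = 100):
        self._queue = queue.Queue()
        self._write_batch = write_batch  # callable taking a list of HostUpdate
        self._batch_size = batch_size
        self._workers = [
            threading.Thread(target=self._run, daemon=True)
            for _ in range(num_workers)
        ]
        for worker in self._workers:
            worker.start()

    def submit(self, update: HostUpdate) -> None:
        """Called by request handlers; returns without waiting for the DB."""
        self._queue.put(update)

    def _run(self) -> None:
        while True:
            item = self._queue.get()
            if item is None:  # shutdown sentinel, one per worker
                return
            batch = [item]
            # Collect whatever is already queued, up to batch_size.
            while len(batch) < self._batch_size:
                try:
                    nxt = self._queue.get_nowait()
                except queue.Empty:
                    break
                if nxt is None:
                    self._write_batch(batch)
                    return
                batch.append(nxt)
            self._write_batch(batch)

    def close(self) -> None:
        """Flush queued updates and stop all workers."""
        for _ in self._workers:
            self._queue.put(None)
        for worker in self._workers:
            worker.join()


written: list[HostUpdate] = []
lock = threading.Lock()


def fake_write_batch(batch):
    # Stand-in for one multi-row write against the DB writer.
    with lock:
        written.extend(batch)


writer = BatchWriter(fake_write_batch, num_workers=2, batch_size=50)
for i in range(500):
    writer.submit(HostUpdate(host_id=i, payload="detail"))
writer.close()
print(len(written))  # 500
```

Here num_workers caps how many writers touch the DB at once, and batch_size trades latency for fewer, larger writes.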

zwass

09/20/2021, 5:14 PM

In the meantime, you could also make the Fleet label_update_interval and detail_update_interval longer so that fewer writes are generated for those top two queries you list.

Jocelyn Bothe

09/20/2021, 5:25 PM

label_update_interval was set to 120m and detail_update_interval was set to 1440m
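For a sense of scale, here is a rough estimate of how many hosts per second come due for label and detail refreshes at these settings. It is a sketch that assumes about 160k hosts with refreshes spread evenly over each interval; WRITES_PER_REFRESH is a hypothetical placeholder, since the real number depends on how many label and detail queries a deployment runs.

```python
# Rough average refresh rates for Fleet's label and detail update intervals.
# Assumptions: ~160k hosts, refreshes evenly spread over each interval.
from datetime import timedelta

HOSTS = 160_000
WRITES_PER_REFRESH = 10  # hypothetical placeholder, for illustration only


def refreshes_per_second(hosts: int, interval: timedelta) -> float:
    """Average number of hosts per second due for a refresh."""
    seconds = interval.total_seconds()
    if seconds <= 0:
        raise ValueError("interval must be positive")
    return hosts / seconds


intervals = {
    "label_update_interval": timedelta(minutes=120),
    "detail_update_interval": timedelta(minutes=1440),
}

for name, interval in intervals.items():
    rate = refreshes_per_second(HOSTS, interval)
    print(f"{name} ({interval}): ~{rate:.2f} hosts/s, "
          f"~{rate * WRITES_PER_REFRESH:.1f} writes/s "
          f"at {WRITES_PER_REFRESH} writes per refresh")
```

In this model, doubling an interval halves the average rate. Real check-ins are burstier than an even spread, so these averages understate peak load on the writer.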

zwass

09/20/2021, 5:29 PM

We're going to push a couple more improvements to label performance in the 4.3.1 release, which we will get out by the middle of this week -- hopefully they will help.

zwass

09/22/2021, 2:30 AM

4.3.1 has been pushed. Let us know if it helps at all.
