#fleet
Jocelyn Bothe

09/20/2021, 3:07 PM
We're having to roll back FleetDM to an earlier version (much earlier). The detail updates put so much load on our DB it becomes non-functional, even with our actual queries completely turned off. We're using the largest RDS instance AWS has, and it is just completely crushed. It's so bad, we're going to have to start exploring other central management solutions. 😞
Keith Swagler

09/20/2021, 3:38 PM
Which version were you on and which version did you roll back to?
Tomas Touceda

09/20/2021, 3:54 PM
hi Jocelyn, I'm sorry to hear you're having so much trouble. Based on our previous conversation, we are going to be implementing a worker-style architecture to have more control over how many instances write to the db, which will give setups like yours much finer-grained control. We will try to have this ready for 4.4.0, which is expected to be released in the beginning of October.
3:54 PM
I understand this doesn't help with your current situation, but I just wanted to let you know that we are working on these things
Jocelyn Bothe

09/20/2021, 4:41 PM
we were on 4.3.0 and we're rolling back to 3.6.0.
zwass

09/20/2021, 4:51 PM
@Jocelyn Bothe can you confirm that 3.6.0 is stable? If so, we can look at what changed between those versions and address any issues that we identify.
4:57 PM
I wonder also what you have your distributed_interval set to? You'd want to use 60 or 360 seconds rather than the typical 10 if you're at 160k hosts.
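For a rough sense of why the interval matters at this scale, here is a back-of-the-envelope sketch. It assumes 160k hosts (the figure mentioned above), that every host checks in exactly once per distributed_interval, and that check-ins are spread evenly over time. The function name is made up for illustration, and this is not how Fleet itself measures load.

```python
def checkins_per_second(hosts: int, interval_seconds: float) -> float:
    """Average check-ins per second, assuming one check-in per host per interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return hosts / interval_seconds


HOSTS = 160_000  # assumption: host count taken from the message above

# Compare the three distributed_interval values discussed in the thread.
for interval in (10, 60, 360):
    rate = checkins_per_second(HOSTS, interval)
    print(f"distributed_interval={interval}s: about {rate:,.0f} check-ins/s")
```

Under those assumptions, 10 seconds works out to about 16,000 check-ins per second, 60 seconds to about 2,667, and 360 seconds to about 444.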
Jocelyn Bothe

09/20/2021, 5:09 PM
yup
--config_refresh=600
--config_accelerated_refresh=60
--distributed_interval=60
5:10 PM
so you can see where the load on the DB writer is coming from:
5:11 PM
(fwiw, the read replicas were all doing great, the issue is all with the writer)
Tomas Touceda

09/20/2021, 5:12 PM
makes sense. The plan is to allow much more granular control over writes. If you use it, writes will become eventually consistent rather than behaving as they do today, but it will reduce the load on the writer
zwass

09/20/2021, 5:14 PM
In the meantime, you could also make the Fleet label_update_interval and detail_update_interval longer so that fewer writes are generated for those top two queries you list.
Jocelyn Bothe

09/20/2021, 5:25 PM
label update was set to 120m and detail_update was set to 1440m
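As a rough illustration of what those two settings imply, the sketch below converts them into average per-second rates. It assumes 160k hosts (the number mentioned earlier in the thread), one label refresh and one detail refresh per host per interval, and an even spread over time. How many database writes each refresh produces is not stated in this thread, so the numbers are host refreshes, not writes, and the names are hypothetical.

```python
def refreshes_per_second(hosts: int, interval_minutes: float) -> float:
    """Average host refreshes per second, assuming one refresh per host per interval."""
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    return hosts / (interval_minutes * 60)


HOSTS = 160_000  # assumption: host count mentioned earlier in the thread

# Interval values (in minutes) as reported in the message above.
settings = {"label_update_interval": 120, "detail_update_interval": 1440}

for name, minutes in settings.items():
    rate = refreshes_per_second(HOSTS, minutes)
    print(f"{name}={minutes}m: about {rate:.1f} host refreshes/s")
```

Under these assumptions that is roughly 22 label refreshes and 2 detail refreshes per second across the fleet, and doubling either interval halves its rate.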
zwass

09/20/2021, 5:29 PM
We're going to push a couple more improvements to label performance in the 4.3.1 release which we will get out by the middle of this week -- hopefully they will help.
2:30 AM
That 4.3.1 is pushed -- Let us know if it helps at all.