# fleet
j
We're having to roll back FleetDM to an earlier version (much earlier). The detail updates put so much load on our DB it becomes non-functional, even with our actual queries completely turned off. We're using the largest RDS instance AWS has, and it is just completely crushed. It's so bad, we're going to have to start exploring other central management solutions. 😞
k
Which version were you on, and which version did you roll back to?
t
Hi Jocelyn, I'm sorry to hear you're having so much trouble. Based on our previous conversation, we are going to implement a worker-style architecture that gives more control over how many instances write to the DB, which will give setups like yours much finer-grained control. We will try to have this ready for 4.4.0, which is expected to be released at the beginning of October.
I understand this doesn't help with your current situation, but I just wanted to let you know that we are working on these things
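The worker idea above, capping how many writers touch the DB at once, can be illustrated with a fixed-size writer pool. This is a hypothetical sketch, not Fleet's design: `run_writer_pool`, `fake_write`, and the pre-filled queue are assumptions made for illustration, and a real server would have producers enqueueing continuously while writers drain.

```python
import queue
import threading
import time

def run_writer_pool(updates, num_writers, write_fn):
    """Drain `updates` through at most `num_writers` writer threads.

    Hypothetical sketch: callers enqueue work instead of writing to the
    database directly, so DB write concurrency is capped at num_writers.
    """
    q = queue.Queue()
    for update in updates:
        q.put(update)  # pre-filled for simplicity; see lead-in

    def worker():
        while True:
            try:
                item = q.get_nowait()
            except queue.Empty:
                return  # queue drained; this writer exits
            write_fn(item)

    threads = [threading.Thread(target=worker) for _ in range(num_writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

# Usage: a fake writer that records how many writes overlap in time.
lock = threading.Lock()
active = 0
peak = 0
written = []

def fake_write(item):
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.001)  # stand-in for a DB round trip
    with lock:
        active -= 1
        written.append(item)

run_writer_pool(range(100), num_writers=4, write_fn=fake_write)
print(f"wrote {len(written)} updates, peak concurrency {peak}")
```

The trade-off is latency for protection: no matter how many requests arrive, at most `num_writers` statements hit the writer concurrently, and the rest wait in the queue.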
j
we were on 4.3.0 and we're rolling back to 3.6.0.
z
@Jocelyn Bothe can you confirm that 3.6.0 is stable? If so, we can look at what changed between those versions and address any issues that we identify.
I also wonder what you have your `distributed_interval` set to? You'd want to use 60 or 360 seconds rather than the typical 10 if you're at 160k hosts.
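A rough back-of-the-envelope for why the interval matters at this scale: the average check-in rate is simply hosts divided by the interval. This sketch assumes check-ins are spread evenly over the interval (real traffic is burstier), and `checkins_per_second` is a hypothetical helper name.

```python
def checkins_per_second(hosts: int, distributed_interval_s: int) -> float:
    """Average distributed check-in requests per second across the fleet.

    Assumes check-ins are spread evenly over the interval; real traffic
    is burstier than this average.
    """
    return hosts / distributed_interval_s

for interval in (10, 60, 360):
    rate = checkins_per_second(160_000, interval)
    print(f"distributed_interval={interval}: ~{rate:,.0f} check-ins/s")
```

At 160k hosts, moving from 10 s to 60 s cuts the average from 16,000 to roughly 2,667 check-ins per second, and 360 s brings it to roughly 444.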
j
yup
```
--config_refresh=600
--config_accelerated_refresh=60
--distributed_interval=60
```
so you can see where the load on the DB writer is coming from:
(fwiw, the read replicas were all doing great, the issue is all with the writer)
t
Makes sense. The plan is to allow much more granular control over writes. If you opt in, writes will become eventually consistent rather than immediate as they are today, but it will reduce the load on the writer.
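One common way to trade immediate consistency for lower writer load is to coalesce updates in memory and flush them in batches. The sketch below is hypothetical and not Fleet's implementation: `CoalescingWriteBuffer`, `record`, and `flush` are illustrative names, and it assumes only the latest update per host matters.

```python
import threading

class CoalescingWriteBuffer:
    """Hypothetical eventually consistent write buffer (not Fleet's code).

    Updates are keyed by host ID. A newer update for a host replaces the
    pending one, so each flush issues at most one row write per host.
    """

    def __init__(self, flush_fn):
        self._flush_fn = flush_fn  # receives a dict {host_id: payload}
        self._pending = {}
        self._lock = threading.Lock()

    def record(self, host_id, payload):
        with self._lock:
            self._pending[host_id] = payload  # last write wins

    def flush(self):
        """Hand all pending updates to flush_fn as one batch; return its size."""
        with self._lock:
            batch, self._pending = self._pending, {}
        if batch:
            self._flush_fn(batch)
        return len(batch)

# Usage: 1,000 updates from 10 hosts collapse into one batch of 10 rows.
batches = []
buf = CoalescingWriteBuffer(batches.append)
for i in range(1000):
    buf.record(host_id=i % 10, payload={"seen": i})
flushed = buf.flush()
print(flushed, len(batches))  # 10 1
```

In practice `flush` would run on a timer; until it does, readers see the previous values, which is the "eventually consistent" part of the trade.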
z
In the meantime, you could also make the Fleet `label_update_interval` and `detail_update_interval` longer so that fewer writes are generated for those top two queries you list.
j
`label_update_interval` was set to 120m and `detail_update_interval` was set to 1440m.
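To put those settings in perspective, the average number of hosts coming due for a refresh each second is hosts divided by the interval. This sketch assumes the ~160k hosts mentioned earlier and an even spread of refreshes; it counts hosts, not SQL statements, since how many rows each refresh writes is not modeled here. `refreshes_per_second` is a hypothetical helper.

```python
HOSTS = 160_000  # host count mentioned earlier in the thread

def refreshes_per_second(hosts: int, interval_minutes: int) -> float:
    """Average hosts per second due for a refresh, assuming an even spread."""
    return hosts / (interval_minutes * 60)

label_rate = refreshes_per_second(HOSTS, 120)    # label_update_interval = 120m
detail_rate = refreshes_per_second(HOSTS, 1440)  # detail_update_interval = 1440m
print(f"label:  {label_rate:.1f} hosts/s")
print(f"detail: {detail_rate:.2f} hosts/s")
```

Even with these long intervals, label refreshes still arrive at about 22 hosts per second on average, roughly twelve times the detail refresh rate, so label writes are the larger steady stream.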
z
We're going to push a couple more improvements to label performance in the 4.3.1 release which we will get out by the middle of this week -- hopefully they will help.
4.3.1 is now out. Let us know if it helps at all.