# fleet
g
I have not dug too deeply into this. The Fleet docs suggest MySQL 5.7. I have noticed performance issues around labels and policies on 5.7, but they’re less prominent on an 8.0 test instance I upgraded to. Is the Fleet team developing against 8 or 5.7? I have yet to look through the raw queries etc.
t
we are developing against 5.7, support for 8 is not technically official yet, but it should work. Could you share a bit more about your setup, and the performance differences you are noticing?
we are working on improving performance there in different ways, the more information we have the better
g
So for 1000 hosts we’re seeing an additional 10-20% CPU utilisation per Device Policy added, to the point where it no longer returns after 5 policies are added. On a mirrored 8 setup we’re at least seeing a response on like hardware.
These are timeouts against policies/manage.
t
what version of fleet are you running?
g
latest.
t
4.3.1? could you tell me the output of the following sql query:
select count(*) from policy_membership_history
?
g
Right now I can’t but I will once the DB begins to respond.
t
also, could you tell me what type of resources do you have for the db?
g
GCP, 16 GB RAM, 4 vCPU
Fleet prior to v3 used to run with the same host count on a smaller instance of 1 vCPU, 4 GB RAM; however, it was increased to handle the 3.3 migration issue.
Note I do expect additional DB load due to increased queries with new features, so that is not a complaint.
t
in the meantime, if I'm understanding you correctly, what fails is when you navigate to the list of policies, correct?
g
Yes
Timeout on that endpoint, with an error page prompting me to raise a GH issue.
t
yeah, new features are tricky that way
I have an idea of what might be failing, once I understand the scale of it based on your count, I can start thinking about strategies
I'm sorry you're facing this issue
g
These spikes out of context are trying to get results from compliance checks
And don’t be sorry, things happen. We yolo latest to discover bugs and give feedback.
🙏🏽 1
t
we might be running and storing policies too much: https://github.com/fleetdm/fleet/issues/2240
you might need to disable policies while we address this one, we have an idea of what it might be
👀 1
g
Yeah at this time I think I need to manually edit the db to remove the policies as it’s not a run time flag.
t
delete from policies
should do the trick, and you can also clear the history with:
delete from policy_membership_history
g
One item I am going to look at is whether the number of running pods impacts performance.
t
we'll publish a new version with a fix for this in the next day or two
g
select count(*) from policy_membership_history
may have an exceptionally high number of results in it as it’s currently hung until timeout.
SELECT table_rows
    -> FROM information_schema.tables
    -> WHERE table_name='policy_membership_history';
+------------+
| table_rows |
+------------+
|   36844359 |
+------------+
1 row in set (0.04 sec)
For 1000 hosts & 1 configured policy over 24 hours I don’t think this is the norm.
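As a rough sanity check on that number, here is a minimal Python sketch of the arithmetic, using only the figures quoted above (1000 hosts, 1 policy, 24 hours). Note that table_rows in information_schema is only an estimate for InnoDB tables, so treat the result as a ballpark. The variable and function names are made up for illustration.

```python
# Figures quoted in the thread. TABLE_ROWS comes from information_schema,
# which is an approximation for InnoDB, not an exact count.
TABLE_ROWS = 36_844_359
HOSTS = 1_000
POLICIES = 1
WINDOW_HOURS = 24


def rows_per_host_policy(table_rows: int, hosts: int, policies: int) -> float:
    """Average number of history rows per (host, policy) pair."""
    return table_rows / (hosts * policies)


per_pair = rows_per_host_policy(TABLE_ROWS, HOSTS, POLICIES)
per_hour = per_pair / WINDOW_HOURS
seconds_between_rows = WINDOW_HOURS * 3600 / per_pair

print(f"rows per host/policy pair over the window: {per_pair:,.0f}")
print(f"rows per pair per hour: {per_hour:,.0f}")
print(f"average seconds between rows for one pair: {seconds_between_rows:.2f}")
```

If the estimate is anywhere near right, that works out to a new history row for each host every couple of seconds, which fits the suspicion that policies are being run and stored too often.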
t
yup, I'm working on a fix right now, if you run those deletes, it'll remove the policies and the history until we have the patch out