This message was deleted osquery #fleet


# fleet

Slackbot

09/17/2022, 9:20 PM

This message was deleted.

wennan.he

09/17/2022, 10:29 PM

What is the setting for Fleet's connection timeout? Where is it configured?

Kathy Satterlee

09/19/2022, 4:21 PM

Hi, @wennan.he ! How much memory do you have allocated for Fleet?

wennan.he

09/19/2022, 5:12 PM

I never limited it.

wennan.he

09/19/2022, 5:12 PM

It used over 800G at its peak.

Kathy Satterlee

09/19/2022, 9:44 PM

How is Fleet deployed? It might help to get a brief rundown of your infrastructure.

wennan.he

09/19/2022, 10:09 PM

We deployed it ourselves, but the problem has gone away.

Kathy Satterlee

09/19/2022, 10:21 PM

Sounds like there may have been a hiccup somewhere that worked itself out, I'll keep an eye out to see if it happens for anyone else or if it pops back up for you!

wennan.he

09/21/2022, 4:10 AM

I am still seeing this problem. Right now we have 20k hosts, Fleet is using 3-4G of memory, and it responds pretty slowly. I suspect some thread is taking too long accessing the DB. Is there any way I can figure out which one? @Kathy Satterlee

wennan.he

09/21/2022, 4:47 AM

I also have a new finding: in our Fleet DB, the host_software table has more than 10 million records. I suspect this table is causing the problem, and I have a couple of questions. 1. What is this table, and what is it used for? 2. I found that Fleet's host software feature can be disabled. Is it related to this table? And how do I disable it from the fleet.service file?

Kathy Satterlee

09/21/2022, 3:33 PM

What version of Fleet are you running? What path do you have set for `vulnerabilities.databases_path`? Does that folder have anything in it? Can you give a rundown of your Fleet architecture? It sounds like things may be struggling to keep up with the volume of traffic. The ‘host_software’ table tracks what software is installed on which hosts. With 20k hosts, I can definitely see that table getting quite large. You can disable software inventory and vulnerability scanning by setting `features.enable_software_inventory`: https://fleetdm.com/docs/using-fleet/vulnerability-processing#configuration

wennan.he

09/21/2022, 4:37 PM

Could you tell me where to check `vulnerabilities.databases_path`?

Kathy Satterlee

09/21/2022, 4:43 PM

You can use `fleetctl get config --include-server-config` to pull your server config and check that value.

wennan.he

09/21/2022, 5:06 PM

Is there any other way to check it?

Kathy Satterlee

09/21/2022, 5:14 PM

Do you use environment variables, a config file, or just command line flags to set up Fleet?

wennan.he

09/21/2022, 5:20 PM

I have a fleet.service file, but it doesn't contain it.

wennan.he

09/21/2022, 5:20 PM

this is the cfg

wennan.he

09/21/2022, 5:22 PM

And could you tell me whether the vulnerability processing or software inventory features would cause a huge number of requests to Fleet?

Kathy Satterlee

09/21/2022, 5:28 PM

Yes, it definitely can, especially when first enabled. Generally speaking, that activity dies down quite a bit once the initial data has been gathered. If `vulnerabilities.databases_path` isn't set in the `fleet.conf` file, it may be the culprit. If it isn't, you'll need to either define it as a command line flag `--vulnerabilities_databases_path="/some/path"` (`tmp/vulndb` is common) or add it to the configuration file as an environment variable. You can skip setting that if you disable software inventory, but I'd try making sure that is set up, restarting and seeing what happens first!

wennan.he

09/21/2022, 5:29 PM

it is not in /etc/fleet/fleet.conf

wennan.he

09/21/2022, 5:30 PM

And what is the env var for `vulnerability_settings`?

wennan.he

09/21/2022, 5:31 PM

What is the name of the `vulnerability_settings` variable I should put in that config?

Kathy Satterlee

09/21/2022, 5:31 PM

FLEET_VULNERABILITIES_DATABASES_PATH
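The variable name above appears to follow a simple pattern relative to the `vulnerabilities.databases_path` key mentioned earlier in the thread: prefix `FLEET_`, uppercase everything, and replace dots with underscores. The sketch below encodes that pattern as an assumption inferred from this one pair of names; the helper `fleet_env_var` is hypothetical and not part of Fleet, so check the Fleet configuration docs before relying on it for other keys.

```python
def fleet_env_var(config_key: str) -> str:
    """Map a dotted config key to an environment variable name.

    Assumed convention (inferred from the thread, not taken from Fleet's code):
    prefix with FLEET_, uppercase, and turn dots into underscores.
    """
    return "FLEET_" + config_key.replace(".", "_").upper()


if __name__ == "__main__":
    print(fleet_env_var("vulnerabilities.databases_path"))
```

In a systemd setup like the one described here, the resulting name would go wherever the unit's environment is defined.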

wennan.he

09/21/2022, 5:38 PM

I just created this path. Do I need to create any files under it?
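One way to answer the earlier question "Does that folder have anything in it?" is to check the directory from the Fleet server itself. This is a minimal sketch, assuming the path is exposed through the `FLEET_VULNERABILITIES_DATABASES_PATH` environment variable in the shell where the script runs; if Fleet gets the value from a config file or a flag instead, pass the path in directly. The helper name `describe_vuln_dir` is hypothetical.

```python
import os
from pathlib import Path


def describe_vuln_dir(path: str) -> dict:
    """Report whether the directory exists and how many entries it holds."""
    p = Path(path)
    if not p.is_dir():
        return {"path": path, "exists": False, "entries": 0}
    return {"path": path, "exists": True, "entries": sum(1 for _ in p.iterdir())}


if __name__ == "__main__":
    # /tmp/vulndbs is the default path mentioned in this thread; the
    # environment variable, when present, takes precedence here.
    configured = os.environ.get("FLEET_VULNERABILITIES_DATABASES_PATH", "/tmp/vulndbs")
    print(describe_vuln_dir(configured))
```

An existing but empty directory would suggest that the vulnerability data has not been downloaded yet.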

wennan.he

09/21/2022, 5:47 PM

I tried that and restarted Fleet, and it looks like it got worse. Fleet's memory usage is going higher.

Kathy Satterlee

09/21/2022, 5:48 PM

There's a lot going on there right now, I'd expect that usage to be a bit high. Vulnerability processing does require 4GB of memory.

wennan.he

09/21/2022, 5:48 PM

and you say it will die down after a while?

Kathy Satterlee

09/21/2022, 5:49 PM

Yes. There's a lot to process at first, but once the initial data gathering and scans have happened, it'll settle down quite a bit.

wennan.he

09/21/2022, 5:50 PM

So could you explain why Fleet had this problem, with high CPU and memory consumption (about 3-4G), before I set this up? And why were there so many errors (shown above) in the log?

Kathy Satterlee

09/21/2022, 5:56 PM

Things were getting bogged down because it was trying to process the vulnerabilities unsuccessfully since the database wasn't there. We've noticed that this can cause issues, so we're making some changes to give better messaging (and prevent Fleet from starting) when things aren't set up properly. https://github.com/fleetdm/fleet/issues/7810 Just to be clear though, you may see spikes in memory usage from time to time. Your baseline just shouldn't be this high.

wennan.he

09/21/2022, 5:57 PM

Hold on, that DB is there in my case. I can see a lot of records in my DB.
+------------------------------------+------------+
| table_name                         | table_rows |
+------------------------------------+------------+
| host_software                      |   15338366 |
| cve_meta                           |     191967 |
| label_membership                   |      42152 |
| host_users                         |      41266 |
| host_seen_times                    |      20983 |
| hosts                              |      19884 |
| host_device_auth                   |      19667 |
| host_operating_system              |      18793 |
| software_host_counts               |       4418 |
| software                           |       3927 |
| migration_status_tables            |        147 |
| sessions                           |         31 |
| software_cpe                       |         18 |
| software_cve                       |         15 |
| activities                         |         14 |
| aggregated_stats                   |         11 |
| migration_status_data              |          9 |
| operating_systems                  |          9 |
| labels                             |          7 |
| queries                            |          6 |
| distributed_query_campaigns        |          6 |
| distributed_query_campaign_targets |          6 |
| locks                              |          6 |
| enroll_secrets                     |          3 |
| windows_updates                    |          0 |
| carve_blocks                       |          0 |
| host_mdm                           |          0 |
| network_interfaces                 |          0 |
| users                              |          0 |
| host_emails                        |          0 |
| jobs                               |          0 |
| app_config_json                    |          0 |
| munki_issues                       |          0 |
| user_teams                         |          0 |
| scheduled_queries                  |          0 |
| invites                            |          0 |
| mobile_device_management_solutions |          0 |
| teams                              |          0 |
| host_batteries                     |          0 |
| invite_teams                       |          0 |
| statistics                         |          0 |
| host_additional                    |          0 |
| policy_membership                  |          0 |
| policies                           |          0 |
| email_changes                      |          0 |
| password_reset_requests            |          0 |
| packs                              |          0 |
| pack_targets                       |          0 |
| osquery_options                    |          0 |
| host_munki_issues                  |          0 |
| scheduled_query_stats              |          0 |
| carve_metadata                     |          0 |
| host_munki_info                    |          0 |
+------------------------------------+------------+

wennan.he

09/21/2022, 5:57 PM

Those are my tables.
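To put the `host_software` figure in perspective, it helps to divide it by the number of hosts. The sketch below parses a few rows copied from the listing above and computes that ratio, which comes out at roughly 770 rows per host. The parser `parse_mysql_table` is a hypothetical helper, and the column names suggest the counts came from MySQL's `information_schema`, where `table_rows` is only an estimate for InnoDB tables, so treat the result as approximate.

```python
def parse_mysql_table(text: str) -> dict[str, int]:
    """Parse '| name | count |' rows from MySQL client output into a dict."""
    counts = {}
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip borders, the header row, and anything that is not name + integer.
        if len(cells) == 2 and cells[1].isdigit():
            counts[cells[0]] = int(cells[1])
    return counts


# A few rows copied from the listing in the thread.
SAMPLE = """
+---------------+------------+
| table_name    | table_rows |
+---------------+------------+
| host_software |   15338366 |
| hosts         |      19884 |
| software      |       3927 |
+---------------+------------+
"""

if __name__ == "__main__":
    counts = parse_mysql_table(SAMPLE)
    per_host = counts["host_software"] / counts["hosts"]
    print(f"host_software rows per host: {per_host:.0f}")
```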

wennan.he

09/21/2022, 5:58 PM

cve_meta | 191967 | This is what you meant, right?

Kathy Satterlee

09/21/2022, 5:59 PM

I'm talking about the vulnerabilities database in the directory that you just created and set in Fleet.

wennan.he

09/21/2022, 5:59 PM

And the link says the default path is /tmp/vulndbs, and I have that too.

Kathy Satterlee

09/21/2022, 6:00 PM

Right, you have it now and things should start to settle once the processing is able to complete.

wennan.he

09/21/2022, 6:00 PM

Yes, FLEET_VULNERABILITIES_DATABASES_PATH=/var/fleet/ and I see there are a lot of files under it.

wennan.he

09/21/2022, 6:01 PM

And I can also see a lot of similar files under /tmp/vulndbs.

wennan.he

09/21/2022, 6:01 PM

If this is the root cause, how long will it take for my Fleet to return to normal?

Kathy Satterlee

09/21/2022, 6:01 PM

Exactly. Now that those are there, Fleet will be able to process vulnerabilities successfully, and things should start running smoothly.

Kathy Satterlee

09/21/2022, 6:03 PM

I can't give you an exact number there, there are a lot of variables that would contribute to the overall time it takes. You've got a lot of hosts with a lot of software so it could take a while.

wennan.he

09/21/2022, 6:03 PM

But my Fleet is still running with very high CPU usage.

Kathy Satterlee

09/21/2022, 6:03 PM

Yes, because it's still processing.

wennan.he

09/21/2022, 6:03 PM

And there are still a lot of errors in my log.

Kathy Satterlee

09/21/2022, 6:05 PM

What new errors are you seeing since restarting the server?

wennan.he

09/21/2022, 6:05 PM

Sep 21 18:05:04 n107-019-021 fleet[3090473]: {"component":"http","err":"retrieve label queries: selecting label queries for host: context canceled","ip_addr":"10.121.40.209","level":"error","method":"POST
Sep 21 18:05:04 n107-019-021 fleet[3090473]: {"component":"http","err":"retrieve label queries: selecting label queries for host: context canceled","ip_addr":"10.121.8.215","level":"error","method":"POST"
Sep 21 18:05:04 n107-019-021 fleet[3090473]: {"component":"http","err":"authentication error: find host: context canceled","level":"info","path":"/api/v1/osquery/distributed/read","ts":"2022-09-21T18:05:0
Sep 21 18:05:04 n107-019-021 fleet[3090473]: {"component":"http","err":"retrieve label queries: selecting label queries for host: context canceled","ip_addr":"10.121.94.143","level":"error","method":"POST
Sep 21 18:05:04 n107-019-021 fleet[3090473]: {"component":"http","err":"authentication error: find host: context canceled","level":"info","path":"/api/v1/osquery/distributed/read","ts":"2022-09-21T18:05:0
Sep 21 18:05:04 n107-019-021 fleet[3090473]: {"component":"http","err":"authentication error: find host: context canceled","level":"info","path":"/api/v1/osquery/config","ts":"2022-09-21T18:05:04.40348689
Sep 21 18:05:04 n107-019-021 fleet[3090473]: {"component":"http","err":"authentication error: find host: context canceled","level":"info","path":"/api/v1/osquery/distributed/read","ts":"2022-09-21T18:05:0
Sep 21 18:05:04 n107-019-021 fleet[3090473]: {"component":"http","err":"retrieve label queries: selecting label queries for host: context canceled","ip_addr":"10.121.17.61","level":"error","method":"POST"
Sep 21 18:05:04 n107-019-021 fleet[3090473]: {"component":"http","ip_addr":"10.121.108.119","level":"debug","method":"POST","took":"14.727078384s","ts":"2022-09-21T18:05:04.405334667Z","uri":"/api/v1/osqu
Sep 21 18:05:04 n107-019-021 fleet[3090473]: {"component":"http","err":"authentication error: find host: context canceled","level":"info","path":"/api/v1/osquery/distributed/read","ts":"2022-09-21T18:05:0
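With this many log lines, a quick tally of the distinct `err` messages can show whether one failure dominates. The sketch below is one possible approach, not a Fleet tool: because the pasted lines are truncated, parsing them as JSON would fail, so it pulls the `err` field out with a regular expression instead. The function name `tally_errors` and the sample list are illustrative only; the sample lines are shortened copies of entries from the log above.

```python
import re
from collections import Counter

# Matches the value of the "err" field without requiring valid JSON.
ERR_FIELD = re.compile(r'"err":"([^"]*)"')


def tally_errors(lines):
    """Count occurrences of each distinct "err" message."""
    counts = Counter()
    for line in lines:
        match = ERR_FIELD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


SAMPLE = [
    '{"component":"http","err":"retrieve label queries: selecting label queries for host: context canceled","ip_addr":"10.121.40.209","level":"error","method":"POST',
    '{"component":"http","err":"authentication error: find host: context canceled","level":"info","path":"/api/v1/osquery/distributed/read"',
    '{"component":"http","err":"authentication error: find host: context canceled","level":"info","path":"/api/v1/osquery/config"',
    '{"component":"http","ip_addr":"10.121.108.119","level":"debug","method":"POST","took":"14.727078384s"',
]

if __name__ == "__main__":
    for message, count in tally_errors(SAMPLE).most_common():
        print(count, message)
```

In the excerpt pasted here, every error ends in `context canceled`, and the one debug line reports a request that took over 14 seconds.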

wennan.he

09/21/2022, 6:07 PM

This doesn't look right; the situation is not mitigated. CPU, memory, and the errors in the log have not changed.

Kathy Satterlee

09/21/2022, 6:20 PM

Let's check in on this again in a couple of hours. That will give time for the software processing to finish and hosts to check in a couple of times.

wennan.he

09/21/2022, 6:23 PM

ok sure

Kathy Satterlee

09/21/2022, 6:48 PM

Just for some context there, it does look like there's a bit of a bottleneck with MySQL that needs to be addressed, but it would be good to see if that levels out once things have had a bit to settle or is ongoing.

wennan.he

09/21/2022, 6:54 PM

I really doubt that, because our Fleet has been running with 20k hosts for a while and never had an issue before.

wennan.he

09/21/2022, 6:55 PM

But something went wrong recently.

wennan.he

09/21/2022, 6:55 PM

Do you think a single-host MySQL cannot handle 20k hosts?

Kathy Satterlee

09/21/2022, 8:44 PM

It should be fine in theory, just might need to tweak a few things 🙂

Kathy Satterlee

09/21/2022, 8:44 PM

Can you take a look at your recent logs now and we'll see what things are looking like?

wennan.he

09/21/2022, 9:50 PM

nothing is going well.

wennan.he

09/21/2022, 9:52 PM

I don't think this is caused by the vulnerabilities database not being set up. It has been a couple of hours since it was set up.

wennan.he

09/21/2022, 9:58 PM

Fleet still has high CPU usage.

wennan.he

09/21/2022, 9:59 PM

How can I check why Fleet is doing so much computation?

Kathy Satterlee

09/21/2022, 10:08 PM

Thanks for giving it a bit to recheck. Sometimes when you find one problem, it's on to the next one. Let's keep digging in the errors and then see what the cpu usage looks like when things are running properly.

wennan.he

09/21/2022, 10:10 PM

yes, but how

Kathy Satterlee

09/21/2022, 10:11 PM

I'm noticing that all of the requests are timing out. Can you check the osquery logs on one of your hosts that is failing (based on the IP in the error) to see if there's any additional context there? If you're using Orbit (Fleet’s osquery package), here's where you can find those: https://github.com/fleetdm/fleet/tree/main/orbit#logs
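To pick which hosts to inspect, the failing IPs can be collected from the Fleet server log. This is a rough sketch under the same assumption as before, that the lines may be truncated and are therefore matched with a regular expression rather than parsed as JSON. The helper `failing_ips` is hypothetical, and the sample lines are shortened copies of entries pasted earlier in the thread.

```python
import re

# Matches the value of the "ip_addr" field without requiring valid JSON.
IP_FIELD = re.compile(r'"ip_addr":"([^"]*)"')


def failing_ips(lines):
    """Return sorted, de-duplicated ip_addr values from lines that carry an "err" field."""
    ips = set()
    for line in lines:
        if '"err":' not in line:
            continue  # Only lines that report an error are of interest.
        match = IP_FIELD.search(line)
        if match:
            ips.add(match.group(1))
    return sorted(ips)


SAMPLE = [
    '{"component":"http","err":"retrieve label queries: selecting label queries for host: context canceled","ip_addr":"10.121.40.209","level":"error"',
    '{"component":"http","err":"retrieve label queries: selecting label queries for host: context canceled","ip_addr":"10.121.8.215","level":"error"',
    '{"component":"http","err":"authentication error: find host: context canceled","level":"info","path":"/api/v1/osquery/config"',
    '{"component":"http","ip_addr":"10.121.108.119","level":"debug","method":"POST","took":"14.727078384s"',
]

if __name__ == "__main__":
    for ip in failing_ips(SAMPLE):
        print(ip)
```

Note that the `authentication error` lines in this log carry no `ip_addr` field, so only the `retrieve label queries` failures can be traced to a host this way.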

Kathy Satterlee

09/21/2022, 10:14 PM

And for vanilla osquery: https://osquery.readthedocs.io/en/stable/deployment/logging/

wennan.he

09/21/2022, 10:32 PM

OK, let me check.

wennan.he

09/21/2022, 10:54 PM

I only have permission to log in to one host, and I didn't find any useful info.

wennan.he

09/21/2022, 10:54 PM

Is there anything else I can check?
