# fleet
a
Hey folks. We're currently running Fleet in our environment in a horizontally scaled setup: multiple Fleet servers sharing one DB and Redis store. On two of our Fleet servers, devices that show as "offline" are showing as online on the other Fleet servers. The hosts/devices are indeed up when I check the osquery communication with their respective Fleet server. We are not running this setup load-balanced; rather, every DC has one Fleet server that serves as the TLS server for the osquery agents within that DC. Any ideas what could be the problem? Here's what I have checked so far:
1. Increased the maximum connections supported on the DB and Fleet servers.
2. Running the osquery agents with the `--verbose` and `--tls_dump` flags shows that the connection is stable and the agents are able to communicate properly.
3. The DB is working fine and is reflecting the latest `last_seen_time` in the hosts table.
4. Packet capture shows consistent data exchange between the MySQL server and the affected Fleet servers.
m
Are you receiving any errors?
a
Hello Martavis. Nothing that would clearly relate to the issue. I am seeing some `Bad MAC` TLS errors, but those are related to enrollments of specific hosts.
m
Is this something that's new or have you always seen this issue?
a
It was totally new, but I fixed it just now with a complete reboot of the affected Fleet server. It must be an internal networking issue related to our internal services. BTW, thank you for the help!
m
No problem at all. I'm glad to hear it's working again.