# fleet
z
Hi folks, looking for some help with connectivity to Fleet. I currently have a few RHEL hosts enrolled in Fleet. Some are connected and show as online, but others periodically show as offline. I've tried deleting them from Fleet and reinstalling my 'fleet-osquery' package, which seems to fix the issue, but not permanently: the same hosts show as offline again a few hours later. I've connected these hosts to a domain controller, so I edited `/etc/resolv.conf` with the domain nameserver IP and edited `/etc/NetworkManager/NetworkManager.conf` so that the changes in `/resolv.conf` stay the same even after reboot. I've also disabled `SELINUX` because it was not letting the hosts connect. I'm currently stumped on what could be blocking the connection or why the hosts go offline. Any help on what could be the issue?
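The thread does not show the exact edit made to `NetworkManager.conf`. One common way to keep NetworkManager from rewriting `resolv.conf` is the `dns=none` setting in the `[main]` section. A minimal sketch, assuming a drop-in file is acceptable; the helper name `write_nm_dns_dropin` and the file name `90-dns-none.conf` are hypothetical:

```shell
# Hypothetical helper: write a NetworkManager drop-in that sets dns=none,
# which tells NetworkManager not to manage resolv.conf.
# $1 is the drop-in directory (typically /etc/NetworkManager/conf.d).
write_nm_dns_dropin() {
  conf_dir="$1"
  mkdir -p "$conf_dir"
  # The file name below is an arbitrary choice for illustration.
  cat > "$conf_dir/90-dns-none.conf" <<'EOF'
[main]
dns=none
EOF
}
```

Run as root with the drop-in directory as the argument; NetworkManager would then need a reload or restart to pick up the change.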
z
Are you able to SSH onto one of the affected hosts when it goes offline? Can you `curl` the Fleet server at that time?
z
I can still SSH into them, and when I `curl` the Fleet server it returns HTML.
z
Anything in `/var/log/osquery`?
l
Also try checking syslog messages on the host (`/var/log/messages`, IIRC).
z
`/var/log/osquery` is empty and I do not see `messages` in `/var/log`.
z
`systemctl status orbit.service`?
z
shows that it is active and running
z
Can you edit `/usr/lib/systemd/system/orbit.service` to add `--debug` to the `orbit` command and then reload and restart the service?
z
sorry, still a bit new to fleet/osquery, where in `orbit.service` would i add `--debug`?
z
Can you show the contents of that file?
(I don't have a Linux box up at the moment to reference easily)
z
here are the contents, i appreciate all the help so far 😄
z
Add `--debug` at the end of the `ExecStart` line please.
Then `sudo systemctl daemon-reload && sudo systemctl restart orbit.service`
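For anyone who prefers to script that edit, here is a rough sketch. The contents of the unit file were not captured in this thread, so this assumes `ExecStart` sits on a single line with no trailing backslash continuation; `add_debug_flag` is a hypothetical helper name.

```shell
# Hypothetical helper: append --debug to the ExecStart line of a unit file.
# Assumes ExecStart is a single line (no backslash continuation).
# Leaves a backup copy next to the file with a .bak suffix.
add_debug_flag() {
  unit_file="$1"
  # Skip the edit if the ExecStart line already carries --debug.
  if ! grep -q '^ExecStart=.*--debug' "$unit_file"; then
    sed -i.bak '/^ExecStart=/ s/$/ --debug/' "$unit_file"
  fi
}
```

Run as root with `/usr/lib/systemd/system/orbit.service` as the argument, then reload and restart with the `systemctl` commands given above.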
z
doing that brought the host back online
mind if i ask what adding `--debug` did for orbit?
z
Can you check that the logs are more verbose in `systemctl status orbit.service`?
It just turned on more verbose logging. Restarting the orbit/osquery process is probably what brought it back online.
If we have more verbose logs now, hopefully we can determine what the issue is when it goes offline again.
I suspect `systemctl restart orbit.service` without any other changes would have temporarily "fixed" it, because that would have restarted the processes.
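Once a host drops offline again, the verbose output could be narrowed down with something like the sketch below. `filter_suspect_lines` is a hypothetical helper, and its keyword list is only a guess at what a connectivity failure might log, not an official list from Orbit or osquery.

```shell
# Hypothetical helper: read log text on stdin and keep only lines that
# contain keywords often seen with connectivity problems.
# The keyword list is an illustrative guess, not an Orbit/osquery reference.
filter_suspect_lines() {
  # "|| true" keeps the exit status zero when nothing matches.
  grep -Ei 'error|fail|timeout|refused|denied|x509' || true
}
```

Assuming the service logs to the systemd journal, it could be used as `journalctl -u orbit.service --since "2 hours ago" --no-pager | filter_suspect_lines`.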
z
ah i see, will those logs be in `/var/log/orbit`?
z
Yes I think so.
z
awesome, I will be keeping an eye on the host and see when it goes offline again and update here. Thank you for the help!
z
Thank you!
z
None of the hosts that had the logging turned on have gone offline since yesterday 🤦‍♂️. Still waiting on them! Anything else I could check that could be the issue in the meantime?
z
I am not sure what else we could check with them currently working as expected. Let us know if they go bad again.
z
will do!