# fleet
m
Hi everyone, I’m new to orbit/fleet and I’ve been tasked with fixing up a neglected installation, mainly on Linux servers. Our root certificate renewed a while ago and took out all of our TLS connections 😕 I’ve been able to update the fleet server cert and the client configs on the orbit side to define
--tls_server_certs
to use the new CA cert. At this point I’ve gotten a few machines to show up as Online in fleet and they even respond to queries! 😀 However, I noticed a persistent issue with what seems to be the orbit client. I’ll describe the behavior further in the thread, but the TL;DR is that the orbit client shows this log 6 times:
INF enroll failed, retrying error="enroll request: unknown"
before finally giving up and throwing this error:
ERR failed initial config fetch: RunConfigReceivers get config: orbit node key enroll failed, attempts=6
I also checked on the fleet server side, and the only errors I could find were like this:
err="host 588724 with empty platform"
Any help diagnosing this would be appreciated! More details to come in the thread! Thanks!
This segment of logs is from a fresh restart of orbit after updating the necessary certs for it to trust the fleet server. What’s interesting to me is that orbit doesn’t start osqueryd until after it fails the node key enroll the first time, which leads me to believe that this is an orbit issue, not an osquery one. It also explains why, after osquery is started, I’m able to see the machine in fleet and have it respond to queries.
Typically I wouldn’t be that concerned about some info/warning logs, but it looks like orbit keeps retrying this enroll every few minutes!
l
Hi @Max Prehl! Unlike
osqueryd
,
orbit
uses the OS root CA store. Maybe there's an issue with the new certificate? https://fleetdm.com/guides/certificates-in-fleetd#basic-article may be of help.
It may be worth updating the CA certificates on the Linux host (
sudo update-ca-certificates
)
If this is a certificate from an internal CA, another solution would be to add the internal CA to the OS CA cert store.
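One way to test the trust-store theory from the host itself is a small script that attempts a TLS handshake using the system's default CA bundle, and optionally a specific PEM file for comparison. This is only a rough sketch: the hostname is a placeholder, `build_context` and `check_tls` are made-up helper names, and Python's default trust store is assumed to match the OS store, which is typical on Linux but not guaranteed. It approximates, rather than reproduces, how orbit verifies the server.

```python
import socket
import ssl


def build_context(cafile=None):
    """Return a verifying TLS context.

    cafile=None loads the default (usually OS-wide) trust store;
    passing a PEM path trusts only the CAs in that file.
    """
    return ssl.create_default_context(cafile=cafile)


def check_tls(host, port=443, cafile=None, timeout=5.0):
    """Attempt a TLS handshake and return (ok, detail)."""
    context = build_context(cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return True, tls.version()
    except ssl.SSLCertVerificationError as exc:
        # The chain or hostname did not validate against the chosen CAs.
        return False, f"certificate verification failed: {exc.verify_message}"
    except OSError as exc:
        # DNS failure, refused connection, timeout, other TLS errors.
        return False, f"connection failed: {exc}"


if __name__ == "__main__":
    # Placeholder hostname; replace with your own Fleet server.
    print(check_tls("your-fleet-server.com"))
```

If the default-store check fails but passing the new CA file via `cafile` succeeds, that would point toward the CA not actually being picked up from the system store.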
m
Hi @Lucas Rodriguez! I really appreciate the response. I looked into this and found that my Linux hosts do have the CA cert installed system-wide, so unfortunately I don’t think that’s the root cause. I’d love to know if you have a bit more information about the orbit enroll process vs. osquery, since osquery seems to be working fine but orbit is having this issue. I also found one more interesting log. It only shows up when freshly restarting orbit:
ERR failed initial config fetch: RunConfigReceivers get config: orbit node key enroll failed, attempts=6
I’m not sure what the structure of these logs is, whether this is a fleet server response to an HTTP request or a log generated on the orbit client side. If it’s fleet/server side, I’d love to know if I could somehow turn on debug logs to find out why this request is failing. Let me know if you can point me in the right direction. In the meantime I’m happy to keep digging on this! Thanks again!
l
Some ideas: enable debug logging in orbit. On Linux, add
ORBIT_DEBUG=1
to
/etc/default/orbit
and run
sudo systemctl restart orbit
Also, what's your Fleet server version? (In your browser you can go to
<https://your-fleet-server.com/version>
)
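The same version check can be done from a terminal on one of the affected hosts, which also exercises the TLS path from that machine. A minimal sketch, assuming the `/version` path mentioned above is reachable without authentication; the base URL is a placeholder, `version_url` and `fetch_version` are made-up helper names, and the body is printed raw because the response format isn't stated in this thread.

```python
import urllib.request
from urllib.parse import urljoin


def version_url(base_url):
    """Build the /version URL from a Fleet server base URL."""
    return urljoin(base_url.rstrip("/") + "/", "version")


def fetch_version(base_url, timeout=5.0):
    """Return the raw response body of the version endpoint."""
    with urllib.request.urlopen(version_url(base_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Placeholder base URL; replace with your own Fleet server.
    try:
        print(fetch_version("https://your-fleet-server.com"))
    except OSError as exc:  # URLError is a subclass of OSError
        print(f"request failed: {exc}")
```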
m
OK, thanks for the tips on orbit, I’ll try that in a little bit. As for Fleet, I just checked and it looks like we have 4.19.0! Looks like there have been many releases since then. I’ll have to look into upgrading!
l
That version is two years old, so it's worth upgrading 🙂