# fleet
s
Orbit seemed to encounter an issue on our Linux servers last night that severely affected all of them. I'm trying to dig through the logs, but syslog has a lot 🧵
That's just the first "loud" section of logs on one server
z
@Shawn Maddock what version of Fleet are you running? what's the osquery/fleetd version on those linux servers? did any config in Fleet get changed last night or earlier yesterday?
s
I’ll have to check on the version. The Fleet server was last updated May 9. Nothing changed yesterday. I tried redeploying the server that the above logs are from and noticed orbit self-updated. It appears that particular device was hitting OOM issues, but I noticed similar logs on other servers with no resource issues. Just trying to figure out what’s going on. For now we’re rebuilding our server images without orbit to see if it makes a difference.
u
hey @Shawn Maddock (osquery) - did you see a change after the rebuild?
s
The issue has not recurred on any of our servers, so it’s a bit unknown. I mean, obviously the servers that were rebuilt without orbit are not going to have orbit errors in the logs, but we still don’t know why it blew up like that across our fleet. My best guess at this point is there was some unrelated resource issue, and instead of failing quietly, orbit compounded the issue. We may just continue forward without osquery on our servers.
l
Jun 25 22:40:26 osqueryd: osqueryd worker (555067) stopping: Maximum sustainable CPU utilization limit 300ms exceeded for 15 seconds
Do you have custom osquery watcher configuration set? This means that a query exceeded the set limits on CPU utilization.
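If it helps with digging through a large syslog, here is a minimal Python sketch that pulls out only the watchdog stop messages. The regex is modeled on the single log line quoted above, so it is an assumption that other hosts or osquery versions word the message the same way; `WatchdogStop` and `find_watchdog_stops` are hypothetical names made up for this example.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class WatchdogStop(NamedTuple):
    pid: int       # worker PID reported in the message
    limit: str     # CPU limit as printed, e.g. "300ms"
    seconds: int   # how long the limit was exceeded
    line: str      # the full original log line


# Assumption: the message format matches the one line quoted in this thread.
WATCHDOG_RE = re.compile(
    r"osqueryd worker \((?P<pid>\d+)\) stopping: "
    r"Maximum sustainable CPU utilization limit (?P<limit>\S+) "
    r"exceeded for (?P<seconds>\d+) seconds"
)


def find_watchdog_stops(lines: Iterable[str]) -> Iterator[WatchdogStop]:
    """Yield one WatchdogStop per line that matches the watchdog message."""
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            yield WatchdogStop(
                pid=int(match["pid"]),
                limit=match["limit"],
                seconds=int(match["seconds"]),
                line=line.rstrip("\n"),
            )
```

To use it, open the log file (the path depends on the host) and pass the file object to `find_watchdog_stops`. Counting the hits per host should show whether the watchdog was tripping everywhere or only on the servers that were short on resources.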
s
We do not; I assumed that was due to resource limitations on the source
l
Any new scheduled queries configured? Maybe running osquery in --verbose mode on one of these hosts can help us find the problematic query.
s
nope. and haven't seen that specific message recur since resolving the resource issue.
l
Gotcha. Let us know if it happens again.
s
Well, I guess we removed fleet, so we wouldn't see the message...
l
Oh ok.
s
sorry, have moved on to other things breaking. for now we'll just live without fleet agents on our servers
👍 1