#kolide
Dan Achin

09/22/2020, 10:48 PM
Hi. We've rolled out a POC for Fleet and osquery, and I'm trying to understand the offline timeout for our clients, as they seem to be dropping offline very quickly. I was pointed to this code as the Offline (MIA) duration, which looks like a decently long amount of time: https://github.com/kolide/fleet/blob/7494513400b1d15d3e770358350d227ffbe2e4ce/server/kolide/hosts.go#L33 . Is there a list of client events that trigger an online status? I'm assuming a config_refresh, a check-in to look for new distributed queries (distributed_interval), and probably several other client actions should be flagging them as online.
zwass

09/22/2020, 10:48 PM
Just caught your other thread. Copying over my reply.
10:49 PM
The linked code is the duration for MIA (hosts that have not been seen for 30 days). The online status is calculated for each host based on the observed intervals set for config_refresh, logger_tls_period, and distributed_interval. IIRC we give some grace period over the "expected" interval.
10:49 PM
Do you by chance have any of those intervals set to something lower than 10 seconds?
Dan Achin

09/22/2020, 10:50 PM
thanks. Yes we do. We felt 10 seconds was pretty aggressive for config_refresh and distributed_interval
10:50 PM
they are both currently set to 3600, though that was before we really understood what the latter meant and we planned to set it to 60
zwass

09/22/2020, 10:51 PM
How quickly do they go offline? Fleet should not set them to offline for quite some time if that's your interval.
Dan Achin

09/22/2020, 10:51 PM
are you recommending we keep that at 10? we noticed a lot of open connections and file descriptors to redis and thought the frequency of these check-ins might be too much
10:51 PM
most of these had just registered in the last couple of days
10:52 PM
so it seems to me that something else is amiss if they aren't showing up online after just registering
zwass

09/22/2020, 10:53 PM
Are the osquery processes still running on them?
10:53 PM
10s is a pretty typical distributed interval until you get to tens of thousands of hosts.
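For a feel of why the interval starts to matter at larger fleet sizes, here is a back-of-the-envelope sketch in Go. It assumes check-ins are spread evenly over the interval, so the average request rate is simply hosts divided by interval. The function name checkInsPerSecond and the host counts are made up for illustration and say nothing about what a given Fleet deployment can actually handle.

```go
package main

import "fmt"

// checkInsPerSecond estimates the average number of check-in requests per
// second, assuming hosts are spread evenly across the interval.
// It returns 0 for a non-positive interval.
func checkInsPerSecond(hosts int, intervalSeconds float64) float64 {
	if intervalSeconds <= 0 {
		return 0
	}
	return float64(hosts) / intervalSeconds
}

func main() {
	// Illustrative fleet sizes only.
	for _, hosts := range []int{1000, 10000, 50000} {
		fmt.Printf("%d hosts: %.1f req/s at 10s, %.1f req/s at 60s\n",
			hosts,
			checkInsPerSecond(hosts, 10),
			checkInsPerSecond(hosts, 60))
	}
}
```

This only counts distributed query check-ins. Config refreshes and log submissions add their own traffic on top.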
Dan Achin

09/22/2020, 10:53 PM
it is on mine. 🙂, but I've got a request out to verify that since I don't have access
zwass

09/22/2020, 10:53 PM
It is on yours and yours shows as offline?
Dan Achin

09/22/2020, 10:56 PM
that's another good question. 🙂. the fleet servers were apparently just taken offline as I was asking this question so I'll have to wait to access the UI again. I think the spirit of my question has been answered though - something else appears to be wrong. We'll do more digging here once the systems are back up.
zwass

09/22/2020, 11:14 PM
Something else wrong would be my top guess, followed by a bug in the Fleet code (much less likely in my judgement). Let us know what you find!
Dan Achin

09/22/2020, 11:16 PM
Definitely, thanks for the feedback. One follow up - I can see why you'd want to be relatively aggressive on distributed_interval, but less so on config_refresh. We were planning on actually reducing that one to maybe every 12 hours. Do you see the osquery / fleet community refreshing the config frequently?
zwass

09/22/2020, 11:19 PM
Yeah, no need for it to be very short for most folks. Shorter means it's quicker to see changes roll out which is nice.
Dan Achin

09/22/2020, 11:20 PM
sure, makes sense. We are reasoning that most changes would be to distributed queries. I guess we'll find out as we progress. 🙂