Andrew Zick
01/29/2025, 8:26 AM
SELECT * FROM windows_security_center that we put in our additional queries for debugging, and we have a host that returns “Good” for all the columns when live queried. However, responses from the /api/v1/fleet/hosts endpoint show the same host and the same query with this:
{
"firewall": "Error",
"antivirus": "Error",
"autoupdate": "Good",
"internet_settings": "Error",
"user_account_control": "Error",
"windows_security_center_service": "Error"
}
and the host shows "status": "offline".
It seems quite similar to this Slack post from April 2024, so possibly something about startup/shutdown timing?
Kathy Satterlee
01/29/2025, 2:12 PM
Andrew Zick
01/29/2025, 6:28 PM
> do you mean that the hosts are showing up as “offline” in Fleet, but responding to the Live query?
No, sorry, that’s poor phrasing on my part. I meant that I wouldn’t expect these hosts to have different responses saved in the /hosts endpoint when offline vs. online.
This could be a misunderstanding of Fleet server behavior on my part. Is the /hosts endpoint response simply a summary of what Fleet has in its database at the time, and the database is updated via the host detail queries that you mentioned?
> Does refetching the host update the value in the API for your additional query?
The host is offline, so I can’t refetch it via the UI, and it doesn’t seem like the /hosts/:identifier/refetch endpoint sets refetch_requested to true either (when the host is offline)? But again, I described the situation poorly, so I bet this is expected.
> Are you seeing any errors in the Fleet server logs around query ingestion (or in general)?
To be honest, I don’t think I’ve ever looked at our Fleet server logs. Are those one of these three kinds of logs? Or the underlying MySQL DB logs?
Kathy Satterlee
01/29/2025, 6:51 PM
> Is the /hosts endpoint response simply a summary of what Fleet has in its database at the time, and the database is updated via the host detail queries that you mentioned?
This might be the key factor. Any data in the Fleet UI and API is updated on a set interval while the host is online, so you're seeing the last known state of the host when you fetch things from the API.
Kathy Satterlee
01/29/2025, 6:52 PM
Andrew Zick
01/29/2025, 11:11 PM
> so you’re seeing the last known state of the host when you fetch things from the API
Hm, okay, so possibly what’s happening is that the host is sending details after the user has logged out, so Windows no longer has access to the information that windows_security_center relies on?
Andrew Zick
01/29/2025, 11:16 PM
logged_in_users. Or I’ve seen other queries that check registry entries for the firewall being enabled, but I figured those would suffer from the same not-logged-in->no-access-to-HKEYs problem 🤔
Andrew Zick
01/30/2025, 6:11 PM
Kathy Satterlee
01/30/2025, 6:36 PM
/var/log/pods if you don't have CloudWatch or Container Insights set up.
Kathy Satterlee
01/30/2025, 6:37 PM
fleetctl debug archive. That contains some aggregated logs pulled from Redis. We might not get the complete picture, but it's a good start.
Kathy Satterlee
01/30/2025, 6:38 PM
Kathy Satterlee
01/30/2025, 6:39 PM
Andrew Zick
01/30/2025, 7:31 PM
fleetctl debug archive command output:
secureframe-cdk git:(master) ✗ fleetctl debug archive
Warning: Version mismatch.
Client Version: 4.51.1
Server Version: 0.0.0-SNAPSHOT-b31e25a
Ran allocs
Ran block
Ran cmdline
Failed errors: get errors received status 500
Ran goroutine
Ran heap
Ran mutex
Ran profile
Ran threadcreate
Ran trace
Ran db-locks
Ran db-innodb-status
Ran db-process-list
################################################################################
# WARNING:
# The files in the generated archive may contain sensitive data.
# Please review them before sharing.
#
# Archive written to: fleet-profiles-archive-20250130112845Z.tar.gz
################################################################################
Don’t worry about the client mismatch, that’s on purpose.
I’m worried about the “Failed errors: get errors received status 500” line 😬, possibly meaning there are no error logs included… but I’m still DMing the archive right now.
Kathy Satterlee
01/31/2025, 4:57 PM
Kathy Satterlee
01/31/2025, 5:28 PM
osquery and fleetd for host enrollment. That usually points to VMs that are either pulling the UUID from their host machine or are hardcoded with the same UUID.
Kathy Satterlee
01/31/2025, 5:28 PM
Kathy Satterlee
01/31/2025, 5:33 PM
Kathy Satterlee
01/31/2025, 5:38 PM
instance as the host identifier. This is a unique UUID associated with the actual osquery database present on the host.
fleetctl package --host-identifier instance [...otherFlags]
Kathy Satterlee
01/31/2025, 5:41 PM
Andrew Zick
01/31/2025, 7:34 PM
> Do you have a lot of VMs in your environment? And is this host one of them?
I don’t believe this host is a VM, based on the customer it belongs to. We definitely have a handful of VMs (often EC2 instances that people install the agent on), but they’re a minority of our ~10,000 hosts.
Andrew Zick
01/31/2025, 7:59 PM
--host-identifier, we do something kinda funky. Back in ~2022, when we first set up Fleet, the original dev added an option to fleetctl package that allows passing in a specific “HostId”, which is then passed into Orbit’s startup command, like so:
ExecStart=/opt/orbit/bin/orbit/orbit {{ if .HostId }} -- --host_identifier specified --specified_identifier {{ .HostId }} {{ end }}
“HostId” could be duplicated if a customer installs the same agent package on multiple hosts, so some of those errors are plausible and expected.
But combined with what you’re saying, I wonder if the first time Orbit starts, it isn’t using our custom HostId but the default UUID instead? I say this because our specified identifiers are three UUIDs joined with “:”, so a single UUID seems wrong.
We’ve had a long-running issue with “ghost devices”, where two hosts share the same serial number + UUID, but one never checks in and is missing most info. I’ll attach a screenshot as an example.
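To list candidate ghost devices, a sketch like this groups an exported host list by (serial number, UUID) and keeps only pairs shared by more than one host record. The `hostRow` type and its fields are hypothetical; in practice they might be filled from the Fleet hosts API (for example its `hardware_serial` and `uuid` fields, names to verify), and adding a last-seen timestamp would show which record in each pair never checks in.

```go
package main

import "fmt"

// hostRow is a hypothetical, minimal view of one enrolled host.
type hostRow struct {
	ID     uint
	Serial string
	UUID   string
}

// duplicateGroups returns every group of hosts that share the same
// (serial, UUID) pair, keeping groups in first-seen order.
func duplicateGroups(hosts []hostRow) [][]hostRow {
	type key struct{ serial, uuid string }
	groups := make(map[key][]hostRow)
	var order []key
	for _, h := range hosts {
		k := key{h.Serial, h.UUID}
		if _, seen := groups[k]; !seen {
			order = append(order, k)
		}
		groups[k] = append(groups[k], h)
	}
	var dups [][]hostRow
	for _, k := range order {
		if len(groups[k]) > 1 {
			dups = append(dups, groups[k])
		}
	}
	return dups
}

func main() {
	hosts := []hostRow{
		{ID: 1, Serial: "SN-A", UUID: "uuid-1"},
		{ID: 2, Serial: "SN-B", UUID: "uuid-2"},
		{ID: 3, Serial: "SN-A", UUID: "uuid-1"}, // possible ghost of host 1
	}
	for _, g := range duplicateGroups(hosts) {
		fmt.Printf("serial=%s uuid=%s shared by %d hosts\n", g[0].Serial, g[0].UUID, len(g))
	}
}
```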
However, this is beyond the original scope of this thread/question, and the original host in question does not have a ghost host associated with it.
I don’t see the original host’s UUID in the Fleet webserver logs, but it hasn’t checked in for the last 9 days, so this isn’t surprising. I think my next step is to monitor the original host and check the webserver logs when it next comes online?