# fleet
b
Good morning/afternoon, I am having an issue where these 6 empty hosts keep re-enrolling themselves with no data. I have tried deleting them through the Fleet UI and the API, but they keep coming back. I have tried sifting through logs and the only useful information I can find is a generic error stating "Authentication required". I pulled this via the API using the /debug endpoint:
Copy code
{
  "count": 1353044,
  "chain": [
    {
      "message": "Authentication required"
    },
    {
      "data": {
        "timestamp": "2023-02-16T11:39:00-05:00"
      },
      "stack": [
        "<http://github.com/fleetdm/fleet/v4/server/service.(*Service).AuthenticateOrbitHost|github.com/fleetdm/fleet/v4/server/service.(*Service).AuthenticateOrbitHost> (orbit.go:91)",
        "<http://github.com/fleetdm/fleet/v4/server/service.authenticatedOrbitHost.func1|github.com/fleetdm/fleet/v4/server/service.authenticatedOrbitHost.func1> (endpoint_middleware.go:132)",
        "<http://github.com/fleetdm/fleet/v4/server/service.logged.func1|github.com/fleetdm/fleet/v4/server/service.logged.func1> (endpoint_middleware.go:225)",
        "<http://github.com/fleetdm/fleet/v4/server/service/middleware/authzcheck.(*Middleware).AuthzCheck.func1.1|github.com/fleetdm/fleet/v4/server/service/middleware/authzcheck.(*Middleware).AuthzCheck.func1.1> (authzcheck.go:31)",
        "<http://github.com/go-kit/kit/transport/http.Server.ServeHTTP|github.com/go-kit/kit/transport/http.Server.ServeHTTP> (server.go:121)",
        "<http://github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerRequestSize.func2|github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerRequestSize.func2> (instrument_server.go:245)",
        "net/http.HandlerFunc.ServeHTTP (server.go:2109)",
        "<http://github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerResponseSize.func1|github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerResponseSize.func1> (instrument_server.go:284)",
        "net/http.HandlerFunc.ServeHTTP (server.go:2109)",
        "<http://github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1|github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1> (instrument_server.go:142)",
        "net/http.HandlerFunc.ServeHTTP (server.go:2109)"
      ]
    }
  ]
}
Any other ideas or advice would be appreciated. Let me know if there's any additional information I can provide.
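(Editor's note: if anyone else wants to pull the same error chain, below is a minimal sketch in Go. It assumes the aggregated error store is served at GET /debug/errors on the Fleet server and accepts a standard API bearer token; both the path and the auth scheme are assumptions, so adjust them for your deployment and Fleet version.)
Copy code
// errdump.go: fetch Fleet's aggregated error store.
// Assumes GET /debug/errors and a bearer API token; adjust for your deployment.
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	server := os.Getenv("FLEET_URL")  // e.g. https://fleet.example.com (assumed env var)
	token := os.Getenv("FLEET_TOKEN") // API token for an admin or API-only user (assumed env var)

	req, err := http.NewRequest(http.MethodGet, server+"/debug/errors", nil)
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Authorization", "Bearer "+token)

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(body))
}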
k
Hey @Benjamin Heater can you take a look at this ticket to see if your issue may be related?
l
Any chance these ghost hosts that keep re-enrolling are Linux-based and the /sys/class/dmi/id/product_uuid file does not exist on them? See this comment from the issue Kathy posted.
b
Yes, I just had a read through the ticket and it looks exactly like my issue. Here's a screenshot from my Fleet server that I had meant to include before:
All of the hosts in my environment were enrolled using packages generated by fleetctl and pushed by Ansible.
More than likely the culprits are Linux based, as I only have a handful of Windows hosts in this lab environment, all of which are powered down at the moment.
k
Sounds likely. What Linux distro(s) are you using?
b
Debian.
l
@Benjamin Heater What version? This is so we can try to reproduce the issue.
b
@Lucas Rodriguez
Copy code
fleetctl - version 4.22.1
  branch:       HEAD
  revision:     43881e1de734dda1321c532c6c244fe9ca607e00
  build date:   2022-10-27
  build user:   runner
  go version:   go1.19.1
Table Output
Copy code
uuid                                 build    id status  osquery_version os_version              seen_time            updated_at
----                                 -----    -- ------  --------------- ----------              ---------            ----------
c97d3c18-e090-4c6d-9387-9ecafc438422          21 offline 5.6.0           Debian GNU/Linux 11.0.0 2023-01-23T23:18:21Z 2023-01-23T22:49:02Z
3425f955-002e-49b3-b77e-558a10debdb6          22 online  5.7.0           Debian GNU/Linux 11.0.0 2023-02-16T17:39:06Z 2023-02-16T17:28:52Z
8d67027e-c199-4046-a808-ae7e71711bd6          24 online  5.7.0           Debian GNU/Linux 11.0.0 2023-02-16T17:39:06Z 2023-02-16T17:11:55Z
f7f861b3-00f3-47fe-9f96-2c9778dab88d          26 online  5.7.0           Debian GNU/Linux 11.0.0 2023-02-16T17:39:11Z 2023-02-16T17:21:54Z
ae28ecfe-fde2-4640-97bb-9f779e89c236          28 online  5.7.0           Debian GNU/Linux 11.0.0 2023-02-16T17:39:11Z 2023-02-16T17:35:37Z
2bcbc49c-fe40-4335-a76c-bb00e6d34fd0          30 online  5.7.0           Debian GNU/Linux 10.0.0 2023-02-16T17:39:06Z 2023-02-16T17:15:39Z
9ff0a86e-c91a-4c98-b889-4074745f5987          32 online  5.7.0           Debian GNU/Linux 11.0.0 2023-02-16T17:39:06Z 2023-02-16T17:12:36Z
b18dd4e0-73dd-4ba6-8d04-eedf9fd2aba6          33 online  5.7.0           Debian GNU/Linux 11.0.0 2023-02-16T17:39:06Z 2023-02-16T17:07:45Z
a279b3d7-8706-4e54-88c3-e91614c746bf          34 online  5.7.0           Ubuntu 20.04.5 LTS      2023-02-16T17:39:06Z 2023-02-16T17:20:09Z
                                           16052 offline                                         2023-02-16T17:27:05Z 2023-02-16T17:27:05Z
                                           16053 offline                                         2023-02-16T17:27:06Z 2023-02-16T17:27:06Z
                                           16054 offline                                         2023-02-16T17:27:15Z 2023-02-16T17:27:15Z
                                           16055 offline                                         2023-02-16T17:27:16Z 2023-02-16T17:27:16Z
                                           16056 offline                                         2023-02-16T17:27:17Z 2023-02-16T17:27:17Z
                                           16057 offline                                         2023-02-16T17:27:27Z 2023-02-16T17:27:27Z
JSON Output
Copy code
[{"uuid":"c97d3c18-e090-4c6d-9387-9ecafc438422","build":"","id":21,"status":"offline","osquery_version":"5.6.0","os_version":"Debian GNU/Linux 11.0.0","seen_time":"2023-01-23T23:18:21Z","updated_at":"2023-01-23T22:49:02Z"},{"uuid":"3425f955-002e-49b3-b77e-558a10debdb6","build":"","id":22,"status":"online","osquery_version":"5.7.0","os_version":"Debian GNU/Linux 11.0.0","seen_time":"2023-02-16T17:39:06Z","updated_at":"2023-02-16T17:28:52Z"},{"uuid":"8d67027e-c199-4046-a808-ae7e71711bd6","build":"","id":24,"status":"online","osquery_version":"5.7.0","os_version":"Debian GNU/Linux 11.0.0","seen_time":"2023-02-16T17:39:06Z","updated_at":"2023-02-16T17:11:55Z"},{"uuid":"f7f861b3-00f3-47fe-9f96-2c9778dab88d","build":"","id":26,"status":"online","osquery_version":"5.7.0","os_version":"Debian GNU/Linux 11.0.0","seen_time":"2023-02-16T17:39:11Z","updated_at":"2023-02-16T17:21:54Z"},{"uuid":"ae28ecfe-fde2-4640-97bb-9f779e89c236","build":"","id":28,"status":"online","osquery_version":"5.7.0","os_version":"Debian GNU/Linux 11.0.0","seen_time":"2023-02-16T17:39:11Z","updated_at":"2023-02-16T17:35:37Z"},{"uuid":"2bcbc49c-fe40-4335-a76c-bb00e6d34fd0","build":"","id":30,"status":"online","osquery_version":"5.7.0","os_version":"Debian GNU/Linux 10.0.0","seen_time":"2023-02-16T17:39:06Z","updated_at":"2023-02-16T17:15:39Z"},{"uuid":"9ff0a86e-c91a-4c98-b889-4074745f5987","build":"","id":32,"status":"online","osquery_version":"5.7.0","os_version":"Debian GNU/Linux 11.0.0","seen_time":"2023-02-16T17:39:06Z","updated_at":"2023-02-16T17:12:36Z"},{"uuid":"b18dd4e0-73dd-4ba6-8d04-eedf9fd2aba6","build":"","id":33,"status":"online","osquery_version":"5.7.0","os_version":"Debian GNU/Linux 11.0.0","seen_time":"2023-02-16T17:39:06Z","updated_at":"2023-02-16T17:07:45Z"},{"uuid":"a279b3d7-8706-4e54-88c3-e91614c746bf","build":"","id":34,"status":"online","osquery_version":"5.7.0","os_version":"Ubuntu 20.04.5 LTS","seen_time":"2023-02-16T17:39:06Z","updated_at":"2023-02-16T17:20:09Z"},{"uuid":"","build":"","id":16052,"status":"offline","osquery_version":"","os_version":"","seen_time":"2023-02-16T17:27:05Z","updated_at":"2023-02-16T17:27:05Z"},{"uuid":"","build":"","id":16053,"status":"offline","osquery_version":"","os_version":"","seen_time":"2023-02-16T17:27:06Z","updated_at":"2023-02-16T17:27:06Z"},{"uuid":"","build":"","id":16054,"status":"offline","osquery_version":"","os_version":"","seen_time":"2023-02-16T17:27:15Z","updated_at":"2023-02-16T17:27:15Z"},{"uuid":"","build":"","id":16055,"status":"offline","osquery_version":"","os_version":"","seen_time":"2023-02-16T17:27:16Z","updated_at":"2023-02-16T17:27:16Z"},{"uuid":"","build":"","id":16056,"status":"offline","osquery_version":"","os_version":"","seen_time":"2023-02-16T17:27:17Z","updated_at":"2023-02-16T17:27:17Z"},{"uuid":"","build":"","id":16057,"status":"offline","osquery_version":"","os_version":"","seen_time":"2023-02-16T17:27:27Z","updated_at":"2023-02-16T17:27:27Z"}]
l
Thanks!
We will post updates on this ticket.
b
Thank you!
l
Thank you for the detailed report! 🙂
@Benjamin Heater Just enrolled a Debian 11 host with no issues. Do you have other Linux distros that could be having this issue? (Other than Debian 11)
Seems the hosts you show there (Ubuntu 20.04, Debian 10 and 11) are enrolled successfully. So maybe the issue is with other distros?
b
@Lucas Rodriguez that's the part where I'm having issues troubleshooting this. These are all Linux containers running on a Proxmox host. I've had no issues enrolling them at one point or another, but I'm struggling to recall when this started happening or why. I can't pinpoint one event or one issue that would have contributed to it. It's possible -- but I can't recall specifically -- that there may have been some cloned instances or instances rolled back to a prior snapshot. But there's just not enough diagnostic data to go off of at this point to pinpoint a root cause. I may have to break out tcpdump to see if I can get more info, because the hosts re-enroll even after deletions, so I can time it around that.
l
Are you able to ssh into one of these VMs? (And check if /sys/class/dmi/id/product_uuid exists.) Seems Proxmox VMs run a modified Debian/Ubuntu Linux?
b
Proxmox is the hypervisor, running QEMU/KVM. It also utilizes LXC (Linux Containers). In the case of these particular Linux guests, they are all LXC. I did some testing on the Linux containers. I use unprivileged containers in all cases with Proxmox, as that is the configuration the Proxmox documentation recommends for security purposes. /sys/class/dmi/id/product_uuid does exist on the unprivileged containers, but because they are containers and unprivileged, they cannot read /sys/class/dmi/id/product_uuid, which is mapped from the host operating system. I created a privileged container for a quick test and found that on that container I was able to read /sys/class/dmi/id/product_uuid, due to it being directly mapped to the root user's namespace. Not sure if that's directly the issue, but the file is unreadable on unprivileged containers. I'm not sure how this correlates to other people's environments on the GitHub issue, but it could be a permissions issue with other hypervisors as well. However, reading that file would not be an issue on fully virtualized guests such as VMs.
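(Editor's note: for anyone wanting to reproduce the distinction Benjamin describes -- the file exists but cannot be read from an unprivileged container -- here is a minimal, Fleet-agnostic check. It only distinguishes "missing" from "present but unreadable".)
Copy code
// uuidcheck.go: distinguish a missing DMI product UUID from one that exists but is unreadable.
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"strings"
)

func main() {
	const path = "/sys/class/dmi/id/product_uuid"

	// Does the file exist at all? (It may be absent on some containers/VMs.)
	if _, err := os.Stat(path); errors.Is(err, fs.ErrNotExist) {
		fmt.Println("product_uuid does not exist")
		return
	}

	data, err := os.ReadFile(path)
	if err != nil {
		// In an unprivileged LXC container this typically fails with a
		// permission error even though the file is visible.
		fmt.Println("product_uuid exists but is unreadable:", err)
		return
	}
	fmt.Println("product_uuid:", strings.TrimSpace(string(data)))
}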
@Lucas Rodriguez just adding some additional detail here as I've continued to try to pick the issue apart. I enabled debug logging on the Fleet server and followed the log as I removed hosts and watched them re-enroll. I noticed a particular pattern:
1. Remove device
2. Fleet registers a new device:
{"hostID":16214,"level":"info","ts":"2023-02-16T22:32:31.295926453Z"}
3. Then the following two events happen within milliseconds of each other:
Copy code
{
  "component": "http",
  "ip_addr": "10.0.0.6:44954",
  "level": "debug",
  "method": "POST",
  "took": "1.168925ms",
  "ts": "2023-02-16T22:32:31.303012658Z",
  "uri": "/api/fleet/orbit/config",
  "x_for_ip_addr": ""
}
Copy code
{
  "component": "http",
  "hardware_uuid": "a1897f32-6e00-4321-90e9-fd85604327e9",
  "level": "debug",
  "method": "POST",
  "took": "12.646874ms",
  "ts": "2023-02-16T22:32:31.306260822Z",
  "uri": "/api/fleet/orbit/enroll",
  "user": "unauthenticated"
}
These events always appear next to each other, although the order in which they're logged isn't always consistent. Another example:
Copy code
{"hostID":16212,"level":"info","ts":"2023-02-16T22:32:21.072867017Z"}

{"component":"http","hardware_uuid":"979fbc3d-62fc-47f6-b870-cf8e637409fe","level":"debug","method":"POST","took":"12.033707ms","ts":"2023-02-16T22:32:21.081909305Z","uri":"/api/fleet/orbit/enroll","user":"unauthenticated"}

{"component":"http","ip_addr":"10.0.0.4:57200","level":"debug","method":"POST","took":"1.956903ms","ts":"2023-02-16T22:32:21.085456173Z","uri":"/api/fleet/orbit/config","x_for_ip_addr":""}
Those UUIDs are consistent, such that when a device tries to re-enroll, it keeps the same UUID:
Copy code
Feb 16 16:48:31 fleetdm fleet[205760]: {"component":"http","hardware_uuid":"a1897f32-6e00-4321-90e9-fd85604327e9","level":"debug","method":"POST","took":"10.019654ms","ts":"2023-02-16T21:48:31.446982311Z","uri":"/api/fleet/orbit/enroll","user":"unauthenticated"}
Feb 16 16:51:31 fleetdm fleet[205814]: {"component":"http","hardware_uuid":"a1897f32-6e00-4321-90e9-fd85604327e9","level":"debug","method":"POST","took":"8.098737ms","ts":"2023-02-16T21:51:31.488982941Z","uri":"/api/fleet/orbit/enroll","user":"unauthenticated"}
Feb 16 16:58:31 fleetdm fleet[205814]: {"component":"http","hardware_uuid":"a1897f32-6e00-4321-90e9-fd85604327e9","level":"debug","method":"POST","took":"18.17667ms","ts":"2023-02-16T21:58:31.133877303Z","uri":"/api/fleet/orbit/enroll","user":"unauthenticated"}
Feb 16 17:09:31 fleetdm fleet[205814]: {"component":"http","hardware_uuid":"a1897f32-6e00-4321-90e9-fd85604327e9","level":"debug","method":"POST","took":"14.257451ms","ts":"2023-02-16T22:09:31.194199184Z","uri":"/api/fleet/orbit/enroll","user":"unauthenticated"}
Feb 16 17:32:31 fleetdm fleet[205814]: {"component":"http","hardware_uuid":"a1897f32-6e00-4321-90e9-fd85604327e9","level":"debug","method":"POST","took":"12.646874ms","ts":"2023-02-16T22:32:31.306260822Z","uri":"/api/fleet/orbit/enroll","user":"unauthenticated"}
So, my hunch at the moment is that already-registered hosts are trying to re-register. But, I'm trying to find more log evidence to corroborate this.
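(Editor's note: one way to corroborate that hunch from the debug log itself is to group the /api/fleet/orbit/enroll lines by hardware_uuid and count repeats. A rough sketch follows; it assumes one JSON object per line in the format shown above, and simply skips lines it cannot parse, such as those with a journald prefix.)
Copy code
// enrollcount.go: count orbit enroll attempts per hardware_uuid from Fleet debug logs on stdin.
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
)

// logLine mirrors only the fields we care about from the debug log entries shown above.
type logLine struct {
	URI          string `json:"uri"`
	HardwareUUID string `json:"hardware_uuid"`
}

func main() {
	counts := map[string]int{}
	sc := bufio.NewScanner(os.Stdin)
	sc.Buffer(make([]byte, 0, 1024*1024), 1024*1024) // allow long log lines
	for sc.Scan() {
		var l logLine
		if err := json.Unmarshal(sc.Bytes(), &l); err != nil {
			continue // skip lines that aren't bare JSON (e.g. journald-prefixed ones)
		}
		if l.URI == "/api/fleet/orbit/enroll" && l.HardwareUUID != "" {
			counts[l.HardwareUUID]++
		}
	}
	for uuid, n := range counts {
		if n > 1 {
			fmt.Printf("%s enrolled %d times\n", uuid, n)
		}
	}
}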
l
Hi @Benjamin Heater!
I did some testing on the Linux containers. I use unprivileged containers in all cases with Proxmox, as it is the recommended configuration in the Proxmox documentation for security purposes.
We haven't developed or tested Orbit for running as non-root (unprivileged), so you might be hitting untested/unsupported scenarios.
So, my hunch at the moment is that already-registered hosts are trying to re-register. But, I'm trying to find more log evidence to corroborate this.
You might be hitting a re-enroll loop between two or more hosts trying to enroll to Fleet with the same UUID (which could happen if two instances of Orbit are using the same /sys/class/dmi/id/product_uuid file).
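(Editor's note: to make the suspected loop concrete, here is a toy sketch of what can happen when two Orbit instances present the same hardware UUID. It is NOT Fleet's enrollment code; the assumption that each successful enroll replaces the node key the server trusts for that UUID is made purely for illustration, but it matches the observed pattern of repeated "Authentication required" errors followed by re-enrolls.)
Copy code
// reenroll_loop.go: toy illustration of two hosts sharing one hardware UUID.
// Not Fleet's implementation; it only sketches why the ghost hosts keep coming back.
package main

import "fmt"

// server maps a hardware UUID to the node key it currently trusts.
type server struct{ trusted map[string]string }

// enroll issues a fresh node key for the UUID, replacing whatever was trusted before
// (an assumption made for illustration only).
func (s *server) enroll(uuid, newKey string) { s.trusted[uuid] = newKey }

// checkIn succeeds only if the presented key is the one currently trusted for the UUID.
func (s *server) checkIn(uuid, key string) bool { return key != "" && s.trusted[uuid] == key }

func main() {
	srv := &server{trusted: map[string]string{}}
	uuid := "a1897f32-6e00-4321-90e9-fd85604327e9" // same value read by both containers

	keys := map[string]string{"hostA": "", "hostB": ""}
	for round := 1; round <= 3; round++ {
		for _, name := range []string{"hostA", "hostB"} {
			if !srv.checkIn(uuid, keys[name]) {
				// "Authentication required" -> the host re-enrolls, which in turn
				// invalidates the key the *other* host is still using.
				keys[name] = fmt.Sprintf("%s-key-%d", name, round)
				srv.enroll(uuid, keys[name])
				fmt.Printf("round %d: %s re-enrolled with UUID %s\n", round, name, uuid)
			}
		}
	}
}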