# fleet
m
Hello - one more question: I've installed the agent on a macOS system, and I'm seeing this error on the server side: "Host 1 was not found in the database". This is the first agent I've attempted to join to FleetDM. I'm not sure what this error means or what's causing it.
k
That's odd! Are you using vanilla osquery, or the Fleet installer?
m
Fleet installer - I'm using fleetctl to generate the package and then run it. Here's the whole error log:
{"component":"http","err":"get host: : Host 1 was not found in the datastore","level":"error","method":"GET","took":"1.471236ms","ts":"2023-01-24T16:45:48.024415587Z","uri":"/api/latest/fleet/hosts/1","user":"<REMOVED USERNAME>"}
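For anyone reading along: the Fleet server log is one JSON object per line, so the error entries can be pulled out with a few lines of Python. This is only a minimal sketch. The function name `error_entries` is made up for illustration, and the sample is a trimmed copy of the log line above.

```python
import json

def error_entries(lines):
    """Yield (timestamp, error, uri) for each JSON log line whose level is "error"."""
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if isinstance(entry, dict) and entry.get("level") == "error":
            yield entry.get("ts"), entry.get("err"), entry.get("uri")

# Trimmed copy of the log line quoted in this thread (some fields omitted).
sample = [
    '{"component":"http","err":"get host: : Host 1 was not found in the datastore",'
    '"level":"error","method":"GET","uri":"/api/latest/fleet/hosts/1"}',
]

for ts, err, uri in error_entries(sample):
    print(uri, "->", err)
```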
k
Is your host showing up in the UI?
m
Unfortunately no, I keep running into certificate issues which I am attempting to resolve.
k
That's likely the culprit. The host may have partially enrolled. Let's table that error for now until the cert issues are resolved and see if it resolves itself.
m
So I'm past the cert issue - but it looks like I'm running into the same issue with the self-hosted solution that I did with the sandbox. It's online for less than a minute and then drops. So the good news is that it doesn't look like the sandbox is the issue!
I'm pulling the error log now.
{"component":"http","err":": Authentication required","internal":" authentication error: invalid orbit node key","level":"info","path":"/api/fleet/orbit/config","ts":"2023-01-24T19:58:26.821710023Z"}
{"hostID":1,"level":"info","ts":"2023-01-24T19:58:26.964969142Z"}
{"component":"http","err":"Requires Fleet Premium license","ip_addr":"<IP>","level":"error","method":"GET","took":"1.46443ms","ts":"2023-01-24T19:58:27.550638717Z","uri":"/api/latest/fleet/device/4c153ef3-64aa-4b21-ae04-0b2e97aad5f0/desktop","x_for_ip_addr":"<IP>"}
{"err":"host 1 with empty platform","level":"error","ts":"2023-01-24T19:59:17.468252549Z"}
{"err":"host 1 with empty platform","level":"error","ts":"2023-01-24T19:59:18.524011199Z"}
{"err":"host 1 with empty platform","level":"error","ts":"2023-01-24T19:59:19.586713516Z"}
{"err":"host 1 with empty platform","level":"error","ts":"2023-01-24T19:59:20.644208803Z"}
{"err":"host 1 with empty platform","level":"error","ts":"2023-01-24T19:59:21.698893489Z"}
{"err":"host 1 with empty platform","level":"error","ts":"2023-01-24T19:59:47.557048459Z"}
{"cron":"integrations","level":"info","schedule":"integrations","status":"pending","ts":"2023-01-24T20:00:00.022559417Z"}
{"cron":"integrations","level":"info","schedule":"integrations","status":"completed","ts":"2023-01-24T20:00:00.029790451Z"}
k
This really has been a bit of an adventure! We might need to do one more round of completely removing everything on the host, but first can you try removing the host from Fleet to see if it is able to successfully re-enroll now that the certificate issue is resolved?
m
Hello! Indeed it has 🙂 - so it looks like I was wrong about the certificate issue being resolved. I've still got the same enrollment issue as before - but after being introduced to the lovely fleetctl debug connection command I'm seeing another certificate error. So I'll fix that and follow up!
k
Was going to suggest using debug in the package creation next!
m
So I THINK we're finally past the cert problems. Still having the same issue, unfortunately. But I can run the debug connection and get no certificate errors, so hooray for progress! Here's the latest error:
Jan 25 21:09:44 fleetdm orbit[14103]: 2023-01-25T21:09:44Z INF enroll failed, retrying error="enroll request: POST /api/fleet/orbit/enroll: Post \"https://fleetdm.pluralsight.com/api/fleet/orbit/enroll\": dial tcp 34.215.82.168:443: i/o timeout"
Jan 25 21:10:14 fleetdm orbit[14103]: 2023-01-25T21:10:14Z INF enroll failed, retrying error="enroll request: POST /api/fleet/orbit/enroll: Post \"https://fleetdm.pluralsight.com/api/fleet/orbit/enroll\": dial tcp 34.215.82.168:443: i/o timeout"
Jan 25 21:10:14 fleetdm orbit[14103]: 2023-01-25T21:10:14Z INF initial update to fetch extensions from /config API failed error="extensionsUpdate: error getting extensions config from fleet: orbit node key enroll failed, attempts=3"
I can confirm that IP resolves to the correct DNS entry, so I think we're good there.
No iptables issues, UFW is disabled (Ubuntu) on the server.
On the EC2 instance, we are allowing all 443 outbound, and 443 inbound from the test workstation.
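A quick way to tell "packets are being dropped" apart from "nothing is listening" is to open a plain TCP connection and look at how it fails. Here is a minimal sketch using only the Python standard library. The function name `check_tcp` and its return strings are made up for illustration. A "timeout" result typically points at a firewall or security group silently dropping traffic, while "refused" usually means the host is reachable but no service is bound to that port.

```python
import socket

def check_tcp(host, port, timeout=5.0):
    """Try a TCP connect and classify the result as open, timeout, or refused."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except (socket.timeout, TimeoutError):
        # No answer at all: often a firewall or security group dropping packets.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on this port.
        return "refused"
    except OSError as exc:
        # DNS failures, unreachable networks, and so on.
        return f"error: {exc}"
```

For example, `check_tcp("fleet.example.com", 443)` (hostname is a placeholder) can be run from the workstation that fails to enroll.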
k
How would you feel about me enrolling a vm to see what happens?
And what are you seeing in the Fleet server logs?
m
Sure, that's fine.
k
Mind sending me a DM with the flags I should use to generate the package?
m
OS?
k
Mac
Looks like the same issue on my end, digging in.
Can you show me the latest log in Fleet referencing /api/fleet/orbit/enroll?
It looks like requests might not be making it to Fleet. I attempted sending a curl request and got no response. Are you using a load balancer, or anything else that might be restricting access?
m
Not that I'm aware of. I disabled our VPN to remove that from the equation. We are using EC2 Security Groups with 443 open - are there other ports that need opened?
k
We're a little out of my wheelhouse at the moment so I may need to loop someone in. Here's how we're setting up security groups in our terraform: https://github.com/fleetdm/fleet/blob/fa720900d99e1dd222b0a1f69be8acf91d917cdf/tools/terraform/ecs-sgs.tf
m
I realized I goofed - Our SG was set to block all IPs except for mine. Sorry about that! I can add your IP in for testing if you like.
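To illustrate why one machine could reach the server while another timed out: an allow-list rule only admits source addresses inside its CIDR ranges. The sketch below is a simplified model, not how AWS evaluates security groups internally. The function name `is_allowed` is made up, and the addresses come from the reserved documentation ranges rather than from this thread.

```python
import ipaddress

def is_allowed(source_ip, allowed_cidrs):
    """Return True if source_ip falls inside any of the allowed CIDR ranges."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)

# Hypothetical rule set: a single /32 entry admits exactly one address.
rules = ["203.0.113.10/32"]

print(is_allowed("203.0.113.10", rules))  # True: the allow-listed workstation
print(is_allowed("198.51.100.7", rules))  # False: any other client is dropped
```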
k
Did that do the trick for your machine?
m
Nah, same issue unfortunately.
k
That makes sense since your IP address was allowed :) It also sounds like you’re able to hit the front end, right?
m
Yep all good there.
k
What happens if you try to hit the enroll endpoint?
curl -X POST -v <server url>/api/v1/osquery/enroll
m
Failed - I see it's trying port 80 which I have blocked at the moment.
Let me unblock that and give it another go!
Looks like the connection is being refused now. Is there a configuration in fleet where I need to serve up port 80?
k
That's odd, it should be using 443
m
I changed the curl command to use 443, but it is insisting on using 80.
Here's my flags file, if that helps:
# Server
--tls_hostname=fleetdm.pluralsight.com
--tls_server_certs=fleet.pem
# Enrollment
--host_identifier=instance
--enroll_secret_path=/opt/orbit/secret.txt
--enroll_tls_endpoint=/api/v1/osquery/enroll
# Configuration
--config_plugin=tls
--config_tls_endpoint=/api/v1/osquery/config
--config_refresh=10
# Live query
--disable_distributed=false
--distributed_plugin=tls
--distributed_interval=10
--distributed_tls_max_attempts=3
--distributed_tls_read_endpoint=/api/v1/osquery/distributed/read
--distributed_tls_write_endpoint=/api/v1/osquery/distributed/write
# Logging
--logger_plugin=tls
--logger_tls_endpoint=/api/v1/osquery/log
--logger_tls_period=10
# File carving
--disable_carver=false
--carver_start_endpoint=/api/v1/osquery/carve/begin
--carver_continue_endpoint=/api/v1/osquery/carve/blocki
--carver_block_size=2000000
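When checking a flags file like this, it can help to list the full URLs the agent will call. The sketch below parses `--name=value` lines and joins each `*_endpoint` path onto `tls_hostname`. It assumes the endpoints are served over HTTPS, and the helper name `parse_flagfile` is made up for illustration. The sample is a shortened copy of the flags above.

```python
def parse_flagfile(text):
    """Parse osquery-style flag lines ("--name=value") into a dict, skipping comments."""
    flags = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("--"):
            continue  # comments and blank lines
        name, _, value = line[2:].partition("=")
        flags[name] = value
    return flags

# Shortened copy of the flags file from this thread.
sample = """\
# Server
--tls_hostname=fleetdm.pluralsight.com
# Enrollment
--enroll_tls_endpoint=/api/v1/osquery/enroll
# Configuration
--config_tls_endpoint=/api/v1/osquery/config
"""

flags = parse_flagfile(sample)
for name in sorted(flags):
    if name.endswith("_endpoint"):
        # Assumption: the endpoints are reached over HTTPS on the TLS hostname.
        print(f"{name}: https://{flags['tls_hostname']}{flags[name]}")
```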
k
curl -X POST -v https://xxxx.xxxx.com/api/v1/osquery/enroll
*   Trying xx.xxx.xx.xxx:443...
* connect to xx.xxx.xx.xxx port 443 failed: Operation timed out
* Failed to connect to xxxx.xxxxxx.com port 443 after 75084 ms: Operation timed out
* Closing connection 0
curl: (28) Failed to connect to xxxx.xxxxxx.com port 443 after 75084 ms: Operation timed out
One sec. verifying that I'm not doing something silly with the endpoint.
Or are you looking at the outgoing port?
m
curl -X POST -v fleetdm.pluralsight.com:443/apt/osquery/enroll
*   Trying 34.215.82.168:443...
* Connected to fleetdm.pluralsight.com (34.215.82.168) port 443 (#0)
> POST /apt/osquery/enroll HTTP/1.1
> Host: fleetdm.pluralsight.com:443
> User-Agent: curl/7.85.0
> Accept: */*
* Mark bundle as not supporting multiuse
* HTTP 1.0, assume close after body
< HTTP/1.0 400 Bad Request
< Client sent an HTTP request to an HTTPS server.
k
Ah! You're missing the protocol in the url. https://xxxx.xxxx.com:443/api/osquery/enroll is the full url (with placeholders).
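The underlying gotcha: when a URL has no scheme, curl falls back to plain HTTP even if the port is 443, which is why the server answered "Client sent an HTTP request to an HTTPS server." A small guard like the one below avoids that in scripts. It is only a sketch, and the helper name `with_scheme` is made up for illustration.

```python
import re

# A scheme is a letter followed by letters, digits, "+", "-" or ".", then "://".
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")

def with_scheme(url, default_scheme="https"):
    """Prefix the URL with default_scheme if it does not already start with one."""
    if _SCHEME.match(url):
        return url
    return f"{default_scheme}://{url}"

print(with_scheme("fleetdm.pluralsight.com:443/api/osquery/enroll"))
# https://fleetdm.pluralsight.com:443/api/osquery/enroll
```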
m
DOH
Ok so THAT works 🙂
So the enrollment API is accessible.
k
Brilliant. I'll DM my IP so I can see what changes.
Good stuff! Now I'm seeing a useful error in the osquery logs:
enroll failed, retrying error="enroll request: POST /api/fleet/orbit/enroll received status 500 no matching secret found: no matching secret found
Any chance that the enroll secret has been changed or removed?
m
Hmm not that I'm aware of.
Secret looks like the same as the one I DM'd you previously
k
Got a minute to hop on a quick Zoom call?
m
Sure!
k
Brilliant.
Just to recap for posterity, the secret issue was totally on my end and there's still some certificate funkiness going on. Since this is a dev environment, the simplest solution was to take advantage of the --insecure flag to get around that until things are flowing properly there.
Nice meeting with you, have fun poking around!