Hello everyone! I am trying to deploy fleet on AWS...
# fleet
r
Hello everyone! I am trying to deploy Fleet on AWS, but I am having problems with the agent. What I did:
1. Created an RDS instance.
2. Created a Redis instance.
3. Created a self-signed certificate.
4. Created an EC2 instance, downloaded Fleet, and started it.
5. Generated an installer to connect to Fleet.
But the host never connects. I checked the connectivity between them and it is fine. So I have some questions:
1. The installer should start a process on my machine, right? I found nothing running with fleet or osquery in the name.
2. Is there a debug mode for the connector?
l
Hi Rafa! A couple of questions:
1. On which OS did you install the generated package?
2. If the Fleet URL serves a self-signed certificate (not trusted by your system's CA root bundle), then you need to specify it when generating the package:
fleetctl package ... --fleet-certificate=fleet.pem ...
Did you set that flag?
r
Hi Lucas! Fleet is running on Ubuntu 22.04 and the generated package is installed on Ubuntu 20.04.4. I generate the package on the machine that I want to connect:
fleetctl package --type=deb --fleet-url=https://ip:8080 --enroll-secret=ASDASDASDASDASDASDASDASD --fleet-certificate=fleet_osquery.pem
l
OK, once the package is generated, did you install it via sudo dpkg --install?
r
(I omitted the real IP and enroll secret.)
Yes, running sudo dpkg -i fleet-osquery_0.0.13_amd64.deb
l
OK, let's check logs then:
sudo vim /var/log/syslog
(vim or other text editor, and look for orbit/osquery logs)
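The same check can be done without opening an editor. A minimal sketch, assuming the agent logs to /var/log/syslog as in this thread; the function name agent_logs is made up for illustration:

```shell
# Hypothetical helper: print only the orbit/osquery lines from a
# syslog-style file (defaults to /var/log/syslog).
agent_logs() {
  grep -E 'orbit|osquery' "${1:-/var/log/syslog}"
}
```

Running agent_logs | tail -n 20 then shows the most recent agent messages (reading /var/log/syslog may require sudo).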
r
Good! Found something:
Jun 30 10:53:30 orbit[3648258]: W0630 10:53:30.888624 3648258 tls_enroll.cpp:101] Failed enrollment request to https://ip:8080/api/v1/osquery/enroll (Request error: certificate verify failed) retrying...
l
OK, osquery is complaining about the self-signed certificate (doesn't trust it).
r
But setting --fleet-certificate should solve this right?
l
Did you set --fleet-certificate=fleet_osquery.pem and it still doesn't work?
r
Yep... but I will redo the process. One minute.
l
OK, if it still doesn't work, then you can also try (from the Ubuntu agent):
$ curl --cacert ./fleet_osquery.pem https://ip:8080/version
(To check any issues with the generated certificate itself.)
r
Noob question:
openssl req -x509 -newkey rsa:4096 -sha256 -days 3650 -nodes \
  -keyout /tmp/server.key -out /tmp/server.cert -subj "/CN=SERVER_NAME" \
  -addext "subjectAltName=DNS:SERVER_NAME"
In my case, would SERVER_NAME be the EC2 IP?
l
IIRC, instead of
subjectAltName=DNS:SERVER_NAME
it should actually be
subjectAltName=IP:$SERVER_IP
r
Thanks! And what about CN=SERVER_NAME?
l
Try with the IP too
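To confirm which names a certificate actually covers, its SAN can be printed with openssl. A sketch under assumptions: 203.0.113.10 is a placeholder documentation address standing in for the EC2 IP, the files go to a throwaway directory, and a 2048-bit key with a 30-day lifetime is used only to keep the demo quick.

```shell
# Placeholder IP (documentation range); substitute the real server IP.
IP="203.0.113.10"
OUT_DIR="$(mktemp -d)"

# Same shape as the command in this thread, with an IP SAN instead of DNS.
openssl req -x509 -newkey rsa:2048 -sha256 -days 30 -nodes \
  -keyout "$OUT_DIR/server.key" -out "$OUT_DIR/server.cert" \
  -subj "/CN=$IP" -addext "subjectAltName=IP:$IP" 2>/dev/null

# Print the SAN section; it should list the IP the agent will connect to.
SAN="$(openssl x509 -in "$OUT_DIR/server.cert" -noout -text \
  | grep -A1 'Subject Alternative Name')"
echo "$SAN"
```

The printed section should contain an IP Address entry matching the address used in --fleet-url; if it only shows a DNS entry, verification against an IP-based URL is expected to fail.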
r
I created it with this command:
openssl req -x509 -newkey rsa:4096 -sha256 -days 3650 -nodes \
  -keyout /tmp/server.key -out /tmp/server.cert -subj "/CN=$IP" \
  -addext "subjectAltName=IP:$IP"
Now accessing the UI is secure, but the problem is the same:
Jun 30 11:34:13 orbit[3659248]: W0630 11:34:13.101204 3659248 tls_enroll.cpp:101] Failed enrollment request to https://ip:8080/api/v1/osquery/enroll (Request error: certificate verify failed) retrying...
l
Does the curl command work now?
r
Yep!
curl --cacert ./fleet_.pem https://ip:8080/version
{
  "version": "4.16.0",
  "branch": "HEAD",
  "revision": "865ab32d03c37e8a74e811bc5ac697202f14e455",
  "go_version": "go1.17.8",
  "build_date": "2022-06-21",
  "build_user": "runner"
}
l
OK, and to double check, you re-generated the package with that cert and re-installed it?
If this is a non-production test, then this can be fixed simply by adding --insecure:
fleetctl package ... --insecure ...
(This bypasses any certificate errors, but it's not recommended for production environments.) I suggest doing that and then, once everything is working and tested, configuring a proper certificate (not self-signed) for Fleet.
r
Yep, I re-generated the package and used the PEM downloaded from the UI. With --insecure I get:
orbit[3665440]: 2022-06-30T11:51:54-03:00 ERR run orbit failed error="write server cert: open /tmp/fleet.crt: permission denied"
l
Do not provide the certificate (--fleet-certificate) when using --insecure.
Hmm... not sure what the permission error is, let me check.
Isn't Orbit running as root? (It should have access to /tmp, right?)
r
root 3666950 19.6 0.0 713056 14312 ? Ssl 11:56 0:00 /opt/orbit/bin/orbit/orbit
yeap, that is the crazy thing
l
(The --insecure mode creates a certificate in /tmp/fleet.crt.)
Q: Is the goal to test the Fleet deployment before creating a proper certificate for it?
r
Yeap!
I forced the creation of /tmp/fleet.crt with elevated privileges and the problem continues...
l
Orbit needs to create that cert on the fly.
r
One more thing: once the connector is installed, which file does Orbit read its config from?
l
Some configs can be set at fleetctl package generation time (see the fleetctl package --osquery-flagfile flagfile.txt option); other options can be set via the Fleet UI (in Settings -> "Global agent options").
Let me know if that makes sense
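As a sketch of the flagfile route, and since a debug mode was asked about earlier: an osquery flagfile holds one flag per line. The two flags below (--verbose and --tls_dump) are standard osquery flags, but picking them here is an assumption, not something confirmed in this thread.

```shell
# Write a small osquery flagfile that turns on verbose logging and
# dumps TLS request/response bodies (useful while debugging enrollment).
cat > flagfile.txt <<'EOF'
--verbose
--tls_dump
EOF

# Then pass it at package generation time, as mentioned above:
# fleetctl package ... --osquery-flagfile flagfile.txt ...
```

The extra output should then show up next to the other orbit/osquery lines in the system log.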
r
Found it
there is another certificate in the middle
I will fix this and let you know
l
OK
r
worked!
thanks a lot!
l
Cool. Glad to be of help!
r
Just one more thing:
{"component":"http","err":"error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || getting app config: selecting app config: context canceled","ingestion-err":"ingest detail query: selecting app config: context canceled","ip_addr":"IP:13576","level":"error","method":"POST","took":"15.562593584s","ts":"2022-06-30T16:43:53.030188861Z","uri":"/api/v1/osquery/distributed/write","x_for_ip_addr":""}
any idea what could be the problem?
l
context canceled errors are usually due to:
• A slow database, and/or
• Configured timeouts in osquery, a load balancer, or the database (took says 15s, so a guess is that there's a 15s timeout somewhere)
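To see whether the canceled requests cluster around one duration (which would point at a timeout), the took and uri fields can be pulled out of the Fleet server log. A rough sketch, assuming one JSON object per line with took appearing before uri, as in the log line above; the function name canceled_requests is hypothetical:

```shell
# Hypothetical helper: for every "context canceled" log line, print
# "<took> <uri>". Relies on "took" preceding "uri" within the line.
canceled_requests() {
  grep 'context canceled' "$1" \
    | sed -n 's/.*"took":"\([^"]*\)".*"uri":"\([^"]*\)".*/\1 \2/p'
}
```

Called with the path to the server log, it lists each duration next to its endpoint; values that all sit near the same number suggest a timeout rather than random slowness.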
r
Strange, because I put it in debug mode, captured the request, sent it manually, and it worked
(the Fleet UI is showing everything I sent)
Scaling the DB solved it =D
And one last thing: I tried everything (scanning the host, adding policies, queries, forcing a rescan) and everything works fine. Only one thing is not working: when I run a query in the UI, it stays loading forever, and this message appears in the browser console:
WebSocket connection to 'wss://ip:8080/api/v1/fleet/results/109/u2d4l2fp/websocket' failed:
POST https://ip:8080/api/v1/fleet/results/109/b4sqcld0/xhr_streaming?t=1656612540211 405
But on the host side, the request is received and answered correctly.
l
It might be an issue with websockets ("live queries" use a websocket to connect to Fleet).
As in, something in the infrastructure is not allowing websocket traffic.
Actually the user in that issue got a 405, same as you. Please take a look at the thread in that issue. (We will be adding an entry for this in the FAQ)
r
thanks!