# kolide
s
We are configuring the Fleet server to run behind NGINX. Osquery on the clients seems to run properly and isn't reporting any errors, but NGINX is returning 499 errors for requests to /api/v1/osquery/distributed/read. The error in the log looks like this:
10.100.29.126 - - [25/Sep/2020:22:34:00 +0000] "POST /api/v1/osquery/distributed/read HTTP/1.1" 499 0 "-" "osquery/4.4.0" "-" "127.0.0.1:8080" "TLSv1.2/ECDHE-RSA-AES128-GCM-SHA256" "-" 15.555 15.683 -
At the same time, the request to distributed/read from the client looks like
I0925 15:41:36.760367 119746560 tls.cpp:253] TLS/HTTPS POST request to URI: https://fleet.domain.net:443/api/v1/osquery/distributed/read
{"node_key":"HN32A+71pXAVPF57U63QIANo45P2J5I+"}
And osquery_status.log on fleet server shows this
{"hostIdentifier":"7AD00D8C-E849-5DE8-B20A-BD35D6F6137E","calendarTime":"Fri Sep 25 22:41:36 2020 UTC","unixTime":"1601073696","severity":"0","filename":"tls.cpp","line":"253","message":"TLS/HTTPS POST request to URI: <https://fleet.domain.net:443/api/v1/osquery/distributed/read>","version":"4.4.0","decorations":{"host_uuid":"7AD00D8C-E849-5DE8-B20A-BD35D6F6137E","hostname":"<http://c02w40vchv2r.domain.com|c02w40vchv2r.domain.com>"}}
Anyone know why this is happening? thank you! Steve K.
👍 1
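For context, this is roughly the kind of NGINX proxy block assumed in this setup; the server name and the 127.0.0.1:8080 upstream come from the logs above, while the certificate paths and timeout values are placeholders rather than the actual config:
server {
    listen 443 ssl;
    server_name fleet.domain.net;

    # placeholder certificate paths
    ssl_certificate     /etc/nginx/ssl/fleet.crt;
    ssl_certificate_key /etc/nginx/ssl/fleet.key;

    location / {
        # upstream taken from the access log line above; use http:// instead
        # if Fleet is not terminating TLS itself
        proxy_pass https://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # a 499 in the access log means the client closed the connection
        # before the response was sent; these values only bound how long
        # NGINX itself waits on the upstream
        proxy_connect_timeout 60s;
        proxy_read_timeout    60s;
    }
}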
z
Are you able to see any error in the stderr on the Fleet server? I can't think of any place that we return a 499 error code.
s
499 comes from NGINX, so it looks like the client killed the connection after 15 seconds. I am not seeing anything on the reads in the logs:
2020-09-26T17:33:24.746039+00:00 m0546938.domain.net Fleet: {"component":"service","err":null,"ip_addr":"127.0.0.1:34708","level":"debug","method":"SubmitStatusLogs","took":"50.315µs","ts":"2020-09-26T17:33:24.745482131Z","x_for_ip_addr":"10.100.29.126"}
2020-09-26T17:33:25.378083+00:00 m0546938.domain.net Fleet[13845]: {"component":"service","err":null,"ip_addr":"127.0.0.1:34710","level":"debug","method":"AuthenticateHost","took":"4.8951ms","ts":"2020-09-26T17:33:25.378003043Z","x_for_ip_addr":"10.100.29.126"}
2020-09-26T17:33:25.378255+00:00 m0546938.domain.net Fleet: {"component":"service","err":null,"ip_addr":"127.0.0.1:34710","level":"debug","method":"AuthenticateHost","took":"4.8951ms","ts":"2020-09-26T17:33:25.378003043Z","x_for_ip_addr":"10.100.29.126"}
2020-09-26T17:33:29.328066+00:00 m0546938.domain.net Fleet[13845]: {"component":"service","err":null,"ip_addr":"127.0.0.1:34718","level":"debug","method":"AuthenticateHost","took":"5.066852ms","ts":"2020-09-26T17:33:29.327978239Z","x_for_ip_addr":"10.100.29.126"}
2020-09-26T17:33:29.328685+00:00 m0546938.domain.net Fleet: {"component":"service","err":null,"ip_addr":"127.0.0.1:34718","level":"debug","method":"AuthenticateHost","took":"5.066852ms","ts":"2020-09-26T17:33:29.327978239Z","x_for_ip_addr":"10.100.29.126"}
2020-09-26T17:33:29.331378+00:00 m0546938.domain.net Fleet[13845]: {"component":"service","err":null,"ip_addr":"127.0.0.1:34718","level":"debug","method":"GetClientConfig","took":"3.319019ms","ts":"2020-09-26T17:33:29.331331817Z","x_for_ip_addr":"10.100.29.126"}
2020-09-26T17:33:29.331516+00:00 m0546938.domain.net Fleet: {"component":"service","err":null,"ip_addr":"127.0.0.1:34718","level":"debug","method":"GetClientConfig","took":"3.319019ms","ts":"2020-09-26T17:33:29.331331817Z","x_for_ip_addr":"10.100.29.126"}
osqueryd.ERROR, osqueryd.INFO and osqueryd.WARNING all show the same:
E0926 17:37:20.421149 23336 scheduler.cpp:101] Error executing scheduled query macos_kextstat: no such table: kernel_extensions
E0926 17:37:30.422411 23336 scheduler.cpp:101] Error executing scheduled query macos_kextstat: no such table: kernel_extensions
d
@zwass, any thoughts on why the client is closing the request to /distributed/read while NGINX is still processing it? It looks like it's hitting a 15 second timeout, which to me would indicate that the client isn't getting the response it expects from Fleet within that window, so it kills the connection.
z
That seems way too long for that endpoint. Is your Redis functioning properly?
👍 1
s
I will verify the Redis connection and let you know
👍 1
We are using Redis in AWS, and it looks like the encrypted connection does not work, but an unencrypted connection is successful. Do I need to set a flag to allow an encrypted connection, or is it not supported?
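A quick way to check both paths from the Fleet host (the endpoint name is a placeholder; the --tls option needs redis-cli 6.0 or newer):
# plaintext connection
redis-cli -h my-redis.example.cache.amazonaws.com -p 6379 ping
# in-transit encryption (TLS)
redis-cli --tls -h my-redis.example.cache.amazonaws.com -p 6379 ping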
d
Found this - https://github.com/kolide/fleet/issues/2038 but it hasn't had any movement
z
Someone connected to TLS Redis via stunnel in this chat recently. See if you can find that in the history.
👍 1
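The stunnel approach is roughly: run a local cleartext listener that wraps connections to the TLS Redis endpoint, then point Fleet's Redis address at the local side (127.0.0.1:6379). A sketch, with a placeholder ElastiCache hostname:
# /etc/stunnel/redis.conf
[redis]
client = yes
accept = 127.0.0.1:6379
connect = my-redis.example.cache.amazonaws.com:6379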
d
k, thanks. I also saw an article where someone did that. Is that the only way that's supported @zwass?
z
I believe so at the moment. It may only be a few-line change to support it directly in Fleet. Happy to review a PR.
d
@zwass - looks like our team was able to help out. 🙂 Thanks for the suggestion: https://github.com/kolide/fleet/pull/2311 (Add redis use_tls cfg #2311)
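With that change, Redis TLS can be switched on in the Fleet config; a sketch assuming the option is exposed as use_tls under the redis section, per the PR title:
redis:
  address: my-redis.example.cache.amazonaws.com:6379
  use_tls: true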
z
Yes, thank you for the contribution!
👍 1