hey why is fleet making so many redis calls now we re seeing osquery #fleet

Jocelyn Bothe

07/29/2021, 7:45 PM

hey, why is fleet making so many redis calls now? we're seeing connections max out on our redis cluster (10 nodes each with 65000 connections)

Jocelyn Bothe

07/29/2021, 7:46 PM

Jul 29 19:38:53 osquery-service-vaa32.ec2.vzbuilders.com fleet[8471]: {"component":"service","err":"retrieve live queries: scan active queries: scan keys: redisc: failed to get a connection","ip_addr":"127.0.0.1:42004","level":"info","method":"GetDistributedQueries","took":"7.691258278s","ts":"2021-07-29T19:38:53.186431717Z","x_for_ip_addr":"216.155.204.8"}
Jul 29 19:38:53 osquery-service-vaa32.ec2.vzbuilders.com fleet[8471]: {"component":"service","err":"retrieve live queries: scan active queries: scan keys: redisc: failed to get a connection","ip_addr":"127.0.0.1:42000","level":"info","method":"GetDistributedQueries","took":"7.704909626s","ts":"2021-07-29T19:38:53.186638487Z","x_for_ip_addr":"98.139.22.220"}
Jul 29 19:38:53 osquery-service-vaa32.ec2.vzbuilders.com fleet[8471]: {"component":"service","err":"retrieve live queries: scan active queries: scan keys: redisc: failed to get a connection","ip_addr":"127.0.0.1:39646","level":"info","method":"GetDistributedQueries","took":"9.993457634s","ts":"2021-07-29T19:38:53.186933079Z","x_for_ip_addr":"98.139.22.220"}
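
Side note for anyone triaging similar output (not part of the original messages): each line above is a syslog prefix followed by a JSON object, so the errors can be tallied with a few lines of Python. This is only a rough sketch. The function names are made up for illustration, and it assumes the `took` field is a plain-seconds Go duration like the ones shown; other duration forms are skipped rather than parsed.

```python
import json
from collections import Counter


def parse_fleet_line(line):
    """Return the JSON payload of one log line, or None if there is none."""
    start = line.find("{")
    if start == -1:
        return None
    try:
        return json.loads(line[start:])
    except json.JSONDecodeError:
        return None


def took_seconds(value):
    """Convert a duration such as "7.69s" to float seconds.

    Assumption: only plain-seconds values are handled. Anything else
    (for example "500ms" or "1m2s") returns None.
    """
    if value.endswith("s") and not value.endswith(("ms", "us", "µs", "ns")):
        try:
            return float(value[:-1])
        except ValueError:
            return None
    return None


def summarize(lines):
    """Count (method, err) pairs and track the slowest request seen."""
    errors = Counter()
    slowest = 0.0
    for line in lines:
        record = parse_fleet_line(line)
        if not record or "err" not in record:
            continue
        errors[(record.get("method"), record["err"])] += 1
        secs = took_seconds(record.get("took", ""))
        if secs is not None:
            slowest = max(slowest, secs)
    return errors, slowest


if __name__ == "__main__":
    import sys

    # Usage: journalctl -u fleet | python tally.py  (unit name is an assumption)
    counts, slowest = summarize(sys.stdin)
    for (method, err), n in counts.most_common():
        print(n, method, err)
    print("slowest request (s):", slowest)
```

Grouping by method and error string makes it easier to see whether every failure is the same `failed to get a connection` case or whether several problems are mixed together.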

zwass

07/30/2021, 1:26 AM

What version of Fleet are you using? Which did you upgrade from? We have not seen Fleet using up Redis connections in any way similar to that in the past.

Jocelyn Bothe

07/30/2021, 12:17 PM

we upgraded from 3.10 to 4.0.1 last week, and then to 4.1.0 yesterday

Jocelyn Bothe

07/30/2021, 12:17 PM

it looks like every node in the fleet is periodically doing this

"error": "retrieve live queries: scan active queries: scan keys: redisc: failed to get a connection"

Rachel Perkins

07/30/2021, 9:11 PM

Hey Jocelyn, can you give us more info on how you configured Redis?

Jocelyn Bothe

08/02/2021, 12:44 PM

I got our connections sorted by turning off tcp keepalive and setting the connection idle timeout to 20 seconds, but we're still getting this error when attempting live queries

Live query request failed
Error: Unknown error: TypeError: Cannot read property '0' of undefined

Jocelyn Bothe

08/02/2021, 12:45 PM

our fleet redis config is

redis:
  address: 127.0.0.1:6379
  password: ${redis_auth}

Jocelyn Bothe

08/02/2021, 12:47 PM

we're using stunnel to connect to our global elasticache redis cluster

fips = no
setuid = root
setgid = root
pid = /var/run/stunnel.pid
debug = 7 
delay = yes
options = NO_SSLv2
options = NO_SSLv3
[redis-cli]
   client = yes
   accept = 127.0.0.1:6379
   connect = ${redis_m}
[redis-cli-replica]
   client = yes
   accept = 127.0.0.1:6380
   connect = ${redis_r}

zwass

08/02/2021, 3:16 PM

Can you look at the network inspector in your browser devtools and see if there's any more details on the error in the response from the Fleet server?

Jocelyn Bothe

08/02/2021, 3:27 PM

Jocelyn Bothe

08/02/2021, 3:27 PM

Jocelyn Bothe

08/02/2021, 3:28 PM

we have redis-cli installed too, if there's a query I could run manually to generate additional data

Jocelyn Bothe

08/02/2021, 3:43 PM

Untitled

zwass

08/02/2021, 3:51 PM

Is there possibly some stunnel configuration that is causing this?

Jocelyn Bothe

08/02/2021, 3:53 PM

I can't rule it out, but it was working successfully with the same config before we upgraded to the latest fleet

Jocelyn Bothe

08/02/2021, 3:54 PM

we are seeing fleet connect to redis

Jocelyn Bothe

08/03/2021, 5:02 PM

we disabled encryption so we could get rid of stunnel and connect directly to redis from fleet

redis:
#  address: 127.0.0.1:6379
  address: [cluster-name].nl4nlg.clustercfg.use1.cache.amazonaws.com:6380

Jocelyn Bothe

08/03/2021, 5:02 PM

still getting errors on live queries

Jocelyn Bothe

08/03/2021, 5:03 PM

[root@osquery-service-orb164 log]# redis-cli -h [cluster-name].nl4nlg.clustercfg.use1.cache.amazonaws.com -p 6380
[cluster-name].nl4nlg.clustercfg.use1.cache.amazonaws.com:6380> scan 0
1) "0"
2) (empty array)

Jocelyn Bothe

08/03/2021, 5:03 PM

and the redis cache appears to be empty
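
A caveat on reading that output (added for readers, not part of the original messages): in Redis Cluster mode each node owns only a subset of the 16384 hash slots, and SCAN walks only the keyspace of the node the client is connected to. An empty reply from one node therefore may not mean the whole cluster is empty. The sketch below computes the hash slot for a key as the Redis Cluster specification describes it (CRC16/XMODEM mod 16384, with hash-tag handling) to show how keys spread across slots. The key names are hypothetical and are not taken from Fleet.

```python
import binascii


def key_slot(key: str) -> int:
    """Hash slot for a key: CRC16 (XMODEM variant) of the key, mod 16384."""
    data = key.encode()
    # Hash tags: if the key contains "{...}" with a non-empty body,
    # only the text between the first "{" and the next "}" is hashed.
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    # binascii.crc_hqx uses polynomial 0x1021; an initial value of 0
    # gives the XMODEM variant that Redis Cluster uses.
    return binascii.crc_hqx(data, 0) % 16384


if __name__ == "__main__":
    # Hypothetical key names, for illustration only.
    for key in ["example:1", "example:2", "example:3"]:
        print(key, "-> slot", key_slot(key))
```

To check every shard, the same `scan 0` would need to be repeated against each master node (`cluster nodes` lists them), following the returned cursor until it comes back as 0.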
