#fleet
Jocelyn Bothe

07/29/2021, 7:45 PM
hey, why is fleet making so many redis calls now? we're seeing connections max out on our redis cluster (10 nodes each with 65000 connections)
7:46 PM
Jul 29 19:38:53 osquery-service-vaa32.ec2.vzbuilders.com fleet[8471]: {"component":"service","err":"retrieve live queries: scan active queries: scan keys: redisc: failed to get a connection","ip_addr":"127.0.0.1:42004","level":"info","method":"GetDistributedQueries","took":"7.691258278s","ts":"2021-07-29T19:38:53.186431717Z","x_for_ip_addr":"216.155.204.8"}
Jul 29 19:38:53 osquery-service-vaa32.ec2.vzbuilders.com fleet[8471]: {"component":"service","err":"retrieve live queries: scan active queries: scan keys: redisc: failed to get a connection","ip_addr":"127.0.0.1:42000","level":"info","method":"GetDistributedQueries","took":"7.704909626s","ts":"2021-07-29T19:38:53.186638487Z","x_for_ip_addr":"98.139.22.220"}
Jul 29 19:38:53 osquery-service-vaa32.ec2.vzbuilders.com fleet[8471]: {"component":"service","err":"retrieve live queries: scan active queries: scan keys: redisc: failed to get a connection","ip_addr":"127.0.0.1:39646","level":"info","method":"GetDistributedQueries","took":"9.993457634s","ts":"2021-07-29T19:38:53.186933079Z","x_for_ip_addr":"98.139.22.220"}
zwass

07/30/2021, 1:26 AM
What version of Fleet are you using? Which did you upgrade from? We have not seen Fleet using up Redis connections in any way similar to that in the past.
Jocelyn Bothe

07/30/2021, 12:17 PM
we upgraded from 3.10 to 4.0.1 last week, and then to 4.1.0 yesterday
12:17 PM
it looks like every node in the fleet is periodically doing this
"error": "retrieve live queries: scan active queries: scan keys: redisc: failed to get a connection"
Rachel Perkins

07/30/2021, 9:11 PM
Hey Jocelyn, can you give us more info on how you configured Redis?
Jocelyn Bothe

08/02/2021, 12:44 PM
I got our connections sorted by turning off tcp keepalive and setting the connection idle timeout to 20 seconds, but we're still getting this error when attempting live queries
Live query request failed
Error: Unknown error: TypeError: Cannot read property '0' of undefined
12:45 PM
our fleet redis config is
redis:
  address: 127.0.0.1:6379
  password: ${redis_auth}
12:47 PM
we're using stunnel to connect to our global elasticache redis cluster
fips = no
setuid = root
setgid = root
pid = /var/run/stunnel.pid
debug = 7 
delay = yes
options = NO_SSLv2
options = NO_SSLv3
[redis-cli]
   client = yes
   accept = 127.0.0.1:6379
   connect = ${redis_m}
[redis-cli-replica]
   client = yes
   accept = 127.0.0.1:6380
   connect = ${redis_r}
zwass

08/02/2021, 3:16 PM
Can you look at the network inspector in your browser devtools and see if there's any more details on the error in the response from the Fleet server?
Jocelyn Bothe

08/02/2021, 3:27 PM
3:28 PM
we have redis-cli installed too, if there's a query I could run manually to generate additional data
zwass

08/02/2021, 3:51 PM
Is there possibly some stunnel configuration that is causing this?
Jocelyn Bothe

08/02/2021, 3:53 PM
I can't rule it out, but it was working successfully with the same config before we upgraded to the latest fleet
3:54 PM
we are seeing fleet connect to redis
5:02 PM
we disabled encryption so we could get rid of stunnel and connect directly to redis from fleet
redis:
#  address: 127.0.0.1:6379
  address: [cluster-name].nl4nlg.clustercfg.use1.cache.amazonaws.com:6380
5:02 PM
still getting errors on live queries
5:03 PM
[root@osquery-service-orb164 log]# redis-cli -h [cluster-name].nl4nlg.clustercfg.use1.cache.amazonaws.com -p 6380
[cluster-name].nl4nlg.clustercfg.use1.cache.amazonaws.com:6380> scan 0
1) "0"
2) (empty array)
5:03 PM
and the redis cache appears to be empty
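One caveat when reading that empty `scan 0`: in Redis Cluster, `SCAN` only iterates the keys held by the single node that receives the command. Keys are spread across 16384 hash slots by CRC16(key) mod 16384, so a key Fleet wrote may live on a different shard than the one `redis-cli` reached through the configuration endpoint. This does not by itself explain the live query failures. The sketch below illustrates the slot mapping described in the Redis Cluster specification (CRC-16/XMODEM, with `{...}` hash tags); `crc16` and `keySlot` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"strings"
)

// crc16 implements CRC-16/XMODEM (polynomial 0x1021, initial value 0),
// the variant the Redis Cluster specification uses for key hashing.
func crc16(data string) uint16 {
	var crc uint16
	for i := 0; i < len(data); i++ {
		crc ^= uint16(data[i]) << 8
		for b := 0; b < 8; b++ {
			if crc&0x8000 != 0 {
				crc = crc<<1 ^ 0x1021
			} else {
				crc <<= 1
			}
		}
	}
	return crc
}

// keySlot maps a key to one of the 16384 cluster hash slots. If the key has a
// non-empty "{...}" hash tag, only the text inside the first such tag is
// hashed, which lets related keys share a slot (and therefore a node).
func keySlot(key string) uint16 {
	if s := strings.IndexByte(key, '{'); s >= 0 {
		if e := strings.IndexByte(key[s+1:], '}'); e > 0 {
			key = key[s+1 : s+1+e]
		}
	}
	return crc16(key) % 16384
}

func main() {
	for _, k := range []string{"foo", "{user1000}.following", "{user1000}.followers"} {
		fmt.Printf("%-22s slot %5d\n", k, keySlot(k))
	}
}
```

For example, `foo` maps to slot 12182, while the two `{user1000}` keys share a slot because only `user1000` is hashed.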