
Jocelyn Bothe

09/08/2021, 6:37 PM
still having issues with redis, even after upgrading to 4.2.4 today. We're now seeing the error
err="scan keys: dial tcp 10.10.24.224:6380: i/o timeout" msg="failed to migrate live query redis keys"

Tomas Touceda

09/08/2021, 7:42 PM
we changed how some things behave in 4.2.4 and improved them further in 4.3.0 around redis clusters. It might be worth a try
that said, can you connect directly to that node with redis-cli?

Jocelyn Bothe

09/08/2021, 7:42 PM
yup, redis-cli works fine
but it shows no keys

Tomas Touceda

09/08/2021, 7:43 PM
does it take longer than 5s to connect?
it's ok if there are no keys

Jocelyn Bothe

09/08/2021, 7:44 PM
nope, it was definitely less than 5 seconds
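The connect-time question above can also be checked without redis-cli. What follows is only a minimal sketch, assuming a plain TCP dial is a fair stand-in for what the server does at startup; `time_tcp_connect` is a hypothetical helper, not part of Fleet or redis-cli, and the 5 second default only mirrors the timeout mentioned in this thread.

```python
import socket
import time


def time_tcp_connect(host: str, port: int, timeout: float = 5.0) -> float:
    """Return the seconds taken to open a TCP connection to host:port.

    Raises OSError (including a timeout error) if the dial fails or
    takes longer than `timeout` seconds.
    """
    start = time.monotonic()
    # create_connection gives up once `timeout` seconds have passed
    # without the connection being established.
    with socket.create_connection((host, port), timeout=timeout):
        return time.monotonic() - start
```

Run from the host that runs `fleet serve`, a call such as `time_tcp_connect("10.10.24.224", 6380)` (the address from the error message) either returns the dial time or raises, which separates a slow connection from one that never completes.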

Tomas Touceda

09/08/2021, 7:45 PM
and this is from the same host that is running fleet serve, correct?

Jocelyn Bothe

09/08/2021, 7:45 PM
yes
if I enable debug logging, will it have more info about that redis timeout?
Sep 08 19:44:34 osquery-service-vab147.ec2.vzbuilders.com systemd[1]: Started Kolide Fleet.
Sep 08 19:44:34 osquery-service-vab147.ec2.vzbuilders.com fleet[15741]: Using config file:  /etc/kolide/fleet.yml
Sep 08 19:44:51 osquery-service-vab147.ec2.vzbuilders.com fleet[15741]: level=info ts=2021-09-08T19:44:51.921971702Z err="scan keys: dial tcp 10.10.24.226:6380: i/o timeout" msg="failed to migrate live query redis keys"
there is more than 5 seconds between fleet reading the config yaml and getting the redis error on startup

Tomas Touceda

09/08/2021, 7:48 PM
yeah, something is preventing it from connecting fast enough, or at all

Jocelyn Bothe

09/08/2021, 7:50 PM
okay, interesting, when I restarted in debug it didn't get the error, so it seems intermittent
is that 5s timeout going to be configurable in 4.3?

Tomas Touceda

09/08/2021, 7:51 PM
are the metrics for the redis cluster ok?
I'm looking into the timeout to see what we can provide

Jocelyn Bothe

09/08/2021, 7:51 PM
yeah, the cluster is basically doing nothing right now

Tomas Touceda

09/08/2021, 7:52 PM
is it in the same region as the fleet instance?

Jocelyn Bothe

09/08/2021, 7:53 PM
it is a global cluster, but yes, the instance I'm testing on is in the same region
as the primary redis

Tomas Touceda

09/08/2021, 7:53 PM
gotcha, thank you for answering my million questions

Jocelyn Bothe

09/08/2021, 7:53 PM
thanks for helping me troubleshoot

Tomas Touceda

09/08/2021, 7:53 PM
we'll make the timeout configurable

Jocelyn Bothe

09/08/2021, 7:53 PM
👍
and if you're taking requests, a configurable retry for that connection would be lovely 🙂

Tomas Touceda

09/08/2021, 8:00 PM
always welcoming requests!
will look into retries. the issue is at the connection level; key migration is just one of the places that uses that connection
retries will probably not make it to 4.3.0, could you create a feature request issue? https://github.com/fleetdm/fleet/issues/new?assignees=&labels=idea&template=feature-request.md&title=
as for the timeout, see https://github.com/fleetdm/fleet/pull/1968
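To make the request concrete, here is a rough sketch of what a configurable connect timeout combined with connection-level retries could look like. It is an illustration only, not Fleet's implementation or the change in the linked PR; `connect_with_retry` and its parameters (`timeout`, `attempts`, `backoff`) are hypothetical names, and the linear backoff is an arbitrary choice.

```python
import socket
import time


def connect_with_retry(host, port, timeout=5.0, attempts=3, backoff=0.5):
    """Dial host:port, retrying failed or timed-out attempts.

    Returns (socket, attempts_used). Raises ConnectionError once every
    attempt has failed.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            # Each attempt gets its own `timeout` budget.
            sock = socket.create_connection((host, port), timeout=timeout)
            return sock, attempt
        except OSError as exc:
            last_error = exc
            if attempt < attempts:
                # Linear backoff: wait a little longer after each failure.
                time.sleep(backoff * attempt)
    raise ConnectionError(f"gave up after {attempts} attempts: {last_error}")
```

Because the retry wraps the dial itself rather than the key migration, every caller that opens a connection would benefit, which matches the point that the issue is at the connection level.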

Jocelyn Bothe

09/08/2021, 8:19 PM
done!

mikermcneil

09/09/2021, 8:08 PM
fyi here's the link to @Jocelyn Bothe's GitHub issue: https://github.com/fleetdm/fleet/issues/1969 (Thanks Jocelyn!)