I'm seeing the following errors in my logs `"err":...
# fleet
c
I'm seeing the following errors in my logs
"err":"retrieve live queries: receive sql: redigo: nil returned"
. They appear to be caused either by running a query live via
fleetctl
or ending one of those queries early. However, the errors keep popping up in the logs long after any queries have been run. They seem to spike at regular 5-minute intervals, which is my
logger_tls_period
. Is there something I can do to fix these?
z
Is 5 minutes also your distributed_interval?
c
No, that's 30 seconds
But, I haven't run a live query in over an hour and I'm still seeing these messages come in
Well, they're less regular at 5-minute increments now, but that could be due to more than one query being stuck like this
From what I can tell the redis is empty (but redis is a bit of a mystery to me)
z
keys '*'
returns nothing?
c
I'm using redis commander and there aren't any entries in the GUI. I got an error when I tried to run that in the commandline bit. Let me dig a bit more.
The tree view should show keys, but it's empty. Which is what I recall when I had to go in and clean up the live queries before the cleanup logic was implemented.
Also, we just updated to 3.10.1 from 3.1 (I think, it was old).
I added a test key and can see it in the tree, so I'm pretty sure it's empty aside from that key.
z
Okay thanks for all that info. I'm going to see if we can do some better cleanup for this in the next release.
c
Thanks! Two quick questions: are these sorts of errors something I should be worried about? Are they expected if I run or cancel a query?
z
I don't think you need to worry about them. We're being overly noisy about cleaning up older queries.
But it is strange that you still see them even though Redis is empty... You shouldn't be able to hit that code path if Redis is empty.
c
Anything else I can do to debug right now?
z
Is it still happening even after verifying that Redis is empty?
c
Yup, I also redeployed our fleet (in kubernetes)
z
Are you seeing those errors in the Fleet server logs or in the logs that osquery clients are writing to Fleet?
Can you paste a full log line?
c
That was from Fleet, one sec. I may have misspoken.
I am still seeing them. 9k in the last 15 minutes.
{
  "component": "service",
  "err": "retrieve live queries: receive sql: redigo: nil returned",
  "ip_addr": "10.125.6.0:19173",
  "level": "info",
  "method": "GetDistributedQueries",
  "took": "12.15416ms",
  "ts": "2021-04-09T18:50:47.401563414Z",
  "x_for_ip_addr": "10.127.50.32"
}
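To check whether these errors really cluster on a 5-minute cadence, one option is to bucket them by timestamp. The following is only a rough sketch: it assumes the Fleet logs are available as one JSON object per line (the entry above is pretty-printed for readability), and the names `parse_ts` and `bucket_errors` are made up for this illustration.

```python
import json
from collections import Counter
from datetime import datetime, timezone

# Error string copied from the log entry above.
ERR = "retrieve live queries: receive sql: redigo: nil returned"


def parse_ts(ts):
    """Parse a timestamp like 2021-04-09T18:50:47.401563414Z as UTC.

    The logged value has nanosecond precision, while datetime stores
    at most microseconds, so the fraction is truncated to 6 digits.
    """
    head, _, frac = ts.rstrip("Z").partition(".")
    micros = int((frac + "000000")[:6])
    parsed = datetime.strptime(head, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)


def bucket_errors(lines, bucket_seconds=300):
    """Count matching error entries per time bucket (default 5 minutes)."""
    counts = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if not isinstance(entry, dict) or entry.get("err") != ERR:
            continue
        epoch = int(parse_ts(entry["ts"]).timestamp())
        start = epoch - epoch % bucket_seconds
        label = datetime.fromtimestamp(start, tz=timezone.utc)
        counts[label.strftime("%Y-%m-%dT%H:%M:%SZ")] += 1
    return dict(counts)
```

Feeding it an open log file (for example `bucket_errors(open("fleet.log"))`, where the file name is a placeholder) gives a count per 5-minute window. Passing `bucket_seconds=30` instead would show whether the errors follow the 30-second distributed interval rather than `logger_tls_period`.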
z
Is it possible your Redis UI is connected to a different DB than Fleet? Can you verify by running a live query and seeing that some keys appear?
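Another way to rule out a database mismatch is to ask Redis itself which logical databases hold keys, using `INFO keyspace`. This is a minimal sketch, assuming a Redis instance reachable without a password or TLS; the host, port, and the function names `parse_keyspace` and `fetch_keyspace` are placeholders for illustration, not anything from Fleet.

```python
import socket


def parse_keyspace(info_text):
    """Turn 'INFO keyspace' output into {db_name: key_count}.

    Lines look like 'db0:keys=2,expires=0,avg_ttl=0'. Databases
    with no keys are not listed by Redis at all.
    """
    dbs = {}
    for line in info_text.splitlines():
        line = line.strip()
        if not line.startswith("db") or ":" not in line:
            continue  # skip the '# Keyspace' header and blank lines
        name, _, fields = line.partition(":")
        stats = dict(f.split("=", 1) for f in fields.split(",") if "=" in f)
        dbs[name] = int(stats.get("keys", 0))
    return dbs


def fetch_keyspace(host="127.0.0.1", port=6379, timeout=5.0):
    """Send INFO keyspace over a raw socket and return the reply text.

    Assumes no AUTH and no TLS; adjust for a real deployment.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # RESP encoding of the two-word command: INFO keyspace
        sock.sendall(b"*2\r\n$4\r\nINFO\r\n$8\r\nkeyspace\r\n")
        reader = sock.makefile("rb")
        header = reader.readline()
        if not header.startswith(b"$"):
            raise RuntimeError(f"unexpected reply: {header!r}")
        length = int(header[1:].strip())
        return reader.read(length).decode("utf-8")
```

Running `parse_keyspace(fetch_keyspace())` while a live query is active should list the database that holds the live query keys. If that is not the database the Redis UI is pointed at, the UI would look empty even though Fleet still has keys.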
c
Running one now. I see
livequery
and
sql
Canceled it and those went away
z
And does live query actually work? I'm just looking at the code and it seems like it should not be possible to hit that line if there are no keys at all.
c
Yup, last time I let it run for a while I got
⠓ 59% responded (100% online) | 7169/12159 targeted hosts (7169/7205 online)
before it felt like it wasn't going to get any more.
And a bunch of data flowed in
There are keys when the query runs, when I cancel it they get cleaned up.