Fleet 2.6.0 / Osquery 4.4.0 on W10 with Launcher ...
# kolide
d
Fleet 2.6.0 / Osquery 4.4.0 on W10 with Launcher v0.11.11 I have a scheduled query for
windows_events
, but am getting the following errors in the W10 application logs:
caller=level.go:63 level=info caller=extension.go:494 err="sending string logs: writing logs: transport error sending logs: rpc error: code = Internal desc = grpc: error while marshaling: proto: field \"kolide.agent.LogCollection.Log.Data\" contains invalid UTF-8"
Any thoughts as to where to look next?
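(For context: proto3 string fields, like the kolide.agent.LogCollection.Log.Data field named in the error, must be valid UTF-8, so the Go protobuf marshaler rejects a message whose string data contains a bad byte. A minimal sketch, using a made-up log line, of how to confirm a line would trip that check:)

package main

import (
	"fmt"
	"unicode/utf8"
)

func main() {
	// Hypothetical log line containing bytes that are not valid UTF-8,
	// e.g. raw UTF-16 or binary data pulled out of a Windows event.
	line := "event data \xff\xfe"

	// proto3 requires string fields to be valid UTF-8; a line like this
	// fails at gRPC marshal time with "contains invalid UTF-8".
	fmt.Println(utf8.ValidString(line)) // false
}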
z
@seph is this the same UTF8 bug we've seen occasionally over quite some time?
IIRC it seemed to be that osquery outputs invalid UTF8
d
I wasn't sure if this error was related to not getting windows_events logs or something else...
z
I think it could definitely be related to that
s
There are a handful of utf8 issues in this ecosystem. Osquery sometimes screws up the encoding. This may be a rocksdb issue. There is some info about it at https://github.com/kolide/launcher/pull/481 and some more at https://github.com/osquery/osquery/issues/5288
There is also another chunk of windows string encoding issues. The windows string functions have been slowly getting reworked.
I haven’t looked at this in a while though
d
Looks like a major pr related to this was merged a couple weeks ago: https://github.com/osquery/osquery/pull/6338
And this issue looks like what I am running into: https://github.com/kolide/launcher/issues/445
s
Yeah, that’s the windows rework I alluded to. (thanks for finding the PR)
👍 1
We never really came to a good solution for how to handle bad utf8 that osquery emits
d
So that windows rework should make it less likely that osquery emits bad utf8, but not impossible. And when it does, errors like this will show up and the scheduled query will be blocked until the offending log is removed from rocks... Is that accurate?
s
Probably. I haven’t thought about this in a while, but that sounds right.
I think either:
* status quo
* launcher reworks the logs
* launcher drops the logs
* server (fleet in this case) accepts them
* server drops them
Pretty sure the SaaS side accepts them.
Less sure what the right approach here is. I’d review/accept a PR to launcher to drop a log on marshalling error. Or attempt to repair it. Not sure how trivial that logic would be. Repair is easy, but the conditional on potential failure seems harder.
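(As a rough illustration of the "repair" option, not launcher's actual code: Go's standard library can rewrite a string so invalid byte sequences are replaced or dropped before the log ever reaches the marshaler. sanitizeLog is a hypothetical helper name.)

package main

import (
	"fmt"
	"strings"
)

// sanitizeLog shows the repair approach: strings.ToValidUTF8 (Go 1.13+)
// replaces each run of invalid bytes with the replacement string;
// passing "" would drop the bytes entirely instead.
func sanitizeLog(s string) string {
	return strings.ToValidUTF8(s, "\uFFFD")
}

func main() {
	// The invalid run collapses to a single replacement character.
	fmt.Println(sanitizeLog("event data \xff\xfe"))
}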
d
From an ops side, I would rather see the server (fleet) accept it, and then let me fix it within the parsing pipeline
s
Feel free to comment in any of the linked kolide github issues. This is unlikely to bubble up my list, but the comments would be noted
d
Will do
s
my guess is having fleet pass it along unmodified is relatively hard. (a bunch of the plumbing there is typed)
d
hmmmmm
s
Not sure though. I don’t work much on fleet
d
@zwass thoughts?
z
I don't think the logs ever hit Fleet. It's an issue with gRPC encoding on the client side. See my comment https://github.com/kolide/launcher/issues/445#issuecomment-601867302
s
Question is what we should do. Launcher could attempt to repair, but we were dicey about that too
z
Oh
Drop the offending character IMO
Unless someone has a strategy for repairing it.
z
Heh I voted against it
s
I know 😛
z
I don't think I realized in that context that it prevents logs from sending entirely via Launcher
s
We might not have realized it at the time.
Anyhow, that’s the code Nick wrote for herd. (extracted and dropped into launcher)
z
Given that information, I'd say go with this strategy but only for logs that fail to send the first time with this error.
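(A sketch of that strategy, assuming a hypothetical publish function standing in for launcher's real gRPC call: send the log untouched first, and only sanitize and retry when the first attempt fails with the UTF-8 marshaling error.)

package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// sendLog tries the log as-is, and only repairs and retries when the
// first attempt fails because of invalid UTF-8, matching the
// "only for logs that fail the first time" idea above.
func sendLog(log string, publish func(string) error) error {
	err := publish(log)
	if err == nil || !strings.Contains(err.Error(), "invalid UTF-8") {
		return err
	}
	// Second attempt only: replace the bad bytes and resend.
	return publish(strings.ToValidUTF8(log, "\uFFFD"))
}

func main() {
	// Fake publisher that rejects invalid UTF-8, mimicking the proto marshaler.
	publish := func(s string) error {
		if !utf8.ValidString(s) {
			return fmt.Errorf("grpc: error while marshaling: proto: field contains invalid UTF-8")
		}
		return nil
	}
	fmt.Println(sendLog("good line", publish))     // <nil>
	fmt.Println(sendLog("bad \xff line", publish)) // <nil> after sanitize + retry
}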
s
Want to comment? Using that error handling seems possible, but I’m not sure how yet. Haven’t read the code with that in mind
(commented and re-opened)
z
Do you prefer my comment on this issue or PR?
s
Probably not needed. I think I captured this sentiment in the PR comment. If you have anything additional feel free to add it
z
Looks good. Thank you
s
Unclear. There are probably multiple places that cause this
d
So a temp fix is to clear rocks and restart launcher?