hi team, this may be the wrong place to ask this, ...
# core
t
hi team, this may be the wrong place to ask this, but I am working a ton with osquery data outputs these days and this won't stop anytime soon. I have osquery running across all endpoints via #C01DXJL16D8 and this works great, but the problem I am running into is that the JSON output is not great data. I see lots of blank strings where there should be
null
values in the JSON output. This matters downstream a lot when data modeling the output from osquery. When I type cast a query result that should be an integer, but the JSON value is a blank string instead of a
null
value then I cannot really typecast a nothing to a datatype. So, I have to implement tons of
NULLIF
in my data models. This is not a huge deal as I am able to easily work around it, but the labor is definitely increased for every time I want to make the osquery data very useful downstream in our data platform. So, should I file a gh issue against the main repo? I am still new to osquery and not 100% sure how to report a problem like this
m
I think that's definitely a valid issue to file on the osquery repo: https://github.com/osquery/osquery Even though I think a lot of the tables are expected to return empty strings for data that isn't available across platforms, perhaps it could be trivial to clean it up before conversion to JSON
ty 1
t
if anyone has any questions on this I would be more than happy to explain in more detail or give examples
m
Just to set expectations though, issues are only worked on by volunteers or people sponsored with time to work on them by an employer, so there's a big backlog
t
yup sure thing I get it, open source maintainers aren't here to fix my exact needs
😄 1
f
can you work around this by using COALESCE() in your queries? This may help to surface where the actual bug is?
t
this is a downstream problem in my view, and yes I can work around it with
NULLIF
so I see this as a qualify of life feature request personally
👍 1
f
that in theory would prevent you from ever have NULL (and presumably not getting "" values as well?
s
I’m pretty sure there are issues about this already. IIRC one of the problems is that NULL breaks things if another osquery table is querying the first onbe
t
gotcha well from a data modeling perspective, folks are very likely to follow the db schema docs and see that a
child_pid
is an integer, but when there is no
child_pid
the JSON return of the results is actually a blank string
s
Yeah, I don’t mean to say this isn’t kinda ugly. I agree it’s ugly, and I know people have thought about fixing it it and not made progress. I don’t know how hard it really is.