i can’t figure out why my unwanted chrome extensio...
# general
c
i can’t figure out why my unwanted chrome extensions query isn’t working.
SELECT * FROM users JOIN chrome_extensions USING (uid);
works and returns the list of extensions, but
SELECT * FROM users JOIN chrome_extensions USING (uid) WHERE identifier='gkojfkhlekighikafcpjkiklfbnlmeio';
returns empty even though its there
z
try replacing
JOIN
with
CROSS JOIN
c
oh sweet that works
is this something wrong with my set up or should the packs be updated to use cross join?
z
Use
CROSS JOIN
for those queries where you need to join against the users table. The sqlite optimizer will sometimes change the order of the join otherwise and cause false negatives.
🙌 1
🆒 1
👍 2
c
@zwass do you think the query packs should be updated to use cross join everywhere that a query is being joined against the users table?
z
Yes. But when that is done, we need to ensure that the users table appears before the
cross join
.
c
ah okay, that’s good to know as i think we’re doing that in a LOT of places
d
@zwass We should get that snippet of wisdom somewhere in the docs / FAQ
👍 2
v
Instead of CROSS JOIN, I would recommend to use subquery or CTE
Something like:
Copy code
SELECT * 
FROM   ( 
                  SELECT     * 
                  FROM       users 
                  cross join chrome_extensions 
                  using      (UID)) foo 
WHERE  foo.identifier = 'blpcfgokakmgnkcojhhkbfbldkacnbeo' ;

WITH foo AS 
( 
           SELECT     * 
           FROM       users 
           cross join chrome_extensions 
           using      (UID)) 
SELECT * 
FROM   foo 
WHERE  identifier = 'blpcfgokakmgnkcojhhkbfbldkacnbeo';
Please note CROSS JOIN or Cartieian product may give wrong results and can be resource intensive, depending on how much data its going to generate
g
Sounds like a bug. Thanks for details @Vibhor @zwass
👍 1
Bad thing about cross join is that, we will prohibit sqlite for any optimization. Will open issue and reference to that discussion
👍 1
Hmm, in my case, it actually works. Is it OS specific thing?(Tried on macOS)
v
What is the version of MacOSX? I can reproduce the problem on Mac OSX 10.14.3
z
@Vibhor you are making assertions here that are not correct. Please see the SQLite docs (https://www.sqlite.org/lang_select.html). I’ll include some excerpts from the docs below:
All joins in SQLite are based on the cartesian product of the left and right-hand datasets.
There is no difference between the "INNER JOIN", "JOIN" and "," join operators. They are completely interchangeable in SQLite. The "CROSS JOIN" join operator produces the same result as the "INNER JOIN", "JOIN" and "," operators, but is handled differently by the query optimizer in that it prevents the query optimizer from reordering the tables in the join.
Note the docs indicate that the results will be the same, but this is not the case because changing the loop nesting ordering actually changes the results of the tables in osquery (remember that some tables only return data when they have the appropriate constraints in the WHERE clause). This would not be a property that the SQLite optimizer expects.
There is also a good explanation for why it works for some folks and not others: the optimizer is making different decisions about reordering tables.
@gguli per the docs, cross join will not prohibit all optimizations... it will just prohibit the one that causes incorrect results.
@Vibhor it is entirely possible to rewrite this query without cross join by using a sub query and IN clause, however your examples both still include cross join.
v
Hi @zwass I didn’t check the sqlite doc. However its too bad that there is no difference in sqlite engine for CROSS JOIN and INNER JOIN. IIRC, CROSS JOIN and INNER JOIN are two different operators. In set theory, its A x B Vs A intersection B.
In future if sqlite team fix this, then whoever is using cross join querypack will end up in different issue (that is re-writing their query packs)
Also, in my example, I used inner join not really cross join. Its a different case, that currently sqlite doesn’t differentiate between these two operators
z
Please read the docs
There is a variety of misinformation you are providing.
z
The SQLite team will not fix this because it is caused by the osquery table implementation violating laws of the relational algebra.
v
I see. My comments were based on general differences between two operators
z
You are linking images for SQL server as far as I can tell. From the SQLite docs:
All joins in SQLite are based on the cartesian product of the left and right-hand datasets.
If the join-operator is "CROSS JOIN", "INNER JOIN", "JOIN" or a comma (",") and there is no ON or USING clause, then the result of the join is simply the cartesian product of the left and right-hand datasets.
Also
The special handling of CROSS JOIN is an SQLite-specific feature and is not a part of standard SQL.
z
Any join without a constraint generates a Cartesian product.
It’s up to the query optimizer to apply any constraints appropriately so as to not actually waste the cpu generating the Cartesian product and then selecting on it.
@gguli one more thing I realize is that you must run osquery as the root user in order to see the bug (because the table implementations only violate the relational algebra laws when running as root).
g
Thank you for looking into that @zwass. As you mentioned, osquery is definitely very strange case for sqlite, because of required, additional, optimization queries. (enum ColumnOptions). We, trick sqlite to provide us required column constraints when possible virtual_table.cpp:767 . Do you think, similar thing can help sqlite in case of ADDITIONAL and OPTIMIZED?
z
I think it may be possible to nudge sqlite into using the correct ordering for these tables that require the constraint.
Actually @gguli they even give some explicit advice that looks relevant to our situation: https://www.sqlite.org/vtab.html#return_value. Tables like
chrome_extensions
actually are table valued functions with required constraints.
g
Did not know about that. It's cool, we can use SQLITE_CONSTRAINT for required columns and manipulate with high costs in case of ADDITIONAL and OPTIMIZED. Are you be working on that or should I put it in my todo list? 😄
z
I'm taking a look into it right now. Currently struggling with my buck build which seems to be picking up incorrect dependencies from homebrew.
Oh, anything I can help?
z
Let's take it to #core