Title
#general
c

ccc

01/31/2019, 6:33 PM
i can’t figure out why my unwanted chrome extensions query isn’t working.
SELECT * FROM users JOIN chrome_extensions USING (uid);
works and returns the list of extensions, but
SELECT * FROM users JOIN chrome_extensions USING (uid) WHERE identifier='gkojfkhlekighikafcpjkiklfbnlmeio';
returns empty even though its there
zwass

zwass

01/31/2019, 6:35 PM
try replacing
JOIN
with
CROSS JOIN
c

ccc

01/31/2019, 6:35 PM
oh sweet that works
6:36 PM
is this something wrong with my set up or should the packs be updated to use cross join?
zwass

zwass

01/31/2019, 6:38 PM
Use
CROSS JOIN
for those queries where you need to join against the users table. The sqlite optimizer will sometimes change the order of the join otherwise and cause false negatives.
clong

clong

01/31/2019, 8:44 PM
@zwass do you think the query packs should be updated to use cross join everywhere that a query is being joined against the users table?
zwass

zwass

01/31/2019, 8:45 PM
Yes. But when that is done, we need to ensure that the users table appears before the
cross join
.
clong

clong

01/31/2019, 8:46 PM
ah okay, that’s good to know as i think we’re doing that in a LOT of places
defensivedepth

defensivedepth

01/31/2019, 9:50 PM
@zwass We should get that snippet of wisdom somewhere in the docs / FAQ
v

Vibhor

02/01/2019, 12:30 AM
Instead of CROSS JOIN, I would recommend to use subquery or CTE
12:30 AM
Something like:
SELECT * 
FROM   ( 
                  SELECT     * 
                  FROM       users 
                  cross join chrome_extensions 
                  using      (UID)) foo 
WHERE  foo.identifier = 'blpcfgokakmgnkcojhhkbfbldkacnbeo' ;

WITH foo AS 
( 
           SELECT     * 
           FROM       users 
           cross join chrome_extensions 
           using      (UID)) 
SELECT * 
FROM   foo 
WHERE  identifier = 'blpcfgokakmgnkcojhhkbfbldkacnbeo';
12:32 AM
Please note CROSS JOIN or Cartieian product may give wrong results and can be resource intensive, depending on how much data its going to generate
gguli

gguli

02/01/2019, 10:57 AM
Sounds like a bug. Thanks for details @Vibhor @zwass
11:04 AM
Bad thing about cross join is that, we will prohibit sqlite for any optimization. Will open issue and reference to that discussion
11:08 AM
Hmm, in my case, it actually works. Is it OS specific thing?(Tried on macOS)
v

Vibhor

02/01/2019, 2:27 PM
What is the version of MacOSX? I can reproduce the problem on Mac OSX 10.14.3
zwass

zwass

02/01/2019, 4:20 PM
@Vibhor you are making assertions here that are not correct. Please see the SQLite docs (https://www.sqlite.org/lang_select.html). I’ll include some excerpts from the docs below:
4:21 PM
All joins in SQLite are based on the cartesian product of the left and right-hand datasets.
4:22 PM
There is no difference between the "INNER JOIN", "JOIN" and "," join operators. They are completely interchangeable in SQLite. The "CROSS JOIN" join operator produces the same result as the "INNER JOIN", "JOIN" and "," operators, but is handled differently by the query optimizer in that it prevents the query optimizer from reordering the tables in the join.
4:27 PM
Note the docs indicate that the results will be the same, but this is not the case because changing the loop nesting ordering actually changes the results of the tables in osquery (remember that some tables only return data when they have the appropriate constraints in the WHERE clause). This would not be a property that the SQLite optimizer expects.
4:28 PM
There is also a good explanation for why it works for some folks and not others: the optimizer is making different decisions about reordering tables.
4:29 PM
@gguli per the docs, cross join will not prohibit all optimizations... it will just prohibit the one that causes incorrect results.
4:30 PM
@Vibhor it is entirely possible to rewrite this query without cross join by using a sub query and IN clause, however your examples both still include cross join.
v

Vibhor

02/01/2019, 4:38 PM
Hi @zwass I didn’t check the sqlite doc. However its too bad that there is no difference in sqlite engine for CROSS JOIN and INNER JOIN. IIRC, CROSS JOIN and INNER JOIN are two different operators. In set theory, its A x B Vs A intersection B.
4:40 PM
In future if sqlite team fix this, then whoever is using cross join querypack will end up in different issue (that is re-writing their query packs)
4:40 PM
Also, in my example, I used inner join not really cross join. Its a different case, that currently sqlite doesn’t differentiate between these two operators
zwass

zwass

02/01/2019, 4:41 PM
Please read the docs
4:42 PM
There is a variety of misinformation you are providing.
zwass

zwass

02/01/2019, 4:43 PM
The SQLite team will not fix this because it is caused by the osquery table implementation violating laws of the relational algebra.
v

Vibhor

02/01/2019, 4:45 PM
I see. My comments were based on general differences between two operators
zwass

zwass

02/01/2019, 4:49 PM
You are linking images for SQL server as far as I can tell. From the SQLite docs:
All joins in SQLite are based on the cartesian product of the left and right-hand datasets.
If the join-operator is "CROSS JOIN", "INNER JOIN", "JOIN" or a comma (",") and there is no ON or USING clause, then the result of the join is simply the cartesian product of the left and right-hand datasets.
4:51 PM
Also
The special handling of CROSS JOIN is an SQLite-specific feature and is not a part of standard SQL.
zwass

zwass

02/01/2019, 4:54 PM
Any join without a constraint generates a Cartesian product.
4:55 PM
It’s up to the query optimizer to apply any constraints appropriately so as to not actually waste the cpu generating the Cartesian product and then selecting on it.
5:06 PM
@gguli one more thing I realize is that you must run osquery as the root user in order to see the bug (because the table implementations only violate the relational algebra laws when running as root).
gguli

gguli

02/01/2019, 5:10 PM
Thank you for looking into that @zwass. As you mentioned, osquery is definitely very strange case for sqlite, because of required, additional, optimization queries. (enum ColumnOptions). We, trick sqlite to provide us required column constraints when possible virtual_table.cpp:767 . Do you think, similar thing can help sqlite in case of ADDITIONAL and OPTIMIZED?
zwass

zwass

02/01/2019, 5:11 PM
I think it may be possible to nudge sqlite into using the correct ordering for these tables that require the constraint.
5:46 PM
Actually @gguli they even give some explicit advice that looks relevant to our situation: https://www.sqlite.org/vtab.html#return_value. Tables like
chrome_extensions
actually are table valued functions with required constraints.
gguli

gguli

02/01/2019, 6:03 PM
Did not know about that. It's cool, we can use SQLITE_CONSTRAINT for required columns and manipulate with high costs in case of ADDITIONAL and OPTIMIZED. Are you be working on that or should I put it in my todo list? 😄
zwass

zwass

02/01/2019, 6:04 PM
I'm taking a look into it right now. Currently struggling with my buck build which seems to be picking up incorrect dependencies from homebrew.
6:07 PM
Oh, anything I can help?
zwass

zwass

02/01/2019, 6:07 PM
Let's take it to #core