Title
#sql
w

Wes

11/10/2021, 7:34 PM
Hey all, I'm diving in to the implementation details of SQL in osquery to solve #7314. I see there's both some mention of sqlite, as well as
list<map<string,string>>
data structures that represent the tables independently of sqlite. In particular, I'm looking into how
JOIN
works. Any pointers would be much appreciated! Is there a primer on how tables work under the hood?
zwass

zwass

11/10/2021, 7:42 PM
Osquery is built on SQLite's Virtual Table construct. Essentially, SQLite handles all of the query planning and execution and calls the table implementation with the appropriate parameters to generate the data when it's needed. So SQLite is doing the JOIN and providing the constraints in the query context.
s

seph

11/10/2021, 9:25 PM
I’d start by stepping back slightly — ignoring the JOIN, does disk_encryption even report correctly on these disks? I’,m not sure it does. You’d probably need to root around https://github.com/osquery/osquery/blob/master/osquery/tables/system/linux/disk_encryption.cpp
w

Wes

11/11/2021, 2:11 AM
Disk encryption by itself does report correctly (not sure how to correlate the names, but the UUIDs match):
backdoor@backdoor-virtual-machine:~$ sudo osqueryi --line 'select * from disk_encryption where encrypted=1'
  Configuration setting "devices/scan_lvs" unknown.
  Configuration setting "devices/allow_mixed_block_sizes" unknown.
             name = /dev/dm-0
             uuid = 1dEYER-a40T-nn7k-UJ9V-PuYw-fTNI-WypOyE
        encrypted = 1
             type = aes-xts-plain64
encryption_status = encrypted

             name = /dev/dm-1
             uuid = 6da0cb49-fecc-4a12-877f-fa69428189c3
        encrypted = 1
             type = aes-xts-plain64
encryption_status = encrypted

             name = /dev/dm-2
             uuid = 7503f0bf-233e-4fd1-8e37-061b815669f0
        encrypted = 1
             type = aes-xts-plain64
encryption_status = encrypted

sda                                                                                                    
├─sda1                                                                                                 
├─sda2                vfat        FAT32          F120-674E                               506.8M     1% /boot/efi
├─sda3                ext4        1.0            41fddf5d-7746-48e7-a527-c2fe57a0e5b2    398.2M    36% /boot
└─sda4                crypto_LUKS 2              126b908d-9a06-4c0c-af65-8ebf6bd1988f                  
  └─sda4_crypt        LVM2_member LVM2 001       1dEYER-a40T-nn7k-UJ9V-PuYw-fTNI-WypOyE                
    ├─vgubuntu-root   ext4        1.0            6da0cb49-fecc-4a12-877f-fa69428189c3     10.8G    33% /
    └─vgubuntu-swap_1 swap        1              7503f0bf-233e-4fd1-8e37-061b815669f0                  [SWAP]
This query is interesting to me -- it returns one row!
$ sudo osqueryi --line 'select encrypted from disk_encryption as d join mounts as m on m.device_alias = d.name where d.name in (select name from disk_encryption where encrypted=1)'
  Configuration setting "devices/scan_lvs" unknown.
  Configuration setting "devices/allow_mixed_block_sizes" unknown.
encrypted = 0
But the more obvious query returns zero:
$ sudo osqueryi --line 'select name,d.type,encrypted from disk_encryption as d join mounts as m on m.device_alias = d.name where encrypted=1'|wc -l
  Configuration setting "devices/scan_lvs" unknown.
  Configuration setting "devices/allow_mixed_block_sizes" unknown.
0
s

seph

11/11/2021, 2:39 AM
The latter one is a JOIN. Which does different things —I’m never sure which way the join goes. Do you get different things if you reverse the table order, for example?
2:40 AM
For former one is more functional. Do X, then do Y with the output.
w

Wes

11/11/2021, 6:26 PM
I've narrowed this down a bit, and I don't think it's caused by SQL -- though I might ask for some help making the SQL more performant. I think the bug in question is caused by how the constraint in the virtual table is being handled, specifically, the EQUALS constraint is checked as an optimization, but it prunes too many entries. In the disk_encryption case, the filesystem is hierarchical, and to arrive at the correct conclusion about children, information about the parents is needed. So when there's an EQUALS constraint, the parent information is pruned too early. When there's a LIKE constraint, the parent information is available, and the correct conclusion is reached. I much appreciate your responses here!