# general
j
Hi everyone, I had a question about osquery and YARA performance. I'm trying to scan a large section of the filesystem (multiple GB of content) using wildcards. The performance differs noticeably between the yara CLI (with -r) and osquery with LIKE and wildcards. Does anyone have any insight into why this is, or could you help me evaluate how to tune it? Please let me know if there is a better place to ask. Thanks!
s
You may want to look into reducing the --yara_delay value, which by default is 50 ms:
osqueryd --help | grep yara
    --yara_delay VALUE                               Time in ms to sleep after scan of each file (default 50) to reduce memory spikes
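To get a feel for why a per-file sleep matters on a large tree, here is a rough back-of-the-envelope sketch in Python. It assumes the delay is applied once after every scanned file, as the help text above describes. The function name and the file counts are hypothetical, and actual scan time is not modelled.

```python
# Rough estimate of the sleep time added by a per-file delay.
# Assumption: one sleep of delay_ms after every scanned file, per the
# --yara_delay help text. The scan time itself is not modelled here.


def added_sleep_seconds(file_count: int, delay_ms: int = 50) -> float:
    """Total seconds spent sleeping across a scan of file_count files."""
    return file_count * delay_ms / 1000.0


if __name__ == "__main__":
    # Hypothetical file counts, purely for illustration.
    for files in (1_000, 100_000, 1_000_000):
        secs = added_sleep_seconds(files)
        print(f"{files:>9} files: {secs:>8.0f} s of sleep ({secs / 3600:.2f} h)")
```

With the default of 50 ms, the sleep alone grows linearly with the number of files, which would be consistent with a large gap against the yara CLI even for a trivial rule.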
f
There are also some resources that can help you test the actual performance of your YARA query; seemingly insignificant coding choices can make huge differences. Can you share your current query, with or without the YARA rule?
d
Also, which version of the YARA CLI did you use? The current osquery YARA version is 4.2.3.
j
Ah, thanks for all the help. It seems yara_delay was the trick here (setting it to 0 from the CLI made the performance basically identical). I missed that there were more options besides the ones in the online docs 😞 For context, my YARA rule was extremely simple (just a specific string match in the file), so I would have expected performance to be pretty much identical to the yara CLI tool.
I'm not sure if there is another setting or flag, but now I'm noticing that the osquery yara table doesn't return results for wildcard matches when the file is a few directories deep (even with the %% wildcard). Any ideas what I can do there? As before, I couldn't find any docs either online or from the CLI, so any pointers would be helpful. One example of the situation: this command returns a result within /a/b/c/d/e/file
select * from yara where path like '/a/b/c/%%' and sigfile='sig.yara' and count > 0;
But this command (which should include everything returned by the previous query) returns no results:
select * from yara where path like '/a/%%' and sigfile='sig.yara' and count > 0;
It's especially weird because the first query (which starts deeper in the tree) returns matches at e/file.txt and e/something/file2.txt.
Ah, following up here again: it looks like this may be because osquery just exits when it hits a recursive symlink, and the directory I was working with had one. Small hidden footgun 😭 But thanks for all the guidance and help!
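For anyone hitting the same thing, a small Python sketch like the one below may help spot such links before running the scan. It is not how osquery walks the tree. It only assumes the problem is a symlink whose target is its own directory or one of its ancestors, and find_symlink_loops is a hypothetical helper name.

```python
import os


def find_symlink_loops(root: str) -> list[str]:
    """Return symlinks under root whose target is one of their own ancestors."""
    loops = []
    # followlinks=False: os.walk lists symlinked directories but never
    # descends into them, so this walk cannot itself get stuck in a cycle.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        parent = os.path.realpath(dirpath)
        for name in dirnames + filenames:
            link = os.path.join(dirpath, name)
            if not os.path.islink(link):
                continue
            target = os.path.realpath(link)
            # A link pointing at its own directory, or at any directory
            # above it, creates a cycle for a traversal that follows links.
            prefix = target.rstrip(os.sep) + os.sep
            if parent == target or parent.startswith(prefix):
                loops.append(link)
    return loops


if __name__ == "__main__":
    # "/a" mirrors the placeholder path used in the queries above.
    for link in find_symlink_loops("/a"):
        print(link)
```

Any path it prints is a candidate for the behaviour described above, so narrowing the LIKE pattern to avoid that subtree is one thing to try.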
f
You could try joining this query against the file table, which would give you added conditions like "type = 'regular'". That may be an option to avoid the symlink stuff, at the expense of a potentially slower query.
👍 1
s
I think they are referring to this problem: https://github.com/osquery/osquery/issues/7291
The yara table, like the file table, uses the same logic to get the list of files to process when recursion is requested via %%, so they'll have the same issue.
👍 1
f
so this wouldn't avoid the deep traversal bug?
select count(*) from file where path like '/Users/%%' and symlink = 0;
s
No, paths are pre-expanded
f
gotcha
s
A giveaway is also in the full schema/spec: https://github.com/osquery/osquery/blob/master/specs/utility/file.table. Since symlink doesn't have any special attribute (index, required, additional), it means that it's not processed by the logic of the table, and it just behaves as a filter applied by sqlite on top of the results from the table.
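The point above can be illustrated with a toy model in Python. This is not osquery's implementation. FAKE_FS, generate_rows, and query are hypothetical stand-ins, and the only assumption is the one stated in the thread: the table expands the path constraint first, and SQLite filters the resulting rows afterwards.

```python
import sqlite3

# Toy stand-in for a filesystem: (path, symlink) pairs. Hypothetical data.
FAKE_FS = [
    ("/Users/a/notes.txt", 0),
    ("/Users/a/loop", 1),
    ("/Users/b/report.pdf", 0),
]

visited = []  # every path the "table" touched while expanding the pattern


def generate_rows(prefix: str):
    """Stand-in for the table logic: it only sees the path constraint."""
    for path, symlink in FAKE_FS:
        if path.startswith(prefix):
            visited.append(path)
            yield (path, symlink)


def query(prefix: str, where: str = "1") -> list[str]:
    """Expand all paths first, then let SQLite apply the extra filter."""
    visited.clear()
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE file (path TEXT, symlink INTEGER)")
    db.executemany("INSERT INTO file VALUES (?, ?)", generate_rows(prefix))
    rows = db.execute(
        f"SELECT path FROM file WHERE {where} ORDER BY path"
    ).fetchall()
    db.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(query("/Users/", "symlink = 0"))  # filtered result rows
    print(len(visited))  # paths touched during expansion
```

In this model the symlink entry is still visited during expansion even though the filter later drops it from the results, which is why adding symlink = 0 would not avoid the traversal problem.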