# general
g
🧵 on the `unified_log` table
It looks like we are limited to 100 rows no matter what `timestamp` clause we use. How can we get all of the results out of this table? Or is the recommendation to run it every 60 or 30 seconds or something?
```
osquery> select count(*) from unified_log where subsystem="com.apple.SoftwareUpdate" AND timestamp > (SELECT unix_time from time) - 3600;
count(*) = 100
```
s
The `max_rows` column is how many rows it'll fetch from the underlying API.
But it’s worth noting the unified log is huge. I don’t think you can get everything out of it with a simple select.
So it has an awkward not-quite-event model.
The suggestion is to set `timestamp` to `-1` so it behaves like it's evented, with a suitable `max_rows` and query frequency.
How large `max_rows` can be depends a lot on your specific installation environment.
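For instance, raising the cap past the apparent default of 100 for the same search (a sketch; whether 1000 is sustainable depends on your environment):

```
select count(*) from unified_log
  where subsystem = 'com.apple.SoftwareUpdate'
    and timestamp > (select unix_time from time) - 3600
    and max_rows = 1000;
```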
g
Can you expand on the "behave like it's evented" statement? So we should just set the `timestamp` to `-1` and call it every x seconds?
With the `asl` table we scheduled our query for five minutes, which worked okay.
s
The `unified_log` is unique as a table. If you query it normally, it searches the log. The log is very large, thus there is a `max_rows` parameter; otherwise an exploratory `select * from unified_log` would overwhelm something. If you include `timestamp = -1` it behaves like a log follower. It tracks the last returned timestamp, and appends that. This feature was contributed to allow someone to pull the whole log. It's a bit like it's evented, but implemented very differently. (It's somewhat beta, it may not work quite right, and it may change, etc.)

As for what you should do… I don't know. I imagine upping `max_rows` to as high as your pipeline can handle, setting `timestamp` to `-1`, and then fetching on a suitable interval.
b
> If you include `timestamp = -1` it behaves like a log follower. It tracks the last returned timestamp, and appends that.
What do you mean when you say "the last returned timestamp"? Do we have to run the query multiple times in a row in order to get more and more (hence appended) results?
In any case, I can't figure out how to get any results from `unified_log` right now:
```
select * from unified_log WHERE process="kernel" AND timestamp=-1 AND max_rows=1000;
```
Console shows results for this as of 10s ago, but I'm not seeing them in the table.
z
cc @Daniel Bretón Suárez who wrote the table
IIRC the use case Daniel had in mind was to be able to continually query for the full unified log. Just looked at the code, and Seph's syntax is slightly off: what you need is `timestamp > -1`. If you schedule that query with a sufficiently high `max_rows`, you should eventually get the entire log and then be essentially "streaming" new results as they come in.
(If your `max_rows` is too small, then the log would grow faster than you were collecting it.)
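E.g., Brandon's query above should start returning rows with the operator switched from `=` to `>` (untested sketch):

```
select * from unified_log
  where process = 'kernel' and timestamp > -1 and max_rows = 1000;
```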
d
@Brandon Kurtz as Zach and Joseph said, if you add the condition `timestamp > -1`, the table uses a mode that saves the last log entry it has returned, so the next time you query the table, it will return logs starting from the entry after the last one it returned. Imagine we have entries like:

| ***last (0)*** | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 |

```
select * from unified_log where timestamp > -1 and max_rows = 10;
```

It returns the entries ***1st to 10th***:

| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 ***last*** | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 |

And again:

```
select * from unified_log where timestamp > -1 and max_rows = 10;
```

It returns the entries ***11th to 20th***:

| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 ***last*** | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 |

If the OS deletes entries (we can't control that), you will still be getting sequential logs, but you could lose some:

| ***last (20)*** | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 |

My advice is to set `max_rows` so that eventually the number of rows returned is less than `max_rows`. I mean, if `max_rows = 100` and you're always getting 100 entries, it means there were more entries in the buffer. If `max_rows = 500` and you eventually get, say, 450 entries, you know you have "emptied" the buffer.
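Putting that together, a scheduled query along these lines is one way to run it; the query name, interval, and `max_rows` value here are illustrative placeholders, not recommendations:

```
{
  "schedule": {
    "unified_log_follower": {
      "query": "select * from unified_log where timestamp > -1 and max_rows = 500;",
      "interval": 300
    }
  }
}
```

If a run returns well under 500 rows, that interval/`max_rows` pairing is keeping up; if it pins at 500 every time, raise one or the other.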
b
Is it possible to go “backwards” with the results or will the pointer moving through the “buffer” always go forward?
I.e. if I want to see historical info, I need to record the results of the query myself in some external system?
z
You can still run a historical search if you leave out the `timestamp` column. If you want the entire log, you would have to record it in an external system.
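For example, something like this (a sketch; the `process` filter is just carried over from Brandon's query, and results are still capped by `max_rows`):

```
select * from unified_log where process = 'kernel' and max_rows = 1000;
```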