# general
f
Hi, everyone. I’m facing some issues with the filesystem logging option, and hopefully someone can shed some light on them:
• In a fairly heavy-load environment, I’ve set osquery to log all events from a Windows machine to the local filesystem as the only output. The result is an `osqueryd.results.log` file that grows very quickly. Ideally, the size of the results file should be kept under control.
• I configured `--logger_rotate` and its associated options to keep the number of files and their respective sizes predictable (e.g., 10 files of up to 250MiB each). This works well: I see the files created correctly, the older ones moved to `.zst` archives, etc.
• The actual problem appears once I hit the total limits (maximum number of files and archives). As stated in the documentation, osquery drops the overflowing events. Even if this is the designed behaviour, I would expect osquery to be able to manage the housekeeping of the existing files, offering a choice between working as described above (i.e., dropping any new events) and automatically rotating the files, deleting the oldest ones and always logging newer events.
• Furthermore, the documentation mentions the possibility of older files being removed, but I’m not sure how this can be enabled, or whether I’m interpreting it correctly. The reference in https://osquery.readthedocs.io/en/stable/installation/cli-flags/ is under `--logger_rotate_max_files = 25`: “_[…] If a rotation happens after hitting this max, the oldest file will be removed_.”
So, the questions, after such a long explanation, are: Is the deletion of rotated logs delegated to external tools (as suggested, e.g., in here)? What are the recommended best practices in multi-OS environments (e.g., use task manager + cron)? Is there any chance of this option being incorporated into osquery, e.g., as an extra flag on top of the baseline `--logger_rotate` functionality?
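For reference, the rotation budget described above (10 files of up to 250MiB each) could be turned into flag values with a small helper like the one below. This is only a sketch: `rotation_flags` and `worst_case_bytes` are hypothetical helper names, the size flag is assumed to take bytes as the CLI flags documentation describes, and the worst-case figure assumes uncompressed files and counts the active file on top of the rotated ones, which may overestimate real usage once older files are compressed to `.zst`.

```python
MIB = 1024 * 1024  # bytes per MiB


def rotation_flags(max_files: int, max_size_mib: int) -> list[str]:
    """Build osquery flagfile lines for filesystem logging with rotation."""
    return [
        "--logger_plugin=filesystem",
        "--logger_rotate=true",
        # The rotation threshold is expressed in bytes.
        f"--logger_rotate_size={max_size_mib * MIB}",
        f"--logger_rotate_max_files={max_files}",
    ]


def worst_case_bytes(max_files: int, max_size_mib: int) -> int:
    """Conservative disk budget (assumption): rotated files plus the active
    file, all uncompressed and all at the size threshold."""
    return (max_files + 1) * max_size_mib * MIB


if __name__ == "__main__":
    for line in rotation_flags(max_files=10, max_size_mib=250):
        print(line)
    print(f"worst case: {worst_case_bytes(10, 250) / 2**30:.2f} GiB")
```

The printed lines can be pasted into the flagfile passed to `osqueryd`; the disk figure is a planning upper bound, not something osquery reports.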
g
Hey, general question: what is the value proposition you’re trying to achieve with this solution? What happens to the log files once they’re on disk, and how are they useful? Also, can you clarify the following:
I've set osquery to log all events from a Windows machine in the local filesystem as the only output
Which events are you referring to here?
s
A couple of caveats. First, I’m not deeply familiar with these options, so I might not understand their behaviors. Second, the logger rotate stuff is pretty new, so there might be rough edges.
s
Hey @Francisco Huerta, I'm unsure about this statement too: "The actual problem is once I hit the total limits (maximum number of files and archives). As stated in the documentation, osquery drops the overflowing events." osquery drops overflowing events in the database, when they exceed the amount specified in `events_max`. The log files are "separate" in the sense that they only come into play when a query retrieves the events, which causes the logger to write the rows as results to the files.
s
Okay, that said, I would not expect osquery to drop events when the maximum number of files is reached. Where in the docs does it say that? I would expect osquery to drop events if the maximum number of buffered events is reached. I don’t know how likely that is if you’re writing to the filesystem, but that’s the case the event loss applies to: internal buffer overflows while waiting to send to remote logging systems.
s
(oops, haha, timing)
s
Same answer at least 🙂
What counts as “best practice” is very site-specific. It might be to not use the filesystem logger and to ship logs centrally instead. It might be to use osquery’s log rotation. With osquery’s log rotation, it’s probably not to use cron+logrotate.
f
@Gavin @seph @Stefano Bonicatti, first of all, thanks a ton for your replies and feedback.
To elaborate a bit more on the use case I’m pursuing: I decided to try a combination of filesystem logging + Fluent Bit as the log shipping engine. The reason for that is twofold:
• We’re trying to find a higher-performance solution for a particular problem we have, which is the centralization of Windows Events in scenarios where the number to be processed is > 200EPS.
• Commonly, our deployment scenarios combine the need to collect the results of osquery queries with the contents of some additional log files (e.g., IIS or Apache, etc.).
s
My gut sense is that writing 200EPS to disk and having osquery juggle log files is going to be worse than finding a way to stream it off the machine. On Unix, remote syslog would be common. I’m not sure what the options for Windows are. I could brainstorm, but I haven’t done that work. I suspect what you’re describing would work, though I’m not quite sure how osquery’s log rotation and Fluent Bit interact. I’d expect Fluent Bit to have some reasonable guides about this.
f
My interpretation of the events ‘loss’ issue was that, once I reach the configured log rotation limits (say, 5 files, 25MiB each), I see something like `osqueryd.results.log`, `osqueryd.results.log.[1..4]` and the respective `.zst` files. Then, I stop seeing any new events flowing to the central collection platform.
s
I would not expect that behavior. I would expect a log file to be deleted, and the rest rotated up.
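The expected behavior described here (delete the oldest file, shift the rest up) can be illustrated with a toy simulation. This is a sketch of the general numbered-rotation scheme, not osquery's actual implementation: the `rotate` helper is hypothetical, compression to `.zst` is ignored, and the file names are only borrowed from the thread.

```python
import os
import tempfile
from pathlib import Path


def rotate(active: Path, max_files: int) -> None:
    """Numbered rotation: drop the oldest file, shift the others up by one,
    then start a fresh active file. New data is never discarded."""

    def numbered(i: int) -> Path:
        return active.with_name(f"{active.name}.{i}")

    # The oldest file is removed once the maximum is reached.
    oldest = numbered(max_files)
    if oldest.exists():
        oldest.unlink()

    # Shift .N-1 -> .N, ..., .1 -> .2 (highest index first to avoid clobbering).
    for i in range(max_files - 1, 0, -1):
        if numbered(i).exists():
            os.replace(numbered(i), numbered(i + 1))

    # The active file becomes .1 and a new empty active file is created.
    if active.exists():
        os.replace(active, numbered(1))
    active.touch()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "osqueryd.results.log"
        for batch in range(6):
            log.write_text(f"batch {batch}\n")
            rotate(log, max_files=3)
        for path in sorted(Path(tmp).iterdir()):
            print(path.name, repr(path.read_text()))
```

Under this scheme the number of files on disk stays bounded while the newest data is always kept, which is the outcome the original question was asking for.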
f
I considered the possibility of Fluent Bit not being able to detect the rotation of the log files for some reason (indexes not updated properly, or something like that), but since I didn’t see any changes in the file sizes I more or less stopped looking into it.
In any case, judging by your comments, my initial assumption seems right: osquery taking care of rotating files and purging older events automatically, without losing newer events, should be the expected behavior. I’ll run a couple more tests to see if I can get some extra details.
Hi all. For what it’s worth, I continued with some tests and verified that the rotation is working perfectly. The issue I was experiencing was caused by an erroneous configuration of the Fluent Bit agent.
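When debugging a setup like this, it can help to confirm independently that rotation is happening before suspecting the shipper. The sketch below shows one common heuristic a tailing process might use: the watched path counts as rotated if its file identity changes or its size shrinks. This is an illustration only, not a description of how Fluent Bit detects rotation; `FileState`, `snapshot` and `was_rotated` are hypothetical names.

```python
import os
from typing import NamedTuple, Optional


class FileState(NamedTuple):
    file_id: int  # st_ino from os.stat (file index on Windows)
    size: int     # st_size in bytes


def snapshot(path: str) -> FileState:
    """Capture the identity and size of the file currently at `path`."""
    st = os.stat(path)
    return FileState(st.st_ino, st.st_size)


def was_rotated(prev: Optional[FileState], cur: FileState) -> bool:
    """Heuristic: a different file now lives at the path, or it shrank."""
    if prev is None:
        return False  # first observation, nothing to compare against
    return cur.file_id != prev.file_id or cur.size < prev.size
```

Polling `snapshot("osqueryd.results.log")` every few seconds and logging whenever `was_rotated` returns true gives a quick check that files are still being written and rotated, independent of the log shipper's configuration.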