# macos
n
👋 Hey - OpenBSM audit system is crashing for some users in my fleet on macOS 10.15.4 (reproduced on various 4.x.y osquery versions up to 4.3.0). I can detect this problem by looking for the presence of `<some_timestamp>.crash_recovery` log files in `/var/audit`. When the audit system crashes, osquery stops receiving events in the `process_events` table. When the system is restarted, `process_events` starts receiving events again, since the audit subsystem is restarted.

1. (Temporary fix) Is there a way to make the audit subsystem recover without rebooting the machine? `man audit` suggests you should be able to run `sudo audit -i` to reinitialize the system. However, doing this doesn't clear out the crash_recovery file, and events don't start flowing into `process_events` again, even after restarting osquery.
2. (Troubleshooting) Are there any good tools that can parse the audit binary log files? I'm trying to see if I can find any meaningful leads on why it crashed.
3. Has anyone else run into this, and does anyone have suggestions?
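A minimal Python sketch of the detection check described above, assuming the `<some_timestamp>.crash_recovery` naming seen in this thread. The function name `find_crash_recovery_files` is hypothetical, and `/var/audit` is usually readable only by root, so the script would likely need to run with `sudo`. Across a fleet, the same check could presumably be expressed as an osquery `file` table query with a `path LIKE '/var/audit/%.crash_recovery'` constraint.

```python
import os


def find_crash_recovery_files(audit_dir="/var/audit"):
    """Return sorted names of *.crash_recovery files in audit_dir ([] if the dir is absent)."""
    if not os.path.isdir(audit_dir):
        return []
    # os.listdir raises PermissionError rather than silently returning nothing,
    # which matters because /var/audit is usually readable only by root.
    return sorted(
        name
        for name in os.listdir(audit_dir)
        if name.endswith(".crash_recovery")
        and os.path.isfile(os.path.join(audit_dir, name))
    )


if __name__ == "__main__":
    try:
        crashes = find_crash_recovery_files()
    except PermissionError:
        raise SystemExit("cannot list /var/audit; try running with sudo")
    print(f"{len(crashes)} crash_recovery file(s) found")
    for name in crashes:
        print(name)
```

An empty list means no crash marker was found. Since a single host can accumulate several of these files, reporting the count rather than a yes/no flag may be more useful.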
Re: 2) On macOS, `praudit` can be used to view them 🙂 So, hey, I answered that question myself. If anyone has a Linux version or knows how to read them on Linux, that would also be helpful.
t
Interesting, so is it osquery’s usage of OpenBSM that triggers the crash? Do you know whether the crash then affects every process using audit, i.e. does it stop logging system-wide or just for osquery?
t
Here is an additional data point: I currently run osquery with the `disable_audit` flag set to true, and I have many `.crash_recovery` files in `/var/audit` just from this month.
My `audit_control` file also has not been modified since the OS was first installed earlier this year.
b
We're not actually able to determine whether osquery's usage is the cause. The data in the `crash_recovery` files does not indicate what caused the crash, only that a crash happened and the system recovered. In the failure state, events continue to be written to the `.not_terminated` file, but log volume is severely reduced and only these event values appear:
• SecSrvr AuthEngine
• user authentication
Examples of those two events, with user info redacted:
<record version="11" event="user authentication" modifier="0" time="Thu Apr 30 16:28:19 2020" msec=" + 158 msec" >
<subject audit-uid="502" uid="502" gid="20" ruid="502" rgid="20" pid="11031" sid="100011" tid="2686386 0.0.0.0" />
<text>Verify password for record type Users &apos;user1&apos; node &apos;/Local/Default&apos;</text>
<return errval="failure: Unknown error: 255" retval="5000" />
<identity signer-type="1" signing-id="com.apple.opendirectoryd" signing-id-truncated="no" team-id="" team-id-truncated="no" cdhash="0x1f5920de3532b6fae4f8050f2c7f507b5bbe838a" />
</record>

<record version="11" event="SecSrvr AuthEngine" modifier="0" time="Thu Apr 30 17:22:08 2020" msec=" + 661 msec" >
<subject audit-uid="-1" uid="0" gid="0" ruid="0" rgid="0" pid="16775" sid="100000" tid="2701830 0.0.0.0" />
<text>begin evaluation</text>
<return errval="success" retval="0" />
<identity signer-type="1" signing-id="com.apple.authd" signing-id-truncated="no" team-id="" team-id-truncated="no" cdhash="0xda52fe385f41ebc0f7fb14140bea0dfc97ac5644" />
</record>
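To see which event types still appear in a trail during the failure state, the XML output of `praudit -x` (presumably the format of the records above) can be tallied. A rough Python sketch: `read_trail_xml` and `count_events` are hypothetical helper names, and stripping the XML declaration and `<audit>` wrapper is an assumption about the praudit header/footer, made so that a `.not_terminated` trail without a closing tag still parses.

```python
import re
import subprocess
import sys
import xml.etree.ElementTree as ET
from collections import Counter


def read_trail_xml(path):
    """Run praudit -x on one audit trail file and return its XML text."""
    # Likely needs root, since /var/audit is usually not world-readable.
    result = subprocess.run(["praudit", "-x", path], capture_output=True,
                            text=True, errors="replace", check=True)
    return result.stdout


def count_events(praudit_xml):
    """Count the event= attribute of every <record> in praudit -x output."""
    # Assumption: records sit inside an XML declaration plus <audit>...</audit>,
    # and the closing tag may be missing for a .not_terminated trail, so strip
    # both and wrap the records in a synthetic root element instead.
    body = re.sub(r"<\?xml[^>]*\?>", "", praudit_xml)
    body = re.sub(r"</?audit\b[^>]*>", "", body)
    root = ET.fromstring(f"<trail>{body}</trail>")
    return Counter(record.get("event") for record in root.iter("record"))


if __name__ == "__main__":
    # Usage: sudo python3 count_events.py /var/audit/<trail file> ...
    for trail in sys.argv[1:]:
        for event, n in count_events(read_trail_xml(trail)).most_common():
            print(f"{n:6d}  {event}")
```

Comparing the tallies from a healthy trail against a post-crash `.not_terminated` trail would make the drop to only `SecSrvr AuthEngine` and `user authentication` events easy to spot.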
n
I'm not saying it's osquery's fault per se. I just am not that familiar with macOS internals and struggling to understand how to dig into it properly. And I hoped maybe there was someone around with a bit deeper expertise in this area. But regardless of it being caused by osquery or not, it certainly affects osquery and our usage of it a fair bit. I actually have seen it on other macOS versions, but the majority of our mac fleet is on 10.15.4. Seen (in decreasing order of frequency on our fleet): 10.15.4, 10.14.6, 10.15.3, 10.15.5, 10.13.6, 10.14.3, 10.15.1, 10.15.2 But also like... the vast majority of our fleet is on 10.15.4 and 10.14.6 so it's hard to say if it shows up on all versions and if so, if they're the same cause...
Raised https://github.com/osquery/osquery/issues/6431 for tracking purposes.
@terracatta - do you have any other process running that would read the audit socket that might trigger this crash? Trying to get more info about what's going on 🤷
t
@nyanshak I just went over to my wife's iMac, which is only used for web browsing, a few games, and Adobe programs, and it has about 15 crash reports on it.
It has never run osquery or any other security software.
n
When you say "15 crash reports", what do you mean? Like `/var/audit/*.crash_recovery` files? Or something else?
t
Yes, like `/var/audit/*.crash_recovery`.
n
Oh, interesting. The hosts I've checked have had only one crash file each, though I didn't check all of them.