# linux
s
Starting with 4.1.1, I am unable to use the systemd `ProtectSystem=strict` feature. The daemon does not start, but no logs are produced. Does anyone know what changed between 4.0.2 and 4.1.1 that may have caused this? Here’s my full service unit config:
[Unit]
Description=The osquery Daemon
After=network.service syslog.service

[Service]
TimeoutStartSec=0
EnvironmentFile=/etc/default/osqueryd
ExecStartPre=/bin/sh -c "if [ ! -f $CONFIG_FILE ]; then echo {} > $CONFIG_FILE; fi"
ExecStartPre=/bin/sh -c "if [ ! -f $FLAG_FILE ]; then touch $FLAG_FILE; fi"
ExecStartPre=/bin/sh -c "if [ -f $LOCAL_PIDFILE ]; then mv $LOCAL_PIDFILE $PIDFILE; fi"
ExecStart=/usr/bin/osqueryd \
  --flagfile $FLAG_FILE \
  --config_path $CONFIG_FILE
Restart=on-failure
KillMode=process
KillSignal=SIGTERM
ProtectSystem=strict
ReadWritePaths=/var/osquery /var/run /var/tmp /tmp

[Install]
WantedBy=multi-user.target
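For reference, the three `ExecStartPre=` steps above can be sketched in Python as follows. This is an illustrative translation, not code from osquery: the function name `exec_start_pre` is hypothetical, and its four arguments stand in for `$CONFIG_FILE`, `$FLAG_FILE`, `$LOCAL_PIDFILE` and `$PIDFILE`, whose actual values come from `/etc/default/osqueryd` and are not shown here. It makes visible that each step writes only when its file is missing (or, for the local pidfile, present), so a first start and a later start may touch different paths.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def exec_start_pre(config_file: Path, flag_file: Path,
                   local_pidfile: Path, pidfile: Path) -> List[str]:
    """Mirror the unit's ExecStartPre= shell steps; return what was written."""
    actions = []
    # if [ ! -f $CONFIG_FILE ]; then echo {} > $CONFIG_FILE; fi
    if not config_file.is_file():
        config_file.write_text("{}\n")
        actions.append(f"created {config_file}")
    # if [ ! -f $FLAG_FILE ]; then touch $FLAG_FILE; fi
    if not flag_file.is_file():
        flag_file.touch()
        actions.append(f"created {flag_file}")
    # if [ -f $LOCAL_PIDFILE ]; then mv $LOCAL_PIDFILE $PIDFILE; fi
    if local_pidfile.is_file():
        shutil.move(str(local_pidfile), str(pidfile))
        actions.append(f"moved {local_pidfile} -> {pidfile}")
    return actions


if __name__ == "__main__":
    # Hypothetical layout inside a temporary directory, not real osquery paths.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        args = (root / "osquery.conf", root / "osquery.flags",
                root / "local.pid", root / "osqueryd.pid")
        print("first start:", exec_start_pre(*args))
        print("second start:", exec_start_pre(*args))
```

On the first call both files are created; on the second call the returned list is empty because both files already exist.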
t
Do you mean the daemon does start?
s
It does not start. This is all I get from journald:
Apr 03 23:29:50 HOST systemd[1]: osqueryd.service: Start request repeated too quickly.
Apr 03 23:29:50 HOST systemd[1]: osqueryd.service: Failed with result 'exit-code'.
Apr 03 23:29:50 HOST systemd[1]: Failed to start The osquery Daemon.
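The first of these lines comes from systemd's start rate limit rather than from the original failure. With `Restart=on-failure`, systemd retries a failed start (after `RestartSec=`, 100 ms by default), and once more than `StartLimitBurst=` starts occur within `StartLimitIntervalSec=` (by default 5 starts in 10 s), it refuses further attempts and logs "Start request repeated too quickly". The `'exit-code'` result suggests a process in an earlier attempt exited with a non-zero status, so any messages from those attempts should appear further back in `journalctl -u osqueryd.service`. Below is a simplified fixed-window model of that limiter, assuming those default values; the function name `start_allowed` is hypothetical and this is not systemd's actual implementation.

```python
from typing import Iterable, List, Optional


def start_allowed(timestamps: Iterable[float], interval: float = 10.0,
                  burst: int = 5) -> List[bool]:
    """Simplified fixed-window start limiter: one bool per start attempt."""
    results: List[bool] = []
    window_start: Optional[float] = None
    count = 0
    for ts in timestamps:
        # A new window begins at the first attempt or once the old one expires.
        if window_start is None or ts - window_start > interval:
            window_start = ts
            count = 0
        if count < burst:
            count += 1
            results.append(True)
        else:
            # Over the burst within the window: "Start request repeated too quickly".
            results.append(False)
    return results


if __name__ == "__main__":
    # A unit that fails immediately and is restarted every 100 ms.
    attempts = [i * 0.1 for i in range(7)]
    print(start_allowed(attempts))  # [True, True, True, True, True, False, False]
```

A unit that fails within milliseconds therefore exhausts the burst in well under a second, which is why only the rate-limit messages are visible at the end of the log.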
t
Recently we switched the KillMode to `control-group`.
Sorry for the terse responses (my hands are full right now). The PR for that change has more context on the underlying issue.
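For context on that setting: per the systemd.kill documentation, `KillMode=control-group` sends the stop signal to every remaining process in the unit's control group, while `KillMode=process` signals only the main process, so any children it started can outlive the stop. The hypothetical helper `stop_signal_targets` below only restates those rules for the initial `KillSignal=`; it does not model the follow-up `SIGKILL` after the stop timeout, and the PIDs are made up.

```python
from typing import FrozenSet, Iterable


def stop_signal_targets(kill_mode: str, main_pid: int,
                        cgroup_pids: Iterable[int]) -> FrozenSet[int]:
    """Which PIDs receive KillSignal= when the unit is stopped (simplified)."""
    pids = frozenset(cgroup_pids)
    if kill_mode == "control-group":
        # Every process still in the unit's cgroup is signalled.
        return pids
    if kill_mode in ("process", "mixed"):
        # Only the main process gets KillSignal=; with "mixed", the rest
        # would later receive SIGKILL, which this sketch does not model.
        return frozenset({main_pid}) & pids
    if kill_mode == "none":
        return frozenset()
    raise ValueError(f"unknown KillMode: {kill_mode!r}")


if __name__ == "__main__":
    # Hypothetical PIDs: a main daemon process plus one child it forked.
    cgroup = [100, 101]
    for mode in ("process", "control-group"):
        print(mode, sorted(stop_signal_targets(mode, 100, cgroup)))
```

This prints `process [100]` and `control-group [100, 101]`: only `control-group` reaches the child process.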
s
On 4.1.1 it’s still set to `process`, and neither 4.2.0 nor 4.1.1 works.
t
I think changing your service unit will fix the issue (fingers crossed).
This is a bug that shows up rarely.
s
Nor 4.1.2, FWIW.
The weird thing is, if I remove the `ProtectSystem=strict` option, the service starts. Then, after it has run successfully, I can add the strict setting back and restart the service, and it continues to work.
s
I’m not super familiar with that option, but as documented:
If set to "strict" the entire file system hierarchy is mounted read-only, except for the API file system subtrees /dev, /proc and /sys
Doesn’t that mean it can’t write its pidfile or local database files? Does adding `--verbose` reveal any clues?
s
I have granted it write access to `/var/osquery /var/run /var/tmp /tmp` (see the `ReadWritePaths=` line immediately following it in my config), which works on 3.3.2, 3.3.4, and 4.0.2. It is running with `--verbose` in my test VM.
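The rule quoted above can be restated as a small path check. This is a simplified sketch, assuming that with `ProtectSystem=strict` a path is writable only if it lies under `/dev`, `/proc`, `/sys` or one of the `ReadWritePaths=` entries. The helper name `writable_under_strict` is hypothetical, and the model ignores symlinks (on many distributions `/var/run` is a symlink to `/run`), other sandboxing options, and ordinary file permissions. It compares path components rather than string prefixes, so `/var/osquery-old` is not treated as inside `/var/osquery`.

```python
from pathlib import PurePosixPath
from typing import Iterable

# API file system subtrees left writable by ProtectSystem=strict (per the quoted docs).
API_SUBTREES = ("/dev", "/proc", "/sys")


def is_under(path: str, root: str) -> bool:
    """True if path equals root or is nested below it (expects absolute, normalised paths)."""
    p, r = PurePosixPath(path), PurePosixPath(root)
    return p == r or r in p.parents


def writable_under_strict(path: str, read_write_paths: Iterable[str]) -> bool:
    """Simplified model: is `path` writable with ProtectSystem=strict?"""
    roots = list(API_SUBTREES) + list(read_write_paths)
    return any(is_under(path, root) for root in roots)


if __name__ == "__main__":
    # The ReadWritePaths= value from the unit file above.
    rw = "/var/osquery /var/run /var/tmp /tmp".split()
    # Hypothetical example paths; the real ones depend on /etc/default/osqueryd.
    for candidate in ("/var/osquery/osquery.db", "/etc/osquery/osquery.conf"):
        print(candidate, writable_under_strict(candidate, rw))
```

Under this model the first path prints `True` and the second prints `False`, since `/etc` is not listed in `ReadWritePaths=`.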
t
Definitely change the KillMode regardless; it fixes a bug that was introduced somewhere in the 3.x line.
s
I have tested on 4.2.0 as well, which includes the KillMode change.
I was using 4.1.1 as an example because the bug was introduced somewhere between 4.0.2 and 4.1.1.