Hey Folks i noticed that osquery when you stop/res...
# general
Hey Folks i noticed that osquery when you stop/restart it still locking files and cant start clean untill you kill that process.
Copy code
Feb 26 09:37:33 osquery-1 systemd[1]: Stopping The osquery Daemon...
Feb 26 09:37:33 osquery-1 systemd[1]: Stopped The osquery Daemon.
Copy code
lsof |grep LOCK
osqueryd   3892           root    4uW     REG                8,1         0  109052187 log/osquery/db/osquery.db/LOCK
rocksdb:b  3892  3894     root    4uW     REG                8,1         0  109052187 log/osquery/db/osquery.db/LOCK
rocksdb:b  3892  3895     root    4uW     REG                8,1         0  109052187 /log/osquery/db/osquery.db/LOCK
rocksdb:b  3892  3896     root    4uW     REG                8,1         0  109052187 /log/osquery/db/osquery.db/LOCK
rocksdb:b  3892  3897     root    4uW     REG                8,1         0  109052187 /log/osquery/db/osquery.db/LOCK
rocksdb:b  3892  3971     root    4uW     REG                8,1         0  109052187 /log/osquery/db/osquery.db/LOCK
Extension  3892  3972     root    4uW     REG                8,1         0  109052187 /log/osquery/db/osquery.db/LOCK
Extension  3892  3973     root    4uW     REG                8,1         0  109052187 /log/osquery/db/osquery.db/LOCK
ConfigRef  3892  3974     root    4uW     REG                8,1         0  109052187 /log/osquery/db/osquery.db/LOCK
Copy code
ps aux|grep osquery
root      3892  0.0  0.4 324152 18112 ?        SNl  Feb24   0:02 /usr/bin/osqueryd
when starting the service without killing the old one.
Copy code
Feb 26 09:42:16 osquery-1 osqueryd[11789]: I0226 09:42:16.969327 11791 rocksdb.cpp:149] Rocksdb open failed (5:0) IO error: While lock file: /log/osquery/db/osquery.db/LOCK: Resource temporarily unavailable
Only a single process can use the underlying database at a time. So that you can’t start a new one till the old one exits, is expected.
Are you saying that the service does not correctly stop under systemd?
yes, that what i meant. when you restart the service and check the status it give you the last error i shared, and when checking
you see that the old process still there locking the resources
I’m not sure what’s up with that, but I wanted to clarify it
this behaviour i have noticed on 4 different machines with agent
i’ll check the older version i have in the env
which check them and confirm
seems to work fine.
Does 4.6.0 have this issue?
@theopolis When you get to it, you might want to know…
I have seen this similar behavior sometimes in Windows actually in 4.6.0. For me it happened when osquery was failing tls handshake with fleet, it was not able to stop properly. Typically there are 2 processes running and 1 of them keeps running even when the service is succesfully stopped.
In versions prior to 4.5.1, if osquery failed to open the backing storage RW it would attempt RO. This was not a good experience because it causes unexpected behavior (running without permanent storage). We want osquery to fail explicitly when something like a stale process still maintain the LOCK. We should debug why those processes linger.
A next step includes installing the debug symbols on that host and then attaching to the hung process and looking at the back traces, at where the threads are hung.
Copy code
$ sudo apt-get install osquery-dbgsym
$ pgrep -a osqueryd
4951 osqueryd --ephemeral --disable_database --disable_logging
4953 osqueryd
If there are two osqueryd processes, pick the one that has no command line arguments.
Copy code
$ sudo gdb -p 4953
(gdb) thread apply all bt
Then let me know what you see. You should for example see that
Thread 1
is waiting similar to the following:
Copy code
Thread 1 (Thread 0x7f0333f118c0 (LWP 4953)):
#0  0x00007f03334e3ad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x5557cc43c588 <osquery::(anonymous namespace)::kShutdownData+40>) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x5557cc43c590 <osquery::(anonymous namespace)::kShutdownData+48>, cond=0x5557cc43c560 <osquery::(anonymous namespace)::kShutdownData>) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x5557cc43c560 <osquery::(anonymous namespace)::kShutdownData>, mutex=0x5557cc43c590 <osquery::(anonymous namespace)::kShutdownData+48>) at pthread_cond_wait.c:655
#3  0x00005557cc2111ab in std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) ()
#4  0x00005557cbecd545 in std::__1::condition_variable::wait<osquery::waitForShutdown()::$_0>(std::__1::unique_lock<std::__1::mutex>&, osquery::waitForShutdown()::$_0) (__lk=..., this=<optimized out>, __pred=...) at /usr/local/osquery-toolchain/usr/bin/../include/c++/v1/__mutex_base:364
#5  osquery::waitForShutdown () at /build/usr/src/debug/osquery/osquery/core/shutdown.cpp:52
#6  0x00005557cbd355bd in osquery::Initializer::waitForShutdown (this=<optimized out>) at /build/usr/src/debug/osquery/osquery/core/init.cpp:643
#7  0x00005557ca985df0 in osquery::startDaemon (runner=...) at /build/usr/src/debug/osquery/osquery/main/main.cpp:113
#8  0x00005557ca9862ae in osquery::startOsquery (argc=<optimized out>, argv=<optimized out>) at /build/usr/src/debug/osquery/osquery/main/main.cpp:189
#9  0x00005557ca9859bb in main (argc=-867973752, argv=0x80) at /build/usr/src/debug/osquery/osquery/main/posix/main.cpp:31