I've been struggling with a launcher/osquery probl...
# kolide
m
I've been struggling with a launcher/osquery problem for a few days now. We have a few Server 2019 systems that this is happening on, and from what I can tell it appears that launcher can't communicate with the osquery process, but I'm definitely no expert at reading the debug information. I'll post more in replies.
The affected systems look very similar to me compared to the systems where things are working, but I'm assuming there has to be some difference so I'm still searching. Same OS/Build and same patch level.
Here's a debug file output from the following attempt to run launcher manually:
Copy code
"C:\Program Files\Kolide\Launcher-launcher\bin\launcher.exe" --autoupdate --debug --debug_log_file "launcher-debug.log" --osquery_verbose --update_channel stable --hostname <http://uscolo1fleet01.corp.dir.spartech.com:8080|uscolo1fleet01.corp.dir.spartech.com:8080> --enroll_secret_path "C:\Program Files\Kolide\Launcher-launcher\conf\secret" --root_directory "C:\Program Files\Kolide\Launcher-launcher" --osquery_flag host_identifier=hostname
I've tried completely removing our AV/endpoint product, just in case that was interfering, but there's no change in behavior.
I've tried uninstalling and wiping the Kolide program files folder, and then re-installing.
Also I've tried with and without autoupdate, with no change. It's almost as if some pipe communication is not happening between launcher and osqueryd? I see the osqueryd.exe and osquery-extension.exe processes running while launcher is retrying to communicate with them.
s
That’s a weird
--root_directory
— it’s not the root of the launcher install, it should be the data directory where launcher and osquery write their db to
The basic thing I see, is that osquery starts, but when launcher tries to start using it, it’s not there on the socket. I don’t see any osquery errors, but I wonder if its just taking a weirdly long time to start up
One thing to try, would be manually launching osquery. Does it start up promptly, or does it take ages.
I know there’s some timing sensitive startup stuff
m
Thank you Seph those are good observations and I'll give those things a try and get back to you here.
Do you recommend that when running osqueryd manually I mimic the arguments that launcher is using, based on what I see in the log output? I did that on both systems (working launcher and non-working launcher) and I'm a little surprised to find that it does not run on either.
s
Honestly it's been awhile since I've done that. What's the error?
m
Here's the output that I capture in a file. This looks pretty similar on both systems but I'm going to compare a little closer. The output I'm sending you here is from the system where launcher is failing.
Copy code
I0527 19:57:25.045593  5312 init.cpp:342] osquery initialized [version=4.7.0]
I0527 19:57:25.076864  5312 system.cpp:342] Found stale process for osqueryd (4076)
I0527 19:57:25.076864  5312 system.cpp:374] Writing osqueryd pid (8712) to C:\Program Files\Kolide\Launcher-launcher\data\osquery.pid
I0527 19:57:25.076864  5312 extensions.cpp:438] Found autoloadable extension: C:\Program Files\Kolide\Launcher-launcher\bin\osquery-extension.exe
I0527 19:57:25.076864  5312 dispatcher.cpp:78] Adding new service: WatcherRunner (0000014492AFD010) to thread: 4548 (0000014492AF7AC0) in process 8712
I0527 19:57:25.076864  5312 rocksdb.cpp:132] Opening RocksDB handle: C:\Program Files\Kolide\Launcher-launcher\data\osquery.db
I0527 19:57:25.092486  4548 watcher.cpp:659] Created and monitoring extension child (4160): C:\Program Files\Kolide\Launcher-launcher\bin\osquery-extension.exe
[C:\Program Files\Kolide\Launcher-launcher\bin\osquery-extension.exe --verbose --socket \\.\pipe\kolide-osquery-01F6QZD64HQ6J03NNP01ZREWSK --timeout 10 --interval 3]I0527 19:57:25.352573  5312 dispatcher.cpp:78] Adding new service: ExtensionWatcher (0000014492B159E0) to thread: 7896 (0000014492B56630) in process 8712
I0527 19:57:25.352573  5312 dispatcher.cpp:78] Adding new service: ExtensionRunnerCore (0000014492AB18F0) to thread: 2504 (0000014492B56530) in process 8712
I0527 19:57:25.352573  2504 interface.cpp:270] Extension manager service starting: \\.\pipe\kolide-osquery-01F6QZD64HQ6J03NNP01ZREWSK
E0527 19:57:40.675662  5312 shutdown.cpp:69] Cannot activate kolide_grpc config plugin: Unknown registry plugin: kolide_grpc
I0527 19:57:40.675662  5312 auto_constructed_tables.cpp:97] Removing stale ATC entries
W0527 19:57:40.675662  5312 init.cpp:589] Error reading config: Missing config plugin 
E0527 19:57:56.008266  5312 shutdown.cpp:69] Cannot activate kolide_grpc logger plugin: Unknown registry plugin: kolide_grpc
E0527 19:58:11.259007  5312 shutdown.cpp:69] Cannot activate kolide_grpc distributed plugin: Unknown registry plugin: kolide_grpc
I0527 19:58:11.259007  5312 eventfactory.cpp:156] Event publisher not enabled: ntfs_event_publisher: NTFS event publisher disabled via configuration
I0527 19:58:11.259007  5312 events.cpp:36] Skipping subscriber: powershell_events: Required publisher is disabled by configuration
I0527 19:58:11.259007  5312 events.cpp:36] Skipping subscriber: windows_events: Required publisher is disabled by configuration
I0527 19:58:11.259007  5260 eventfactory.cpp:391] Starting event publisher run loop: WindowsEventLogPublisher
I0527 19:58:11.259007  5260 eventfactory.cpp:411] Event publisher WindowsEventLogPublisher run loop terminated for reason: Publisher disabled by configuration
I0527 19:58:11.259007  5312 dispatcher.cpp:149] Thread: 5312 requesting a stop
I0527 19:58:11.259007  5312 dispatcher.cpp:156] Service: 0000014492AFD010 has been interrupted
I0527 19:58:11.259007  5312 dispatcher.cpp:156] Service: 0000014492B159E0 has been interrupted
I0527 19:58:11.259007  5312 dispatcher.cpp:156] Service: 0000014492AB18F0 has been interrupted
I0527 19:58:11.259007  5312 dispatcher.cpp:122] Thread: 5312 requesting a join
I0527 19:58:11.259007  5312 dispatcher.cpp:140] Service thread: 0000014492B56530 has joined
I0527 19:58:11.259007  5312 dispatcher.cpp:140] Service thread: 0000014492B56630 has joined
I0527 19:58:11.259007  5312 dispatcher.cpp:140] Service thread: 0000014492AF7AC0 has joined
I0527 19:58:11.259007  5312 dispatcher.cpp:144] Services and threads have been cleared
It appears to bomb here:
Copy code
Cannot activate kolide_grpc config plugin: Unknown registry plugin: kolide_grpc
s
Hrm. It may be harder to make it just work. Some of that is coming in via launcher
m
That's sort of what I thought. Maybe I can get some debug going on osquery and run it through launcher.
s
You’re already getting that — the osquery args include
--verbose
.
m
Ok, gotcha. Well, there's got to be some difference between these systems. It seems like these might be newer OS's that I've deployed, so I'll continue to hunt for differences.
s
I’d generally bet something in permissions. But I’m still surprised there’s no logging.
I’d also tinker with trying to start osquery from the command line
You don’t need it to connect to the server, so maybe drop that too
Just trying to see if the OS is doing something weoird
m
Good ideas, thanks.
s
I keep wondering what’s up with
Extension socket not available: \\\\.\\pipe\\kolide-osquery-01F6QPMW4JVZVQZAWQS94WTCEJ.30828"
That path is, AFAIK, reasonable
\\.\pipe\<whatever>
is where windows keeps named pipes. And the rest is a random ulid to ensure uniqueness. So, why would that fail? It’s weird
(I might also be chasing the wrong idea)
m
I think you may be on to something with the pipe. If I look at a working system, there are two pipes:
Copy code
\\.\\pipe\\kolide-osquery-01F6RS33G9Y895NC1HWN79ZDAD
\\.\\pipe\\kolide-osquery-01F6RS33G9Y895NC1HWN79ZDAD.13599
Note that one of them has the 5 digit extension. If I watch the pipe list on a non-working system right after I start the launcher service, I only see 1 pipe:
Copy code
\\.\\pipe\\kolide-osquery-01F6RTP7R619269W0ZS6ZWDVAN
Is there any significance to that? It seems to be the pipe name with the 5 digit extension that the error message indicates launcher can't connect to, and on my non-working systems that pipe name is never created.
Also by the way I was able to get osqueryd to run and connect to my fleet server on both systems (working and non-working) so it does appear that osqueryd is stable on both. It really seems to be something around that pipe communication.