Hi team In many production servers orbit agent is trying to osquery #fleet


Lili

05/20/2023, 6:43 PM

Hi team. On many production servers, orbit-agent keeps trying to start in a loop but never succeeds. We get errors like:


May 19 15:10:39 HOST systemd[1]: Started Orbit osquery.
May 19 15:10:39 HOST orbit[2667636]: 2023-05-19T15:10:39+03:00 INF running with auto updates disabled
May 19 15:10:39 HOST orbit[2667636]: 2023-05-19T15:10:39+03:00 INF token rotation is enabled
May 19 15:10:39 HOST orbit[2667636]: 2023-05-19T15:10:39+03:00 INF start osqueryd cmd="/opt/orbit/bin/osqueryd/linux/5.7.0/osqueryd --pidfile=/opt/orbit/osquery.pid --database_path=/opt/orbit/osquery.db --extensions_socket=/opt/orbit/orbit-osquery.em --logger_path=/opt/orbit/osquery_log --enroll_secret_env ENROLL_SECRET --host_identifier=uuid --tls_hostname=HOSTNAME --enroll_tls_endpoint=/api/v1/osquery/enroll --config_plugin=tls --config_tls_endpoint=/api/v1/osquery/config --config_refresh=60 --disable_distributed=false --distributed_plugin=tls --distributed_tls_max_attempts=10 --distributed_tls_read_endpoint=/api/v1/osquery/distributed/read --distributed_tls_write_endpoint=/api/v1/osquery/distributed/write --logger_plugin=tls,filesystem --logger_tls_endpoint=/api/v1/osquery/log --disable_carver=false --carver_disable_function=false --carver_start_endpoint=/api/v1/osquery/carve/begin --carver_continue_endpoint=/api/v1/osquery/carve/block --carver_block_size=2000000 --tls_server_certs /opt/orbit/certs.pem --augeas_lenses /opt/orbit/lenses --force --flagfile /opt/orbit/osquery.flags"
May 19 15:10:39 HOST osqueryd[2667654]: osqueryd started [version=5.7.0]
May 19 15:11:09 HOST orbit[2667636]: 2023-05-19T15:11:09+03:00 INF calling flags update
May 19 15:11:49 HOST orbit[2667636]: 2023-05-19T15:11:49+03:00 ERR unexpected exit error="extension socket stat timeout"

In the Fleet server, the status of the agents is "offline". What should we do about this error? How can we get the agents to start? Orbit: 1.5.0, osquery: 5.7.0, Debian: 11.

Rachel Perkins

05/22/2023, 7:42 PM

Hey Lili, can you clarify more what you were trying to do and what command you were running?

Lili

05/23/2023, 7:44 AM

Hello @Rachel Perkins. Everything worked fine until the Fleet server went down and was unavailable for about 10 hours. After that, some agents lost contact with the Fleet server, and orbit-agent keeps trying to start in a loop but fails each time. Restarting the agent does not help, and kill -9 <pid orbit> doesn't help either.

roberto

05/24/2023, 6:23 PM

Hey @Lili, seems like Orbit can't find the extension socket that should be created by osquery. From the log you pasted, the socket file is

/opt/orbit/orbit-osquery.em

Could you: 1. Try deleting the file? If that doesn't work, inspect its permissions? 2. See if there's anything in the osquery logs?

/opt/orbit/osquery_log/*
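The two checks suggested above could be sketched as a small shell function. This is only an illustration: the function name inspect_orbit_dir is hypothetical, and the paths are the ones shown in the pasted log.

```shell
# Sketch only: inspect_orbit_dir is a hypothetical helper, not part of Orbit.
# The default paths are taken from the osqueryd command line in the log above.
inspect_orbit_dir() {
  root="${1:-/opt/orbit}"
  sock="$root/orbit-osquery.em"
  logs="$root/osquery_log"

  # Report whether the extension socket exists, and show its permissions if so.
  if [ -e "$sock" ]; then
    echo "socket: present"
    ls -l "$sock"
  else
    echo "socket: missing"
  fi

  # Report whether the osquery log directory contains any files.
  if [ -d "$logs" ] && [ -n "$(ls -A "$logs" 2>/dev/null)" ]; then
    echo "logs: present"
  else
    echo "logs: empty or missing"
  fi
}
```

Running inspect_orbit_dir as root with no argument checks /opt/orbit; passing another directory checks that one instead.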

Lili

05/24/2023, 6:50 PM

@roberto hello! The directory

/opt/orbit/osquery_log/

is empty, and the extension socket file

/opt/orbit/orbit-osquery.em

does not exist:


# ls -lt /opt/orbit/orbit-osquery.em
ls: cannot access '/opt/orbit/orbit-osquery.em': No such file or directory

Lucas Rodriguez

05/24/2023, 7:07 PM

Something we can try: 1. Stop orbit:

sudo systemctl stop orbit

2. Try starting osquery manually:


# As root

/opt/orbit/bin/osqueryd/linux/5.7.0/osqueryd --pidfile=/opt/orbit/osquery.pid --database_path=/opt/orbit/osquery.db --extensions_socket=/opt/orbit/orbit-osquery.em --logger_path=/opt/orbit/osquery_log --enroll_secret_env ENROLL_SECRET --host_identifier=uuid --tls_hostname=HOSTNAME --enroll_tls_endpoint=/api/v1/osquery/enroll --config_plugin=tls --config_tls_endpoint=/api/v1/osquery/config --config_refresh=60 --disable_distributed=false --distributed_plugin=tls --distributed_tls_max_attempts=10 --distributed_tls_read_endpoint=/api/v1/osquery/distributed/read --distributed_tls_write_endpoint=/api/v1/osquery/distributed/write --logger_plugin=tls,filesystem --logger_tls_endpoint=/api/v1/osquery/log --disable_carver=false --carver_disable_function=false --carver_start_endpoint=/api/v1/osquery/carve/begin --carver_continue_endpoint=/api/v1/osquery/carve/block --carver_block_size=2000000 --tls_server_certs /opt/orbit/certs.pem --augeas_lenses /opt/orbit/lenses --force --flagfile /opt/orbit/osquery.flags

to see whether it's hanging or crashing during startup.
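One possible way to tell an early crash from a process that keeps running is to wrap the manual start in a time limit and look at the exit status. This is a rough sketch, assuming GNU coreutils timeout is available (it exits with status 124 when the limit is hit). The function name classify_start and the LOG variable are hypothetical.

```shell
# Sketch only: classify_start and LOG are hypothetical names.
# Usage: classify_start <seconds> <command> [args...]
classify_start() {
  limit="$1"
  shift
  log="${LOG:-/tmp/osqueryd-manual.log}"

  # Run the command under a time limit and keep stdout/stderr for later reading.
  timeout "$limit" "$@" >"$log" 2>&1
  status=$?

  # GNU timeout returns 124 when it had to stop the command itself.
  if [ "$status" -eq 124 ]; then
    echo "still running after ${limit}s"
  else
    echo "exited with status $status"
  fi
}
```

For example, classify_start 120 followed by the full osqueryd command above. Note that "still running" does not by itself separate a hang from a healthy daemon, so the captured output and the extension socket still need to be checked.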

roberto

05/24/2023, 8:06 PM

Thanks Lucas, that sounds like a great idea. We might also get some useful info from stdout.

Lili

05/26/2023, 6:43 AM

@Lucas Rodriguez hello! I stopped orbit and started osquery manually. After starting osquery, I get:


W0526 09:38:55.125365 2774620 watcher.cpp:397] osqueryd worker (2774621) stopping: Memory limits exceeded: 1099509760
W0526 09:38:59.159560 2774620 watcher.cpp:435] osqueryd worker (2774621) could not be stopped. Sending kill signal.

The watchdog_memory_limit flag is set to 1024. For information: the osquery.db folder is 21G in size. After removing this folder and restarting the orbit agent, everything works fine.
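The recovery described here could look roughly like the sketch below. It is an illustration rather than an official procedure: the function name reset_osquery_db and the RUN variable are hypothetical, the sketch renames the database instead of removing it, and by default it only prints the commands. The buffered results stored in the old database are not carried over.

```shell
# Sketch only: reset_osquery_db and RUN are hypothetical names.
# By default the commands are printed, not executed; set RUN=sudo to run them.
reset_osquery_db() {
  root="${1:-/opt/orbit}"
  run="${RUN:-echo}"

  # Stop the service so osqueryd no longer holds the database open.
  $run systemctl stop orbit || return 1
  # Show how large the database has grown before touching it.
  $run du -sh "$root/osquery.db" || return 1
  # Rename rather than delete, so the old database can still be inspected.
  $run mv "$root/osquery.db" "$root/osquery.db.old" || return 1
  # Start the service again; osquery creates a fresh database on startup.
  $run systemctl start orbit
}
```

Once the agent is confirmed healthy, the renamed osquery.db.old folder can be removed to free the disk space.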

Lili

05/26/2023, 7:07 AM

I made a mistake when I set buffered_log_max: 0. It seems that after losing the connection to the Fleet server, the orbit agent wrote all query results to disk (1), then could not load all the buffered logs (2), and failed on start because there was not enough memory (3). Thank you for the help! @roberto @Lucas Rodriguez
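To notice this kind of growth earlier, one option is a periodic size check on the database folder. The sketch below is only an assumption about how such a check might be written: the function name db_over_limit and the threshold are hypothetical, and nothing like it ships with Orbit or osquery.

```shell
# Sketch only: db_over_limit is a hypothetical helper.
# Usage: db_over_limit <directory> <limit in KB>
# Returns 1 and prints a warning when the directory is larger than the limit.
db_over_limit() {
  dir="$1"
  limit_kb="$2"

  # du -sk prints "<size in KB><TAB><path>"; keep only the size.
  used_kb=$(du -sk "$dir" | cut -f1)

  if [ "$used_kb" -gt "$limit_kb" ]; then
    echo "over: ${used_kb} KB > ${limit_kb} KB"
    return 1
  fi
  echo "ok: ${used_kb} KB"
}
```

For example, db_over_limit /opt/orbit/osquery.db 1048576 would flag a database larger than 1 GB, long before it reaches the 21G seen here.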
