# fleet
a
I rolled out osquery to some immutable hosts in GCP, and I found that I'm getting only 1 machine in Fleet. I ran osqueryd with these options:
```
/usr/bin/osqueryd --flagfile /etc/osquery/osquery.flags --verbose --tls_dump
```
I noticed that they have the same node_key and UUID. What would be the solution for this?
👀 1
This is my flags file:
```
--enroll_secret_path=/etc/osquery/osquery_enroll_secret
--tls_server_certs=/etc/osquery/osquery_cert.pem
--tls_hostname=fleet.example.com:443
--host_identifier=hostname
--enroll_tls_endpoint=/api/v1/osquery/enroll
--config_plugin=tls
--config_tls_endpoint=/api/v1/osquery/config
--config_tls_refresh=360
--config_tls_max_attempts=360
--disable_distributed=false
--disable_logging=false
--distributed_plugin=tls
--distributed_interval=60
--distributed_tls_max_attempts=3
--distributed_tls_read_endpoint=/api/v1/kolide/distributed/read
--distributed_tls_write_endpoint=/api/v1/kolide/distributed/write
--logger_plugin=filesystem
--logger_path=/var/log/osquery/logs
--database_path=/var/log/osquery/db/osquery.db
--schedule_splay_percent=10
--pack_refresh_interval=360
--watchdog_level=0
--config_refresh=360
--utc
--force=true
```
I noticed there are flags like `--tls_client_cert` and `--tls_client_key` that could be used (link), but I haven't used them before. Hopefully you have some suggestions. Also, would a TLS client cert be useful here, and how would I generate that cert/key to be accepted by Fleet/osquery?
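(For reference, a minimal sketch of how a client key/cert pair is typically generated with openssl. This assumes the Fleet server, or a TLS-terminating proxy in front of it, is actually configured to verify client certificates; the CA files and file names here are hypothetical.)
```
# Generate a client private key and a certificate signing request (CSR)
openssl genrsa -out osquery_client.key 2048
openssl req -new -key osquery_client.key -out osquery_client.csr -subj "/CN=osquery-client"

# Sign the CSR with an internal CA (ca.crt/ca.key are hypothetical; the server
# side must trust this CA for the client cert to be accepted)
openssl x509 -req -in osquery_client.csr -CA ca.crt -CAkey ca.key \
  -CAcreateserial -out osquery_client.crt -days 365
```
The resulting files would then be referenced from the flags file via `--tls_client_cert=/etc/osquery/osquery_client.crt` and `--tls_client_key=/etc/osquery/osquery_client.key`.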
a
The same node key and UUID indicate to me that the RocksDB database in /var/osquery/osquery.db has been duplicated between the GCP instances. Is osquery part of your GCP instance image? On already-running instances, you could do a one-time erase of the RocksDB with:
```
sudo service osqueryd stop; sudo osqueryctl clean; sudo service osqueryd start
```
Then, when recreating your GCP image, erase the RocksDB before creating the image:
```
sudo service osqueryd stop; sudo osqueryctl clean
```
a
After running that command, I see I'm getting a new UUID, but I'm still getting the same node_key from Fleet for both machines.
How can we get a new node_key?
And for the creation of the image, would it be sufficient to delete the db file, or to make sure it doesn't exist before starting? In my current setup, I had a check in Puppet to stop the service if it's in the build process:
```
$service_ensure = $::built_by_packer ? {
    true    => 'stopped',
    default => $osquery::agent::service,
  }
```
I guess that didn't prevent the db from being created.
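(A sketch of what the build step could do instead, so that any DB created before the service was stopped still gets removed; the path comes from the flags file above.)
```
# If osqueryd ever started during the build, its RocksDB already exists;
# stop the service and remove the DB so it never ends up in the image.
sudo service osqueryd stop || true
sudo rm -rf /var/log/osquery/db/osquery.db
```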
a
I don't know why this would lead to a duplicate node key. You could try deleting the host from the Fleet UI and see if that helps. If osqueryd starts even once, it will probably generate a RocksDB. So I'd suggest doing a service stop and removal of the RocksDB before creating your image. You could do this in Puppet.
Also, I see your RocksDB path is not standard:
```
--database_path=/var/log/osquery/db/osquery.db
```
So I'm not sure if that `osqueryctl clean` command would work.
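(Since the DB lives at a non-default path here, a manual equivalent of `osqueryctl clean` would presumably be something like the following, using the path from the flags file.)
```
sudo service osqueryd stop
sudo rm -rf /var/log/osquery/db/osquery.db   # the non-default path from the flags file
sudo service osqueryd start
```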
a
I found that deleting the entire db fixed that issue, and now it's appearing in Fleet.
I would just need to figure out now how to not create that db in the build process.
The clean works; it gets you a new UUID, but not a new node_key, which comes in the response from Fleet (I saw that when I ran with `--tls_dump`). But deleting the db completely and restarting osqueryd then gets a new node_key.
z
Awesome, thank you for helping out @Avi Norowitz!
a
🙂
z
Best practice is to delete the osquery DB before cloning images. Otherwise multiple instances will share the same node key and Fleet sees them all as the "same" instance.
a
Why isn't `osqueryctl clean` cleaning the node key as well? I have been trying to delete the db in the build process when Packer runs, but I'm still getting the same result. If you have tried that before and could point me to something similar, that would be great.
a
`osqueryctl clean` removes `/var/osquery/osquery.db`. But in your config, you have:
```
--database_path=/var/log/osquery/db/osquery.db
```
Is there any particular reason you need to use the non-default path for this?
a
It was a convention in the environment, but I don't think there is a reason not to change it. But as you mentioned, that will work for a live instance; I'm trying to figure out something that prevents this issue from happening in the first place. I'm trying to get Puppet to delete this file in the build process, but it seems there are times when the image is actually running before it's used to create an instance.
Shouldn't this be considered a bug, since osquery is not deleting the non-standard location? I tested it and found it deletes the location you mentioned, but not the location I had in my configs.
a
Maybe it would be considered a bug with osquery. You could ask in #general, or file a bug report here: https://github.com/osquery/osquery/issues. (Just a disclaimer: I'm not affiliated with either osquery or Fleet; I'm just a user of both.)
a
I know. Thanks a lot for the help, Avi.
z
@Ahmed I'd recommend shutting down osquery and deleting the database (`rm -rf` on that directory should work fine) before creating the image. Make sure that osquery is configured to start on boot.
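(As a sketch, a final image-build provisioning step along those lines might look like this, assuming a systemd host; adjust the path to match your `--database_path`.)
```
# Last provisioning step before the image snapshot
sudo systemctl stop osqueryd
sudo rm -rf /var/log/osquery/db     # delete the RocksDB directory
sudo systemctl enable osqueryd      # make sure osquery starts on boot in new instances
```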
a
Thanks Zack. I overcame this by stopping the service inside the build process, which kept the db files from being created, and as a second step I ran the clean command on the default db directory, which cleans the db if it exists. That fixed the problem. Thanks a lot.
💯 1