so it looks like my scheduled/automated queries stopped functioning (osquery #fleet)

mason kemmerer

01/04/2024, 9:05 PM

so it looks like my scheduled/automated queries stopped functioning sometime after upgrading to v4.41.1. then suddenly i started seeing my results going to the mysql binlogs instead of the fleet server's result.log? not sure if anyone else has seen anything like this. The binlogs ended up causing my server disk to reach capacity. I had to expand the disk to even authenticate and take a look at what happened config wise. since then I've gone into mysql and disabled binlogs to prevent the disk from being consumed again, but not sure where to go from here to ensure my scheduled queries send results back to the result.log on the fleet server

Kathy Satterlee

01/09/2024, 9:59 PM

Hi @mason kemmerer! Are you seeing any errors related to logging or query ingestion in your Fleet server logs? Do you have any queries that are scheduled outside of Fleet (set up manually in osquery config)?

mason kemmerer

01/10/2024, 3:02 PM

not anymore, once i went to 4.41.1 i disabled the legacy packs at the URL: myfleetsvr.com/packs/manage

mason kemmerer

01/10/2024, 3:04 PM

i can also see my queries are running on my hosts when i run a live query:

SELECT * from osquery_schedule

and can see the results going to the query_results table in mysql with current timestamps... it's just that ever since this upgrade the result.log has not been updated. when i reviewed the mysql binlogs, the contents looked like base64 encoded result data

mason kemmerer

01/10/2024, 3:05 PM

happy to run any additional queries with osqueryd or my fleetserver to troubleshoot if theres errors somewhere i am missing?

mason kemmerer

01/10/2024, 6:39 PM

fwiw i am monitoring Darshal Shah's thread as well (since it seemed we have a similar issue) and am also returning no results when running the same query:

SELECT * from osquery_schedule WHERE query LIKE "select * from rpm_packages"

mason kemmerer

01/10/2024, 6:40 PM

i also upgraded to v4.43.0 if that helps with the troubleshooting

Kathy Satterlee

01/11/2024, 10:04 PM

Are there any errors in the Fleet server logs?

Kathy Satterlee

01/11/2024, 10:11 PM

Can you confirm that the query has a frequency set and that automations are enabled for that query?

mason kemmerer

01/12/2024, 3:01 PM

Yes, I can confirm using the rpm_packages query as an example. It is set to run daily and automations are on. what would be the best way to get the fleet server logs you are looking for?

Kathy Satterlee

01/12/2024, 11:23 PM

That depends on how you have Fleet deployed. The server logs are sent to stderr and stdout, so wherever those go in your environment.

mason kemmerer

01/13/2024, 12:41 AM

im using docker based on this project: https://github.com/CptOfEvilMinions/FleetDM-Automation

mason kemmerer

01/13/2024, 12:46 AM

ah i think i figured it out: when i tail the docker logs, the fleet-webgui container is riddled with these:

{
  "component": "http",
  "err": "error writing result logs (if the logging destination is down, you can reduce frequency/size of osquery logs by increasing logger_tls_period and decreasing logger_tls_max_lines): writing log: can't rename log file: rename /var/log/osquery/result.log /var/log/osquery/result-2024-01-13T00-45-30.101.log: permission denied",
  "ip_addr": "172.18.38.205",
  "level": "error",
  "method": "POST",
  "took": "231.858647ms",
  "ts": "2024-01-13T00:45:30.101966418Z",
  "uri": "/api/v1/osquery/log",
  "uuid": "c8c7e695-684d-414c-8973-ece403f823c5",
  "x_for_ip_addr": "172.18.38.205"
}
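
A quick way to confirm whether this error is still occurring is to filter the container logs for the rename failure. The sketch below assumes the logs arrive as text on stdin; `find_rotation_errors` is a hypothetical helper name, and `fleet-webgui` is the container name used in this thread.

```shell
#!/bin/sh
# Hypothetical helper: keep only the log lines that report the failed rename.
# grep -F matches the text literally, so the apostrophe needs no escaping.
find_rotation_errors() {
    grep -F "can't rename log file"
}

# Example (assumes the container is named fleet-webgui, as in this thread):
#   docker logs fleet-webgui 2>&1 | find_rotation_errors | tail -n 5
```

If the filter prints nothing after a fix is applied, new result batches are most likely being written again.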

mason kemmerer

01/13/2024, 12:49 AM

i have that path specified in my fleetdm.yml file which i point to when preparing the database

filesystem:
  status_log_file: /var/log/osquery/status.log
  result_log_file: /var/log/osquery/result.log
  enable_log_rotation: true

mason kemmerer

01/13/2024, 12:53 AM

did something change wrt how logs are written in one of the latest updates to fleet or osquery?

mason kemmerer

01/13/2024, 1:01 AM

the Dockerfile-Fleetdm has result.log and status.log specified, so not sure where the result-YYYY-MM-DDT-HH-MM-SS-SEC.log came from

### Setup logging directory ###
RUN mkdir /var/log/osquery && \
	chown root:root /var/log/osquery && \
	touch /var/log/osquery/status.log && \
	touch /var/log/osquery/result.log && \
	chown fleet:fleet /var/log/osquery/result.log && \
	chown fleet:fleet /var/log/osquery/status.log

mason kemmerer

01/13/2024, 1:20 AM

verified nothing has changed perms wise in the fleet-webgui container:

mason kemmerer

01/13/2024, 1:20 AM

/var/log/osquery $ ls -l
total 536648
-rw-r--r--    1 root     root      25176131 Nov  7 19:40 masonk@10.0.0.5
-rw-r--r--    1 fleet    fleet    524287805 Jan  2 17:30 result.log
-rw-r--r--    1 fleet    fleet        52335 Jan  5 16:47 status.log

mason kemmerer

01/13/2024, 1:23 AM

is it perhaps that the result.log file finally reached its size limit, and to prepare for log rotation (since i have that enabled above) fleet now wants to rename the file (adding the timestamp at the end) and fails to do the rename on the fly? perhaps this may have stemmed from when my disk had reached capacity?
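
For reference, the result.log size in the listing above (524287805 bytes) sits just under 500 MiB, which would fit a size-based rotation threshold. The 500 MiB figure in this sketch is an assumption inferred from that listing, not a value confirmed in this thread.

```shell
#!/bin/sh
# Assumption: the rotation threshold is 500 MiB. This is inferred from the
# file sizes in the ls -l output, not stated anywhere in the thread.
threshold=$((500 * 1024 * 1024))   # 524288000 bytes
size=524287805                     # result.log size from the ls -l output

# Bytes left before the assumed threshold would be reached.
echo "headroom: $((threshold - size)) bytes"   # prints "headroom: 195 bytes"
```

With so little headroom, the next batch of results would push the file past the assumed limit and trigger the rename.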

mason kemmerer

01/13/2024, 1:56 AM

as a test.. i went into the fleet-webgui container and ran:

cp result.log result.log.bk

then

touch result.log

and

chown fleet:fleet result.log

within moments the "new" result.log filled up, so I believe I am on the right track:

/var/log/osquery # ls -l
total 1048652
-rw-r--r--    1 root     root      25176131 Nov  7 19:40 masonk@10.0.0.5
-rw-r--r--    1 fleet    fleet    524287744 Jan 13 01:55 result.log
-rw-r--r--    1 root     root     524287805 Jan 13 01:50 result.log.bk
-rw-r--r--    1 fleet    fleet        52951 Jan 13 01:53 status.log

mason kemmerer

01/13/2024, 2:08 AM

i swapped the owner of the osquery folder from root to fleet and now it is able to write the timestamped result.logs...

/var/log/osquery # ls -l
total 1153456
-rw-r--r--  1 root   root   25176131 Nov 7 19:40 mkemmerer@10.64.121.211
-rw-r--r--  1 fleet  fleet  524287744 Jan 13 01:55 result-2024-01-13T02-06-40.494.log
-rw-r--r--  1 fleet  fleet  107312777 Jan 13 02:07 result.log
-rw-r--r--  1 root   root   524287805 Jan 13 01:50 result.log.bk
-rw-r--r--  1 fleet  fleet    52951 Jan 13 01:53 status.log
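
Why the directory owner matters: renaming a file modifies the directory entry, so the process needs write permission on the directory itself, and owning result.log alone is not enough. Below is a minimal sketch of a check for this, assuming `stat -c` is available (GNU coreutils or BusyBox); `dir_owned_by` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if directory $1 is owned by user $2.
# Uses `stat -c %U` (GNU coreutils and BusyBox; BSD/macOS stat differs).
dir_owned_by() {
    [ "$(stat -c %U "$1")" = "$2" ]
}

# Example, run as root inside the container (path and user from this thread):
#   dir_owned_by /var/log/osquery fleet || chown fleet:fleet /var/log/osquery
```

Changing the `chown root:root /var/log/osquery` step in the image build to `fleet:fleet` would presumably make the fix survive a container rebuild, though that has not been tested here.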

mason kemmerer

01/13/2024, 2:09 AM

@Kathy Satterlee thank you for the direction (it was the eureka moment i needed) i guess my final question is does fleet have any configuration to remove / delete result.logs of a certain age? or is that something administrators/customers are expected to address?

mason kemmerer

01/13/2024, 2:10 AM

i presume it is 28 days? since I do not have

filesystem:
   max_age: 0

specified in my fleetdm.yml config file?

mason kemmerer

01/13/2024, 2:25 AM

or would the default value of 3 for max_backups cover me here so these rotated logs dont just consume the entire disk?

filesystem:
   max_backups: 0
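
The thread does not settle whether these settings prune rotated files. If they turn out not to cover it, one administrator-side fallback could look like the sketch below. `prune_rotated_logs` is a hypothetical helper, the 28-day cutoff simply mirrors the number mentioned above, and `-maxdepth` and `-delete` are assumed to be supported by the local `find`.

```shell
#!/bin/sh
# Hypothetical helper: delete rotated result logs in directory $1 that were
# last modified more than $2 days ago. Only the timestamped names
# (result-*.log) match, so the live result.log is never touched.
prune_rotated_logs() {
    dir=$1
    days=$2
    find "$dir" -maxdepth 1 -type f -name 'result-*.log' -mtime +"$days" -delete
}

# Example (e.g. from a daily cron job on the host or in the container):
#   prune_rotated_logs /var/log/osquery 28
```

Replacing `-delete` with `-print` first gives a dry run that only lists what would be removed.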
