#fleet
Ayan

07/30/2021, 2:53 PM
Folks, how do custom labels work? When we write a query for a label, does it run on all of the devices and group together the ones that match the `WHERE` clause of the query? If that's the case, wouldn't having a larger set of labels affect the performance of all of the devices enrolled in Fleet, even though the labels are not intended to target all of them?
zwass

07/30/2021, 7:29 PM
That's right -- label queries run on a configurable (hourly by default) interval on all hosts with a matching platform. Most queries folks use for labels are quite low in resource utilization, but if you are thinking of having hundreds of labels, or using potentially expensive queries, this is worth being aware of.
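A rough way to reason about the load described above is to multiply hosts by labels by runs per hour. The sketch below is only a back-of-the-envelope estimate: the function name and the example numbers are hypothetical, and it assumes every label query runs on every matching host exactly once per interval.

```python
def label_query_runs_per_hour(matching_hosts: int, labels: int,
                              interval_seconds: int = 3600) -> float:
    """Estimate fleet-wide label query executions per hour.

    Assumes each label query runs once per interval on every host whose
    platform matches (a simplification, not Fleet's actual scheduler).
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    runs_per_host_per_hour = 3600 / interval_seconds
    return matching_hosts * labels * runs_per_host_per_hour


# Hypothetical numbers: 1000 matching hosts, 50 labels, hourly interval.
print(label_query_runs_per_hour(1000, 50))
```

The per-host cost still depends on how expensive each individual query is, so this count is only a starting point.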
Ayan

07/30/2021, 7:32 PM
Thank you for the response, Zwass. We're running Fleet with about 6000 devices, and there seems to be a storm of label queries executing on the agents. The `osquery_status` logs seem to suggest the refresh interval might be far less than 1 hour, which I think is the default. It is quite strange, because we have not specified an interval for it at the moment, so it should run at the default of 1 hour.
Rachel Perkins

07/30/2021, 9:09 PM
Hey Ayan, which version of Fleet are you on and did you recently upgrade?
Ayan

07/30/2021, 9:11 PM
Hello Rachel. We’re running fleet 3.11 in our network. We’re planning to upgrade to v4. Is that something that’s covered in the latest version?
zwass

07/30/2021, 9:32 PM
I don't think we've changed anything related to this between those versions. Is this something that started recently?
Ayan

07/30/2021, 9:51 PM
It seems to have been going on for a while. We started to notice it recently when we enabled some quite large labels, and these labels keep running very frequently according to the `osquery_status` logs on the Fleet server. Here's an example log with redacted output:
9:53 PM
{
  "hostIdentifier": "<UUID>",
  "calendarTime": "Fri Jul 30 14:01:42 2021 UTC",
  "unixTime": "1627653702",
  "severity": "0",
  "filename": "distributed.cpp",
  "line": "121",
  "message": "Executing distributed query: fleet_label_query_36: SELECT address FROM interface_addresses\nWHERE address IN (ABOUT 100 IPs);",
  "version": "4.8.0",
  "decorations": {
    "host_uuid": "<UUID>",
    "hostname": "<endpoint>"
  }
}
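One way to check how often each label query really runs is to measure the gap between consecutive "Executing distributed query" entries per host. The sketch below is an illustration only: it assumes the status log is stored as one JSON object per line (the entry above is pretty-printed for readability) with the same `hostIdentifier`, `unixTime`, and `message` fields, and the function name `label_query_gaps` is hypothetical.

```python
import json
import re
from collections import defaultdict

# Pulls the query name out of messages such as
# "Executing distributed query: fleet_label_query_36: SELECT ..."
LABEL_QUERY_RE = re.compile(
    r"Executing distributed query: (fleet_label_query_\d+):"
)


def label_query_gaps(lines):
    """Return {(host, query_name): [seconds between consecutive runs]}."""
    times = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue
        match = LABEL_QUERY_RE.search(str(entry.get("message", "")))
        unix_time = entry.get("unixTime")
        if not match or unix_time is None:
            continue
        key = (entry.get("hostIdentifier"), match.group(1))
        times[key].append(int(unix_time))

    gaps = {}
    for key, stamps in times.items():
        stamps.sort()
        gaps[key] = [later - earlier
                     for earlier, later in zip(stamps, stamps[1:])]
    return gaps
```

Gaps that sit well below 3600 seconds for the same host and label would support the suspicion that the queries run more often than the hourly default.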
11:41 PM
I tried to increase the label timers in Fleet and, weirdly, that does not seem to stop the label queries that are constantly being issued from the Fleet server. There is also an ongoing series of errors that looks like this:
Jul 30 19:35:25 kolide-fleet1.xyz.com fleet[1441]: {"component":"service","err":"failed to save labels: insert label query executions: Error 1452: Cannot add or update a child row: a foreign key constraint fails (`kolide_infra`.`label_membership`, CONSTRAINT `fk_lm_label_id` FOREIGN KEY (`label_id`) REFERENCES `labels` (`id`) ON DELETE CASCADE ON UPDATE CASCADE)","ip_addr":"[IPv6 address of an device]:35338","level":"info","method":"SubmitDistributedQueryResults","took":"3.852295ms","ts":"2021-07-30T23:35:25.610662952Z","x_for_ip_addr":""}
Any thoughts if the two issues might be directly related?
zwass

08/03/2021, 7:32 PM
I think there's an issue with osquery continuing to send results even after they are supposed to be discarded. Since we can't fix that on the Fleet side we are working to handle this situation more gracefully. See https://github.com/fleetdm/fleet/pull/1535.