# fleet
j
just upgraded our fleet to v4.3.0, seeing some new errors:
Sep 14 18:28:30 osquery-service-orc20.ec2.vzbuilders.com fleet[13403]: ts=2021-09-14T18:28:30.884003459Z component=service method=ingestDiskSpace err="detail_query_disk_space expected single result got 2"
t
hm, that's odd, do you know what host is causing that error?
j
it is happening thousands of times and that's all the detail in the log
eventually fleet runs out of memory and cores (dumps core)
I'll try putting it in debug and see if that gets any more details
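For context, the error above comes from Fleet expecting the disk-space detail query to return exactly one row per host. A rough Python sketch of that check (Fleet's real implementation is in Go; the function and column names here are illustrative, not Fleet's actual code):

```python
# Rough sketch of the single-row check behind the log line above.
# Fleet's real implementation is in Go; names here are illustrative.
def ingest_disk_space(rows):
    """Parse free disk space from a detail query result, requiring exactly one row."""
    if len(rows) != 1:
        raise ValueError(
            f"detail_query_disk_space expected single result got {len(rows)}"
        )
    return float(rows[0]["gigs_disk_space_available"])

# A host answering with two rows (e.g. two mount points) trips the error:
try:
    ingest_disk_space(
        [{"gigs_disk_space_available": "12.5"}, {"gigs_disk_space_available": "30.0"}]
    )
except ValueError as e:
    print(e)  # detail_query_disk_space expected single result got 2
```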
t
it's concerning that it runs out of cpu and memory; the error is logged and then basically ignored
could you show me the output of
fleetctl get config
?
j
we don't use fleetctl, but here's our config yaml
[root@osquery-service-ora48 ouser]# cat /etc/kolide/fleet.yml
mysql:
  address:           rds-global.ec2.posq.com:3306
  database:          osq
  username:          [REDACTED]
  password:          [REDACTED]
  max_open_conns:    100
  max_idle_conns:    100
  conn_max_lifetime: 0
mysql_read_replica:
  address:           replica.rds-global.ec2.posq.com:3306
  database:          osq
  username:          [REDACTED]
  password:          [REDACTED]
  max_open_conns:    100
  max_idle_conns:    100
  conn_max_lifetime: 0
redis:
  address: redis-global.ec2.posq.com:6380
  connect_timeout: 30s
  keep_alive: 60s
server:
  tls: true
  cert: /etc/kolide/paranoids.c-osq.tls.cert.pem
  key: /etc/kolide/paranoids.c-osq.tls.privatekey.pem
  address: 0.0.0.0:8090
session:
  duration: 12h
osquery:
  status_log_plugin: filesystem
  result_log_plugin: firehose
#  result_log_plugin: filesystem
  host_identifier: instance
  enroll_cooldown: 1440m
  detail_update_interval: 1440m
  osquery_label_update_interval: 120m
vulnerabilities:
  current_instance_checks: no
logging:
#  debug: true
filesystem:
  status_log_file: /var/log/osquery/status.log
  result_log_file: /var/log/osquery/results.log
  enable_log_rotation: true
firehose:
  region: us-west-2
  sts_assume_role_arn: [REDACTED]
  result_stream: osquery-kinesis-firehose-stream-us-west-2
I got it to stabilize after doubling our instances
t
there's a set of configuration options that are visible through fleetctl. could you by any chance run that command I mentioned? these configuration options are not the same as the ones you pasted, btw
j
# fleetctl get config
get config received status 404 unknown
t
hm, you're probably missing the config:
contexts:
  default:
    address: <your fleet UI URL here>
in
~/.fleet/config
j
nope, we use SSO, so I had to configure that first
I checked fleetctl get --help but there doesn't seem to be any way to increase verbosity, so I don't have any more info about why it's not working
t
could you show me the contents of
~/.fleet/config
?
j
contexts:
  default:
    address: https://[URL]:8080
    email: jocelyn.bothe@verizonmedia.com
    token: [TOKEN]
t
fleetctl --version
how about that one?
j
# fleetctl --version
fleetctl - version 4.3.0
  branch: 	HEAD
  revision: 	86044eb0369e27b68e313d33a280f73a332a9994
  build date: 	2021-09-13
  build user: 	runner
  go version: 	go1.16.5
ach, I was sending to our daemon endpoint not our GUI endpoint
[root@osquery-service-ora48 .fleet]# fleetctl get config
---
apiVersion: v1
kind: config
spec:
  agent_options:
    config:
      decorators:
        load:
        - SELECT COALESCE((select instance_id FROM ec2_instance_metadata), hostname)
          as hostname FROM system_info;
      file_paths:
        docker:
        - /etc/docker/%%
        - /etc/default/docker
        - /etc/docker/daemon.json
        - /usr/bin/containerd
        - /usr/sbin/runc
        - /etc/sysconfig/docker
        - /usr/lib/systemd/system/docker.service
        - /usr/lib/systemd/system/docker.socket
        etc:
        - /etc/group
        - /etc/passwd
        - /etc/shadow
        - /etc/services
        - /etc/sudoers
        - /etc/ld.so.preload
        - /etc/ld.so.conf
        - /etc/ld.so.conf.d/%%
        - /etc/pam.d/%%
        - /etc/resolv.conf
        - /etc/modules
        - /etc/hosts
        - /etc/hostname
        - /etc/fstab
        - /etc/rsyslog.conf
        firewalls:
        - /etc/sysconfig/iptables
        - /home/y/conf/yakl/%%
        - /etc/yakl/conf/%%
        logs:
        - /var/log/secure
        osquery:
        - /etc/osquery/%%
        - /usr/share/osquery/packs/%%
        ssh:
        - /root/.ssh/%%
        - /home/%/.ssh/%%
        - /etc/ssh/%%
        - /var/lib/sia/keys/
        - /var/lib/sia/certs/
      options:
        host_identifier: instance
    overrides: {}
  host_expiry_settings:
    host_expiry_enabled: true
    host_expiry_window: 1
  host_settings:
    enable_host_users: true
    enable_software_inventory: false
  org_info:
    org_logo_url: ""
    org_name: Verizon Media LLC
  server_settings:
    enable_analytics: false
    live_query_disabled: false
    server_url: https://[URL]
  smtp_settings:
    authentication_method: "0"
    authentication_type: "0"
    configured: true
    domain: ""
    enable_smtp: false
    enable_ssl_tls: true
    enable_start_tls: true
    password: '********'
    port: 587
    sender_address: [EMAIL]
    server: email-smtp.us-east-1.amazonaws.com
    user_name: [REDACTED]
    verify_ssl_certs: true
  sso_settings:
    enable_sso: true
    enable_sso_idp_login: true
    entity_id: http://[OKTA]
    idp_image_url: ""
    idp_name: Okta
    issuer_uri: http://[OKTA]
    metadata: ""
    metadata_url: https://[OKTA]
  vulnerability_settings:
    databases_path: ""
  webhook_settings:
    host_status_webhook:
      days_count: 0
      destination_url: ""
      enable_host_status_webhook: false
      host_percentage: 0
    interval: 24h0m0s
t
do you use the user list in the host details page?
j
nope, I thought I'd disabled it, but apparently not
I was just looking up how to turn it off
t
you can put this in a file:
---
apiVersion: v1
kind: config
spec:
  host_settings:
    enable_host_users: false
and then
fleetctl apply -f path/to/that/file
the next release will handle the error you saw above
👍 1
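The write-then-apply step above can be scripted; a minimal sketch, assuming fleetctl is on PATH with a configured context (the file name and directory are arbitrary, and the actual apply call is left commented out so the sketch is safe to run anywhere):

```python
# Write the minimal patch file described above, then (hypothetically) apply it.
import os
import subprocess  # only needed if you uncomment the apply step
import tempfile

PATCH = """\
---
apiVersion: v1
kind: config
spec:
  host_settings:
    enable_host_users: false
"""

def write_patch(directory):
    """Write the config patch to an arbitrary file name and return its path."""
    path = os.path.join(directory, "disable-host-users.yml")
    with open(path, "w") as f:
        f.write(PATCH)
    return path

path = write_patch(tempfile.mkdtemp())
# subprocess.run(["fleetctl", "apply", "-f", path], check=True)
print(path)
```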
j
there's not a yaml config option for enable_host_users?
I don't want to add a fleetctl command to our bootstrap scripts, that's pretty clunky
t
not for fleet serve, these are two different sets of configs. You don't have to run it at bootstrap, it'll be disabled from this point on unless you drop the db
j
fleetctl apply -f file
applying fleet config: apply config received status 500 Mail Error: sending mail: could not issue mail to provided address: 530 Authentication required
# cat file
---
apiVersion: v1
kind: config
spec:
  host_settings:
    enable_host_users: false
t
right, it validates all the configs and the smtp ones don't seem to be good
we saw another user with the same issue, something is off with the migration
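If you'd rather fully fix the SMTP settings than leave them half-configured, the same apply flow takes a patch along these lines. This is a sketch, not a verified fix: I believe `authmethod_plain` and `authtype_username_password` are the enum strings Fleet's YAML expects for these fields, but check against your Fleet version before applying.

```yaml
---
apiVersion: v1
kind: config
spec:
  smtp_settings:
    authentication_method: authmethod_plain
    authentication_type: authtype_username_password
```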
j
ah, didn't have auth type or method set
that did it
# fleetctl apply -f test
[+] applied fleet config
host_settings:
    enable_host_users: false
t
great, give it a bit for hosts to refetch config, and see if that helps with the resource usage
j
I failed over our DB to a new region, and it immediately blew up. One of the things I noticed looking at the SQL statements was a high volume of "UPDATE host_users SET removed_at = CURRENT_TIMESTAMP WHERE host_id = ?". If I've got enable_host_users set to false, why is it still doing a host_users update?
t
the config changes what queries we send to osquery, but it doesn't change how we store the host data. We can improve that, of course
we're working on a similar fix for the label_membership queries; we don't have a good workaround just yet
j
sweet, that'll def help