# fleet
j
just upgraded our fleet to v4.3.0, seeing some new errors:
Sep 14 18:28:30 osquery-service-orc20.ec2.vzbuilders.com fleet[13403]: ts=2021-09-14T18:28:30.884003459Z component=service method=ingestDiskSpace err="detail_query_disk_space expected single result got 2"
t
hm, that's odd, do you know what host is causing that error?
j
it is happening thousands of times and that's all the detail in the log
eventually fleet runs out of memory and cores (dumps core)
I'll try putting it in debug and see if that gets any more details
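For context, the error above comes from Fleet expecting the disk-space detail query to return exactly one row per host. A rough Python sketch of that check (Fleet's real implementation is in Go; the function and column names here are illustrative, not Fleet's actual code):

```python
# Rough sketch of the single-row check behind the log line above.
# Fleet's real implementation is in Go; names here are illustrative.
def ingest_disk_space(rows):
    """Parse free disk space from a detail query result, requiring exactly one row."""
    if len(rows) != 1:
        raise ValueError(
            f"detail_query_disk_space expected single result got {len(rows)}"
        )
    return float(rows[0]["gigs_disk_space_available"])

# A host answering with two rows (e.g. two mount points) trips the error:
try:
    ingest_disk_space(
        [{"gigs_disk_space_available": "12.5"}, {"gigs_disk_space_available": "30.0"}]
    )
except ValueError as e:
    print(e)  # detail_query_disk_space expected single result got 2
```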
t
it's concerning that it runs out of cpu and memory; the error is logged and then basically ignored
could you show me the output of
fleetctl get config
?
j
we don't use fleetctl, but here's our config yaml
[root@osquery-service-ora48 ouser]# cat /etc/kolide/fleet.yml
mysql:
  address:           rds-global.ec2.posq.com:3306
  database:          osq
  username:          [REDACTED]
  password:          [REDACTED]
  max_open_conns:    100
  max_idle_conns:    100
  conn_max_lifetime: 0
mysql_read_replica:
  address:           replica.rds-global.ec2.posq.com:3306
  database:          osq
  username:          [REDACTED]
  password:          [REDACTED]
  max_open_conns:    100
  max_idle_conns:    100
  conn_max_lifetime: 0
redis:
  address: redis-global.ec2.posq.com:6380
  connect_timeout: 30s
  keep_alive: 60s
server:
  tls: true
  cert: /etc/kolide/paranoids.c-osq.tls.cert.pem
  key: /etc/kolide/paranoids.c-osq.tls.privatekey.pem
  address: 0.0.0.0:8090
session:
  duration: 12h
osquery:
  status_log_plugin: filesystem
  result_log_plugin: firehose
#  result_log_plugin: filesystem
  host_identifier: instance
  enroll_cooldown: 1440m
  detail_update_interval: 1440m
  osquery_label_update_interval: 120m
vulnerabilities:
  current_instance_checks: no
logging:
#  debug: true
filesystem:
  status_log_file: /var/log/osquery/status.log
  result_log_file: /var/log/osquery/results.log
  enable_log_rotation: true
firehose:
  region: us-west-2
  sts_assume_role_arn: [REDACTED]
  result_stream: osquery-kinesis-firehose-stream-us-west-2
I got it to stabilize after doubling our instances
t
there's a set of configuration options that are visible through fleetctl. could you by any chance run that command I mentioned? these configuration options are not the same as the ones you pasted, btw
j
# fleetctl get config
get config received status 404 unknown
t
hm, you're probably missing the config:
contexts:
  default:
    address: <your fleet UI URL here>
in
~/.fleet/config
j
nope, we use SSO, so I had to configure that first
I checked fleetctl get --help but there doesn't seem to be any way to increase verbosity, so I don't have any more info about why it's not working
t
could you show me the contents of
~/.fleet/config
?
j
contexts:
  default:
    address: https://[URL]:8080
    email: jocelyn.bothe@verizonmedia.com
    token: [TOKEN]
t
fleetctl --version
how about that one?
j
# fleetctl --version
fleetctl - version 4.3.0
  branch: 	HEAD
  revision: 	86044eb0369e27b68e313d33a280f73a332a9994
  build date: 	2021-09-13
  build user: 	runner
  go version: 	go1.16.5
ach, I was sending to our daemon endpoint not our GUI endpoint
[root@osquery-service-ora48 .fleet]# fleetctl get config
---
apiVersion: v1
kind: config
spec:
  agent_options:
    config:
      decorators:
        load:
        - SELECT COALESCE((select instance_id FROM ec2_instance_metadata), hostname)
          as hostname FROM system_info;
      file_paths:
        docker:
        - /etc/docker/%%
        - /etc/default/docker
        - /etc/docker/daemon.json
        - /usr/bin/containerd
        - /usr/sbin/runc
        - /etc/sysconfig/docker
        - /usr/lib/systemd/system/docker.service
        - /usr/lib/systemd/system/docker.socket
        etc:
        - /etc/group
        - /etc/passwd
        - /etc/shadow
        - /etc/services
        - /etc/sudoers
        - /etc/ld.so.preload
        - /etc/ld.so.conf
        - /etc/ld.so.conf.d/%%
        - /etc/pam.d/%%
        - /etc/resolv.conf
        - /etc/modules
        - /etc/hosts
        - /etc/hostname
        - /etc/fstab
        - /etc/rsyslog.conf
        firewalls:
        - /etc/sysconfig/iptables
        - /home/y/conf/yakl/%%
        - /etc/yakl/conf/%%
        logs:
        - /var/log/secure
        osquery:
        - /etc/osquery/%%
        - /usr/share/osquery/packs/%%
        ssh:
        - /root/.ssh/%%
        - /home/%/.ssh/%%
        - /etc/ssh/%%
        - /var/lib/sia/keys/
        - /var/lib/sia/certs/
      options:
        host_identifier: instance
    overrides: {}
  host_expiry_settings:
    host_expiry_enabled: true
    host_expiry_window: 1
  host_settings:
    enable_host_users: true
    enable_software_inventory: false
  org_info:
    org_logo_url: ""
    org_name: Verizon Media LLC
  server_settings:
    enable_analytics: false
    live_query_disabled: false
    server_url: https://[URL]
  smtp_settings:
    authentication_method: "0"
    authentication_type: "0"
    configured: true
    domain: ""
    enable_smtp: false
    enable_ssl_tls: true
    enable_start_tls: true
    password: '********'
    port: 587
    sender_address: [EMAIL]
    server: email-smtp.us-east-1.amazonaws.com
    user_name: [REDACTED]
    verify_ssl_certs: true
  sso_settings:
    enable_sso: true
    enable_sso_idp_login: true
    entity_id: http://[OKTA]
    idp_image_url: ""
    idp_name: Okta
    issuer_uri: http://[OKTA]
    metadata: ""
    metadata_url: https://[OKTA]
  vulnerability_settings:
    databases_path: ""
  webhook_settings:
    host_status_webhook:
      days_count: 0
      destination_url: ""
      enable_host_status_webhook: false
      host_percentage: 0
    interval: 24h0m0s
t
do you use the user list in the host details page?
j
nope, I thought I'd disabled it, but apparently not
I was just looking up how to turn it off
t
you can put this in a file:
---
apiVersion: v1
kind: config
spec:
  host_settings:
    enable_host_users: false
and then
fleetctl apply -f path/to/that/file
the next release will handle the error you saw above
👍 1
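The write-then-apply step above can be scripted; a minimal sketch, assuming fleetctl is on PATH with a configured context (the file name and directory are arbitrary, and the actual apply call is left commented out so the sketch is safe to run anywhere):

```python
# Write the minimal patch file described above, then (hypothetically) apply it.
import os
import subprocess  # only needed if you uncomment the apply step
import tempfile

PATCH = """\
---
apiVersion: v1
kind: config
spec:
  host_settings:
    enable_host_users: false
"""

def write_patch(directory):
    """Write the config patch to an arbitrary file name and return its path."""
    path = os.path.join(directory, "disable-host-users.yml")
    with open(path, "w") as f:
        f.write(PATCH)
    return path

path = write_patch(tempfile.mkdtemp())
# subprocess.run(["fleetctl", "apply", "-f", path], check=True)
print(path)
```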
j
there's not a yaml config option for enable_host_users?
I don't want to add a fleetctl command to our bootstrap scripts, that's pretty clunky
t
not for fleet serve, these are two different sets of configs. You don't have to run it at bootstrap, it'll be disabled from this point on unless you drop the db
j
fleetctl apply -f file
applying fleet config: apply config received status 500 Mail Error: sending mail: could not issue mail to provided address: 530 Authentication required
# cat file
---
apiVersion: v1
kind: config
spec:
  host_settings:
    enable_host_users: false
t
right, it validates all the configs and the smtp ones don't seem to be good
we saw another user with the same issue, something is off with the migration
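If you'd rather fully fix the SMTP settings than leave them half-configured, the same apply flow takes a patch along these lines. This is a sketch, not a verified fix: I believe `authmethod_plain` and `authtype_username_password` are the enum strings Fleet's YAML expects for these fields, but check against your Fleet version before applying.

```yaml
---
apiVersion: v1
kind: config
spec:
  smtp_settings:
    authentication_method: authmethod_plain
    authentication_type: authtype_username_password
```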
j
ah, didn't have auth type or method set
that did it
# fleetctl apply -f test
[+] applied fleet config
host_settings:
    enable_host_users: false
t
great, give it a bit for hosts to refetch config, and see if that helps with the resource usage
j
I failed over our DB to a new region, and it immediately blew up. One of the things I noticed looking at the SQL statements was a high volume of "UPDATE host_users SET removed_at = CURRENT_TIMESTAMP WHERE host_id = ?". If I've got enable_host_users set to false, why is it still doing a host_users update?
t
the config changes what queries we send to osquery, but it doesn't change how we store the host data. We can improve that, of course
we're working on a similar fix for the label_membership queries; we don't have a good workaround just yet
j
sweet, that'll def help