#fleet
koba

04/06/2021, 7:17 PM
Hi, most of the time the queries in the UI are just not reliable. The results never show up, no matter how small the query is (SELECT * FROM osquery_info). Has anyone else faced this issue? Am I doing something wrong?
zwass

04/06/2021, 7:22 PM
Which Fleet version?
koba

04/06/2021, 7:23 PM
3.0.0
zwass

04/06/2021, 7:24 PM
That is quite old. The first debugging step is usually to upgrade to the latest version.
7:25 PM
Live query works reliably for tens of thousands of hosts on recent versions of Fleet.
koba

04/06/2021, 7:26 PM
Last I checked, I couldn't find the latest image on Docker Hub, so I thought I'd just wait for it.
7:26 PM
I'll check again after upgrading
arod

04/06/2021, 7:35 PM
@koba I'm using 3.9.0 in production. No issues. Latest is 3.10.0 https://github.com/fleetdm/fleet/releases
docker pull fleetdm/fleet:3.9.0
koba

04/06/2021, 8:26 PM
Hmm, so I am trying the latest image (3.10), but my webserver pods aren't coming up after kubectl apply:
8:26 PM
❯ kubectl get pods
NAME                                    READY   STATUS             RESTARTS   AGE
fleet-cache-redis-master-0              1/1     Running            0          3d11h
fleet-cache-redis-slave-0               1/1     Running            0          3d10h
fleet-cache-redis-slave-1               1/1     Running            9          74d
fleet-database-mysql-75ff9c4fff-ntt7p   1/1     Running            0          3d11h
fleet-prepare-db-rh78r                  0/1     Completed          0          5m13s
fleet-webserver-5548999cc6-2rcjk        0/1     CrashLoopBackOff   5          4m43s
fleet-webserver-5548999cc6-57zks        0/1     Pending            0          4m43s
fleet-webserver-5548999cc6-7hnw7        0/1     CrashLoopBackOff   5          4m43s
fleet-webserver-5548999cc6-nq26m        0/1     CrashLoopBackOff   5          4m43s
fleet-webserver-5548999cc6-wsdcv        0/1     Pending            0          4m43s
zwass

04/06/2021, 9:02 PM
You probably need to run the database migrations. Make sure you check out https://github.com/fleetdm/fleet/blob/master/docs/1-Using-Fleet/7-Updating-Fleet.md.
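A rough sketch of how the migration step and the crash-looping pods could be checked from the command line. It assumes the Job name (fleet-prepare-db) and manifest file (fleet-migrations.yml) used elsewhere in this thread; the function names are made up for illustration, and the pod name passed to crash_logs is whatever kubectl get pods reports.

```shell
# Re-run the migration Job and print its output.
# A Job's pod template cannot be changed after creation, so the old Job is
# deleted first (assumed name: fleet-prepare-db, manifest: fleet-migrations.yml).
rerun_fleet_migrations() {
  kubectl delete job fleet-prepare-db --ignore-not-found &&
    kubectl apply -f fleet-migrations.yml &&
    kubectl wait --for=condition=complete job/fleet-prepare-db --timeout=300s &&
    kubectl logs job/fleet-prepare-db
}

# Show why a pod in CrashLoopBackOff exited: --previous prints the log of the
# last terminated container instead of the one currently restarting.
crash_logs() {
  kubectl logs "$1" --previous --tail=50
}
```

For example, crash_logs fleet-webserver-5548999cc6-2rcjk would print the last lines the webserver wrote before it exited, which is usually where a missing migration or a missing environment variable shows up.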
koba

04/07/2021, 3:53 AM
I did that. Here's my fleet-migrations.yml:
❯ cat  fleet-migrations.yml
apiVersion: batch/v1
kind: Job
metadata:
  name: fleet-prepare-db
spec:
  template:
    metadata:
      name: fleet-prepare-db
    spec:
      containers:
      - name: fleet
        image: fleetdm/fleet:latest
        command: ["fleet",  "prepare", "db"]
        env:
          - name: FLEET_MYSQL_ADDRESS
            value: fleet-database-mysql:3306
          - name: FLEET_MYSQL_PASSWORD
            valueFrom:
              secretKeyRef:
                name: fleet-database-mysql
                key: mysql-password
      restartPolicy: Never
  backoffLimit: 4
5:19 AM
So I was able to upgrade to 3.10. I had to add a few more env variables in my fleet-deployment.yml.
5:20 AM
3.10 looks very noice!!
5:22 AM
So I ran a test query (select * from osquery_info) and this is what I get:
4:21 PM
So I tried many queries after the upgrade. Only one (which I ran on a single machine) showed results. All the rest kept "spinning". Also, as you can see in the screenshot, the progress status text says "0 out of 0 online hosts responding", which is definitely not true because there were some 500+ hosts online at that point.
zwass

04/07/2021, 6:42 PM
Do hosts show up in the All Hosts label when you are on the home page? Does it work consistently when targeting a single host? If you open the network inspector are you able to see the websocket connection? Like in screenshot:
koba

04/08/2021, 6:44 PM
Do hosts show up in the All Hosts label when you are on the home page? 
Yes.
6:45 PM
Does it work consistently when targeting a single host?
Yes. It fails to establish a websocket connection and show any results for the same host across multiple attempts.
zwass

04/08/2021, 6:49 PM
Oh. This is likely a load balancer issue. You'll want to dig into why the websocket connection fails. It does try to fall back to XHR instead of websockets but you really want websockets working for reliability.
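One way to dig into the websocket failure from outside the browser is to send the upgrade request by hand and look at the status code that comes back through the load balancer. This is only a sketch: the function name is hypothetical, the URL must be copied from the failing websocket request in the browser's network inspector (with wss:// changed to https://), and the server may expect extra headers or cookies that are not included here.

```shell
# Print the HTTP status code returned for a WebSocket upgrade request.
# 101 means the upgrade made it through every hop; anything else
# (400, 404, 502, ...) points at a proxy or load balancer that is not
# forwarding the Upgrade headers.
ws_handshake_status() {
  local url="$1"
  # The key below is the sample nonce from RFC 6455; any base64-encoded
  # 16-byte value works for a manual test.
  curl --silent --include --no-buffer --max-time 5 --http1.1 \
    -H "Connection: Upgrade" \
    -H "Upgrade: websocket" \
    -H "Sec-WebSocket-Version: 13" \
    -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
    "$url" | awk 'NR == 1 { print $2; exit }'
}
```

Running it once against the load balancer address and once against a single webserver pod (for example through kubectl port-forward) would show whether the upgrade is lost at the load balancer or never accepted by the server.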
koba

04/08/2021, 6:54 PM
I see. It makes some sense now. I tried running a query on a host 3-4 times in the morning and the websocket failed every single time. However, it gave results later during the day.
6:54 PM
I'll investigate further from the LB side
7:01 PM
It does try to fall back to XHR instead of websockets
But does that help though? Because I still didn't see any results...
zwass

04/08/2021, 7:04 PM
It used to work mostly. I don't think many folks have LBs these days that don't support websockets. I think we may want to disable that functionality and just show you an error.
koba

04/09/2021, 12:25 PM
I realised I was running AWS's Classic Load Balancer, which doesn't support websockets, so I migrated to an Application LB. But I still face the same issue (websocket error in the dev console, and the XHR fallback fails with 404 Page not found). The XHR fallback, however, works as soon as I set replicas: 1 in my fleet-deployment.yml. With anything more than 1, the XHR fallback fails with 404.
zwass

04/09/2021, 3:24 PM
Ahh, if you want XHR to work with multiple servers I think you'll need to make sure the LB has sticky sessions so that each request goes to the same server. Best move is just to get websockets working though.
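For an AWS Application Load Balancer, sticky sessions are an attribute of the target group. A minimal sketch with the AWS CLI follows, assuming the target group ARN is known and the load balancer is not managed by a Kubernetes controller (in which case the setting would normally go through that controller's configuration, or a manual change may be reverted). The function name and the one-hour default are illustrative choices, not anything Fleet requires.

```shell
# Enable load-balancer-generated cookie stickiness on an ALB target group so
# that repeated requests from one browser reach the same backend.
# $1: target group ARN (placeholder, look it up in your AWS account)
# $2: cookie lifetime in seconds (assumed default: 3600)
enable_alb_stickiness() {
  local target_group_arn="$1"
  local duration_seconds="${2:-3600}"
  aws elbv2 modify-target-group-attributes \
    --target-group-arn "$target_group_arn" \
    --attributes \
      "Key=stickiness.enabled,Value=true" \
      "Key=stickiness.type,Value=lb_cookie" \
      "Key=stickiness.lb_cookie.duration_seconds,Value=${duration_seconds}"
}
```

This only keeps the XHR fallback usable with more than one replica; as noted above, getting websockets working is still the better fix.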