Jon O'Brien

03/07/2019, 4:59 PM

Hey folks - I raised this issue https://github.com/kolide/fleet/issues/2008 yesterday regarding `fleetctl` not returning any results, and updated it today with some more info. I'm afraid I'm new to `fleetctl`, so I could well be doing something wrong, but afaik it seems to be set up correctly. Unfortunately I don't get any error messages, just a blank line with no results. Any suggestions or guidance on how to get more debug info (`--debug` doesn't show anything) that could point me in the direction of what might be going wrong would be gratefully appreciated! Thanks in advance!

zwass

03/07/2019, 5:43 PM

Are you able to query via the Fleet UI?

Jon O'Brien

03/07/2019, 5:47 PM

yes

zwass

03/07/2019, 5:51 PM

When you do that, look in the network panel in the devtools and see whether it is using a websocket or XHR requests.

zwass

03/07/2019, 5:51 PM

Is your Fleet server behind a load balancer?

Jon O'Brien

03/07/2019, 5:51 PM

yes - I believe it is

zwass

03/07/2019, 5:51 PM

Ah yes I see that it is

zwass

03/07/2019, 5:52 PM

Does your load balancer support websockets?

Jon O'Brien

03/07/2019, 5:52 PM

...checking

zwass

03/07/2019, 5:52 PM

I suspect that your load balancer doesn't support websockets... On the browser we have a good library for getting around this with XHR requests. In Go (where fleetctl is written) we don't.

ralph

03/07/2019, 10:32 PM

Zack, if we run fleetctl directly on 1 backend kolide server, would it help?

zwass

03/07/2019, 10:41 PM

Yeah if you connect fleetctl directly to one of the servers it ought to work fine.

ralph

03/07/2019, 10:41 PM

One more thing: if I enable session persistence on the L7 LB, would that help too?

zwass

03/07/2019, 11:20 PM

What matters is whether the LB supports websockets. It doesn't matter which Fleet server the client connects to as long as (1) Redis is working and (2) websockets are supported for communication with the client.
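One way to test the websocket question independently of fleetctl is to send a plain WebSocket upgrade request through the load balancer and check whether a valid `101 Switching Protocols` response comes back. The following is a minimal sketch of the handshake logic from RFC 6455, not anything taken from Fleet itself; the function names are hypothetical, and the host and path to probe are left to the caller because the thread does not state which endpoint fleetctl uses.

```python
import base64
import hashlib
import os

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def make_key() -> str:
    """Return a random Sec-WebSocket-Key (16 random bytes, base64-encoded)."""
    return base64.b64encode(os.urandom(16)).decode("ascii")


def expected_accept(key: str) -> str:
    """Compute the Sec-WebSocket-Accept value a compliant server returns."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_upgrade_request(host: str, path: str, key: str) -> bytes:
    """Build the raw HTTP/1.1 upgrade request (host and path are caller-supplied)."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def upgrade_succeeded(raw_response: bytes, key: str) -> bool:
    """True if the raw HTTP response is a valid 101 upgrade for this key."""
    head = raw_response.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    status = lines[0].split(" ", 2)
    if len(status) < 2 or status[1] != "101":
        return False
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return (
        headers.get("upgrade", "").lower() == "websocket"
        and headers.get("sec-websocket-accept") == expected_accept(key)
    )
```

Sending the bytes from `build_upgrade_request` over a TLS socket, once via the load balancer and once directly to a backend, and comparing the results of `upgrade_succeeded` would show whether the LB strips or rejects the upgrade. It says nothing about idle timeouts or connection limits, which an LB can also enforce after the handshake.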

Jon O'Brien

03/08/2019, 12:31 PM

It does appear that websockets are supported in the LB - is there a configuration setting that is required to ensure `fleetctl` can communicate through them? Also, how might I see more debug information coming back from `fleetctl`? I'm sure I've seen in some issues a fair amount of debug info being returned after running a command very similar to mine, which returns nothing.

zwass

03/08/2019, 4:48 PM

I did some testing and it does seem that fleetctl should print an error if it is unable to establish the websocket connection. It will also print any errors it encounters from the server, and errors reading from the websocket connection. Can you check a couple more things please? What version of fleetctl are you running (`fleetctl --version`)? Do you see any logs on the server indicating that the client connected? Any errors?

Jon O'Brien

03/08/2019, 4:50 PM

fleetctl - version 2.0.2
  branch:       master
  revision:     8ca0358bf28173685815b79d8683a4239d629a14
  build date:   2019-01-18T00:39:59Z
  build user:   zwass
  go version:   go1.11.3

Do you mean logs on the fleet server? If so, not sure where I'd find them (sorry)?

zwass

03/08/2019, 5:19 PM

Yeah, on the Fleet server. They are printed to stderr of the Fleet process.

zwass

03/08/2019, 6:31 PM

Can you please clarify one more thing... In the examples in https://github.com/kolide/fleet/issues/2008 are both of those commands exiting immediately (or near immediately)?

Jon O'Brien

03/08/2019, 7:52 PM

I will try and get the stderr output - in the meantime, yes, they exit immediately.

zwass

03/08/2019, 7:54 PM

Can you show the output of `fleetctl get label 'All Hosts'`? Also, a screenshot of the Fleet dashboard page sidebar with the labels. Like this:

zwass

03/08/2019, 7:54 PM

Jon O'Brien

03/08/2019, 7:56 PM

$ fleetctl get label 'All Hosts'
apiVersion: v1
kind: label
spec:
  ID: 0
  description: All hosts which have enrolled in Kolide
  label_type: 1
  name: All Hosts
  query: select 1;
$

Jon O'Brien

03/08/2019, 8:13 PM

zwass

03/09/2019, 8:57 PM

I am still interested in the Fleet server logs. Also, are you able to create a label with a query like `select 1` (which should include all hosts) and query against that label?

Jon O'Brien

03/11/2019, 11:57 AM

OK - I can see that although `fleetctl` fails, the results are being written to `/var/log/kolide/status.log` on each of the 3 load balanced kolide servers.

Jon O'Brien

03/11/2019, 11:58 AM

The number of hosts online has greatly reduced now (not sure why)

Jon O'Brien

03/11/2019, 11:58 AM

Jon O'Brien

03/11/2019, 11:59 AM

...and now, queries are working as expected for 171 hosts online

6% responded (99% online) | 170/2680 targeted hosts (170/171 online)
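For anyone scripting around fleetctl, the progress line above can be parsed to tell a run that targeted no hosts apart from one where hosts were targeted but did not respond. This is a minimal sketch; the regular expression and field names are assumptions based only on the lines shown in this thread (reading `170/2680` as responded/targeted and `170/171` as responded/online), not on a documented output format.

```python
import re
from typing import NamedTuple, Optional


class Progress(NamedTuple):
    responded: int         # hosts that returned results
    targeted: int          # hosts matched by the labels/hosts given
    responded_online: int  # responders counted against online hosts
    online: int            # targeted hosts currently online


# Matches e.g. "| 170/2680 targeted hosts (170/171 online)".
_PATTERN = re.compile(
    r"\|\s*(\d+)/(\d+) targeted hosts \((\d+)/(\d+) online\)"
)


def parse_progress(line: str) -> Optional[Progress]:
    """Parse a fleetctl-style progress line; return None if it does not match."""
    match = _PATTERN.search(line)
    if match is None:
        return None
    return Progress(*(int(group) for group in match.groups()))


def no_hosts_targeted(line: str) -> bool:
    """True when the line parses and reports zero targeted hosts."""
    progress = parse_progress(line)
    return progress is not None and progress.targeted == 0
```

A result with `targeted == 0` suggests the query never received target information back from the server, which is a different symptom from hosts being targeted but slow to respond.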

Jon O'Brien

03/11/2019, 12:04 PM

I created a label `test label`:

$ fleetctl get label --context my_context 'test label'
apiVersion: v1
kind: label
spec:
  ID: 0
  description: ""
  label_type: 0
  name: test label
  query: select 1
$

Jon O'Brien

03/13/2019, 3:22 PM

@zwass - any more thoughts on this please? Current situation is that when hosting fleet on multiple servers through a load balancer, `fleetctl` will execute and immediately fail:

[----------]$ fleetctl query --timeout 25m --query 'select name,version from os_version' --labels 'All Hosts' --exit
⠋ [----------]$
[----------]$ fleetctl query --timeout 25m --query 'select name,version from os_version' --labels 'All Hosts'
0% responded (0% online) | 0/0 targeted hosts (0/0 online)
[----------]$

On closer inspection of `/var/log/kolide/status.log` on each of the 3 load balanced kolide servers, I can see that data is actually being captured - it's just not being returned to the `fleetctl` call. Is there a timing setting that might be causing `fleetctl` to drop out before any results have been captured?

zwass

03/13/2019, 6:52 PM

Are you able to query that test label?

Jon O'Brien

03/14/2019, 10:38 AM

Yes - I can query against my 'test label' and 'All Hosts'. `fleetctl` seems to work more reliably with just a small number of hosts online (even through our load balancer). It appears to be more of an issue when we have a couple thousand hosts online.

zwass

03/15/2019, 7:04 PM

When you query the test label you get results? What about if you query individual hosts?

Jon O'Brien

03/18/2019, 10:19 AM

I don't think the problem is to do with the label or individual host queries - it appears to be related to how load balanced fleet servers communicate back to `fleetctl`. As a workaround, we have reduced the number of fleet servers down to 1 and early testing appears to indicate that `fleetctl` is working reliably now.

Jon O'Brien

03/25/2019, 10:36 AM

Hi @zwass - I'm still having issues with this environment despite reducing the number of fleet servers to 1 through the load balancer. I've managed to look in the fleet stderr log in docker and I get a few of the following errors occurring around the time that `fleetctl` fails:

{"log":"{\"err\":\"sending: sockjs: session not in open state\",\"msg\":\"error writing to channel\",\"ts\":\"2019-03-24T04:08:13.081053265Z\"}\n","stream":"stderr","time":"2019-03-24T04:08:13.08115086Z"}
{"log":"{\"component\":\"service\",\"err\":null,\"ip_addr\":\"192.168.20.3:60635\",\"method\":\"SubmitDistributedQueryResults\",\"took\":\"8.070283ms\",\"ts\":\"2019-03-24T04:08:13.086910133Z\"}\n","stream":"stderr","time":"2019-03-24T04:08:13.087039196Z"}
{"log":"{\"err\":\"write status\",\"msg\":\"error updating status\",\"ts\":\"2019-03-24T04:08:13.322847484Z\"}\n","stream":"stderr","time":"2019-03-24T04:08:13.32299169Z"}

Any idea what might be causing this?
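The lines above are doubly encoded: Docker's JSON log format wraps each line the Fleet process wrote to stderr, which is itself a JSON object, in a `log` string. A small filter can unwrap both layers and keep only entries with a non-null `err`. This is a minimal sketch that assumes exactly the shape shown in the three lines above; the function name is hypothetical.

```python
import json
from typing import Iterable, Iterator


def fleet_errors(docker_log_lines: Iterable[str]) -> Iterator[dict]:
    """Yield inner log entries whose "err" field is present and not null."""
    for raw in docker_log_lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            outer = json.loads(raw)          # Docker wrapper: log/stream/time
            inner = json.loads(outer["log"])  # the application's own JSON line
        except (ValueError, KeyError, TypeError):
            continue  # skip lines that are not doubly encoded JSON
        if isinstance(inner, dict) and inner.get("err") is not None:
            yield inner


# The three lines quoted in the message above, unchanged.
SAMPLE = [
    r'{"log":"{\"err\":\"sending: sockjs: session not in open state\",\"msg\":\"error writing to channel\",\"ts\":\"2019-03-24T04:08:13.081053265Z\"}\n","stream":"stderr","time":"2019-03-24T04:08:13.08115086Z"}',
    r'{"log":"{\"component\":\"service\",\"err\":null,\"ip_addr\":\"192.168.20.3:60635\",\"method\":\"SubmitDistributedQueryResults\",\"took\":\"8.070283ms\",\"ts\":\"2019-03-24T04:08:13.086910133Z\"}\n","stream":"stderr","time":"2019-03-24T04:08:13.087039196Z"}',
    r'{"log":"{\"err\":\"write status\",\"msg\":\"error updating status\",\"ts\":\"2019-03-24T04:08:13.322847484Z\"}\n","stream":"stderr","time":"2019-03-24T04:08:13.32299169Z"}',
]

if __name__ == "__main__":
    for entry in fleet_errors(SAMPLE):
        print(entry["ts"], entry["msg"], "-", entry["err"])
```

Run against the sample, this keeps the two entries with real errors and drops the `SubmitDistributedQueryResults` line, whose `err` is null. Lining up the remaining timestamps with the moment `fleetctl` exits may help narrow down whether the session is closed before results are written.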

zwass

03/25/2019, 11:19 PM

Have you tried connecting directly to Fleet rather than through the LB? This seems like an LB issue.
