# fleet
p
Hey friends! I have an RDS MySQL DB up and running with a custom username, but whenever I run the fleet serve command with the username env variable it keeps saying
Unknown database 'fleet'
. Works fine otherwise. Any ideas why?
Copy code
apiVersion: platform.segment.com/v1alpha1
kind: SegmentApplicationExperimental
metadata:
  name: fleetdm-webserver
  namespace: fleetdm-webserver
  labels:
    app: fleetdm-webserver
spec:
  targets:
  - name: fleetdm-stage-usw2
    cluster: sec-tooling-stage:us-west-2:fleetdm
    targetGroupBinding:
      autoDiscover: true
    replicatedService: &replicatedService
      iamRoleName: fleetdm-webserver
      autoScale: &autoScale
          minReplicas: 1
          maxReplicas: 6
          resources:
            - resource: cpu
              utilization: 15
            - resource: memory
              utilization: 15
      maincontainers:
        - name: fleetdm-webserver
          ctlstore:
            disabled: true
          imageRegistry: 528451384384.dkr.ecr.us-west-2.amazonaws.com
          imageName: fleetdm/fleet
          command:
            - chamber
            - exec
            - fleetdm
            - --
            - fleet
            - serve
          ports:
            - containerPort: 443
          env:
            - name: FLEET_MYSQL_ADDRESS
              value: fleetdatabase.crqbc9r8uf32.us-west-2.rds.amazonaws.com:3306
            - name: FLEET_MYSQL_USERNAME
              value: fleet_rds
            - name: FLEET_REDIS_ADDRESS
              value: master.fleet-cache.l5ryax.usw2.cache.amazonaws.com
            - name: FLEET_SERVER_ADDRESS
              value: "0.0.0.0:443"
This is my Kubernetes App config file
m
p
oh so you mean instead of FLEET_MYSQL_ADDRESS , I should use FLEET_MYSQL_DATABASE?
Copy code
env:
            - name: FLEET_MYSQL_ADDRESS
              value: fleetdatabase.crqbc9r8uf32.us-west-2.rds.amazonaws.com:3306
            - name: FLEET_MYSQL_DATABASE
              value: fleetdatabase.crqbc9r8uf32.us-west-2.rds.amazonaws.com:3306
            - name: FLEET_MYSQL_USER
              value: "fleet_rds"
            - name: FLEET_REDIS_ADDRESS
              value: master.fleet-cache.l5ryax.usw2.cache.amazonaws.com
            - name: FLEET_SERVER_ADDRESS
              value: "0.0.0.0:443"
Added it like so. It doesn't seem to be picking up the username and keeps saying
Copy code
ts=2021-10-07T19:30:24.561132517Z mysql="could not connect to db: Error 1045: Access denied for user 'fleet'@'10.80.94.227' (using password: YES), sleeping 14s"
Copy code
% kc exec fleetdm-webserver-58c85b9bf7-kqfdr -n fleetdm-webserver -- fleet serve                          
ts=2021-10-07T20:03:13.427339705Z mysql="could not connect to db: Error 1045: Access denied for user 'fleet'@'10.80.79.187' (using password: YES)
however just after adding the mysql_username flag it starts complaining about the database being unknown
Copy code
% kc exec fleetdm-webserver-58c85b9bf7-kqfdr -n fleetdm-webserver -- fleet serve --mysql_username=fleet_rds
ts=2021-10-07T20:04:42.621780497Z mysql="could not connect to db: Error 1049: Unknown database 'fleetdatabase.crqbc9r8uf32.us-west-2.rds.amazonaws.com:3306', sleeping 0s"
is the only way to solve this to not use a custom username?
z
No, both are supported. Perhaps you need to actually
CREATE DATABASE fleet
within your MySQL instance?
FLEET_MYSQL_ADDRESS
refers to the address the MySQL server is listening on, while
FLEET_MYSQL_DATABASE
refers to the logical database within MySQL (e.g. one created in MySQL with
CREATE DATABASE
and selected with
USE fleet
)
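To make the address/database distinction concrete: Go MySQL clients commonly use a DSN of the form `user:password@tcp(host:port)/dbname` (the go-sql-driver/mysql format). The sketch below assumes Fleet assembles something along those lines. The `build_dsn` helper and the example host are hypothetical, not Fleet's code.

```python
def build_dsn(username: str, password: str, address: str, database: str) -> str:
    """Assemble a go-sql-driver/mysql style DSN (hypothetical helper, not Fleet's code)."""
    # address is the "host:port" of the MySQL server; database is the schema name.
    return f"{username}:{password}@tcp({address})/{database}"


# Intended use: address points at the server, database names the schema.
good = build_dsn("fleet_rds", "secret", "db.example.internal:3306", "fleet")

# The mix-up from this thread: the server address was reused as the database name,
# so MySQL is asked for a schema literally called "db.example.internal:3306".
bad = build_dsn("fleet_rds", "secret", "db.example.internal:3306", "db.example.internal:3306")

print(good)
print(bad)
```

With the second DSN the server is reachable and the login may succeed, but the schema lookup fails, which matches the `Error 1049: Unknown database` message above.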
p
Ah got it!
It's up and running 🎉 Thanks folks!
suuuper silly question. how do I find out which URL it's up and running at?
b
How are you running fleet? If you have access to the host that is running fleet you should be able to access it at localhost:8080
p
I'm running it on an EKS cluster and I think I know what's broken 😄 I need to make the app aware of the ALB that it should be serving off
hey guys I was on this basically all of last night still debugging. Now at a point where I need to rule out the possibility that the App isn't up and running before going any further. What's the best way to do that? Once I can verify that, I'll continue to debug AWS networking issues.
z
Can you connect into a pod in the cluster and
curl
the internal address?
In my experience this kind of issue on AWS is often associated with an improperly configured security group. Very painful to debug.
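When `curl` is not available in the image, the same check can be approximated with a plain TCP connect. This is a minimal sketch (the `probe` helper is hypothetical). A "refused" result typically means nothing is listening on that port on the host that was reached, while a "timeout" typically means packets are being dropped somewhere, which is what a restrictive security group tends to look like.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Return 'open', 'refused', 'timeout', or 'error: ...' for a TCP connect attempt."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:  # DNS failures, unreachable networks, etc.
        return f"error: {exc}"


# Example: check the Fleet default address from inside the pod.
print(probe("127.0.0.1", 8080))
```

Running it once from inside the pod and once from another pod in the cluster helps separate "the server is not listening" from "the network path is blocked".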
p
yeahhhh that's literally what I was doing last night and the first thing I made sure was that My Load Balancer was allowing inbound traffic from all IPs on port 80 and 443
pod info incoming
Copy code
% kc logs -n fleetdm-webserver -f fleetdm-webserver-85484d6c79-v4h6h 
warning: service fleetdm overwriting environment variable FLEET_MYSQL_PASSWORD
warning: service fleetdm overwriting environment variable FLEET_REDIS_PASSWORD
Copy code
% kc exec fleetdm-webserver-85484d6c79-v4h6h -n fleetdm-webserver -- fleet config_dump
mysql:
  protocol: tcp
  address: fleetdatabase.crqbc9r8uf32.us-west-2.rds.amazonaws.com:3306
  username: fleet
  password: ""
  password_path: ""
  database: fleetdatabase
  tls_cert: ""
  tls_key: ""
  tls_ca: ""
  tls_server_name: ""
  tls_config: ""
  max_open_conns: 50
  max_idle_conns: 50
  conn_max_lifetime: 0
mysql_read_replica:
  protocol: tcp
  address: ""
  username: fleet
  password: ""
  password_path: ""
  database: fleet
  tls_cert: ""
  tls_key: ""
  tls_ca: ""
  tls_server_name: ""
  tls_config: ""
  max_open_conns: 50
  max_idle_conns: 50
  conn_max_lifetime: 0
redis:
  address: master.fleet-cache.l5ryax.usw2.cache.amazonaws.com:6379
  password: ""
  database: 0
  use_tls: false
  duplicate_results: false
  connect_timeout: 5s
  keep_alive: 10s
  connect_retry_attempts: 0
  cluster_follow_redirections: false
server:
  address: 0.0.0.0:443
  cert: ./tools/osquery/fleet.crt
  key: ./tools/osquery/fleet.key
  tls: true
  tls_compatibility: intermediate
  url_prefix: ""
  keepalive: true
auth:
  bcrypt_cost: 12
  salt_key_size: 24
app:
  token_key_size: 24
  invite_token_validity_period: 120h0m0s
session:
  key_size: 64
  duration: 4h0m0s
osquery:
  node_key_size: 24
  host_identifier: provided
  enroll_cooldown: 0s
  status_log_plugin: filesystem
  result_log_plugin: filesystem
  label_update_interval: 1h0m0s
  policy_update_interval: 1h0m0s
  detail_update_interval: 1h0m0s
  status_log_file: ""
  result_log_file: ""
  enable_log_rotation: false
  max_jitter_percent: 10
logging:
  debug: false
  json: false
  disable_banner: false
firehose:
  region: ""
  endpoint_url: ""
  access_key_id: ""
  secret_access_key: ""
  sts_assume_role_arn: ""
  status_stream: ""
  result_stream: ""
kinesis:
  region: ""
  endpoint_url: ""
  access_key_id: ""
  secret_access_key: ""
  sts_assume_role_arn: ""
  status_stream: ""
  result_stream: ""
lambda:
  region: ""
  access_key_id: ""
  secret_access_key: ""
  sts_assume_role_arn: ""
  status_function: ""
  result_function: ""
s3:
  bucket: ""
  prefix: ""
  access_key_id: ""
  secret_access_key: ""
  sts_assume_role_arn: ""
pubsub:
  project: ""
  status_topic: ""
  result_topic: ""
  add_attributes: false
filesystem:
  status_log_file: /tmp/osquery_status
  result_log_file: /tmp/osquery_result
  enable_log_rotation: false
  enable_log_compression: false
license:
  key: ""
vulnerabilities:
  databases_path: ""
  periodicity: 1h0m0s
  cpe_database_url: ""
  cve_feed_prefix_url: ""
  current_instance_checks: auto
  disable_data_sync: false
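Each key in this dump lines up with the environment variables and flags used elsewhere in the thread (`mysql.address`, `FLEET_MYSQL_ADDRESS`, `--mysql_address`; `logging.debug`, `FLEET_LOGGING_DEBUG`, `--logging_debug`). The sketch below derives the names from that observed pattern. It assumes the pattern holds for every key, so Fleet's configuration docs remain the reference; `fleet_names` is a hypothetical helper.

```python
def fleet_names(section: str, key: str) -> dict:
    """Derive env var and flag names for a YAML key, following the pattern seen in this thread."""
    return {
        "yaml": f"{section}.{key}",
        "env": f"FLEET_{section}_{key}".upper(),
        "flag": f"--{section}_{key}",
    }


for section, key in [("mysql", "username"), ("mysql", "database"), ("server", "tls")]:
    print(fleet_names(section, key))
```

By this pattern the username variable would be `FLEET_MYSQL_USERNAME`, so a variable named `FLEET_MYSQL_USER` would not be expected to change `mysql.username`.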
z
Maybe something like
kc exec fleetdm-webserver-85484d6c79-v4h6h -n fleetdm-webserver -- curl http://localhost:8080
?
p
yeah can't curl lol coz no perms. updating the Dockerfile with RUN apk add curl just now and pushing the image ... then will spin up a new pod, gimme a few
👍 1
Copy code
% kc exec fleetdm-webserver-7fd4b65998-5njdc -n fleetdm-webserver -- curl https://0.0.0.0:443
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (7) Failed to connect to 0.0.0.0 port 443 after 0 ms: Connection refused
command terminated with exit code 7
omfg ... of all the things I checked last night, one that I didn't was aws-controller-lb logs. Just did that and tada 🎉 now figuring out how to make it find port https on the server and not http
Copy code
{
  "level": "error",
  "ts": 1633716912.7189045,
  "logger": "controller",
  "msg": "Reconciler error",
  "reconcilerGroup": "elbv2.k8s.aws",
  "reconcilerKind": "TargetGroupBinding",
  "controller": "targetGroupBinding",
  "name": "fleetdm-webserver",
  "namespace": "fleetdm-webserver",
  "error": "unable to find port http on service fleetdm-webserver/fleetdm-webserver"
}
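The reconciler error reads like a lookup of a Service port named `http` that found no match. The sketch below illustrates that kind of lookup. It is not the controller's implementation, and the Service shape is an assumption, since the thread does not show the Service manifest.

```python
def find_service_port(service: dict, wanted) -> dict:
    """Find a Service port by name (str) or number (int); raise if none matches."""
    for port in service["spec"]["ports"]:
        if port.get("name") == wanted or port.get("port") == wanted:
            return port
    name = f'{service["metadata"]["namespace"]}/{service["metadata"]["name"]}'
    raise LookupError(f"unable to find port {wanted} on service {name}")


# Hypothetical Service whose only port is named "https".
service = {
    "metadata": {"namespace": "fleetdm-webserver", "name": "fleetdm-webserver"},
    "spec": {"ports": [{"name": "https", "port": 443, "targetPort": 443}]},
}

print(find_service_port(service, "https"))
try:
    find_service_port(service, "http")
except LookupError as exc:
    print(exc)
```

Under that assumption, either the binding has to reference the port name the Service actually exposes, or the Service needs a port with the name the binding asks for.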
z
Were you able to verify that the server is listening?
p
yeah so I tried curling every combination of http/https, localhost/0.0.0.0 and ports 443/80/8080, and ain't able to curl into nothing
so yeah looks like the server is not listening either
z
Do you have logs from the Fleet server pod?
p
quick question .. Is Fleet set up such that the webserver won't get up and running until the mysql DB connection is healthy?
yes I do
z
Can you paste some of those logs?
webserver won't get up and running until the mysql DB connection is healthy?
Yes it should not start up if it can't connect to MySQL
Logs would look like
Copy code
ts=2021-10-08T20:45:24.715901Z mysql="could not connect to db: dial tcp [::1]:3306: connect: connection refused, sleeping 0s"
ts=2021-10-08T20:45:24.716481Z mysql="could not connect to db: dial tcp [::1]:3306: connect: connection refused, sleeping 1s"
p
so fyi I got the logs only after I set
--logging_debug=true
. Without this flag it was showing nothing
Copy code
% kc exec fleetdm-webserver-7cdc49c6d6-z7h54 -n fleetdm-webserver -- chamber exec fleetdm -- fleet serve --logging_debug=true --server_tls=false
warning: service fleetdm overwriting environment variable FLEET_REDIS_PASSWORD
warning: service fleetdm overwriting environment variable FLEET_MYSQL_PASSWORD
ts=2021-10-08T20:32:26.026929776Z mysql="could not connect to db: Error 1045: Access denied for user 'fleet'@'10.80.93.203' (using password: YES), sleeping 0s"
ts=2021-10-08T20:32:26.030594949Z mysql="could not connect to db: Error 1045: Access denied for user 'fleet'@'10.80.93.203' (using password: YES), sleeping 1s"
z
Looks like this exec command is probably not working as expected with your env vars
p
ensured the credentials are loading correctly
it is
wait i'll show you da proof
z
Wasn't your db username something other than
fleet
?
p
it was ... but I changed it to
fleet
and redeployed to keep things simple
z
Ah I see
p
but I guess things still aren't simple 'enough' ... coz my db password is still a long complex string with special characters (like security engineers recommend it to be 😄) and I am thinking of making that super simple now as well for debugging purposes
z
Yeah, making it simple for debugging could help
Maybe try
kc exec fleetdm-webserver-7cdc49c6d6-z7h54 -n fleetdm-webserver -- chamber exec fleetdm -- fleet serve --logging_debug=true --server_tls=false --mysql_password='<PASTE HERE>'
(you'll want to change the password after doing that once you get it working as it's going to leak the password into shell history)
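A password full of special characters can be altered by the shell before Fleet ever sees it, which is why the single quotes in the command above matter. The sketch below uses Python's `shlex.quote` to build a shell-safe command line. The password is a made-up example and `fleet_serve_command` is a hypothetical helper.

```python
import shlex


def fleet_serve_command(password: str) -> str:
    """Build a command line in which every argument is quoted for a POSIX shell."""
    args = ["fleet", "serve", "--logging_debug=true", f"--mysql_password={password}"]
    return " ".join(shlex.quote(a) for a in args)


# Hypothetical password containing characters a shell would otherwise interpret.
cmd = fleet_serve_command("p@ss w$rd'!")
print(cmd)

# Round trip: splitting the command the way a POSIX shell would yields the original password.
assert shlex.split(cmd)[-1] == "--mysql_password=p@ss w$rd'!"
```

Plain single quotes are enough unless the password itself contains a single quote, which is the case `shlex.quote` handles for you.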
p
yeah for sure
Ran this ->
kc exec fleetdm-webserver-7cdc49c6d6-z7h54 -n fleetdm-webserver -- chamber exec fleetdm -- fleet serve --logging_debug=true --server_tls=false --mysql_password='<new_pass>'
... there are no errors now but curl still doesn't work on 0.0.0.0:8080
The earlier DB connection error has disappeared and logs also show only these two lines now
Copy code
% kc logs fleetdm-webserver-7cdc49c6d6-z7h54 -n fleetdm-webserver
warning: service fleetdm overwriting environment variable FLEET_MYSQL_PASSWORD
warning: service fleetdm overwriting environment variable FLEET_REDIS_PASSWORD
(sec-tooling-stage-write/sec-tooling-stage:us-west-2:fleetdm) prima.virani@Primas-MacBook-Pro terracode-ops %
but curling still doesn't work
Copy code
% kc exec fleetdm-webserver-7cdc49c6d6-z7h54 -n fleetdm-webserver -- curl http://0.0.0.0:8080
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (7) Failed to connect to 0.0.0.0 port 8080 after 0 ms: Connection refused
command terminated with exit code 7
z
After you run
kc exec fleetdm-webserver-7cdc49c6d6-z7h54 -n fleetdm-webserver -- chamber exec fleetdm -- fleet serve --logging_debug=true --server_tls=false --mysql_password='<new_pass>'
do you get any logs? You should see something like
Copy code
level=info ts=2021-10-08T21:34:57.476844Z component=crons cron=vulnerabilities vulnerabilityscanning="not configured"
ts=2021-10-08T21:34:57.501393Z transport=https address=0.0.0.0:8080 msg=listening
p
no logs
at all
z
Does the process exit?
p
I believe so. but how do I confirm?
z
Do you end up back at a shell prompt?
p
yes I do
z
Can you just exec
sh
on that pod?
p
let me try
z
And then try the
fleet serve
command from that shell
p
btw I changed the redis password too and made it as simple as redis would let me make it. let it finish deploying and I will try redeploying the pod and restarting everything
would it hurt if I remove this flag from my yaml file? no right? coz it defaults to 0.0.0.0:8080 anyways?
Copy code
- name: FLEET_SERVER_ADDRESS
              value: "0.0.0.0:8080"
👍 1
z
Yeah that should be the default
p
now the logs aren't populating at all
new config
Copy code
% kc exec fleetdm-webserver-57bc6745-kj6h4 -n fleetdm-webserver -- chamber exec fleetdm -- printenv                       
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=fleetdm-webserver-57bc6745-kj6h4
FLEET_MYSQL_ADDRESS=fleetdatabase.crqbc9r8uf32.us-west-2.rds.amazonaws.com:3306
FLEET_MYSQL_DATABASE=fleetdatabase
FLEET_SERVER_TLS=false
NODE_IP=10.80.65.190
AWS_DEFAULT_REGION=us-west-2
AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token
FLEET_REDIS_ADDRESS=master.fleet-cache.l5ryax.usw2.cache.amazonaws.com:6379
FLEET_LOGGING_DEBUG=true
SEGMENT_CELL=core
AWS_REGION=us-west-2
AWS_ROLE_ARN=arn:aws:iam::169172804835:role/fleetdm.usw2.eks.fleetdm-webserver
KUBERNETES_PORT_443_TCP=tcp://172.20.0.1:443
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_PORT_443_TCP_PORT=443
KUBERNETES_PORT_443_TCP_ADDR=172.20.0.1
KUBERNETES_SERVICE_HOST=172.20.0.1
KUBERNETES_SERVICE_PORT=443
KUBERNETES_SERVICE_PORT_HTTPS=443
KUBERNETES_PORT=tcp://172.20.0.1:443
HOME=/home/fleet
FLEET_MYSQL_PASSWORD=***
FLEET_REDIS_PASSWORD=***
Copy code
% kc logs -n fleetdm-webserver -f fleetdm-webserver-57bc6745-kj6h4
Copy code
% kc exec fleetdm-webserver-57bc6745-kj6h4 -n fleetdm-webserver -- sh
no output. nothing
z
Stretching my k8s knowledge here, but I think you need
-i -t
to connect to an interactive shell
p
Copy code
% kc exec -ti fleetdm-webserver-57bc6745-tw4kh -n fleetdm-webserver -- /bin/sh
/ $
z
Can you then try your full fleet serve command?
p
so the service seems running when I do top
Copy code
Load average: 0.29 0.18 0.18 2/995 106
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
    1     0 fleet    S    1084m  14%   1   0% fleet serve
   94     0 fleet    S     1668   0%   0   0% /bin/sh
  106    94 fleet    R     1600   0%   0   0% top
on doing netstat I do see the connections with mysql/ redis working
Copy code
/ $ netstat
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       
tcp        0      0 fleetdm-webserver-57bc6745-tw4kh:38904 ip-10-80-140-145.us-west-2.compute.internal:mysql ESTABLISHED 
tcp        0      0 fleetdm-webserver-57bc6745-tw4kh:39298 ip-10-80-50-194.us-west-2.compute.internal:redis ESTABLISHED 
Active UNIX domain sockets (w/o servers)
Proto RefCnt Flags       Type       State         I-Node Path
z
netstat -tulpn
do you see anything in state
LISTEN
on 8080?
p
nope
and apparently status S above means the service is in interruptible sleep right now.
in which directory do I find the service logs?
z
Fleet logs to stderr -- I don't know what k8s would do with that
Can you try running the
fleet serve
command in this shell and see if there's any output?
p
yeah doing it now
Copy code
$ fleet serve
ts=2021-10-08T23:57:38.521866342Z mysql="could not connect to db: Error 1045: Access denied for user 'fleet'@'10.80.94.227' (using password: NO), sleeping 0s"
ts=2021-10-08T23:57:38.525573097Z mysql="could not connect to db: Error 1045: Access denied for user 'fleet'@'10.80.94.227' (using password: NO), sleeping 1s"
ts=2021-10-08T23:57:39.529444031Z mysql="could not connect to db: Error 1045: Access denied for user 'fleet'@'10.80.94.227' (using password: NO), sleeping 2s"
ts=2021-10-08T23:57:41.533580427Z mysql="could not connect to db: Error 1045: Access denied for user 'fleet'@'10.80.94.227' (using password: NO), sleeping 3s"
ts=2021-10-08T23:57:44.537292093Z mysql="could not connect to db: Error 1045: Access denied for user 'fleet'@'10.80.94.227' (using password: NO), sleeping 4s"
ts=2021-10-08T23:57:48.541150984Z mysql="could not connect to db: Error 1045: Access denied for user 'fleet'@'10.80.94.227' (using password: NO), sleeping 5s"
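Two things stand out in this output. `(using password: NO)` is MySQL reporting that the client sent no password at all, possibly because this shell was not started through `chamber exec`, so `FLEET_MYSQL_PASSWORD` was never set. The sleep values also grow by one second per attempt. The sketch below reproduces that retry shape. It is modelled on the log lines only, not on Fleet's source, and `connect_with_backoff` is a hypothetical helper.

```python
import time


def connect_with_backoff(connect, max_attempts=6, sleep=time.sleep, log=print):
    """Call connect() until it succeeds, sleeping 0s, 1s, 2s, ... after each failure."""
    for attempt in range(max_attempts):
        try:
            return connect()
        except Exception as exc:
            log(f"could not connect to db: {exc}, sleeping {attempt}s")
            sleep(attempt)
    raise ConnectionError(f"gave up after {max_attempts} attempts")


# Demo with a fake connection that fails twice, and a no-op sleep so it runs instantly.
attempts = {"n": 0}


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("Error 1045: Access denied")
    return "connected"


print(connect_with_backoff(flaky, sleep=lambda seconds: None))
```

Because the server only starts listening after a loop like this succeeds, a pod that never gets past it will refuse every `curl`.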
z
Now try adding the flags for the db connection?
p
yeah constructing the whole flags rn
Copy code
~ $ fleet serve --mysql_database=fleetdatabase --redis_address=master.fleet-cache.l5ryax.usw2.cache.amazonaws.com:6379 --mysql_address=fleetdatabase.crqbc9r8uf32.us-west-2.rds.amazonaws.com:3306 --mysql_password=*** --redis_password=*** --logging_debug=true --server_tls=false
nothing ... no results ... doesn't fail either. it's just sitting there
z
Can you
ping master.fleet-cache.l5ryax.usw2.cache.amazonaws.com
?
and also
ping fleetdatabase.crqbc9r8uf32.us-west-2.rds.amazonaws.com
p
Copy code
% ping master.fleet-cache.l5ryax.usw2.cache.amazonaws.com
PING fleet-cache-001.fleet-cache.l5ryax.usw2.cache.amazonaws.com (10.80.50.194): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
Request timeout for icmp_seq 4
^C
--- fleet-cache-001.fleet-cache.l5ryax.usw2.cache.amazonaws.com ping statistics ---
6 packets transmitted, 0 packets received, 100.0% packet loss
(sec-tooling-stage-write/sec-tooling-stage:us-west-2:fleetdm) prima.virani@Primas-MacBook-Pro terracode-security % ping  fleetdatabase.crqbc9r8uf32.us-west-2.rds.amazonaws.com
PING fleetdatabase.crqbc9r8uf32.us-west-2.rds.amazonaws.com (10.80.140.145): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
Request timeout for icmp_seq 4
^C
--- fleetdatabase.crqbc9r8uf32.us-west-2.rds.amazonaws.com ping statistics ---
6 packets transmitted, 0 packets received, 100.0% packet loss
z
Oh, well that probably explains some things. Looks like there's no connectivity to the MySQL or Redis instances. Fleet should definitely be helping you out more here though. I'll file an issue for that.
p
oh wait wait
that was run from my localhost ... let me run that from the pod
z
So do the security groups allow outbound connections from the cluster to the db/redis? And do db/redis allow inbound connections from the cluster?
oh
lmk how it goes then
p
yes I've set the redis and DB inbound rules to allow any any for the meanwhile
and all the security groups allow outbound any any
ugh no perms to ping ... fleet user has limited perms
deploying a new image with sudo permissions for the fleet user at least for debugging purposes because it's limiting during debugging otherwise
z
Sounds good. Alternatively could deploy an
alpine
image or similar
p
it's already running on Alpine
Copy code
FROM segment/chamber:2.10.6 as chamber
FROM fleetdm/fleet:v4.3.2 as fleet
FROM fleetdm/fleetctl:v4.3.2 as fleetctl
FROM alpine:3.14.2

RUN apk --update add ca-certificates
RUN apk add curl
RUN apk add --no-cache su-exec

# Create FleetDM group and user
RUN addgroup -S fleet && adduser -S fleet -G fleet

# Add Chamber Binary
COPY --from=chamber /chamber /usr/local/bin/chamber

# Add Fleet Binary
COPY --from=fleet /usr/bin/ /usr/bin/
COPY --from=fleetctl /usr/bin /usr/bin/

USER fleet
CMD ["fleet", "serve"]
I'll stop pinging if I'm the only thing between you and the weekend right now 😄 Thanks for all the help thus far
z
Ha thank you! I'm still wrapping up some other things so feel free to keep it up.
I'll let you know when I gotta go
I was thinking an
alpine
base image where you'd be
root
. But your idea of removing the
USER fleet
line should work fine.
p
actually I don't need to ping to check the connection
Copy code
$ netstat
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       
tcp        0      0 fleetdm-webserver-57bc6745-tw4kh:38904 ip-10-80-140-145.us-west-2.compute.internal:mysql ESTABLISHED 
tcp        0      0 fleetdm-webserver-57bc6745-tw4kh:39298 ip-10-80-50-194.us-west-2.compute.internal:redis ESTABLISHED
we already know the connection is established
and it stays established
z
I think those are different addresses -- is that intentional?
p
which addresses ?
the addresses are of the redis and mysql hosted in aws
z
I see "Foreign Address"
ip-10-80-140-145.us-west-2.compute.internal:mysql
and
ip-10-80-50-194.us-west-2.compute.internal:redis
-- those don't seem to match the addresses used in https://osquery.slack.com/archives/C01DXJL16D8/p1633738072026500?thread_ts=1633632344.474100&cid=C01DXJL16D8
Looks like they are probably some DNS internal to the cluster or something? I don't know if that could make a difference
p
yeah so I turned on logging on our database and here's a warning I saw. May or may not mean the database can't connect back to the pod.
Copy code
2021-10-09T00:12:50.627-07:00	2021-10-09T07:12:50.627260Z 231 [Warning] [MY-010055] [Server] IP address '10.80.94.227' could not be resolved: Name or service not known
but also confused by the fact that netstat shows connection established when
fleet serve
is running (we hardcoded all the flags in env for now). If it isn't able to connect successfully how would the connection be in 'established' state 🤔
Copy code
/ $ netstat
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       
tcp        0      0 fleetdm-webserver-589fc79fdf-5ltl9:38136 ip-10-80-50-194.us-west-2.compute.internal:redis ESTABLISHED 
tcp        0      0 fleetdm-webserver-589fc79fdf-5ltl9:37742 ip-10-80-140-145.us-west-2.compute.internal:mysql ESTABLISHED 
tcp        0      0 fleetdm-webserver-589fc79fdf-5ltl9:47692 ip-10-80-50-194.us-west-2.compute.internal:redis ESTABLISHED
then
fleet serve
runs and runs and runs without any errors or any info about where it's getting stuck and we ctrl+c out of it. and then I run netstat again and look what happens!
Copy code
/ $ netstat
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       
tcp        0      0 fleetdm-webserver-589fc79fdf-5ltl9:47692 ip-10-80-50-194.us-west-2.compute.internal:redis ESTABLISHED 
Active UNIX domain sockets (w/o servers)
Proto RefCnt Flags       Type       State         I-Node Path
the connection with redis remains but the one with mysql goes away! I don't know if any of this is even related or points to anything. Just sharing it here for my own better understanding for the most part. I've just been RTFMing as hard as possible
Hey team! I just confirmed the database connection is working fine. I tried connecting using the following commands and here's what worked and what didn't work
Copy code
/ $ mysql --host fleet.crqbc9r8uf32.us-west-2.rds.amazonaws.com --port 3306 --database fleetdatabase --password <my_password>
^^ This did not work and gave an error saying
ERROR 1049 (42000): Unknown database <my_password>
Copy code
$ mysql --host fleet.crqbc9r8uf32.us-west-2.rds.amazonaws.com --port 3306 --database fleetdatabase --password
Enter password: 
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 1002
Server version: 8.0.25 Source distribution

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [fleetdatabase]>
^^^ This worked
but fleet serve still does not work 😞
m
What is the error for
fleet serve
?
z
Still just hangs with no output?
p
Copy code
% kc logs -n fleetdm-webserver fleetdm-webserver-5576655d6f-9vsd7
level=info ts=2021-10-13T17:33:43.324015438Z component=crons cron=vulnerabilities vulnerabilityscanning="not configured"
ts=2021-10-13T17:33:43.725825861Z transport=http address=0.0.0.0:80 msg=listening
Finally!!! 🎉 but
Copy code
/ $ curl http://0.0.0.0:80
<a href="/setup">Temporary Redirect</a>.
The problem was Redis. In AWS Redis lets you set auth token only if TLS is enabled and I luckily noticed a question about redis TLS support in Fleet that someone had posted here yesterday which kind of gave me a lead to start looking in that direction although netstat output saying the redis connection is 'ESTABLISHED' was a bit misleading initially. But yeah I tried connecting to the redis cache cluster with
redis-cli
and it showed the same behaviour as the app: no logs, and the connection was simply hanging, which made me even surer that it has to be redis-related. Today I deployed a new cluster with auth and TLS both disabled and it worked 🙂 Now figuring out what's up with that temporary redirect
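An `ESTABLISHED` entry in netstat only says the TCP handshake completed. It says nothing about whether the two sides agree on the protocol on top of it (plaintext vs TLS). The sketch below is a probe with a timeout that turns a silent hang into an explicit result. It sends a plaintext inline `PING`, so against a TLS-only endpoint it may report `timeout` or `closed` rather than `+PONG`. `redis_ping` is a hypothetical helper, not part of Fleet or redis-cli.

```python
import socket


def redis_ping(host: str, port: int = 6379, timeout: float = 3.0) -> str:
    """Send a plaintext PING and return the reply, or 'timeout' if the server stays silent."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")  # Redis inline command
            reply = sock.recv(128)     # the connect timeout also applies to this read
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
    if not reply:
        return "closed"  # peer closed the connection without answering
    return reply.decode(errors="replace").strip()


if __name__ == "__main__":
    print(redis_ping("127.0.0.1"))
```

A reply such as `+PONG` or an auth error shows the server speaks plaintext Redis on that port; anything else points at a protocol mismatch rather than a network problem.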
z
🎉
You're being redirected to
/setup
to create the initial user
I did not realize that AWS Redis had that limitation with auth. We will work on getting TLS support into 4.5.0.
💯 1
🙏🏽 1
p
ohhhh cool .... can I setup via cli as well?
z
Yep,
fleetctl setup
🚀 1
🎊 1
fleet 1
p
Copy code
/ $ fleetctl setup --email pvirani@twilio.com --name Prima --password *** --org-name Twilio
error creating Fleet API client handler: address must start with https:// for remote connections
I don't want to set up TLS on fleet server for now. The traffic from the internet will hit the load balancer on https but from load balancer -> fleetdm-webserver I want it to be on http only so I've set up my webserver on http://0.0.0.0:80 ... but I get this error at the time of setup ... what do I do?
z
Try this
Copy code
fleetctl config set --tls-skip-verify=true --address=http://localhost:80
and then try it again.
🎉 1
👍🏽 1
p
🎉
Copy code
/ $ curl http://localhost:80
<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">

    <meta name="robots" content="noindex">

    <link rel="stylesheet" type="text/css" href="/assets/bundle-10a9feb55abb44f6de13.css">
    <link rel="shortcut icon" href="/assets/favicon.ico">

    <title>Fleet for osquery</title>
    <script type="text/javascript">
        var urlPrefix = "";
    </script>
  </head>
  <body>
    <div id="app"></div>
    <script async defer src="/assets/bundle-f79228ceeede741c7007.js" onload="this.parentElement.removeChild(this)"></script>
    
    <script>document.addEventListener("touchstart", function() {},false);</script>
    
  </body>
</html>
z
Nice!
p
last piece of the puzzle (I think) ... html ain't rendering on the webpage 🤔
z
Can you open the browser devtools and see what the HTTP responses look like?
p
yeah was literally doing that. this is what the page source looks like. looking further now.
z
Ah sorry, the network tab?
Also any errors in the JS console?
p
yeah was just fetching you the screenshot lol ... I see a bunch of 400s
first thing I did was comparing the curl response on the webserver to the one on my local machine which is running successfully rn with fleetctl preview
that one shows no 400s ... the webserver one shows a bunch of 400s
z
Is there any response body in one of those 400s?
Actually can you paste me that URL? I can just look for myself
(and typing it out manually from your screenshot seems... painful)
p
Copy code
Request URL: https://fleet20210924001411691500000003-825359854.us-west-2.elb.amazonaws.com/assets/bundle-10a9feb55abb44f6de13.css
Request Method: GET
Status Code: 400 
Remote Address: 52.13.90.185:443
Referrer Policy: strict-origin-when-cross-origin
Looks like it's got to do with traffic between my loadbalancer and the app
Copy code
$ nslookup fleet20210924001411691500000003-825359854.us-west-2.elb.amazonaws.com
Server:		169.254.20.10
Address:	169.254.20.10:53

Non-authoritative answer:
*** Can't find fleet20210924001411691500000003-825359854.us-west-2.elb.amazonaws.com: No answer

Non-authoritative answer:
Name:	fleet20210924001411691500000003-825359854.us-west-2.elb.amazonaws.com
Address: 52.40.25.210
Name:	fleet20210924001411691500000003-825359854.us-west-2.elb.amazonaws.com
Address: 54.203.88.105
Name:	fleet20210924001411691500000003-825359854.us-west-2.elb.amazonaws.com
Address: 52.13.90.185
Address: 44.236.167.10
z
Ah yeah thank you
I can see the 400s
p
yeah I'm looking at the security groups again
z
Are there any errors in the Fleet server logs?
I suspect this 400 is from the load balancer
p
yeah exactly
z
Seems like any request to a path other than
/
results in a 400
I think it's probably not security group because the
/
request is definitely making it to the Fleet server
Is there something in the ingress for k8s that needs to be configured to allow certain paths? Again at the limit of my k8s knowledge.
p
yeah you know when we were on the setup stage, the LB url was also successfully getting redirected to /setup even ... so Im also a little confused there
looking at k8s stuff now
z
I just double-checked and there's nowhere that Fleet explicitly returns a 400. It's possible something in the underlying library code could be doing so.
p
I have a suspicion of what might be causing it. Imma flipping it now and then will report back if/when it works
👍 1
z
!!!
What was the missing piece?
p
TL;DR The Load Balancer was set up with a custom internal module and it was set to forward only
/
path to my webserver, and for every other path under it, it would return a 400. Fixing that fixed everything else. Long bumpy road but finally we here 🎊🎉🚀 Thanks for all the help Zach and Martavis! Definitely could not have done this without both of you ... can't thank you enough. Blogpost incoming soon to help the community set up a similar structure
👏 1
🍻 1
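The thread does not show the load balancer module itself, so the following is only a model of the behaviour described: a forwarding rule for the exact path `/` versus a wildcard rule. It uses glob-style matching, similar in spirit to path-pattern conditions on an ALB, and assumes that unmatched requests get a fixed 400. `route` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def route(path: str, patterns: list) -> int:
    """Return 200 if any forwarding pattern matches the path, else a fixed 400."""
    return 200 if any(fnmatchcase(path, pattern) for pattern in patterns) else 400


paths = ["/", "/assets/bundle-10a9feb55abb44f6de13.css"]

print([route(p, ["/"]) for p in paths])   # exact-root rule: only "/" is forwarded
print([route(p, ["/*"]) for p in paths])  # wildcard rule: everything under the root is forwarded
```

This matches what the browser showed: the HTML shell at `/` loaded, while the CSS and JS bundles it references came back as 400s.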
z
Thank you for your patience and persistence! Blog post will be very much appreciated 😄 let us know when it's up and we will help promote.
💯 1
p
I'm so excited honestly! I was so flustered there in between for a while that I was literally forgetting my debugging basics even .... but now we here!