#fleet

Rod Christiansen
06/17/2022, 4:04 PM
Been following the README but can’t quite get there

zwass
06/17/2022, 4:06 PM
Where are you stuck? Any error messages?

Rod Christiansen
06/17/2022, 4:11 PM
Hey Zach, thanks for jumping in
4:11 PM
Let me get you a log
4:21 PM
So when I run
terraform show
I get:
4:22 PM
If I re-run
terraform apply -var-file=prod.tfvars
I get a lot of
4:23 PM
│ Error: failed creating IAM Role (fleetdm-role): EntityAlreadyExists: Role with name fleetdm-role already exists.
4:23 PM
It’s trying to re-create some resources, is there a way to skip it?
4:27 PM
Still happens after running a
terraform destroy
as well
4:30 PM
Essentially, I’m not sure if I have everything up
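(Note: when a resource already exists in AWS but isn't tracked in Terraform state, one option is to import it instead of recreating it. A rough sketch for the role named in the error above, assuming the resource address aws_iam_role.main from the Fleet config referenced later in the thread:)
# adopt the existing role into state so Terraform stops trying to create it
terraform import -var-file=prod.tfvars aws_iam_role.main fleetdm-role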

zwass
06/17/2022, 4:30 PM
Hmm, sounds like you first ran it without the vars file, then tried again with the vars file?
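(A quick way to see what Terraform is actually tracking versus what it still wants to create, assuming the same working directory:)
terraform state list                    # resources currently recorded in state
terraform plan -var-file=prod.tfvars    # what the next apply would create or change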

Rod Christiansen
06/17/2022, 4:31 PM
Mmmm that’s a possibility? I might’ve in one of a few attempts

zwass
06/17/2022, 4:32 PM
If you do the
terraform destroy
, does it seem to destroy everything?

Rod Christiansen
06/17/2022, 4:32 PM
So it created resources not attached to my
vars
and now it can’t interact with them?
4:32 PM
Sorry kind of new to Terraform

zwass
06/17/2022, 4:32 PM
Yeah no prob. I'm not our best Terraform expert, but I have some experience and I'll try to help.
4:33 PM
Mind sharing your vars file with me via DM?

Rod Christiansen
06/17/2022, 4:33 PM
Sure

zwass
06/17/2022, 4:40 PM
Let's see if we can get back to a clean slate... What's the output of your
terraform destroy
?

Rod Christiansen
06/17/2022, 4:40 PM
Let me run that
4:45 PM
Destroy goes through normally
4:45 PM
Destroy complete! Resources: 53 destroyed.

zwass
06/17/2022, 4:46 PM
Okay let's retry
terraform apply -var-file=prod.tfvars

Rod Christiansen
06/17/2022, 4:46 PM
Okay 👍
4:47 PM
Do you know if these deprecation warnings matter?

zwass
06/17/2022, 4:48 PM
I think you'll be okay with those warnings, but I'm going to file an issue so that we can quiet them on our end.

Rod Christiansen
06/17/2022, 4:48 PM
K thanks, it's running
4:48 PM
usually takes 15 min
4:55 PM
Yea same results
4:56 PM
aws_route53_record.dogfood_fleetdm_com_validation["fleet.ecuad.ca"]: Creating...
aws_route53_record.dogfood_fleetdm_com_validation["fleet.ecuad.ca"]: Still creating... [10s elapsed]
aws_route53_record.dogfood_fleetdm_com_validation["fleet.ecuad.ca"]: Still creating... [20s elapsed]
aws_route53_record.dogfood_fleetdm_com_validation["fleet.ecuad.ca"]: Still creating... [30s elapsed]
aws_route53_record.dogfood_fleetdm_com_validation["fleet.ecuad.ca"]: Still creating... [40s elapsed]
aws_route53_record.dogfood_fleetdm_com_validation["fleet.ecuad.ca"]: Still creating... [50s elapsed]
aws_route53_record.dogfood_fleetdm_com_validation["fleet.ecuad.ca"]: Creation complete after 52s [id=Z0153789AGKAV73DDKKN__3b82f3c76c9877eb0905c1f97d84050c.fleet.ecuad.ca._CNAME]
aws_acm_certificate_validation.dogfood_fleetdm_com: Creating...
aws_acm_certificate_validation.dogfood_fleetdm_com: Creation complete after 0s [id=2022-06-17 16:48:19.658 +0000 UTC]
╷
│ Warning: Argument is deprecated
│ 
│   with aws_s3_bucket.osquery-results,
│   on firehose.tf line 7, in resource "aws_s3_bucket" "osquery-results":
│    7: resource "aws_s3_bucket" "osquery-results" { #tfsec:ignore:aws-s3-encryption-customer-key:exp:2022-07-01  #tfsec:ignore:aws-s3-enable-versioning #tfsec:ignore:aws-s3-enable-bucket-logging:exp:2022-06-15
│ 
│ Use the aws_s3_bucket_lifecycle_configuration resource instead
│ 
│ (and 8 more similar warnings elsewhere)
╵
╷
│ Error: failed creating IAM Role (fleetdm-role): EntityAlreadyExists: Role with name fleetdm-role already exists.
│ 	status code: 409, request id: d2dcb2ad-bfa4-49e9-8362-6a5fb48d5fdd
│ 
│   with aws_iam_role.main,
│   on ecs-iam.tf line 74, in resource "aws_iam_role" "main":
│   74: resource "aws_iam_role" "main" {
│ 
╵
╷
│ Error: error creating application Load Balancer: DuplicateLoadBalancerName: A load balancer with the same name 'fleetdm' exists, but with different settings
│ 	status code: 400, request id: e49a8a95-68ef-4ec6-9b04-5860b251dab2
│ 
│   with aws_alb.main,
│   on ecs.tf line 14, in resource "aws_alb" "main":
│   14: resource "aws_alb" "main" {
│ 
╵
╷
│ Error: Creating CloudWatch Log Group failed: ResourceAlreadyExistsException: The specified log group already exists:  The CloudWatch Log Group 'fleetdm' already exists.
│ 
│   with aws_cloudwatch_log_group.backend,
│   on ecs.tf line 114, in resource "aws_cloudwatch_log_group" "backend":
│  114: resource "aws_cloudwatch_log_group" "backend" { #tfsec:ignore:aws-cloudwatch-log-group-customer-key:exp:2022-07-01
│ 
╵
╷
│ Error: error creating S3 Bucket (ca-ecuad-queryops-fleet-osquery-results-archive-dev): BucketAlreadyOwnedByYou: Your previous request to create the named bucket succeeded and you already own it.
│ 	status code: 409, request id: 8QKQAWKMKFPG8V3T, host id: cGLTOq4ot2jEU2WRo6W7KOFHMqUUGEDol93rope13+e2btUrMvzII5SEHItCYT+99PGKR53PQcU=
│ 
│   with aws_s3_bucket.osquery-results,
│   on firehose.tf line 7, in resource "aws_s3_bucket" "osquery-results":
│    7: resource "aws_s3_bucket" "osquery-results" { #tfsec:ignore:aws-s3-encryption-customer-key:exp:2022-07-01  #tfsec:ignore:aws-s3-enable-versioning #tfsec:ignore:aws-s3-enable-bucket-logging:exp:2022-06-15
│ 
╵
╷
│ Error: error creating S3 Bucket (ca-ecuad-queryops-fleet-osquery-status-archive-dev): BucketAlreadyOwnedByYou: Your previous request to create the named bucket succeeded and you already own it.
│ 	status code: 409, request id: 8QKS5J9XHNHQ6939, host id: SRTEVZrLHrsc9qt4Sii5tNnZtahUO60eEhaDZdW2KcaYjFKRqMWOdLX7trgWHh4kv8Guk7hmkpY=
│ 
│   with aws_s3_bucket.osquery-status,
│   on firehose.tf line 41, in resource "aws_s3_bucket" "osquery-status":
│   41: resource "aws_s3_bucket" "osquery-status" { #tfsec:ignore:aws-s3-encryption-customer-key:exp:2022-07-01 #tfsec:ignore:aws-s3-enable-versioning #tfsec:ignore:aws-s3-enable-bucket-logging:exp:2022-06-15
│ 
╵
╷
│ Error: error creating Secrets Manager Secret: ResourceExistsException: The operation failed because the secret /fleet/database/password/master already exists.
│ 
│   with aws_secretsmanager_secret.database_password_secret,
│   on rds.tf line 7, in resource "aws_secretsmanager_secret" "database_password_secret":
│    7: resource "aws_secretsmanager_secret" "database_password_secret" { #tfsec:ignore:aws-ssm-secret-use-customer-key:exp:2022-07-01
│ 
╵
╷
│ Error: Error creating DB Parameter Group: DBParameterGroupAlreadyExists: Parameter group fleetdm-aurora-db-mysql-parameter-group already exists
│ 	status code: 400, request id: 58d16106-fe62-4906-b133-cccfabdb4d42
│ 
│   with aws_db_parameter_group.example_mysql,
│   on rds.tf line 107, in resource "aws_db_parameter_group" "example_mysql":
│  107: resource "aws_db_parameter_group" "example_mysql" {
│ 
╵
╷
│ Error: Error creating DB Cluster Parameter Group: DBParameterGroupAlreadyExists: Parameter group fleetdm-aurora-mysql-cluster-parameter-group already exists
│ 	status code: 400, request id: 96730c78-873b-4e42-bfca-3a34895dbcdd
│ 
│   with aws_rds_cluster_parameter_group.example_mysql,
│   on rds.tf line 113, in resource "aws_rds_cluster_parameter_group" "example_mysql":
│  113: resource "aws_rds_cluster_parameter_group" "example_mysql" {
│ 
╵
╷
│ Error: error creating S3 Bucket (osquery-carve-default): BucketAlreadyExists: The requested bucket name is not available. The bucket namespace is shared by all users of the system. Please select a different name and try again.
│ 	status code: 409, request id: BYZ1KZ1KNRCAJ0N6, host id: 5gY0n/SnlpvTsDu0zsCishhV3c4GUrirj3knAb7cTmc8MNCEcj50oKHkiIqkNAhv7bC+rcKeFdE=
│ 
│   with aws_s3_bucket.osquery-carve,
│   on s3.tf line 9, in resource "aws_s3_bucket" "osquery-carve":
│    9: resource "aws_s3_bucket" "osquery-carve" { #tfsec:ignore:aws-s3-enable-versioning #tfsec:ignore:aws-s3-encryption-customer-key:exp:2022-07-01 #tfsec:ignore:aws-s3-enable-bucket-logging:exp:2022-06-15
│ 
╵
╷
│ Error: Error creating DB Subnet Group: DBSubnetGroupAlreadyExists: The DB subnet group 'fleetdm-mysql-iam' already exists.
│ 	status code: 400, request id: 4ec6cead-a4a7-45a2-a46b-50af416b34fb
│ 
│   with module.aurora_mysql.aws_db_subnet_group.this[0],
│   on .terraform/modules/aurora_mysql/main.tf line 38, in resource "aws_db_subnet_group" "this":
│   38: resource "aws_db_subnet_group" "this" {
│ 
╵
╷
│ Error: Error creating DB Subnet Group: DBSubnetGroupAlreadyExists: The DB subnet group 'fleet-vpc' already exists.
│ 	status code: 400, request id: 6785406d-6e31-4016-b3ba-61df708bb2ec
│ 
│   with module.vpc.aws_db_subnet_group.database[0],
│   on .terraform/modules/vpc/main.tf line 458, in resource "aws_db_subnet_group" "database":
│  458: resource "aws_db_subnet_group" "database" {
│ 
╵
╷
│ Error: creating ElastiCache Subnet Group (fleet-vpc): CacheSubnetGroupAlreadyExists: Cache subnet group fleet-vpc already exists.
│ 	status code: 400, request id: 1d124b61-29b4-46e1-a372-fc2e5fcb4b77
│ 
│   with module.vpc.aws_elasticache_subnet_group.elasticache[0],
│   on .terraform/modules/vpc/main.tf line 542, in resource "aws_elasticache_subnet_group" "elasticache":
│  542: resource "aws_elasticache_subnet_group" "elasticache" {
│ 
╵
╷
│ Error: Error creating EIP: AddressLimitExceeded: The maximum number of addresses has been reached.
│ 	status code: 400, request id: a840976c-a6a1-45fe-8f1e-c155a458d6b1
│ 
│   with module.vpc.aws_eip.nat[0],
│   on .terraform/modules/vpc/main.tf line 1001, in resource "aws_eip" "nat":
│ 1001: resource "aws_eip" "nat" {
│ 
╵
Releasing state lock. This may take a few moments...
rod@RodChristiansen aws %
4:57 PM
I prob need to manually delete all these resources that already exist
4:58 PM
Wonder if just changing some variables would get a cleaner run
4:58 PM
New prefix
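(For the manual-cleanup route, most of the conflicting resources in the errors above can be removed with the AWS CLI. A rough sketch using names taken from the error messages; an IAM role's attached policies and instance profiles have to be detached first, and the EIP AddressLimitExceeded error usually means unreleased Elastic IPs from earlier runs are using up the per-region quota:)
aws iam delete-role --role-name fleetdm-role
aws logs delete-log-group --log-group-name fleetdm
aws secretsmanager delete-secret --secret-id /fleet/database/password/master --force-delete-without-recovery   # otherwise deletion is only scheduled and the name stays taken
aws rds delete-db-parameter-group --db-parameter-group-name fleetdm-aurora-db-mysql-parameter-group
aws s3 rb s3://ca-ecuad-queryops-fleet-osquery-results-archive-dev --force   # --force empties the bucket first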

zwass
06/17/2022, 5:15 PM
Sounds like @Benjamin Edwards (our expert who wrote the configs) will be around soon. Let's wait for him to come and advise 🙂

Rod Christiansen
06/17/2022, 5:16 PM
Oh amazing!

Benjamin Edwards
06/17/2022, 5:27 PM
Hey Rod! Sorry you are having issues. I'm just waiting for my wife to get home and take over kiddo duty. Happy to help after that. I'll try to catch up in the meantime.

Rod Christiansen
06/17/2022, 5:42 PM
All good. Thanks to both of you

Benjamin Edwards
06/17/2022, 7:23 PM
@Rod Christiansen I think Zach had the right idea: it seems like Terraform might have run with the default values, which will work for some resources but will conflict with others (like S3, since bucket names are globally unique across all AWS accounts). Another thing to consider, especially if you are new to Terraform, is to omit the remote state steps in the guide. Remote state is great for bigger projects, or for projects where multiple users (or CI/CD systems) are altering infrastructure at the same time. To start with, it might be easier to just get rid of it, something I should have considered in the guide but unfortunately didn't. So first, in main.tf, edit the terraform block to look like:
terraform {
#  backend "s3" {
#    bucket         = "fleet-terraform-remote-state"
#    region         = "us-east-2"
#    key            = "fleet"
#    dynamodb_table = "fleet-terraform-state-lock"
#  }
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "3.63.0"
    }

    tls = {
      source = "hashicorp/tls"
      version = "3.3.0"
    }
  }
}
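(After commenting out the backend block you may also need to re-run terraform init so Terraform switches over to local state, something like:)
terraform init -migrate-state   # or terraform init -reconfigure to start from a fresh local state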
Then make sure you follow this step of the guide: We’ll also need a tfvars file to make some environment-specific variable overrides. Create a file in the same directory named prod.tfvars and paste the contents (note the bucket names will have to be unique for your environment):
fleet_backend_cpu         = 1024
fleet_backend_mem         = 4096 //software inventory requires 4GB
redis_instance            = "cache.t3.micro"
domain_fleetdm            = "fleet.queryops.com" // YOUR DOMAIN HERE
osquery_results_s3_bucket = "foo-results-bucket" // UNIQUE BUCKET NAME
osquery_status_s3_bucket  = "bar-status-bucket" // UNIQUE BUCKET NAME
file_carve_bucket         = "qux-file-carve-bucket" // UNIQUE BUCKET NAME
If you run into trouble, maybe we can screen share on zoom and get it sorted out?
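(With prod.tfvars in place, a dry run is a cheap way to confirm the overrides are being picked up before anything gets created:)
terraform plan -var-file=prod.tfvars   # review the planned resources, then apply with the same flag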

Rod Christiansen
06/17/2022, 7:29 PM
Hi @Benjamin Edwards thanks so much, this helps. Thanks for offering to jump on a screen share. I’m going to go through my AWS and see if I can find and delete all the resources listed with errors
7:31 PM
I’ll pipe back in here with results
7:33 PM
One other quick question you might know, @Benjamin Edwards — do the NS records get re-created on every run? I’ll end up getting new NS records from the AWS hosted zone, hey? I got mine set by our network person, who is great but slow to respond (ha), and I try to avoid asking him for help — I prob won’t be able to use the existing ones already set on our domain side?
7:34 PM
Ah sorry these aren’t really Fleet questions but more AWS Qs 😬

Benjamin Edwards
06/17/2022, 8:01 PM
If you destroy them, they'll be recreated. Otherwise TF knows they already exist.
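(If you ever need to read the current NS records without another apply, the hosted zone's delegation set is visible from the CLI; <zone-id> below is a placeholder:)
aws route53 list-hosted-zones
aws route53 get-hosted-zone --id <zone-id>   # DelegationSet.NameServers are the values to hand to your registrar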

Rod Christiansen
06/20/2022, 5:58 AM
Got it now to the point that it doesn’t complain about any existing resources, but I’m still getting a single error at the end of the build. Looks like the TCP call is getting refused and it’s hanging up the phone
╷
│ Error: error creating ElastiCache Replication Group (fleetdm-redis): waiting for completion: RequestError: send request failed
│ caused by: Post "https://elasticache.ca-central-1.amazonaws.com/": read tcp 192.168.1.137:55529->52.94.100.101:443: read: connection reset by peer
│ 
│   with aws_elasticache_replication_group.default,
│   on redis.tf line 13, in resource "aws_elasticache_replication_group" "default":
│   13: resource "aws_elasticache_replication_group" "default" {
│ 
╵
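(A connection reset like this is usually a transient network/API hiccup; everything created before it is already in state, so re-running the same apply generally picks up where it left off:)
terraform apply -var-file=prod.tfvars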
5:58 AM
11:43 PM
Hey! I got a fully clean run! \o/
2:32 AM
Sorry, me calling again ☎️
2:33 AM
Any quick idea of why the cluster tasks fail? Is it reaching a limit on docker.com?

zwass
06/21/2022, 3:57 AM
Ah, yes, this is an annoying thing with Docker's rate limiting. Now that the migration completed, do the other tasks successfully start up?

Rod Christiansen
06/21/2022, 4:02 AM
Doesn't look like it
4:02 AM
I was just surprised to see Docker in the mix
4:02 AM
Do I need to get an account?

zwass
06/21/2022, 4:03 AM
No you don't need an account -- Docker is just where the container images are hosted.
4:04 AM
Can you set the target tasks on the service down to 0? Then try running the migrate again?

Rod Christiansen
06/21/2022, 4:08 AM
Will do. Thanks.

zwass
06/21/2022, 4:10 AM
You might still get rate limited for a while -- sorry about that, Docker Hub can really be a pain.

Rod Christiansen
06/21/2022, 4:16 AM
Got it. As long as it's not a config error on my end. Just try until it works (not too often 😅)

Benjamin Edwards
06/21/2022, 2:49 PM
Zach is right. The migration task needs to execute. Scaling the Fleet service down to zero for 15-30 minutes should clear your rate limit. While the service is scaled down, attempt the migration task, which is the ecs run-task command. Once that runs successfully, you can scale the service back up.
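(Roughly, with the AWS CLI; the cluster and service names below are placeholders, so use whatever the Terraform run actually created, and the migration itself is the aws ecs run-task command from the guide:)
# scale the Fleet service to zero while the Docker Hub rate limit clears
aws ecs update-service --cluster <fleet-cluster> --service <fleet-service> --desired-count 0
# ...after 15-30 minutes, run the guide's migration task (aws ecs run-task ...), then scale back up
aws ecs update-service --cluster <fleet-cluster> --service <fleet-service> --desired-count 1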

Rod Christiansen
06/21/2022, 6:27 PM
Thank you so much to both of you for bearing with my basic questions! We have lift off!

zwass
06/21/2022, 6:28 PM
Woooo hooo!

Rod Christiansen
06/21/2022, 6:29 PM
Very exciting. Just gotta get my network guy to set those NameServers now ha

Benjamin Edwards
06/21/2022, 6:29 PM
Rod, feel free to hit me up with feedback about the process. Happy to make updates to the blog or modify the README. Your perspective, as someone relatively new to TF, was something I was clearly missing when I drafted the first version.

Rod Christiansen
06/21/2022, 7:12 PM
Thanks Benjamin, looking back at your guide now, it does follow the steps pretty closely. I did write an internal doc documenting the steps I took and it's fairly similar. See if anything here would help:
5:59 PM
Couple things that came to mind that you could add to the blog post:
• warning about not running multiple times like I did; I set up from scratch a few times ‘to start fresh’ but hadn’t run a terraform apply -destroy
• warning about unique naming for the buckets/dynamo since they are global
• warning on the migration step about the Docker limit like I hit and how to retry
• the hashicorp/aws version needs to match
•