#fleet

Rod Christiansen
06/17/2022, 4:04 PM
Been following the README but can’t quite get there

zwass
06/17/2022, 4:06 PM
Where are you stuck? Any error messages?

Rod Christiansen
06/17/2022, 4:11 PM
Hey Zach, thanks for jumping in
4:11 PM
Let me get you a log
4:21 PM
So when I run
terraform show
I get:
4:22 PM
If I re-run
terraform apply -var-file=prod.tfvars
I get a lot of
4:23 PM
│ Error: failed creating IAM Role (fleetdm-role): EntityAlreadyExists: Role with name fleetdm-role already exists.
4:23 PM
It’s trying to re-create some resources, is there a way to skip it?
4:27 PM
Still happens after running a
terraform destroy
as well
4:30 PM
Essentially, I’m not sure if I have everything up
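(Note: when a resource already exists in AWS but isn't tracked in Terraform state, one option is to import it instead of recreating it. A rough sketch for the role named in the error above, assuming the resource address aws_iam_role.main from the Fleet config referenced later in the thread:)
# adopt the existing role into state so Terraform stops trying to create it
terraform import -var-file=prod.tfvars aws_iam_role.main fleetdm-role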

zwass
06/17/2022, 4:30 PM
Hmm, sounds like you first ran it without the vars file, then tried again with the vars file?
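(A quick way to see what Terraform is actually tracking versus what it still wants to create, assuming the same working directory:)
terraform state list                    # resources currently recorded in state
terraform plan -var-file=prod.tfvars    # what the next apply would create or change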

Rod Christiansen
06/17/2022, 4:31 PM
Mmmm that’s a possibility? I might’ve in one of a few attempts

zwass
06/17/2022, 4:32 PM
If you do the
terraform destroy
, does it seem to destroy everything?

Rod Christiansen
06/17/2022, 4:32 PM
So it created resources not attached to my
vars
and now it can’t interact with them?
4:32 PM
Sorry kind of new to Terraform

zwass
06/17/2022, 4:32 PM
Yeah no prob. I'm not our best Terraform expert, but I have some experience and I'll try to help.
4:33 PM
Mind sharing your vars file with me via DM?

Rod Christiansen
06/17/2022, 4:33 PM
Sure

zwass
06/17/2022, 4:40 PM
Let's see if we can get back to a clean slate... What's the output of your
terraform destroy
?

Rod Christiansen
06/17/2022, 4:40 PM
Let me run that
4:45 PM
Destroy goes through normally
4:45 PM
Destroy complete! Resources: 53 destroyed.

zwass
06/17/2022, 4:46 PM
Okay let's retry
terraform apply -var-file=prod.tfvars

Rod Christiansen
06/17/2022, 4:46 PM
Okay 👍
4:47 PM
Do you know if these deprecation warnings matter?

zwass
06/17/2022, 4:48 PM
I think you'll be okay with those warnings, but I'm going to file an issue so that we can quiet them on our end.

Rod Christiansen
06/17/2022, 4:48 PM
K thanks, it's running
4:48 PM
usually takes 15 min
4:55 PM
Yea same results
4:56 PM
aws_route53_record.dogfood_fleetdm_com_validation["fleet.ecuad.ca"]: Creating...
aws_route53_record.dogfood_fleetdm_com_validation["fleet.ecuad.ca"]: Still creating... [10s elapsed]
aws_route53_record.dogfood_fleetdm_com_validation["fleet.ecuad.ca"]: Still creating... [20s elapsed]
aws_route53_record.dogfood_fleetdm_com_validation["fleet.ecuad.ca"]: Still creating... [30s elapsed]
aws_route53_record.dogfood_fleetdm_com_validation["fleet.ecuad.ca"]: Still creating... [40s elapsed]
aws_route53_record.dogfood_fleetdm_com_validation["fleet.ecuad.ca"]: Still creating... [50s elapsed]
aws_route53_record.dogfood_fleetdm_com_validation["fleet.ecuad.ca"]: Creation complete after 52s [id=Z0153789AGKAV73DDKKN__3b82f3c76c9877eb0905c1f97d84050c.fleet.ecuad.ca._CNAME]
aws_acm_certificate_validation.dogfood_fleetdm_com: Creating...
aws_acm_certificate_validation.dogfood_fleetdm_com: Creation complete after 0s [id=2022-06-17 16:48:19.658 +0000 UTC]
╷
│ Warning: Argument is deprecated
│ 
│   with aws_s3_bucket.osquery-results,
│   on firehose.tf line 7, in resource "aws_s3_bucket" "osquery-results":
│    7: resource "aws_s3_bucket" "osquery-results" { #tfsec:ignore:aws-s3-encryption-customer-key:exp:2022-07-01  #tfsec:ignore:aws-s3-enable-versioning #tfsec:ignore:aws-s3-enable-bucket-logging:exp:2022-06-15
│ 
│ Use the aws_s3_bucket_lifecycle_configuration resource instead
│ 
│ (and 8 more similar warnings elsewhere)
╵
╷
│ Error: failed creating IAM Role (fleetdm-role): EntityAlreadyExists: Role with name fleetdm-role already exists.
│ 	status code: 409, request id: d2dcb2ad-bfa4-49e9-8362-6a5fb48d5fdd
│ 
│   with aws_iam_role.main,
│   on ecs-iam.tf line 74, in resource "aws_iam_role" "main":
│   74: resource "aws_iam_role" "main" {
│ 
╵
╷
│ Error: error creating application Load Balancer: DuplicateLoadBalancerName: A load balancer with the same name 'fleetdm' exists, but with different settings
│ 	status code: 400, request id: e49a8a95-68ef-4ec6-9b04-5860b251dab2
│ 
│   with aws_alb.main,
│   on ecs.tf line 14, in resource "aws_alb" "main":
│   14: resource "aws_alb" "main" {
│ 
╵
╷
│ Error: Creating CloudWatch Log Group failed: ResourceAlreadyExistsException: The specified log group already exists:  The CloudWatch Log Group 'fleetdm' already exists.
│ 
│   with aws_cloudwatch_log_group.backend,
│   on ecs.tf line 114, in resource "aws_cloudwatch_log_group" "backend":
│  114: resource "aws_cloudwatch_log_group" "backend" { #tfsec:ignore:aws-cloudwatch-log-group-customer-key:exp:2022-07-01
│ 
╵
╷
│ Error: error creating S3 Bucket (ca-ecuad-queryops-fleet-osquery-results-archive-dev): BucketAlreadyOwnedByYou: Your previous request to create the named bucket succeeded and you already own it.
│ 	status code: 409, request id: 8QKQAWKMKFPG8V3T, host id: cGLTOq4ot2jEU2WRo6W7KOFHMqUUGEDol93rope13+e2btUrMvzII5SEHItCYT+99PGKR53PQcU=
│ 
│   with aws_s3_bucket.osquery-results,
│   on firehose.tf line 7, in resource "aws_s3_bucket" "osquery-results":
│    7: resource "aws_s3_bucket" "osquery-results" { #tfsec:ignore:aws-s3-encryption-customer-key:exp:2022-07-01  #tfsec:ignore:aws-s3-enable-versioning #tfsec:ignore:aws-s3-enable-bucket-logging:exp:2022-06-15
│ 
╵
╷
│ Error: error creating S3 Bucket (ca-ecuad-queryops-fleet-osquery-status-archive-dev): BucketAlreadyOwnedByYou: Your previous request to create the named bucket succeeded and you already own it.
│ 	status code: 409, request id: 8QKS5J9XHNHQ6939, host id: SRTEVZrLHrsc9qt4Sii5tNnZtahUO60eEhaDZdW2KcaYjFKRqMWOdLX7trgWHh4kv8Guk7hmkpY=
│ 
│   with aws_s3_bucket.osquery-status,
│   on firehose.tf line 41, in resource "aws_s3_bucket" "osquery-status":
│   41: resource "aws_s3_bucket" "osquery-status" { #tfsec:ignore:aws-s3-encryption-customer-key:exp:2022-07-01 #tfsec:ignore:aws-s3-enable-versioning #tfsec:ignore:aws-s3-enable-bucket-logging:exp:2022-06-15
│ 
╵
╷
│ Error: error creating Secrets Manager Secret: ResourceExistsException: The operation failed because the secret /fleet/database/password/master already exists.
│ 
│   with aws_secretsmanager_secret.database_password_secret,
│   on rds.tf line 7, in resource "aws_secretsmanager_secret" "database_password_secret":
│    7: resource "aws_secretsmanager_secret" "database_password_secret" { #tfsec:ignore:aws-ssm-secret-use-customer-key:exp:2022-07-01
│ 
╵
╷
│ Error: Error creating DB Parameter Group: DBParameterGroupAlreadyExists: Parameter group fleetdm-aurora-db-mysql-parameter-group already exists
│ 	status code: 400, request id: 58d16106-fe62-4906-b133-cccfabdb4d42
│ 
│   with aws_db_parameter_group.example_mysql,
│   on rds.tf line 107, in resource "aws_db_parameter_group" "example_mysql":
│  107: resource "aws_db_parameter_group" "example_mysql" {
│ 
╵
╷
│ Error: Error creating DB Cluster Parameter Group: DBParameterGroupAlreadyExists: Parameter group fleetdm-aurora-mysql-cluster-parameter-group already exists
│ 	status code: 400, request id: 96730c78-873b-4e42-bfca-3a34895dbcdd
│ 
│   with aws_rds_cluster_parameter_group.example_mysql,
│   on rds.tf line 113, in resource "aws_rds_cluster_parameter_group" "example_mysql":
│  113: resource "aws_rds_cluster_parameter_group" "example_mysql" {
│ 
╵
╷
│ Error: error creating S3 Bucket (osquery-carve-default): BucketAlreadyExists: The requested bucket name is not available. The bucket namespace is shared by all users of the system. Please select a different name and try again.
│ 	status code: 409, request id: BYZ1KZ1KNRCAJ0N6, host id: 5gY0n/SnlpvTsDu0zsCishhV3c4GUrirj3knAb7cTmc8MNCEcj50oKHkiIqkNAhv7bC+rcKeFdE=
│ 
│   with aws_s3_bucket.osquery-carve,
│   on s3.tf line 9, in resource "aws_s3_bucket" "osquery-carve":
│    9: resource "aws_s3_bucket" "osquery-carve" { #tfsec:ignore:aws-s3-enable-versioning #tfsec:ignore:aws-s3-encryption-customer-key:exp:2022-07-01 #tfsec:ignore:aws-s3-enable-bucket-logging:exp:2022-06-15
│ 
╵
╷
│ Error: Error creating DB Subnet Group: DBSubnetGroupAlreadyExists: The DB subnet group 'fleetdm-mysql-iam' already exists.
│ 	status code: 400, request id: 4ec6cead-a4a7-45a2-a46b-50af416b34fb
│ 
│   with module.aurora_mysql.aws_db_subnet_group.this[0],
│   on .terraform/modules/aurora_mysql/main.tf line 38, in resource "aws_db_subnet_group" "this":
│   38: resource "aws_db_subnet_group" "this" {
│ 
╵
╷
│ Error: Error creating DB Subnet Group: DBSubnetGroupAlreadyExists: The DB subnet group 'fleet-vpc' already exists.
│ 	status code: 400, request id: 6785406d-6e31-4016-b3ba-61df708bb2ec
│ 
│   with module.vpc.aws_db_subnet_group.database[0],
│   on .terraform/modules/vpc/main.tf line 458, in resource "aws_db_subnet_group" "database":
│  458: resource "aws_db_subnet_group" "database" {
│ 
╵
╷
│ Error: creating ElastiCache Subnet Group (fleet-vpc): CacheSubnetGroupAlreadyExists: Cache subnet group fleet-vpc already exists.
│ 	status code: 400, request id: 1d124b61-29b4-46e1-a372-fc2e5fcb4b77
│ 
│   with module.vpc.aws_elasticache_subnet_group.elasticache[0],
│   on .terraform/modules/vpc/main.tf line 542, in resource "aws_elasticache_subnet_group" "elasticache":
│  542: resource "aws_elasticache_subnet_group" "elasticache" {
│ 
╵
╷
│ Error: Error creating EIP: AddressLimitExceeded: The maximum number of addresses has been reached.
│ 	status code: 400, request id: a840976c-a6a1-45fe-8f1e-c155a458d6b1
│ 
│   with module.vpc.aws_eip.nat[0],
│   on .terraform/modules/vpc/main.tf line 1001, in resource "aws_eip" "nat":
│ 1001: resource "aws_eip" "nat" {
│ 
╵
Releasing state lock. This may take a few moments...
rod@RodChristiansen aws %
4:57 PM
I prob need to manually delete all these resources that already exist
4:58 PM
Wonder if just changing some variables would get a cleaner run
4:58 PM
New prefix
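(For the manual-cleanup route, most of the conflicting resources in the errors above can be removed with the AWS CLI. A rough sketch using names taken from the error messages; an IAM role's attached policies and instance profiles have to be detached first, and the EIP AddressLimitExceeded error usually means unreleased Elastic IPs from earlier runs are using up the per-region quota:)
aws iam delete-role --role-name fleetdm-role
aws logs delete-log-group --log-group-name fleetdm
aws secretsmanager delete-secret --secret-id /fleet/database/password/master --force-delete-without-recovery   # otherwise deletion is only scheduled and the name stays taken
aws rds delete-db-parameter-group --db-parameter-group-name fleetdm-aurora-db-mysql-parameter-group
aws s3 rb s3://ca-ecuad-queryops-fleet-osquery-results-archive-dev --force   # --force empties the bucket first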

zwass
06/17/2022, 5:15 PM
Sounds like @Benjamin Edwards (our expert who wrote the configs) will be around soon. Let's wait for him to come and advise 🙂

Rod Christiansen
06/17/2022, 5:16 PM
Oh amazing!

Benjamin Edwards
06/17/2022, 5:27 PM
Hey Rod! Sorry you are having issues. I'm just waiting for my wife to get home and take over kiddo duty. Happy to help after that. I'll try to catch up in the meantime.

Rod Christiansen
06/17/2022, 5:42 PM
All good. Thanks to both of you

Benjamin Edwards
06/17/2022, 7:23 PM
@Rod Christiansen I think Zach had the right idea: it seems like Terraform might have run with the default values, which will work for some resources but will conflict with others (like S3, since bucket names are globally unique across all AWS accounts). Another thing to consider, especially if you are new to Terraform, is to omit the remote state steps in the guide. Remote state is great for bigger projects, or for projects where multiple users (or CI/CD systems) are altering infrastructure at the same time. To start with, it might be easier to just get rid of it, something I should have considered in the guide but unfortunately didn't. So first, in main.tf, edit the terraform block to look like:
terraform {
#  backend "s3" {
#    bucket         = "fleet-terraform-remote-state"
#    region         = "us-east-2"
#    key            = "fleet"
#    dynamodb_table = "fleet-terraform-state-lock"
#  }
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "3.63.0"
    }

    tls = {
      source = "hashicorp/tls"
      version = "3.3.0"
    }
  }
}
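(After commenting out the backend block you may also need to re-run terraform init so Terraform switches over to local state, something like:)
terraform init -migrate-state   # or terraform init -reconfigure to start from a fresh local state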
Then make sure you follow this step of the guide: We’ll also need a tfvars file to make some environment-specific variable overrides. Create a file in the same directory named prod.tfvars and paste the contents (note the bucket names will have to be unique for your environment):
fleet_backend_cpu         = 1024
fleet_backend_mem         = 4096 //software inventory requires 4GB
redis_instance            = "cache.t3.micro"
domain_fleetdm            = "fleet.queryops.com" // YOUR DOMAIN HERE
osquery_results_s3_bucket = "foo-results-bucket" // UNIQUE BUCKET NAME
osquery_status_s3_bucket  = "bar-status-bucket" // UNIQUE BUCKET NAME
file_carve_bucket         = "qux-file-carve-bucket" // UNIQUE BUCKET NAME
If you run into trouble, maybe we can screen share on zoom and get it sorted out?
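(With prod.tfvars in place, a dry run is a cheap way to confirm the overrides are being picked up before anything gets created:)
terraform plan -var-file=prod.tfvars   # review the planned resources, then apply with the same flag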

Rod Christiansen
06/17/2022, 7:29 PM
Hi @Benjamin Edwards thanks so much, this helps. Thanks for offering to jump on a screen share. I’m going to go through my AWS and see if I can find and delete all the resources listed with errors
7:31 PM
I’ll pipe back in here with results
7:33 PM
One other quick question you might know, @Benjamin Edwards — do the NS records get re-created on every run? I’ll end up getting new NS records from the AWS hosted zone, hey? I got mine set by our network person, who is great but slow to respond (ha), and I try to avoid asking him for help — I prob won’t be able to use the existing ones already set on our domain side?
7:34 PM
Ah sorry these aren’t really Fleet questions but more AWS Qs 😬

Benjamin Edwards
06/17/2022, 8:01 PM
If you destroy them, they'll be recreated. Otherwise TF knows they already exist.
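(If you ever need to read the current NS records without another apply, the hosted zone's delegation set is visible from the CLI; <zone-id> below is a placeholder:)
aws route53 list-hosted-zones
aws route53 get-hosted-zone --id <zone-id>   # DelegationSet.NameServers are the values to hand to your registrar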

Rod Christiansen
06/20/2022, 5:58 AM
Got it now to the point that it doesn’t complain about any existing resources, but I’m still getting a single error at the end of the build. Looks like the TCP call is getting refused and it’s hanging up the phone
╷
│ Error: error creating ElastiCache Replication Group (fleetdm-redis): waiting for completion: RequestError: send request failed
│ caused by: Post "https://elasticache.ca-central-1.amazonaws.com/": read tcp 192.168.1.137:55529->52.94.100.101:443: read: connection reset by peer
│ 
│   with aws_elasticache_replication_group.default,
│   on redis.tf line 13, in resource "aws_elasticache_replication_group" "default":
│   13: resource "aws_elasticache_replication_group" "default" {
│ 
╵
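(A connection reset like this is usually a transient network/API hiccup; everything created before it is already in state, so re-running the same apply generally picks up where it left off:)
terraform apply -var-file=prod.tfvars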
5:58 AM
11:43 PM
Hey! I got a fully clean run! \o/
2:32 AM
Sorry, me calling again ☎️
2:33 AM
Any quick idea of why the cluster tasks fail? Is it reaching a limit on docker.com?

zwass
06/21/2022, 3:57 AM
Ah, yes, this is an annoying thing with Docker's rate limiting. Now that the migration completed, do the other tasks successfully start up?

Rod Christiansen
06/21/2022, 4:02 AM
Doesn't look like it
4:02 AM
I was just surprised to see Docker in the mix
4:02 AM
Do I need to get an account?

zwass
06/21/2022, 4:03 AM
No you don't need an account -- Docker is just where the container images are hosted.
4:04 AM
Can you set the target tasks on the service down to 0? Then try running the migrate again?

Rod Christiansen
06/21/2022, 4:08 AM
Will do. Thanks.

zwass
06/21/2022, 4:10 AM
You might still get rate limited for a while -- sorry about that, Docker Hub can really be a pain.

Rod Christiansen
06/21/2022, 4:16 AM
Got it. As long as it's not a config error on my end. Just try until it works (not too often 😅)

Benjamin Edwards
06/21/2022, 2:49 PM
Zach is right. The migration task needs to execute. Scaling the Fleet service down to zero for 15-30 minutes should clear your rate limit. While the service is scaled down, attempt the migration task, which is the ecs run-task command. Once that runs successfully, you can scale the service back up.
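(Roughly, with the AWS CLI; the cluster and service names below are placeholders, so use whatever the Terraform run actually created, and the migration itself is the aws ecs run-task command from the guide:)
# scale the Fleet service to zero while the Docker Hub rate limit clears
aws ecs update-service --cluster <fleet-cluster> --service <fleet-service> --desired-count 0
# ...after 15-30 minutes, run the guide's migration task (aws ecs run-task ...), then scale back up
aws ecs update-service --cluster <fleet-cluster> --service <fleet-service> --desired-count 1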

Rod Christiansen
06/21/2022, 6:27 PM
Thank you so much to both of you for bearing with my basic questions! We have lift off!

zwass
06/21/2022, 6:28 PM
Woooo hooo!

Rod Christiansen
06/21/2022, 6:29 PM
Very exciting. Just gotta get my network guy to set those NameServers now ha

Benjamin Edwards
06/21/2022, 6:29 PM
Rod, feel free to hit me up with feedback about the process. Happy to make updates to the blog or modify the README. Your perspective, as someone relatively new to TF, was something I was clearly missing when I drafted the first version.

Rod Christiansen
06/21/2022, 7:12 PM
Thanks Benjamin, looking back at your guide now, it does follow the steps pretty closely. I did write an internal doc documenting the steps I took and it's fairly similar. See if anything here would help:
5:59 PM
Couple things that came to mind that you could add to the blog post:
• warning about not running multiple times like I did; I set up from scratch a few times ‘to start fresh’ but hadn’t run a terraform apply -destroy
• warning about unique naming for the buckets/dynamo since they are global
• warning on the migration step about the Docker limit like I hit and how to retry
• the hashicorp/aws version needs to match
•