# fleet
r
Been following the README but can’t quite get there
z
Where are you stuck? Any error messages?
r
Hey Zach, thanks for jumping in
Let me get you a log
So when I run
terraform show
I get:
If I re-run
terraform apply -var-file=prod.tfvars
I get a lot of
│ Error: failed creating IAM Role (fleetdm-role): EntityAlreadyExists: Role with name fleetdm-role already exists.
It’s trying to re-create some resources, is there a way to skip it?
Still happens after running a
terraform destroy
as well
Essentially, I’m not sure if I have everything up
z
Hmm, sounds like you first ran it without the vars file, then tried again with the vars file?
r
Mmmm that’s a possibility? I might’ve in one of a few attempts
z
If you do the
terraform destroy
, does it seem to destroy everything?
r
So it created resources not attached to my
vars
and now it can’t interact with them?
Sorry kind of new to Terraform
z
Yeah no prob. I'm not our best Terraform expert, but I have some experience and I'll try to help.
Mind sharing your vars file with me via DM?
r
Sure
z
Let's see if we can get back to a clean slate... What's the output of your
terraform destroy
?
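One guess from me: if your earlier applies used the var file, pass it to the destroy too so Terraform doesn't prompt for values or fall back to defaults, e.g.
terraform destroy -var-file=prod.tfvars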
r
Let me run that
Destroy goes through normally
Destroy complete! Resources: 53 destroyed.
z
Okay let's retry
terraform apply -var-file=prod.tfvars
r
Okay 👍
Do you know if these deprecation warnings matter?
z
I think you'll be okay with those warnings, but I'm going to file an issue so that we can quiet them on our end.
r
K thanks, it's running
usually takes 15 min
Yea same results
aws_route53_record.dogfood_fleetdm_com_validation["fleet.ecuad.ca"]: Creating...
aws_route53_record.dogfood_fleetdm_com_validation["fleet.ecuad.ca"]: Still creating... [10s elapsed]
aws_route53_record.dogfood_fleetdm_com_validation["fleet.ecuad.ca"]: Still creating... [20s elapsed]
aws_route53_record.dogfood_fleetdm_com_validation["fleet.ecuad.ca"]: Still creating... [30s elapsed]
aws_route53_record.dogfood_fleetdm_com_validation["fleet.ecuad.ca"]: Still creating... [40s elapsed]
aws_route53_record.dogfood_fleetdm_com_validation["fleet.ecuad.ca"]: Still creating... [50s elapsed]
aws_route53_record.dogfood_fleetdm_com_validation["fleet.ecuad.ca"]: Creation complete after 52s [id=Z0153789AGKAV73DDKKN__3b82f3c76c9877eb0905c1f97d84050c.fleet.ecuad.ca._CNAME]
aws_acm_certificate_validation.dogfood_fleetdm_com: Creating...
aws_acm_certificate_validation.dogfood_fleetdm_com: Creation complete after 0s [id=2022-06-17 16:48:19.658 +0000 UTC]
╷
│ Warning: Argument is deprecated
│ 
│   with aws_s3_bucket.osquery-results,
│   on firehose.tf line 7, in resource "aws_s3_bucket" "osquery-results":
│    7: resource "aws_s3_bucket" "osquery-results" { #tfsec:ignore:aws-s3-encryption-customer-key:exp:2022-07-01  #tfsec:ignore:aws-s3-enable-versioning #tfsec:ignore:aws-s3-enable-bucket-logging:exp:2022-06-15
│ 
│ Use the aws_s3_bucket_lifecycle_configuration resource instead
│ 
│ (and 8 more similar warnings elsewhere)
╵
╷
│ Error: failed creating IAM Role (fleetdm-role): EntityAlreadyExists: Role with name fleetdm-role already exists.
│ 	status code: 409, request id: d2dcb2ad-bfa4-49e9-8362-6a5fb48d5fdd
│ 
│   with aws_iam_role.main,
│   on ecs-iam.tf line 74, in resource "aws_iam_role" "main":
│   74: resource "aws_iam_role" "main" {
│ 
╵
╷
│ Error: error creating application Load Balancer: DuplicateLoadBalancerName: A load balancer with the same name 'fleetdm' exists, but with different settings
│ 	status code: 400, request id: e49a8a95-68ef-4ec6-9b04-5860b251dab2
│ 
│   with aws_alb.main,
│   on ecs.tf line 14, in resource "aws_alb" "main":
│   14: resource "aws_alb" "main" {
│ 
╵
╷
│ Error: Creating CloudWatch Log Group failed: ResourceAlreadyExistsException: The specified log group already exists:  The CloudWatch Log Group 'fleetdm' already exists.
│ 
│   with aws_cloudwatch_log_group.backend,
│   on ecs.tf line 114, in resource "aws_cloudwatch_log_group" "backend":
│  114: resource "aws_cloudwatch_log_group" "backend" { #tfsec:ignore:aws-cloudwatch-log-group-customer-key:exp:2022-07-01
│ 
╵
╷
│ Error: error creating S3 Bucket (ca-ecuad-queryops-fleet-osquery-results-archive-dev): BucketAlreadyOwnedByYou: Your previous request to create the named bucket succeeded and you already own it.
│ 	status code: 409, request id: 8QKQAWKMKFPG8V3T, host id: cGLTOq4ot2jEU2WRo6W7KOFHMqUUGEDol93rope13+e2btUrMvzII5SEHItCYT+99PGKR53PQcU=
│ 
│   with aws_s3_bucket.osquery-results,
│   on firehose.tf line 7, in resource "aws_s3_bucket" "osquery-results":
│    7: resource "aws_s3_bucket" "osquery-results" { #tfsec:ignore:aws-s3-encryption-customer-key:exp:2022-07-01  #tfsec:ignore:aws-s3-enable-versioning #tfsec:ignore:aws-s3-enable-bucket-logging:exp:2022-06-15
│ 
╵
╷
│ Error: error creating S3 Bucket (ca-ecuad-queryops-fleet-osquery-status-archive-dev): BucketAlreadyOwnedByYou: Your previous request to create the named bucket succeeded and you already own it.
│ 	status code: 409, request id: 8QKS5J9XHNHQ6939, host id: SRTEVZrLHrsc9qt4Sii5tNnZtahUO60eEhaDZdW2KcaYjFKRqMWOdLX7trgWHh4kv8Guk7hmkpY=
│ 
│   with aws_s3_bucket.osquery-status,
│   on firehose.tf line 41, in resource "aws_s3_bucket" "osquery-status":
│   41: resource "aws_s3_bucket" "osquery-status" { #tfsec:ignore:aws-s3-encryption-customer-key:exp:2022-07-01 #tfsec:ignore:aws-s3-enable-versioning #tfsec:ignore:aws-s3-enable-bucket-logging:exp:2022-06-15
│ 
╵
╷
│ Error: error creating Secrets Manager Secret: ResourceExistsException: The operation failed because the secret /fleet/database/password/master already exists.
│ 
│   with aws_secretsmanager_secret.database_password_secret,
│   on rds.tf line 7, in resource "aws_secretsmanager_secret" "database_password_secret":
│    7: resource "aws_secretsmanager_secret" "database_password_secret" { #tfsec:ignore:aws-ssm-secret-use-customer-key:exp:2022-07-01
│ 
╵
╷
│ Error: Error creating DB Parameter Group: DBParameterGroupAlreadyExists: Parameter group fleetdm-aurora-db-mysql-parameter-group already exists
│ 	status code: 400, request id: 58d16106-fe62-4906-b133-cccfabdb4d42
│ 
│   with aws_db_parameter_group.example_mysql,
│   on rds.tf line 107, in resource "aws_db_parameter_group" "example_mysql":
│  107: resource "aws_db_parameter_group" "example_mysql" {
│ 
╵
╷
│ Error: Error creating DB Cluster Parameter Group: DBParameterGroupAlreadyExists: Parameter group fleetdm-aurora-mysql-cluster-parameter-group already exists
│ 	status code: 400, request id: 96730c78-873b-4e42-bfca-3a34895dbcdd
│ 
│   with aws_rds_cluster_parameter_group.example_mysql,
│   on rds.tf line 113, in resource "aws_rds_cluster_parameter_group" "example_mysql":
│  113: resource "aws_rds_cluster_parameter_group" "example_mysql" {
│ 
╵
╷
│ Error: error creating S3 Bucket (osquery-carve-default): BucketAlreadyExists: The requested bucket name is not available. The bucket namespace is shared by all users of the system. Please select a different name and try again.
│ 	status code: 409, request id: BYZ1KZ1KNRCAJ0N6, host id: 5gY0n/SnlpvTsDu0zsCishhV3c4GUrirj3knAb7cTmc8MNCEcj50oKHkiIqkNAhv7bC+rcKeFdE=
│ 
│   with aws_s3_bucket.osquery-carve,
│   on s3.tf line 9, in resource "aws_s3_bucket" "osquery-carve":
│    9: resource "aws_s3_bucket" "osquery-carve" { #tfsec:ignore:aws-s3-enable-versioning #tfsec:ignore:aws-s3-encryption-customer-key:exp:2022-07-01 #tfsec:ignore:aws-s3-enable-bucket-logging:exp:2022-06-15
│ 
╵
╷
│ Error: Error creating DB Subnet Group: DBSubnetGroupAlreadyExists: The DB subnet group 'fleetdm-mysql-iam' already exists.
│ 	status code: 400, request id: 4ec6cead-a4a7-45a2-a46b-50af416b34fb
│ 
│   with module.aurora_mysql.aws_db_subnet_group.this[0],
│   on .terraform/modules/aurora_mysql/main.tf line 38, in resource "aws_db_subnet_group" "this":
│   38: resource "aws_db_subnet_group" "this" {
│ 
╵
╷
│ Error: Error creating DB Subnet Group: DBSubnetGroupAlreadyExists: The DB subnet group 'fleet-vpc' already exists.
│ 	status code: 400, request id: 6785406d-6e31-4016-b3ba-61df708bb2ec
│ 
│   with module.vpc.aws_db_subnet_group.database[0],
│   on .terraform/modules/vpc/main.tf line 458, in resource "aws_db_subnet_group" "database":
│  458: resource "aws_db_subnet_group" "database" {
│ 
╵
╷
│ Error: creating ElastiCache Subnet Group (fleet-vpc): CacheSubnetGroupAlreadyExists: Cache subnet group fleet-vpc already exists.
│ 	status code: 400, request id: 1d124b61-29b4-46e1-a372-fc2e5fcb4b77
│ 
│   with module.vpc.aws_elasticache_subnet_group.elasticache[0],
│   on .terraform/modules/vpc/main.tf line 542, in resource "aws_elasticache_subnet_group" "elasticache":
│  542: resource "aws_elasticache_subnet_group" "elasticache" {
│ 
╵
╷
│ Error: Error creating EIP: AddressLimitExceeded: The maximum number of addresses has been reached.
│ 	status code: 400, request id: a840976c-a6a1-45fe-8f1e-c155a458d6b1
│ 
│   with module.vpc.aws_eip.nat[0],
│   on .terraform/modules/vpc/main.tf line 1001, in resource "aws_eip" "nat":
│ 1001: resource "aws_eip" "nat" {
│ 
╵
Releasing state lock. This may take a few moments...
rod@RodChristiansen aws %
I prob need to manually delete all these resources that already exist
Wonder if just changing some variables would get me a cleaner run
New prefix
z
Sounds like @Benjamin Edwards (our expert who wrote the configs) will be around soon. Let's wait for him to come and advise 🙂
r
Oh amazing!
b
Hey Rod! Sorry you are having issues. I'm just waiting for my wife to get home and take over kiddo duty. Happy to help after that. I'll try to catch up in the meantime.
r
All good. Thanks to both of you
b
@Rod Christiansen I think Zach had the right idea: it seems like Terraform might have run with the default values, which will work for some resources but will conflict with others (like S3, since bucket names are globally unique across all AWS accounts). Another thing to consider, especially if you are new to Terraform, is to omit the remote state steps in the guide. Remote state is great for bigger projects, or projects where multiple users (or CI/CD systems) are altering infrastructure at the same time. To start, it might be easier to just drop it; something I should have considered in the guide, but unfortunately didn't. So first, in
main.tf
edit the terraform block to look like:
terraform {
#  backend "s3" {
#    bucket         = "fleet-terraform-remote-state"
#    region         = "us-east-2"
#    key            = "fleet"
#    dynamodb_table = "fleet-terraform-state-lock"
#  }
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "3.63.0"
    }

    tls = {
      source = "hashicorp/tls"
      version = "3.3.0"
    }
  }
}
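One thing I didn't spell out in the guide: after changing that block you'll need to re-run init so Terraform notices the backend change; -migrate-state will copy any state you already pushed to S3 down to a local file.
terraform init -migrate-state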
Then make sure you follow this step of the guide:
We’ll also need a
tfvars
file to make some environment-specific variable overrides. Create a file in the same directory named
prod.tfvars
and paste the contents (note the bucket names will have to be unique for your environment):
fleet_backend_cpu         = 1024
fleet_backend_mem         = 4096 //software inventory requires 4GB
redis_instance            = "cache.t3.micro"
domain_fleetdm            = "fleet.queryops.com" // YOUR DOMAIN HERE
osquery_results_s3_bucket = "foo-results-bucket" // UNIQUE BUCKET NAME
osquery_status_s3_bucket  = "bar-status-bucket" // UNIQUE BUCKET NAME
file_carve_bucket         = "qux-file-carve-bucket" // UNIQUE BUCKET NAME
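Then a plan with the var file is a good sanity check before you apply again:
terraform plan -var-file=prod.tfvars
terraform apply -var-file=prod.tfvars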
If you run into trouble, maybe we can screen share on zoom and get it sorted out?
r
Hi @Benjamin Edwards thanks so much, this helps. Thanks for offering to jump on a screen share. I’m going to go through my AWS and see if I can find and delete all the resources listed with errors
I’ll pipe back in here with results
One quick other question you might know @Benjamin Edwards: do the NS records get re-created on every run? I'll end up getting new NS records from the AWS hosted zone, hey? I got mine set by our network person, who is great but slow to respond (ha), and I try to avoid asking him for help. I probably won't be able to use the existing ones set on our domain side?
Ah sorry these aren’t really Fleet questions but more AWS Qs 😬
b
if you destroy them, they'll be recreated. Otherwise TF knows they already exist
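If you ever need to pull the NS values again yourself, something like this should do it (I'm assuming the zone is the fleet.ecuad.ca one from your logs):
aws route53 list-hosted-zones-by-name --dns-name fleet.ecuad.ca
aws route53 get-hosted-zone --id <zone-id-from-the-first-command>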
👍 1
r
Got it to the point now that it doesn't complain about any existing resources, but I'm still getting a single error at the end of the build. Looks like the TCP call is getting refused and it's hanging up the phone:
╷
│ Error: error creating ElastiCache Replication Group (fleetdm-redis): waiting for completion: RequestError: send request failed
│ caused by: Post "https://elasticache.ca-central-1.amazonaws.com/": read tcp 192.168.1.137:55529->52.94.100.101:443: read: connection reset by peer
│ 
│   with aws_elasticache_replication_group.default,
│   on redis.tf line 13, in resource "aws_elasticache_replication_group" "default":
│   13: resource "aws_elasticache_replication_group" "default" {
│ 
╵
Hey! I got a fully clean run! \o/
🚀 1
💯 1
Sorry me calling again ☎️
Any quick idea why the cluster tasks fail? Is it hitting a limit on docker.com?
z
Ah, yes, this is an annoying thing with Docker's rate limiting. Now that the migration completed, do the other tasks successfully start up?
r
Doesn't look like it
I was just surprised to see docker in the mix
I need to get an account?
z
No you don't need an account -- Docker is just where the container images are hosted.
👍 1
Can you set the target tasks on the service down to 0? Then try running the migrate again?
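From the CLI it's roughly this; I'm guessing at the cluster/service names, so check what Terraform actually created in the ECS console:
aws ecs update-service --cluster fleet --service fleet --desired-count 0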
r
Will do. Thanks.
z
You might still get rate limited for a while -- sorry about that, Docker Hub can really be a pain.
r
Got it. As long as it's not a config error on my end. Just try until it works (not too often 😅)
b
Zach is right. The migration task needs to execute. Scaling down the fleet service to zero for 15-30 minutes should clear your rate limit. While the service is scaled down, attempt the migration task, which is the ecs run-task command. Once that runs successfully, you can scale the service back up.
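Rough sequence, with placeholder names and IDs you'd swap for the ones Terraform created in your account:
# service should already be at 0 from Zach's suggestion above
aws ecs run-task --cluster fleet --task-definition fleet-migrate --launch-type FARGATE --network-configuration 'awsvpcConfiguration={subnets=[subnet-aaaa1111],securityGroups=[sg-bbbb2222],assignPublicIp=ENABLED}'
# once the migration task exits cleanly, scale the service back up
aws ecs update-service --cluster fleet --service fleet --desired-count 1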
r
Thank you so much to both of you for bearing with my basic questions! We have lift off!
🚀 2
💯 1
z
Woooo hooo!
r
Very exciting. Just gotta get my network guy to set those NameServers now ha
b
Rod, feel free to hit me up with feedback about the process. Happy to make updates to the blog post or modify the readme. Your perspective, as someone relatively new to TF, was something I was clearly missing when I drafted the first version.
r
Thanks Benjamin, looking back at your guide now, it does follow the steps pretty closely. I wrote an internal doc documenting the steps I took and it's fairly similar. See if anything here would help:
💯 1
Couple of things that came to mind that you could add to the blog post:
• a warning about not running it multiple times like I did; I set up from scratch a few times ‘to start fresh’ but hadn’t run a terraform apply -destroy
• a warning about unique naming for the buckets/dynamo since they are global
• a warning on the migration step about the Docker limit like I hit, and how to retry
• the hashicorp/aws version needs to match
•