Morning everyone I m trying to stream my ECS Fargate Fleet s osquery #fleet

05/24/2021, 11:43 AM

Morning everyone, I'm trying to stream my ECS Fargate Fleet's osquery logs to Firehose, but my containers are failing to initialize with the error message:

Error initializing service: initializing osquery logging: create firehose status logger: create Firehose writer: describe stream arn:aws:firehose:xx:xx:deliverystream/test_stream: NoCredentialProviders: no valid providers in chain. Deprecated.

My assumption is that the ECS task will first assume the role provided in `FLEET_FIREHOSE_STS_ASSUME_ROLE_ARN` using its default credential, which would be the ECS task role that it was configured to run with. Once assumed, the task will then be able to call `DescribeDeliveryStream` using the newly granted role. However, based on the code here, my guess is that the task could not find the default credentials(?). I'd like to avoid passing the access keys to the task, could anyone please take a look and see where I went wrong? These are the environment variables/actions that I have configured so far:

• `FLEET_OSQUERY_RESULT_LOG_PLUGIN`: `firehose`
• `FLEET_OSQUERY_STATUS_LOG_PLUGIN`: `firehose`
• `FLEET_FIREHOSE_REGION`: `xx`
• `FLEET_FIREHOSE_RESULT_STREAM`: `arn:aws:firehose:xx:xx:deliverystream/test_stream`
• `FLEET_FIREHOSE_STATUS_STREAM`: `arn:aws:firehose:xx:xx:deliverystream/test_stream`
• `FLEET_FIREHOSE_STS_ASSUME_ROLE_ARN`: `arn:aws:iam:xx:role/firehoseRole`
• An ECS task role permission to assume the role `arn:aws:iam:xx:role/firehoseRole`
• A new IAM role `arn:aws:iam:xx:role/firehoseRole` with permissions to call `firehose:DescribeDeliveryStream` and `firehose:PutRecordBatch` against `arn:aws:firehose:xx:xx:deliverystream/test_stream`
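For readers following along, the two IAM pieces in the last two bullets might look roughly like the sketch below. It only builds the policy documents as Python dicts. The ARNs are the redacted placeholders from the post, `TASK_ROLE_ARN` is a hypothetical name, and the trust policy on `firehoseRole` is an assumption: the post does not mention one, although a role generally needs a trust policy naming whoever is allowed to assume it.

```python
import json

# Redacted placeholder ARNs, copied as-is from the post.
FIREHOSE_ROLE_ARN = "arn:aws:iam:xx:role/firehoseRole"
STREAM_ARN = "arn:aws:firehose:xx:xx:deliverystream/test_stream"
# Hypothetical: the task role's ARN is never shown in the thread.
TASK_ROLE_ARN = "arn:aws:iam:xx:role/exampleTaskRole"


def policy(*statements):
    """Wrap statements in a standard IAM policy document."""
    return {"Version": "2012-10-17", "Statement": list(statements)}


# Attached to the ECS task role: lets the task assume firehoseRole.
task_role_policy = policy({
    "Effect": "Allow",
    "Action": "sts:AssumeRole",
    "Resource": FIREHOSE_ROLE_ARN,
})

# Attached to firehoseRole: the two Firehose actions named in the post.
firehose_role_policy = policy({
    "Effect": "Allow",
    "Action": [
        "firehose:DescribeDeliveryStream",
        "firehose:PutRecordBatch",
    ],
    "Resource": STREAM_ARN,
})

# Assumption (not in the post): trust policy on firehoseRole that
# allows the task role to assume it.
firehose_role_trust = policy({
    "Effect": "Allow",
    "Principal": {"AWS": TASK_ROLE_ARN},
    "Action": "sts:AssumeRole",
})

print(json.dumps(firehose_role_policy, indent=2))
```

Comparing documents like these against what is actually deployed is one way to spot the kind of typo or mismatch that would break the assume-role path.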

zwass

05/24/2021, 3:17 PM

@nyanshak @billcobbler any ideas what might be going on here? I know y'all use the assume role.

nyanshak

05/24/2021, 3:21 PM

err... I use osquery's assume role but not fleet so not very familiar with that aspect. Also don't use ECS Fargate 😐 So... not very familiar with fleet's errors / what they mean here. But based on the info provided, I would assume that this should work. I'm not sure if maybe one of the things you think is configured is maybe not configured as you expect (e.g., a typo). But afaict, the config looks okay.

nyanshak

05/24/2021, 3:22 PM

I don't know if there's maybe an issue specific to fargate or fleet's code though

05/24/2021, 3:46 PM

Possibly related to a troubleshooting hiccup of mine, but I'm now seeing that my ECS tasks cannot pull the Docker Hub fleet image and are failing to start with the following "Stopped reason":

CannotPullContainerError: ref pull has been retried 1 time(s): failed to copy: httpReaderSeeker: failed open: unexpected status code <https://registry-1.docker.io/v2/fleetdm/fleet/manifests/sha256:a6694d267a20c0f656304c6392efbec9f9bb88a2499f61c0e6f0dab2>...

and

CannotPullContainerError: inspect image has been retried 5 time(s): httpReaderSeeker: failed open: unexpected status code <https://registry-1.docker.io/v2/fleetdm/fleet/manifests/sha256:a6694d267a20c0f656304c6392efbec9f9bb88a2499f61c0e6f0dab25e993a20>: 4...

Has anyone encountered the issue before? My tasks run on a private subnet with a route to a NAT gateway. My execution role and role policy follow what was suggested by AWS:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "ecs-tasks.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken",
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }
  ]
}

Zach Zeid

05/24/2021, 4:46 PM

you could be running into docker hub's rate limiting

05/24/2021, 4:49 PM

@Zach Zeid that seemed to be the case. I lowered the desired state to 0 and bumped it back up, and the task has no issues pulling the image now. Appreciate your comment!

05/24/2021, 5:42 PM

As for the original post, I managed to resolve the error by adding the `firehose:DescribeDeliveryStream` and `firehose:PutRecordBatch` permissions directly in the ECS task role. The task role still contains the permission to assume the `firehoseRole` role that grants the same permission, which likely is redundant. I'm now looking into whether I can completely do without the firehoseRole and point `FLEET_FIREHOSE_STS_ASSUME_ROLE_ARN` directly to the ECS task role.
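A rough way to sanity-check that a task role policy document carries both Firehose actions is sketched below. `missing_actions` is a hypothetical helper, not part of Fleet or any AWS SDK, and it is far simpler than IAM's real evaluation: it only looks at `Allow` statements with `Action` and `Resource` keys, and ignores `Deny`, `NotAction`, and conditions.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """IAM allows a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def missing_actions(policy, required_actions, resource_arn):
    """Return the required actions that no Allow statement grants on resource_arn."""
    missing = []
    for action in required_actions:
        granted = any(
            stmt.get("Effect") == "Allow"
            # Action names are matched case-insensitively, "*" is a wildcard.
            and any(fnmatchcase(action.lower(), pattern.lower())
                    for pattern in _as_list(stmt.get("Action", [])))
            and any(fnmatchcase(resource_arn, pattern)
                    for pattern in _as_list(stmt.get("Resource", [])))
            for stmt in _as_list(policy["Statement"])
        )
        if not granted:
            missing.append(action)
    return missing


# Placeholder ARNs copied from the thread.
STREAM = "arn:aws:firehose:xx:xx:deliverystream/test_stream"
REQUIRED = ["firehose:DescribeDeliveryStream", "firehose:PutRecordBatch"]

# Task role as originally described: assume-role permission only.
before = {"Version": "2012-10-17", "Statement": [
    {"Effect": "Allow", "Action": "sts:AssumeRole",
     "Resource": "arn:aws:iam:xx:role/firehoseRole"},
]}
# Task role after the fix: Firehose actions added directly.
after = {"Version": "2012-10-17", "Statement": before["Statement"] + [
    {"Effect": "Allow", "Action": REQUIRED, "Resource": STREAM},
]}

print(missing_actions(before, REQUIRED, STREAM))
print(missing_actions(after, REQUIRED, STREAM))
```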

nyanshak

05/24/2021, 5:43 PM

(not being super familiar with this aspect) I would generally assume that STS is optional and that if you already have the correct permissions on the ECS task role, you could forgo setting `FLEET_FIREHOSE_STS_ASSUME_ROLE_ARN` entirely.

nyanshak

05/24/2021, 5:43 PM

but I would love to get confirmation of that

nyanshak

05/24/2021, 5:44 PM

basically - you're already in AWS, able to give yourself all the right permissions, no reason to use STS assume role if you don't have to

nyanshak

05/24/2021, 5:44 PM

obv there are cases where that's not true, but ... 🤷‍♀️

05/24/2021, 6:18 PM

I can confirm my ECS task can successfully push to Firehose without the ARN env var. Good call @nyanshak!
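As a summary of where the thread ended up, here is a minimal sketch of the remaining environment variables rendered in the `name`/`value` list shape that an ECS container definition uses for `environment`. The values are the redacted placeholders from the original post, and `to_ecs_environment` is a hypothetical helper name.

```python
import json

# Variables from the original post, minus FLEET_FIREHOSE_STS_ASSUME_ROLE_ARN.
# The "xx" values are the thread's redactions, not real settings.
fleet_env = {
    "FLEET_OSQUERY_RESULT_LOG_PLUGIN": "firehose",
    "FLEET_OSQUERY_STATUS_LOG_PLUGIN": "firehose",
    "FLEET_FIREHOSE_REGION": "xx",
    "FLEET_FIREHOSE_RESULT_STREAM": "arn:aws:firehose:xx:xx:deliverystream/test_stream",
    "FLEET_FIREHOSE_STATUS_STREAM": "arn:aws:firehose:xx:xx:deliverystream/test_stream",
}


def to_ecs_environment(env):
    """Convert a plain dict into the list of name/value pairs ECS expects."""
    return [{"name": name, "value": value} for name, value in sorted(env.items())]


# Fragment of a container definition; the rest of the task definition is omitted.
container_fragment = {"environment": to_ecs_environment(fleet_env)}
print(json.dumps(container_fragment, indent=2))
```

With this setup the Firehose permissions live on the ECS task role itself, as described earlier in the thread.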
