# core
Our builds are running out of disk on the AWS runners. https://github.com/osquery/osquery/actions/workflows/self_hosted_runners.yml @Stefano Bonicatti Did you fix this last time?
No, I did not. We merged some tests to reduce the size consumed, but that was a short-term solution. The long-term solution is a slightly more substantial change: we have to change how we register tables and plugins so that, instead of relying on global-initialization side effects, registration is an explicit function call at runtime. That would let us remove a linker flag (--whole-archive) that currently prevents dropping unused code sections from some of the static libraries linked to make the final executables (the tests specifically).
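In other words, something like this minimal sketch (the helper names here are hypothetical, not osquery's actual registry API):
```cpp
#include <functional>
#include <map>
#include <string>

// Hypothetical registry helpers -- osquery's real registry API differs.
using TableGen = std::function<void()>;

static std::map<std::string, TableGen>& registry() {
  static std::map<std::string, TableGen> tables;
  return tables;
}

static void registerTable(const std::string& name, TableGen gen) {
  registry()[name] = std::move(gen);
}

static void genProcesses() { /* generate table rows */ }

// Today: registration is a side effect of a global initializer. Nothing in
// the program references this object, so when the table lives in a static
// library the linker would drop its whole object file -- unless the library
// is linked with --whole-archive, which also keeps every unused section.
namespace {
struct RegisterProcesses {
  RegisterProcesses() {
    registerTable("processes", genProcesses);
  }
} registerProcesses;
}  // namespace

// Proposed: an explicit call that main() actually references. The linker can
// then see exactly what is used, --whole-archive goes away, and unused
// sections become droppable.
static void registerAllTables() {
  registerTable("processes", genProcesses);
  // ... one explicit line per table/plugin ...
}

int main() {
  registerAllTables();
  // ... start the rest of osquery ...
  return 0;
}
```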
I was poking through the AWS account, and it looks like the disk might be 32 GB.
Yeah, and we have 26 GB of it free, so the build is definitely too big...
I noticed there was a worker that had been stuck running for months. I wonder if that contributed, somehow. But I would have expected each worker to get a fresh EBS volume.
No, I don't think so; the 26 GB reading came from a run that failed with disk out of space, so it's a fresh volume each time.
It seems reasonable to drop some of those df commands into the normal builds.
But looking at https://github.com/osquery/osquery/actions/runs/6084513342/job/16506598189
```
/dev/root        84G   65G   19G  78% /
/dev/sdb1        14G  4.1G  9.0G  31% /mnt
```
Both feel weird
Nah, that one is the standard runner; I forgot that I should run it on the AWS runner ^^'
Ah.
Although I didn't connect the dots: how is it that it fills up 26 GB of disk on the AWS runner but not 19 GB on the standard runner?
I do think it would be reasonable to toss some df/du/tree-style things into the builds. May as well always collect it pre- and post-build.
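For reference, if we ever want those numbers from inside our own tooling rather than a shell step, std::filesystem::space exposes the same data as df; a minimal sketch (the mount list is just the two mounts from the runner output above):
```cpp
#include <filesystem>
#include <iostream>

int main() {
  // Print capacity/free for the mounts seen in the runner's df output.
  for (const char* mount : {"/", "/mnt"}) {
    std::error_code ec;
    const auto info = std::filesystem::space(mount, ec);
    if (ec) {
      std::cerr << mount << ": " << ec.message() << '\n';
      continue;
    }
    std::cout << mount << ": capacity=" << (info.capacity >> 30)
              << "GiB free=" << (info.free >> 30) << "GiB\n";
  }
  return 0;
}
```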
I'll double-check locally with my M2 first to see what the expected build size is
So a local x86_64 build on Linux (RelWithDebInfo) is 20 GB
and that's only the build folder
Same for aarch64
Ok, the first strange thing is that on the x86_64 runner, cloning the source code (which should be ~4 GB) takes no space. The other thing is that on x86_64 we do a RelWithDebInfo build and use it to make the packages (since they need the symbols), but we do not build the tests there; we only build and run them on the Release and Debug builds (though with no debug symbols)
We could do that for aarch64 too, for now... so tests and debug symbols don't overlap
It's obviously an additional build
So I'm going to do a couple of tests using strategy/matrix for this; there might be an issue that prevents us from avoiding duplicated code, and I might also break the logic that stops the runners. If you'd like, I can keep an occasional eye on the runners in the future too, to prevent them from running for so long, but I don't have access to that account.
Either that, or we can also add a step which lists instances older than 3 hours via aws-cli and kills them.
I’d be happy to give you AWS access, but I’m slightly embarrassed I can’t easily do that. The AWS accounts are a horrific tangle.
Oh yes, we should totally add some kind of job that kills instances older than a couple hours. It feels like a good action to run a couple times a day.
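A minimal sketch of that cleanup, using the AWS SDK for C++ rather than the CLI (the 3-hour cutoff comes from the suggestion above; scoping this to only the CI runner instances would need an extra tag filter matching however they're actually labeled, which is an assumption here):
```cpp
#include <aws/core/Aws.h>
#include <aws/core/utils/DateTime.h>
#include <aws/ec2/EC2Client.h>
#include <aws/ec2/model/DescribeInstancesRequest.h>
#include <aws/ec2/model/Filter.h>
#include <aws/ec2/model/TerminateInstancesRequest.h>

#include <iostream>

int main() {
  Aws::SDKOptions options;
  Aws::InitAPI(options);
  {
    Aws::EC2::EC2Client ec2;

    // Only consider running instances. A tag filter restricting this to the
    // CI runners should be added here (how they're tagged is not known).
    Aws::EC2::Model::DescribeInstancesRequest describe;
    describe.AddFilters(Aws::EC2::Model::Filter()
                            .WithName("instance-state-name")
                            .WithValues({"running"}));

    const auto outcome = ec2.DescribeInstances(describe);
    if (outcome.IsSuccess()) {
      // Collect anything launched more than 3 hours ago.
      const auto nowMs = Aws::Utils::DateTime::Now().Millis();
      constexpr int64_t kMaxAgeMs = 3LL * 60 * 60 * 1000;

      Aws::EC2::Model::TerminateInstancesRequest terminate;
      bool foundStale = false;
      for (const auto& reservation : outcome.GetResult().GetReservations()) {
        for (const auto& instance : reservation.GetInstances()) {
          if (nowMs - instance.GetLaunchTime().Millis() > kMaxAgeMs) {
            terminate.AddInstanceIds(instance.GetInstanceId());
            foundStale = true;
          }
        }
      }
      if (foundStale) {
        ec2.TerminateInstances(terminate);
      }
    } else {
      std::cerr << outcome.GetError().GetMessage() << "\n";
    }
  }
  Aws::ShutdownAPI(options);
  return 0;
}
```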