Trying to dig back into the history of when the linux osqueryd binary increased in size (osquery, #linux)

defensivedepth

08/18/2022, 1:51 AM

Trying to dig back into the history of when the linux osqueryd binary increased in size... Anyone recall why it increased so dramatically, from 44mb in 4.9 to 196mb in 5.0?

sharvil

08/18/2022, 1:57 AM

5.0 was a major effort to redo packaging and releasing, and to get us closer to reproducible builds. Perhaps we now include debug symbols by default..? Or forgot to strip them

sharvil

08/18/2022, 1:57 AM

I will have to check this later when I am at a computer

sharvil

08/18/2022, 1:57 AM

And @Stefano Bonicatti might also have a clue

seph

08/18/2022, 2:05 AM

That's the tarball? It's not stripped. IMO it's a packaging bug. You can strip it, or extract the binary from the rpm or deb

Stefano Bonicatti

08/18/2022, 9:34 AM

https://github.com/osquery/osquery/issues/7169 This was the issue you opened. I wouldn’t call it a bug? Providing the debug symbols, so that one can split them off and keep them, is useful. That being said, we can do that job ourselves, creating 2 archives: one with the stripped binary, and the other with just the debug symbols. They can also be merged back with a tool called eu-unstrip
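For reference, here is a minimal Python sketch of the split-and-merge idea mentioned above. It only assembles the command lines and does not run them. The file names (osqueryd.debug, osqueryd.full) are hypothetical, and the objcopy/strip sequence is the common binutils recipe for separate debug files, not something confirmed as osquery's actual packaging logic. Whether eu-unstrip accepts files split this way should be checked against your elfutils version.

```python
import shlex


def split_debug_commands(binary: str, debug_file: str) -> list[list[str]]:
    """Build the commands that move debug info from `binary` into `debug_file`."""
    return [
        # Copy only the debug sections into a separate file.
        ["objcopy", "--only-keep-debug", binary, debug_file],
        # Remove the debug sections from the binary itself.
        ["strip", "--strip-debug", binary],
        # Record the debug file's name in the binary so debuggers can find it.
        ["objcopy", f"--add-gnu-debuglink={debug_file}", binary],
    ]


def merge_debug_command(stripped: str, debug_file: str, output: str) -> list[str]:
    """Build the eu-unstrip command that recombines the two files into `output`."""
    return ["eu-unstrip", "-o", output, stripped, debug_file]


if __name__ == "__main__":
    # Hypothetical file names, printed for inspection rather than executed.
    for cmd in split_debug_commands("osqueryd", "osqueryd.debug"):
        print(shlex.join(cmd))
    print(shlex.join(merge_debug_command("osqueryd", "osqueryd.debug", "osqueryd.full")))
```

Printing the commands instead of executing them keeps the sketch safe to run on a machine without binutils or elfutils installed.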

defensivedepth

08/18/2022, 11:41 AM

Ok, so the osqueryd binary in the tarball is being generated with debug symbols but the one in the rpm isn't? 200mb in the tarball vs. 49mb in the rpm

defensivedepth

08/18/2022, 11:43 AM

This came up because there is a group I am working with that is deploying osquery and their deployment package is massive. Looks like they are using the tarball as the source.

Stefano Bonicatti

08/18/2022, 1:10 PM

yeah the rpm isn’t because it splits the debug symbols into a separate package.

seph

08/18/2022, 1:18 PM

I consider this a bug, because I think most use cases do not want the debug symbols. So the easy paths should be the stripped versions.

Mike Myers

08/18/2022, 5:11 PM

Well this requires a change in osquery-codesign and things tend to get bottlenecked there

seph

08/18/2022, 5:44 PM

Does it? I think it’s mostly something in the cpack code, which is in the osquery repo

Stefano Bonicatti

08/18/2022, 6:21 PM

There should be no problem on the codesign side, since Linux binaries are not signed

Stefano Bonicatti

08/18/2022, 6:22 PM

there should also be minimal logic needed to upload the debug symbols separately in the codesign repo, but again that should be straightforward

Mike Myers

08/18/2022, 6:31 PM

ah, okay, then anyone who has time to make the PR to osquery/osquery can help, sounds like it can be done in a GitHub Actions workflow

defensivedepth

04/14/2023, 12:38 AM

If someone gives me a few pointers, I can take a crack at this. It's become much more important for a project I'm working on to get this file size reduced 🙂

seph

04/14/2023, 1:40 AM

I’d certainly appreciate it. I don’t really understand cpack. But wherever cpack does stuff, the binary should be stripped. I suspect we also need to extract the debugging symbols for distribution separately.

seph

04/14/2023, 1:40 AM

But this is all deep in the cpack code. And I don’t know who other than the ToB folk really understand it

Stefano Bonicatti

04/14/2023, 9:04 AM

So CPack is mostly automatic, and you control it via options that you set in a CMakeLists.txt. There are global options and per-package options, and they are all documented: https://cmake.org/cmake/help/latest/module/CPack.html It takes control of the CMake install phase (it runs the install target) to install files into its own intermediate directory, and then runs (some of) the native packaging tools. Normally, when you’re using CMake only and not creating a package, stripping is controlled at install time. CPack takes control of that: stripping is disabled by default, and it lets the native tools deal with it, especially so they can split the debug info out.

Stefano Bonicatti

04/14/2023, 9:05 AM

For osquery specifically though we have split things. We first install via CMake only into a folder, whose files are used to create the packages in a second step and with a different CMake/CPack project (osquery-packaging)

Stefano Bonicatti

04/14/2023, 9:16 AM

Now the last tidbit is that CPack can only create one package per target defined in CMake. I don’t think it’s worth making all the changes to duplicate the targets or components inside osquery-packaging, and instead we should simply add an option that decides whether or not to strip when creating the TGZ. Which means that the CI will have to run cpack an additional time for a tgz without debug symbols

Stefano Bonicatti

04/14/2023, 9:17 AM

The logic that drives the packaging via CI is here: https://github.com/osquery/osquery/blob/e29abce93789f718aeb56fd5da3e94c6fee08073/.github/workflows/hosted_runners.yml#L433

Stefano Bonicatti

04/14/2023, 9:19 AM

As far as telling CPack to strip files, there’s the CPACK_STRIP_FILES variable to set, which is documented in the link I gave above

Stefano Bonicatti

04/14/2023, 9:23 AM

I gave a long explanation just to give more insight into how things work. It seems complicated but it’s not that bad: CPack is documented and behaves much like CMake (it’s automatic and you can control it via flags; no one remembers the options by heart, so the manual is there to help). And now that I’m writing this, I think you could even leave the osquery-packaging project untouched and just act on the CI, passing CPACK_STRIP_FILES via the command line, after adding a second run for a TGZ package.
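A rough Python sketch of what that second cpack run could look like follows. It assumes the build directory contains the usual generated CPackConfig.cmake and relies on cpack's -D option to override a variable from the command line. The helper name cpack_tgz_command is hypothetical, and whether CPACK_STRIP_FILES actually strips the binary in the osquery-packaging project is untested here.

```python
import shlex


def cpack_tgz_command(strip: bool, config_file: str = "CPackConfig.cmake") -> list[str]:
    """Build a cpack invocation for a TGZ package, optionally enabling stripping."""
    cmd = ["cpack", "--config", config_file, "-G", "TGZ"]
    if strip:
        # -D overrides the value the config file sets for this variable.
        cmd += ["-D", "CPACK_STRIP_FILES=ON"]
    return cmd


if __name__ == "__main__":
    # First run: archive that keeps the debug symbols (current behavior).
    print(shlex.join(cpack_tgz_command(strip=False)))
    # Second run: archive with stripped binaries.
    print(shlex.join(cpack_tgz_command(strip=True)))
```

In a CI workflow the two printed commands would simply be two consecutive steps run from the packaging build directory.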

Stefano Bonicatti

04/14/2023, 9:25 AM

From there the work is just letting the CI know where this new tgz is and upload it where necessary; the logic would be all in the osquery CI workflow

Stefano Bonicatti

04/14/2023, 9:38 AM

From there we have to take over in the codesign repo, to copy the new tgz coming from the osquery CI workflow run and upload it to the github release

Stefano Bonicatti

04/14/2023, 9:44 AM

https://osquery.slack.com/archives/CBLGAN1HD/p1681463866643789?thread_ts=1660787519.172489&cid=CBLGAN1HD Forgot the aarch64 workflow too (the logic there is a copy basically) https://github.com/osquery/osquery/blob/master/.github/workflows/self_hosted_runners.yml#L351

Stefano Bonicatti

04/14/2023, 9:57 AM

Another thing you will encounter is that the name of the generated archive is parametric, but the parameters are given via the CMakeLists.txt, so unless you change them it’s “hardcoded”. This means that a second run of the tgz packaging will overwrite the previous tgz. Here, depending on whether we want this logic to be usable outside of the CI, you can just rename the first archive in the workflow after you generate it, and then call the second tgz generation with stripping. Or, if we want this logic to be easily available just by passing an option to the osquery-packaging CMake, then we have to move this “if” from the CI workflow to the osquery-packaging CMake logic.
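The rename-before-second-run option could be sketched like this in Python. The archive name osquery-example.tar.gz and the _with_debug_symbols suffix are made up for illustration; the real archive name comes from the osquery-packaging CMakeLists.txt and is not reproduced here.

```python
import tempfile
from pathlib import Path


def keep_archive(archive: Path, suffix: str) -> Path:
    """Rename foo.tar.gz to foo<suffix>.tar.gz so a later cpack run cannot overwrite it."""
    ext = ".tar.gz"
    if not archive.name.endswith(ext):
        raise ValueError(f"expected a {ext} archive, got {archive.name}")
    target = archive.with_name(archive.name[: -len(ext)] + suffix + ext)
    archive.rename(target)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for the archive produced by the first (unstripped) run.
        first = Path(tmp) / "osquery-example.tar.gz"
        first.write_bytes(b"placeholder")
        kept = keep_archive(first, "_with_debug_symbols")
        print(kept.name)
        # The second, stripped cpack run can now write osquery-example.tar.gz again.
```

Doing the rename in the workflow keeps osquery-packaging untouched, at the cost of the logic only being available in CI.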
