# linux
Trying to dig back into the history of when the Linux osqueryd binary increased in size. Anyone recall why it increased so dramatically, from 44 MB to 196 MB?
5.0 was a major effort to redo packaging and releasing, and to get us closer to reproducible builds. Perhaps we now include debug symbols by default? Or forgot to strip them.
I will have to check this later when I am at a computer
And @Stefano Bonicatti might also have a clue
That's the tarball? It's not stripped. IMO it's a packaging bug. You can strip it, or extract the binary from the rpm or deb.
This was the issue you opened. I wouldn't call it a bug? Providing the debug symbols, so that one can split them out and keep them, is useful. That being said, we can do that job ourselves by creating 2 archives: one with the stripped binary, and the other with just the debug symbols. They can also be merged back with a tool called
OK, so the osqueryd binary in the tarball is being generated with debug symbols but the one in the rpm isn't? 200 MB in the tarball vs. 49 MB in the rpm.
This came up because there is a group I am working with that is deploying osquery and their deployment package is massive. Looks like they are using the tarball as the source.
Yeah, the rpm isn't, because it splits the debug symbols out into a different package.
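For readers who want to do the same split by hand on the tarball binary, here is a minimal sketch using the standard GNU binutils recipe. It is not taken from the osquery packaging code; the function name `split_debug` and the `.debug` suffix are illustrative choices, and it assumes `objcopy` and `strip` are installed.

```shell
# Split debug info out of an ELF binary (assumes GNU binutils).
# split_debug and the ".debug" suffix are hypothetical names for illustration.
split_debug() {
  bin="$1"
  # 1. Copy only the debug sections into a side file.
  objcopy --only-keep-debug "$bin" "$bin.debug" &&
  # 2. Remove the debug sections from the binary itself.
  strip --strip-debug "$bin" &&
  # 3. Record a link so debuggers can locate the side file later.
  objcopy --add-gnu-debuglink="$bin.debug" "$bin"
}
```

Calling `split_debug ./osqueryd` on a copy of the binary should leave a smaller `osqueryd` plus an `osqueryd.debug` file next to it; work on a copy, since the binary is modified in place.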
I consider this a bug, because I think most use cases do not want the debug symbols. So the easy paths should be the stripped versions.
Well this requires a change in
and things tend to get bottlenecked there
Does it? I think it's mostly something in the cpack code, which is in the osquery repo.
There should be no problem on the codesign side, since Linux binaries are not signed
There should also be some minimal logic to upload the debug symbols separately in the codesign repo, but again that should be straightforward.
ah, okay, then anyone who has time to make the PR to
can help, sounds like it can be done in a GitHub Actions workflow
If someone gives me a few pointers, I can take a crack at this. It's become much more important for a project I'm working on to get this file size reduced 🙂
I'd certainly appreciate it. I don't really understand cpack. But wherever cpack does stuff, the binary should be stripped. I suspect we also need to extract the debugging symbols for distribution separately.
But this is all deep in the cpack code. And I don’t know who other than the ToB folk really understand it
So CPack is mostly automatic, and you control it via options that you set in a CMakeLists.txt. There are global options and per-package options, and they are all documented. It takes control of the CMake install phase (it runs the
target) to install files in its own intermediate directory, and then runs (some of) the native packaging tools. Normally stripping would be controlled at install time, when you're using CMake only and not creating a package. CPack takes control of that: stripping is disabled by default, and the native tools are left to deal with it, especially so they can split the debug info out.
For osquery specifically though we have split things. We first install via CMake only into a folder, whose files are used to create the packages in a second step and with a different CMake/CPack project (osquery-packaging)
Now the last tidbit is that CPack can only create one package per target defined in CMake. I don’t think it’s worth making all the changes to duplicate the targets or components inside
, and instead we should simply add an option that decides whether or not to strip when creating the TGZ. That means the CI will have to run cpack an additional time to produce a tgz without debug symbols.
As far as telling CPack to strip files, there’s the
variable to set, which is documented in the link I gave above
I gave a long explanation just to give more insight into how things work. It seems complicated but it's not that much, and CPack is documented and behaves much like CMake (it's automatic and you can control it via flags; no one remembers the options by heart, so the manual is there to help). And now that I'm writing this, I think you could even not touch the
project and just act on the CI, passing
via command line, after adding a second run for a TGZ package.
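A minimal sketch of what that extra CI step could look like. It assumes the variable meant above is CPack's documented `CPACK_STRIP_FILES`, that the packaging project has already been configured in a build directory (the `build` default here is hypothetical), and it relies on cpack's `-G` and `-D` command-line options. The function `run_tgz` and the `DRY_RUN` switch are illustrative names; by default it only prints the command so it can be checked before running for real.

```shell
# Run (or just print) a cpack TGZ invocation with stripping on or off.
# Assumptions: CPACK_STRIP_FILES is the stripping switch; BUILD_DIR and
# DRY_RUN are hypothetical knobs added for this sketch.
run_tgz() {
  strip_flag="$1"   # ON or OFF
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Dry run: show the command without needing cpack installed.
    echo "cpack -G TGZ -D CPACK_STRIP_FILES=${strip_flag}"
  else
    # Real run: execute inside the configured packaging build directory.
    ( cd "${BUILD_DIR:-build}" && cpack -G TGZ -D "CPACK_STRIP_FILES=${strip_flag}" )
  fi
}
```

In a workflow this would be called once as `run_tgz OFF` for the archive with symbols and once as `run_tgz ON` for the stripped one, with `DRY_RUN=0` set. According to the cpack manual, a value passed with `-D` overrides the one in the generated config file, but that is worth verifying against the actual packaging project.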
From there the work is just letting the CI know where this new tgz is and upload it where necessary; the logic would be all in the
CI workflow
From there we have to take over in the codesign repo, to copy the new tgz coming from the
CI workflow run and upload it to the github release
Another thing you will encounter is that the name of the generated archive is parametric, but the parameters are given via the CMakeLists.txt, so unless you change them the name is effectively hardcoded. This means that a second run of the tgz packaging will overwrite the previous tgz. Here, depending on whether we want this logic to be usable outside of the CI, you can just rename the first archive in the workflow after you generate it, and then call the second tgz generation with the stripping. Or if we want to have this logic easily available just by passing an option to the
CMake, then we have to move this
, from the CI workflow to the
CMake logic.
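The rename-in-the-workflow option could be sketched roughly like this. The `_debug` suffix and both function names are hypothetical; the only thing taken from CPack itself is that the TGZ generator writes `.tar.gz` files.

```shell
# Derive the name under which the first (unstripped) archive is kept.
# The "_debug" suffix is a hypothetical naming choice.
debug_archive_name() {
  printf '%s\n' "${1%.tar.gz}_debug.tar.gz"
}

# Rename every .tar.gz in a directory so that a second cpack run,
# which reuses the same output name, cannot overwrite it.
preserve_archives() {
  dir="$1"
  for f in "$dir"/*.tar.gz; do
    [ -e "$f" ] || continue                        # no archives found
    case "$f" in *_debug.tar.gz) continue ;; esac  # already renamed
    mv "$f" "$(debug_archive_name "$f")"
  done
}
```

The workflow would then be: generate the first TGZ, call `preserve_archives` on the output directory, and generate the second TGZ with stripping enabled. Moving this into the CMake logic instead would mean making the archive name depend on the strip option there.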