# ebpf
What's the status of container support for the bpf evented tables?
I'm slowly improving it, but I currently don't have as much time as I would like to work on it
but it's going to be an improvement both in CPU/memory usage and in having fewer limitations on how many parameters we can capture
more characters in the paths (e.g. working dir/binary path)
and then container names
I have a PoC but it's not yet ready for a PR
It is based on this library: https://github.com/trailofbits/btfparse
it lets us import kernel types dynamically from the /sys pseudo-dir
so it will always be up to date, and requires no dependencies on the system (like kernel header packages)
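For context, the raw data btfparse reads lives in /sys/kernel/btf/vmlinux and starts with the fixed `struct btf_header` from the kernel's `include/uapi/linux/btf.h`. Below is a minimal sketch of decoding just that header, to show what "importing types from /sys" starts from. This is not btfparse's API. `parse_btf_header` is a hypothetical helper, and it only checks the magic and computes where the type and string sections begin.

```python
import struct
from pathlib import Path

BTF_MAGIC = 0xEB9F
# struct btf_header: u16 magic, u8 version, u8 flags, then five u32 fields
_HEADER_FMT = "HBBIIIII"


def parse_btf_header(data: bytes) -> dict:
    """Decode the fixed BTF header at the start of a raw BTF blob (hypothetical helper)."""
    if len(data) < struct.calcsize("<" + _HEADER_FMT):
        raise ValueError("buffer too small for a BTF header")
    for order in ("<", ">"):  # BTF is stored in the kernel's native byte order
        magic, version, flags, hdr_len, type_off, type_len, str_off, str_len = (
            struct.unpack_from(order + _HEADER_FMT, data, 0)
        )
        if magic == BTF_MAGIC:
            return {
                "byte_order": "little" if order == "<" else "big",
                "version": version,
                "flags": flags,
                "hdr_len": hdr_len,
                # type/string offsets are relative to the end of the header
                "type_section": (hdr_len + type_off, type_len),
                "str_section": (hdr_len + str_off, str_len),
            }
    raise ValueError("not a BTF blob (bad magic)")


if __name__ == "__main__":
    vmlinux = Path("/sys/kernel/btf/vmlinux")
    if vmlinux.exists():
        with vmlinux.open("rb") as f:
            print(parse_btf_header(f.read(24)))
```

Everything after the header (the type records and the string table) is what a real parser like btfparse walks to rebuild kernel types.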
This updates the library and the execsnoop example. I am not entirely sure it is working, but I can see it is capturing the cgroup names:
```
timestamp: 9119873753834 thread_id: 13561 process_id: 13561 uid: 0 gid: 0 cgroup_id: 22467 exit_code: 0 probe_error: 0 duration: 224574 cgroup_name: system.slice
  execve(filename: /usr/lib/NetworkManager/dispatcher.d/20-chrony-onoffline, argv: { /usr/lib/NetworkManager/dispatcher.d/20-chrony-onoffline, podman0, down })
```
This is using the BTF-to-LLVM bridge class that I wrote as a test, which is not pretty
and the btfparse library seeks around a lot while parsing, and there is no cache in front of it, so it's slow
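The missing cache mentioned above is a common fix for seek-heavy parsers: put a block cache between the parser and the file so that repeated small reads are served from memory. Here is a minimal sketch, assuming the parser only needs a `read(offset, size)` primitive. The class name, the 4 KiB block size and the LRU policy are all hypothetical choices, not how btfparse is built.

```python
import io
from functools import lru_cache
from typing import BinaryIO


class BlockCachedReader:
    """Serves random-access reads from cached fixed-size blocks of a seekable file."""

    def __init__(self, f: BinaryIO, block_size: int = 4096, max_blocks: int = 256):
        self._f = f
        self._block_size = block_size
        self.disk_reads = 0  # counts real seek+read calls, for illustration
        # per-instance LRU cache of block index -> block bytes
        self._load = lru_cache(maxsize=max_blocks)(self._load_block)

    def _load_block(self, index: int) -> bytes:
        self.disk_reads += 1
        self._f.seek(index * self._block_size)
        return self._f.read(self._block_size)

    def read(self, offset: int, size: int) -> bytes:
        out = bytearray()
        while size > 0:
            index, start = divmod(offset, self._block_size)
            chunk = self._load(index)[start:start + size]
            if not chunk:  # past end of file
                break
            out += chunk
            offset += len(chunk)
            size -= len(chunk)
        return bytes(out)


if __name__ == "__main__":
    blob = bytes(range(256)) * 64
    reader = BlockCachedReader(io.BytesIO(blob))
    for _ in range(1000):
        reader.read(4000, 200)  # spans blocks 0 and 1
    print("real reads:", reader.disk_reads)  # 2: every repeat is served from the cache
```

For a file the size of vmlinux BTF (a few MB), simply reading it into memory once would be an even simpler alternative.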
cc @Artemis Tosini
Thanks! I need to do some setup but I'll try to get this running today to test
I'm building it this way:
1. I've placed the osquery-toolchain in /opt/osquery-toolchain
2. Set the TOOLCHAIN_PATH env var to point to it
3. Configured with:
   ```
   cmake -S . -B build -DCMAKE_TOOLCHAIN_FILE=cmake/toolchain.cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug -DEBPFPUB_BUILD_EXAMPLES=true
   ```
4. To build and test, I am using:
   ```
   cmake --build build && sudo ./build/examples/execsnoop/execsnoop
   ```
I'm getting cgroup names but for some reason it's not getting all processes so I only get
Not sure why, AArch64 Linux still only has execve and execveat
ah I see, I guess we'll have to debug it! thanks for trying it out
So it is working, but only on Ubuntu
Here's a recording on my Ubuntu system
This is not ideal, I will try to debug this further
What distro and container tech was used to perform the test? I wonder if I can easily replicate it locally
Were other breakages noticed?
I was using Ubuntu Server AArch64 and Docker; I didn't see any other breakages
I should have more time for this after work, so I'll try it again on different distros; I'll focus on Intel first then look at AArch64
Sorry for the delay! I wonder if I am reading the wrong field here
going to try with
instead of
oooh it seems to have fixed it here on my Fedora VM!
```
timestamp: 375109372265 thread_id: 2705 process_id: 2705 uid: 0 gid: 0 cgroup_id: 8574 exit_code: 0 probe_error: 0 duration: 201273 cgroup_name: libpod-1aa46f32bbc6e946c757359f
  execve(filename: /usr/bin/date, argv: { date })
```
I'm using something weird here, it's not docker but a replacement that is auto-suggested by Fedora
Seems to be called "podman-docker"
I'll commit the change and try it again on Ubuntu with docker CE
I pushed the change!
I'm in a meeting but as soon as I'm done, I'll try this out on Ubuntu x86
Now I'm not getting any cgroup name
```
timestamp: 433808650127 thread_id: 1994 process_id: 1994 uid: 0 gid: 0 cgroup_id: 4815 exit_code: 18446744073709551614 probe_error: 0 duration: 9875 cgroup_name: 
  execve(filename: /usr/local/sbin/gzip, argv: { gzip, -d })
```
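Side note on that output: exit_code 18446744073709551614 is 2^64 - 2, i.e. -2 stored in an unsigned 64-bit field, which is -ENOENT. That is probably just a PATH lookup miss for /usr/local/sbin/gzip rather than a probe problem. Below is a small helper for making such values readable. `decode_exit_code` is hypothetical and not part of ebpfpub.

```python
import errno
import os


def decode_exit_code(raw: int) -> str:
    """Reinterpret a u64 syscall return value as signed and name the errno, if any."""
    value = raw - (1 << 64) if raw >= (1 << 63) else raw
    if value < 0 and -value in errno.errorcode:
        return f"{value} (-{errno.errorcode[-value]}: {os.strerror(-value)})"
    return str(value)


if __name__ == "__main__":
    print(decode_exit_code(18446744073709551614))
```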
it seems like it's in two different places depending on the system
I've got to reinstall my Ubuntu VM because, for some reason, both of the ones I have just return 404 for most of the repositories
but I was able to try the fix on another Ubuntu 21.10 system
with the fix, Fedora 36 works but Ubuntu 21.10 breaks
without the fix, it's the opposite
I guess this isn't part of the "don't break userspace" rule
Maybe it's my code that sucks and has something broken or a wrong assumption
I'm reinstalling my VM; maybe 21.10 was not supported and they both died? How weird
journald has a way of doing this and it might be possible to do what they do, though I have another bug to work on today
Okay, here's what they do (https://github.com/systemd/systemd/blob/main/src/basic/cgroup-util.c#L700): it seems like they're reading it out of /proc, which at least works
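Per the cgroups(7) man page, each line of /proc/<pid>/cgroup has the form `hierarchy-ID:controller-list:cgroup-path`, and on a cgroup v2 (unified) system there is a single `0::/path` line. Below is a minimal userspace sketch of reading it in the same spirit as the systemd code linked above. That code is C, and the Python helper names here are hypothetical.

```python
from pathlib import Path
from typing import List, NamedTuple, Optional


class CgroupEntry(NamedTuple):
    hierarchy_id: int
    controllers: List[str]
    path: str


def parse_proc_cgroup(text: str) -> List[CgroupEntry]:
    """Parse /proc/<pid>/cgroup: one 'hierarchy-ID:controller-list:cgroup-path' per line."""
    entries = []
    for line in text.splitlines():
        if not line:
            continue
        # maxsplit=2 because the path itself may contain ':'
        hid, controllers, path = line.split(":", 2)
        entries.append(CgroupEntry(int(hid), [c for c in controllers.split(",") if c], path))
    return entries


def unified_path(entries: List[CgroupEntry]) -> Optional[str]:
    """Return the cgroup v2 (unified hierarchy) path, i.e. the '0::' entry, if present."""
    for e in entries:
        if e.hierarchy_id == 0 and not e.controllers:
            return e.path
    return None


if __name__ == "__main__":
    proc = Path("/proc/self/cgroup")
    if proc.exists():
        print(unified_path(parse_proc_cgroup(proc.read_text())))
```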
super interesting
this helped a lot!
we could get the whole tree
on my Fedora VM:
```
cat /proc/2002/cgroup
```
on ubuntu
```
cat 10401/cgroup
```
Yeah, that also worked on my Ubuntu ARM VM
so it probably has to do with the backend used
I could just take the whole tree instead of just one entry
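Taking the whole tree means keeping every node of the cgroup path instead of only the leaf. A podman process typically sits somewhere like `machine.slice/libpod-<id>.scope/container`, while a Docker one sits at `system.slice/docker-<id>.scope`. Those layouts are assumptions and vary with the cgroup driver, which is why the leaf alone is misleading on one of the two setups. Below is a small sketch with hypothetical helper names that splits a path into its nodes and picks the leaf and its parent.

```python
from typing import List, Tuple


def cgroup_ancestry(path: str) -> List[str]:
    """Split a cgroup path like '/a/b/c' into its node names, root to leaf."""
    return [part for part in path.split("/") if part]


def parent_and_leaf(path: str) -> Tuple[str, str]:
    """Return (parent, leaf) node names; either may be '' for shallow paths."""
    nodes = cgroup_ancestry(path)
    leaf = nodes[-1] if nodes else ""
    parent = nodes[-2] if len(nodes) >= 2 else ""
    return parent, leaf
```

With the podman layout the container ID lives in the parent, and with the Docker layout it lives in the leaf, so keeping both covers both cases.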
I'm guessing Fedora gave you podman
Yes, it seems like it's a docker CLI emulator
I (force) pushed a new commit
It is now capturing a slice of both the current and parent kernfs nodes
So we get something like this:
```
timestamp: 2834861341276 thread_id: 3305 process_id: 3305 uid: 0 gid: 0 cgroup_id: 8997 exit_code: 0 probe_error: 0 duration: 167751
cgroup_name: libpod-46e9fc7dd73d128e58f34561, container

  execve(filename: /usr/bin/date, argv: { date })
```
On ubuntu + docker, it will look like this
```
timestamp: 56308112557 thread_id: 2617 process_id: 2617 uid: 0 gid: 0 cgroup_id: 8271 exit_code: 0 probe_error: 0 duration: 182578
cgroup_name: system.slice, docker-456ab67280a837f4f36514c8

  execve(filename: /usr/bin/date, argv: { date })
```
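The names in both outputs look capped at 31 characters: `libpod-` or `docker-` plus 24 hex digits, with no `.scope` suffix. That would be consistent with two fixed 32-byte, NUL-terminated buffers per event (parent, then current node), 64 bytes in total. The buffer size is inferred from the output, not taken from the code. The sketch below shows such a layout and how userspace would turn it back into the `parent, current` string. The helper names and `NAME_LEN` are hypothetical.

```python
import struct

NAME_LEN = 32  # assumed per-node buffer size (31 chars + NUL), inferred from the output
_LAYOUT = f"{NAME_LEN}s{NAME_LEN}s"  # parent name, then current node name: 64 bytes


def pack_names(parent: str, current: str) -> bytes:
    """Emulate the kernel side: copy each name truncated to NAME_LEN-1 bytes plus a NUL."""
    def fixed(name: str) -> bytes:
        return name.encode()[: NAME_LEN - 1] + b"\0"
    return struct.pack(_LAYOUT, fixed(parent), fixed(current))


def unpack_names(buf: bytes) -> str:
    """Userspace side: split the 64-byte blob and format it as 'parent, current'."""
    parent, current = struct.unpack(_LAYOUT, buf)
    return ", ".join(n.split(b"\0", 1)[0].decode(errors="replace") for n in (parent, current))
```

If the full container ID is needed, the buffers would have to grow, which makes the per-event cost larger still.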
We could switch one of the probes (maybe the execve/execveat ones?) to capture cgroup names, and update how the system state tracker works so that it propagates them,
so that we don't add the additional 64 bytes to every possible event
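One way to read that proposal is that only exec events carry the 64-byte cgroup field. The system state tracker remembers the name per process, hands it to children on fork, and attaches it to any later event from that PID. Below is a minimal sketch with hypothetical names; it is not osquery's actual tracker API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ProcessStateTracker:
    """Hypothetical tracker: cgroup name is learned from exec events and inherited on fork."""
    cgroup_by_pid: Dict[int, str] = field(default_factory=dict)

    def on_exec(self, pid: int, cgroup_name: str) -> None:
        # Only execve/execveat events carry the (expensive) cgroup field.
        self.cgroup_by_pid[pid] = cgroup_name

    def on_fork(self, parent_pid: int, child_pid: int) -> None:
        # A forked child starts in its parent's cgroup, so reuse the parent's name.
        name = self.cgroup_by_pid.get(parent_pid)
        if name is not None:
            self.cgroup_by_pid[child_pid] = name

    def on_exit(self, pid: int) -> None:
        self.cgroup_by_pid.pop(pid, None)

    def cgroup_for(self, pid: int) -> Optional[str]:
        # Any other event (open, connect, ...) just looks the name up.
        return self.cgroup_by_pid.get(pid)
```

The trade-off is that a process moved to another cgroup without exec'ing afterward keeps a stale name. Processes that started before the tracker have no name until their next exec.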