Artem
11/14/2023, 7:25 PM
`process_open_sockets` and `listening_ports` tables on Linux load balancers with a high load of open sockets?
We have a number of servers acting as external load balancers that can have over 200,000 active TCP/UDP sockets at any time.
On these servers we cannot effectively use the tables mentioned above, because such queries often exceed the watchdog memory limit, even though we raised it to 400 megabytes. As a result, they get denylisted.
As far as I understand, at the C++ code level osquery first retrieves the full set of all sockets and then applies the specified filters to this set.
Perhaps there are some opportunities for optimization here.
Stefano Bonicatti
11/14/2023, 7:43 PM
`process_open_sockets` [...] the `pid` for instance).
There might be a couple of things that could be done, which all require code changes obviously.
One is that if you're not querying all the columns, the table can detect which columns are requested and return an empty value for the ones that are not.
The other, slightly more advanced, is to do some constraint filtering in the table itself.
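Both ideas can be sketched outside of osquery. The following is a minimal Python illustration, not osquery code: `gen_sockets`, `used_columns`, `pid_filter` and `resolve_path` are hypothetical names, and the `path` column is only a stand-in for whatever is costly to compute per socket.

```python
from typing import Iterable, Iterator, Optional, Set

# Hypothetical raw socket records; a real table would read these from the OS.
RAW_SOCKETS = [
    {"pid": 100, "local_port": 80, "remote_port": 0},
    {"pid": 100, "local_port": 80, "remote_port": 51000},
    {"pid": 200, "local_port": 53, "remote_port": 0},
]


def resolve_path(sock: dict) -> str:
    """Stand-in for an expensive per-row lookup (hypothetical)."""
    return f"path-for-{sock['pid']}"


def gen_sockets(
    raw: Iterable[dict],
    used_columns: Set[str],
    pid_filter: Optional[Set[int]] = None,
) -> Iterator[dict]:
    """Yield rows one at a time, applying both optimizations."""
    for sock in raw:
        # Constraint filtering: skip non-matching rows before building them.
        if pid_filter is not None and sock["pid"] not in pid_filter:
            continue
        row = {col: sock[col] for col in ("pid", "local_port", "remote_port")}
        # Column detection: only pay for the costly column when it is requested.
        row["path"] = resolve_path(sock) if "path" in used_columns else ""
        yield row


rows = list(gen_sockets(RAW_SOCKETS, used_columns={"pid", "local_port"}, pid_filter={200}))
print(rows)
```

With the `pid` constraint applied inside the generator, only one row is ever built here, and its `path` column stays empty because the query did not ask for it.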
But what I wonder here is what is occupying the majority of the memory.
With a 400MB limit (supposing it would stop there) and 200k sockets, that's 2KB of data generated per socket.
Looking at the table alone, that seems a bit high to be caused by the data presented there, so there are likely allocations in other places that are more closely tied to the number of rows returned (post filter).
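The per-socket figure is a simple division, reproduced here as a quick check. Whether the watchdog limit is counted in decimal or binary megabytes is an assumption either way, so both are shown.

```python
# Back-of-the-envelope memory budget per socket.
active_sockets = 200_000

# Assuming decimal megabytes (1 MB = 1,000,000 bytes).
watchdog_limit_bytes = 400 * 1000 * 1000
bytes_per_socket = watchdog_limit_bytes / active_sockets
print(bytes_per_socket)  # 2000.0

# Assuming binary megabytes (1 MiB = 1,048,576 bytes).
bytes_per_socket_mib = (400 * 1024 * 1024) / active_sockets
print(bytes_per_socket_mib)  # 2097.152
```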
Internally osquery has to transform the returned data to JSON and then store it in RocksDB if there's a buffered logger (TLS, Kinesis, Firehose, Azure...).
Stefano Bonicatti
11/14/2023, 7:51 PM
`listening_ports`: I think a good part of the issue is that it queries the other table.
I think we should avoid/remove these interactions, for this reason and others. The table could simply use a shared code implementation instead of going through SQLite to get the data, and could also filter one row at a time while collecting the socket data itself.
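A rough Python sketch of that shape, not the actual osquery implementation: `iter_sockets` is a hypothetical shared source of socket rows, and treating a socket with `remote_port` equal to 0 as listening is an assumption made only for the illustration.

```python
from typing import Iterator


def iter_sockets() -> Iterator[dict]:
    """Hypothetical shared source: yields one socket record at a time."""
    yield {"pid": 100, "protocol": 6, "local_port": 80, "remote_port": 0}
    yield {"pid": 100, "protocol": 6, "local_port": 80, "remote_port": 51000}
    yield {"pid": 200, "protocol": 17, "local_port": 53, "remote_port": 0}


def process_open_sockets() -> Iterator[dict]:
    """Every socket, streamed straight from the shared source."""
    yield from iter_sockets()


def listening_ports() -> Iterator[dict]:
    """Filter row by row from the shared source, never via the other table."""
    for sock in iter_sockets():
        # Assumption for this sketch: remote_port == 0 means "listening".
        if sock["remote_port"] == 0:
            yield {
                "pid": sock["pid"],
                "port": sock["local_port"],
                "protocol": sock["protocol"],
            }


listening = list(listening_ports())
print(listening)
```

Because both functions consume the same generator, the listening view never has to hold the full socket list in memory before discarding most of it.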
It would improve its performance and peak memory usage.
Artem
11/14/2023, 8:03 PM
Honestly, I tried to join the `processes` table with `process_open_sockets` using `pid`, but it didn't help on these hosts.
A simple query, `SELECT * FROM listening_ports`, also ended up denylisted after several attempts.
Stefano Bonicatti
11/14/2023, 8:07 PM
"Honestly, I tried to join processes table with the `process_open_sockets` [...] but it didn't help on these hosts."
This suggests that the number of rows post filter is still high.
Stefano Bonicatti
11/14/2023, 8:08 PM
"Simple query `SELECT * FROM listening_ports` also dropped into denylist after several attempts"
Yeah, as mentioned above, this is the same as first doing `SELECT * FROM process_open_sockets` (which is literally what it's doing here: https://github.com/osquery/osquery/blob/ac174deee3f7e902a7abc817c602550eada3c112/osquery/tables/networking/listening_ports.cpp#L25) and then filtering/generating the `listening_ports` data. So querying that table would use even more memory.