# core
a
Hi all! Please tell me, has anyone encountered the inability to use the `process_open_sockets` and `listening_ports` tables on Linux load balancers with a very high number of open sockets? We have a number of servers acting as external load balancers that can have over 200,000 active TCP/UDP sockets at any time. On these servers we cannot effectively use the tables described above, because such queries often exceed the watchdog memory limit, even though we raised it to 400 megabytes. As a result, the queries got denylisted. As far as I understand, at the C++ code level osquery first builds the full set of all sockets and only then applies the specified filters to that set. Perhaps there are some opportunities for optimization here.
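(For context, the limit above is the stock `--watchdog_memory_limit` flag, which takes a value in megabytes; roughly this in our flagfile:)

```
# osquery flagfile excerpt: raise the watchdog memory limit to 400 MB
--watchdog_memory_limit=400
```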
s
That seems like quite a lot of connections. The fact is that almost all tables return all their data internally when queried, unless you're querying with a constraint on a column that's indexed/optimized (for `process_open_sockets`, the `pid` column, for instance). There are a couple of things that could be done, all of which obviously require code changes. One is that if you're not querying all the columns, the table could detect which columns are requested and return empty values for the ones that aren't. The other, slightly more advanced, is to do some constraint filtering in the table implementation itself. But what I wonder here is what is occupying the majority of the memory. For a 400MB limit (supposing it would stop there) with 200k sockets, that's 2KB of data generated per socket. Looking at the table alone that seems a bit high to be caused by the data presented there, so there are likely allocations in other places that are more tied to the number of rows returned (post-filter). Internally osquery has to transform the returned data to JSON and then store it in RocksDB if there's a buffered logger (TLS, Kinesis, Firehose, Azure...).
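To make the optimized path concrete, a query shaped like the sketch below (the pid value is just a placeholder) lets the table generate sockets only for that one process instead of all 200k:

```sql
-- The pid EQUALS constraint is pushed down into the table
-- implementation, so only that process's sockets are generated
-- rather than the full 200k-row set.
SELECT pid, family, protocol, local_address, local_port
FROM process_open_sockets
WHERE pid = 1234;
```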
For `listening_ports`, I think a good part of the issue is that it queries the other table. I think we should avoid/remove these cross-table interactions, for these reasons and others. The table could simply use a shared code implementation instead of going through sqlite to get the data, and it could filter one row at a time while collecting the socket data itself. That would improve both its performance and its peak memory usage.
👍 1
a
> you're querying with a constraint on a column that's indexed/optimized (for `process_open_sockets`, the `pid` column, for instance)

Honestly, I tried to join the `processes` table with `process_open_sockets` using pid, but it didn't help on these hosts. A simple query `SELECT * FROM listening_ports` also dropped into the denylist after several attempts.
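The join I tried was roughly this shape (the column list here is illustrative):

```sql
-- Join processes to process_open_sockets on pid, hoping the
-- pid constraint gets pushed down into the sockets table.
SELECT p.pid, p.name, s.local_address, s.local_port
FROM processes AS p
JOIN process_open_sockets AS s ON s.pid = p.pid;
```

I assume that since `processes` still returns every pid, the sockets table ends up generating rows for all of them anyway.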
s
> Honestly, I tried to join the `processes` table with `process_open_sockets` using pid, but it didn't help on these hosts.

This suggests that the number of rows post-filter is still high.

> A simple query `SELECT * FROM listening_ports` also dropped into the denylist after several attempts.

Yeah, as mentioned above, this is the same as first doing `SELECT * FROM process_open_sockets` (which is literally what it's doing here: https://github.com/osquery/osquery/blob/ac174deee3f7e902a7abc817c602550eada3c112/osquery/tables/networking/listening_ports.cpp#L25) and then filtering/generating the `listening_ports` data. So querying that table would use even more memory.
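If you only need listening sockets, querying `process_open_sockets` directly with the equivalent filter should be the cheaper of the two, since only one table's rows get materialized. A sketch, assuming the `remote_port = 0` check from the linked code and ignoring the AF_UNIX special case it also handles:

```sql
-- Listening TCP/UDP sockets have remote_port = 0. Querying
-- process_open_sockets directly avoids listening_ports' second
-- copy of the data, though all sockets are still enumerated.
SELECT pid, family, protocol, local_address AS address, local_port AS port
FROM process_open_sockets
WHERE remote_port = 0;
```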
👍 1