# core
a
Hi all! Please tell me, has anyone encountered the inability to use the `process_open_sockets` and `listening_ports` tables on Linux load balancers with a very high number of open sockets? We have a number of servers acting as external load balancers that can have over 200,000 active TCP/UDP sockets at any time. On these servers we cannot effectively use the tables described above, because such queries often exceed the watchdog memory limit, even though we raised it to 400 megabytes. As a result, the queries got denylisted. As far as I understand, at the C++ code level osquery first builds the full set of all sockets and only then applies the specified filters to that set. Perhaps there are some opportunities for optimization here.
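(For context, the limit above is the stock `--watchdog_memory_limit` flag, which takes a value in megabytes; roughly this in our flagfile:)

```
# osquery flagfile excerpt: raise the watchdog memory limit to 400 MB
--watchdog_memory_limit=400
```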
s
That seems like quite a lot of connections. The fact is that almost all tables return all their data internally when queried, unless you're querying with a constraint on a column that's indexed/optimized (for `process_open_sockets`, the `pid` column, for instance). There are a couple of things that could be done, all of which obviously require code changes. One is that if you're not querying all the columns, the table could detect which columns are requested and return empty values for the ones that aren't. The other, slightly more advanced, is to do some constraint filtering in the table implementation itself. But what I wonder here is what is occupying the majority of the memory. For a 400MB limit (supposing it would stop there) with 200k sockets, that's 2KB of data generated per socket. Looking at the table alone that seems a bit high to be caused by the data presented there, so there are likely allocations in other places that are more tied to the number of rows returned (post-filter). Internally osquery has to transform the returned data to JSON and then store it in RocksDB if there's a buffered logger (TLS, Kinesis, Firehose, Azure...).
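To make the optimized path concrete, a query shaped like the sketch below (the pid value is just a placeholder) lets the table generate sockets only for that one process instead of all 200k:

```sql
-- The pid EQUALS constraint is pushed down into the table
-- implementation, so only that process's sockets are generated
-- rather than the full 200k-row set.
SELECT pid, family, protocol, local_address, local_port
FROM process_open_sockets
WHERE pid = 1234;
```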
For `listening_ports`, I think a good part of the issue is that it queries the other table. I think we should avoid/remove these cross-table interactions, for these reasons and others. The table could simply use a shared code implementation instead of going through sqlite to get the data, and it could filter one row at a time while collecting the socket data itself. That would improve both its performance and its peak memory usage.
👍 1
a
> you're querying with a constraint on a column that's indexed/optimized (for `process_open_sockets`, the `pid` column, for instance)

Honestly, I tried to join the `processes` table with `process_open_sockets` using pid, but it didn't help on these hosts. A simple query `SELECT * FROM listening_ports` also dropped into the denylist after several attempts.
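The join I tried was roughly this shape (the column list here is illustrative):

```sql
-- Join processes to process_open_sockets on pid, hoping the
-- pid constraint gets pushed down into the sockets table.
SELECT p.pid, p.name, s.local_address, s.local_port
FROM processes AS p
JOIN process_open_sockets AS s ON s.pid = p.pid;
```

I assume that since `processes` still returns every pid, the sockets table ends up generating rows for all of them anyway.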
s
> Honestly, I tried to join the `processes` table with `process_open_sockets` using pid, but it didn't help on these hosts.

This suggests that the number of rows post-filter is still high.

> A simple query `SELECT * FROM listening_ports` also dropped into the denylist after several attempts.

Yeah, as mentioned above, this is the same as first doing `SELECT * FROM process_open_sockets` (which is literally what it's doing here: https://github.com/osquery/osquery/blob/ac174deee3f7e902a7abc817c602550eada3c112/osquery/tables/networking/listening_ports.cpp#L25) and then filtering/generating the `listening_ports` data. So querying that table would use even more memory.
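If you only need listening sockets, querying `process_open_sockets` directly with the equivalent filter should be the cheaper of the two, since only one table's rows get materialized. A sketch, assuming the `remote_port = 0` check from the linked code and ignoring the AF_UNIX special case it also handles:

```sql
-- Listening TCP/UDP sockets have remote_port = 0. Querying
-- process_open_sockets directly avoids listening_ports' second
-- copy of the data, though all sockets are still enumerated.
SELECT pid, family, protocol, local_address AS address, local_port AS port
FROM process_open_sockets
WHERE remote_port = 0;
```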
👍 1