# general
n
@钢铁侠, I have put some comments in https://github.com/facebook/osquery/issues/5310, which relate to the pull request above.
u
how about this:
I think this is why osqueryd can't expire these old logs: it simply can't find the data entries once the records entries are gone.
osquery removes
records.auditeventpublisher.user_events
but not
data.auditeventpublisher.user_events
@npamnani
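For context, a minimal sketch of how the two key families mentioned above seem to fit together, inferred from the scans and the source quoted later in this thread; the helper names are made up, not osquery APIs:

#include <iostream>
#include <string>

// "ns" here means "<publisher>.<subscriber>", e.g. "auditeventpublisher.user_events".

// records.<ns>.<list_type>.<index>
//   -> value is a comma-joined list of "eid:timestamp" pairs
std::string recordsKey(const std::string& ns,
                       const std::string& list_type,
                       const std::string& index) {
  return "records." + ns + "." + list_type + "." + index;
}

// data.<ns>.<eid> -> value is the serialized event row itself
std::string dataKey(const std::string& ns, const std::string& eid) {
  return "data." + ns + "." + eid;
}

int main() {
  std::cout << recordsKey("auditeventpublisher.user_events", "60", "25812380") << "\n";
  std::cout << dataKey("auditeventpublisher.user_events", "0020739730") << "\n";
  return 0;
}

If expiration only learns about eids from the records lists (as the expireRecords() source later in the thread suggests), then data rows whose records entries are already gone have nothing pointing at them, which is the mismatch being described here.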
n
Did you try the ldb command?
'./sst_dump --file=/var/osquery/osquery.db/ --command=scan | grep socket_events | wc -l' will give a lot more rows than are available in the events column_family
void EventSubscriberPlugin::expireCheck() {
  auto data_key = "data." + dbNamespace();
  auto eid_key = "eid." + dbNamespace();
  // Min key will be the last surviving key.
  size_t threshold_key = 0;
  {
    auto limit = getEventsMax();
    std::vector<std::string> keys;
    scanDatabaseKeys(kEvents, keys, data_key);
    if (keys.size() <= limit) {
      return;
    }
    // There is an overflow of events buffered for this subscriber.
    LOG(WARNING) << "Expiring events for subscriber: " << getName()
                 << " (overflowed limit " << limit << ")";
    VLOG(1) << "Subscriber events " << getName() << " exceeded limit " << limit
            << " by: " << keys.size() - limit;
it is only counting how many data rows are available; the expiring comes later
u
I use
ldb --db=<path to db> scan --column_family="events"
and it returns nothing.
n
try '--column_family="queries" '
u
the size of my osquery.db has reached 1.5GB and there are 753 sst files in the osquery.db directory. I thought the sst file numbers should be contiguous, but I found the highest-numbered sst file is 017854.sst and the lowest is 000021.sst. I think all the low-numbered sst files should have been removed, but they are not. This is what I got:
when I use
'--column_family="queries"'
it has results.
when I run the following commands:
$ ./sst_dump --file=/tmp/osquery.db/000021.sst --command=scan | grep record.auditeventpublisher.process_events
$ ./sst_dump --file=/tmp/osquery.db/000021.sst --command=scan | grep data.auditeventpublisher.process_events | wc -l
4844
it shows that there is no
record.auditeventpublisher.process_events
but there are 4844
data.auditeventpublisher.process_events
entries. This is
000021.sst
which is the oldest sst file.
@npamnani
I think the root cause is that osquery only deletes the records.auditeventpublisher.socket_events keys but not the data.auditeventpublisher.socket_events keys, and it does not remove the old sst files in time.
n
@钢铁侠, scan the entire db for data.auditeventpublisher.socket_events
Without knowing the internals of the DB we can't say "maybe the names of the sst files should be contiguous"
what is the time stamp of this file
last modification time
u
-rw-r--r--. 1 user user 2138309 Jan 29 14:24 017854.sst
-rw-r--r--.  1 user user 2136467 Jan 29 01:05 008594.sst
-rw-r--r--.  1 user user 2138352 Jan 29 01:05 008595.sst
-rw-r--r--.  1 user user 2139375 Jan 29 01:05 008596.sst
-rw-r--r--.  1 user user 2136993 Jan 29 01:05 008597.sst
-rw-r--r--.  1 user user    3278 Jan 29 01:12 008676.sst
-rw-r--r--.  1 user user 2140182 Jan 29 01:33 008923.sst
-rw-r--r--.  1 user user 2137903 Jan 29 02:05 009248.sst
-rw-r--r--.  1 user user 2139931 Jan 29 02:05 009249.sst
-rw-r--r--.  1 user user 2137200 Jan 29 02:05 009250.sst
-rw-r--r--.  1 user user 2139308 Jan 29 05:54 012010.sst
-rw-r--r--.  1 user user 2140123 Jan 29 05:54 012011.sst
-rw-r--r--.  1 user user 2137179 Jan 29 05:54 012012.sst
-rw-r--r--.  1 user user 2138895 Jan 29 05:54 012013.sst
-rw-r--r--.  1 user user    4779 Jan 29 05:54 012014.sst
-rw-r--r--.  1 user user   59646 Jan 29 07:22 012963.sst
-rw-r--r--.  1 user user  147760 Jan 29 08:33 013792.sst
-rw-r--r--.  1 user user 2136801 Jan 29 11:48 016022.sst
you can see that sst files have been removed between
02:05 009250.sst
and
05:54 012010.sst
n
what does the scan say?
u
$ ./sst_dump --file=/tmp/osquery.db/ --command=scan | grep data.auditeventpublisher.socket_events | wc -l
42
got 42 records
n
not much of the data available
u
$ ./sst_dump --file=/tmp/osquery.db/ --command=scan | grep data.auditeventpublisher.process_events | wc -l
3433390
there are a lot of process_events
n
did you try --column_family="queries"
then grep 'data.auditeventpublisher.process_events '
u
./ldb --db=/tmp/osquery.db scan --column_family=queries | grep process_events | wc -l

51480
./ldb --db=/tmp/osquery.db scan --column_family=queries | grep data.auditeventpublisher.process_events | wc -l

51460
n
./ldb --db=/tmp/osquery.db scan --column_family=queries | grep records.auditeventpublisher.process_events
What is its output
u
the 20 missing records are the
records.auditeventpublisher.process_events
n
?
u
the result of scanning process_events is 51480, but the result of scanning data.auditeventpublisher.process_events is 51460
the 20 missing records are all
records.auditeventpublisher.process_events
entries, which I found when I compared the
process_events
results with the
records.auditeventpublisher.process_events
results
n
did not understand
20718897 is the last count
u
./ldb --db=/tmp/osquery.db scan --column_family=queries | grep process_events | wc -l
has 20 more records than
./ldb --db=/tmp/osquery.db scan --column_family=queries | grep data.auditeventpublisher.process_events | wc -l
n
ok got it
u
I wanted to know what the 20 records are. Finally I found that 16 of them are
records.auditeventpublisher.process_events
entries, and the last 4 are:
eid.auditeventpublisher.process_events : 0020739730
indexes.auditeventpublisher.process_events.60 : 25812378,25812379,25812380,25812381,25812382
optimize.process_events : 1548743012
optimize_eid.process_events : 0020728620
n
you have to count the output of './ldb --db=/tmp/osquery.db scan --column_family=queries | grep records.auditeventpublisher.process_events'
each entry has the form <rec num>:timestamp
compare it with the output './ldb --db=/tmp/osquery.db scan --column_family=queries | grep data.auditeventpublisher.process_events '
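A rough sketch of that comparison, assuming records values are comma-joined "eid:timestamp" pairs and data keys end in the eid (as the scans above suggest); the helper and sample values are made up for illustration:

#include <iostream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Split a records value such as "0020688271:1548742845,0020688272:1548742846"
// into its eids.
std::set<std::string> eidsFromRecordsValue(const std::string& value) {
  std::set<std::string> eids;
  std::stringstream ss(value);
  std::string pair;
  while (std::getline(ss, pair, ',')) {
    auto colon = pair.find(':');
    if (colon != std::string::npos) {
      eids.insert(pair.substr(0, colon));
    }
  }
  return eids;
}

int main() {
  // Toy stand-ins for the two scans: the records values, and the eids that
  // actually have a data.<namespace>.<eid> key.
  std::vector<std::string> records_values = {
      "0020688271:1548742845,0020688272:1548742846"};
  std::set<std::string> data_eids = {"0020688271", "0011111111"};

  std::set<std::string> record_eids;
  for (const auto& v : records_values) {
    auto eids = eidsFromRecordsValue(v);
    record_eids.insert(eids.begin(), eids.end());
  }

  // Data rows whose eids never appear in any records value are the ones the
  // expiration pass can never find.
  for (const auto& eid : data_eids) {
    if (record_eids.count(eid) == 0) {
      std::cout << "orphaned data eid: " << eid << "\n";
    }
  }
  return 0;
}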
u
./ldb --db=/tmp/osquery.db scan --column_family=queries | grep  data.auditeventpublisher.process_events | wc -l

51460
there are 51460 records. It is difficult to compare them.
n
you also have to go through the logic
the db is reporting fewer data values than it has stored
u
go through the logic
? did not understand
n
along with the comparison I meant
does not matter if we are deleting the data or not
void EventSubscriberPlugin::expireCheck() {
  auto data_key = "data." + dbNamespace();
  auto eid_key = "eid." + dbNamespace();
  // Min key will be the last surviving key.
  size_t threshold_key = 0;
  {
    auto limit = getEventsMax();
    std::vector<std::string> keys;
    scanDatabaseKeys(kEvents, keys, data_key);
    if (keys.size() <= limit) {
      return;
    }
    // There is an overflow of events buffered for this subscriber.
    LOG(WARNING) << "Expiring events for subscriber: " << getName()
                 << " (overflowed limit " << limit << ")";
    VLOG(1) << "Subscriber events " << getName() << " exceeded limit " << limit
            << " by: " << keys.size() - limit;
the above logic should report the correct count
but it's not, which means
u
yep, but it's not right
n
1. it is dangling data now
2. it will be deleted eventually by the db, based upon the parameters we set or its internal logic
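For what it's worth, a minimal standalone RocksDB sketch of that point (not osquery code; the db path is made up): a deleted key disappears from the live view immediately, but its old value can linger inside existing .sst files, which is also why sst_dump over the raw files can report far more rows than a scan of the column family:

#include <cassert>
#include <iostream>
#include <string>

#include <rocksdb/db.h>
#include <rocksdb/options.h>

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;

  rocksdb::DB* db = nullptr;
  auto status = rocksdb::DB::Open(options, "/tmp/compaction-demo.db", &db);
  assert(status.ok());

  const std::string key = "data.auditeventpublisher.process_events.0000000001";
  db->Put(rocksdb::WriteOptions(), key, "row");
  db->Flush(rocksdb::FlushOptions());  // the value is now persisted in an .sst file

  db->Delete(rocksdb::WriteOptions(), key);
  db->Flush(rocksdb::FlushOptions());  // the delete becomes a tombstone in a newer .sst file

  std::string value;
  auto get = db->Get(rocksdb::ReadOptions(), key, &value);
  std::cout << "live view still sees the key: " << std::boolalpha << get.ok() << "\n";  // false
  // At this point sst_dump over the directory would still show the old value.

  // Only after compaction are the old value and the tombstone dropped from disk.
  db->CompactRange(rocksdb::CompactRangeOptions(), nullptr, nullptr);

  delete db;
  return 0;
}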
u
I also read the source code
void EventSubscriberPlugin::expireRecords(const std::string& list_type,
                                          const std::string& index,
                                          bool all) {
  if (!executedAllQueries()) {
    return;
  }

  auto record_key = "records." + dbNamespace();
  auto data_key = "data." + dbNamespace();

  // If the expirations is not removing all records, rewrite the persisting.
  std::vector<std::string> persisting_records;
  // Request all records within this list-size + bin offset.
  auto expired_records = getRecords({list_type + '.' + index}, false);
  if (all && expired_records.size() > 1) {
    deleteDatabaseRange(kEvents,
                        data_key + '.' + expired_records.begin()->first,
                        data_key + '.' + expired_records.rbegin()->first);
  } else {
    for (const auto& record : expired_records) {
      if (record.second <= expire_time_) {
        deleteDatabaseValue(kEvents, data_key + '.' + record.first);
      } else {
        persisting_records.push_back(record.first + ':' +
                                     std::to_string(record.second));
      }
    }
  }

  // Either drop or overwrite the record list.
  // Directly set the new records value in rocksdb
  if (all) {
    deleteDatabaseValue(kEvents, record_key + "." + list_type + "." + index);
  } else if (persisting_records.size() < expired_records.size()) {
    auto new_records = boost::algorithm::join(persisting_records, ",");
    setDatabaseValue(
        kEvents, record_key + "." + list_type + "." + index, new_records);
  }
}
it uses
auto expired_records = getRecords({list_type + '.' + index}, false);
to decide which data.auditeventpublisher.process_events entries to delete. I think
getRecords()
does not return the right records, or it does not find records that really do exist, so osquery can't delete the data.auditeventpublisher.process_events entries
I use
'./ldb --db=/tmp/osquery.db scan --column_family=queries | grep records.auditeventpublisher.process_events'
and this is what I got:

https://osquery.slack.com/files/UDK4HKG9Z/FFR66929F/image.png

and this is what
./ldb --db=/tmp/osquery.db scan --column_family=queries | grep data.auditeventpublisher.process_events
got. I find that the eids in the first 10
records.auditeventpublisher.process_events
entries do not exist at all in the
data.auditeventpublisher.process_events
results. Only the eids in
records.auditeventpublisher.process_events.60.[25812380-25812383]
do. The first item in
records.auditeventpublisher.process_events.60.25812380
is
0020688271:1548742845
and when I convert
1548742845
to a date it is
2019-01-29 14:20:45
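Just to double-check that conversion, a tiny sketch (assuming the value is Unix epoch seconds):

#include <ctime>
#include <iostream>

int main() {
  std::time_t ts = 1548742845;  // the timestamp from the records entry above
  char buf[32];
  // In UTC this is 2019-01-29 06:20:45, so the 14:20:45 above is local time in UTC+8.
  std::strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S", std::localtime(&ts));
  std::cout << buf << "\n";
  return 0;
}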
n
Not being able to delete is one thing; it is not even reporting correctly how many records it has stored
u
$ ./ldb --db=/tmp/osquery.db scan --column_family=queries | grep  data.auditeventpublisher.process_events | wc -l
51460
$ ./sst_dump --file=/tmp/osquery.db/ --command=scan | grep data.auditeventpublisher.process_events | wc -l
3433390
why are these two results different?
there must be something wrong; the size of osquery.db on one of my servers has reached 2GB
@npamnani