#general

npamnani

01/29/2019, 5:10 AM
@钢铁侠, I have put some comments in https://github.com/facebook/osquery/issues/5310, which is related to the pull request above.

钢铁侠

01/29/2019, 5:17 AM
how about this:
I think this is why osqueryd can't expire these old logs: it can't find the data entries without the corresponding records entries.
osquery removes
records.auditeventpublisher.user_events
but not
data.auditeventpublisher.user_events
5:17 AM
@npamnani
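(For context: the key layout implied by this thread, and by the osquery source pasted further down, is roughly

records.<publisher>.<subscriber>.<list_type>.<index>  ->  "eid1:time1,eid2:time2,..."
data.<publisher>.<subscriber>.<eid>                   ->  serialized event row

The value formats are inferred from the "<rec num>:timestamp" note and the expireRecords() source later in the thread; the exact serialization of the data values is not shown here. Expiring a records list without deleting the matching data.* keys leaves those data keys dangling, which is the behavior being debugged.)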

npamnani

01/29/2019, 5:45 AM
Did you try the ldb command?
5:47 AM
'./sst_dump --file=/var/osquery/osquery.db/ --command=scan | grep socket_events | wc -l' will give a lot more rows than are available in the events column_family
5:48 AM
CPP
void EventSubscriberPlugin::expireCheck() {
  auto data_key = "data." + dbNamespace();
  auto eid_key = "eid." + dbNamespace();
  // Min key will be the last surviving key.
  size_t threshold_key = 0;
  {
    auto limit = getEventsMax();
    std::vector<std::string> keys;
    scanDatabaseKeys(kEvents, keys, data_key);
    if (keys.size() <= limit) {
      return;
    }
    // There is an overflow of events buffered for this subscriber.
    LOG(WARNING) << "Expiring events for subscriber: " << getName()
                 << " (overflowed limit " << limit << ")";
    VLOG(1) << "Subscriber events " << getName() << " exceeded limit "
            << limit << " by: " << keys.size() - limit;
5:49 AM
it is only counting how many data rows are available; the expiring comes later
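(For anyone reproducing this outside osquery, here is a minimal sketch of what that prefix count amounts to at the RocksDB level. The /tmp/osquery.db path and the "queries"/"events" column family names are taken from the commands in this thread; everything else is illustrative, and this is not osquery code.)
CPP
#include <rocksdb/db.h>

#include <iostream>
#include <memory>
#include <string>
#include <vector>

int main() {
  // Open a copy of the database read-only, with the column families
  // mentioned in this thread.
  std::vector<rocksdb::ColumnFamilyDescriptor> cf_descs = {
      {rocksdb::kDefaultColumnFamilyName, rocksdb::ColumnFamilyOptions()},
      {"queries", rocksdb::ColumnFamilyOptions()},
      {"events", rocksdb::ColumnFamilyOptions()}};
  std::vector<rocksdb::ColumnFamilyHandle*> handles;
  rocksdb::DB* db = nullptr;
  auto s = rocksdb::DB::OpenForReadOnly(
      rocksdb::DBOptions(), "/tmp/osquery.db", cf_descs, &handles, &db);
  if (!s.ok()) {
    std::cerr << "open failed: " << s.ToString() << std::endl;
    return 1;
  }

  // Count live keys under a prefix, mirroring what
  // scanDatabaseKeys(kEvents, keys, data_key) does in expireCheck().
  const std::string prefix = "data.auditeventpublisher.process_events";
  std::unique_ptr<rocksdb::Iterator> it(
      db->NewIterator(rocksdb::ReadOptions(), handles[1] /* "queries" */));
  size_t count = 0;
  for (it->Seek(prefix); it->Valid() && it->key().starts_with(prefix);
       it->Next()) {
    ++count;
  }
  std::cout << count << " live keys under " << prefix << std::endl;

  // Release the iterator before closing the database.
  it.reset();
  for (auto* handle : handles) {
    db->DestroyColumnFamilyHandle(handle);
  }
  delete db;
  return 0;
}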

钢铁侠

01/29/2019, 6:38 AM
I use
ldb --db=<path to db> scan --column_family="events"
and it returns nothing.

npamnani

01/29/2019, 6:51 AM
try '--column_family="queries"'

钢铁侠

01/29/2019, 6:54 AM
the size of my osquery.db has reached 1.5G and there are 753 sst files in the osquery.db directory. I think the sst file names should be continuous, but I found the max sst number is 017854.sst and the min is 000021.sst. I think all the low-numbered sst files should have been removed, but they are not. This is what I got:
6:55 AM
when I use
'--column_family="queries"'
it returns results.
7:02 AM
when I use the following commands:
$ ./sst_dump --file=/tmp/osquery.db/000021.sst --command=scan | grep record.auditeventpublisher.process_events
$ ./sst_dump --file=/tmp/osquery.db/000021.sst --command=scan | grep data.auditeventpublisher.process_events | wc -l
4844
7:04 AM
it shows that there are no
record.auditeventpublisher.process_events
entries, but there are 4844
data.auditeventpublisher.process_events
entries in
000021.sst
, which is the oldest sst file.
7:14 AM
@npamnani
7:23 AM
I think the root cause is that osquery deletes the records.auditeventpublisher.socket_events keys but not the data.auditeventpublisher.socket_events keys, and that osquery does not remove the old sst files in time.

npamnani

01/29/2019, 8:08 AM
@钢铁侠, scan the entire db for data.auditeventpublisher.socket_events
8:09 AM
Without knowing the internals of the DB, we can't say "the names of the sst files should be continuous"
8:10 AM
what is the timestamp of this file?
8:10 AM
last modification time

钢铁侠

01/29/2019, 8:11 AM
-rw-r--r--. 1 user user 2138309 Jan 29 14:24 017854.sst
8:12 AM
-rw-r--r--.  1 user user 2136467 Jan 29 01:05 008594.sst
-rw-r--r--.  1 user user 2138352 Jan 29 01:05 008595.sst
-rw-r--r--.  1 user user 2139375 Jan 29 01:05 008596.sst
-rw-r--r--.  1 user user 2136993 Jan 29 01:05 008597.sst
-rw-r--r--.  1 user user    3278 Jan 29 01:12 008676.sst
-rw-r--r--.  1 user user 2140182 Jan 29 01:33 008923.sst
-rw-r--r--.  1 user user 2137903 Jan 29 02:05 009248.sst
-rw-r--r--.  1 user user 2139931 Jan 29 02:05 009249.sst
-rw-r--r--.  1 user user 2137200 Jan 29 02:05 009250.sst
-rw-r--r--.  1 user user 2139308 Jan 29 05:54 012010.sst
-rw-r--r--.  1 user user 2140123 Jan 29 05:54 012011.sst
-rw-r--r--.  1 user user 2137179 Jan 29 05:54 012012.sst
-rw-r--r--.  1 user user 2138895 Jan 29 05:54 012013.sst
-rw-r--r--.  1 user user    4779 Jan 29 05:54 012014.sst
-rw-r--r--.  1 user user   59646 Jan 29 07:22 012963.sst
-rw-r--r--.  1 user user  147760 Jan 29 08:33 013792.sst
-rw-r--r--.  1 user user 2136801 Jan 29 11:48 016022.sst
you can see that sst files have been removed between
009250.sst (02:05)
and
012010.sst (05:54)

npamnani

01/29/2019, 8:13 AM
what does the scan say?

钢铁侠

01/29/2019, 8:14 AM
$ ./sst_dump --file=/tmp/osquery.db/ --command=scan | grep data.auditeventpublisher.socket_events | wc -l
42
8:14 AM
got 42 records

npamnani

01/29/2019, 8:15 AM
not much of the data is available

钢铁侠

01/29/2019, 8:15 AM
$ ./sst_dump --file=/tmp/osquery.db/ --command=scan | grep data.auditeventpublisher.process_events | wc -l
3433390
8:16 AM
there are a lot of process_events

npamnani

01/29/2019, 8:17 AM
did you try --column_family="queries"
8:17 AM
then grep 'data.auditeventpublisher.process_events'

钢铁侠

01/29/2019, 8:18 AM
./ldb --db=/tmp/osquery.db scan --column_family=queries | grep process_events | wc -l

51480
8:19 AM
./ldb --db=/tmp/osquery.db scan --column_family=queries | grep data.auditeventpublisher.process_events | wc -l

51460

npamnani

01/29/2019, 8:22 AM
./ldb --db=/tmp/osquery.db scan --column_family=queries | grep records.auditeventpublisher.process_events
8:22 AM
What is its output?

钢铁侠

01/29/2019, 8:24 AM
the 20 missing records are the
records.auditeventpublisher.process_events
entries

npamnani

01/29/2019, 8:24 AM
?

钢铁侠

01/29/2019, 8:25 AM
8:26 AM
the result of scanning process_events is 51480, but the result of scanning data.auditeventpublisher.process_events is 51460
8:27 AM
the 20 missing records are all
records.auditeventpublisher.process_events
entries, from comparing the
process_events
results with the
records.auditeventpublisher.process_events
results

npamnani

01/29/2019, 8:29 AM
did not understand
8:30 AM
20718897 is the last count

钢铁侠

01/29/2019, 8:30 AM
./ldb --db=/tmp/osquery.db scan --column_family=queries | grep process_events | wc -l
has 20 more records than
./ldb --db=/tmp/osquery.db scan --column_family=queries | grep data.auditeventpublisher.process_events | wc -l

npamnani

01/29/2019, 8:30 AM
ok got it

钢铁侠

01/29/2019, 8:32 AM
I wanted to know what the 20 records are; I found that 16 of them are
records.auditeventpublisher.process_events
8:32 AM
and the last 4 records are
eid.auditeventpublisher.process_events : 0020739730
indexes.auditeventpublisher.process_events.60 : 25812378,25812379,25812380,25812381,25812382
optimize.process_events : 1548743012
optimize_eid.process_events : 0020728620

npamnani

01/29/2019, 8:34 AM
you have to count the entries in the output of './ldb --db=/tmp/osquery.db scan --column_family=queries | grep records.auditeventpublisher.process_events'
8:34 AM
<rec num>:timestamp
8:35 AM
compare it with the output of './ldb --db=/tmp/osquery.db scan --column_family=queries | grep data.auditeventpublisher.process_events'
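(The comparison being suggested here can be scripted. A sketch, assuming a db handle and the "queries" column family handle opened as in the earlier sketch, and assuming the "eid:timestamp" CSV value format described above; illustration only, not osquery code.)
CPP
#include <rocksdb/db.h>

#include <iostream>
#include <memory>
#include <set>
#include <string>

void compareRecordsToData(rocksdb::DB* db, rocksdb::ColumnFamilyHandle* cf) {
  const std::string records_prefix =
      "records.auditeventpublisher.process_events";
  const std::string data_prefix = "data.auditeventpublisher.process_events";

  // Collect every eid listed in records.* values; each value is
  // "eid1:time1,eid2:time2,...".
  std::set<std::string> listed_eids;
  std::unique_ptr<rocksdb::Iterator> it(
      db->NewIterator(rocksdb::ReadOptions(), cf));
  for (it->Seek(records_prefix);
       it->Valid() && it->key().starts_with(records_prefix);
       it->Next()) {
    const std::string value = it->value().ToString();
    size_t pos = 0;
    while (pos < value.size()) {
      auto comma = value.find(',', pos);
      auto entry = value.substr(pos, comma - pos);
      listed_eids.insert(entry.substr(0, entry.find(':')));
      pos = (comma == std::string::npos) ? value.size() : comma + 1;
    }
  }

  // Any data.<namespace>.<eid> key whose eid is in no records list is
  // dangling: expireRecords() iterates getRecords() output, so it will
  // never visit (or delete) such a key.
  size_t data_total = 0;
  size_t dangling = 0;
  for (it->Seek(data_prefix);
       it->Valid() && it->key().starts_with(data_prefix);
       it->Next()) {
    ++data_total;
    const std::string key = it->key().ToString();
    const auto eid = key.substr(key.rfind('.') + 1);
    if (listed_eids.count(eid) == 0) {
      ++dangling;
    }
  }

  std::cout << listed_eids.size() << " eids listed in records.*, "
            << data_total << " data.* keys, " << dangling
            << " dangling data keys" << std::endl;
}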

钢铁侠

01/29/2019, 8:40 AM
./ldb --db=/tmp/osquery.db scan --column_family=queries | grep  data.auditeventpublisher.process_events | wc -l

51460
there are 51460 records, so it is difficult to compare them

npamnani

01/29/2019, 8:40 AM
you also have to go through the logic
8:41 AM
the db is reporting fewer data values than it has stored

钢铁侠

01/29/2019, 8:42 AM
"go through the logic"? I did not understand

npamnani

01/29/2019, 8:43 AM
along with the comparison I meant
8:44 AM
it does not matter whether we are deleting the data or not
8:44 AM
CPP
void EventSubscriberPlugin::expireCheck() {
  auto data_key = "data." + dbNamespace();
  auto eid_key = "eid." + dbNamespace();
  // Min key will be the last surviving key.
  size_t threshold_key = 0;
  {
    auto limit = getEventsMax();
    std::vector<std::string> keys;
    scanDatabaseKeys(kEvents, keys, data_key);
    if (keys.size() <= limit) {
      return;
    }
    // There is an overflow of events buffered for this subscriber.
    LOG(WARNING) << "Expiring events for subscriber: " << getName()
                 << " (overflowed limit " << limit << ")";
    VLOG(1) << "Subscriber events " << getName() << " exceeded limit "
            << limit << " by: " << keys.size() - limit;
8:45 AM
the above logic should report the correct count
8:45 AM
but it's not, which means:

钢铁侠

01/29/2019, 8:45 AM
yep, but it's not right

npamnani

01/29/2019, 8:45 AM
1. it is dangling data now
8:45 AM
2. it will be deleted eventually
8:46 AM
by the db, based upon the parameter we set or its internal logic
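(If "the parameter we set" refers to osquery's own retention settings rather than RocksDB internals, which is my reading and is not confirmed in the thread, the relevant knobs would be the events_expiry and events_max flags, e.g. in osquery.flags; values below are illustrative only.)

--events_expiry=3600
--events_max=50000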

钢铁侠

01/29/2019, 8:47 AM
I also read the source code
CPP
void EventSubscriberPlugin::expireRecords(const std::string& list_type,
                                          const std::string& index,
                                          bool all) {
  if (!executedAllQueries()) {
    return;
  }

  auto record_key = "records." + dbNamespace();
  auto data_key = "data." + dbNamespace();

  // If the expirations is not removing all records, rewrite the persisting.
  std::vector<std::string> persisting_records;
  // Request all records within this list-size + bin offset.
  auto expired_records = getRecords({list_type + '.' + index}, false);
  if (all && expired_records.size() > 1) {
    deleteDatabaseRange(kEvents,
                        data_key + '.' + expired_records.begin()->first,
                        data_key + '.' + expired_records.rbegin()->first);
  } else {
    for (const auto& record : expired_records) {
      if (record.second <= expire_time_) {
        deleteDatabaseValue(kEvents, data_key + '.' + record.first);
      } else {
        persisting_records.push_back(record.first + ':' +
                                     std::to_string(record.second));
      }
    }
  }

  // Either drop or overwrite the record list.
  // Directly set the new RocksDB value
  if (all) {
    deleteDatabaseValue(kEvents, record_key + "." + list_type + "." + index);
  } else if (persisting_records.size() < expired_records.size()) {
    auto new_records = boost::algorithm::join(persisting_records, ",");
    setDatabaseValue(
        kEvents, record_key + "." + list_type + "." + index, new_records);
  }
}
8:50 AM
it uses
auto expired_records = getRecords({list_type + '.' + index}, false);
to get the data.auditeventpublisher.process_events entries. I think
getRecords()
does not return the right records, or it does not find records that really do exist, so osquery can't delete the data.auditeventpublisher.process_events entries
9:03 AM
I use
'./ldb --db=/tmp/osquery.db scan --column_family=queries | grep records.auditeventpublisher.process_events'
and this is what I got:

https://osquery.slack.com/files/UDK4HKG9Z/FFR66929F/image.png

Comparing it with what
./ldb --db=/tmp/osquery.db scan --column_family=queries | grep data.auditeventpublisher.process_events
returns, I find that the eids in the first 10
records.auditeventpublisher.process_events
entries do not exist in the
data.auditeventpublisher.process_events
results. Only the ones in
records.auditeventpublisher.process_events.60.[25812380-25812383]
do, and the first item in
records.auditeventpublisher.process_events.60.25812380
is
0020688271:1548742845
. Converting
1548742845
to a date gives
2019-01-29 14:20:45

npamnani

01/29/2019, 9:04 AM
that it can't delete is one thing; it is not even reporting correctly how many records it has stored

钢铁侠

01/29/2019, 9:08 AM
$ ./ldb --db=/tmp/osquery.db scan --column_family=queries | grep  data.auditeventpublisher.process_events | wc -l
51460
$ ./sst_dump --file=/tmp/osquery.db/ --command=scan | grep data.auditeventpublisher.process_events | wc -l
3433390
9:08 AM
why are these two results different?
9:13 AM
there must be something wrong; the size of osquery.db on one of my servers has reached 2GB
10:43 AM
@npamnani
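(A likely explanation for the gap, though it is never stated outright in the thread: ldb scan walks only the live keys of a column family, while sst_dump dumps every entry physically present in the SST files, including overwritten versions and deleted keys whose tombstones have not yet been compacted away. Those stale entries, and the disk space they hold, only go away when RocksDB compacts the affected files, which would also be consistent with the surviving low-numbered sst files and the 1.5-2GB on disk. A sketch of forcing that manually on an offline copy, assuming handles opened read-write since CompactRange is unavailable on a read-only DB; illustration only.)
CPP
#include <rocksdb/db.h>

#include <iostream>

// Force a full compaction of one column family so that overwritten and
// deleted entries are physically dropped from the SST files.
void compactColumnFamily(rocksdb::DB* db, rocksdb::ColumnFamilyHandle* cf) {
  rocksdb::CompactRangeOptions options;
  auto s = db->CompactRange(options, cf, nullptr, nullptr);
  if (!s.ok()) {
    std::cerr << "compaction failed: " << s.ToString() << std::endl;
  }
}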