# general
n
@钢铁侠, I have put some comments in https://github.com/facebook/osquery/issues/5310, which relate to the pull request above.
u
how about this:
I think this is why osqueryd can't expire these old logs: it simply can't find the data entries once the records entries are gone.
osquery removes
records.auditeventpublisher.user_events
but not
data.auditeventpublisher.user_events
@npamnani
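For context, a minimal sketch of how the two key families mentioned above seem to fit together, inferred from the scans and the source quoted later in this thread; the helper names are made up, not osquery APIs:

#include <iostream>
#include <string>

// "ns" here means "<publisher>.<subscriber>", e.g. "auditeventpublisher.user_events".

// records.<ns>.<list_type>.<index>
//   -> value is a comma-joined list of "eid:timestamp" pairs
std::string recordsKey(const std::string& ns,
                       const std::string& list_type,
                       const std::string& index) {
  return "records." + ns + "." + list_type + "." + index;
}

// data.<ns>.<eid> -> value is the serialized event row itself
std::string dataKey(const std::string& ns, const std::string& eid) {
  return "data." + ns + "." + eid;
}

int main() {
  std::cout << recordsKey("auditeventpublisher.user_events", "60", "25812380") << "\n";
  std::cout << dataKey("auditeventpublisher.user_events", "0020739730") << "\n";
  return 0;
}

If expiration only learns about eids from the records lists (as the expireRecords() source later in the thread suggests), then data rows whose records entries are already gone have nothing pointing at them, which is the mismatch being described here.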
n
Did you try the ldb command?
'./sst_dump --file=/var/osquery/osquery.db/ --command=scan | grep socket_events | wc -l' will give a lot more rows than are available in the events column_family
void EventSubscriberPlugin::expireCheck() {
  auto data_key = "data." + dbNamespace();
  auto eid_key = "eid." + dbNamespace();
  // Min key will be the last surviving key.
  size_t threshold_key = 0;
  {
    auto limit = getEventsMax();
    std::vector<std::string> keys;
    scanDatabaseKeys(kEvents, keys, data_key);
    if (keys.size() <= limit) {
      return;
    }
    // There is an overflow of events buffered for this subscriber.
    LOG(WARNING) << "Expiring events for subscriber: " << getName()
                 << " (overflowed limit " << limit << ")";
    VLOG(1) << "Subscriber events " << getName() << " exceeded limit " << limit
            << " by: " << keys.size() - limit;
it is only counting how many data rows are available; the expiring comes later
u
I use
ldb --db=<path to db> scan --column_family="events"
and it returns nothing.
n
try '--column_family="queries" '
u
the size of my osquery.db has reached 1.5GB and there are 753 sst files in the osquery.db directory. I thought the sst file numbers should be contiguous, but I found the highest-numbered sst file is 017854.sst and the lowest is 000021.sst. I think all the low-numbered sst files should have been removed, but they are not. This is what I got:
when I use
'--column_family="queries"'
it has results.
when I run the following commands:
$ ./sst_dump --file=/tmp/osquery.db/000021.sst --command=scan | grep record.auditeventpublisher.process_events
$ ./sst_dump --file=/tmp/osquery.db/000021.sst --command=scan | grep data.auditeventpublisher.process_events | wc -l
4844
it shows that there is no
record.auditeventpublisher.process_events
but there are 4844
data.auditeventpublisher.process_events
entries. This is
000021.sst
which is the oldest sst file.
@npamnani
I think the root cause is that osquery only deletes the records.auditeventpublisher.socket_events keys but not the data.auditeventpublisher.socket_events keys, and it does not remove the old sst files in time.
n
@钢铁侠, scan the entire db for data.auditeventpublisher.socket_events
Without knowing the internals of the DB we can't say "maybe the names of the sst files should be contiguous"
what is the time stamp of this file
last modification time
u
-rw-r--r--. 1 user user 2138309 Jan 29 14:24 017854.sst
-rw-r--r--.  1 user user 2136467 Jan 29 01:05 008594.sst
-rw-r--r--.  1 user user 2138352 Jan 29 01:05 008595.sst
-rw-r--r--.  1 user user 2139375 Jan 29 01:05 008596.sst
-rw-r--r--.  1 user user 2136993 Jan 29 01:05 008597.sst
-rw-r--r--.  1 user user    3278 Jan 29 01:12 008676.sst
-rw-r--r--.  1 user user 2140182 Jan 29 01:33 008923.sst
-rw-r--r--.  1 user user 2137903 Jan 29 02:05 009248.sst
-rw-r--r--.  1 user user 2139931 Jan 29 02:05 009249.sst
-rw-r--r--.  1 user user 2137200 Jan 29 02:05 009250.sst
-rw-r--r--.  1 user user 2139308 Jan 29 05:54 012010.sst
-rw-r--r--.  1 user user 2140123 Jan 29 05:54 012011.sst
-rw-r--r--.  1 user user 2137179 Jan 29 05:54 012012.sst
-rw-r--r--.  1 user user 2138895 Jan 29 05:54 012013.sst
-rw-r--r--.  1 user user    4779 Jan 29 05:54 012014.sst
-rw-r--r--.  1 user user   59646 Jan 29 07:22 012963.sst
-rw-r--r--.  1 user user  147760 Jan 29 08:33 013792.sst
-rw-r--r--.  1 user user 2136801 Jan 29 11:48 016022.sst
you can see that sst files have been removed between
02:05 009250.sst
and
05:54 012010.sst
n
what does the scan say?
u
$ ./sst_dump --file=/tmp/osquery.db/ --command=scan | grep data.auditeventpublisher.socket_events | wc -l
42
got 42 records
n
not much of the data available
u
$ ./sst_dump --file=/tmp/osquery.db/ --command=scan | grep data.auditeventpublisher.process_events | wc -l
3433390
there are a lot of process_events
n
did you try --column_family="queries"
then grep 'data.auditeventpublisher.process_events '
u
./ldb --db=/tmp/osquery.db scan --column_family=queries | grep process_events | wc -l

51480
./ldb --db=/tmp/osquery.db scan --column_family=queries | grep data.auditeventpublisher.process_events | wc -l

51460
n
./ldb --db=/tmp/osquery.db scan --column_family=queries | grep records.auditeventpublisher.process_events
What is its output
u
the 20 missing records are the
records.auditeventpublisher.process_events
n
?
u
the result of scanning process_events is 51480, but the result of scanning data.auditeventpublisher.process_events is 51460
the 20 missing records are all
records.auditeventpublisher.process_events
entries, which I found when I compared the
process_events
results with the
records.auditeventpublisher.process_events
results
n
did not understand
20718897 is the last count
u
./ldb --db=/tmp/osquery.db scan --column_family=queries | grep process_events | wc -l
has 20 more records than
./ldb --db=/tmp/osquery.db scan --column_family=queries | grep data.auditeventpublisher.process_events | wc -l
n
ok got it
u
I wanted to know what the 20 records are. Finally I found that 16 of them are
records.auditeventpublisher.process_events
entries, and the last 4 are:
eid.auditeventpublisher.process_events : 0020739730
indexes.auditeventpublisher.process_events.60 : 25812378,25812379,25812380,25812381,25812382
optimize.process_events : 1548743012
optimize_eid.process_events : 0020728620
n
you have to count the output of './ldb --db=/tmp/osquery.db scan --column_family=queries | grep records.auditeventpublisher.process_events'
each entry has the form <rec num>:timestamp
compare it with the output './ldb --db=/tmp/osquery.db scan --column_family=queries | grep data.auditeventpublisher.process_events '
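A rough sketch of that comparison, assuming records values are comma-joined "eid:timestamp" pairs and data keys end in the eid (as the scans above suggest); the helper and sample values are made up for illustration:

#include <iostream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Split a records value such as "0020688271:1548742845,0020688272:1548742846"
// into its eids.
std::set<std::string> eidsFromRecordsValue(const std::string& value) {
  std::set<std::string> eids;
  std::stringstream ss(value);
  std::string pair;
  while (std::getline(ss, pair, ',')) {
    auto colon = pair.find(':');
    if (colon != std::string::npos) {
      eids.insert(pair.substr(0, colon));
    }
  }
  return eids;
}

int main() {
  // Toy stand-ins for the two scans: the records values, and the eids that
  // actually have a data.<namespace>.<eid> key.
  std::vector<std::string> records_values = {
      "0020688271:1548742845,0020688272:1548742846"};
  std::set<std::string> data_eids = {"0020688271", "0011111111"};

  std::set<std::string> record_eids;
  for (const auto& v : records_values) {
    auto eids = eidsFromRecordsValue(v);
    record_eids.insert(eids.begin(), eids.end());
  }

  // Data rows whose eids never appear in any records value are the ones the
  // expiration pass can never find.
  for (const auto& eid : data_eids) {
    if (record_eids.count(eid) == 0) {
      std::cout << "orphaned data eid: " << eid << "\n";
    }
  }
  return 0;
}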
u
./ldb --db=/tmp/osquery.db scan --column_family=queries | grep  data.auditeventpublisher.process_events | wc -l

51460
there are 51460 records. It is difficult to compare them.
n
you also have to go through the logic
the db is reporting fewer data values than it has stored
u
go through the logic
? did not understand
n
along with the comparison I meant
does not matter if we are deleting the data or not
void EventSubscriberPlugin::expireCheck() {
  auto data_key = "data." + dbNamespace();
  auto eid_key = "eid." + dbNamespace();
  // Min key will be the last surviving key.
  size_t threshold_key = 0;
  {
    auto limit = getEventsMax();
    std::vector<std::string> keys;
    scanDatabaseKeys(kEvents, keys, data_key);
    if (keys.size() <= limit) {
      return;
    }
    // There is an overflow of events buffered for this subscriber.
    LOG(WARNING) << "Expiring events for subscriber: " << getName()
                 << " (overflowed limit " << limit << ")";
    VLOG(1) << "Subscriber events " << getName() << " exceeded limit " << limit
            << " by: " << keys.size() - limit;
the above logic should report the correct count
but it's not, which means
u
yep, but it's not right
n
1. it is dangling data now
2. it will be deleted eventually by the db, based upon the parameters we set or its internal logic
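For what it's worth, a minimal standalone RocksDB sketch of that point (not osquery code; the db path is made up): a deleted key disappears from the live view immediately, but its old value can linger inside existing .sst files, which is also why sst_dump over the raw files can report far more rows than a scan of the column family:

#include <cassert>
#include <iostream>
#include <string>

#include <rocksdb/db.h>
#include <rocksdb/options.h>

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;

  rocksdb::DB* db = nullptr;
  auto status = rocksdb::DB::Open(options, "/tmp/compaction-demo.db", &db);
  assert(status.ok());

  const std::string key = "data.auditeventpublisher.process_events.0000000001";
  db->Put(rocksdb::WriteOptions(), key, "row");
  db->Flush(rocksdb::FlushOptions());  // the value is now persisted in an .sst file

  db->Delete(rocksdb::WriteOptions(), key);
  db->Flush(rocksdb::FlushOptions());  // the delete becomes a tombstone in a newer .sst file

  std::string value;
  auto get = db->Get(rocksdb::ReadOptions(), key, &value);
  std::cout << "live view still sees the key: " << std::boolalpha << get.ok() << "\n";  // false
  // At this point sst_dump over the directory would still show the old value.

  // Only after compaction are the old value and the tombstone dropped from disk.
  db->CompactRange(rocksdb::CompactRangeOptions(), nullptr, nullptr);

  delete db;
  return 0;
}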
u
I also read the source code
void EventSubscriberPlugin::expireRecords(const std::string& list_type,
                                          const std::string& index,
                                          bool all) {
  if (!executedAllQueries()) {
    return;
  }

  auto record_key = "records." + dbNamespace();
  auto data_key = "data." + dbNamespace();

  // If the expirations is not removing all records, rewrite the persisting.
  std::vector<std::string> persisting_records;
  // Request all records within this list-size + bin offset.
  auto expired_records = getRecords({list_type + '.' + index}, false);
  if (all && expired_records.size() > 1) {
    deleteDatabaseRange(kEvents,
                        data_key + '.' + expired_records.begin()->first,
                        data_key + '.' + expired_records.rbegin()->first);
  } else {
    for (const auto& record : expired_records) {
      if (record.second <= expire_time_) {
        deleteDatabaseValue(kEvents, data_key + '.' + record.first);
      } else {
        persisting_records.push_back(record.first + ':' +
                                     std::to_string(record.second));
      }
    }
  }

  // Either drop or overwrite the record list.
  // Directly set the new records value in rocksdb
  if (all) {
    deleteDatabaseValue(kEvents, record_key + "." + list_type + "." + index);
  } else if (persisting_records.size() < expired_records.size()) {
    auto new_records = boost::algorithm::join(persisting_records, ",");
    setDatabaseValue(
        kEvents, record_key + "." + list_type + "." + index, new_records);
  }
}
it uses
auto expired_records = getRecords({list_type + '.' + index}, false);
to decide which data.auditeventpublisher.process_events entries to delete. I think
getRecords()
does not return the right records, or it does not find records that really do exist, so osquery can't delete the data.auditeventpublisher.process_events entries
I use
'./ldb --db=/tmp/osquery.db scan --column_family=queries | grep records.auditeventpublisher.process_events'
and this is what I got:

https://osquery.slack.com/files/UDK4HKG9Z/FFR66929F/image.png

and this is what
./ldb --db=/tmp/osquery.db scan --column_family=queries | grep data.auditeventpublisher.process_events
got. I find that the eids in the first 10
records.auditeventpublisher.process_events
entries do not exist at all in the
data.auditeventpublisher.process_events
results. Only the eids in
records.auditeventpublisher.process_events.60.[25812380-25812383]
do. The first item in
records.auditeventpublisher.process_events.60.25812380
is
0020688271:1548742845
and when I convert
1548742845
to a date it is
2019-01-29 14:20:45
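Just to double-check that conversion, a tiny sketch (assuming the value is Unix epoch seconds):

#include <ctime>
#include <iostream>

int main() {
  std::time_t ts = 1548742845;  // the timestamp from the records entry above
  char buf[32];
  // In UTC this is 2019-01-29 06:20:45, so the 14:20:45 above is local time in UTC+8.
  std::strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S", std::localtime(&ts));
  std::cout << buf << "\n";
  return 0;
}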
n
Not being able to delete is one thing; it is not even reporting correctly how many records it has stored
u
$ ./ldb --db=/tmp/osquery.db scan --column_family=queries | grep  data.auditeventpublisher.process_events | wc -l
51460
$ ./sst_dump --file=/tmp/osquery.db/ --command=scan | grep data.auditeventpublisher.process_events | wc -l
3433390
why are these two results different?
there must be something wrong; the size of osquery.db on one of my servers has reached 2GB
@npamnani