Rebecca Mahany-Horton
07/12/2024, 2:00 PMRebecca Mahany-Horton
07/12/2024, 2:00 PMRebecca Mahany-Horton
07/12/2024, 2:01 PMProgram terminated with signal SIGSEGV, Segmentation fault.
#0 0x00005652aa357892 in osquery::tables::RpmEnvironmentManager::Callback (rec=<optimized out>, data=0x7ff4acbfb340)
at /__w/osquery/osquery/workspace/padding-required-by-rpm-packages/src/osquery/tables/system/linux/rpm_packages.cpp:112
112 /__w/osquery/osquery/workspace/padding-required-by-rpm-packages/src/osquery/tables/system/linux/rpm_packages.cpp: No such file or directory.
Stefano Bonicatti
07/12/2024, 2:16 PMgdb <path to osqueryd> <path to coredump> -ex "thread apply all bt full" -ex quit | tee crash.log
and upload the crash.log to an issue, we can have a lookRebecca Mahany-Horton
07/12/2024, 2:18 PMRebecca Mahany-Horton
07/12/2024, 2:24 PMgdb /usr/lib/debug/opt/osquery/bin/osqueryd-5.12.2-1.linux.x86_64.debug ./osqueryd.coredump-lapis -ex "thread apply all bt full" -ex quit | tee crash.log
. this is not the machine that the coredump occurred on so i am not sure how helpful it isStefano Bonicatti
07/12/2024, 3:10 PMrpmdb.sqlite
causes those lines to be called, but the log is printed correctly and no crash happens. I think this has to be reproduced locally/manually to understand what are the conditions.
Although the question is if you have any RPM notice:
messages in your logs that might help understand what was (part of) the issue maybe.Stefano Bonicatti
07/12/2024, 3:10 PMStefano Bonicatti
07/12/2024, 3:12 PMrpm_packages
Rebecca Mahany-Horton
07/12/2024, 3:25 PMRebecca Mahany-Horton
07/12/2024, 3:27 PMglog_logger.cpp:49] RPM notice: Failed to read auxiliary vector, /proc not mounted?
and then 0 to a lot of glog_logger.cpp:49] Could not get RPM header flag.
logsRebecca Mahany-Horton
07/12/2024, 3:27 PMRebecca Mahany-Horton
07/12/2024, 3:30 PMrpm_packages
and let you know what i find 🤞seph
also a very wild guess is to check if those machines have a scheduled query and a distributed query being run somewhat at the same time, both on rpm_packagesI doubt this. What are you thinking about? If it’s a lock somewhere in rpm, what if it’s a lock outside osquery?
Stefano Bonicatti
07/12/2024, 5:12 PMRebecca Mahany-Horton
07/12/2024, 5:13 PMRebecca Mahany-Horton
07/12/2024, 5:14 PMseph
Rebecca Mahany-Horton
07/12/2024, 5:21 PMStefano Bonicatti
07/12/2024, 5:36 PM-ex
commands with -ex disas -ex quit
and log that to another file. That dumps the asm.. it should show better where exactly in the operation happening it's crashing.Rebecca Mahany-Horton
07/12/2024, 5:39 PMStefano Bonicatti
07/12/2024, 5:46 PM-ex "info registers"
before those two would be helpful too, so that we know the values being used.Rebecca Mahany-Horton
07/12/2024, 5:47 PMRebecca Mahany-Horton
07/12/2024, 5:49 PMRebecca Mahany-Horton
07/12/2024, 5:49 PMRebecca Mahany-Horton
07/12/2024, 5:51 PMStefano Bonicatti
07/12/2024, 7:41 PMrpm_packages
that use pid_with_namespace
. It could also very well be a memory corruption, of that pointer.Rebecca Mahany-Horton
07/12/2024, 8:26 PMRebecca Mahany-Horton
07/12/2024, 8:34 PMdeb_packages
that use pid_with_namespace
, but no other use of pid_with_namespace
.Lucas Rodriguez
07/23/2024, 8:57 PMselect 1 from rpm_packages rp, os_version ov where rp.name = "foo-fedora-playbooks" AND ov.name = "Fedora Linux";
.
It happens ~3/5 times I execute such query on such system.
Jul 23 17:53:20 fedora orbit[6140]: {"node_key":"eYRzBJB8B/uvb8Og4UmSbcWVuQKycyET"}
Jul 23 17:53:20 fedora orbit[6140]: {
Jul 23 17:53:20 fedora orbit[6140]: "queries": {
Jul 23 17:53:20 fedora orbit[6140]: "fleet_distributed_query_81": "select 1 from rpm_packages rp, os_version ov where rp.name = \"foo-fedora-playbooks\" AND ov.name = \"Fedora Linux\";"
Jul 23 17:53:20 fedora orbit[6140]: },
Jul 23 17:53:20 fedora orbit[6140]: "discovery": {
Jul 23 17:53:20 fedora orbit[6140]: "fleet_distributed_query_81": "SELECT 1"
Jul 23 17:53:20 fedora orbit[6140]: }
Jul 23 17:53:20 fedora orbit[6140]: }
Jul 23 17:53:20 fedora orbit[6140]: I0723 17:53:20.171214 6336 distributed.cpp:173] Executing distributed query: fleet_distributed_query_81: select 1 from rpm_packages rp, os_version ov where rp.name = "foo-fedora-playbooks" AND ov.name = "Fedora Linux";
Jul 23 17:53:20 fedora audit[6140]: ANOM_ABEND auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=6140 comm="DistributedRunn" exe="/opt/orbit/bin/osqueryd/linux/stable/osqueryd" sig=11 res=1
Jul 23 17:53:20 fedora kernel: traps: DistributedRunn[6336] general protection fault ip:7fc0d605ce5c sp:7fc0c69f9f68 error:0 in libc.so.6[7fc0d5f09000+169000]
Jul 23 17:53:20 fedora audit: BPF prog-id=220 op=LOAD
Jul 23 17:53:20 fedora audit: BPF prog-id=221 op=LOAD
Jul 23 17:53:20 fedora audit: BPF prog-id=222 op=LOAD
Jul 23 17:53:20 fedora systemd[1]: Started systemd-coredump@4-6364-0.service - Process Core Dump (PID 6364/UID 0).
Jul 23 17:53:20 fedora audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-coredump@4-6364-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jul 23 17:53:21 fedora abrt-dump-journal-oops[1085]: abrt-dump-journal-oops: Found oopses: 1
Jul 23 17:53:21 fedora abrt-dump-journal-oops[1085]: abrt-dump-journal-oops: Creating problem directories
Jul 23 17:53:21 fedora systemd-coredump[6365]: Process 6140 (osqueryd) of user 0 dumped core.#012#012Module libnss_myhostname.so.2
Jul 23 17:53:21 fedora systemd-coredump[6365]: Process 6140 (osqueryd) of user 0 dumped core.#012#012Module libnss_myhostname.so.2 from rpm systemd-255.4-1.fc40.x86_64#012Module libcap.so.2 from rpm libcap-2.69-3.fc40.x86_64#012Module libnss_resolve.so.2 from rpm systemd-255.4-1.fc40.x86_64#012Module libnss_mdns4_minimal.so.2 from rpm nss-mdns-0.15.1-11.fc40.x86_64#012Stack trace of thread 6336:#012#0 0x00007fc0d605ce5c __strlen_evex (libc.so.6 + 0x17be5c)#012#1 0x00007fc0d5f474e8 __printf_buffer (libc.so.6 + 0x664e8)#012#2 0x00007fc0d5f6b82f __vsnprintf (libc.so.6 + 0x8a82f)#012#3 0x00005579b2533034 rpmlog (osqueryd + 0x272b034)#012#4 0x00005579b24fb286 errCb (osqueryd + 0x26f3286)#012#5 0x00005579b40e692c sqlite3_log (osqueryd + 0x42de92c)#012#6 0x00005579b414607f resolveExprStep (osqueryd + 0x433e07f)#012#7 0x00005579b4140c79 walkExpr (osqueryd + 0x4338c79)#012#8 0x00005579b4140c94 walkExpr (osqueryd + 0x4338c94)#012#9 0x00005579b41446d1 sqlite3ResolveExprNames (osqueryd + 0x433c6d1)#012#10 0x00005579b4146337 resolveSelectStep (osqueryd + 0x433e337)#012#11 0x00005579b4140d5f sqlite3WalkSelect (osqueryd + 0x4338d5f)#012#12 0x00005579b41471ff sqlite3ResolveSelectNames (osqueryd + 0x433f1ff)#012#13 0x00005579b4147184 sqlite3SelectPrep (osqueryd + 0x433f184)#012#14 0x00005579b412cabc sqlite3Select (osqueryd + 0x4324abc)#012#15 0x00005579b41258a1 yy_reduce (osqueryd + 0x431d8a1)#012#16 0x00005579b40f1e42 sqlite3RunParser (osqueryd + 0x42e9e42)#012#17 0x00005579b411f338 sqlite3Prepare (osqueryd + 0x4317338)#012#18 0x00005579b40f1120 sqlite3LockAndPrepare (osqueryd + 0x42e9120)#012#19 0x00005579b40e80de sqlite3_prepare_v2 (osqueryd + 0x42e00de)#012#20 0x00005579b1d44026 _ZN7osquery23getQueryColumnsInternalERKNSt3__112basic_stringIcNS0_11char_traitsIcEENS0_9allocatorIcEEEERNS0_6vectorINS0_5tupleIJS6_NS_10ColumnTypeENS_13ColumnOptionsEEEENS4_ISD_EEEERKNS0_10shared_ptrINS_16SQLiteDBInstanceEEE (osqueryd + 0x1f3c026)#012#21 0x00005579b1d43f4c _ZNK7osquery15SQLiteSQLPlugin15getQueryColumnsERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEERNS1_6vectorINS1_5tupleIJS7_NS_10ColumnTypeENS_13ColumnOptionsEEEENS5_ISE_EEEE (osqueryd + 0x1f3bf4c)#012#22 0x00005579b1d363cb _ZN7osquery9SQLPlugin4callERKNSt3__13mapINS1_12basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEES8_NS1_4lessIS8_EENS6_INS1_4pairIKS8_S8_EEEEEERNS1_6vectorISF_NS6_ISF_EEEE (osqueryd + 0x1f2e3cb)#012#23 0x00005579b40add42 _ZN7osquery17RegistryInterface4callERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEERKNS1_3mapIS7_S7_NS1_4lessIS7_EENS5_INS1_4pairIS8_S7_EEEEEERNS1_6vectorISG_NS5_ISG_EEEE (osqueryd + 0x42a5d42)#012#24 0x00005579b40abfd1 _ZN7osquery15RegistryFactory4callERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEES9_RKNS1_3mapIS7_S7_NS1_4lessIS7_EENS5_INS1_4pairIS8_S7_EEEEEERNS1_6vectorISG_NS5_ISG_EEEE (osqueryd + 0x42a3fd1)#012#25 0x00005579b1d354c4 _ZN7osquery15getQueryColumnsERKNSt3__112basic_stringIcNS0_11char_traitsIcEENS0_9allocatorIcEEEERNS0_6vectorINS0_5tupleIJS6_NS_10ColumnTypeENS_13ColumnOptionsEEEENS4_ISD_EEEE (osqueryd + 0x1f2d4c4)#012#26 0x00005579b1d35301 _ZN7osquery3SQLC2ERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEEb (osqueryd + 0x1f2d301)#012#27 0x00005579b1d97f64 _ZN7osquery11Distributed17monitorNonnumericERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEES9_ (osqueryd + 0x1f8ff64)#012#28 0x00005579b1d97813 _ZN7osquery11Distributed10runQueriesEv (osqueryd + 0x1f8f813)#012#29 0x00005579b1ebc8e4 _ZN7osquery17DistributedRunner5startEv (osqueryd + 0x20b48e4)#012#30 0x00005579b40a8309 _ZN7osquery16InternalRunnable3runEv (osqueryd + 0x42a0309)#012#31 0x00005579b40a929d _ZNSt3__18__invokeIRMN7osquery16InternalRunnableEFvvERPS2_JEvEEDTcldsdeclsr3std3__1E7forwardIT0_Efp0_Efp_spclsr3std3__1E7forwardIT1_Efp1_EEEOT_OS8_DpOS9_ (osqueryd + 0x42a129d)#012#32 0x00005579b40a925d _ZNSt3__115__apply_functorIMN7osquery16InternalRunnableEFvvENS_5tupleIJPS2_EEEJLm0EENS5_IJEEEEENS_13__bind_returnIT_T0_T2_Xsr22__is_valid_bind_returnISA_SB_SC_EE5valueEE4typeERSA_RSB_NS_15__tuple_indicesIJXspT1_EEEEOSC_ (osqueryd + 0x42a125d)#012#33 0x00005579b40a91e1 _ZNSt3__114__thread_proxyINS_5tupleIJNS_10unique_ptrINS_15__thread_structENS_14default_deleteIS3_EEEENS_6__bindIMN7osquery16InternalRunnableEFvvEJPS9_EEEEEEEEPvSF_ (osqueryd + 0x42a11e1)#012#34 0x00007fc0d5f771b7 start_thread (libc.so.6 + 0x961b7)#012#35 0x00007fc0d5ff939c __clone3 (libc.so.6 + 0x11839c)#012#012Stack trace of thread 6285:#012#0 0x00007fc0d5f73919 __futex_abstimed_wait_common (libc.so.6 + 0x92919)#012#1 0x00007fc0d5f76652 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x95652)#012#2 0x00005579b4bae18c _ZNSt3__118condition_variable15__do_timed_waitERNS_11unique_lockINS_5mutexEEENS_6chrono10time_pointINS5_12system_clockENS5_8durationIxNS_5ratioILl1ELl1000000000EEEEEEE (osqueryd + 0x4da618c)#012#3 0x00005579b40a8273 _ZNSt3__118condition_variable8wait_forIxNS_5ratioILl1ELl1000EEEEENS_9cv_statusERNS_11unique_lockINS_5mutexEEERKNS_6chrono8durationIT_T0_EE (osqueryd + 0x42a0273)#012#4 0x00005579b40a81a6 _ZN7osquery21InterruptibleRunnable5pauseENSt3__16chrono8durationIxNS1_5ratioILl1ELl1000EEEEE (osqueryd + 0x42a01a6)#012#5 0x00005579b1d7eda5 _ZN7osquery19ConfigRefreshRunner5startEv (osqueryd + 0x1f76da5)#012#6 0x00005579b40a8309 _ZN7osquery16InternalRunnable3runEv (osqueryd + 0x42a0309)#012#7 0x00005579b40a929d _ZNSt3__18__invokeIRMN7osquery16InternalRunnableEFvvERPS2_JEvEEDTcldsdeclsr3std3__1E7forwardIT0_Efp0_Efp_spclsr3std3__1E7forwardIT1_Efp1_EEEOT_OS8_DpOS9_ (osqueryd + 0x42a129d)#012#8 0x00005579b40a925d _ZNSt3__115__apply_functorIMN7osquery16InternalRunnableEFvvENS_5tupleIJPS2_EEEJLm0EENS5_IJEEEEENS_13__bind_returnIT_T0_T2_Xsr22__is_valid_bind_returnISA_SB_SC_EE5valueEE4typeERSA_RSB_NS_15__tuple_indicesIJXspT1_EEEEOSC_ (osqueryd + 0x42a125d)#012#9 0x00005579b40a91e1 _ZNSt3__114__thread_proxyINS_5tupleIJNS_10unique_ptrINS_15__thread_structENS_14default_deleteIS3_EEEENS_6__bindIMN7osquery16InternalRunnableEFvvEJPS9_EEEEEEEEPvSF_ (osqueryd + 0x42a11e1)#012#10 0x00007fc0d5f771b7 start_thread (libc.so.6 + 0x961b7)#012#11 0x00007fc0d5ff939c __clone3 (libc.so.6 + 0x11839c)#012#012Stack trace of thread 6141:#012#0 0x00007fc0d5f73919 __futex_abstimed_wait_common (libc.so.6 + 0x92919)#012#1 0x00007fc0d5f76652 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x95652)#012#2 0x00005579b4bae18c _ZNSt3__118condition_variable15__do_timed_waitERNS_11unique_lockINS_5mutexEEENS_6chrono10time_pointINS5_12system_clockENS5_8durationIxNS_5ratioILl1ELl1000000000EEEEEEE (osqueryd + 0x4da618c)#012#3 0x00005579b40a8273 _ZNSt3__118condition_variable8wait_forIxNS_5ratioILl1ELl1000EEEEENS_9cv_statusERNS_11unique_lockINS_5mutexEEERKNS_6chrono8durationIT_T0_EE (osqueryd + 0x42a0273)#012#4 0x00005579b40a81a6 _ZN7osquery21InterruptibleRunnable5pauseENSt3__16chrono8durationIxNS1_5ratioILl1ELl1000EEEEE (osqueryd + 0x42a01a6)#012#5 0x00005579b25c8115 _ZN7osquery20WatcherWatcherRunner5startEv (osqueryd + 0x27c0115)#012#6 0x00005579b40a8309 _ZN7osquery16InternalRunnable3runEv (osqueryd + 0x42a0309)#012#7 0x00005579b40a929d _ZNSt3__18__invokeIRMN7osquery16InternalRunnableEFvvERPS2_JEvEEDTcldsdeclsr3std3__1E7forwardIT0_Efp0_Efp_spclsr3std3__1E7forwardIT1_Efp1_EEEOT_OS8_DpOS9_ (osqueryd + 0x42a129d)#012#8 0x00005579b40a925d _ZNSt3__115__apply_functorIMN7osquery16InternalRunnableEFvvENS_5tupleIJPS2_EEEJLm0EENS5_IJEEEEENS_13__bind_returnIT_T0_T2_Xsr22__is_valid_bind_returnISA_SB_SC_EE5valueEE4typeERSA_RSB_NS_15__tuple_indicesIJXspT1_EEEEOSC_ (osqueryd + 0x42a125d)#012#9 0x00005579b40a91e1 _ZNSt3__114__thread_proxyINS_5tupleIJNS_10unique_ptrINS_15__thread_structENS_14default_deleteIS3_EEEENS_6__bindIMN7osquery16InternalRunnableEFvvEJPS9_EEEEEEEEPvSF_ (osqueryd + 0x42a11e1)#012#10 0x00007fc0d5f771b7 start_thread (libc.so.6 + 0x961b7)#012#11 0x00007fc0d5ff939c __clone3 (libc.so.6 + 0x11839c)#012#012Stack trace of thread 6144:#012#0 0x00007fc0d5f73919 __futex_abstimed_wait_common (libc.so.6 + 0x92919)#012#1 0x00007fc0d5f762e9 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x952e9)#012#2 0x00005579b4
Jul 23 17:53:21 fedora audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-coredump@4-6364-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Stefano Bonicatti
07/23/2024, 9:03 PMStefano Bonicatti
07/23/2024, 10:42 PMselect * from rpm_packages where name = "foo"
.
It crashes when the sqlite logging is called, which then tries to call the logging functions of librpm, after is deinitialized in the table.
librpm should be prevented from changing sqlite logging function, since unfortunately it's a global option and affects all parts of osquery.Stefano Bonicatti
07/23/2024, 10:57 PMStefano Bonicatti
07/23/2024, 11:03 PMStefano Bonicatti
07/23/2024, 11:11 PMLucas Rodriguez
07/24/2024, 7:51 PMLucas Rodriguez
07/24/2024, 7:51 PMLucas Rodriguez
07/24/2024, 7:52 PMDon't muck with per-process global sqlite configuration from the db backend
Stefano Bonicatti
07/24/2024, 7:52 PMLucas Rodriguez
07/24/2024, 7:56 PM+sqlite3_config() affects all in-process uses of sqlite. librpm being a
+low-level library, it has no business whatsoever making such decisions
+for the applications running on top of it. Besides that, the callback can
+easily end up pointing to an already closed database, causing an
+innocent API user to crash in librpm on an entirely unrelated error on
+some other database. "Oops."
+
+The sqlite API doesn't seem to provide any per-db or non-global context
+for logging errors, thus we can only remove the call and let sqlite output
+errors the way it pleases (print through stderr, presumably).
https://patchwork.yoctoproject.org/project/oe-core/patch/20230703065909.45555-1-anuj.mittal@intel.com/Stefano Bonicatti
07/24/2024, 7:58 PMLucas Rodriguez
07/24/2024, 8:09 PMLucas Rodriguez
07/24/2024, 8:09 PMLucas Rodriguez
07/24/2024, 8:09 PMStefano Bonicatti
07/24/2024, 8:12 PMLucas Rodriguez
07/24/2024, 8:15 PMLucas Rodriguez
07/24/2024, 8:15 PMSaid that, yeah, I was going to say that I have time to look into it tomorrow.Cool, thanks! Happy to re-test when I get back but probably not necessary since you were able to reproduce