Title
#fleet
Jocelyn Bothe

Jocelyn Bothe

07/21/2021, 8:26 PM
we tried upgrading to v4 today and it did not go well after upgrade, our mysql db keeps crashing on this query
Query (2b66089b0a90): INSERT IGNORE INTO scheduled_query_stats (     scheduled_query_id,     host_id,     average_memory,     denylisted,     executions,     schedule_interval,     last_executed,     output_size,     system_time,     user_time,     wall_time    )    VALUES ((SELECT sq.id FROM scheduled_queries sq JOIN packs p ON (sq.pack_id = p.id) WHERE p.name = ? AND sq.name = ?),?,?,?,?,?,?,?,?,?,?),((SELECT sq.id FROM scheduled_queries sq JOIN packs p ON (sq.pack_id = p.id) WHERE p.name = ? AND sq.name = ?),?,?,?,?,?,?,?,?,?,?),((SELECT sq.id FROM scheduled_queries sq JOIN packs p ON (sq.pack_id = p.id) WHERE p.name = ? AND sq.name = ?),?,?,?,?,?,?,?,?,?,?),((SELECT sq.id FROM scheduled_queries sq JOIN packs p ON (sq.pack_id = p.id) WHERE p.name = ? AND sq.name = ?),?,?,?,?,?,?,?,?,?,?),((SELECT sq.id FROM scheduled_queries sq JOIN packs p ON (sq.pack_id = p.id) WHERE p.name = ? AND sq.name = ?),?,?,?,?,?,?,?,?,?,?),((SELECT sq.id FROM scheduled_queries sq JOIN packs p ON (sq.pack_id = p.id) WHERE p.name = ? AND sq.name = ?),?
Tomas Touceda

Tomas Touceda

07/21/2021, 9:05 PM
hi Jocelyn, do you have logs you can share to see what the error is?
Jocelyn Bothe

Jocelyn Bothe

07/21/2021, 9:10 PM
that is from the mysql error log
9:10 PM
2021-07-21T20:00:48.271896Z 0 [Note] Server socket created on IP: '::'.
2021-07-21T20:00:48.298284Z 0 [Warning] 'user' entry 'mysql.sys@localhost' ignored in --skip-name-resolve mode.
2021-07-21T20:00:48.298300Z 0 [Warning] 'user' entry 'rdsadmin@localhost' ignored in --skip-name-resolve mode.
2021-07-21T20:00:48.360779Z 0 [Note] Event Scheduler: Loaded 1 event
2021-07-21T20:00:48.360853Z 1 [Note] Event Scheduler: scheduler thread started with id 1
2021-07-21T20:00:48.467787Z 0 [Note] /rdsdbbin/oscar/bin/mysqld: ready for connections.
Version: '5.7.12'  socket: '/tmp/mysql.sock'  port: 3306  MySQL Community Server (GPL)
20:00:51 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

key_buffer_size=16777216
read_buffer_size=262144
max_used_connections=84
max_threads=3000
thread_count=39
connection_count=38
connection_count=36
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1601274 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.


Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (2b85dd1b1a90): INSERT IGNORE INTO scheduled_query_stats (     scheduled_query_id,     host_id,     average_memory,     denylisted,     executions,     schedule_interval,     last_executed,     output_size,     system_time,     user_time,     wall_time    )    VALUES ((SELECT sq.id FROM scheduled_queries sq JOIN packs p ON (sq.pack_id = p.id) WHERE p.name = ? AND sq.name = ?),?,?,?,?,?,?,?,?,?,?),((SELECT sq.id FROM scheduled_queries sq JOIN packs p ON (sq.pack_id = p.id) WHERE p.name = ? AND sq.name = ?),?,?,?,?,?,?,?,?,?,?),((SELECT sq.id FROM scheduled_queries sq JOIN packs p ON (sq.pack_id = p.id) WHERE p.name = ? AND sq.name = ?),?,?,?,?,?,?,?,?,?,?),((SELECT sq.id FROM scheduled_queries sq JOIN packs p ON (sq.pack_id = p.id) WHERE p.name = ? AND sq.name = ?),?,?,?,?,?,?,?,?,?,?),((SELECT sq.id FROM scheduled_queries sq JOIN packs p ON (sq.pack_id = p.id) WHERE p.name = ? AND sq.name = ?),?,?,?,?,?,?,?,?,?,?),((SELECT sq.id FROM scheduled_queries sq JOIN packs p ON (sq.pack_id = p.id) WHERE p.name = ? AND sq.name = ?),?
Connection ID (thread ID): 124
Status: NOT_KILLED

The manual page at <http://dev.mysql.com/doc/mysql/en/crashing.html> contains
information that should help you find out what is causing the crash.
Writing a core file
9:11 PM
it would crash on that as soon as it started
9:31 PM
we have over 20k hosts, so I suspect when it's trying to do the initial data population for scheduled_query_stats it's too much for the db to handle
Tomas Touceda

Tomas Touceda

07/21/2021, 9:31 PM
would you be able to try with mysql 5.7.35? I don't recall any crashes like this one before. I would be surprised if a bug in fleet could cause that
9:31 PM
do you have CPU/memory metrics for the database?
Jocelyn Bothe

Jocelyn Bothe

07/21/2021, 9:31 PM
we're using an AWS Aurora mysql global db
9:32 PM
cpu, mem, and connections were all fine
9:32 PM
it was literally just dying on that query over and over
9:33 PM
the scheduled_query_stats table doesn't exist in 3.10, it looks like when 4.0 starts it tries to populate data into that new table?
9:33 PM
so what is scheduled_query_stats for, and can we disable it somehow, for testing?
Tomas Touceda

Tomas Touceda

07/21/2021, 9:35 PM
it's updated at different points as the hosts do requests against fleet for different things
Jocelyn Bothe

Jocelyn Bothe

07/21/2021, 9:37 PM
why start storing stats in the db?
Tomas Touceda

Tomas Touceda

07/21/2021, 9:39 PM
I would have to get back to you on that, as I wasn't involved in that
Jocelyn Bothe

Jocelyn Bothe

07/21/2021, 9:39 PM
fair enough
9:39 PM
I'd just love to get v4 working
Tomas Touceda

Tomas Touceda

07/21/2021, 9:39 PM
It looks like the LTS version of aurora is 2.07, and based on the logs you are in at most 2.03.1
9:40 PM
would you be able to test upgrading to 2.07?
Jocelyn Bothe

Jocelyn Bothe

07/21/2021, 9:41 PM
we're on
Engine version
5.7.mysql_aurora.2.07.2
9:42 PM
Instance class
db.r5.2xlarge
vCPU
8
RAM
64 GB
Tomas Touceda

Tomas Touceda

07/21/2021, 9:42 PM
huh, weird, the log shows
Version: '5.7.12'  socket: '/tmp/mysql.sock'  port: 3306
which according to the logs
For Aurora MySQL 2.x, all versions 2.03.1 and lower are represented by the engine version 5.7.12.
9:43 PM
let me do some digging and will get back to you
9:51 PM
so it seems others have encountered similar issues, and they solved it by upgrading to 2.10 https://osquery.slack.com/archives/C01DXJL16D8/p1623355865176100?thread_ts=1623178672.165300&amp;cid=C01DXJL16D8
Jocelyn Bothe

Jocelyn Bothe

07/21/2021, 9:51 PM
mysql> select aurora_version(), @@aurora_version;
+------------------+------------------+
| aurora_version() | @@aurora_version |
+------------------+------------------+
| 2.07.2           | 2.07.2           |
+------------------+------------------+
1 row in set (0.00 sec)
9:52 PM
sweet, we'll give that a try tomorrow
Noah Talerman

Noah Talerman

07/22/2021, 2:06 PM
@Jocelyn Bothe thank you for your patience and help troubleshooting the upgrade. Adding context to your earlier question:
why start storing stats in the db?
Fleet 3.12.0 introduced scheduled query information (“Name”, “Description”, “Frequency”, and “Last run”) to the Host details page. This way you can verify if, and when, a scheduled query ran successfully against a specific host without having to check osquery status logs or a configured log destination. In the coming months, the Fleet team plans to surface more scheduled query performance info in the Fleet UI. This UI will likely use
scheduled_query_stats
.
Jocelyn Bothe

Jocelyn Bothe

07/22/2021, 2:50 PM
thanks for the info!
2:51 PM
During our investigation, we identified the source of these failovers due to following sql statement:
INSERT IGNORE INTO scheduled_query_stats (...) VALUES (SELECT ... JOIN ...)(...)
This is a known Mysql Community edition bug and is already fixed in Aurora with the 2.09.1 release.  We recommend you to upgrade your cluster to 2.09.1 at your earliest convenience.
In the meantime, if possible could you avoid running this query to prevent further crashes.  You may instead modify your sql statement to insert single row at a time to avoid this issue.
2:52 PM
might be worth adding a note to your upgrade guide to mention the new requirement for mysql version