#kolide

nyanshak

07/24/2020, 12:24 AM
fleet 3.0.0 DB deadlock when migrating
12:25 AM
2020/07/23 20:12:34 FAIL 20200405120000_UpdateLabelStorage.go (ensure all hosts are in all hosts label: Error 1213: Deadlock found when trying to get lock; try restarting transaction), quitting migration.
^ when upgrading from 2.6.0 to 3.0.0
12:26 AM
But then when I retried fleet prepare db, it failed on:
2020/07/23 22:47:00 FAIL 20200405120000_UpdateLabelStorage.go (create label_membership table: Error 1050: Table 'label_membership' already exists), quitting migration.
12:26 AM
I was able to fix this by running:
drop table label_membership;
and then re-running fleet prepare db.
12:27 AM
@zwass ^ do you have any suggestions on how to better handle this when upgrading? And do you think there's anything to be done to make the upgrades more resilient?
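One common way to make a step more resilient to the deadlock above is to do what the error message itself suggests ("try restarting transaction") and retry it. The sketch below is a hypothetical wrapper, not anything Fleet ships: `DeadlockError` stands in for whatever error a MySQL driver raises for error 1213, and `run_with_retry` and its parameters are illustrative names. A retry like this is only safe when the step runs as one complete transaction, since InnoDB rolls back the transaction it picks as the deadlock victim; a step that also creates tables is not fully rolled back, which is exactly what caused the follow-up Error 1050.

```python
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for a driver error carrying MySQL error 1213."""


def run_with_retry(step, max_attempts=3, backoff_seconds=0.1):
    """Call step(), re-running it when it fails with a deadlock.

    Assumes step() runs one complete transaction, so a deadlock (which makes
    InnoDB roll back that transaction) leaves nothing half-applied.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except DeadlockError:
            if attempt == max_attempts:
                raise  # give up and surface the original error
            time.sleep(backoff_seconds * attempt)  # linear backoff between tries


# Usage: a step that deadlocks twice before succeeding.
attempts = []


def flaky_step():
    attempts.append(1)
    if len(attempts) < 3:
        raise DeadlockError("Error 1213: Deadlock found when trying to get lock")
    return "ok"


print(run_with_retry(flaky_step, backoff_seconds=0))  # prints "ok"
```

Retrying narrows the window for a transient deadlock, but it does not remove the underlying contention from servers writing to the same tables during the migration.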
zwass

07/24/2020, 12:37 AM
Bizarrely, MySQL create table statements cannot be reverted by rolling back a transaction. This makes it tricky to be sure of exactly where things failed and how.
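Because the CREATE TABLE from the first, failed attempt stays in place, a plain re-run trips over Error 1050. A minimal sketch of making such a step safe to re-run, assuming Python's built-in sqlite3 as a stand-in for MySQL: with `isolation_level=None` every statement commits immediately, which loosely mimics MySQL committing the CREATE TABLE regardless of a later rollback. The column names are hypothetical and are not Fleet's actual `label_membership` schema, and `migrate` is an illustrative function, not Fleet's migration code. `INSERT OR IGNORE` is SQLite syntax; MySQL's closest equivalent is `INSERT IGNORE`.

```python
import sqlite3


def migrate(conn, fail_data_step=False):
    # Hypothetical columns for illustration; not Fleet's real schema.
    # IF NOT EXISTS lets a re-run skip a table left behind by a failed attempt.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS label_membership ("
        "label_id INTEGER NOT NULL, host_id INTEGER NOT NULL, "
        "PRIMARY KEY (label_id, host_id))"
    )
    if fail_data_step:
        # Simulate the data step failing (e.g. a deadlock) after the DDL ran.
        raise RuntimeError("simulated failure after CREATE TABLE")
    # Hypothetical data step; INSERT OR IGNORE keeps re-runs from duplicating rows.
    conn.execute("INSERT OR IGNORE INTO label_membership VALUES (1, 1)")


# Autocommit mode: each statement is committed as soon as it runs.
conn = sqlite3.connect(":memory:", isolation_level=None)
try:
    migrate(conn, fail_data_step=True)
except RuntimeError:
    pass  # the table now exists even though the step failed

migrate(conn)  # the re-run succeeds instead of failing on "table already exists"
rows = conn.execute("SELECT COUNT(*) FROM label_membership").fetchone()[0]
print(rows)  # prints 1
```

Without `IF NOT EXISTS`, the second call would fail on the existing table, which is the SQLite analogue of the Error 1050 seen above.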
12:38 AM
Did you have active servers running against the database while the migration was running?
12:38 AM
Curious to understand how there could be a transaction deadlock while migration transactions are run sequentially.

nyanshak

07/24/2020, 12:48 AM
> Did you have active servers running against the database while the migration was running?
Yes, though I'm guessing from that that I probably shouldn't have. I was trying to limit downtime so hosts could continue to get osquery configs, but it sounds like I should plan for downtime here.
zwass

07/24/2020, 12:56 AM
Well, it's an interesting problem. Hosts typically do fine if the server is down for a short time (due to buffering of logs, stored config, etc.).
12:57 AM
Servers are likely to have errors during migrations due to changing schema. We've never really built migrations with the expectation that uptime is 100%.
12:58 AM
It could be done if folks have compelling reasons that it is necessary. In the meantime, it sounds like we need to improve the documentation to make this clear.