# kolide
n
fleet 3.0.0 DB deadlock when migrating <thread>
2020/07/23 20:12:34 FAIL 20200405120000_UpdateLabelStorage.go (ensure all hosts are in all hosts label: Error 1213: Deadlock found when trying to get lock; try restarting transaction), quitting migration.
^ when upgrading from 2.6.0 to 3.0.0
But then when I retried fleet prepare db, it failed on:
2020/07/23 22:47:00 FAIL 20200405120000_UpdateLabelStorage.go (create label_membership table: Error 1050: Table 'label_membership' already exists), quitting migration.
I was able to fix this by:
drop table label_membership;
Then re-running fleet prepare db
@zwass ^ do you have any suggestions on how to better handle this when upgrading? And do you think there's anything to be done to make the upgrades more resilient?
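On the resilience question, one option is to take the hint in the error text ("try restarting transaction") and retry a migration step when it fails with Error 1213. The following is only a rough sketch of that idea, not how Fleet's migrations work: `mysqlError` is a hypothetical stand-in for whatever error type the MySQL driver returns, and `retryOnDeadlock` is an illustrative helper name.

```go
package main

import (
	"errors"
	"fmt"
)

// mysqlError is a hypothetical stand-in for a MySQL driver error type.
// Only the numeric error code matters for this sketch.
type mysqlError struct {
	Number  uint16
	Message string
}

func (e *mysqlError) Error() string {
	return fmt.Sprintf("Error %d: %s", e.Number, e.Message)
}

// errDeadlock is the MySQL error code seen in the log above.
const errDeadlock = 1213

// isDeadlock reports whether err is, or wraps, a deadlock error.
func isDeadlock(err error) bool {
	var me *mysqlError
	return errors.As(err, &me) && me.Number == errDeadlock
}

// retryOnDeadlock runs step up to maxAttempts times. It retries only when
// the failure is a deadlock; any other result is returned immediately.
// It returns the number of attempts made and the final error.
func retryOnDeadlock(maxAttempts int, step func() error) (int, error) {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = step()
		if err == nil || !isDeadlock(err) {
			return attempt, err
		}
	}
	return maxAttempts, err
}

func main() {
	calls := 0
	attempts, err := retryOnDeadlock(3, func() error {
		calls++
		if calls < 2 {
			// Simulate the failure from the log on the first call only.
			return fmt.Errorf("ensure all hosts are in all hosts label: %w",
				&mysqlError{Number: errDeadlock, Message: "Deadlock found when trying to get lock; try restarting transaction"})
		}
		return nil
	})
	fmt.Println(attempts, err)
}
```

A retry like this only helps if the step is safe to run again from the start, which is not true of a step that has already created a table.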
z
Bizarrely, MySQL CREATE TABLE statements cannot be reverted by rolling back a transaction, because DDL statements trigger an implicit commit. This makes it tricky to be sure of exactly where things failed and how.
Did you have active servers running against the database while the migration was running?
Curious to understand how there could be a transaction deadlock while migration transactions are run sequentially.
n
Did you have active servers running against the database while the migration was running?
Yes - though I'm guessing from that that I probably shouldn't. I was trying to limit downtime so hosts can continue to get osquery configs, but it sounds like I should plan for downtime here.
z
Well it's an interesting problem. Hosts typically do fine if the server is down for a short time (due to buffering of logs, stored config, etc.)
Servers are likely to have errors during migrations due to changing schema. We've never really built migrations with the expectation that uptime is 100%.
It could be done if folks have compelling reasons that it is necessary. In the meantime, it sounds like we need to improve the documentation to make this clear.