On Friday, December 17, at 2:55:54 PM, the Picasso parachain on the Kusama relay chain stopped producing blocks; the last block it produced was backed by relay block #10,561,126. The cause was a faulty runtime upgrade applied at parachain block #75,775.
Specifically, the runtime upgrade changed the block time used by the Aura consensus worker. The documentation for this setting doesn't say that the value must not be changed once the chain is live, so MinimumPeriod was updated from 3000 ms to 6000 ms (a 12-second block time instead of 6 seconds), which is more in line with the 12-second block time the relay chain expects of parachains.
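To make the numbers concrete: in the Substrate releases of that period, pallet_aura derived its slot duration from pallet_timestamp's MinimumPeriod as twice that value, so a MinimumPeriod of 3000 ms means 6-second slots and 6000 ms means 12-second slots. The sketch below only models that relationship; the function name slot_duration_ms is ours, not a Substrate API, and newer Substrate versions configure the Aura slot duration differently.

```rust
/// Slot duration in ms as derived by pallet_aura of that era:
/// twice the pallet_timestamp `MinimumPeriod`. (Hypothetical helper name.)
fn slot_duration_ms(minimum_period_ms: u64) -> u64 {
    minimum_period_ms.saturating_mul(2)
}

fn main() {
    // Value Picasso inherited from its standalone-chain days.
    let before = slot_duration_ms(3_000);
    // Value set by the faulty runtime upgrade.
    let after = slot_duration_ms(6_000);
    println!("slot duration before: {} ms, after: {} ms", before, after);
}
```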
The reason it was 6s in the first place is that our runtime started out as a standalone Substrate blockchain with a 6s expected block time. Unfortunately, when it was converted to a parachain, this value wasn't updated to the standard parachain block time, which in turn caused spotty block production on Picasso, with block times averaging 30s. That makes sense: the collators were authoring blocks every 6s, but the relay chain only allows them to produce a block every 12s, so collators often missed their slots.
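After the runtime upgrade, all our collators stopped producing blocks. The primary reason is that, from the parachain's perspective, the slot numbers computed by the new runtime code belong to earlier slots: Aura derives the slot number by dividing the current timestamp by the slot duration, so doubling the slot duration roughly halved the slot number and put it far behind the slot of the last block on the chain. From the logs of one of the collators: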
2021-12-18 05:54:00.506 DEBUG tokio-runtime-worker aura: [Parachain] Starting authorship at slot 136650570; timestamp = 1639806840450
Note the slot number, and compare it with the slot number of the last block on the chain.
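The log line can be checked by hand. Dividing its timestamp by the new 12-second slot duration reproduces slot 136650570, while the same instant on the old 6-second scale is slot 273301140, about twice as large. The sketch below also models, in simplified form, the rule that a new block's slot must be strictly greater than its parent's (pallet_aura asserts "Slot must increase"). The last block's real slot isn't reproduced in this post, so as a stand-in we use the old-scale slot at the log's timestamp; the true value is slightly smaller, which doesn't change the outcome. Function names are illustrative, not Substrate APIs.

```rust
/// Aura-style slot number: timestamp (ms) divided by slot duration (ms).
fn slot_at(timestamp_ms: u64, slot_duration_ms: u64) -> u64 {
    timestamp_ms / slot_duration_ms
}

/// Simplified form of Aura's rule that slots must strictly increase.
fn can_build_on(parent_slot: u64, new_slot: u64) -> bool {
    new_slot > parent_slot
}

fn main() {
    // Timestamp taken from the collator log line above.
    let ts = 1_639_806_840_450u64;
    let new_slot = slot_at(ts, 12_000); // 12s slots after the upgrade
    let old_slot = slot_at(ts, 6_000); // same instant on the old 6s scale
    println!("new slot: {}, old-scale slot: {}", new_slot, old_slot);
    // Stand-in for the parent's slot: it was computed on the 6s scale.
    println!("can build: {}", can_build_on(old_slot, new_slot));
}
```

Running it prints new slot: 136650570, old-scale slot: 273301140, followed by can build: false.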
It would take roughly 53 years for the new slot numbers to overtake the last block's slot and for our chain to start producing blocks again.
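The wait can be estimated with simple arithmetic. If the last block's slot is roughly t0 / 6000 and new slots are t / 12000, the new scale only catches up once t is about 2 × t0, so the wait is roughly the time elapsed since the Unix epoch. The sketch again assumes the log timestamp as a stand-in for the halt time and uses 365.25-day years; it comes out at about 52 years, in line with the rough estimate above. The helper name is ours.

```rust
/// Milliseconds in a 365.25-day year.
const MS_PER_YEAR: f64 = 365.25 * 24.0 * 60.0 * 60.0 * 1000.0;

/// Milliseconds until slots of `new_duration_ms` exceed `last_slot`.
/// (Hypothetical helper for illustration.)
fn wait_ms(now_ms: u64, last_slot: u64, new_duration_ms: u64) -> u64 {
    // First timestamp whose slot on the new scale is greater than last_slot.
    let catch_up_ms = (last_slot + 1) * new_duration_ms;
    catch_up_ms.saturating_sub(now_ms)
}

fn main() {
    let now = 1_639_806_840_450u64;
    // Stand-in for the last block's slot on the old 6s scale.
    let last_slot = now / 6_000;
    let years = wait_ms(now, last_slot, 12_000) as f64 / MS_PER_YEAR;
    println!("wait: about {:.1} years", years);
}
```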
With the chain unable to produce blocks, it became clear we had only two options:
1. … MinimumPeriod, then create a new chain-spec with that same wasm blob specified as a wasm override, then update our collators so they can resume producing blocks.
Pros
Cons
2. We can call paras.force_set_current_head and paras.force_set_current_code in order to reset the chain with a new header and wasm blob that use the correct MinimumPeriod.
Pros
Cons
After careful consideration we decided to go with approach 2, primarily because it's the faster route to solving both of our problems (un-bricking the chain and the 30s block times). Otherwise we'd have to first modify the Aura pallet to support slot offsets, allowing us to upgrade our chain and successfully change the MinimumPeriod value, and then properly test this new functionality. Luckily, because Picasso is still a PoA (proof-of-authority) chain, it can afford to be restarted.
Starting from scratch with the right configuration also means we won't risk bricking Picasso this way again, even after it becomes community-run.
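For context on the route we didn't take, here is one way a slot offset could work. This is a hypothetical sketch, not an existing Aura feature and not a design we built: at upgrade time, pick a constant offset so that the first slot computed with the new duration lands just after the last slot computed with the old one, then add that offset to every timestamp-derived slot. As before, the old-scale slot at the log timestamp stands in for the real last slot, and all names are illustrative.

```rust
/// Hypothetical: offset chosen at upgrade time so slots keep increasing.
fn slot_offset(last_slot: u64, upgrade_ts_ms: u64, new_duration_ms: u64) -> u64 {
    // Slot the new duration would yield at upgrade time, without an offset.
    let raw = upgrade_ts_ms / new_duration_ms;
    // Shift so that raw slot lands one past the last old-scale slot.
    (last_slot + 1).saturating_sub(raw)
}

/// Hypothetical offset-adjusted slot for a timestamp.
fn slot_with_offset(ts_ms: u64, new_duration_ms: u64, offset: u64) -> u64 {
    ts_ms / new_duration_ms + offset
}

fn main() {
    let upgrade_ts = 1_639_806_840_450u64;
    let last_slot = upgrade_ts / 6_000; // stand-in old-scale slot
    let offset = slot_offset(last_slot, upgrade_ts, 12_000);
    let first = slot_with_offset(upgrade_ts, 12_000, offset);
    let next = slot_with_offset(upgrade_ts + 12_000, 12_000, offset);
    println!("offset {}, first slot {}, next slot {}", offset, first, next);
}
```

Any off-chain code that computes slots from timestamps, such as the collators' Aura worker, would presumably need the same offset, which is part of why this route called for more changes and testing.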
We'd like to restart the chain from genesis with a new header and wasm file using paras.force_set_current_head and paras.force_set_current_code.