Convergent evolution: how rollups + data shards became the ultimate solution

Researchers have been hard at work on the blockchain scalability problem. A key tenet of a decentralized, permissionless and trustless network is a culture of users verifying the chain. Some projects, like EOS, Solana or Polygon PoS, aren’t interested in this, and opt for a more centralized network where users have to trust a smaller number of validators with high-spec machines. There’s nothing wrong with this; it’s simply a direct trade-off. Others, like Bitcoin, have given up on the problem, presumably deeming it unsolvable, and instead rely on more centralized entities outside the chain. Still others are attempting more nuanced solutions to this problem.

The first obvious solution was to simply break the network up into multiple chains, with communication protocols between them. This gives you high scalability, as you can now spread the load across multiple chains. You also maintain a higher degree of decentralization, as each of these chains is still accessible for verification or usage by the average user. However, you give up a significant amount of security, as your validator set is now split up into subnets across multiple chains. More naïve variants simply have different validator sets for the different chains (sidechains). More sophisticated variants have dynamic subnets. Either way, the point is that a split validator set is inherently less secure.
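To make that trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes equally weighted validators, an even static split across chains (the sidechain-style case), and an attacker who needs to exceed a fixed fraction of one chain's validators. The one-third threshold, the validator count and the function name are all illustrative assumptions, not properties of any specific protocol.

```python
from fractions import Fraction
import math

def validators_to_attack_one_chain(total_validators: int,
                                   num_chains: int,
                                   threshold: Fraction = Fraction(1, 3)) -> int:
    """Minimum number of validators needed to exceed `threshold` of a single
    chain's validator set, assuming the set is split evenly and statically."""
    per_chain = total_validators // num_chains
    # Strictly more than threshold * per_chain validators are required.
    return math.floor(per_chain * threshold) + 1

if __name__ == "__main__":
    total = 9000  # hypothetical network size
    for chains in (1, 10, 100):
        needed = validators_to_attack_one_chain(total, chains)
        print(f"{chains:>3} chain(s): {needed} validators "
              f"({needed / total:.2%} of the whole network)")
```

Under these assumptions, a 9,000-validator network needs 3,001 colluding validators to cross the threshold on a single chain, but only 31 once the set is split across 100 chains. Dynamic subnets make it harder to target a specific chain, but each chain is still secured by only a fraction of the validators at any moment.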

The next idea was to take the multiple chains approach, but enable shared security across all chains by posting fraud proofs from each shard chain to a central security chain. This is sharding, and each shard chain is backed by the full security of the network. You’ll remember the old Ethereum 2.0 roadmap followed this approach, with a central chain (beacon chain) connecting multiple shards.

Polkadot started with this model, but made two changes: make the beacon chain much more centralized (and rename it the relay chain), and open up the shards. The limitation of Ethereum 2.0 shards was that they were all designed to be identical at the protocol level. Polkadot’s shards (which they call parachains) have a wider design space, where parachain operators can customize each chain within the specifications of the overall network.

Rollups take this to the next level. Now, what were essentially shards or parachains are completely decoupled from the network, and protocol developers have a wide open space to develop the chain however they want. They can use L1’s security by simply communicating through arbitrary smart contracts, developed in whatever way is best optimized for their specific rollup, instead of through in-protocol clients. Decoupling the rollup chains from the protocol has two further advantages over shards: if a rollup fails, it has no impact on L1; and, most importantly, the L1 protocol has no need to run a rollup full node. With sharding, each shard still has in-protocol validators that need to run full nodes for that shard (Polkadot calls them collators). If a shard fails, it can have ramifications for the shared consensus and other shards.

The disadvantage of a non-standardized approach with rollups is that there are no clear interoperability schemes. However, letting open innovation and the free market sort this out may well achieve better solutions in the long term. For example, rollups are replacing fraud proofs (optimistic rollups) with validity proofs (zk Rollups), which have significant benefits. Sharding can now also replace earlier fraud proof models with zk-SNARK proofs, though this is an innovation born of and expedited by the open nature of rollups. If we had shards with fraud proofs at the protocol level, as originally planned, we would very likely not see zk-shards with validity proofs even several years down the line. The same goes for experimental execution layers, like Aztec’s fully private VM, or StarkNet’s quantum-resistant VM. [Addendum: More examples: Fuel V2’s UTXOs-with-access-lists model. Arbitrum on WASM. zkSync 2.0 with LLVM/Rust.]

Rollups by themselves offer similar scalability to shards, but this is where the final piece of the puzzle comes in: data shards. One of the biggest challenges for executable shards was interoperability. While there are schemes for asynchronous communication, there’s never been a convincing proposal for composability. In a happy accident, shards can now be used as a data availability layer for rollups. A single rollup can now remain composable while leveraging data from multiple shards. Data shards can continue to expand, enabling faster rollups, and more of them, along the way. With innovative solutions like data availability sampling, extremely robust security is possible with data spread across up to a thousand shards.
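A rough sketch of why sampling can give such strong guarantees, under a simplified model that is an assumption here rather than the actual protocol design: the data is erasure-coded so that any half of the chunks is enough to reconstruct it, which means an attacker has to withhold at least half of the chunks to make it unavailable. Each light client then requests a handful of chunks chosen uniformly at random (with replacement, to keep the arithmetic simple). The function names are hypothetical.

```python
import math

def miss_probability(samples: int, withheld_fraction: float = 0.5) -> float:
    """Probability that all `samples` random queries happen to hit available
    chunks even though `withheld_fraction` of the chunks is being withheld."""
    return (1.0 - withheld_fraction) ** samples

def samples_needed(target_miss_probability: float,
                   withheld_fraction: float = 0.5) -> int:
    """Smallest number of samples that pushes the miss probability down to
    `target_miss_probability` or below."""
    return math.ceil(math.log(target_miss_probability)
                     / math.log(1.0 - withheld_fraction))

if __name__ == "__main__":
    for k in (10, 20, 30):
        print(f"{k} samples -> miss probability {miss_probability(k):.3e}")
    print("samples for a one-in-a-billion miss:", samples_needed(1e-9))
```

In this model a single client with 30 samples is fooled less than one time in a billion, and the cost per client does not grow with the total amount of data. That is the intuition behind securing data across many shards without anyone downloading all of it.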

Earlier, I mentioned that with executable shards, the subnet for each shard needs to hold full nodes, which significantly limits scalability. So what about rollups? If there were to be a “super-rollup” that does 100,000 TPS across 64 data shards, someone has to hold the full node, right? The answer is yes, but in a zkR environment, only sequencers need to do so. It’s perfectly fine for sequencers to run high-spec machines, as long as the average user can reconstruct the state from L1, or exit the rollup from L1 directly. With optimistic rollups, you do need at least one honest player to run a full node, but by the time we’re in a situation requiring a super-rollup, I’d imagine we’d be all in on zkRollups anyway. Further, we’ll need innovations like state expiry at the rollup level to make this viable, or possibly even schemes (just showerthinking here, don’t even know if it’s possible) [Addendum: It’s possible!] to have stateless clients that reconstruct relevant state directly from L1. These types of innovations will simply be much slower and more restrictive with shards/L1s. Of course, you can also have sharded or multi-chain rollups, though each of them will likely break composability.
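To illustrate what “reconstructing the state from L1” and “one honest player” mean in practice, here is a toy sketch. Everything in it is a simplifying assumption: batches are plain lists of transfers, the “state root” is just a hash of the sorted balances (a real rollup would use a Merkle-style commitment), and all names are hypothetical. It does not model validity proofs. In a zk-rollup the proof would replace the re-execution step shown in `first_fraudulent_batch`.

```python
import hashlib
import json
from typing import Dict, List, Optional, Tuple

Transfer = Tuple[str, str, int]   # (sender, recipient, amount)
State = Dict[str, int]            # account -> balance

def state_root(state: State) -> str:
    """Toy commitment: SHA-256 over the sorted balances."""
    encoded = json.dumps(sorted(state.items())).encode()
    return hashlib.sha256(encoded).hexdigest()

def apply_batch(state: State, batch: List[Transfer]) -> State:
    """Apply one batch of transfers, skipping any invalid ones."""
    new_state = dict(state)
    for sender, recipient, amount in batch:
        if amount <= 0 or new_state.get(sender, 0) < amount:
            continue
        new_state[sender] -= amount
        new_state[recipient] = new_state.get(recipient, 0) + amount
    return new_state

def reconstruct(genesis: State, l1_batches: List[List[Transfer]]) -> State:
    """Anyone holding the batch data published on L1 can replay it."""
    state = dict(genesis)
    for batch in l1_batches:
        state = apply_batch(state, batch)
    return state

def first_fraudulent_batch(genesis: State,
                           l1_batches: List[List[Transfer]],
                           claimed_roots: List[str]) -> Optional[int]:
    """Optimistic-style check: re-execute each batch and return the index of
    the first claimed root that does not match, or None if all match."""
    state = dict(genesis)
    for index, (batch, claimed) in enumerate(zip(l1_batches, claimed_roots)):
        state = apply_batch(state, batch)
        if state_root(state) != claimed:
            return index
    return None

if __name__ == "__main__":
    genesis = {"alice": 10, "bob": 0}
    batches = [[("alice", "bob", 4)], [("bob", "carol", 1)]]
    print(reconstruct(genesis, batches))
```

The point of the sketch is that neither function needs anything beyond the data posted to L1. That is why sequencers can run heavy machines without users having to trust them, provided the data is actually available.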

On that note, rollups do face some of the same challenges as shards with interoperability and composability. While one rollup can remain composable across multiple data shards, communication between rollups is just as challenging as between shards or blockchains of any kind, but not more so. As alluded to above, the bazaar will take some time to standardize on solutions, but these solutions will certainly end up being more innovative than hardcoded in-protocol solutions. [Addendum in Sept ’21: several projects have cross-L2 bridges live now, like Hop, cBridge, Connext, Biconomy. We have highly innovative schemes being developed, like dAMM — which lets multiple zkRs share liquidity! Cross-rollup interoperability, unsurprisingly, is already better than cross-L1. Further, you can eventually have multiple zkR shards that can be fully composable with each other by sharing proofs.]

The end result here is: rollups + data shards are the best solution we have. The blockchain world has finally converged on a solution that’ll enable mass adoption.

To be very clear, though, we’re right in the middle of this evolution. There remain some open questions. Rollups didn’t exist 3 years ago, and the rollup-centric pivot by Ethereum is less than a year old. Who knows how things will evolve over the coming years? I noticed Tezos founder Arthur Breitman acknowledging the superiority of the rollup + data shard model [Addendum: Arthur has since detailed why rollups + data shards are the best solution, and Tezos now seems to be following Ethereum on a rollup-centric roadmap too], and we’ve seen data availability chains like Celestia and Avail pop up to play in the rollup-centric (I’ll broadly include validium here, of course) world. I have an information gap that I’d request some feedback on: which other projects are making the pivot towards the rollup-centric world, in some way? I’d love to know more, but it seems to me that we’re still very early and most blockchain projects still have their heads buried in the monolithic blockchain sand. I don’t see any route other than for all projects to converge on the rollup-centric world, in some way, or else to rely purely on marketing and memes to overlook technological obsolescence.

In short:

- Rollups take the multi-chain and sharding concepts to the next level.

- Rollups enable open innovation at the execution layer. [Addendum to clarify: Anything any centralized L1 can do, a rollup can do orders of magnitude better.]

- Use L1 for security and data availability.

- Combined with Ethereum data shards, open the floodgates to massive scalability (to the tune of millions of TPS long term).

- A single rollup can retain full composability across multiple data shards (the last bastion for high-TPS single ledger chains evaporates away).

- Inter-chain interoperability and composability remains an open challenge, much like with shards or L1 chains, though multiple projects are working on it in different ways. [Add

- Last but not least, they’re already here!

Discussion here: Convergent evolution: how rollups + data shards became the ultimate solution : ethfinance (reddit.com)

Rants and musings on blockchain tech. All content here in the public domain, please feel free to share/adapt/republish.