Why rollups + data shards are the only sustainable solution for high scalability

Polynya
10 min read · Sep 8, 2021

The argument for rollups + data shards (rads henceforth) is usually that they’re more secure and decentralized. But this is only part of the story. The real reason rads are the only solution for global scale is scalability: they’re the only way to do millions of TPS long term. Specifically, I’m going to consider zkRollups (zkRs), as optimistic rollups have inherent limitations. So, why is this? It comes down to a) technical sustainability, and b) economic sustainability.

Technical sustainability

Breaking this down further, a technically sustainable blockchain node has to do three things:

  1. Keep up with the chain, and have nodes in sync.
  2. Be able to sync from genesis in a reasonable time.
  3. Avoid state bloat getting out of hand.

Obviously, for a decentralized network, all of this is non-negotiable, and leads to severe bottlenecks. [Addendum: Some have pointed out that 2) isn’t necessary. I agree, verified snapshots with social consensus are fine. Also, the above applies for state, not history.] Ethereum is pushing the edge of what’s possible while retaining all 3, and this is clearly not enough. A sharded chain retaining these 3 will only increase scale to a few thousand TPS at most — also not enough.

The centralized solution and their hard limits

But more centralized networks can start compromising. 1) You don’t need everyone to keep up with the chain, as long as a minimal number of validators do. 2) You don’t need to sync from genesis, just use snapshots and other shortcuts. 3) State expiry is a great solution to state bloat, and will be implemented across most chains; until then, brute force expiry solutions like regenesis can be helpful. By now, you can see that these networks are no longer decentralized, but we don’t care about that for this post, as we are only concerned with scalability.

Of these, 1) is a hard limit. RAM, CPU, disk I/O and bandwidth are potential bottlenecks for each node. More importantly, keeping a minimal number of nodes in sync across the network means there are hard limits to how far you can push. Indeed, you can see networks like Solana and Polygon PoS pushing too hard already, despite only processing a few hundred TPS (not counting votes). I went to the website Solana Beach, and it says “Solana Beach is having issues catching up with the Solana blockchain”, with block times mentioned as 0.55s, about 38% slower than the 0.4 second target. You need a minimum of 128 GB of RAM to even keep up with the chain, and even 256 GB of RAM isn’t enough to sync from genesis, so you need snapshots to make it work. This is the 2) compromise, as mentioned above, but we’ll let it pass as we’re solely focused on scalability here. Jameson Lopp did a test on a 32 GB machine and, predictably, it crashed within an hour, unable to keep up. Of course, Solana makes for a good example, but this is true of others.

zkRollups can push well past centralized L1s

zkRs can have significantly higher system requirements than even the most centralized L1s, because the validity proof makes them as secure as the most decentralized L1! You can have only one node active at a given time, and still be highly secure. Of course, for censorship resistance and resilience, we need multiple sequencers, but even these don’t need to come to consensus, and can be rotated accordingly. Hermez and Optimism, for example, only plan to have one sequencer active at one time, rotated between multiple sequencers.
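To make the "one active sequencer, rotated" idea concrete, here is a minimal sketch in Python. It assumes a plain round-robin schedule keyed by slot number; that is purely an illustrative assumption, and real rollups may pick the active sequencer by auction or other rules. The names `Sequencer` and `active_sequencer` are hypothetical and not taken from any rollup's codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sequencer:
    """A hypothetical sequencer identity (illustration only)."""
    name: str


def active_sequencer(sequencers: list[Sequencer], slot: int) -> Sequencer:
    """Return the single sequencer allowed to produce batches in `slot`.

    Assumption: simple round-robin rotation. No consensus between
    sequencers is modelled, because the validity proof (not the
    sequencer set) is what secures the rollup.
    """
    if not sequencers:
        raise ValueError("at least one sequencer is required")
    return sequencers[slot % len(sequencers)]


pool = [Sequencer("seq-a"), Sequencer("seq-b"), Sequencer("seq-c")]
schedule = [active_sequencer(pool, slot).name for slot in range(5)]
print(schedule)
```

Only one sequencer is active per slot, and a different one takes over in the next slot, so a censoring or offline sequencer only holds the role for a bounded time.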

Further, zkRs can use all the innovations to make full node clients as efficient as possible, whether they are done for zkRs or L1s. zkRollups can get very creative with state expiry techniques, given that history can be reconstructed directly from L1. Indeed, there will be innovations with shard and history access precompiles that could enable running zkRs directly over data shards! We’d also need light unassisted withdrawals to make all of this bulletproof (pun not intended).

However, even here, we run into hard limits. 1 TB RAM, 2 TB RAM, there’s a limit to how far one can go. You also need to consider infrastructure providers who need to be able to keep up with the chain.

So, yes, a zkR can be significantly more scalable than the most scalable L1, but it’s not going to attain global scale by itself.

And keep going with multiple zkRs

This is where you can have multiple zkRs running over Ethereum data shards: effectively, sharded zkRs. Once released, data shards will provide massive data availability that’ll continue to expand as required, speculatively enough for up to 15 million TPS by the end of the decade. One zkR is not going to do these kinds of insane throughputs, but multiple zkRs can.

Will each zkR shard break composability? Currently, yes. [Addendum: Actually, it turns out it’s possible for zkR shards within the same network to have full synchronous atomic composability — as you can do this pre-consensus. To the security layer, it’ll just appear as a single zkR with a single proof. This is impossible with L1s — another big win for zkRs.] But we’re seeing a ton of work being done in this space with fast bridges like Hop, Connext, cBridge, Biconomy, and brilliant innovations like dAMM that let multiple zkRs share liquidity. Many of these innovations would be much harder or impossible on L1s. I expect continued innovation in this space to make multiple zkR chains seamlessly interoperable.

Tl;dr: Whatever the most centralized of L1s can do, a zkR can do much better, with significantly higher TPS. Further, we can have multiple zkRs that can effectively attain global scale in aggregate.

Economic sustainability

This one’s fairly straightforward. A network needs to collect more transaction fees than inflation handed out to validators and delegators. In reality, this is a very complex topic, so I’ll try to keep it as simple as possible. It’s certainly true that speculative fervour and monetary premium could keep a network sustainable even if it’s effectively running at a loss, but for a truly resilient, decentralized network, we should strive for economic sustainability.

Centralized L1s cost way more to maintain than revenues collected

Let’s consider our two favourite examples again: Polygon PoS and Solana. Polygon PoS is collecting roughly $50,000/day in transaction fees, or $18M annualized. Meanwhile, it’s distributing well over $400M in inflationary rewards. That’s an incredible net loss of 95%. As for Solana, it collected only ~$10K/day for the longest time, but with the speculative mania it has seen a significant increase to ~$100K/day, or $36.5M annualized. Solana is giving out an even more astounding $4B in inflationary rewards, leading to a net loss of roughly 99%. I’ve collected my numbers from Token Terminal and Staking Rewards, and I should note that I’m being very conservative with these numbers; in reality they look even worse. By the way, Ethereum is collecting more fees in a day than both of these networks combined in an entire year!
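The net-loss percentages are just the share of inflationary rewards that fee revenue fails to cover. A minimal sketch of that arithmetic, using only the rounded figures quoted in this section (the function name `net_loss_pct` is illustrative, and the inputs are the article's approximations rather than exact on-chain data):

```python
def net_loss_pct(daily_fees_usd: float, annual_rewards_usd: float) -> float:
    """Percentage of annual inflationary rewards not covered by fees."""
    annual_fees_usd = daily_fees_usd * 365
    return (annual_rewards_usd - annual_fees_usd) / annual_rewards_usd * 100


# Rounded inputs from the text: ~$50K/day vs ~$400M/year, ~$100K/day vs ~$4B/year.
polygon_loss = net_loss_pct(50_000, 400e6)
solana_loss = net_loss_pct(100_000, 4e9)

print(f"Polygon PoS net loss: {polygon_loss:.1f}%")
print(f"Solana net loss: {solana_loss:.1f}%")
```

With these rounded inputs the results come out at roughly 95% and 99%; the exact decimals depend on the unrounded fee and reward data.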

You can’t just increase throughput beyond what’s technically possible

Now, the argument here is that they’ll process more transactions and collect more fees in the future, inflation will decrease, and eventually the networks will break even. The reality is far more complicated. Firstly, even if we consider Solana’s lowest possible inflation attained at the end of the decade, we’re still looking at a 96% loss. Things are so skewed that it hardly matters: you need to do throughput well beyond what’s possible to break even. As a thought experiment, Solana would need to do 154,000 TPS at the current transaction fee just to break even, which is totally impossible given current hardware and bandwidth.
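The break-even thought experiment boils down to one division: annual rewards divided by (average fee per transaction × seconds per year). The sketch below is a rough illustration, assuming the ~$4B annual rewards figure quoted earlier is the one behind the 154,000 TPS number; the text does not state the per-transaction fee, so the code backs out the implied fee rather than asserting one. Function names are illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000


def break_even_tps(annual_rewards_usd: float, avg_fee_usd: float) -> float:
    """Sustained TPS needed for fee revenue to equal annual rewards."""
    return annual_rewards_usd / (avg_fee_usd * SECONDS_PER_YEAR)


def implied_fee_usd(annual_rewards_usd: float, tps: float) -> float:
    """Average fee per transaction implied by a given break-even TPS."""
    return annual_rewards_usd / (tps * SECONDS_PER_YEAR)


# Assumption: ~$4B/year in rewards and the 154,000 TPS quoted in the text.
fee = implied_fee_usd(4e9, 154_000)
print(f"Implied average fee: ${fee:.5f} per transaction")
print(f"Break-even TPS at that fee: {break_even_tps(4e9, fee):,.0f}")
```

Under those assumptions the implied average fee is a bit under a tenth of a cent per transaction, which shows why fee revenue at a few hundred TPS is so far from covering issuance.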

The bigger issue, though, is that those additional transactions don’t come for free: they add greater bandwidth requirements, greater state bloat, and in general, higher system requirements still. Some would argue further that there’s great headroom already, and they can do much more, but as I covered in the technical sustainability section, this is a dubious assumption at best, given you need 128 GB RAM to even keep up with a chain that’s only doing a few hundred TPS. The other argument is that hardware will become cheaper. True enough, but this is not a magical solution: you will need to choose either higher scale, lower costs, or a balance of the two. Note that zkRs will also benefit equally from Moore’s law and Nielsen’s law.

In the end, all centralized L1s have to increase their fees

The only two resolutions for this, in the end, are a) the network becomes even more centralized, and b) higher fees as the network reaches its limits. a) has its limits, as discussed, so b) is inevitable. You can see this happening on Polygon PoS, with fees starting to creep up. Indeed, Binance Smart Chain has already gone through this process, and is now a sustainable network, though the fees are significantly higher to get there. Remember, we’re just talking about economic sustainability here.

Before moving on, let me just point out again that there are many, many variables — like price appreciation and volatility — and this is definitely a simplified take, but I believe the general logic will be clear.

How rads are significantly more efficient, with a fraction of the overhead

Now for the rads scenario. On the rollup side, a network costs a tiny, tiny fraction to maintain, with very few nodes required to be live at a given time, and without the requirement for expensive consensus mechanisms for security. All of this despite offering much greater throughput than any L1 ever can. Rollups can simply charge a nominal L2 tx fee, which keeps the network profitable. On the data availability side, Ethereum is highly deflationary currently, and combined with the highly efficient Beacon chain consensus mechanism, it only needs a minimal level of activity to have near-zero inflation.

The entire rads ecosystem can thus remain sustainable with far greater scalability and potentially much lower fees. Indeed, it’s in the best interest of L1s to become zkRs, and I’m glad to see Solana at least contemplating this.

Tl;dr: Rads have a minuscule fraction of the cost overhead of a centralized L1, allowing them to offer orders of magnitude greater throughput with similar fees, or similar throughput with a fraction of the fees.

The short term view

It’s very important to understand that rads is a long-term view that’ll take several years to mature.

In the short term, though, there are two options:

  1. A sustainable option: a centralized L1 like Binance Smart Chain, or rollups.
  2. An unsustainable centralized L1.

1) is still going to be too expensive for most. Optimised rollups like Hermez, dYdX or Loopring offer BSC-like fees, while Arbitrum One and Optimistic Ethereum have a ways to go to get there, though OVM 2.0, releasing next month, promises to bring 10x lower fees on OE. 2) Polygon PoS and Solana offer lower fees currently, but I have made an extensive argument above about how this is unsustainable long term. In the short term, though, they offer a great option for users looking for cheap transactions. But, wait, there’s a third option! 3) Validiums.

Validiums offer Polygon PoS or Solana-like fees; indeed, Immutable X is now live offering free NFT mints. Try it out yourself on SwiftMint. Now, the data availability side of a validium is arguably as unsustainable as a centralized L1, though by using alternative consensus methods like data availability committees it’s actually significantly cheaper still. But the brilliant thing about validiums is that they have direct forward compatibility with rollups or volitions when data shards release. Of course, L1s have this option too, as mentioned above, but it’ll be a much more disruptive change. Also, validiums are significantly more secure than L1s.

Summing up

  1. The blockchain industry does not yet possess the technology to achieve global scale.
  2. Some projects are offering very low fees, effectively subsidized by speculation on the token. They are a great option for users who are looking for dirt cheap fees, as long as you recognize that this is not a sustainable model, let alone the severe decentralization and security compromises made.
  3. But even these projects will be forced to increase fees if they get any traction, to be replaced by newer, more centralized L1s. It’s a race to the bottom that’s not sustainable long term.
  4. Currently, sustainable options do exist, like Binance Smart Chain (at least economically) or optimized rollups, which can offer fees in the ~$0.10-$1 range.
  5. Long term, rads are the only solution that can scale to millions of TPS, attaining global scale, while remaining technically and economically sustainable. That they can do this while remaining highly secure, decentralized, permissionless, trustless and credibly neutral is indeed magical. As a wise man once said, “Any sufficiently advanced technology is indistinguishable from magic”. That’s what rollups and data shards are.

Finally, this is not just about Ethereum. Tezos and Polygon have made the rollup-centric pivot too, and it’s inevitable that all L1s will either a) become a zkRollup; b) become a security and/or data availability chain for rollups to build on; or c) accept technological obsolescence and rely entirely on marketing, memes, and network effects.

[Addendum: Multi-chain & sharded networks are definitely more sustainable than centralized single-ledger L1s, but they are also monolithic and have much lower scalability potential. Indeed, Ethereum’s future was planned to be a sharded network before (the old 2.0 spec), but by upgrading to a rollup-centric modular architecture with rollups & data shards, it will scale 30x-100x more. This is also true of other currently sharded networks: for example, if Polkadot replaced parachains with data sharding, and had rollups built as execution layers, it would automatically gain a 100x boost to scalability. More about the progression here: “Convergent evolution: how rollups + data shards became the ultimate solution”. Another exciting innovation is zkL1s like Mina: they are basically zkRs but with monolithic security & DA. Unfortunately, this does mean they are much less secure & decentralized, and have lower scalability limits, as the protocol still needs to run full nodes.]

Polynya

Rants and musings on blockchain tech. All content here in the public domain, please feel free to share/adapt/republish.