I have long wanted to write a post about this, but have been patiently waiting for some concrete progress. That progress came today, when Vitalik published a provisional roadmap with proto-EIPs: "A state expiry and statelessness roadmap" (HackMD, ethereum.org).
Currently, the single greatest bottleneck to scalability is state bloat. You need a 1 TB SSD to run a full node reliably with some future-proofing, and some have complained that’s already too much. As the network is used, the state will only keep growing. The only recourse to manage state thus far has been to severely limit scalability.
There are two concepts being proposed right now to solve this long-standing challenge: weak statelessness and state expiry.
In the weak statelessness scheme, only block proposers and full nodes will need to hold full state. All other nodes, including attesting validators, can run a stateless client and verify blocks without actually storing state. How this works: (oversimplifying) you don’t need to hold the state, but just need a succinct (in the literal sense) proof, called a witness, to verify state.
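To make the witness idea concrete, here is a minimal sketch using a plain binary Merkle tree with SHA-256. This is an illustration only, not how Ethereum clients implement it: the tree shape, the hash function, and every function name below are assumptions chosen to keep the example short. The point it shows is that the verifier needs only the root, the claimed value, and a short branch of sibling hashes, never the full state.

```python
import hashlib

def h(data: bytes) -> bytes:
    """SHA-256, standing in for whatever hash the real tree uses."""
    return hashlib.sha256(data).digest()

def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Return every level of the tree: leaf hashes first, root level last.
    Assumes len(leaves) is a power of two to keep the sketch short."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def make_witness(levels: list[list[bytes]], index: int) -> list[bytes]:
    """Done by a node that holds state: collect the sibling hash
    at every level on the path from the leaf to the root."""
    witness = []
    for level in levels[:-1]:
        witness.append(level[index ^ 1])  # sibling sits next to us
        index //= 2
    return witness

def verify(root: bytes, leaf: bytes, index: int, witness: list[bytes]) -> bool:
    """Done by a stateless client: recompute the root from the leaf
    and the witness, then compare against the known root."""
    node = h(leaf)
    for sibling in witness:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root
```

With eight leaves the witness is three hashes long; it grows with the logarithm of the state size rather than with the state itself.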
This concept has been around, but the issue thus far was that even the witnesses were too large. The technology is finally ready to enable very small witnesses: Verkle trees.
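A rough back-of-the-envelope calculation shows why tree shape matters for witness size. Everything here is an assumption for illustration, not a figure from the roadmap: 32-byte hashes and commitments, a balanced tree, a proof for a single key, and no node-encoding overhead. In a Merkle tree of width w, a branch carries w - 1 sibling hashes per level. In a Verkle tree, as I understand the design, each level on the path contributes one commitment and the openings are combined into one proof whose size does not grow with the width; that proof size is left as a parameter because it depends on the commitment scheme.

```python
HASH_BYTES = 32  # assumed size of one hash or commitment

def depth(num_keys: int, width: int) -> int:
    """Levels needed for a balanced tree of this width to hold num_keys."""
    levels, capacity = 0, 1
    while capacity < num_keys:
        capacity *= width
        levels += 1
    return levels

def merkle_branch_bytes(num_keys: int, width: int) -> int:
    """Merkle branch: (width - 1) sibling hashes at every level."""
    return depth(num_keys, width) * (width - 1) * HASH_BYTES

def verkle_branch_bytes(num_keys: int, width: int, proof_overhead: int) -> int:
    """Verkle branch: one commitment per level plus a fixed-size proof.
    proof_overhead is a placeholder supplied by the caller."""
    return depth(num_keys, width) * HASH_BYTES + proof_overhead

keys = 2 ** 30  # hypothetical state size
hexary = merkle_branch_bytes(keys, 16)   # 8 levels * 15 siblings * 32 bytes
binary = merkle_branch_bytes(keys, 2)    # 30 levels * 1 sibling * 32 bytes
verkle_path = verkle_branch_bytes(keys, 256, proof_overhead=0)  # path only
```

Under these assumptions the hexary branch is 3840 bytes, the binary branch 960 bytes, and the Verkle path 128 bytes before the proof itself is added. Real witnesses cover many keys per block, so treat this only as intuition for the trend.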
Currently, Ethereum’s state is stored in a hexary Merkle Patricia tree. We know the beacon chain has the concept of epochs, in which blocks are justified and finalized. As per this proto-EIP, Ethereum will have a new time scale: the period. One period will be approximately one year. To enable statelessness, a hard fork will freeze the pre-fork hexary tree (Period 0), while all new data appended or accessed post-fork will live in the new Verkle tree (kicking off Period 1).
Once stateless clients are ready after the transition to Verkle trees, all regular users will be able to verify statelessly. As mentioned before, block proposers will still need to hold the full state. This will be controversial, but the important thing here is to have a culture of users verifying, and that is accomplished by weak statelessness. Gas limits can now be increased, and while block proposers will have to upgrade their systems, regular users might actually see lower system requirements with stateless clients despite higher scalability.
Of course, strong statelessness, where even blocks can be proposed statelessly, will remain a future problem to be solved.
A lot of Ethereum’s state hasn’t been accessed in years. Yet, all nodes are burdened with this data forever. What if you could hold only the relevant, recently accessed data, while archiving (again, literally) lapsed data? Proposals around state rent, state expiry and regenesis have suggested similar schemes, but now we have a concrete proto-EIP. For the longest time, state expiry and statelessness were considered competing solutions to the same problem, but amazingly, now they are both being implemented together!
Each period, state from two periods (~years) ago is frozen and archived. In state expiry, full nodes and block proposers will only need to store state from two periods — the current and the previous. Users looking to verify transactions will continue to verify statelessly.
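The retention rule can be sketched as a few small helpers. The period length constant and the function names are assumptions for illustration; the proto-EIP only says a period is roughly one year, and real clients would track periods however the final spec defines them.

```python
SECONDS_PER_PERIOD = 365 * 24 * 60 * 60  # assumption: "approximately one year"

def period_at(timestamp: int, period_one_start: int) -> int:
    """Period number for a timestamp. Everything before the Verkle
    fork (period_one_start) counts as Period 0."""
    if timestamp < period_one_start:
        return 0
    return 1 + (timestamp - period_one_start) // SECONDS_PER_PERIOD

def active_periods(current: int) -> list[int]:
    """Periods a full node keeps: the previous one and the current one."""
    return [p for p in (current - 1, current) if p >= 0]

def is_expired(period: int, current: int) -> bool:
    """State last touched two or more periods ago is archived."""
    return period < current - 1
```

For example, during Period 3 a full node under this rule holds Periods 2 and 3, and anything last touched in Period 1 or earlier needs a witness to be used again.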
The state expiry hard fork happens at the beginning of Period 2, roughly one year after moving to Verkle trees and enabling weak statelessness. Here, the pre-statelessness Period 0 hexary tree will be replaced with a Verkle tree. I’d expect that from Period 3, the Period 1 state will expire, and so on.
Overall, with both combined, state management will effectively be solved, and we can start increasing gas limits without worrying about long-term state bloat. However, expect this increase to be moderate; Vitalik suggests around 3x. I believe this to be a very conservative estimate, particularly with SSDs continuing to become more affordable. We now have mainstream $400-$500 game consoles (well, when they’re available) shipping with extremely fast 1 TB NVMe SSDs. By the time statelessness + state expiry ship, I fully expect budget laptops to feature SSDs of 1 TB and above. Further, clients like Erigon are putting in a lot of work to optimize this. So, I’d optimistically expect a 5x-10x increase in scalability instead, while still reducing system requirements compared to now. Of course, as SSDs become more affordable over time, we can scale linearly now that we have predictable state management.
What about expired state? Users can revive expired data by providing a witness proof and paying gas to have the corresponding data re-appended to the active tree. There can obviously be archive nodes that continue to hold the full state. (Clarification: not to be confused with archival nodes. I mean archive in the literal sense here, basically what a full node is today. I can’t call it a full node because, under the state expiry scheme, the new full nodes will only hold two periods.) Running one will almost certainly be very, very expensive, so we’ll need some sort of infrastructure for expired state. I believe Solana is exploring using Arweave for similar state rent schemes, though I wasn’t able to find any details. IPFS, BitTorrent, Filecoin and others are all options.
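The revival step can be sketched as a toy model. This is a simplification under stated assumptions, not the proto-EIP's mechanism: it uses a binary SHA-256 Merkle tree instead of a Verkle tree, assumes full nodes keep a root commitment for each expired period, skips gas accounting entirely, and the class and method names (ArchiveNode, FullNode, revive) are hypothetical.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def leaf(key: str, value: int) -> bytes:
    return h(f"{key}={value}".encode())

def build_levels(leaves):
    """All levels of a binary Merkle tree, padded to a power of two."""
    level = list(leaves) or [h(b"")]
    while len(level) & (len(level) - 1):  # not yet a power of two
        level.append(h(b""))
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

class ArchiveNode:
    """Holds a frozen period's data and can produce witnesses for it."""
    def __init__(self, items: dict):
        self.keys = sorted(items)
        self.levels = build_levels([leaf(k, items[k]) for k in self.keys])
        self.root = self.levels[-1][0]

    def witness(self, key: str):
        index = self.keys.index(key)
        branch, i = [], index
        for level in self.levels[:-1]:
            branch.append(level[i ^ 1])
            i //= 2
        return index, branch

class FullNode:
    """Keeps only active state plus one root per expired period (assumed)."""
    def __init__(self):
        self.active = {}
        self.expired_roots = {}

    def revive(self, period: int, key: str, value: int, index: int, branch):
        node = leaf(key, value)
        for sibling in branch:
            node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
            index //= 2
        if node != self.expired_roots.get(period):
            raise ValueError("invalid witness")
        self.active[key] = value  # re-appended to the active tree
```

In this sketch the user fetches the index and branch from wherever expired state is stored and submits them alongside the key and value; the full node checks them against the root it kept and never needs the archived data itself.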
Rollups and sharding
State management is crucial for rollups, because they are designed around having immense state bloat. Because the entire rollup state can be reconstructed from L1, they can be even more innovative and flexible with how they approach this. Things like regenesis are easily done. Rollups can be the perfect way to battle-test these new schemes being proposed. I’d expect Optimistic Ethereum, for example, to transition to Verkle trees and state expiry well before Ethereum mainnet does with much shorter periods given it already has an archival mechanism on L1. Of course, rollups will benefit directly from whatever scalability upgrades state management brings to L1 as well.
Statelessness + state expiry will directly multiply sharding execution, as each shard will also feature these scalability improvements. Though, given the scalability improvements statelessness + state expiry bring to the single execution chain, and the scalability rollups + data sharding offer, would we even need executable shards in the foreseeable future? Seems doubtful to me. I wasn’t able to find any concrete information on how the current proposals will directly affect data shards, but I’d expect many of these concepts to be adapted for it in the future.
Proof-of-stake will solve sustainability. Rollups + data sharding will solve scalability. State size management was the last remaining challenge, and it’s being met head on with some real, concrete proposals now. One could argue that privacy and VM innovations are a further pending challenge, but I’d expect rollups to better address these. Indeed, we’re seeing this with Aztec being privacy-focused, while zkSync 2.0 introduces LLVM and StarkWare has built a quantum-resistant StarkNet OS. Of course, learnings from these rollups can be adopted on L1 if desired.