Review of MPT Task

Proving Ethereum MPT inclusion using M3

Here, we get to the key point in our arithmetization story: our arithmetization of Merkle–Patricia trie inclusion. This arithmetization is highly complicated, and exhibits the full power of our M3 abstraction. Indeed, it shows how M3 can capture not just toy problems, but also highly nontrivial problem of critical real-life importance—and do so highly efficiently! This arithmetization is implemented live in production in Binius.

Merkle–Patricia Tries

We recall that Ethereum represents its global state in a Merkle–Patricia trie, a Merkle-like data structure. Each Merkle–Patricia trie represents a massive amount of underlying data, itself essentially an enormous, flat key–value table. Each key in that table is a 32-byte string, a hash of an Ethereum address. Each value contains its address's basic account state: its balance, its nonce, its code hash and its storage hash.

Thus, a single key–value pair in the global Ethereum state might look like

{
    key: 0xf90211a0b618d09a8558811fc4f8366721068c97347bfeb5ee3d83be61c,
    value: [
        0x05a7,
        0x16883cb527eb925a64,
        0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421,
        0xc5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470
    ]
}

The key above is itself the Keccak-256 hash of the Ethereum address 0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045. The value above contains the nonce, balance (in wei), code hash and storage hash of that address.

Taken together, the list of all such key–value pairs gives us Ethereum's global account state. (We are ignoring contract storage state for now.)

The Merkle–Patricia construction "bottles up" this entire global state table into a single hash digest. It moreover does so in a way that conveys many important efficiency guarantees. For example, let's say that I receive the root digest alone—and nothing more—from a trusted source. Let's say moreover that an untrusted source makes various claims to me: that one particular key–value pair (of the above form) is in the table, and that a second key is not in the table at all, let's say. By providing me just a logarithmic amount of data, containing various so-called MPT proofs, this untrusted party can prove to me that his claims are valid.

Moreover, when the underlying global state changes in some way, I can incorporate those changes—and thereby obtain a new state root—using just a logarithmic amount of work in the total size of the global table.

The full definition of the Ethereum MPT is complicated; we refer to the official documentation for more detail on it.

The Problem to Arithmetize

Here, we review in clear terms what we are trying to arithmetize. Essentially, we are trying to arithmetize the Merkle–Patricia inclusion proof mechanism just described above.

The public statement—which both the prover and the verifier will possess—will be:

a state trie root, together with
a plurality of key–value pairs of the form given above, themselves claimed to belong to the trie's underlying state table.

The witness will be:

the respective Merkle proofs associated to these key–value pairs, against the single state root.

The statement–witness pair will be valid if each claimed key–value pair has a corresponding proof which correctly attests to its membership, per the MPT's rules.

Here, we see the full power of succinctness. Though MPT proofs are designed to be relatively efficient, as the number of key–value pairs above grows, the total size of the witness will become rather large in concrete terms (say, in the hundreds of megabytes). On the other hand, our proof will be just hundreds of kilobytes. From the verifier's point of view, it will be much easier to receive and verify a statement and a proof than a statement and a witness.