Error-Correcting Codes
Error-correcting linear block codes are used across engineering applications ranging from deep-space communication to digital storage media (and many others). It's a fascinating fact that they also play a crucial role in modern post-quantum SNARKs, and in particular in Binius.
Classically, error-correcting codes are used to transmit data over a noisy channel, which is liable to flip and corrupt various bits of the message it carries. Instead of sending the message directly over the channel, the sender opts to first encode it, in such a way as to introduce some cleverly devised redundancy. The encoded message, which we call a codeword, is designed to be more resilient to errors than the raw message itself would have been. In particular, if various bits of the codeword get flipped, we want nonetheless to be able to recover the original message on the far end. This process is called decoding. Decoding is performed by the receiver on the received word: essentially, what's left of the codeword after some of its bits have possibly been flipped.
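To make the encode/transmit/decode pipeline concrete, here is a minimal sketch in Python, using the simplest code imaginable: the 3-fold repetition code, which just sends every bit three times. (This toy code is for illustration only; the codes used in practice, and in Binius, are far more sophisticated.)

```python
def encode(message: list[int]) -> list[int]:
    """Add redundancy by repeating each bit of the message three times."""
    return [bit for bit in message for _ in range(3)]

def decode(received: list[int]) -> list[int]:
    """Recover each message bit by majority vote over its three copies."""
    return [1 if sum(received[i:i + 3]) >= 2 else 0
            for i in range(0, len(received), 3)]

message = [1, 0, 1, 1]
codeword = encode(message)           # [1,1,1, 0,0,0, 1,1,1, 1,1,1]
codeword[4] ^= 1                     # the noisy channel flips a bit in transit
assert decode(codeword) == message   # majority vote still recovers the message
```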
If the error-correcting code is well-devised, and the channel isn't too noisy, then this error-recovery should succeed. Error-correcting codes are an extraordinarily complex and difficult topic. In general, there are countless subtle tradeoffs, which all militate against each other. For example, for efficiency reasons, we want the length of the codeword to be as short as possible relative to the length of the message (this ratio is called the code's rate). Simultaneously, we want the error-correction capacity, i.e., how many flipped bits the code can recover from, to be as high as possible; this parameter is controlled by the code's distance. Finally, we want the code's encoding algorithm, and in many cases its decoding algorithm, to be as computationally efficient as possible.
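The rate/distance tradeoff is easy to see in the toy repetition code from the sketch above, with the repetition factor now a parameter r. As r grows, the distance (and hence the error-correction capacity) improves, but the rate degrades in lockstep; the numbers below are illustrative only.

```python
k = 4                                 # message length in bits
for r in (3, 5, 7):                   # repetition factor
    n = k * r                         # codeword length
    rate = k / n                      # rate = k/n = 1/r
    distance = r                      # distinct messages differ in >= r places
    corrects = (distance - 1) // 2    # worst-case flips guaranteed recoverable
    print(f"r={r}: rate={rate:.2f}, distance={distance}, "
          f"corrects up to {corrects} flips")
```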
Codes in SNARKs
In SNARKs, we also find occasion to use codes. This fact is surprising, since we have no noisy channel, no sender, and no receiver. Rather, and this is a fascinating aspect of theoretical computer science, we use them in a completely different way.
Indeed, error-correction is possible because of a property called distance, which appeared above. That a code has high distance means that unequal underlying messages yield codewords which are "far apart from each other". Intuitively, this makes sense: if, hypothetically, two different messages yielded very similar codewords, say codewords that differed at just a single bit position, then there would be no way for the receiver to reliably recover these messages (after all, it would take just a single bit-flip in transit to confound the two). Rather, we want the opposite property: namely, that any two messages that differ even in one place yield codewords that differ in many places. This is exactly what it means for the code to have high distance.
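As a small illustration of distance, here is a sketch that brute-forces the minimum distance of the classic [7, 4] Hamming code: every pair of distinct 4-bit messages, even two that differ in a single bit, yields codewords at Hamming distance at least 3.

```python
from itertools import product

# Generator matrix of the [7,4,3] Hamming code, in systematic form.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(msg):
    """Multiply the message row-vector by G over GF(2)."""
    return tuple(sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G))

def hamming(a, b):
    """Number of positions at which a and b differ."""
    return sum(x != y for x, y in zip(a, b))

codewords = {msg: encode(msg) for msg in product((0, 1), repeat=4)}
min_distance = min(hamming(codewords[u], codewords[v])
                   for u in codewords for v in codewords if u != v)
print(min_distance)  # 3: no two messages land closer together than this
```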
Roughly, high distance is what makes codes useful in SNARKs. Imagine that we have a prover who uses either "the right" message or "the wrong" one. If we didn't use codes, then the "wrongness" of his message could come down to a single bit being corrupted in one place, something that's very hard for the verifier to detect. What we can do instead is ask the prover to encode everything under a suitable code. Because codes magnify small differences into large ones, the prover's wrongness is then guaranteed to manifest itself in many bit positions, as opposed to as few as just one, as the sketch below illustrates.
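Here is a self-contained sketch of that magnification effect, again with the toy 3-fold repetition encoder: a single-bit disagreement between the "right" and "wrong" messages becomes a three-position disagreement between their codewords.

```python
def encode3(message):
    """3-fold repetition encoding (toy example, for illustration only)."""
    return [bit for bit in message for _ in range(3)]

right = [1, 0, 1, 1]                 # the "right" message
wrong = [1, 0, 0, 1]                 # "wrong" in exactly one position
raw_diff = sum(a != b for a, b in zip(right, wrong))
enc_diff = sum(a != b for a, b in zip(encode3(right), encode3(wrong)))
print(raw_diff, enc_diff)            # 1 vs. 3: the wrongness is magnified
```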
The more positions at which two bitstrings differ, the easier it is for the verifier to detect the difference. Having asked that everything get encoded, the verifier can now start checking random positions for equality. If the proportion of positions at which a difference manifests (assuming the strings are different at all) is guaranteed to be bounded from below by some constant, a property known as constant relative distance, then the verifier can very quickly drive down the probability of failing to notice the difference, by poking around both strings at random positions. In fact, the probability with which the verifier fails to spot the difference decays exponentially in the number of checks, or "queries", that the verifier runs.
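The exponential decay is easy to quantify. If the two encoded strings differ on at least a delta fraction of their positions, then q independent uniformly random queries all miss the difference with probability at most (1 - delta)^q. The sketch below tabulates this bound for a hypothetical relative distance of delta = 0.25.

```python
delta = 0.25  # hypothetical relative distance of the code
for q in (1, 10, 50, 100):
    miss = (1 - delta) ** q          # probability that all q queries miss
    print(f"q = {q:>3}: miss probability <= {miss:.3e}")
```

With these illustrative parameters, a few hundred queries already drive the bound down to cryptographically negligible levels.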
Decoding and Extraction
Interestingly, at no point did we ever have to decode any codeword. This fact makes the code's use in our SNARK paradoxical: what is the use of encoding a message that never gets decoded? As it happens, in [DP24], we do ultimately end up using a decoder, but only in the protocol's security analysis (not in the actual course of the protocol itself). Indeed, the decoder gets run by the emulator, an abstract hypothetical machine used only to prove the security of the protocol (it never runs in the real world).