The Additive NTT

Binary Reed–Solomon codes, and an efficient algorithm to encode them

The most important code we come across in SNARKs is the Reed–Solomon code. This is a simple—and yet extremely powerful and effective—code, and appears not just in Binius but also in DEEP-ALI-based SNARKs like Plonky3 and Stwo.

As we've suggested, error-correcting codes have two desirable properties—distance and rate—which are at odds with each other in general. Indeed, a mathematical result called the Singleton bound formalizes the precise way in which these desiderata are in tension. Reed–Solomon codes meet the Singleton bound, in the sense that the inequality becomes an equality; they thus negotiate the tension between these attributes as favorably as it is possible to do.

Reed–Solomon codes are not just coding-theoretically nice. They also support an extremely rich set of auxiliary protocols, designed to achieve such ends as proximity testing (determining whether a supplied word is close to a codeword or not, by just reading from it at a few places). The crucial protocol here is FRI (short for fast Reed–Solomon interactive oracle proof of proximity), due to Ben-Sasson, Bentov, Horesh and Riabzev. We will use FRI in our evaluation protocol.

Review of Reed–Solomon Codes

Mathematically, each Reed–Solomon code depends on a field $\mathbb{F}$ , a message length $k$ and a block length $n$ . We first need to pick a domain $S \subset \mathbb{F}$ of size $|S| = n$ (as a side effect, we need $| \mathbb{F} | \geq n$ for this to make sense).

Our message space will be the set of univariate polynomials $P(X) = a_0 + a_1 \cdot X + \cdots + a_{k - 1} \cdot X^{k - 1}$ of degree less than $k$ with coefficients in $\mathbb{F}$ . The word space will be the set of all $n$ -tuples of $\mathbb{F}$ -elements $(b_0, \ldots , b_{n - 1})$ (here, we can think of associating these with the elements of $S$ ). The codewords will be those $n$ -tuples which arise as the set of evaluations of some polynomial $P(X)$ as above on $S$ : that is, those of the form $(P(x_0), \ldots , P(x_{n - 1}))$ . Here, $(x_0, \ldots , x_{n - 1})$ is an enumeration of $S$ 's elements.

The Binary Case

In Binius, Reed–Solomon codes also play a central role. On the other hand, since we work with binary fields, things get significantly trickier. The first question that arises has to do with encoding: the actual algorithm which takes as input a message and outputs its corresponding codeword. As should be clear from the above, this is a problem of polynomial evaluation: how do we take the $k$ coefficients of some polynomial, and output the $n$ values that that polynomial takes on some domain? We can trivially do this in $\Theta(k \cdot n)$ steps; the goal is to do better than that.

In the classical, prime-field setting, encoding is very standard: the key algorithm is the NTT (number-theoretic transform), a close relative of the fast Fourier transform. The NTT runs in $\Theta(\log_2(k) \cdot n)$ steps. Here, the set $S \subset \mathbb{F}$ is a multiplicative subgroup $S \subset \mathbb{F}^*$ of power-of-two order $n$ (the field $\mathbb{F}$ needs to be chosen in such a way that this kind of subgroup exists).

In binary fields, primitive multiplicative roots of unity of power-of-two order don't exist, and things get significantly trickier. Indeed, it was not known until 2014—i.e., until the arrival of a key work by Lin, Chung and Han [LCH14]—how one might encode in the binary setting in such a way as to recover the $\Theta(\log_2(k) \cdot n)$ time complexity of the classical algorithm.

Even the Lin–Chung–Han algorithm has a catch. While classical Reed–Solomon encoding algorithms understand their inputs $(a_0, \ldots , a_{k - 1})$ as the coefficients in the standard monomial basis of the polynomial being evaluated, Lin, Chung and Han actually need to introduce a special polynomial basis: the so-called novel polynomial basis. These are specially constructed polynomials $1, X_1(X), \ldots, X_{k - 1}(X)$ unequal to the simple monomials $1, X, \ldots, X^{k - 1}$ . The polynomial we actually end up evaluating is not $a_0 + a_1 \cdot X + \cdots + a_{k - 1} \cdot X^{k - 1}$ , but rather $a_0 + a_1 \cdot X_1(X) + \cdots + a_{k - 1} \cdot X_{k - 1}(X)$ .

Finally, the set $S \subset \mathbb{F}$ we evaluate on is not a multiplicative subgroup, but rather an $\mathbb{F}_2$ -subspace of $\mathbb{F}$ (recall that when the characteristic of $\mathbb{F}$ is 2, we can view it as a vector space over its own subfield $\mathbb{F}_2$ ).

The Algorithm

Here, we sketch in slightly more technically precise terms what Lin, Chung and Han's algorithm [LCH14] achieves. We fix parameters $\ell$ and $\mathcal{R}$ . Ultimately, the code we spin up is going to be Reed–Solomon, with message length $k = 2^\ell$ and block length $n = 2^{\ell + \mathcal{R}}$ ; we will write $\mathsf{RS}_{S, \mathcal{T}_\tau}[2^\ell, 2^{\ell + \mathcal{R}}]$ for this code. We fix a binary tower field $\mathcal{T}_\tau$ , and write $\beta_0, \ldots , \beta_{2^\tau - 1}$ for an $\mathbb{F}_2$ -basis of it. From this $\mathbb{F}_2$ -basis of $\mathcal{T}_\tau$ , [LCH14] tells us how to construct a further basis of polynomials—called the novel polynomial basis—which we will write as $X_0(X), \ldots, X_{2^\ell - 1}(X)$ . Each $X_i(X)$ will be a polynomial over $\mathcal{T}_\tau$ of degree $i$ . In fact, these polynomials will themselves constitute a $\mathcal{T}_\tau$ -basis of the $2^\ell$ -dimensional vector space $\mathcal{T}_\tau[X]^{\prec 2^\ell}$ (i.e., of the set of polynomials over $\mathcal{T}_\tau$ of degree less than $2^\ell$ ).

Finally, we write $S \subset \mathcal{T}_\tau$ for the $\mathbb{F}_2$ -span of the basis prefix $\beta_0, \ldots , \beta_{\ell + \mathcal{R} - 1}$ .

The additive NTT is an algorithm that takes as input a list $(a_0, \ldots , a_{2^\ell - 1})$ and returns a list $(b_0, \ldots , b_{2^{\ell + \mathcal{R}} - 1})$ defined in the following way. We write $P(X) = a_0 + a_1 \cdot X_1(X) + \cdots + X_{2^\ell - 1}(X)$ for the polynomial whose coefficients in the novel basis are the given input $(a_0, \ldots , a_{2^\ell - 1})$ . What we return is the list $(P(x_0), \ldots , P(x_{2^{\ell + \mathcal{R}} - 1}))$ of $P$ 's evaluations on $S$ . Here, $(x_0, \ldots , x_{2^{\ell + \mathcal{R}} - 1})$ is again an enumeration of $S$ . In practice, we will get this by taking the lexicographically ordered basis combinations of $\beta_0, \ldots , \beta_{\ell + \mathcal{R} - 1}$ .

We refrain here from describing how the additive NTT actually works (i.e., how it achieves the above task). In short, Lin, Chung and Han [LCH14] rely on a totally new sort of polynomial decomposition, specific to the novel basis polynomials. In [DP24, § 3], we show that Lin–Chung–Han's NTT is can actually be re-characterized in terms more familiarly Fourier-theoretic.