Gadget Design

Designing a lookup gadget

On this page we design a lookup gadget, introducing in the process the concepts of "multiplicities" and "components". A central sub-gadget will be get_components. We'll use in gadget in a blackbox fashion on this page, and construct it in the next page.

Our lookup gadget will take the signature

fn plain_lookup<U, F, FS, const LOG_MAX_MULTIPLICITY: usize>(
	builder: &mut ConstraintSystemBuilder<U, F>,
	table: OracleId,
	table_count: usize,
	balancer_value: FS,
	lookup_values: OracleId,
	lookup_values_count: usize,
) -> Result<Boundary<F>, anyhow::Error>
where
	U: PackScalar<F> + PackScalar<FS> + PackScalar<BinaryField1b> + Pod,
	F: TowerField + ExtensionField<FS>,
	FS: TowerField + Pod,

where we call this a "plain" lookup to distinguish it from Lasso lookups, explained here and demonstrated in many forms here. While the mechanics of "timestamps" like in Lasso are not needed here, other concepts will need explanation. Looking at this signature we have

table: an oracle over FS holding the table of valid lookup values.
table_count: only the first table_count values of table are considered valid lookup values.
balancer_value: any valid table value, needed for balancing the channel.
lookup_values: an oracle over FS holding the values to be looked up.
lookup_values_count: only the first lookup_values_count values in lookup_values will be looked up.
LOG_MAX_MULTIPLICITY: no value in lookup_values can be looked up 1 << LOG_MAX_MULTIPLICITY times or more.

The last point regarding LOG_MAX_MULTIPLICITY is an inconvenience. Either the prover and verifier must agree on a strict upper bound for the number of times any table value will be looked up, or the prover can supply the verifier with non-deterministic advice for the upper bound, which in practice will be much smaller than a worst-case upper bound.

Another limitation is that we've restricted the table and lookup values to be over subfield FS. But many cases can be adapted for this. For example, when we apply this lookup for u8 multiplication in our example application, we will pack the two inputs and the output into one BinaryField32b element and thus FS as BinaryField32b suffices.

First thing we'll do in this gadget is a sanity check that the size of the table is no smaller than table_count.

let n_vars = builder.log_rows([table])?;
debug_assert!(table_count <= 1 << n_vars);

Next we'll make a channel.

let channel = builder.add_channel();

The idea behind this lookup scheme is that the prover can push any values it wishes into this channel. Those values will come from the oracle lookup_values. The verifier will allow only valid lookup values to be pulled out of the channel. Those values will come from the oracle table. Soundness, the property that the prover cannot cheat in this simple scheme, is evident. Completeness, the property that an honest prover can be successful, can be achieved if the prover tells the verifier how many of each table value should be pulled from the channel. We call these numbers "multiplicities", and an honest prover would choose multiplicity k for a table value if the prover has exactly k instances of that value in lookup_values.

But before we work with multiplicities, we can already push into the channel the first lookup_values_count values of lookup_values.

builder.send(channel, lookup_values_count, [lookup_values]);

What remains is to pull table values from the channel, pulling the right number of each table value such that the channel balances.

Multiplicities

We create an Option<Vec<usize>> object to hold the multiplicities, with the internal Vec of size table_count such that there is one multiplicity for each valid table value. The reason we wrap the multiplicities in an Option is that both the verifier and prover will call plain_lookup, but only the prover will know the multiplicities. Thus we declare multiplicities as None, but then have a prover-only block following in which we read the table values and the lookup_values, counting the multiplicities, and updating multiplicities to a Some variant.

let mut multiplicities = None;
// have prover compute and fill the multiplicities
if let Some(witness) = builder.witness() {
    let table_slice = witness.get::<FS>(table)?.as_slice::<FS>();
    let values_slice = witness.get::<FS>(lookup_values)?.as_slice::<FS>();
 
    multiplicities = Some(count_multiplicities(
        &table_slice[0..table_count],
        &values_slice[0..lookup_values_count],
        true,
    )?);
}

We write the function count_multiplicities, which doesn't depend on an Binius API, as

fn count_multiplicities<T: Eq + std::hash::Hash + Clone + std::fmt::Debug>(
	table: &[T],
	values: &[T],
	check_inclusion: bool,
) -> Result<Vec<usize>, anyhow::Error> {
	use std::collections::{HashMap, HashSet};
 
	if check_inclusion {
		let table_set: HashSet<_> = table.iter().cloned().collect();
		if let Some(invalid_value) = values.iter().find(|value| !table_set.contains(value)) {
			return Err(anyhow::anyhow!("value {:?} not in table", invalid_value));
		}
	}
 
	let counts: HashMap<_, usize> = values.iter().fold(HashMap::new(), |mut acc, value| {
		*acc.entry(value).or_insert(0) += 1;
		acc
	});
 
	let multiplicities = table
		.iter()
		.map(|item| counts.get(item).copied().unwrap_or(0))
		.collect();
 
	Ok(multiplicities)
}

Components

Now that we've computed the multiplicities, we must use them to pull from the channel exactly the right multiplicity of each table value.

Without yet explaining why this works, to make this happen we will bit decompose the multiplicities into columns we'll call bits, create related columns we'll call components, and then flush these components (pulling them from the channel). When we flush we will leverage flush_with_multiplicity so that instead of flushing all values in a component once, we flush all values in a component an appropriate number of times for the channel to balance.

After sanity checking each multiplicity to be below 1 << LOG_MAX_MULTIPLICITY, we bit-decompose each multiplicity into LOG_MAX_MULTIPLICITY bits. We will create an array of LOG_MAX_MULTIPLICITY oracles over BinaryField1b called bits that represent these bits. We will also create an array of LOG_MAX_MULTIPLICITY oracles over FS called components that represent the bits oracles determining whether a component value is the table value (if the bit is one) or the balancer_value (if the bit is zero). In pseudocode, for component column j and bit column j, at index i in the range [0..table_count] we will have

components_j[i] := bits_j[i] * table[i] + (1 - bits_j[i]) * balancer_value

Given the correctly formed component columns, we'll pull the i^th component oracle from the channel with multiplicity 1 << i.

The idea is that by flushing the i^th component with multiplicity 1 << i we will flush a table value 1 << i times if the i^th bit of the multiplicity for that table value is 1, and otherwise flush the table value 0 times. Aggregating the flushes across all components we will end up with the flush count of each table value matching the corresponding multiplicity.

Our channel balancing strategy isn't finalized, because when a bit is zero we pull the balancer_value from the channel rather than a looked up value. To counter this, we will need to push enough balancer_values into the channel, but we discuss that later in channel balancing. For now assume there is a gadget get_components that creates the bits oracles and the components oracles, checks the components oracles against the table oracle and the bits oracles, and then returns the components oracles. We'll invoke this gadget as

let components: [OracleId; LOG_MAX_MULTIPLICITY] =
    get_components::<_, _, FS, LOG_MAX_MULTIPLICITY>(
        builder,
        table,
        table_count,
        multiplicities,
    )?;

Then as said we'll flush the i^th component with multiplicity 1 << i.

components
    .into_iter()
    .enumerate()
    .for_each(|(i, component)| {
        builder.flush_with_multiplicity(
            FlushDirection::Pull,
            channel,
            table_count,
            [component],
            1 << i,
        );
    });

Note that since we pushed the lookup values into the channel, we now pull the table values from the channel.

Channel Balancing

We don't yet hold the Boundary value we promised to return from the plain_lookup gadget. Even if we did, we wouldn't be done yet. The reason is even after having both pushed to the channel and pulled from the channel, the channel isn't balanced. We discuss how flushing a boundary value (made from balancer_value) to the channel with a certain multiplicity rectifies the situation, balancing the channel for both completeness and soundness.

Each value in lookup_values got pushed into the channel exactly once. There are lookup_values_count such values. If the channel is to balance, we must pull exactly that many values from the channel. If we count how many values got pulled from the channel, we'll see the numbers do not match.

For each index of the i^th component, we pulled 1 << i things from the channel. If the corresponding bit was set, we pulled a looked up value with multiplicity 1 << i. If the corresponding bit was not set, we pulled balancer_value with multiplicity 1 << i. If we consider an index across all LOG_MAX_MULTIPLICITY components, we can conclude that a total of

(0..LOG_MAX_MULTIPLICITY).map(|i| 1 << i).sum()

things were pulled from the channel at that index, some of them the looked up value for that index, and some of them balancer_value. As a geometric series this expression equals

(1 << LOG_MAX_MULTIPLICITY) - 1

Therefore, the total number of things pulled from the channel by flushing the components is

((1 << LOG_MAX_MULTIPLICITY) - 1) * table_count

Since we pushed exactly lookup_values_count lookup values into the channel, this means we pulled an superfluous

((1 << LOG_MAX_MULTIPLICITY) - 1) * table_count - lookup_values_count

balancer_values out of channel. We assign this expression the name balancer_value_multiplicity. Naturally then, we must push this many balancer_value values into the channel. We do so with the boundary value

let boundary = Boundary {
    values: vec![balancer_value.into()],
    channel_id: channel,
    direction: FlushDirection::Push,
    multiplicity: balancer_value_multiplicity,
};

which we then return from the gadget as

Ok(boundary)

Assuming the boundary value is fed to the verifier, our channel now balances. Therefore we have taken care of completeness, but what about soundness? Since the balancer_value is a valid table value by assumption, the verifier is still only allowing valid table values to be pulled from the channel. Therefore if the channel balances, it means the prover pushed only valid table values into the channel.