# Pedersen Hash¶

## Scope¶

The 4-bit window Pedersen hash function is a secure hash function which maps a sequence of bits to a compressed point on an elliptic curve (Libert, Mouhartem, and Stehlé, n.d.).

This proposal aims to standardize this hash function for use primarily within the arithmetic circuits of zero knowledge proofs, together with other generic uses such as for Merkle tree or any use cases requiring a secure hash function.

As part of the standard, the paper details the elliptic curve used for the hash function, the process to compute the Pedersen hash from a given sequence of bits, and the computation of the hash from a sequences of bits using an arithmetic circuit—which can be used within zero knowledge proofs.

Moreover the paper includes references to open-source implementations of the Pedersen hash function which follows the computation process details in this proposal.

## Motivation¶

The primary advantage of this Pedersen hash function is its efficiency. The ability to compute the hash efficiently makes it an attractive proposal for use within the circuits associated with zk-SNARK proofs (“ZCash Open Discussion: Choose Improved Hash Function for Merkle Tree (or Replace Merkle Tree),” n.d.).

Having a standard, secure, and efficient hash function is one of the paramount aspect for implementing usable, comprehensible, and easily verifiable zero knowledge proofs.

## Background¶

The Pedersen hash has already been defined and used by the ZCash team in Sapling, their latest network upgrade (Hopwood et al., n.d.). They construct it on the Jubjub elliptic curve and using 3-bit lookup tables. In this document, we propose a different implementation of the Pedersen hash function using Baby-Jubjub elliptic curve and 4-bit windows, which requires less constraints per bit than using 3-bit windows.

## Terminology¶

### Elliptic Curve: Baby-Jubjub¶

Consider the prime number

\begin{align}\begin{aligned} p = 21888242871839275222246405745257275088548364 400416034343698204186575808495617\\and let :math:\ensuremath{\mathbb{F}_p} be the finite field with\end{aligned}\end{align}

$$p$$ elements.

We define $$E_M$$ as the Baby-Jubjub Montgomery elliptic curve defined over $$\ensuremath{\mathbb{F}_p}$$ given by equation

\begin{align}\begin{aligned}E: v^2 = u^3 + 168698u^2 + u.\\The order of :math:E_M is :math:n = 8\times r, where\end{aligned}\end{align}
\begin{align}\begin{aligned} r = 2736030358979909402780800718157159386076813972 158567259200215660948447373041\\is a prime number. Denote by :math:\ensuremath{\mathbb{G}} the\end{aligned}\end{align}

subgroup of points of order $$r$$, that is,

$\ensuremath{\mathbb{G}}= \Set{ P \in E(\ensuremath{\mathbb{F}_p}) | r P = O }.$
$$E_M$$ is birationally equivalent to the Edwards elliptic curve
\begin{align}\begin{aligned}E: x^2 + y^2 = 1 + d x^2 y^2\\where\end{aligned}\end{align}

$$d = 9706598848417545097372247223557719406784115219466060233080913168975159366771.$$

The birational equivalence (Bernstein et al., n.d. Thm. 3.2) from $$E$$ to $$E_M$$ is the map
\begin{align}\begin{aligned}(x,y) \to (u,v) = \left( \frac{1 + y}{1 - y} , \frac{1 + y}{(1 - y)x} \right)\\with inverse from :math:E_M to :math:E\end{aligned}\end{align}
$(u, v) \to (x, y) = \left( \frac{u}{v}, \frac{u - 1}{u + 1} \right).$

### Pedersen Hash¶

Let $$M$$ be a sequence of bits. The Pedersen hash function of $$M$$ is defined as follows:

• Let $$P_0,P_1,\dots,P_k$$ be uniformly sampled generators of $$\ensuremath{\mathbb{G}}$$ (for some specified integer $$k$$).

• Split $$M$$ into sequences of at most 200 bits and each of those into chunks of 4 bits . More precisely, write

\begin{align}\begin{aligned}\begin{split} \begin{gathered} M = M_0M_1\dots M_l \quad\text{where}\quad M_i = m_0m_1\dots m_{k_i} \quad\text{with}\quad \begin{cases} k_i = 49 \;\text{ for } i = 0, \dots, l-1, \\ k_i \leq 49 \;\text{ for } i = l, \end{cases} \end{gathered}\end{split}\\where the :math:m_j terms are chunks of 4 bits\end{aligned}\end{align}

$$[b_0\: b_1\: b_2\: b_3]$$. Define

\begin{align}\begin{aligned} enc(m_j) = (2b_3-1) \cdot (1+b_{0}+2b_{1}+4b_{2})\\and let\end{aligned}\end{align}
\begin{align}\begin{aligned}\langle M_i \rangle = \sum_{j=0}^{k_i-1} enc(m_j) \cdot 2^{5j}.\\We define the Pedersen hash of :math:M as\end{aligned}\end{align}
\begin{align}\begin{aligned} \label{eq-ped} H(M) = \langle M_0 \rangle \cdot P_0 + \langle M_1 \rangle \cdot P_1 + \langle M_2 \rangle \cdot P_2 + \dots + \langle M_l \rangle \cdot P_l.\\Note that the expression above is a linear combination of elements\end{aligned}\end{align}

of $$\ensuremath{\mathbb{G}}$$, so itself is also an element of $$\ensuremath{\mathbb{G}}$$. That is, the resulting Pedersen hash $$H(M)$$ is a point of the elliptic curve $$E$$ of order $$r$$.

## Description¶

### Set Of Generators¶

We generate the points $$P_0,\dots,P_{{k}}$$ in such a manner that it is difficult to find a connection between any of these two points. More precisely, we take D = "string\_seed" followed by a byte S holding that smallest number that H = Keccak256(D || S) results in a point in the elliptic curve $$E$$.

### Computation Of The Pedersen Hash¶

In the following circuit pedersen hash, we have depicted the circuit used to compute the Pedersen hash of a message $$M$$ described in equation [eq-ped]. Each multiplication box returns a term of the sum.  As the set of generators are fixed, we can precompute its multiples and use 4-bit lookup windows to select the right points. This is done as shown in the circuit called selector. This circuit receives 4-bit chunk input and returns a point. The first three bits are used to select the right multiple of the point and last bit decides the sign of the point. The sign determines if the $$x$$-coordinate should be taken positive or negative, as with Edwards curves, negating a point corresponds to the negation of its first coordinate. [sec-computation]

Work In Progress

## Challenges¶

One of the main challenges to create this standard and to see it adopted by the community is to provide correct, usable, and well-maintained implementations in as many languages as possible.

Some effort is also required to audit and verify code coming from the community and claiming to implement the 4-bit window Pedersen hash function to prevent the propagation of potentially insecure implementations.

Finally, the proposal as it stands today includes the padding of the message $$M$$ to a multiple of four bits. There are potentials issues with this approach where collisions can happen.

## Security¶

### Overflow Prevention¶

As we described in section [sec-computation], we use a windowed scalar multiplication algorithm with signed digits. Each 4-bit message chunk corresponds to a window called selector and each chunk is encoded as an integer from the set $$\{-8..8\}\backslash \{0\}$$. This allows a more efficient lookup of the window entry for each chunk than if the set $$\{1..16\}$$ had been used, because a point can be conditionally negated using only a single constraint (Hopwood et al., n.d.).
As there are up to 50 segments per each generator $$P_i$$, the largest multiple of the generator $$P_i$$ is $$n\cdot P_i$$ with
\begin{align}\begin{aligned}n = 2^0 \times8 + 2^5 \times 8 + \left(2^5\right)^2 \times8 \dots + 2^{245}\times 8 .\\To avoid overflow, this number should be smaller than\end{aligned}\end{align}

$$(r-1)/2$$. Indeed,

\begin{align}\begin{aligned}\begin{split} \begin{aligned} \quad\; n & = 8 \times \sum_{ k = 0}^{49} 2^{5k} = 8 \times \frac{2^{250}-1}{2^5-1}\\ & = 466903585634339497675689455680193176827701551071131306610716064548036813064%\\\end{aligned}\end{split}\\and\end{aligned}\end{align}
\begin{split}\begin{aligned} \frac{r-1}{2} &= 1368015179489954701390400359078579693038406986079283629600107830474223686520 \\ & > n.\\ \vspace{0.4cm}\end{aligned}\end{split}

## Implementation¶

### A Note On Efficency: Number Of Constraints Per Bit¶

When using 3-bit and 4-bit windows, we have 1 constraint for the sign and 3 for the sum (as we are using the Montgomery form of the curve, that requires only 3). Now let’s look at the constraints required for the multiplexers.
With 3-bit windows we need only one constraint per multiplexer, so 2 constraints in total.
Standard 4-bit windows require two constraints: one for the output and another to compute $$s_0*s_1$$. So, a priori we would need 4 constraints, two per multiplexer. But we can reduce it to 3 as the computation of $$s_0*s_1$$ is the same in both multiplexers, so this constraint can be reused. This way only 3 constraints are required.
So, the amount of constraints per bit are:
• 3-lookup window : $$(1+3+2)/3 = 2$$ constraints per bit.
• 4-lookup window : $$(1 +3+3)/4 = 1.75$$ constraints per bit.

The specific constraints can be determined as follows: let the multiplexers of coordinates $$x$$ and $$y$$ be represented by the following look up tables:

$$s_2$$ $$s_1$$ $$s_0$$ $$out$$
0 0 0 $$a_0$$
0 0 1 $$a_1$$
0 1 0 $$a_2$$
0 1 1 $$a_3$$
1 0 0 $$a_4$$
1 0 1 $$a_5$$
1 1 0 $$a_6$$
1 1 1 $$a_7$$
$$s_2$$ $$s_1$$ $$s_0$$ $$out$$
0 0 0 $$b_0$$
0 0 1 $$b_1$$
0 1 0 $$b_2$$
0 1 1 $$b_3$$
1 0 0 $$b_4$$
1 0 1 $$b_5$$
1 1 0 $$b_6$$
1 1 1 $$b_7$$

We can express them with the following 3 constraints:

• $$aux = s_0 s_1$$

• $$out = [ (a_7-a_6-a_5+a_4-a_3+a_2+a_1-a_0)*aux + (a_6-a_4-a_2+a_0)*s_1$$
$$\text{\qquad\;\;} + (a_5-a_4-a_1+a_0)*s_0 + (a_4 - a_0) ] z + (a_3-a_2-a_1+a_0)*aux + (a_2-a_0)*s_1$$
$$\text{\qquad\;\;} + (a_1-a_0)*s_0+ a_0$$
• $$out = [ (b_7-b_6-b_5+b_4-b_3+b_2+b_1-b_0)*aux + (b_6-b_4-b_2+b_0)*s_1$$
$$\text{\qquad\;\;} + (b_5-b_4-b_1+b_0)*s_0 + (b_4 - b_0)] z + (b_3-b_2-b_1+b_0)*aux + (b_2-b_0)*s_1 \\ \text{\qquad\;\:} + (b_1-b_0)*s_0+ b_0$$

### Existing Implementations¶

Implementation of the specifications and arithmetic of the Baby-Jubjub curve:

Implementation of the Pedersen Hash function:

## Intellectual Property¶

  If $$M$$ is not a multiple of 4, pad $$M$$ to a multiple of 4 bits by appending zero bits.