Parameter Optimization & Larger Precision for (T)FHE
This blog post is an overview of two contributions aiming to make Fully Homomorphic Encryption (FHE) more efficient. The following also appears in the scientific paper BBB+23, which has been published in the Journal of Cryptology.
The first idea relies on the fact that every FHE scheme depends on parameters that together determine the efficiency of homomorphic computations. However, these parameters also have a direct effect on crucial properties such as the precision of the message, the security level and the correctness of the encryption. To minimize the computational cost of a circuit under fixed constraints, BBB+23 proposes an optimization framework for the TFHE scheme (see our TFHE Deep Dive series).
The second contribution is a new way of computing on higher precision messages with TFHE. As a reminder, the programmable bootstrapping (PBS) of TFHE has a computational cost that grows exponentially with the number of bits of message precision. Plus, a padding bit (the most significant bit of the plaintext) needs to be fixed before the PBS can evaluate non-negacyclic functions. This padding bit represents a loss in message precision and, coupled with the computational cost growth, implies that the PBS remains efficient only up to 8 bits of precision. The new Without Bit of Padding PBS (WoP-PBS) introduced in BBB+23 does not need this padding bit and is faster than the PBS used to bootstrap messages of higher precisions.
Parameters optimization.
The TFHE scheme is based on the use of different kinds of cipher-texts: LWE, RLWE, GLWE, GGSW (generalization of GSW). These ciphertexts depend on distinct parameters: for LWE ciphertexts a dimension $n$ is required, for GLWE ciphertexts a polynomial size $N$ and a dimension $k$ are required. In BBB+23, these parameters are called macro-parameters. Other parameters that are only used inside FHE operators are called micro-parameters. As an example, a PBS requires a decomposition base $\beta$ and a number of decomposition levels $\ell$.
Any FHE operator is associated with a noise model and a cost model. A noise model is a formula used to emulate the noise evolution across an FHE operator. A cost model is a substitute for the metric you want to minimize. This could be the execution time, the power consumption, or the price. For each operation, micro and macro parameters have an impact on the cost and on the noise growth, so they need to be carefully chosen.
To select parameters for a circuit, BBB+23 introduces the definition of an Atomic Pattern $(AP)$ type $\mathcal{A}^{(\cdot)}$, which corresponds to a sub-graph of FHE operators that outputs one or several ciphertexts. When $A \in \mathcal{A}^{(\cdot)}$ is instantiated with a parameter set $x$, we write $A(x)$. From $A(x)$, you can estimate the amount of noise and cost at any edge of its FHE sub-graph. The best set of parameters for a given circuit, security level and message precision is then the set that minimizes the cost of the AP instances of the circuit while preserving correctness. The security level of a parameters set $x$ would be assessed beforehand thanks to an oracle that is built from the lattice estimator.
We further define three operators:
• The homomorphic dot product (DP) is a product between a vector of ciphertexts and a vector of cleartexts;
• The key switch (KS) allows switching to a ciphertext encrypted with a different key;
• The PBS allows bootstrapping a ciphertext while evaluating a lookup table on the message.
The next figure describes the atomic pattern of type $\mathcal{A}^{ \left( \text{CJP}21 \right) }$ from a sequence of operations introduced in CJP21. This atomic pattern is composed of a DP (represented with the symbol $\sum$), followed by a KS and a PBS.
Observe that $N$ is a power of two such that $N \in \{2^8,..., 2^{17}\}$. For the other parameters we have $n_{small} \in [[256,..., 2048]]$, $k \in [[1, 10]]$, $(\beta_{KS}, \mathcal{l}_{KS}) \in [[1, 64]]^2$ and $(\beta_{BR}, \mathcal{l}_{BR}) \in [[1, 64]]^2$. By multiplying the number of possible options together we get around $2^{40}$ possible parameters sets.
To determine the optimal set, we compare the instances $\mathcal{A}^{ \left( \text{CJP}21 \right) }$, where $x$ is a parameter set, and select the set that corresponds to the instance with the best performance in terms of computational cost under the same precision, security and correctness constraints.
This methodology allows you to find the best parameters for any atomic pattern type and consequently speed up homomorphic computations. However, it also permits selecting how to compute a homomorphic circuit.
Indeed, this new framework can be used, not only to choose the best parameters in terms of computational cost, but also to compare the efficiency of operators that output the same result.
New WoP-PBS.
New WoP-PBS. The TFHE scheme encrypts only up to 11 bits of message per ciphertext for a reasonable computational cost. It is possible to divide a bigger message into several chunks and encrypt each chunk into a ciphertext. However, the PBS cannot evaluate a lookup table with multiple ciphertexts as input. To do so, we can use the tree-PBS introduced in GBA21.
Now we define the atomic pattern of type $\mathcal{A}^{ \left( \text{GBA}21 \right) }$, this atomic pattern is composed of a dot product and a tree-PBS.
As $\mathcal{A}^{ \left( \text{CJP}21 \right) }$ and $\mathcal{A}^{ \left( \text{GBA}21 \right) }$ allow computing the same operation, we can now plot the cost (logarithm in base 2 of the execution time on the Y axis) of optimized instances of atomic patterns of type $\mathcal{A}^{ \left( \text{CJP}21 \right) }$ and of atomic patterns of type $\mathcal{A}^{ \left( \text{GBA}21 \right) }$ with 2 and 3 blocks (meaning 2 and 3 ciphertexts each containing a chunk of message) depending on the precision of the message (X axis). Note that the precision variable does not include the padding bit that is necessary for the PBS/tree-PBS. Therefore a plaintext with $u$ bits of message contains $u+1$ bits, the last one being the bit of padding.
All the experiments were done with the same constraints on correctness and security.
You can see that we cannot find any parameters to compute an atomic pattern of type $\mathcal{A}^{ \left( \text{CJP}21 \right) }$ for more than $11$ bits of precision. The rapid growth in cost makes $\mathcal{A}^{ \left( \text{CJP}21 \right) }$ and $\mathcal{A}^{ \left( \text{GBA}21 \right) }$ inefficient to compute on large precision messages.
To tackle this issue, BBB+23 proposes a new WoP-PBS which inputs several ciphertexts and permits the evaluation of any multivariate lookup table.
The idea behind this technique, represented above, can be decomposed in three steps:
• Bit Extract: we extract each bit of each message into single LWE ciphertexts by using PBS.
• Circuit Bootstrapping: a technique from CGGI20 to turn all the extracted LWE cipher-texts into GGSW ciphertexts.
• Vertical packing: a technique from CGGI20 to evaluate one or several lookup tables to obtain the new ciphertexts.
To compare this new technique with the previous techniques, we define an atomic pattern type $\mathcal{A}^{ \left( \text{this work} \right) }$ composed of a DP and the WoP-PBS.
As the WoP-PBS can be used on several ciphertexts containing chunks of the message, three variants are presented: $1$, $2$ and $4$ blocks.
We plot the cost (logarithm in base 2 of the execution time on the Y axis) of optimized instances of atomic patterns of type $\mathcal{A}^{ \left( \text{CJP}21 \right) }$, $\mathcal{A}^{ \left( \text{GBA}21 \right) }$ with $2$ and $3$ blocks and $\mathcal{A}^{ \left( \text{this work} \right) }$ with $1$, $2$ and $4$ blocks depending on the precision (X axis).
The same constraints on correctness and security as in the previous experiment were used, and the bit of padding is not included in the precision of the message for the PBS/tree-PBS.
The left-most circle shows that at $5$ bits the tree-PBS is faster than the standard PBS and the WoP-PBS.
However, the WoP-PBS offers a better scalability: over $10$ bits, the WoP-PBS with $2$ and $4$ blocks becomes faster than the tree-PBS, and the difference in cost grows with added precision.
At the $21$ bits mark circled on the right, you can see that the Wop-PBS is more than $2^{10}$ times faster than the tree-PBS.
Moreover, the WoP-PBS with $2$ and $4$ blocks allows computing on precisions that were not reachable with the tree-PBS.
This shows that the WoP-PBS is currently the most efficient way of evaluating lookup tables for messages greater than $10$ bits.
Want to know more?
For more details on this work, take a look at the paper or watch the talk given at the FHE.org 2023 conference (Video here, and slides here).