TFHE-rs v0.7: Ciphertext Compression, Multi-GPU Support and More
TFHE-rs v0.7 now supports the compression of ciphertexts that encrypt the result of some homomorphic computations. This new feature reduces the size of ciphertexts by up to 1,900x with the provided parameters! Additionally, TFHE-rs v0.7 allows users to leverage multi-GPU architectures, which are widely deployed on servers, to drastically enhance computational performance. As usual, this release introduces a plethora of new features and improvements, as detailed below!
Compressing ciphertexts after homomorphic computation
One of the challenges of FHE implementations is the size of the ciphertexts. By default, the ratio between one bit of cleartext and its encrypted equivalent is around 8,200, meaning that it takes 8,200 bits of data in a ciphertext to represent 1 bit of data in a cleartext. TFHE-rs has supported input compression since v0.2; however, reducing the size of ciphertexts after a computation was not possible until now. Starting with v0.7, ciphertexts can be compressed at any point in the program.
For now, post-computation compression is available only for a few parameter sets, as demonstrated in the code snippet below. A compressed ciphertext list with these parameters can have up to 256 slots, each capable of containing 2 bits of encrypted data, for a total of 512 bits per list. For example, 8 [.c-inline-code]FheUint64[.c-inline-code] values fill one list exactly (8 × 64 = 512 bits). Table 1 provides a summary of the ciphertext sizes and compression ratios.
The following example demonstrates how to use TFHE-rs to compress ciphertexts with the newly introduced [.c-inline-code]CompressedCiphertextList[.c-inline-code]. The list is heterogeneous, so ciphertexts of any type can be stored together, such as [.c-inline-code]FheUint32[.c-inline-code], [.c-inline-code]FheUint16[.c-inline-code], [.c-inline-code]FheBool[.c-inline-code], etc. When the number of input bits exceeds the capacity of a single list, the list is automatically split into several lists.
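The slot arithmetic above can be sketched in a few lines of plain Rust. This is a toy model, not the TFHE-rs API: the two constants come from the figures quoted in this post, while the function names, the rule that a boolean occupies one whole slot, and the assumption that values are packed back to back across lists are all hypothetical.

```rust
// Toy model of the slot arithmetic described above. NOT the TFHE-rs API.
// The two constants are taken from the post; everything else is an assumption.
const SLOTS_PER_LIST: usize = 256; // slots in one compressed list
const BITS_PER_SLOT: usize = 2; // encrypted bits held by each slot

/// Slots needed for one value of `bit_width` bits, rounded up.
/// Assumption: every value occupies at least one slot (e.g. a boolean).
fn slots_for(bit_width: usize) -> usize {
    ((bit_width + BITS_PER_SLOT - 1) / BITS_PER_SLOT).max(1)
}

/// How many values of the same width fit in a single list.
fn values_per_list(bit_width: usize) -> usize {
    SLOTS_PER_LIST / slots_for(bit_width)
}

/// Number of lists needed for a heterogeneous mix of widths.
/// Assumption: slots are packed back to back, so a value may straddle two lists.
fn lists_needed(bit_widths: &[usize]) -> usize {
    let total_slots: usize = bit_widths.iter().map(|&w| slots_for(w)).sum();
    (total_slots + SLOTS_PER_LIST - 1) / SLOTS_PER_LIST
}
```

Under this model, `values_per_list(64)` gives 8, matching the figure quoted above, and a ninth 64-bit value pushes `lists_needed` from 1 to 2.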
Accelerating homomorphic computations with multiple GPUs
TFHE-rs v0.7 enables the use of multiple GPUs for homomorphic computations for the first time, marking a significant advancement in performance. There is no need to change the code to execute on multiple GPUs. To keep the API as user-friendly as possible, the configuration is set automatically; as a consequence, the user has no fine-grained control over which GPUs are selected.
However, there are certain limitations: only GPUs with peer access to GPU 0 via NVLink are used for the computations. Depending on the platform, this may limit the number of GPUs that TFHE-rs can effectively harness.
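The selection rule can be pictured with a small stand-alone Rust sketch. It only models the rule as stated here and does not reflect how TFHE-rs queries the hardware: the function name and the boolean input (one flag per device, true if that device has NVLink peer access to GPU 0) are hypothetical, as is the assumption that GPU 0 itself always takes part.

```rust
/// Toy model of the selection rule: keep GPU 0 plus every device that has
/// peer access to it. `peer_access_to_gpu0[i]` is a hypothetical flag saying
/// whether GPU `i` can reach GPU 0 over NVLink.
fn usable_gpus(peer_access_to_gpu0: &[bool]) -> Vec<usize> {
    peer_access_to_gpu0
        .iter()
        .enumerate()
        // GPU 0 is assumed to be always usable; others need peer access.
        .filter(|&(i, &peer)| i == 0 || peer)
        .map(|(i, _)| i)
        .collect()
}
```

On a four-GPU machine where only devices 1 and 3 are linked to device 0, this model would keep devices 0, 1 and 3 and leave device 2 idle.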
The multi-GPU support, along with some optimizations introduced in this release, brings unprecedented performance for integer operations.
The optimal number of GPUs varies depending on the operation and on the integer precision specified by the user. Comprehensive benchmark results for both single and multiple GPUs across all supported precisions are available in the documentation.
Additional features and improvements
TFHE-rs v0.7 also includes some other new features and performance improvements:
- Updated cryptographic parameter sets: Previously, the default failure probability for programmable bootstrapping was less than 2^−40. To reduce the probability of errors in long-running computations, the new parameter sets default to 2^−64. The impact on performance is negligible.
- New vector and array operations: TFHE-rs now includes operations on vectors of ciphertexts. For example, it is now possible to compute equality between two vectors of ciphertexts or to check if one vector is contained within another.
- Improved Zero-Knowledge Proofs: Through optimizations and dedicated parameter sets for compact public key encryption, both the commitment size and the proof and verification timings have been reduced. More details and benchmarks are available in the documentation.
- Optimized keyswitch on GPU: The time to keyswitch has been reduced from 5.3 ms to 123 µs for the default parameters (roughly 43x faster), bringing the overall latency of programmable bootstrapping, which includes this keyswitch, down to 4.3 ms from 9.5 ms in the previous version of TFHE-rs (about 2.2x faster).
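To give a feel for what the change in failure probability from the first item means over a long run, the sketch below computes the chance that at least one bootstrap fails among many. It is a back-of-the-envelope model and not part of TFHE-rs: it assumes failures are independent and treats the quoted values as exact per-bootstrap probabilities, and the function name is hypothetical.

```rust
/// Probability that at least one of `n` bootstraps fails, given a
/// per-bootstrap failure probability `p`: 1 - (1 - p)^n.
/// Assumption: failures are independent. `ln_1p` and `exp_m1` keep the
/// result accurate when `p` is far below f64 rounding error around 1.0.
fn prob_any_failure(p: f64, n: f64) -> f64 {
    // (1 - p)^n = exp(n * ln(1 - p)), so 1 - (1 - p)^n = -expm1(n * ln1p(-p)).
    -(n * (-p).ln_1p()).exp_m1()
}

/// Convenience wrapper for probabilities and counts given as powers of two,
/// e.g. `prob_any_failure_pow2(-40, 30)` for p = 2^-40 and n = 2^30.
fn prob_any_failure_pow2(log2_p: i32, log2_n: i32) -> f64 {
    prob_any_failure(2f64.powi(log2_p), 2f64.powi(log2_n))
}
```

For 2^30 bootstraps (about one billion), this model gives roughly 2^−10, or about 0.1%, with the previous 2^−40 setting, and roughly 2^−34 with the new 2^−64 setting.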
The next release of TFHE-rs will focus on enhancing multi-GPU performance, along with expanding the set of available operations.
Additional links
- Star the TFHE-rs GitHub repository to endorse our work.
- Review the TFHE-rs documentation.
- Get support on our community channels.
- Participate in the Zama Bounty Program to get rewards in cash!