TFHE-rs v1.3: Faster Division on CPU, Key Upgrader & Memory Tracking on GPU
TFHE-rs v1.3 brings several major improvements and new features across CPU, GPU, and HPU backends, enhancing performance, usability, and security for Fully Homomorphic Encryption (FHE) workloads.
Highlights
- GPU performance boost: Integer logarithm is now 4× faster, and several other operations are up to 20% faster.
- GPU memory tracking: Utility functions help you predict memory usage and avoid out-of-memory errors.
- CPU division speed-up: 36% faster division.
- Zero-Knowledge Proof v2: New hashing mode improves verification speed by 34–38% with a slight proof time increase.
- Noise squashing compression: Compress large ciphertexts generated in MPC.
- Key upgrader: Update old ciphertexts to new cryptographic parameters for performance and security.
- HPU security alignment: Default cryptographic parameters now match CPU and GPU standards, reducing error probability to < 2^-128 with minimal performance impact.
- HPU backend improvements: More supported operations and faster FPGA reloading.
GPU: memory tracking and performance improvements
Memory tracking
GPU memory is used to store input and output ciphertexts, the server key, and temporary buffers needed during the FHE operations. The size of those buffers depends on the operation. For example, a 64-bit addition requires relatively little memory, while a 128-bit multiplication consumes significantly more.
Before v1.3, when using the high-level API of TFHE-rs, developers couldn’t know how much GPU memory an operation would require before executing it. This made it challenging to schedule large computations or parallel workloads without risking out-of-memory errors.
TFHE-rs v1.3 introduces utility functions that return the amount of GPU memory each operation requires, and allow users to check whether the GPU has enough memory available to execute it.
Tracking GPU memory usage for an addition, for example, takes three steps. First, you calculate the GPU memory needed to store the ciphertexts. Then you calculate the memory required for the addition itself. Finally, you check whether the GPU has enough memory available to perform the addition safely.
These new utility functions help to avoid out-of-memory errors during execution.
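The check described above boils down to summing the individual requirements and comparing the total against the free GPU memory. The following is a minimal sketch of that bookkeeping in plain Rust. It does not call TFHE-rs: the function names `total_required_bytes` and `fits_in_gpu_memory` are hypothetical, and the byte counts would in practice come from the library's utility functions.

```rust
/// Sums a list of memory requirements in bytes.
/// Returns `None` if the sum overflows `u64`.
/// (Hypothetical helper, not part of TFHE-rs.)
fn total_required_bytes(requirements: &[u64]) -> Option<u64> {
    requirements
        .iter()
        .try_fold(0u64, |acc, &bytes| acc.checked_add(bytes))
}

/// Reports whether all requirements together fit in the free GPU memory.
/// An overflowing total is treated as "does not fit".
/// (Hypothetical helper, not part of TFHE-rs.)
fn fits_in_gpu_memory(requirements: &[u64], free_bytes: u64) -> bool {
    match total_required_bytes(requirements) {
        Some(total) => total <= free_bytes,
        None => false,
    }
}
```

With made-up sizes of 2 MiB for the ciphertexts and 6 MiB for the temporary buffers of the addition, `fits_in_gpu_memory(&[2 << 20, 6 << 20], 16 << 20)` returns `true`, while the same call with only 4 MiB free returns `false`.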
Improved performance
TFHE-rs v1.3 brings significant GPU performance improvements, in particular:
- Multiplying or dividing a ciphertext by a clear value is now ~20% faster.
- The integer logarithm operation is up to 4× faster compared to the previous version.
TFHE-rs is the underlying library of the Zama Confidential Blockchain Protocol. To illustrate real-world performance, consider an ERC20 transfer, which requires executing the following sequence of operations:
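As an illustration only, a transfer of this kind is often written without any data-dependent branch: a comparison, a selection, an addition, and a subtraction. The sketch below shows that shape on clear 64-bit integers rather than ciphertexts. The function name `transfer` and the exact choice of operations are assumptions made for this example, not the definition of the benchmark.

```rust
/// Branch-free transfer on clear balances, returning the new
/// (sender, receiver) balances. Illustrative only: an encrypted version
/// would apply the same steps to ciphertexts.
fn transfer(balance_from: u64, balance_to: u64, amount: u64) -> (u64, u64) {
    // Comparison: does the sender hold enough funds?
    let has_enough = amount <= balance_from;
    // Selection: move either the full amount or zero, without branching.
    let moved = u64::from(has_enough) * amount;
    // Subtraction and addition are applied unconditionally.
    // `moved` never exceeds `balance_from`, so the subtraction cannot underflow.
    (balance_from - moved, balance_to.wrapping_add(moved))
}
```

Because both balances are always updated, the sequence of operations does not reveal whether the transfer went through.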
On 8xH100 GPUs, the latency and throughput for this ERC20 transfer are as below:

TFHE-rs guarantees a probability of failure below 2^-128 for FHE operations, ensuring computational correctness and security in production scenarios.
New features
From TFHE-rs v1.3 on, users can call the function `expand_and_verify` on `ProvenCompactCiphertextList` and their non-proven counterparts. In this workflow, the Zero-Knowledge proof verification is executed on the CPU, while the expansion, a preprocessing step necessary for FHE computation on this type of ciphertext, runs on the GPU.
Additionally, the noise squashing used in the threshold decryption protocol can now be executed on GPU for better performance.
CPU: faster division and ZK proof enhancements
TFHE-rs v1.3 brings several improvements to CPU-side performance and cryptographic capabilities.
The division algorithm has been improved, reducing runtime from 8.6 seconds to 5.5 seconds when working with `FheUint64` inputs. Zero-Knowledge Proofs also see important enhancements. A new hashing scheme in ZK v2 trades a small increase in proof generation time for a 34–38% faster verification process, significantly improving proof efficiency.
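The 36% figure quoted for division follows directly from the two runtimes. A small sketch of the arithmetic, with hypothetical helper names:

```rust
/// Relative reduction in runtime, as a percentage of the old runtime.
fn percent_reduction(old_secs: f64, new_secs: f64) -> f64 {
    (old_secs - new_secs) / old_secs * 100.0
}

/// How many times faster the new runtime is compared to the old one.
fn speedup_factor(old_secs: f64, new_secs: f64) -> f64 {
    old_secs / new_secs
}
```

For 8.6 s and 5.5 s, `percent_reduction` gives roughly 36% and `speedup_factor` roughly 1.56, so the same improvement can be described as a 36% shorter runtime or a division that is about 1.56× faster.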
In MPC settings, Noise Squashing primitives—first introduced in TFHE-rs 1.1.0—enable noise flooding to protect against leakage. However, these operations can produce large ciphertexts. To address this, v1.3 adds compression primitives that reduce the size of noise-squashed ciphertexts, making storage and transfer more efficient. Refer to the documentation for more details.
To simplify working with evolving cryptographic parameters, this release introduces a key upgrader mechanism. It allows ciphertexts encrypted with older keys or parameter sets to be updated to newer configurations. This process requires keys and ciphertexts to be correctly tagged so the upgrade path can be identified and applied with the provided upgrade keys.
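To make the role of tags concrete, here is a toy model of the lookup step: the tag on a ciphertext selects which upgrade path applies. Every type and method below (`TaggedCiphertext`, `UpgradeRegistry`, `register`, `upgrade`) is hypothetical and exists only for this sketch. It is not the TFHE-rs API, and it relabels the payload instead of performing any cryptographic operation.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a ciphertext: a tag naming its key or
/// parameter set, plus an opaque payload.
#[derive(Debug, Clone, PartialEq)]
struct TaggedCiphertext {
    tag: String,
    payload: Vec<u8>,
}

/// Hypothetical registry mapping a source tag to the tag that the
/// corresponding upgrade key produces.
#[derive(Default)]
struct UpgradeRegistry {
    targets: HashMap<String, String>,
}

impl UpgradeRegistry {
    /// Registers an upgrade path from `source` to `target`.
    fn register(&mut self, source: &str, target: &str) {
        self.targets.insert(source.to_string(), target.to_string());
    }

    /// Looks up the upgrade path by the ciphertext's tag.
    /// Returns `None` when no path is registered for that tag.
    fn upgrade(&self, ct: &TaggedCiphertext) -> Option<TaggedCiphertext> {
        let target = self.targets.get(&ct.tag)?;
        // A real upgrade would transform the payload with the upgrade key;
        // this sketch only relabels it.
        Some(TaggedCiphertext {
            tag: target.clone(),
            payload: ct.payload.clone(),
        })
    }
}
```

The point of the model is the failure mode: a ciphertext whose tag has no registered path cannot be upgraded, which is why correct tagging matters.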
Additionally, TFHE-rs v1.3 introduces a new variant of the modulus switch technique for the binary key distribution. Unlike previous approaches, this variant achieves the same 2^-128 probability of failure without the extra key material.
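For readers unfamiliar with the term, a modulus switch rescales a value from one modulus to a smaller one, rounding to the nearest representable value. The sketch below shows the textbook operation for power-of-two moduli, from 2^64 down to 2^k. It is not the new variant introduced in v1.3, and the function name `modulus_switch` is chosen for this example.

```rust
/// Rounds a value modulo 2^64 to the nearest value modulo 2^log2_new,
/// that is, computes round(x * 2^log2_new / 2^64) mod 2^log2_new.
fn modulus_switch(x: u64, log2_new: u32) -> u64 {
    assert!(log2_new >= 1 && log2_new < 64);
    // Number of low-order bits that are dropped.
    let shift = 64 - log2_new;
    // Adding half of the dropped range turns the truncating shift into
    // rounding to nearest. Wrapping is intended: values that round up to
    // 2^log2_new wrap around to 0 in the new modulus.
    x.wrapping_add(1u64 << (shift - 1)) >> shift
}
```

The rounding is where the extra noise of a modulus switch comes from: each coefficient moves by up to half a step of the new, coarser grid.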
Finally, TFHE-rs v1.3 now supports parameter sets with smaller moduli for the keyswitch, which can improve performance when a large modulus (like 2^64) is not required for keyswitch correctness.
HPU: enhanced backend
TFHE-rs v1.3 expands the HPU backend:
- Adds support for divisions, shifts, rotations, max operations, bit counting, and more via the high-level API.
- FPGA reloading is now significantly faster thanks to PCIe-based “tandem mode,” removing the slow flash memory write process on the V80 board.
The default cryptographic parameters for HPU now align with CPU and GPU, achieving an error probability below 2^-128 with minimal impact on performance.
Additional links
- Star Zama's TFHE-rs GitHub repository to endorse our work.
- Review Zama's TFHE-rs documentation.
- Get support on our community channels.
- Participate in the Zama Bounty Program to get rewards in cash.