TFHE-rs v1.3: Faster Division on CPU, Key Upgrader & Memory Tracking on GPU

July 16, 2025

—

Jean-Baptiste Orfila, Arthur Meyre, Agnes Leroy

TFHE-rs v1.3 brings several major improvements and new features across CPU, GPU, and HPU backends, enhancing performance, usability, and security for Fully Homomorphic Encryption (FHE) workloads.

Highlights

GPU performance boost: Integer logarithm is now 4× faster, and several other operations are up to 20% faster.
GPU memory tracking: Utility functions help you predict memory usage and avoid out-of-memory errors.
CPU division speed-up: 36% faster division.
Zero-Knowledge Proof v2: New hashing mode improves verification speed by 34–38% with a slight proof time increase.
Noise squashing compression: Compress large ciphertexts generated in MPC.
Key upgrader: Update old ciphertexts to new cryptographic parameters for performance and security.
HPU security alignment: Default cryptographic parameters now match CPU and GPU standards, reducing error probability to < 2^-128 with minimal performance impact.
HPU backend improvements: More supported operations and faster FPGA reloading.

GPU: memory tracking and performance improvements

Memory tracking

GPU memory is used to store input and output ciphertexts, the server key, and the temporary buffers needed during FHE operations. The size of these buffers depends on the operation: for example, a 64-bit addition requires relatively little memory, while a 128-bit multiplication consumes significantly more.

Before v1.3, when using the high-level API of TFHE-rs, developers couldn’t know how much GPU memory an operation would require before executing it. This made it challenging to schedule large computations or parallel workloads without risking out-of-memory errors.

TFHE-rs v1.3 introduces utility functions that return the amount of GPU memory each operation requires, and allow users to check whether the GPU has enough memory available to execute it.

Here’s a minimal example showing how you can track GPU memory usage:

use rand::Rng;
use tfhe::core_crypto::gpu::check_valid_cuda_malloc_assert_oom;
use tfhe::prelude::*;
use tfhe::{
    set_server_key, ClientKey, CompressedServerKey, ConfigBuilder,
    FheInt32, GpuIndex,
};

fn main() {
    // Generate the keys, then decompress the server key directly onto the GPU
    let config = ConfigBuilder::default().build();
    let client_key = ClientKey::generate(config);
    let csks = CompressedServerKey::new(&client_key);
    let server_key = csks.decompress_to_gpu();
    set_server_key(server_key);

    // Encrypt two random positive 32-bit integers
    let mut rng = rand::thread_rng();
    let clear_a = rng.gen_range(1..=i32::MAX);
    let clear_b = rng.gen_range(1..=i32::MAX);
    let mut a = FheInt32::try_encrypt(clear_a, &client_key).unwrap();
    let mut b = FheInt32::try_encrypt(clear_b, &client_key).unwrap();

    // Determine how much memory a and b will use
    let ciphertexts_size = a.get_size_on_gpu() + b.get_size_on_gpu();

    // Check there is enough space in GPU memory to store a and b
    check_valid_cuda_malloc_assert_oom(ciphertexts_size, GpuIndex::new(0));

    // Move a and b to the GPU memory
    a.move_to_current_device();
    b.move_to_current_device();

    // Determine how much memory on GPU will be used to add a and b
    let add_size = a.get_add_size_on_gpu(&b);

    // Check the GPU has enough memory for the addition to happen
    check_valid_cuda_malloc_assert_oom(add_size, GpuIndex::new(0));

    // Perform the addition
    a += &b;
}

In this example, you first calculate the GPU memory needed to store the ciphertexts:

    // Determine how much memory a and b will use
    let ciphertexts_size = a.get_size_on_gpu() + b.get_size_on_gpu();

Then you calculate the memory required for the addition itself:

    // Determine how much memory on GPU will be used to add a and b
    let add_size = a.get_add_size_on_gpu(&b);

Finally, you check whether the GPU has enough memory available to perform the addition safely:

    // Check the GPU has enough memory for the addition
    check_valid_cuda_malloc_assert_oom(add_size, GpuIndex::new(0));

These new utility functions help to avoid out-of-memory errors during execution.
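The per-operation estimates are also what you need to schedule larger or parallel workloads. As a minimal sketch of one way to use them, the hypothetical helper batch_by_memory below greedily groups operations, in order, into batches whose combined estimate fits a memory budget. The sizes are plain numbers here; in practice they would come from functions such as get_add_size_on_gpu. The budget is an input you must determine yourself (the post does not describe a function that returns it), and greedy grouping is a simple choice, not an optimal packing.

```rust
/// Hypothetical helper (not part of TFHE-rs): greedily groups operations,
/// in order, into batches whose summed GPU memory estimate fits `budget`.
/// Returns the operation indices of each batch, or `Err(i)` if operation `i`
/// alone exceeds the budget and can never be scheduled.
fn batch_by_memory(op_sizes: &[u64], budget: u64) -> Result<Vec<Vec<usize>>, usize> {
    let mut batches: Vec<Vec<usize>> = Vec::new();
    let mut current: Vec<usize> = Vec::new();
    let mut used: u64 = 0;
    for (i, &size) in op_sizes.iter().enumerate() {
        if size > budget {
            // Too large even on its own.
            return Err(i);
        }
        if used + size > budget {
            // The current batch is full: close it and start a new one.
            batches.push(std::mem::take(&mut current));
            used = 0;
        }
        current.push(i);
        used += size;
    }
    if !current.is_empty() {
        batches.push(current);
    }
    Ok(batches)
}

fn main() {
    // Made-up sizes in bytes, for illustration only.
    let sizes = [400, 300, 500, 200];
    let batches = batch_by_memory(&sizes, 800).unwrap();
    println!("{:?}", batches); // [[0, 1], [2, 3]]
}
```

Before launching each batch, you could still call check_valid_cuda_malloc_assert_oom with the batch total as a final safeguard.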

Improved performance

TFHE-rs v1.3 brings significant GPU performance improvements, in particular:

Multiplying or dividing a ciphertext by a clear value is now ~20% faster.
The integer logarithm operation is up to 4× faster compared to the previous version.

TFHE-rs is the library underlying the Zama Confidential Blockchain Protocol. To illustrate real-world performance, consider an ERC20 transfer, which executes the following sequence of operations:

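    use tfhe::prelude::*;
    use tfhe::FheUint64;

    // Note: TFHE-rs stores the server key per thread, so it must also be set
    // on the rayon worker threads (e.g. with rayon::broadcast) before calling this.
    fn erc20_transfer(
        from_amount: &FheUint64,
        to_amount: &FheUint64,
        amount: &FheUint64,
    ) -> (FheUint64, FheUint64) {
        // Subtract, and learn (encrypted) whether the sender's balance underflowed
        let (new_from, did_not_have_enough) = from_amount.overflowing_sub(amount);
        let did_not_have_enough = &did_not_have_enough;
        let had_enough_funds = !did_not_have_enough;

        // Compute both new balances in parallel
        let (new_from_amount, new_to_amount) = rayon::join(
            // Keep the old sender balance if the funds were insufficient
            || did_not_have_enough.if_then_else(from_amount, &new_from),
            // Add `amount` times 0 or 1, so a failed transfer adds nothing
            || to_amount + (amount * FheUint64::cast_from(had_enough_funds)),
        );
        (new_from_amount, new_to_amount)
    }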
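To make the encrypted control flow easier to follow, here is a clear-text model of the same transfer, a sketch assuming plain u64 values stand in for FheUint64 and that FheUint64 addition wraps like u64::wrapping_add. The function name erc20_transfer_clear is hypothetical. It shows why the encrypted version multiplies by the cast boolean instead of branching: the server cannot read an encrypted FheBool, so both outcomes are computed and combined arithmetically.

```rust
/// Hypothetical clear-text model of the encrypted ERC20 transfer:
/// u64 replaces FheUint64 and each step mirrors a homomorphic operation.
fn erc20_transfer_clear(from_amount: u64, to_amount: u64, amount: u64) -> (u64, u64) {
    // overflowing_sub: wrapped difference plus an underflow flag.
    let (new_from, did_not_have_enough) = from_amount.overflowing_sub(amount);
    let had_enough_funds = !did_not_have_enough;

    // Mirrors if_then_else: keep the old balance when funds were insufficient.
    let new_from_amount = if did_not_have_enough { from_amount } else { new_from };

    // Mirrors cast_from(bool): 0 or 1, so a failed transfer adds nothing.
    let new_to_amount = to_amount.wrapping_add(amount * u64::from(had_enough_funds));
    (new_from_amount, new_to_amount)
}

fn main() {
    // Enough funds: 30 moves from the sender to the receiver.
    println!("{:?}", erc20_transfer_clear(100, 5, 30)); // (70, 35)
    // Not enough funds: both balances are unchanged.
    println!("{:?}", erc20_transfer_clear(10, 5, 30)); // (10, 5)
}
```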

On 8×H100 GPUs, the latency and throughput of this ERC20 transfer are as follows:

TFHE-rs guarantees a probability of failure below 2^-128 for FHE operations, ensuring computational correctness and security in production scenarios.

New features

From TFHE-rs v1.3 on, users can call expand_and_verify on ProvenCompactCiphertextList and on its non-proven counterpart. In this workflow, the Zero-Knowledge proof verification runs on the CPU, while the expansion, a preprocessing step required before FHE computation on this type of ciphertext, runs on the GPU.

Additionally, the noise squashing used in the threshold decryption protocol can now be executed on the GPU for better performance.

CPU: faster division and ZK proof enhancements

TFHE-rs v1.3 brings several improvements to CPU-side performance and cryptographic capabilities.

The division algorithm has been improved, reducing the runtime for FheUint64 inputs from 8.6 seconds to 5.5 seconds. Zero-Knowledge Proofs also see important enhancements: a new hashing scheme in ZK v2 trades a small increase in proof generation time for 34–38% faster verification.
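As a quick consistency check, the 36% figure in the highlights matches these timings when read as a reduction in runtime, which corresponds to roughly a 1.56× speed-up:

```latex
1 - \frac{5.5\ \text{s}}{8.6\ \text{s}} \approx 0.36
\qquad\text{equivalently}\qquad
\frac{8.6\ \text{s}}{5.5\ \text{s}} \approx 1.56\times
```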

In MPC settings, the Noise Squashing primitives, first introduced in TFHE-rs 1.1.0, enable noise flooding to protect against leakage. However, these operations can produce large ciphertexts. To address this, v1.3 adds compression primitives that reduce the size of noise-squashed ciphertexts, making storage and transfer more efficient. Refer to the documentation for more details.

To simplify working with evolving cryptographic parameters, this release introduces a key upgrader mechanism. It allows ciphertexts encrypted with older keys or parameter sets to be updated to newer configurations. This process requires keys and ciphertexts to be correctly tagged so the upgrade path can be identified and applied with the provided upgrade keys.
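The post does not describe the key upgrader's API. Purely as a toy model of the "identify the upgrade path from tags" step, the sketch below treats each parameter-set tag as a node and each available upgrade key as a directed edge, then finds a shortest chain with breadth-first search. The Tag type, the function find_upgrade_path, and the possibility of chaining several upgrades are all assumptions for illustration; applying an upgrade key to a ciphertext is not modeled.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical parameter-set tag (not the TFHE-rs tagging type).
type Tag = u32;

/// Toy model: each upgrade key is an edge (old_tag -> new_tag). Returns the
/// shortest chain of tags from `from` to `to`, or None if no path exists.
fn find_upgrade_path(upgrade_keys: &[(Tag, Tag)], from: Tag, to: Tag) -> Option<Vec<Tag>> {
    // Adjacency list: which tags each tag can be upgraded to directly.
    let mut edges: HashMap<Tag, Vec<Tag>> = HashMap::new();
    for &(old, new) in upgrade_keys {
        edges.entry(old).or_default().push(new);
    }
    // Breadth-first search, remembering how each tag was first reached.
    let mut visited: HashSet<Tag> = HashSet::from([from]);
    let mut parent: HashMap<Tag, Tag> = HashMap::new();
    let mut queue: VecDeque<Tag> = VecDeque::from([from]);
    while let Some(tag) = queue.pop_front() {
        if tag == to {
            // Walk the parent links back to `from`, then reverse.
            let mut path = vec![to];
            let mut current = to;
            while let Some(&prev) = parent.get(&current) {
                path.push(prev);
                current = prev;
            }
            path.reverse();
            return Some(path);
        }
        for &next in edges.get(&tag).into_iter().flatten() {
            if visited.insert(next) {
                parent.insert(next, tag);
                queue.push_back(next);
            }
        }
    }
    None
}

fn main() {
    // Hypothetical upgrade keys: 1 -> 2, 2 -> 3, and 1 -> 4.
    let keys = [(1, 2), (2, 3), (1, 4)];
    println!("{:?}", find_upgrade_path(&keys, 1, 3)); // Some([1, 2, 3])
}
```

Breadth-first search is used only because it returns a chain with the fewest upgrade steps; a real system may have a single direct upgrade key per parameter set.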

Additionally, TFHE-rs v1.3 introduces a new variant of the modulus switch technique for the binary key distribution. Unlike previous approaches, this variant achieves the same 2^-128 probability of failure without the extra key material.

Finally, TFHE-rs v1.3 now supports parameter sets with smaller moduli for the keyswitch, which can improve performance when a large modulus (like 2⁶⁴) is not required for keyswitch correctness.
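The post does not say how the smaller keyswitch modulus is used internally. As background only, assuming ciphertext coefficients are brought down from 2^64 to a smaller power-of-two modulus 2^k with the classic rounding modulus switch, the operation looks like the sketch below. It is not TFHE-rs code, and it is not the new binary-key modulus switch variant described above. The rounding error it introduces is at most 2^(63-k) on the 2^64 scale, which gives some intuition for why a large modulus is sometimes required for correctness.

```rust
/// Textbook rounding modulus switch (illustrative, not TFHE-rs code):
/// maps x mod 2^64 to round(x * 2^k / 2^64) mod 2^k, for 1 <= k <= 63.
fn mod_switch_from_2_64(x: u64, k: u32) -> u64 {
    assert!(k >= 1 && k < 64);
    let shift = 64 - k;
    // Keep the top k bits, then add the highest dropped bit to round
    // to nearest (ties round up).
    let rounded = (x >> shift) + ((x >> (shift - 1)) & 1);
    // Reduce modulo 2^k: rounding up from the largest value wraps to 0.
    rounded & ((1u64 << k) - 1)
}

fn main() {
    let k = 4;
    // Exactly halfway between 3 * 2^60 and 4 * 2^60.
    let x = (3u64 << 60) + (1u64 << 59);
    let y = mod_switch_from_2_64(x, k);
    println!("{y}"); // 4
}
```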

HPU: enhanced backend

TFHE-rs v1.3 expands the HPU backend:

Adds support for divisions, shifts, rotations, max operations, bit counting, and more via the high-level API.
FPGA reloading is now significantly faster thanks to PCIe-based “tandem mode,” removing the slow flash memory write process on the V80 board.

The default cryptographic parameters for HPU now align with CPU and GPU, achieving <2^-128 error probability with minimal impact on performance.

Additional links

Star Zama's TFHE-rs GitHub repository to endorse our work.
Review Zama's TFHE-rs documentation.
Get support on our community channels.
Participate in the Zama Bounty Program to get rewards in cash.
