TFHE-rs v1.1: Fine-Grained GPU Control and More Operators

April 10, 2025
Jean-Baptiste Orfila, Arthur Meyre, Agnes Leroy

This month, Zama released TFHE-rs v1.1, bringing several major improvements and new features for both GPU and CPU backends.

On the GPU side, the backend now adopts the same default cryptographic parameters as the CPU, reducing the probability of computational errors to less than 2⁻¹²⁸, with minimal impact on performance. Multi-GPU support has also been significantly improved: users can now explicitly choose which GPU each operation runs on, enabling close to 500 encrypted 64-bit additions per second on 8×H100 GPUs.

On the CPU side, this release expands the operator set by supporting more scalar cases, making homomorphic computations more versatile and efficient.

In this blog post, we’ll dive into the details of what’s new in TFHE-rs v1.1.

Better multi-GPU throughput

Before v1.1, TFHE-rs' High-Level API automatically dispatched workloads across all available GPUs using a hardcoded strategy for every encrypted operation. While this is effective for very large integer precisions (>128 bits) and for operations that load the GPUs heavily (such as multiplication), it is not ideal for smaller operations, typically 64-bit encrypted additions or comparisons.

Starting with v1.1, developers can select exactly which GPU to use for each operation, optimizing performance on multi-GPU setups. 

Here’s a quick example of executing a hundred 64-bit encrypted additions per GPU in parallel, where each addition is computed on a single GPU:

use tfhe::{ConfigBuilder, set_server_key, ClientKey, CompressedServerKey, FheUint64, GpuIndex};
use tfhe::prelude::*;
use rayon::prelude::*;
use tfhe::core_crypto::gpu::get_number_of_gpus;
use rand::{thread_rng, Rng};

fn main() {
    let config = ConfigBuilder::default().build();

    let client_key = ClientKey::generate(config);
    let compressed_server_key = CompressedServerKey::new(&client_key);

    // Decompress one server key per GPU, each bound to a specific device
    let num_gpus = get_number_of_gpus();
    let sks_vec = (0..num_gpus)
        .map(|i| compressed_server_key.decompress_to_specific_gpu(GpuIndex::new(i)))
        .collect::<Vec<_>>();

    let batch_size = num_gpus * 100;

    // Encrypt random 64-bit inputs
    let mut rng = thread_rng();
    let left_inputs = (0..batch_size)
        .map(|_| FheUint64::encrypt(rng.gen::<u64>(), &client_key))
        .collect::<Vec<_>>();
    let right_inputs = (0..batch_size)
        .map(|_| FheUint64::encrypt(rng.gen::<u64>(), &client_key))
        .collect::<Vec<_>>();

    // Split the inputs into one chunk per GPU and run the additions in parallel
    let chunk_size = (batch_size / num_gpus) as usize;
    left_inputs
        .par_chunks(chunk_size)
        .zip(
            right_inputs
                .par_chunks(chunk_size)
        )
        .enumerate()
        .for_each(
            |(i, (left_inputs_on_gpu_i, right_inputs_on_gpu_i))| {
                left_inputs_on_gpu_i
                    .par_iter()
                    .zip(right_inputs_on_gpu_i.par_iter())
                    .for_each(|(left_input, right_input)| {
                        set_server_key(sks_vec[i].clone());
                        left_input + right_input;
                    });
            },
        );
}

What’s happening here? The first thing that differs from a usual GPU computation with TFHE-rs is the way the server key is defined:

let sks_vec = (0..num_gpus)
        .map(|i| compressed_server_key.decompress_to_specific_gpu(GpuIndex::new(i)))
        .collect::<Vec<_>>();

Here, a vector of server keys is created, each on a specific GPU. 

Then, instead of calling [.c-inline-code]par_iter()[.c-inline-code] directly on the inputs as one would naturally do, the inputs are chunked so they can be distributed across all the GPUs:

left_inputs
        .par_chunks(chunk_size)
        .zip(
            right_inputs
                .par_chunks(chunk_size)
        )
        .enumerate()
        .for_each(
            |(i, (left_inputs_on_gpu_i, right_inputs_on_gpu_i))| {
                left_inputs_on_gpu_i
                    .par_iter()
                    .zip(right_inputs_on_gpu_i.par_iter())
                    .for_each(|(left_input, right_input)| {
                        set_server_key(sks_vec[i].clone());
                        left_input + right_input;
                    });
            },
        );

By setting the server key corresponding to the GPU associated with each chunk via [.c-inline-code]set_server_key(sks_vec[i].clone())[.c-inline-code], the additions are computed on all the GPUs independently. Note that [.c-inline-code]sks_vec[i].clone()[.c-inline-code] only copies a pointer to the server key into the thread, not the contents of the key itself, so it does not induce additional overhead. You can go further to maximize multi-GPU throughput by following our dedicated tutorial.

With this logic set up, TFHE-rs can now achieve close to 500 additions of 64-bit encrypted integers per second on 8×H100 GPUs.

New operators on the CPU backend

TFHE-rs v1.1 brings several additions and improvements to the CPU backend:

  • Scalar support for [.c-inline-code]select[.c-inline-code]: Previously, the [.c-inline-code]select[.c-inline-code] operation only worked with encrypted values. In v1.1, you can now use scalar (plaintext) values as selectable operands. For 64-bit inputs, this operation executes in approximately 20 milliseconds.
  • Improved subtraction: Subtraction now supports a scalar on the left-hand side, making expressions like [.c-inline-code]scalar - encrypted[.c-inline-code] possible. For 64-bit operands, this operation takes around 79 milliseconds.
  • New dot product operator: v1.1 introduces a dot product operation between a vector of [.c-inline-code]FheBool[.c-inline-code] values and any supported scalar type. On a vector of 1,024 elements, execution time is approximately 2 seconds.

All performance benchmarks were measured on an [.c-inline-code]AWS hpc7a.96xlarge[.c-inline-code] instance.
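To give an idea of how these might look in code, here is a minimal sketch using the High-Level API. The scalar [.c-inline-code]select[.c-inline-code] call and the commented dot product entry point are assumptions made for illustration and may differ from the released API; check the documentation for the exact signatures:

use tfhe::prelude::*;
use tfhe::{generate_keys, set_server_key, ConfigBuilder, FheBool, FheUint64};

fn main() {
    let config = ConfigBuilder::default().build();
    let (client_key, server_key) = generate_keys(config);
    set_server_key(server_key);

    let condition = FheBool::encrypt(true, &client_key);
    let encrypted = FheUint64::encrypt(7u64, &client_key);

    // Scalar select: both branches are plaintext values.
    // The signature is assumed to mirror the encrypted `select` variant.
    let chosen = condition.select(42u64, 13u64);

    // Scalar on the left-hand side of a subtraction: clear - encrypted.
    let difference = 100u64 - &encrypted;

    // Dot product between a vector of FheBool and clear values:
    // the entry point below is hypothetical, shown only to illustrate the shape of the call.
    // let dot = FheUint64::dot_product(&encrypted_bits, &clear_weights);

    let chosen_clear: u64 = chosen.decrypt(&client_key);
    let difference_clear: u64 = difference.decrypt(&client_key);
    assert_eq!(chosen_clear, 42);
    assert_eq!(difference_clear, 93);
}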

Smarter key generation

To better support operations in memory-constrained environments, v1.1 also introduces “chunked” bootstrapping key generation. This feature allows the bootstrapping key to be generated in smaller chunks, which can later be assembled into a full key on higher-capacity servers used for encrypted computation.
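To make the workflow concrete, here is a minimal, self-contained sketch of the idea. The types and functions below are illustrative placeholders standing in for the chunked key-generation entry points, not the actual TFHE-rs API; they only show how chunks produced on a constrained device can be reassembled into a full key on the server:

// Conceptual model of the chunked bootstrapping-key workflow.
// These types are illustrative placeholders, not the TFHE-rs API.

/// One piece of the bootstrapping key, small enough to be generated
/// on a memory-constrained device.
struct BootstrappingKeyChunk {
    data: Vec<u8>,
}

/// The full key, assembled on the server that runs the encrypted computation.
struct BootstrappingKey {
    data: Vec<u8>,
}

/// On the constrained device: produce the key piece by piece so that only one
/// chunk needs to be held in memory at a time.
fn generate_chunk(chunk_index: usize, chunk_len: usize) -> BootstrappingKeyChunk {
    // In the real library this would run the chunked key-generation routine;
    // here we just fill placeholder bytes.
    BootstrappingKeyChunk {
        data: vec![chunk_index as u8; chunk_len],
    }
}

/// On the server: concatenate the chunks back into a complete key.
fn assemble(chunks: Vec<BootstrappingKeyChunk>) -> BootstrappingKey {
    BootstrappingKey {
        data: chunks.into_iter().flat_map(|c| c.data).collect(),
    }
}

fn main() {
    let chunks: Vec<_> = (0..4).map(|i| generate_chunk(i, 1024)).collect();
    let full_key = assemble(chunks);
    assert_eq!(full_key.data.len(), 4 * 1024);
}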

The next release of TFHE-rs will continue to improve performance and introduce new features. Stay tuned!
