Announcing Concrete-core v1.0.0-gamma with GPU acceleration

July 6, 2022

—

The Zama Team

This release is the third major step toward the 1.0.0 release of Zama’s low-level crypto library, Concrete-core. Check out the previous blog posts on the topic (V1.0.0-alpha and V1.0.0-beta), where we explain how Concrete-core is designed to make it easy to experiment with and integrate new FHE-related hardware acceleration.

Today, it is our pleasure to deliver Concrete-core V1.0.0-gamma, with Cuda acceleration! See the GitHub release notes for the full list of changes. Overall, this version introduces:

- A new Cuda backend, dedicated to the acceleration of the programmable bootstrap and the keyswitch on Nvidia GPUs (Nvidia GPUs are widely available on compute environments and highly popular in machine learning, which is why we are targeting this brand specifically).
- A split of the Core backend in order to enable multi-platform support.
- Support for serialization: entities now have a serialization version attached to them, so that it’s possible to know which version of Concrete-core generated them. Serialization engines are also implemented.
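
To illustrate the idea of attaching a serialization version to an entity, here is a minimal sketch in plain Rust. It is not Concrete-core's actual format or API: the `FORMAT_VERSION` constant, the 4-byte little-endian tag, and both function names are assumptions made for the example.

```rust
/// Hypothetical format version tag; not Concrete-core's actual scheme.
pub const FORMAT_VERSION: u32 = 1;

#[derive(Debug, PartialEq)]
pub enum DecodeError {
    /// Fewer than 4 bytes: there is no room for the version tag.
    TooShort,
    /// The tag was read but does not match the version this code understands.
    UnsupportedVersion(u32),
}

/// Prefix the payload with a little-endian version tag.
pub fn serialize_versioned(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(4 + payload.len());
    out.extend_from_slice(&FORMAT_VERSION.to_le_bytes());
    out.extend_from_slice(payload);
    out
}

/// Check the version tag before handing back the payload.
pub fn deserialize_versioned(bytes: &[u8]) -> Result<&[u8], DecodeError> {
    if bytes.len() < 4 {
        return Err(DecodeError::TooShort);
    }
    let (tag, payload) = bytes.split_at(4);
    let version = u32::from_le_bytes([tag[0], tag[1], tag[2], tag[3]]);
    if version != FORMAT_VERSION {
        return Err(DecodeError::UnsupportedVersion(version));
    }
    Ok(payload)
}
```

With a tag like this, a reader can reject or migrate data produced by a different library version instead of misinterpreting the bytes.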

This release comes with two new APIs on top of Concrete-core: a C API and a WASM API. Both only wrap a subset of Concrete-core at the moment. The C API covers the use case of the Concrete Framework’s Compiler, while the WASM API provides all necessary engines for a client application. We plan to make their generation automatic and to offer full coverage of Concrete-core’s features. The aim is to enable the widest possible range of applications to be built on top of Concrete-core.

In the sections below, more details are given on the new multi-platform support and the Cuda acceleration introduced in this release.

Enabling multi-platform support

From the start, the V1 design has been aimed at supporting a wide variety of platforms and hardware. Adapting Concrete-core V0.1.10 required an extensive restructuring of the code base; in order to see it through safely, we needed a testing infrastructure. This infrastructure was introduced with the V1.0.0-beta release in April 2022, which set us free to proceed with the main restructuring of Concrete-core: multi-backend support. The former `Core` backend is now split into:

- A default backend that does not depend on specialized hardware, unless specific compile-time configuration is performed:

  - It is the main backend for encryption, decryption, and key creation.
  - It supports a variety of leveled operations on ciphertexts, but nothing involving the FFT (e.g. bootstrapping, Cmux). FFT support relying on RustFFT should be added in the future.
  - The bootstrap key creation can be accelerated with multithreading, relying on the `rayon` dependency.
  - Encryption can be performed using `rdseed` and/or `aesni` on x86_64 platforms that support them. Otherwise, a Unix seeder can be used in place of `rdseed` and a software CSPRNG in place of `aesni`. This choice is made at compile time, so it is possible to build the default backend for a chosen target.

- An FFTW backend that implements operations involving polynomial multiplication (bootstrap, Cmux, external product) with FFTW acceleration.

- Additionally, a new Cuda backend is introduced that provides GPU acceleration for the bootstrap and keyswitch.
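
The compile-time choice of seeder and CSPRNG described for the default backend can be sketched as follows. This is only an illustration: the `Seeder` and `Csprng` enums and both function names are hypothetical, and the sketch uses rustc's `target_feature` configuration as a stand-in, whereas Concrete-core may rely on its own Cargo features to make the selection.

```rust
/// Hypothetical labels for the entropy sources named in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Seeder {
    Rdseed,
    Unix,
}

/// Hypothetical labels for the two CSPRNG flavours named in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Csprng {
    AesNi,
    Software,
}

/// Resolved at compile time from the target the crate is built for.
pub const fn selected_seeder() -> Seeder {
    if cfg!(all(target_arch = "x86_64", target_feature = "rdseed")) {
        Seeder::Rdseed
    } else {
        // Fallback when the hardware instruction is not enabled for the target.
        Seeder::Unix
    }
}

/// Same idea for the generator: hardware AES if enabled, software otherwise.
pub const fn selected_csprng() -> Csprng {
    if cfg!(all(target_arch = "x86_64", target_feature = "aes")) {
        Csprng::AesNi
    } else {
        Csprng::Software
    }
}
```

In this sketch, building with `RUSTFLAGS="-C target-feature=+rdseed,+aes"` on x86_64 selects the hardware paths; any other target falls back to the portable ones.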

This new structure is represented in the figure below:

Now let’s dive into more details of the new Cuda backend introduced in this release.

Cuda acceleration of TFHE’s programmable bootstrap

Cuda acceleration is now available in Concrete-core. TFHE’s programmable bootstrap is the performance bottleneck, which is why it is the first operation we’ve targeted for Cuda acceleration. Since it usually comes together with a keyswitch, we also provide a Cuda-accelerated version of the keyswitch. To cover different use cases, we offer two implementations of the bootstrap. In both, the bootstrap operation is accelerated via a single Cuda kernel, that is, a function executed on the GPU that runs a set of instructions on Cuda threads, themselves grouped into Cuda blocks:

- The Low Latency Bootstrap (available from the `CudaEngine`), which operates on single input LWE ciphertexts. It uses several Cuda blocks per input ciphertext in the bootstrap kernel. It can also operate on vectors of inputs, but it is aimed at launching a restricted number of bootstraps simultaneously (from 1 to about 10, depending on the chosen cryptographic parameters).
- The Amortized Bootstrap (available in the `AmortizedCudaEngine`), which is very similar to the ones proposed by nuFHE and cuFHE, except that it supports more sets of parameters. It uses one block of threads per input LWE ciphertext, and is aimed at accelerating large numbers of bootstraps (> 10).
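
The trade-off between the two implementations can be summarized in a small sketch. Everything here is illustrative rather than part of Concrete-core: the `BootstrapKind` enum, the function names, and the fixed crossover of 10 are assumptions (the text only gives "about 10, depending on the chosen cryptographic parameters"), and the number of blocks per input in the low-latency case is a made-up parameter.

```rust
/// Hypothetical tag for the two bootstrap implementations described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BootstrapKind {
    LowLatency,
    Amortized,
}

/// Assumed crossover point; the real one depends on the cryptographic parameters.
pub const ASSUMED_CROSSOVER: usize = 10;

/// Pick an implementation from the number of bootstraps launched at once.
pub fn choose_bootstrap(batch_size: usize) -> BootstrapKind {
    if batch_size <= ASSUMED_CROSSOVER {
        BootstrapKind::LowLatency
    } else {
        BootstrapKind::Amortized
    }
}

/// Number of Cuda blocks the bootstrap kernel would be launched with.
/// `low_latency_blocks_per_input` is a hypothetical tuning value.
pub fn blocks_launched(
    kind: BootstrapKind,
    batch_size: usize,
    low_latency_blocks_per_input: usize,
) -> usize {
    match kind {
        // One block of threads per input LWE ciphertext.
        BootstrapKind::Amortized => batch_size,
        // Several blocks cooperate on each input ciphertext.
        BootstrapKind::LowLatency => batch_size * low_latency_blocks_per_input,
    }
}
```

The block counts make the design choice visible: spreading one ciphertext over several blocks keeps the GPU busy for small batches, while one block per ciphertext scales to large batches.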

Below is a comparison between the nuFHE implementation, the Amortized Bootstrap, and the Low Latency Bootstrap. The parameter set is fixed so as to be supported by nuFHE: 32-bit integers are used along with an LWE dimension of 500, a polynomial size of 1024, a GLWE dimension of 1, two levels of decomposition, and a base logarithm of 10 for the decomposition. nuFHE exposes two PBS implementations: one relying on an NTT (in yellow) and another relying on an FFT (in green). The time it takes to execute one bootstrap is compared for batches ranging from 1 to 10,000 bootstraps launched at once.

The Amortized Bootstrap of Concrete-core is plotted in blue and the Low Latency one in red (the latter can only launch a restricted number of PBS at once, which is why there are only two points on the curve). The figure above shows that for small numbers of bootstraps launched at once, the Low Latency implementation of Concrete-core performs best. On the other hand, when launching large numbers of bootstraps at once, the nuFHE implementations and the Amortized Bootstrap perform similarly. For intermediate numbers of bootstraps, the nuFHE implementation relying on the FFT performs best, though it only supports a very limited set of cryptographic parameters. These benchmark results were obtained on an Nvidia Tesla V100-SXM2-16GB GPU. For reference, one bootstrap using the same parameters on a CPU takes 27 ms (11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz).
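
For reference, the parameter set used in the benchmark above can be written down as a small Rust structure. The struct and field names are hypothetical and do not mirror Concrete-core's types. As a worked example, the sketch also counts the coefficients of the bootstrap key in the standard (non-Fourier) domain, using the usual TFHE layout of one GGSW ciphertext per LWE key bit; the representation actually held in GPU memory may differ in size.

```rust
/// Hypothetical container for the benchmark's cryptographic parameters.
pub struct BenchmarkParams {
    pub lwe_dimension: usize,
    pub polynomial_size: usize,
    pub glwe_dimension: usize,
    pub decomposition_levels: usize,
    pub decomposition_base_log: usize,
    pub integer_bits: usize,
}

/// Values quoted in the text for the nuFHE-compatible benchmark.
pub const NUFHE_COMPATIBLE: BenchmarkParams = BenchmarkParams {
    lwe_dimension: 500,
    polynomial_size: 1024,
    glwe_dimension: 1,
    decomposition_levels: 2,
    decomposition_base_log: 10,
    integer_bits: 32,
};

impl BenchmarkParams {
    /// Coefficients in the standard-domain bootstrap key:
    /// n GGSW ciphertexts, each made of levels * (k + 1) GLWE rows,
    /// each row holding (k + 1) polynomials of N coefficients.
    pub fn bootstrap_key_coefficients(&self) -> usize {
        let glwe_size = self.glwe_dimension + 1;
        self.lwe_dimension
            * self.decomposition_levels
            * glwe_size
            * glwe_size
            * self.polynomial_size
    }

    /// Same quantity in bytes, given the integer width.
    pub fn bootstrap_key_bytes(&self) -> usize {
        self.bootstrap_key_coefficients() * (self.integer_bits / 8)
    }
}
```

For this parameter set the count is 500 × 2 × 2 × 2 × 1024 = 4,096,000 coefficients, or 16,384,000 bytes (about 15.6 MiB) with 32-bit integers.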

The benchmarking results shown above only relate to one set of cryptographic parameters, so bear in mind that performance varies depending on the parameters chosen. To make this easier for users, we have introduced cost and noise models for the Cuda-accelerated operations. Those are being integrated into Zama’s Optimizer and Compiler, which take care of choosing the best parameters and hardware for the user.

Check out our tutorial to see how to start using the Cuda backend.

Summing Up

With this release, we’re getting very close to the final V1 release. We hope this V1.0.0-gamma version will give you the opportunity to try out your applications with Cuda acceleration. Here are the links to both our user documentation and the Rust documentation to get you started.

We’re excited to see what you build!

Additional Links

- Release notes

- GitHub repo

- Documentation

- List of contributors
