Announcing HPU on FPGA: The First Open-source Hardware Accelerator for FHE
This month, Zama’s hardware team released the first fully open-source hardware accelerator for FHE, along with TFHE-rs v1.2, which includes a new backend supporting this Homomorphic Processing Unit (HPU) running on a Field-Programmable Gate Array (FPGA).
After more than two years of development, our team at Zama is excited to share this milestone and to gather feedback from both application developers and the FHE hardware acceleration community.
The complete SystemVerilog implementation of the HPU is available in this repository.
What is HPU?
HPU is a processor designed to compute directly on encrypted data. Unlike hardware that simply accelerates cryptographic primitives like bootstrapping or re-linearization, the HPU is a complete on-chip processor. It includes a register file, control logic, and an arithmetic unit that executes a stream of instructions over encrypted operands.
Running today on an FPGA, the HPU contains all the logic needed to execute Boolean and integer operations on encrypted operands of various sizes.

The HPU is driven via the new tfhe-hpu-backend, available in TFHE-rs v1.2. This backend initializes the HPU’s keys, LUTs, and Digit Operation (DOp) firmware, i.e. the microcode associated with each input instruction, called an Integer Operation (IOp).
For example, when an application needs to perform an encrypted addition [.c-inline-code]a + b[.c-inline-code], where [.c-inline-code]a[.c-inline-code] and [.c-inline-code]b[.c-inline-code] are [.c-inline-code]FheUint[.c-inline-code] values, both operands are sent to the HPU’s large on-board memory along with the [.c-inline-code]ADD[.c-inline-code] IOp command. The embedded controller then translates the [.c-inline-code]ADD[.c-inline-code] IOp into a series of DOp instructions, each of which is handled by one of the HPU’s processing elements.
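On the application side, this entire flow is triggered by ordinary TFHE-rs code. Here is a minimal sketch using the standard high-level API; the key setup shown is the generic one, and HPU-specific device initialization is covered in the next section:

```rust
use tfhe::prelude::*;
use tfhe::{generate_keys, set_server_key, ConfigBuilder, FheUint64};

fn main() {
    // Generic TFHE-rs key setup; with the HPU backend enabled, the server key
    // and ciphertexts are transferred to the HPU's on-board memory.
    let config = ConfigBuilder::default().build();
    let (client_key, server_key) = generate_keys(config);
    set_server_key(server_key);

    // Encrypt the two operands.
    let a = FheUint64::encrypt(17u64, &client_key);
    let b = FheUint64::encrypt(25u64, &client_key);

    // This `+` is what is issued to the HPU as an ADD IOp; the embedded
    // controller expands it into a sequence of DOps executed on chip.
    let c = &a + &b;

    let result: u64 = c.decrypt(&client_key);
    assert_eq!(result, 42);
}
```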
The HPU is not limited to running additions and multiplications. It has been designed as a versatile FHE processor, so the HPU IOp instruction set is fully customizable: developers can not only modify existing operations, but also create new IOps and write the corresponding DOp firmware to execute them on chip.
The HPU includes all the processing elements necessary to run your TFHE operations on chip, and it interacts with the host memory and CPU only when the running application requires it. For those familiar with the TFHE cryptographic scheme, it includes a module implementing the complete Programmable Bootstrapping (PBS), a core requirement for fast and scalable FHE execution. To learn more about the TFHE scheme and why the PBS module is a must-have for this accelerator, check out our blog post series TFHE deep dive.
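Informally, a programmable bootstrapping takes a ciphertext of a message m and a lookup table encoding a function f, and returns a ciphertext of f(m) whose noise has been refreshed:

```latex
\mathrm{PBS}\colon \big(\mathrm{LWE}_s(m),\ f\big) \;\longmapsto\; \mathrm{LWE}_s\big(f(m)\big) \quad \text{with fresh, reduced noise}
```

This noise refresh is what allows arbitrarily long encrypted computations, which is why the PBS module is central to the accelerator’s design.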
How to use HPU?
Zama's HPU is integrated into the TFHE-rs library. When the HPU backend is enabled, the HPU receives its encrypted operands and instructions from TFHE-rs.
Using the HPU is easy: it works with the same high-level API that TFHE-rs uses to run encrypted operations on CPU or GPU. Simply select an HPU device, and all operations on your ciphertexts will be executed on the HPU.
Comprehensive documentation is available here.
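As an illustration, device selection might look like the sketch below. The HPU-specific calls ([.c-inline-code]HpuDevice::from_config[.c-inline-code], [.c-inline-code]Config::from_hpu_device[.c-inline-code], and passing the device together with the server key to [.c-inline-code]set_server_key[.c-inline-code]) are our reading of the backend documentation rather than a verbatim API, so refer to the documentation linked above for the exact setup:

```rust
use tfhe::prelude::*;
use tfhe::{set_server_key, ClientKey, CompressedServerKey, FheUint32};
// NOTE: the HPU-specific items below are placeholders sketched from the
// tfhe-hpu-backend documentation and may differ in the released crate.
use tfhe::tfhe_hpu_backend::prelude::*;

fn main() {
    // Open the HPU device from its configuration file (example path).
    let hpu_device = HpuDevice::from_config("hpu_config.toml");

    // Build a TFHE-rs configuration matching the HPU parameter set.
    let config = tfhe::Config::from_hpu_device(&hpu_device);

    let client_key = ClientKey::generate(config);
    let compressed_server_key = CompressedServerKey::new(&client_key);

    // Hand both the device and the server key to the backend; from this point
    // on, every operation on ciphertexts is dispatched to the HPU.
    set_server_key((hpu_device, compressed_server_key));

    let x = FheUint32::encrypt(3u32, &client_key);
    let y = FheUint32::encrypt(4u32, &client_key);
    let product: u32 = (&x * &y).decrypt(&client_key); // executed on the HPU
    assert_eq!(product, 12);
}
```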
Let’s talk about performance
How fast is the HPU? Well, it’s pretty fast! Today, it runs at 350 MHz on the 7 nm FPGA of the AMD/Xilinx V80 board, which is equipped with 2× HBM2e. It can process around 13,000 PBS/s while consuming around 200 W. Some FHE operations are still being optimized, but you can already find the first benchmark results we published in the documentation.
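As a rough back-of-the-envelope figure derived from those two numbers, the energy cost per bootstrapping is on the order of:

```latex
\frac{200\ \text{W}}{13\,000\ \text{PBS/s}} \approx 15\ \text{mJ per PBS}
```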
Why use HPU? And when?
The HPU is a machine designed specifically for FHE. For the same task, it is faster and more power-efficient than a CPU or GPU, and it has a lower cost per unit thanks to the affordability and energy efficiency of FPGAs compared to both CPUs and GPUs.
While Zama’s HPU on FPGA is not fully production-ready yet, Zama’s teams are actively working on improving it. We aim to integrate it into the Zama product line before the end of 2025. In the meantime, we are working on multi-HPU servers that will further reduce the latency of FHE operations and increase the number of operations per dollar and per hour.
The cool part is that the entire HPU stack is fully open source, from the SystemVerilog hardware design and firmware to the TFHE-rs integration backend. Developers and researchers can not only use the HPU accelerator, but also audit, extend, and repurpose it for their own prototypes. If you’re eager to test it out now, it’s simple:
- Get your hands on a V80 board
- Load the provided HPU bitstream
- Or compile your own version from: zama-ai/hpu_fpga
We’re excited to see what the community builds with it.
Additional links
- Star Zama's HPU FPGA GitHub repository to endorse our work.
- Review Zama's TFHE-rs documentation.
- Get support on our community channels.
- Participate in the Zama Bounty Program to get rewards in cash.