Concrete ML v1.9: TFHE-rs Compatibility and Faster LLM Fine-tuning

April 10, 2025
Andrei Stoian

Concrete ML v1.9 introduces support for the TFHE-rs ciphertext format, enabling seamless integration of Concrete ML models into Rust-based FHE pipelines using TFHE-rs. This release also brings performance improvements to the LoRA LLM fine-tuning protocol, along with new example notebooks demonstrating its use.

In parallel, Zama is launching Concrete ML Extensions, a new client SDK designed for building FHE-enabled browser and mobile applications. This SDK highlights the potential of fully homomorphic encryption to let mobile users securely process their sensitive data—without ever exposing it in the clear.

In this blog post, we’ll dive into the key features of this release and explore what’s possible with the latest advancements in Concrete ML.

TFHE-rs ciphertext format support

Concrete ML now supports the TFHE-rs radix ciphertext format, making encrypted ML workflows compatible with the Rust ecosystem. TFHE-rs uses a universal parameter set with backward compatibility, meaning that ciphertexts encrypted with these parameters today remain compatible in the future.
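A radix ciphertext encodes a large integer as several small encrypted blocks, each holding a few bits of the value. The decomposition itself can be sketched in cleartext Python; the 2-bit block size below is an illustrative assumption, not the only configuration TFHE-rs supports:

```python
def radix_decompose(value: int, num_blocks: int, bits_per_block: int = 2) -> list[int]:
    """Split an integer into little-endian digits, the way a TFHE-rs radix
    ciphertext splits a large plaintext into small encrypted blocks."""
    base = 1 << bits_per_block
    digits = []
    for _ in range(num_blocks):
        digits.append(value % base)
        value //= base
    return digits

def radix_recompose(digits: list[int], bits_per_block: int = 2) -> int:
    """Inverse operation: rebuild the integer from its digit blocks."""
    return sum(d << (i * bits_per_block) for i, d in enumerate(digits))

# A 16-bit value represented as 8 blocks of 2 bits each
digits = radix_decompose(0xBEEF, num_blocks=8)
assert radix_recompose(digits) == 0xBEEF
```

Homomorphic operations in TFHE-rs act on these blocks, with carry propagation handled internally, which is why a fixed, universal parameter set can cover many integer widths.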

With this new compatibility in Concrete ML v1.9, you can now use these ciphertexts as inputs and outputs of ML models. The following snippet shows how to compile models to use TFHE-rs ciphertexts.

# Compile so encrypted inputs/outputs use the TFHE-rs radix ciphertext format
model.compile(x, ciphertext_format=CiphertextFormat.TFHE_RS)
# Run inference under FHE on the test data
y_pred_tfhers = model.predict(fhe_test_data, fhe="execute")

You can also use TFHE-rs ciphertexts with the client/server API. A new use-case demonstrates TFHE-rs post-processing on the logits output by a decision tree classifier. Note that using the TFHE-rs ciphertext format requires a conversion layer in the ML model, which may introduce a 4–5x latency overhead.
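In that use-case, the post-processing step runs in TFHE-rs over the encrypted logits. As a cleartext reference, and assuming the post-processing is a class selection by argmax (the actual notebook may do more), the operation looks like this:

```python
def argmax_postprocess(logits: list[float]) -> int:
    """Cleartext reference for selecting the predicted class from logits.
    In the use-case, the equivalent computation runs over encrypted values
    in TFHE-rs; argmax is assumed here for illustration."""
    best_index, best_value = 0, logits[0]
    for i, v in enumerate(logits[1:], start=1):
        if v > best_value:
            best_index, best_value = i, v
    return best_index

# Class 1 has the highest logit
assert argmax_postprocess([0.1, 2.5, -0.3]) == 1
```

Keeping the logits encrypted end to end means the server never learns the prediction—only the client, after decryption, does.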

New use-case: Fine-tuned LLAMA for math problem solving

Concrete ML v1.9 brings a new example of a fully functional encrypted fine-tuning pipeline on GPU. In the example notebook, the LLAMA 1B model is fine-tuned with LoRA on a math problem dataset, entirely under FHE and accelerated on GPU.
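LoRA keeps the base weight matrix W frozen and trains only a low-rank update B·A, which is what makes fine-tuning tractable: only a small fraction of the parameters change. A minimal cleartext sketch of a LoRA linear layer, with illustrative shapes and NumPy standing in for the real training framework:

```python
import numpy as np

class LoRALinear:
    """Minimal LoRA layer sketch: frozen weight W plus a trainable
    low-rank update scaled by alpha / r (shapes follow the LoRA paper)."""
    def __init__(self, d_out: int, d_in: int, r: int = 8, alpha: int = 16, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_out, d_in))   # frozen base weight
        self.A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection
        self.B = np.zeros((d_out, r))                 # trainable, zero-init so the update starts at 0
        self.scale = alpha / r

    def forward(self, x: np.ndarray) -> np.ndarray:
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

layer = LoRALinear(d_out=4, d_in=3)
x = np.ones(3)
# With B zero-initialized, the LoRA output equals the frozen base output
assert np.allclose(layer.forward(x), layer.W @ x)
```

The trainable parameter count is r·(d_in + d_out) per adapted layer instead of d_in·d_out, which keeps the encrypted training workload small.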

We compare model quality using perplexity scores between encrypted and cleartext training runs. With this release's performance optimizations, the FHE fine-tuning pipeline now reaches up to 64 tokens/second on a desktop GPU.
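Perplexity is the exponential of the average negative log-likelihood per token, so closely matching scores between the encrypted and cleartext runs indicate the FHE pipeline preserves model quality. A minimal implementation:

```python
import math

def perplexity(token_log_probs: list[float]) -> float:
    """Perplexity = exp of the mean negative log-likelihood per token.
    Lower is better; a uniform model over V tokens scores exactly V."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# A model assigning probability 0.25 to every token has perplexity 4
assert abs(perplexity([math.log(0.25)] * 10) - 4.0) < 1e-9
```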

Here’s how fine-tuning improves the model's reasoning.

Before fine-tuning, the original LLAMA model produces the following output for a simple math problem:

Prompt: When you multiply a number by 7, it becomes 98. What is that number?
Response: If you multiply a number by 7, it becomes 98. So, the number you're asking about is 98.

After fine-tuning, the model solves the problem correctly:

Prompt: When you multiply a number by 7, it becomes 98. What is that number?
Response: To find the number, you need to divide 98 by 7. 98 ÷ 7 = 14

Training on the full dataset takes 28 hours across 50 desktop GPUs.
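As a back-of-envelope check of these figures—assuming the 64 tokens/second throughput is per GPU and sustained over the whole run, an assumption since the post does not break this down:

```python
# Rough estimate of total tokens processed during the full training run,
# from the figures quoted above (64 tok/s assumed per GPU and sustained)
tokens_per_second = 64
num_gpus = 50
hours = 28

total_tokens = tokens_per_second * num_gpus * hours * 3600
print(f"~{total_tokens / 1e6:.0f}M tokens processed")  # ~323M tokens processed
```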

Mobile client SDK: Concrete ML Extensions

Today, mobile phones store sensitive information for billions of people, making privacy and security more important than ever. At the same time, the rise of AI has shown how personal data can unlock powerful services—from healthcare insights and genetic analysis to personalized recommendations and targeted ads. It’s a double-edged sword: the same data that can benefit us also puts our privacy at risk. By integrating FHE into mobile apps, we can enable personalized features while keeping user data completely private and secure.

Concrete ML v1.9 introduces Concrete ML Extensions, a new SDK designed for building FHE-enabled client-side apps. Developers can compile this SDK to Swift, enabling iOS applications to perform encryption, decryption, and key generation natively.

A step-by-step tutorial is available to guide you through compiling the Swift library and integrating it into your iOS apps. In the coming weeks, we’ll also be releasing a series of demo iOS applications—stay tuned!
