Concrete ML v1.1.0: Faster inference and a first demo of FHE LLMs

July 25, 2023
Andrei Stoian

Concrete ML 1.1.0 introduces optimization tools that can accelerate the Fully Homomorphic Encryption (FHE) inference time of neural network models by up to a factor of 20.

This release also provides an FHE implementation of a large language model (LLM), which protects both the user's privacy and the model owner's intellectual property (IP).

This new version also offers enhanced support for built-in neural networks and classical models. Additional resources have been included to guide users through the process of optimizing and deploying FHE-based machine learning models.

Reducing latency for neural networks

Concrete ML compiles quantized neural networks to FHE computations. The quantization step involves reducing the precision of the intermediary values obtained during the model inference step.
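As a rough illustration of the quantization step, the sketch below maps floating-point values to n-bit integers using uniform quantization. The function names and the min/max calibration are assumptions made for this example; it runs on cleartext values and is not Concrete ML's actual quantizer.

```python
def quantize(values, n_bits):
    """Map floats to integers in [0, 2**n_bits - 1] (uniform quantization).

    Hypothetical helper for illustration only; calibration uses the
    min and max of the given values.
    """
    lo, hi = min(values), max(values)
    levels = 2 ** n_bits - 1
    # Guard against a zero range so the scale is never 0.
    scale = (hi - lo) / levels if hi > lo else 1.0
    quantized = [round((v - lo) / scale) for v in values]
    return quantized, scale, lo


def dequantize(quantized, scale, zero):
    """Recover approximate float values from the quantized integers."""
    return [q * scale + zero for q in quantized]


# Example: intermediate values of a layer, reduced to 4 bits of precision.
activations = [-1.0, -0.25, 0.0, 0.5, 1.0]
q4, scale, zero = quantize(activations, n_bits=4)
approx = dequantize(q4, scale, zero)

# The reconstruction error of each value is at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(activations, approx))
```

Lower bit widths give a coarser grid and a larger reconstruction error, which is the accuracy cost paid for faster FHE computation.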

In some applications, quantization may reduce this precision to as little as 4 bits. Zama’s FHE libraries offer a specialized cryptographic primitive for this step, which efficiently rounds off the least significant bits of an encrypted integer.
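To make the effect of rounding off least significant bits concrete, here is a minimal cleartext sketch in Python. The helper name `round_lsbs` is hypothetical, and the code only mimics the arithmetic on ordinary integers; the FHE primitive performs the equivalent operation on encrypted values.

```python
def round_lsbs(value, lsbs_to_remove):
    """Round a non-negative integer to the nearest multiple of 2**lsbs_to_remove.

    Hypothetical cleartext helper: ties round up, and the cleared
    low bits are left as zeros.
    """
    if lsbs_to_remove == 0:
        return value
    half = 1 << (lsbs_to_remove - 1)
    return ((value + half) >> lsbs_to_remove) << lsbs_to_remove


# An 8-bit accumulator value, keeping only its 4 most significant bits.
accumulator = 0b10110110  # 182
rounded = round_lsbs(accumulator, lsbs_to_remove=4)

# Only the top 4 bits carry information after rounding.
kept_bits = rounded >> 4
```

Note that rounding up can carry into one extra bit (for example, 255 rounds to 256 when 4 bits are removed), so the bit width after rounding needs to account for that case.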

In this new version of Concrete ML, rounding is accelerated in FHE, and, when coupled with other optimization tools, it achieves a 20x speed-up for a VGG-like convolutional neural network.

Secure large language models with Concrete ML

A new use case demonstrates how to introduce encrypted layers in large language models (LLMs). The model is loaded with the Hugging Face transformers library, then modified to add quantization and to make it compatible with Concrete Python, which compiles it to FHE.

This work gives you a chance to experiment with large language models and see how FHE can be applied to them.

More resources to get started with Concrete ML

Putting FHE ML models in production is now easier than ever. This blog post on the topic shows you how to deploy to client-server settings using model serialization.

A first video tutorial has also been released, showing how easily Concrete ML converts scikit-learn models to FHE.

Going from model development to deployment is now a matter of hours, as shown in the new Health Diagnosis notebook and its associated live demo on Hugging Face.

Finally, a new example on credit scoring shows how FHE could be used to process sensitive financial data.
