Concrete ML v1.2.0: Hybrid Deployment and Inference Speed Improvements

October 17, 2023
Andrei Stoian

This new version of Concrete ML adds support for hybrid deployment and K-nearest neighbor classification. Hybrid deployment with Fully Homomorphic Encryption (FHE) improves on-premise deployment by moving parts of the model to remote FHE computation, which protects model intellectual property (IP), ensures license compliance, and facilitates usage monitoring. Concrete ML v1.2 also improves the built-in neural networks, making them 10x faster out of the box.

On-premise hybrid deployment

Large Language Models (LLMs) can bring large productivity gains when applied to the confidential data that companies store in their knowledge bases. Many companies forbid their employees from using cloud-based LLMs, since doing so may leak such confidential data. On the other hand, developers of proprietary LLMs want to ensure their model IP is protected. LLMs are the result of extensive training and optimization, so the weights and biases of an LLM are intrinsic to its value, performance, and identity. Protecting them safeguards the intellectual, ethical, and economic interests embedded in the model.

Concrete ML introduces hybrid on-premise deployment for both LLMs and regular Convolutional Neural Networks (CNNs), which allows the model to be deployed partly on-premise and partly in the cloud with FHE. Such a configuration gives all stakeholders the best of both worlds: the client's data stays confidential and the model IP stays protected. This use case example shows the scenario in action: some linear layers in the LLM are executed with FHE on the server side, and the client never obtains those weights. Because the client makes a request for each generated token, billing is easy and license compliance is simple to monitor.
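The split described above can be sketched in a few lines of plain Python. This is only an illustration of the data flow, not the Concrete ML API: the `RemoteLinear` class, the `client_forward` function, and the toy weights are all hypothetical, and the "remote" layer computes in the clear where a real deployment would receive encrypted inputs and compute with FHE.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(weights: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product, one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x: Vector) -> Vector:
    """Element-wise ReLU activation, run on the client."""
    return [max(0.0, v) for v in x]


class RemoteLinear:
    """Hypothetical stand-in for a server-side linear layer.

    The weights stay inside this object and are never returned to the caller.
    Assumption: a real hybrid deployment would receive an encrypted input and
    compute this product with FHE; here it is computed in the clear.
    """

    def __init__(self, weights: Matrix) -> None:
        self._weights = weights
        self.calls = 0  # one request per call, which could drive billing

    def __call__(self, x: Vector) -> Vector:
        self.calls += 1
        return matvec(self._weights, x)


def client_forward(
    x: Vector, local_weights: Matrix, remote: Callable[[Vector], Vector]
) -> Vector:
    """Run the local layer and activation, then delegate to the remote layer."""
    hidden = relu(matvec(local_weights, x))
    return remote(hidden)


server_layer = RemoteLinear([[1.0, -1.0], [0.5, 0.5]])
local_weights = [[1.0, 0.0], [0.0, -1.0]]
output = client_forward([2.0, 3.0], local_weights, server_layer)
```

The request counter on the server-side object mirrors the point about per-token requests: every forward pass through the protected layer is visible to the model owner.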

K-nearest neighbor (KNN) classification

KNN is a simple non-parametric machine learning model that proves very useful in many applications. The same underlying algorithm can also perform similarity search, by applying a threshold to the distances instead of voting on class labels. By using Programmable Bootstrapping (PBS), TFHE can support top-k selection on encrypted distances. Thus, the full KNN algorithm is performed on encrypted data, and the model training data is not exposed to the risk of leakage. See this notebook for a demo of the KNN classifier.
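For readers unfamiliar with the algorithm, here is a minimal cleartext sketch of the two steps mentioned above: distance computation followed by top-k selection, plus the thresholded variant used for similarity search. It operates on unencrypted values and does not use Concrete ML; the function names and the toy dataset are illustrative assumptions.

```python
from collections import Counter
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[int],
    query: Sequence[float],
    k: int,
) -> int:
    """Classify `query` by majority vote among its k nearest training points."""
    distances = [
        (squared_distance(point, query), label)
        for point, label in zip(train_x, train_y)
    ]
    # Top-k selection: in the encrypted setting this is the step that
    # would have to run on encrypted distances.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def similarity_search(
    train_x: Sequence[Sequence[float]],
    query: Sequence[float],
    threshold: float,
) -> List[int]:
    """Return indices of training points within `threshold` squared distance."""
    return [
        i for i, point in enumerate(train_x)
        if squared_distance(point, query) <= threshold
    ]


train_x = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0)]
train_y = [0, 0, 0, 1, 1]

near_origin = knn_predict(train_x, train_y, (0.5, 0.5), k=3)
near_cluster = knn_predict(train_x, train_y, (5.0, 6.0), k=3)
matches = similarity_search(train_x, (0.0, 0.0), threshold=1.0)
```

Note that the training points themselves are part of the model, which is why running the computation on encrypted data matters for protecting the training set.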

Optimized neural networks

Right bit-shift has been implemented in Concrete with a low-level cryptographic primitive. It divides encrypted input values by power-of-two scalars at a much smaller cost than a full high-precision programmable bootstrap (PBS). In addition, quantization-aware training for neural networks can ensure that the quantization operation can be computed with such a division. Concrete ML now combines these two features and applies them under the hood to the built-in neural networks. Results show that this optimization reduces inference time by an order of magnitude while preserving model accuracy and simplifying the built-in neural networks.
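The arithmetic behind this optimization can be illustrated on plain integers. The sketch below assumes a rescaling factor of exactly 2^-shift and a round-half-up rounding mode; both are assumptions chosen for illustration, and the function names are hypothetical. It says nothing about how Concrete implements the shift on encrypted values.

```python
import math


def rounding_right_shift(value: int, shift: int) -> int:
    """Divide an integer by 2**shift, rounding halves up, using only a shift.

    Adding half of the divisor before shifting turns the floor performed by
    `>>` into round-half-up. Python's `>>` floors for negative values too.
    """
    if shift == 0:
        return value
    return (value + (1 << (shift - 1))) >> shift


def requantize_float(value: int, scale: float) -> int:
    """Reference requantization with a general floating-point scale."""
    return math.floor(value * scale + 0.5)


# When the scale is a power of two, both paths agree on every input,
# so the floating-point rescaling can be replaced by the cheaper shift.
accumulator = 100
shifted = rounding_right_shift(accumulator, 3)
reference = requantize_float(accumulator, 2.0 ** -3)
```

With a general scale, the rescaling would need a full table lookup; constraining the scale to a power of two during training is what makes the shift-based path available.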

New development pipeline

To speed up bug fixes and releases, Concrete ML development has moved to the public repository. Developers can thus work with the latest code, which includes bug fixes requested through the community forums.
