Today, we are announcing the release of Concrete ML as a public alpha. The package is built on top of Concrete Numpy.
The goal is to allow data scientists without any prior knowledge of cryptography to automatically turn classical machine learning (ML) models into their FHE equivalent. This release provides APIs which are as close as possible to what data scientists are already using.
Easy-to-use APIs to compile ML models to FHE
A major goal of this release is to make adopting Concrete ML as simple as possible for users of popular machine learning frameworks. Here's an example with a very basic linear model in scikit-learn.
In scikit-learn, with X_train, y_train, and X_test denoting the usual training and test arrays, it would be something like this:
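```python
# Plain scikit-learn: train a linear regression and predict on the test set
from sklearn.linear_model import LinearRegression

linreg = LinearRegression()
linreg.fit(X_train, y_train)
y_pred = linreg.predict(X_test)
```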
The equivalent in Concrete ML, with the same model but operating on encrypted data, looks something like this (a minimal sketch of the alpha API):
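```python
# Concrete ML: same workflow, with quantization and compilation to FHE
from concrete.ml.sklearn import LinearRegression  # quantized drop-in model

q_linreg = LinearRegression(n_bits=2)  # n_bits: number of bits of quantization
q_linreg.fit(X_train, y_train)

q_linreg.compile(calib_data)  # calib_data: representative unlabeled dataset

# Prediction now runs on encrypted data (flag name illustrative)
y_pred = q_linreg.predict(X_test, execute_in_fhe=True)
```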
where q_linreg stands for quantized linear regression, calib_data is a representative unlabeled dataset, and n_bits is the number of bits of quantization. compile() is the function which turns the model into its FHE equivalent.
We emphasize that, for linear models and trees, we don't reimplement model training in Concrete ML, so you can use any variant of these models that scikit-learn supports. This allows you to enjoy all the features of scikit-learn, and to use pipelines or grid search on Concrete ML models, as shown in this tutorial. Once the models are trained, they can be compiled to FHE, whatever training settings were passed to scikit-learn.
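Because Concrete ML models expose the scikit-learn estimator interface, a grid search can be run on them directly. Here is a minimal sketch; the DecisionTreeClassifier import path and its n_bits parameter are illustrative, following the linear example above:

```python
from sklearn.model_selection import GridSearchCV
from concrete.ml.sklearn import DecisionTreeClassifier  # illustrative import path

# Standard scikit-learn grid search over a Concrete ML model
param_grid = {"max_depth": [2, 4, 6]}
grid = GridSearchCV(DecisionTreeClassifier(n_bits=6), param_grid, cv=3)
grid.fit(X_train, y_train)

# Compile the best estimator found by the search to its FHE equivalent
grid.best_estimator_.compile(X_train)
```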
Overview of our FHE ML classifiers
In the following illustrations, we compare models between scikit-learn and Concrete ML. You can reproduce these experiments with the tutorial from the documentation. This tutorial trains classifiers on three datasets and generates graphs of the decision boundaries, while also showing the accuracy obtained on each test set. The accuracies of the Concrete ML classifiers, shown as percentages on each plot, are measured on encrypted data (i.e., in FHE), while, to reduce the execution time of the notebook, the red/blue decision function contours are computed without FHE (through a simulator called the Virtual Lib). The first dataset is make_moons, the second one is make_circles, and the third is a simple, almost linearly separable dataset. Let's look at the results of this tutorial in more detail.
Linear models
For these simple 2D datasets, linear models achieve good accuracy in FHE, similar to that of their unencrypted scikit-learn counterparts. However, in the current release, the accuracy of these heavily quantized classifiers degrades rapidly as the number of dimensions increases. This will be improved in future releases.
Decision-tree models
Tree-based classifiers in Concrete ML achieve excellent accuracy on encrypted data. Although tree models require comparisons, which are non-linear operations on encrypted data, Zama's unique take on FHE, Programmable Bootstrapping, handles them with ease. Performance in FHE for tree-based models is therefore as good as that of their scikit-learn/XGBoost counterparts. This holds even on datasets with a high number of dimensions and, in general, tree-based models are the most performant ones when it comes to tabular data. Once deployment APIs are integrated, you will be able to put our Decision Trees, Random Forests, and Gradient Boosted Trees into production.
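For example, an FHE-ready gradient boosted model follows the same fit/compile/predict pattern as the linear example above (the XGBClassifier import path and parameters below are illustrative):

```python
from concrete.ml.sklearn import XGBClassifier  # illustrative import path

clf = XGBClassifier(n_bits=6, n_estimators=50, max_depth=4)
clf.fit(X_train, y_train)

clf.compile(X_train)  # compile the trained model to its FHE equivalent
y_pred = clf.predict(X_test, execute_in_fhe=True)  # inference on encrypted data
```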
If you’re interested in going further with this, have a look at this tutorial which uses a decision tree to classify a spam dataset or at this tutorial which uses XGBoost on a diabetes prediction task.
Neural network models
Finally, neural networks are also available in Concrete ML. Since the multiple layers of these classifiers increase the number of computations, the effect of precision loss is compounded, and consequently these classifiers currently perform poorly in FHE. Later versions of the package will improve these results, notably with quantization-aware training options and the additional precision available in upcoming versions of the Concrete Framework.
What about deep learning?
We have also made efforts to support generic, user-provided torch models. For this use case, tutorials are available for a Fully Connected Neural Network and for a Convolutional Neural Network. Let's look at some usage examples.
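Here is a minimal sketch for the post-training quantization path (the compile_torch_model import path and its arguments are illustrative, matching the description below):

```python
from concrete.ml.torch.compile import compile_torch_model  # illustrative import path

# net: a torch.nn.Module already trained with any standard torch pipeline
# X_train: calibration data used for post-training quantization
quantized_module = compile_torch_model(net, X_train, n_bits=2)
```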
Here, the torch network net is assumed to be trained by the user through any kind of torch training pipeline. The compile_torch_model() function quantizes the network post-training, using the X_train dataset for calibration, and then compiles the network to FHE.
While the accuracy of networks quantized post-training degrades rapidly as network complexity increases (currently, at most 2-3 neurons are supported), our efforts for now are focused on feature completeness. To this end, Concrete ML already supports a wide array of operators in torch networks through an ONNX conversion pipeline. We are also working on making bigger networks perform well under FHE constraints.
Concrete ML offers an appealing solution for private computation over ML models. Tree-based classifiers are highly performant and especially well suited to FHE thanks to Zama's Programmable Bootstrapping, and Concrete ML makes them easy to use. We are also working to bring linear models and neural networks to this standard in the near future.