How to Deploy a Machine Learning Model With Concrete ML

May 30, 2023
Luis Montero

Concrete ML is a set of tools for Privacy-Preserving Machine Learning that aims to simplify the use of Fully Homomorphic Encryption (FHE) for developers, so they can automatically turn machine learning models into their homomorphic equivalents.


Concrete ML v1.0.0 introduced several new features, such as improved performance and better model development assistance. Let’s look at how to use Concrete ML v1.0.0 to deploy machine learning models. The scripts in this blog post are illustrative of the deployment tools that you can build.

To start, take a look at the code examples for a simple Concrete ML model that performs breast cancer classification. Keep in mind that some of these scripts are not part of the Concrete ML PyPI package. They are based on Boto3 and deploy Concrete ML models to a FastAPI server hosted on AWS EC2.

Let's review the provided example that focuses on the confidential diagnosis of breast cancer using Fully Homomorphic Encryption (FHE).

Model serialization

So you trained a Concrete ML model and you want to serve it to your users?

First, compile the model and use the simulation feature to make sure that the predictions of your model match what you expect.

# X_test: your evaluation data
model.predict(X_test, fhe="simulate")
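
For context, here is a minimal sketch of the whole flow, assuming Concrete ML's built-in LogisticRegression; the dataset loading and split below are illustrative and not taken from the example scripts:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

from concrete.ml.sklearn import LogisticRegression

# Load the breast cancer dataset and split it (illustrative)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train as you would with scikit-learn
model = LogisticRegression()
model.fit(X_train, y_train)

# Compile the model to its FHE equivalent; the input set is used
# to calibrate quantization
model.compile(X_train)

# Simulated inference: no encryption involved, but numerically
# faithful to the encrypted computation
y_pred = model.predict(X_test, fhe="simulate")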

Once you’re happy with the accuracy, serialize your model for deployment:

from concrete.ml.deployment import FHEModelDev

dev = FHEModelDev("./path_to_model", model)
dev.save()

Replace “./path_to_model” with a directory name of your choice. You can use:

dev.save(via_mlir=True)  

for cross-platform compatibility (e.g., training your model on a Mac M1 and deploying to an Intel-based server).

In the example, you can train and save the model with train_with_docker.sh:

# From root of Concrete ML repository
cd ./use_case_examples/deployment/breast_cancer_builtin  

# Will create a dev folder with client/server.zip files
bash train_with_docker.sh

This will pull Concrete ML’s Docker image, which may take a bit of time.

After the model is trained and saved, all assets are ready for deployment to a cloud provider. 

Deployment

Deploying the server to AWS.

You can test the deployment locally on your personal machine, but in production environments, FHE works best when deployed to a cloud provider that offers a wide variety of powerful machines. Since FHE can be computationally intensive, it makes sense to deploy your model on a compute-optimized instance.

Concrete ML includes utility scripts to ease deployment to AWS. This takes the form of a simple CLI that leverages Boto3 under the hood.

# Create an AWS EC2 instance and launch the server
python -m concrete.ml.deployment.deploy_to_aws \
	--path-to-model "./dev" \
	--port 5000 \
	--instance-type "c5.4xlarge" \
	--instance-name "my_super_model" \
	--verbose 1 \
	--wait-bar 1

You can change these options: --instance-type can be any instance type available on AWS, while --instance-name is just an identifier that lets you find your instance in the AWS console.

This command line performs the following steps:

  • Creates an AWS EC2 instance with the proper permissions, a security group, SSH keys, and a public IP address
  • Waits until the instance is available through SSH (note that instance start-up can take a few seconds)
  • Copies the needed files, mainly the source for the server application and the serialized model, from your local machine to the remote instance using scp
  • Installs all needed dependencies on the server (in a tmux session)
  • Runs the server (also in a tmux session)

For more advanced users who may want to have a look at how it works under the hood, details are given in server.py.
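
Roughly, the server side boils down to Concrete ML's FHEModelServer wrapped in a FastAPI application. The sketch below is illustrative, not the actual server.py: the /compute endpoint name and the payload handling are assumptions.

from fastapi import FastAPI, File, Response, UploadFile

from concrete.ml.deployment import FHEModelServer

app = FastAPI()

# Load the serialized model (server.zip) from the deployment folder
server = FHEModelServer("./dev")
server.load()

# Hypothetical endpoint: run FHE inference on an encrypted input,
# using the client's serialized evaluation keys
@app.post("/compute")
async def compute(
    encrypted_input: UploadFile = File(...),
    evaluation_keys: UploadFile = File(...),
):
    encrypted_result = server.run(
        await encrypted_input.read(),
        await evaluation_keys.read(),
    )
    # Return the still-encrypted result as raw bytes
    return Response(content=encrypted_result)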

That’s it! You have now deployed a Concrete ML model.

In the logs of the deployment script, you'll find the URL of the FastAPI server. You will need to keep this URL in order to use it in the client code.

Creating the client.

Depending on your data or use case, you might need to develop a client application. Here are several examples of how to write client application code.

For the example discussed here on Breast Cancer Diagnosis, the client code is fairly straightforward and shows how simple an FHE client can be. To run it, build the Docker image for the client using the script provided, then launch the Docker container using the appropriate script:

# Build docker image
python build_docker_client_image.py  

# Run and attach the terminal to the docker container
bash client.sh  

# Run inside the container: this triggers the inference on the remote server
URL="" python client.py

When launching the FHE inference of your model, set the URL environment variable to the address of the FHE endpoint (the FastAPI server URL from the deployment logs).
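
Conceptually, the client generates keys, encrypts the input, sends it to the server, and decrypts the response. Below is a minimal sketch using Concrete ML's FHEModelClient and the requests library; the /compute endpoint, the payload format, and the sample selection are assumptions, and the client.py in the repository is the reference:

import os

import requests
from sklearn.datasets import load_breast_cancer

from concrete.ml.deployment import FHEModelClient

# Load the client-side assets (client.zip) and generate the keys
client = FHEModelClient("./dev", key_dir="./keys")
client.generate_private_and_evaluation_keys()
evaluation_keys = client.get_serialized_evaluation_keys()

# Quantize, encrypt, and serialize one input sample (illustrative)
X, _ = load_breast_cancer(return_X_y=True)
x_sample = X[[0]]  # one sample, shape (1, n_features)
encrypted_input = client.quantize_encrypt_serialize(x_sample)

# Send the encrypted input and the evaluation keys to the server
response = requests.post(
    f"{os.environ['URL']}/compute",
    files={
        "encrypted_input": encrypted_input,
        "evaluation_keys": evaluation_keys,
    },
)

# Decrypt and dequantize the encrypted prediction
prediction = client.deserialize_decrypt_dequantize(response.content)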

Going further.

Check out the Concrete ML documentation for more information. You'll find an example of how to use the Client/Server APIs and a section about deployment.

Keep in mind that the script that deploys to AWS makes some assumptions about the tools available on the machine used to deploy the model (such as ssh). These assumptions may not hold on some systems. A future release of Concrete ML will include the use of AWS ECR and AWS ECS, but in the meantime you can use the approach described here on most Linux systems.

If you use a different cloud provider or if you encounter an issue deploying to AWS, you can always use the provided Dockerfile and the corresponding API to build your own Docker image to serve your model.
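
For example, assuming a Dockerfile in the deployment example folder, building and running your own server image could look like this (the image name and port are placeholders):

# Build a server image from the provided Dockerfile
docker build -t my-fhe-server .

# Run it, exposing the FastAPI port used by the client
docker run -p 5000:5000 my-fhe-server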

Note that the server currently holds all public keys in memory and that, for some models, this might be an issue. To solve this, you can either modify the server to use a database to manage the keys or you can use a deployment machine with enough RAM to hold your keys in memory.
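
As an illustration, here is a minimal sketch of a disk-backed key store that keeps evaluation keys out of RAM; the KeyStore class and the idea of indexing keys by client ID are hypothetical, not part of Concrete ML:

from pathlib import Path

class KeyStore:
    """Hypothetical disk-backed store for serialized evaluation keys."""

    def __init__(self, root: str):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, client_id: str, keys: bytes) -> None:
        # Persist the keys on disk instead of keeping them in memory
        (self.root / f"{client_id}.key").write_bytes(keys)

    def get(self, client_id: str) -> bytes:
        # Load the keys only for the duration of one inference
        return (self.root / f"{client_id}.key").read_bytes()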

If you don’t like bash scripting, the documentation shows you how to perform all the steps in this tutorial without leaving Python.

Conclusion

With a few simple scripts, you can deploy Concrete ML models on AWS. These scripts, though minimalistic, illustrate the kind of deployment tools you can build, which could also feature user management, key management, and more. They can also be used for prototyping and for identifying the bottlenecks of your FHE application (key size, FHE runtime, …) in a real-world client-server setting.
