Winning the TikTok Hackathon using Zama's Concrete ML and Fully Homomorphic Encryption

October 29, 2024


Jeremiah Au, Nigel Lee, PJ Anthony, Vansh Nath

This is a guest blog post written by Jeremiah Au, Nigel Lee, PJ Anthony, Vansh Nath.

TLDR: Online advertising today depends heavily on browser cookies that track and identify user behavior, often at the expense of privacy. With Fully Homomorphic Encryption (FHE), it's possible to revolutionize this model by enabling privacy-preserving ad targeting, where user preferences are encrypted and only the user can access them. During the TikTok Hackathon, a group of students from NUS developed an ad-serving system built on Zama's Concrete ML, showcasing how FHE can power a new, privacy-respecting era for online advertising.

About the TikTok TechJam

We are a group of Computer Science students from the National University of Singapore (NUS), and this year, we had the exciting opportunity to participate in the 2024 edition of the TikTok TechJam. This global hackathon challenges students to innovate, create, and demonstrate technical excellence by solving real-world problems across various domains.

Among the various problem statements, the one focused on privacy immediately caught our attention. It introduced us to concepts we had never encountered before, including "Private Set Intersection", "Differential Privacy", and the one that intrigued us the most: "Fully Homomorphic Encryption (FHE)". Our curiosity about these technologies, combined with the growing importance of privacy in today’s digital world, motivated us to take on this challenge.

Searching for the perfect FHE library

After much brainstorming, we developed a proof of concept to demonstrate how Fully Homomorphic Encryption (FHE) can enhance privacy protection in targeted advertising. This set us on a search for libraries and frameworks that could help us turn our idea into reality. During our search, we discovered Zama’s Concrete ML on GitHub and immediately saw its potential for our project.

At first glance, Concrete ML seemed complex, with extensive documentation covering different models, configuration states, bootstrapping, and more. However, the quick-start guide and code examples were comprehensive and user-friendly. While some aspects were initially unclear, we quickly found clarity by exploring the clean and well-structured source code. The abstraction provided by Concrete ML was invaluable, allowing us to integrate FHE into our solution without needing deep expertise in the underlying cryptographic details.

About our project: AnonymousAds

In today’s era of 'freemium' software, many companies rely on personalized advertising to generate revenue, often at the cost of user privacy. As users become increasingly privacy-conscious, they remain hesitant to pay for services they’ve come to expect for free. This tension made our objective clear: to create a secure protocol that allows businesses to serve personalized ads without compromising user privacy.

To achieve this, we developed a proof of concept that uses search engine queries to predict user interest categories and serve targeted ads, while ensuring that the server never learns the user's queries or preferences.

Project and solution architecture

Pre-processing:
First, the user’s search engine queries are processed and cleaned using natural language techniques such as stemming and case-folding to prepare the data for encryption.
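As a rough illustration of this cleaning step, the sketch below case-folds a query, splits it into tokens, and strips a few common suffixes. The suffix list and function names are hypothetical, and the naive suffix stripping is only a stand-in for a real stemmer (such as a Porter stemmer); it is not taken from the project's code.

```python
import re

# Hypothetical, deliberately tiny suffix list; a real stemmer handles far more cases.
SUFFIXES = ("ing", "ed", "s")


def naive_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess_query(query):
    """Case-fold the query, keep alphanumeric tokens, and stem each token."""
    tokens = re.findall(r"[a-z0-9]+", query.casefold())
    return [naive_stem(token) for token in tokens]


cleaned = preprocess_query("Playing FOOTBALL games!")
```

Here `cleaned` holds the normalized tokens that would then be turned into the model's numeric input before encryption.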

Encryption:
Once cleaned, the data is encrypted on the client side using a Fully Homomorphic Encryption (FHE) private key, ensuring that sensitive information never leaves the user’s device in plain text. The encrypted data is then sent to the server.

Transformation:
On the server side, the encrypted data is processed through a neural network regressor model, which was trained to convert search engine usage statistics into "user interest categories" for ad targeting. Due to the properties of FHE, this transformation happens on encrypted data, meaning the server never sees the actual queries or the resulting predictions.

Result encryption and transmission:
The model’s output, which is the predicted interest categories, remains encrypted thanks to FHE and is sent back to the client. Since the server does not have access to the private key, it cannot see either the input (search queries) or the output (interest categories).
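The encryption, encrypted inference, and decryption steps map roughly onto Concrete ML's deployment classes, `FHEModelClient` and `FHEModelServer`. The sketch below assumes a compiled model has already been saved with `FHEModelDev` into `model_dir`; the function names and the way data is passed between client and server are hypothetical, and in a real deployment the serialized bytes would travel over HTTP rather than as function arguments.

```python
def client_encrypt(model_dir, key_dir, features):
    """Client side: create keys, then quantize, encrypt and serialize the input.

    `features` is expected to be a 2D NumPy array with one row per query.
    """
    from concrete.ml.deployment import FHEModelClient  # requires concrete-ml

    client = FHEModelClient(path_dir=model_dir, key_dir=key_dir)
    client.generate_private_and_evaluation_keys()
    # Evaluation keys let the server compute on ciphertexts; they cannot decrypt.
    evaluation_keys = client.get_serialized_evaluation_keys()
    encrypted_input = client.quantize_encrypt_serialize(features)
    return client, evaluation_keys, encrypted_input


def server_predict(model_dir, encrypted_input, evaluation_keys):
    """Server side: run the model on encrypted data; the output stays encrypted."""
    from concrete.ml.deployment import FHEModelServer  # requires concrete-ml

    server = FHEModelServer(path_dir=model_dir)
    server.load()
    return server.run(encrypted_input, evaluation_keys)


def client_decrypt(client, encrypted_result):
    """Client side: decrypt and de-quantize the prediction with the private key."""
    return client.deserialize_decrypt_dequantize(encrypted_result)
```

Only the client object holds the private key, so the server function handles nothing but serialized ciphertexts and evaluation keys.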

Decryption and aggregation:
Once the encrypted result reaches the client, it is decrypted using the FHE private key. As the user makes more search queries, the client refines its stored predictions using Bayesian updates, combining each new result with historical information to provide increasingly accurate recommendations.
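The post does not spell out the exact update rule, so the following is only one plausible sketch of such an aggregation: it keeps Dirichlet pseudo-counts per category and treats each decrypted probability vector as one soft observation. The function names, the uniform prior, and the example numbers are all assumptions made for illustration.

```python
def update_pseudo_counts(pseudo_counts, prediction):
    """Add one decrypted probability vector to the running Dirichlet pseudo-counts."""
    categories = set(pseudo_counts) | set(prediction)
    return {
        category: pseudo_counts.get(category, 0.0) + prediction.get(category, 0.0)
        for category in categories
    }


def posterior_mean(pseudo_counts):
    """Normalize pseudo-counts into the current estimate of interest probabilities."""
    total = sum(pseudo_counts.values())
    if total <= 0:
        raise ValueError("pseudo-counts must sum to a positive value")
    return {category: count / total for category, count in pseudo_counts.items()}


# Hypothetical uniform prior: one pseudo-observation per category.
prior = {"Sports": 1.0, "Food": 1.0, "Gaming": 1.0, "Music": 1.0, "Tv": 1.0}
# Hypothetical decrypted prediction for the latest query.
latest = {"Sports": 0.9, "Food": 0.1, "Gaming": 0.0, "Music": 0.0, "Tv": 0.0}

counts = update_pseudo_counts(prior, latest)
interests = posterior_mean(counts)
```

With this scheme a single query only nudges the estimate, and categories that keep receiving probability mass gradually dominate as more queries arrive.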

Noise injection:
To further protect user privacy, the client obfuscates the true predictions by requesting ads based on a mixture of genuine interest categories and randomly generated ones, a technique known as noise injection. This makes it much harder for the server to pinpoint the user’s true interests.

Finally, the client selects and displays the most relevant ads based on the true interest categories, ensuring a personalized experience without compromising privacy.
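A minimal sketch of the noise injection and final selection steps is shown below. It assumes the client knows the full list of categories and receives ads grouped by category; the function names, the number of decoys, and the data shapes are hypothetical rather than taken from the project.

```python
import random


def build_ad_request(true_interests, all_categories, n_decoys, rng=None):
    """Mix genuine interest categories with randomly chosen decoy categories."""
    rng = rng or random.Random()
    decoy_pool = [c for c in all_categories if c not in true_interests]
    decoys = rng.sample(decoy_pool, min(n_decoys, len(decoy_pool)))
    request = list(true_interests) + decoys
    rng.shuffle(request)  # ordering should not reveal which entries are genuine
    return request


def select_ads(ads_by_category, true_interests):
    """Client side: keep only the ads that match the genuine interests."""
    return [ad for category in true_interests for ad in ads_by_category.get(category, [])]


categories = ["Sports", "Food", "Gaming", "Music", "Tv"]
request = build_ad_request(["Sports"], categories, n_decoys=2, rng=random.Random(0))
# Hypothetical server response: one ad per requested category.
response = {category: [f"{category} ad"] for category in request}
shown = select_ads(response, ["Sports"])
```

The server sees three requested categories and cannot tell from the request alone which one is genuine; more decoys increase the ambiguity at the cost of extra bandwidth.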

Implementation and deployment

Our project was divided into three main components, each playing a distinct role in the system:

dev: This component was responsible for generating and training the Concrete ML models used in the project. Here, we focused on building and optimizing the neural network that would later be deployed for encrypted inference.
search-engine: This represented the user’s device and served as the front-end, where users interacted with the search engine. It handled the query processing, encryption, decryption, and the user interface for receiving targeted ads.
server: This component simulated the company’s server, where the encrypted data was processed using the Concrete ML model. The server performed the computations on encrypted data, ensuring that sensitive information remained private throughout the process.

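import numpy as np
from torch import nn
from sklearn.model_selection import train_test_split
from concrete.ml.sklearn import NeuralNetRegressor
from concrete.ml.deployment import FHEModelDev

# NUM_KEYWORDS, NUM_CATEGORIES, FHE_FILE_PATH, X, y and clear_fhe_dir()
# are defined elsewhere in the project.
n_inputs = NUM_KEYWORDS
n_outputs = NUM_CATEGORIES
params = {
    # Network architecture
    "module__n_layers": 3,
    "module__activation_function": nn.ReLU,
    "module__n_hidden_neurons_multiplier": 4,

    # Quantization: bit-widths for weights and activations
    "module__n_w_bits": 4,
    "module__n_a_bits": 4,

    # Training settings
    "max_epochs": 150,
    "verbose": True,
    "lr": 1e-3,
}

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

# Train the quantized network and predict in the clear (no encryption yet)
concrete_regressor = NeuralNetRegressor(**params)
concrete_regressor.fit(X_train, y_train)
y_pred = concrete_regressor.predict(X_test)

# Sum of squared errors divided by the number of test samples
print(np.sum((y_pred - y_test) ** 2) / y_pred.shape[0])

# Compile the model to an FHE circuit, then save the deployment files
concrete_regressor.compile(X_train)
dev = FHEModelDev(path_dir=FHE_FILE_PATH, model=concrete_regressor)

clear_fhe_dir()
dev.save()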

Fig 2: The parameters that were used, and the compilation of the Concrete ML model.

We ran the entire project locally, using Docker containers to keep the environment consistent and isolated. We tuned the Concrete ML model's parameters, such as the 4-bit quantization of weights and activations shown above, to balance performance against accuracy.

We hosted the web interface with Flask and used NumPy, pandas, and scikit-learn for data processing. These tools helped us manage and preprocess data efficiently before encryption.

Additionally, we implemented an extensive logging system throughout the project. This allowed us to track and demonstrate each step of our protocol, providing the judges with clear insights into what was happening behind the scenes during the encrypted data processing.

2024-07-07 22:25:56,819 - INFO - Received 200 OK response from server
2024-07-07 22:25:56,821 - INFO - Starting decryption of received vector
2024-07-07 22:25:56,834 - INFO - Cleaned and normalized predictions
2024-07-07 22:25:56,834 - INFO - Predictions received and decrypted from server:
2024-07-07 22:25:56,834 - INFO - 1. Sports    Probability: 0.9636
2024-07-07 22:25:56,835 - INFO - 2. Food      Probability: 0.0187
2024-07-07 22:25:56,835 - INFO - 3. Gaming    Probability: 0.0177
2024-07-07 22:25:56,835 - INFO - 4. Music     Probability: 0.0000
2024-07-07 22:25:56,835 - INFO - 5. Tv        Probability: 0.0000
2024-07-07 22:25:56,835 - INFO - ------------------------------------------------
2024-07-07 22:25:56,835 - INFO - Starting process to update existing predictions using bayesian inference
2024-07-07 22:25:56,836 - INFO - Total predictions: 6
2024-07-07 22:25:56,843 - INFO - Prediction file './tmp/predict.txt' has been written to.
2024-07-07 22:25:56,843 - INFO - Successfully updated existing predictions using bayesian inference
2024-07-07 22:25:56,843 - INFO - Updated predictions:
2024-07-07 22:25:56,843 - INFO - 1. Sports    Probability: 0.9385
2024-07-07 22:25:56,843 - INFO - 2. Food      Probability: 0.0290
2024-07-07 22:25:56,843 - INFO - 3. Gaming    Probability: 0.0165
2024-07-07 22:25:56,843 - INFO - 4. Music     Probability: 0.0161
2024-07-07 22:25:56,843 - INFO - 5. Tv        Probability: 0.0000
2024-07-07 22:25:56,849 - INFO - 172.27.0.1 - - [07/Jul/2024 22:25:56] "POST /send_search_history HTTP/1.1" 200 -
2024-07-07 22:25:56,863 - INFO - Starting send_ads process
2024-07-07 22:25:56,863 - INFO - Successfully selected best ads
2024-07-07 22:25:56,863 - INFO - Successfully added noisy ads
2024-07-07 22:25:56,863 - INFO - Completed send_ads process
2024-07-07 22:25:56,940 - INFO - 172.27.0.1 - - [07/Jul/2024 22:25:56] "POST /get_ads HTTP/1.1" 200 -

Fig 3: Example of the logs that were generated.
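Log lines in this shape can be produced with Python's standard `logging` module. The sketch below is an assumption about how such output might be configured, not the project's actual setup; the logger name and helper functions are hypothetical.

```python
import logging

# Format matching lines such as "2024-07-07 22:25:56,819 - INFO - message"
LOG_FORMAT = "%(asctime)s - %(levelname)s - %(message)s"


def make_logger(name="anonymous_ads_demo"):
    """Create an INFO-level logger that writes timestamped lines to stderr."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(LOG_FORMAT))
        logger.addHandler(handler)
    return logger


def format_prediction_lines(predictions):
    """Rank categories by probability and format one line per category."""
    ranked = sorted(predictions.items(), key=lambda item: item[1], reverse=True)
    return [
        "%d. %-9s Probability: %.4f" % (rank, category, probability)
        for rank, (category, probability) in enumerate(ranked, start=1)
    ]


def log_predictions(logger, predictions):
    """Write the ranked predictions to the log."""
    for line in format_prediction_lines(predictions):
        logger.info(line)


lines = format_prediction_lines({"Food": 0.0187, "Sports": 0.9636})
```

Keeping the formatting in a separate function makes it easy to check the output without capturing the log stream.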

Conclusion

Participating in the TikTok TechJam was an exhilarating experience that pushed us to think outside the box and explore technologies we had never worked with before. Integrating Zama’s Concrete ML into our solution was not only fun but also a valuable lesson in the potential of Fully Homomorphic Encryption (FHE). Looking back, it’s incredible to think that our project won! You can check out our complete solution on GitHub: Anonymous Ads.

A special thank you to the Zama team for the opportunity to share our journey in this blog post.

Jeremiah Au, Nigel Lee, PJ Anthony, Vansh Nath.

