Winning the TikTok Hackathon using Zama's Concrete ML and Fully Homomorphic Encryption
This is a guest blog post written by Jeremiah Au, Nigel Lee, PJ Anthony, and Vansh Nath.
TL;DR: Online advertising today depends heavily on browser cookies that track user behavior and identify users, often at the expense of privacy. Fully Homomorphic Encryption (FHE) makes a different model possible: privacy-preserving ad targeting, in which user preferences stay encrypted and only the user can read them. During the TikTok TechJam hackathon, a group of students from NUS built an ad-serving system on Zama's Concrete ML, showing how FHE can power a new, privacy-respecting era for online advertising.
About the TikTok TechJam
We are a group of Computer Science students from the National University of Singapore (NUS), and this year, we had the exciting opportunity to participate in the 2024 edition of the TikTok TechJam. This global hackathon challenges students to innovate, create, and demonstrate technical excellence by solving real-world problems across various domains.
Among the problem statements, the one focused on privacy immediately caught our attention. It introduced us to fascinating concepts we had never encountered before, including Private Set Intersection, Differential Privacy, and the one that intrigued us most: Fully Homomorphic Encryption (FHE). Our curiosity about these groundbreaking technologies, combined with the growing importance of privacy in today’s digital world, motivated us to take on this challenge.
Searching for the perfect FHE library
After much brainstorming, we settled on a proof of concept that would show how FHE can strengthen privacy protection in targeted advertising. This sent us searching for libraries and frameworks that could turn our idea into reality, and during that search we discovered Zama’s Concrete ML on GitHub and immediately saw its potential for our project.
At first glance, Concrete ML seemed complex, with extensive documentation covering different models, configuration states, bootstrapping, and more. However, the quick-start guide and code examples were comprehensive and user-friendly. Where some aspects were initially unclear, reading the clean, well-structured source code quickly cleared things up. Concrete ML's abstractions were invaluable: they let us integrate FHE into our solution without deep expertise in the underlying cryptography.
About our project: AnonymousAds
In today’s era of 'freemium' software, many companies rely on personalized advertising to generate revenue, often at the cost of user privacy. As users become increasingly privacy-conscious, they remain hesitant to pay for services they’ve come to expect for free. This tension made our objective clear: to create a secure protocol that allows businesses to serve personalized ads without compromising user privacy.
To achieve this, we developed a proof of concept that uses search engine queries to predict user interest categories and serve targeted ads, while ensuring that the server never sees the user's search queries or interest categories in plaintext.
Project and solution architecture
Pre-processing:
First, the user’s search engine queries are cleaned using natural language processing (NLP) techniques such as case-folding and stemming, preparing them for encryption.
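The post does not include the pre-processing code, so the following is only a minimal, standard-library sketch of the idea. It case-folds each query, splits it into tokens, and applies a deliberately crude suffix-stripping stemmer. That stemmer is a hypothetical stand-in for a real one such as NLTK's Porter stemmer. The sketch then counts stemmed terms over a fixed vocabulary to produce a numeric vector that a model can take as input. The vocabulary, the suffix list, and the count-vector feature format are all assumptions, not details of the AnonymousAds code.

```python
import re
from collections import Counter

# Hypothetical suffix list for a crude stemmer; a real pipeline would use a
# proper stemmer (e.g. NLTK's PorterStemmer) instead.
SUFFIXES = ("ing", "ed", "s")


def stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(query: str) -> list[str]:
    """Case-fold the query, split it into ASCII letter/digit tokens, stem them."""
    tokens = re.findall(r"[a-z0-9]+", query.casefold())
    return [stem(token) for token in tokens]


def to_features(queries: list[str], vocabulary: list[str]) -> list[int]:
    """Count stemmed terms across all queries over a fixed vocabulary."""
    counts = Counter(token for query in queries for token in preprocess(query))
    return [counts[term] for term in vocabulary]


# Hypothetical vocabulary and queries, for illustration only.
vocabulary = ["book", "flight", "hotel", "recipe"]
queries = ["Cheap flights to Tokyo", "Booking hotels", "hotel deals"]
print(preprocess("Cheap FLIGHTS to Tokyo"))  # ['cheap', 'flight', 'to', 'tokyo']
print(to_features(queries, vocabulary))      # [1, 1, 2, 0]
```

Real stemmers handle many more cases. For example, "running" becomes "runn" here rather than "run", which is why a library stemmer is the better choice in practice.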
Encryption:
Once cleaned, the data is encrypted on the client side with the user's FHE private key, so sensitive information never leaves the user’s device in plaintext. The encrypted data is then sent to the server.
Transformation:
On the server side, the encrypted data is fed to a neural network regressor, which was trained to convert search engine usage statistics into "user interest categories" for ad targeting. Because FHE allows computation directly on ciphertexts, this inference runs entirely on encrypted data, so the server never sees the actual queries or the resulting predictions.
Result encryption and transmission:
The model’s output, the predicted interest categories, is itself encrypted and is sent back to the client. Since the server does not hold the private key, it can read neither the input (search queries) nor the output (interest categories).
Decryption and aggregation:
Once the encrypted result reaches the client, it is decrypted with the FHE private key. As the user makes more search queries, the client refines its estimate of the user's interests using Bayesian updates, combining each new prediction with historical information to make recommendations increasingly accurate.
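The post does not say exactly how the Bayesian updates are formulated. One straightforward reading is sketched below under stated assumptions. The client keeps a probability distribution over interest categories. For each newly decrypted model output, it multiplies that distribution by a likelihood derived from the output and then renormalizes (posterior ∝ prior × likelihood). Treating the regressor's clipped scores as likelihoods, treating successive queries as independent evidence, and the `floor` parameter are all hypothetical choices.

```python
# Minimal sketch of client-side Bayesian aggregation of decrypted scores.
# Assumptions: each decrypted model output is a per-category score that is
# turned into a likelihood by clipping to a small positive floor, and
# successive queries are treated as independent evidence.


def normalize(weights: dict[str, float]) -> dict[str, float]:
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    return {category: w / total for category, w in weights.items()}


def bayesian_update(
    prior: dict[str, float], scores: dict[str, float], floor: float = 1e-3
) -> dict[str, float]:
    """Return the posterior: prior times likelihood, renormalized."""
    # Regressor outputs can be zero or negative; the floor keeps every
    # category possible so a single bad prediction cannot rule it out forever.
    likelihood = {c: max(scores.get(c, 0.0), floor) for c in prior}
    return normalize({c: prior[c] * likelihood[c] for c in prior})


categories = ["sports", "travel", "food"]
belief = {c: 1 / len(categories) for c in categories}  # uniform starting prior

# Hypothetical decrypted outputs for two successive queries.
for decrypted in (
    {"sports": 0.2, "travel": 0.7, "food": 0.1},
    {"sports": 0.1, "travel": 0.6, "food": 0.3},
):
    belief = bayesian_update(belief, decrypted)

print(max(belief, key=belief.get))  # travel
```

The posterior is a product over every past query, so a very long history can underflow. Accumulating log-likelihoods instead avoids that.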
Noise injection:
To further protect user privacy, the client obfuscates its true interests by requesting ads for a mixture of genuine interest categories and randomly chosen ones, a technique known as noise injection. This makes it difficult for the server to pinpoint the user’s true interests.
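The post does not specify how many decoys the client adds or how it chooses them. The sketch below assumes decoys are drawn uniformly, without replacement, from the categories the user is not interested in. It also assumes the combined list is shuffled so that position reveals nothing. `obfuscated_request` and `n_decoys` are hypothetical names.

```python
import random
from typing import Optional


def obfuscated_request(
    true_categories: list[str],
    all_categories: list[str],
    n_decoys: int,
    rng: Optional[random.Random] = None,
) -> list[str]:
    """Return the true categories mixed with random decoys, in random order."""
    if rng is None:
        rng = random.Random()
    # Decoys come only from categories the user is NOT interested in, so the
    # request never contains duplicates.
    candidates = [c for c in all_categories if c not in true_categories]
    # Uniform sampling without replacement; raises ValueError if n_decoys
    # exceeds the number of available candidates.
    decoys = rng.sample(candidates, n_decoys)
    request = list(true_categories) + decoys
    rng.shuffle(request)  # position in the list reveals nothing
    return request


all_categories = ["sports", "travel", "food", "gaming", "finance", "fashion"]
request = obfuscated_request(["travel", "food"], all_categories, n_decoys=3,
                             rng=random.Random(42))
print(request)  # five categories: two real, three decoys
```

Drawing fresh decoys on every request is the simplest choice. However, categories that recur across many requests could still stand out to an observant server, so the decoy distribution deserves care in a real deployment.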
Finally, the client selects and displays the most relevant ads based on the true interest categories, ensuring a personalized experience without compromising privacy.
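The server also returns ads for the decoy categories, so the final filtering has to happen on the client, the only party that knows which categories are real. Below is a minimal sketch, assuming two things not stated in the post: each ad is a dict with a `category` key (a hypothetical schema), and ads are ranked by the client's posterior probability for their category.

```python
def select_ads(
    ads: list[dict],
    posterior: dict[str, float],
    true_categories: list[str],
    limit: int = 3,
) -> list[dict]:
    """Keep ads from the true categories, most probable category first."""
    wanted = set(true_categories)
    # Ads returned for decoy categories are discarded on the client.
    relevant = [ad for ad in ads if ad["category"] in wanted]
    # Python's sort is stable even with reverse=True, so ties keep the
    # server's original order.
    relevant.sort(key=lambda ad: posterior.get(ad["category"], 0.0), reverse=True)
    return relevant[:limit]


# Hypothetical ads returned for an obfuscated request.
ads = [
    {"id": 1, "category": "sports"},
    {"id": 2, "category": "food"},
    {"id": 3, "category": "travel"},
    {"id": 4, "category": "gaming"},
    {"id": 5, "category": "travel"},
]
posterior = {"sports": 0.04, "travel": 0.89, "food": 0.06, "gaming": 0.01}
print([ad["id"] for ad in select_ads(ads, posterior, ["travel", "food"])])  # [3, 5, 2]
```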
Implementation and deployment
Our project was divided into three main components, each playing a distinct role in the system:
- dev: This component was responsible for generating and training the Concrete ML models used in the project. Here, we focused on building and optimizing the neural network that would later be deployed for encrypted inference.
- search-engine: This represented the user’s device and served as the front-end, where users interacted with the search engine. It handled the query processing, encryption, decryption, and the user interface for receiving targeted ads.
- server: This component simulated the company’s server, where the encrypted data was processed using the Concrete ML model. The server performed the computations on encrypted data, ensuring that sensitive information remained private throughout the process.
We ran the entire project locally, using Docker containers to keep the environment consistent and isolated. We carefully tuned the Concrete ML model's parameters to strike a balance between FHE inference speed and prediction accuracy.
We used Flask to host the web interface, and NumPy, pandas, and scikit-learn to handle data processing. These tools helped us efficiently manage and preprocess data before encryption.
Additionally, we implemented an extensive logging system throughout the project. This allowed us to track and demonstrate each step of our protocol, providing the judges with clear insights into what was happening behind the scenes during the encrypted data processing.
Conclusion
Participating in the TikTok TechJam was an exhilarating experience that pushed us to think outside the box and explore technologies we had never worked with before. Integrating Zama’s Concrete ML into our solution was not only fun but also a valuable lesson in the potential of Fully Homomorphic Encryption (FHE). Looking back, it’s incredible to think that our project won! You can check out our complete solution on GitHub: Anonymous Ads.
A special thank you to the Zama team for the opportunity to share our journey in this blog post.
Additional links
- Star Zama's Concrete ML GitHub repository to endorse Zama’s work and help them reach the 1,000 stars ⭐️ milestone!
- Review the Concrete ML documentation.
- Get support on our community channels.