End-to-end Encrypted Shazam Using Fully Homomorphic Encryption

February 14, 2024

—

Andrei Stoian

During the Zama Bounty Program Season 4, we asked the community to create a privacy-preserving version of Shazam using Fully Homomorphic Encryption (FHE) and Zama’s libraries. In this blog post, we are taking a look at the design of the winning solutions alongside their implementation:
🥇 1st place: A submission by Iamayushanand
🥈 2nd place: A submission by GoktuEk

Introduction

Shazam needs no introduction. Despite being over 20 years old, it remains a top choice for music recognition. Today, song identification is used not only to recognize a specific tune but also, widely, in music streaming apps, where it enhances the user experience by facilitating deeper exploration of music libraries and offering more personalized listening.

At Zama, we are convinced that FHE is the future, paving the way for a new era of privacy-preserving applications. Considering Shazam's popularity, integrating FHE in its design shows how mainstream apps can benefit from it, without altering the user experience. And this is what we call privacy by design.

We also believe that developers will be the ones in charge of safeguarding the privacy of millions of their users, and this is why we are building in the open for them. The motivation behind the Zama Bounty Program is to give developers a platform to learn FHE and be rewarded for it. To date, we have distributed tens of thousands of euros in rewards. In our most recent bounty season, we recognized two innovative solutions that enhance Shazam with privacy features.

Shazam's algorithm, detailed in a 2003 paper, involves users sending queries to its servers for matching in its central database—a prime scenario for applying FHE to protect user data. We challenged bounty participants to create a Shazam-esque app using FHE, allowing for any machine learning technique or the method outlined in Shazam's original paper.

Both of the winning solutions structured their work into two steps:

Song signature extraction on cleartext songs, performed on the client side
Look-up of the encrypted signatures in the music database on the server side

The winning solution (1st place) is based on a custom implementation using machine learning models, and matches a song against a 1000-song database in less than half a second, while providing a client/server application based on the Concrete ML classifiers.

The second solution (2nd place) reproduces the Shazam paper implementation while leveraging the power of Zama’s Concrete compiler.

Designing a privacy-preserving Shazam

First step: extract the audio features

Spectrograms are a popular approach to extracting features from raw audio. Spectrograms are created by applying the Fourier transform to overlapping segments of an audio track. They are resilient to noise and audio compression artifacts, common in recordings made by phones in noisy environments. Furthermore, by normalizing the amplitude spectrum it is possible to add invariance to audio volume.
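As an illustration of the paragraph above, here is a minimal sketch of a spectrogram computed with a naive discrete Fourier transform over overlapping segments, followed by per-frame amplitude normalization. The window size, hop size, Hann window and function names are assumptions chosen for this example; they are not taken from either submission, and a real implementation would use an FFT library.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for the non-negative frequency bins of one frame."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(samples, window=64, hop=32):
    """Amplitude spectra of overlapping Hann-windowed segments, each scaled to peak 1."""
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / (window - 1)) for i in range(window)]
    frames = []
    for start in range(0, len(samples) - window + 1, hop):
        segment = [samples[start + i] * hann[i] for i in range(window)]
        spectrum = magnitude_spectrum(segment)
        peak = max(spectrum) or 1.0  # avoid dividing by zero on silence
        frames.append([value / peak for value in spectrum])
    return frames


# A pure tone that falls on frequency bin 8, at two different volumes.
tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]
quiet = [0.1 * s for s in tone]
loud_spec = spectrogram(tone)
quiet_spec = spectrogram(quiet)
```

Because every frame is divided by its own peak amplitude, `loud_spec` and `quiet_spec` agree up to floating-point error, which is the volume invariance mentioned above.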

Second step: convert the original Shazam algorithm to FHE

The original Shazam algorithm utilizes spectrograms, identifying local maxima within these and recording their time and frequency of occurrence.

Peaks in the spectrogram are key, yet identifying individual songs from billions demands even greater specificity. The Shazam paper highlights the use of peak pairs, close in time, to distinguish songs from brief recordings. A collection of such peak pairs, each recording the frequencies of two high-amplitude peaks and the time gap between them, uniquely identifies a song. Encoding each of the two frequencies on 10 bits and the time delta on 12 bits results in a 32-bit number (10 + 10 + 12), which can be considered as a hash of the song.

The second-prize solution generates hashes from cleartext spectrograms, then encrypts these hashes before sending them to the server. To adapt the scheme to FHE, the precision of the time delta is scaled down to 8 bits, yielding more compact 28-bit hashes.
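The bit budgets above can be checked with a small sketch that packs a peak pair into a single integer. The field order (first frequency, second frequency, time delta) and the function names are assumptions made for illustration; the text only fixes the widths of the fields.

```python
FREQ_BITS = 10  # bits per peak frequency, as stated in the text


def pack_hash(freq_a, freq_b, time_delta, delta_bits=12):
    """Pack two frequency bins and a time delta into one integer (assumed field order)."""
    if not (0 <= freq_a < 1 << FREQ_BITS and 0 <= freq_b < 1 << FREQ_BITS):
        raise ValueError("frequency bin out of range")
    if not 0 <= time_delta < 1 << delta_bits:
        raise ValueError("time delta out of range")
    return (freq_a << (FREQ_BITS + delta_bits)) | (freq_b << delta_bits) | time_delta


def unpack_hash(value, delta_bits=12):
    """Inverse of pack_hash: recover (freq_a, freq_b, time_delta)."""
    time_delta = value & ((1 << delta_bits) - 1)
    freq_b = (value >> delta_bits) & ((1 << FREQ_BITS) - 1)
    freq_a = value >> (FREQ_BITS + delta_bits)
    return freq_a, freq_b, time_delta


# Largest possible values for each layout show the total width.
full = pack_hash(1023, 1023, 4095)                   # 10 + 10 + 12 bits
compact = pack_hash(1023, 1023, 255, delta_bits=8)   # 10 + 10 + 8 bits
```

Here `full.bit_length()` is 32 and `compact.bit_length()` is 28, matching the two hash sizes discussed above.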

Identifying a song in the database now relies on counting the matching 28-bit hashes between the query song and each song of the database. Following Shazam's method, a sorted index is employed to compile a list of likely matching songs, with a final decision made based on the timing of these matches.

However, in an FHE-based implementation, a sorted index is not feasible since branching execution cannot occur with encrypted values. The second prize solution circumvented this by comparing the query hashes against all database hashes directly. For every song in the database, it tallied the matches, ultimately returning a list detailing the match count for each song.
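The exhaustive comparison described above can be sketched in cleartext Python as follows. This is only a simulation of the data flow, with made-up hash values; in the actual solution the comparisons run on encrypted data, and the function name is hypothetical.

```python
def count_matches(query_hashes, database):
    """Return one match count per song, visiting every database hash unconditionally."""
    counts = []
    for song_hashes in database:
        total = 0
        for query_hash in query_hashes:
            for song_hash in song_hashes:
                # Add 0 or 1 instead of branching on the comparison result.
                total += int(query_hash == song_hash)
        counts.append(total)
    return counts


# Toy database of three songs with three hashes each (illustrative values only).
database = [[1, 2, 3], [4, 5, 6], [2, 3, 9]]
query = [3, 9, 2]
counts = count_matches(query, database)
```

The amount of work is the same whatever the inputs are, since every query hash is compared with every database hash. The returned list holds one count per song, and the song with the highest count is the most likely match.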

Matching hashes involves an equality comparison. This is where the TFHE scheme used by Zama stands out, as it allows exact comparisons thanks to Programmable Bootstrapping (PBS). The cost of a PBS varies depending on the input precision. To optimize the solution, the second-prize winner divides each hash into 4-bit values. Consequently, comparing two 28-bit hashes becomes a quicker series of equality checks across seven 4-bit chunks, followed by a summation and an additional test to verify that all chunks match. This process requires eight 4-bit PBS operations per hash, with each song comprising 25 hashes.
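The chunked comparison can be simulated in cleartext to see where the eight table lookups come from. In this sketch each 16-entry table stands in for one 4-bit PBS. Computing the per-chunk difference modulo 16 before the lookup is an assumption about how the equality test could be arranged, not a description of the submission's circuit.

```python
CHUNK_BITS = 4
NUM_CHUNKS = 7  # 7 chunks of 4 bits cover a 28-bit hash

# 16-entry lookup tables, each standing in for one 4-bit PBS.
IS_ZERO = [int(value == 0) for value in range(16)]
ALL_MATCH = [int(value == NUM_CHUNKS) for value in range(16)]


def to_chunks(hash_value):
    """Split a 28-bit hash into seven 4-bit chunks, least significant first."""
    return [(hash_value >> (CHUNK_BITS * i)) & 0xF for i in range(NUM_CHUNKS)]


def hashes_equal(hash_a, hash_b):
    """Return (1 if equal else 0, number of table lookups performed)."""
    lookups = 0
    agreeing = 0
    for chunk_a, chunk_b in zip(to_chunks(hash_a), to_chunks(hash_b)):
        difference = (chunk_a - chunk_b) % 16  # stays within 4 bits
        agreeing += IS_ZERO[difference]        # one lookup per chunk
        lookups += 1
    result = ALL_MATCH[agreeing]               # final lookup: did all 7 chunks agree?
    lookups += 1
    return result, lookups


sample = 0x9A3C5E1
same = hashes_equal(sample, sample)
different = hashes_equal(sample, sample ^ (1 << 27))
```

Both calls perform seven chunk lookups plus one final lookup, eight in total. The sum of agreeing chunks is at most 7, so it also fits in 4 bits.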

The matching process for each song in the database takes approximately 3 seconds on a modern 8-core machine, with this model achieving an accuracy of 97%.

Spotlight on the winner: the classification-based approach

The winner of the challenge looked at the problem in a different way. They also based their work on spectrograms but stopped short of extracting ad-hoc features from them. They cut up the song into half-second windows, generated spectrograms for each, and extracted Mel-frequency cepstral coefficients (MFCC) from these. The coefficients extracted from 15-second segments of a song are compiled into a single feature vector.

For each song in the database, a few dozen feature vectors are extracted and assigned the song's ID as their label. Following this, a multi-class logistic regression model is trained using Concrete ML, designed to classify each feature vector and match it to its corresponding song ID.

The linear models in Concrete ML operate without the need for programmable bootstrapping, making them significantly faster than methods that require bootstrapping. This efficiency enables the model to compare a query song against the entire database in less than half a second, all while maintaining a 95% accuracy rate.
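To see why a linear classifier can avoid table lookups, note that scoring a feature vector against every song only takes multiplications by known weights and additions. The sketch below uses plain Python integers and does not call the Concrete ML API. The weights, biases and query vector are made-up values, and doing the final argmax on decrypted scores on the client side is an assumption of this example.

```python
def linear_scores(weights, biases, features):
    """One integer score per song: dot(weights[c], features) + biases[c]."""
    return [
        sum(w * x for w, x in zip(row, features)) + bias
        for row, bias in zip(weights, biases)
    ]


def predict_song(scores):
    """Index of the highest score (assumed to be computed on decrypted scores)."""
    return max(range(len(scores)), key=scores.__getitem__)


# Toy model with 3 songs and 4 quantized features (illustrative values only).
weights = [[3, -1, 0, 2], [-2, 4, 1, 0], [0, 1, -3, 5]]
biases = [1, 0, -2]
query = [2, 7, 1, 0]

scores = linear_scores(weights, biases, query)
best_song = predict_song(scores)
```

With these values `scores` is `[0, 25, 2]`, so `best_song` is 1. The scoring step contains no comparison and no non-linear function, which is the property that lets linear models run without programmable bootstrapping.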

Conclusion

We are thrilled by the innovative approaches these two winning solutions took to tackle this challenge. Both participants demonstrated a creative use of the Concrete libraries, at both the machine learning level and in the cryptographic implementations. Once again, the developers have shown us that FHE can be a cornerstone for privacy protection, and that it has the power to enhance privacy in the popular apps used by millions.

Additional links

The official Bounty Program and Grant Program Github repository (Don't forget to star it ⭐️).
Our community channels, for support and more.
Zama on Twitter.
