TFHE-rs v0.5: Detecting Overflows, Running on GPU and More

January 19, 2024
Jean-Baptiste Orfila

This new version of TFHE-rs introduces two key enhancements: GPU acceleration for improved performance and overflow detection for increased reliability in projects. Additionally, this version marks the start of ongoing backward compatibility in TFHE-rs, aimed at providing a more seamless and consistent experience for developers.

Overflow detection

Rust can detect integer overflow, meaning an operation's result does not fit in its output type: debug builds panic, and methods such as overflowing_add report it explicitly. In homomorphic computation this is harder. Under Fully Homomorphic Encryption (FHE) the server only ever handles encrypted data, so it cannot inspect a result to see whether it overflowed. TFHE-rs v0.5 addresses this by introducing homomorphic operators that detect overflow. These operators can be slower than the standard ones: in the table below, addition and subtraction cost roughly the same with or without overflow detection, while multiplication is about 1.2x to 1.3x slower for unsigned integers and about 2.2x to 3.8x slower for signed integers.

Operation\Size Fhe8 Fhe16 Fhe32 Fhe64 Fhe128 Fhe256
Signed Add 67.21 ms 86.98 ms 106.01 ms 134.17 ms 167.69 ms 202.61 ms
Signed Overflow Add 76.54 ms 84.78 ms 104.23 ms 134.38 ms 162.99 ms 202.56 ms
Signed Sub 70.95 ms 87.87 ms 105.98 ms 136.15 ms 168.46 ms 202.87 ms
Signed Overflow Sub 82.46 ms 86.92 ms 104.41 ms 132.21 ms 168.06 ms 201.17 ms
Signed Mul 123.09 ms 164.53 ms 221.56 ms 410.67 ms 1.05 s 3.41 s
Signed Overflow Mul 277.91 ms 365.67 ms 571.22 ms 1.21 s 3.57 s 12.84 s
Unsigned Add 60.43 ms 83.32 ms 107.84 ms 123.67 ms 149.67 ms 194.31 ms
Unsigned Overflow Add 63.67 ms 84.11 ms 107.95 ms 120.8 ms 147.38 ms 191.28 ms
Unsigned Sub 65.38 ms 82.05 ms 108.44 ms 126.25 ms 153.09 ms 193.85 ms
Unsigned Overflow Sub 68.89 ms 81.83 ms 107.63 ms 120.38 ms 150.21 ms 190.39 ms
Unsigned Mul 115.9 ms 163.65 ms 225.83 ms 410.53 ms 1.04 s 3.39 s
Unsigned Overflow Mul 140.76 ms 191.85 ms 272.65 ms 510.61 ms 1.34 s 4.51 s
Table 1: Timings of standard and overflow-detecting operations as a function of precision and type (measured on the AMD EPYC processor of an AWS hpc7a instance)
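To read the overhead off Table 1, it can help to turn a pair of rows into a slowdown factor. The sketch below does this for a few entries copied from the table (values given in seconds are converted to milliseconds); the `overhead` helper is a hypothetical name used only for illustration.

```rust
/// Hypothetical helper: slowdown factor of an overflow-detecting operation
/// relative to the standard one. Both timings must use the same unit.
fn overhead(plain_ms: f64, overflow_ms: f64) -> f64 {
    overflow_ms / plain_ms
}

fn main() {
    // (label, standard timing in ms, overflow-detecting timing in ms),
    // copied from Table 1.
    let rows = [
        ("signed add, 8 bits", 67.21, 76.54),
        ("signed mul, 8 bits", 123.09, 277.91),
        ("signed mul, 256 bits", 3410.0, 12840.0),
        ("unsigned add, 8 bits", 60.43, 63.67),
        ("unsigned mul, 8 bits", 115.9, 140.76),
        ("unsigned mul, 256 bits", 3390.0, 4510.0),
    ];

    for &(label, plain, overflow) in rows.iter() {
        println!("{}: x{:.2}", label, overhead(plain, overflow));
    }
}
```

On these rows, additions stay close to a factor of 1, while signed multiplication is the operation where overflow detection costs the most.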

Overflow detection in TFHE-rs is opt-in. It is not enabled by default, so standard operations pay no extra cost; instead, it is exposed through dedicated functions such as overflowing_add. The overflow information is returned in a separate ciphertext (an FheBool), which the client must decrypt to learn whether an overflow occurred. Below is an example demonstrating how to use these new operations.

// Adds two encrypted FheUint16 values and returns the sum together with an
// encrypted boolean (FheBool) indicating overflow.
//
// * The operation is modular, i.e. on overflow the result wraps around.
// * On overflow the FheBool is true, otherwise false.

use tfhe::prelude::*;
use tfhe::{generate_keys, set_server_key, ConfigBuilder, FheUint16};

fn main() {
    let (client_key, server_key) = generate_keys(ConfigBuilder::default());
    set_server_key(server_key);

    // u16::MAX + 1 does not fit in 16 bits, so this addition overflows
    let a = FheUint16::encrypt(u16::MAX, &client_key);
    let b = FheUint16::encrypt(1u16, &client_key);

    let (result, overflowed) = (&a).overflowing_add(&b);

    // The result wraps around, exactly like the clear computation
    let result: u16 = result.decrypt(&client_key);
    assert_eq!(result, u16::MAX.wrapping_add(1u16));

    // Check that the overflow has been detected
    let overflowed: bool = overflowed.decrypt(&client_key);
    assert!(overflowed);
}
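The encrypted example above follows the same contract as the overflowing methods on Rust's clear integers: a wrapped result plus a flag. The sketch below uses only the standard library, with no encryption involved, and can serve as a clear-text reference when checking decrypted values. The `reference_overflowing_add` helper is a hypothetical name introduced for illustration.

```rust
/// Hypothetical helper: clear-text reference for an encrypted `overflowing_add`
/// on 16-bit unsigned values. Returns (wrapped sum, overflow flag).
fn reference_overflowing_add(a: u16, b: u16) -> (u16, bool) {
    a.overflowing_add(b)
}

fn main() {
    // Same inputs as the encrypted example: the sum wraps to 0 and the flag is set.
    assert_eq!(reference_overflowing_add(u16::MAX, 1), (0, true));

    // No overflow: the flag stays false and the result is the ordinary sum.
    assert_eq!(reference_overflowing_add(40_000, 2), (40_002, false));

    // The standard library offers the same pair for signed types and multiplication.
    assert_eq!(i16::MAX.overflowing_add(1), (i16::MIN, true));
    assert_eq!(300u16.overflowing_mul(300), (24_464, true));

    // `checked_add` folds the flag into an Option instead.
    assert_eq!(u16::MAX.checked_add(1), None);
}
```

The difference in the homomorphic setting is that the flag itself is encrypted, so only the holder of the client key can read it.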

GPU Powered Homomorphic Computation

TFHE-rs now offers a GPU backend implemented in CUDA. This release covers almost all operations on homomorphic unsigned integers. Table 2 summarizes how the timings vary with the precision.

Integrating the GPU backend into existing programs is straightforward and requires minimal changes. Example 2 illustrates this process. The main adjustment is in the key setup: the client creates a compressed server key, which is then sent to the server running the GPU backend. The server decompresses the key onto the GPU, sets it as the server key, and then proceeds with the standard homomorphic instructions.

Operation \ Size FheUint8 FheUint16 FheUint32 FheUint64 FheUint128 FheUint256
Add/Sub 103.33 ms 129.26 ms 156.83 ms 186.99 ms 320.96 ms 528.15 ms
Bitwise operations (and/or/xor) 26.1 ms 26.21 ms 26.57 ms 27.23 ms 43.05 ms 65.0 ms
Equality/Difference 52.82 ms 53.0 ms 79.4 ms 79.58 ms 96.37 ms 145.25 ms
Comparisons 104.7 ms 130.23 ms 156.19 ms 183.2 ms 213.43 ms 288.76 ms
Max/Min 156.7 ms 182.65 ms 210.74 ms 251.78 ms 316.9 ms 442.71 ms
Mul 219.73 ms 302.11 ms 465.91 ms 955.66 ms 2.71 s 9.15 s
Negation 103.26 ms 129.4 ms 157.19 ms 187.09 ms 321.27 ms 530.11 ms
Table 2: Timings of unsigned operations using the GPU backend as a function of the precision (measured on a single Nvidia V100 of an AWS p3.2xlarge instance)

For detailed guidance on configuring the GPU, please refer to the TFHE-rs documentation.

use tfhe::{ConfigBuilder, set_server_key, FheUint8, ClientKey, CompressedServerKey};
use tfhe::prelude::*;

fn main() {
    // Client-side
    let config = ConfigBuilder::default().build();

    let client_key = ClientKey::generate(config);
    let compressed_server_key = CompressedServerKey::new(&client_key);

    let clear_a = 27u8;
    let clear_b = 128u8;

    let a = FheUint8::encrypt(clear_a, &client_key);
    let b = FheUint8::encrypt(clear_b, &client_key);

    // Server-side
    let gpu_key = compressed_server_key.decompress_to_gpu();
    set_server_key(gpu_key);
    let result = a + b;

    // Client-side
    let decrypted_result: u8 = result.decrypt(&client_key);

    let clear_result = clear_a + clear_b;

    assert_eq!(decrypted_result, clear_result);
}

Miscellaneous features

TFHE-rs has been updated with several new features and enhancements, including:

  • Data Backward Compatibility: TFHE-rs has not yet reached a stable version, so breaking changes still occur between releases. The library nevertheless provides tools to help FHE application developers seamlessly update their data from TFHE-rs 0.4 to 0.5. For migration scenarios and examples, see the documentation.
  • Enhanced KS-PBS Timings: The keyswitch operation in TFHE-rs has been optimized to be 20% to 35% faster. This improvement means that the KS-PBS atomic pattern, a crucial operation in TFHE, now executes in approximately 12 ms on the CPU, using the same benchmark configuration as before.
  • Accelerated Addition for Vector of ciphertexts: The process for adding vectors of ciphertexts in TFHE-rs has been significantly optimized, achieving up to a 5x speedup compared to the previous version.
  • Large Integers with RNS: In the lower-level APIs, the handling of large integers has been updated. Integers based on the Residue Number System (RNS) are now easier to use in homomorphic circuits, especially for operations like additions and multiplications.
  • Homomorphic Circuit Simulator: A simulator for homomorphic circuits is now available for debugging. It uses simple ciphertexts, which speeds up execution significantly, while every operation still mimics its encrypted counterpart, making debugging quicker and easier.
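
To make the Residue Number System idea from the list above concrete, here is a minimal clear-text sketch. A value is stored as its residues modulo pairwise coprime moduli, additions and multiplications act on each residue independently, and the Chinese Remainder Theorem recovers the result. The moduli and function names are illustrative assumptions, not TFHE-rs parameters or APIs, and nothing here is encrypted.

```rust
/// Pairwise coprime moduli (an illustrative choice).
/// Values are represented modulo their product, 7 * 11 * 13 * 15 = 15015.
const MODULI: [u64; 4] = [7, 11, 13, 15];

/// Decompose a clear value into its residues.
fn to_rns(x: u64) -> [u64; 4] {
    let mut residues = [0u64; 4];
    for (i, &m) in MODULI.iter().enumerate() {
        residues[i] = x % m;
    }
    residues
}

/// Addition acts on each residue independently.
fn add_rns(a: [u64; 4], b: [u64; 4]) -> [u64; 4] {
    let mut out = [0u64; 4];
    for (i, &m) in MODULI.iter().enumerate() {
        out[i] = (a[i] + b[i]) % m;
    }
    out
}

/// Multiplication acts on each residue independently as well.
fn mul_rns(a: [u64; 4], b: [u64; 4]) -> [u64; 4] {
    let mut out = [0u64; 4];
    for (i, &m) in MODULI.iter().enumerate() {
        out[i] = (a[i] * b[i]) % m;
    }
    out
}

/// Modular inverse by brute-force search, which is fine for tiny moduli.
fn inv_mod(a: u64, m: u64) -> u64 {
    (1..m)
        .find(|&x| (a * x) % m == 1)
        .expect("a and m must be coprime")
}

/// Recombine residues with the Chinese Remainder Theorem.
fn from_rns(residues: [u64; 4]) -> u64 {
    let product: u64 = MODULI.iter().product();
    let mut x = 0u64;
    for (i, &m) in MODULI.iter().enumerate() {
        let partial = product / m; // product of all the other moduli
        let inv = inv_mod(partial % m, m);
        x = (x + (residues[i] * partial % product) * inv) % product;
    }
    x
}

fn main() {
    let sum = from_rns(add_rns(to_rns(123), to_rns(456)));
    let prod = from_rns(mul_rns(to_rns(123), to_rns(45)));
    println!("123 + 456 = {}, 123 * 45 = {}", sum, prod);
}
```

Because no carries travel between residues, each component can be processed on its own, which is what makes this representation a good fit for additions and multiplications.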
