It’s now common practice to train machine-learning models in the cloud, using the cloud’s ability to scale performance, its lower maintenance cost, and access to on-demand computing. But the cloud is not always secure.
To safeguard data, one solution is to train AI models with encrypted data sets, without making the secret key available in the cloud. This way, the cloud would train a model using data it cannot “see” to produce a model that only the data owner could decrypt and manipulate. Homomorphic encryption (HE) makes that possible.
This technology enables oblivious computation over encrypted data: the party doing the computing never sees the underlying values. For instance, if you are a data scientist working in a bank, you’d typically have to jump through various hoops, both bureaucratic steps and data anonymization stages, just to get access to anonymized data. The work takes a lot of time, the resulting models are less accurate, and you can’t really rely on the power of the cloud. The anonymization process often goes beyond removing personally identifying information, adding noise, blurring, or dropping crucial details that might otherwise improve model accuracy substantially. With HE, however, cleartext data is never exposed, so you’re not restricted to using only partial or anonymized information, and you can build more accurate models anywhere.
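To make the idea of computing on data you cannot see more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the scheme or API behind any IBM product: it uses a textbook Paillier cryptosystem, which is only additively homomorphic (it supports adding ciphertexts and multiplying them by plaintext constants, not arbitrary computation as FHE does), with toy-sized keys. All function names (`keygen`, `encrypt`, `encrypted_score`, and so on) are hypothetical. The "cloud" side evaluates a small linear score over encrypted features and never holds the private key.

```python
import math
import secrets

# Toy key material: two well-known Mersenne primes.
# ASSUMPTION: demo only. Keys this small offer no real security.
P = 2**31 - 1
Q = 2**61 - 1


def keygen(p=P, q=Q):
    """Return (public modulus n, private key (lam, mu)) for textbook Paillier."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    assert math.gcd(n, lam) == 1
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt a non-negative integer m < n under the public modulus n."""
    n2 = n * n
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    # With g = n + 1, g^m mod n^2 equals 1 + m*n.
    return (1 + (m % n) * n) * pow(r, n, n2) % n2


def decrypt(n, priv, c):
    """Recover the plaintext; only the data owner can do this."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n


def add(n, c1, c2):
    """Ciphertext for (m1 + m2), computed without decrypting."""
    return c1 * c2 % (n * n)


def scale(n, c, k):
    """Ciphertext for (k * m), where k is a plaintext constant."""
    return pow(c, k, n * n)


def encrypted_score(n, enc_features, weights, bias):
    """Cloud side: bias + sum(w_i * x_i) using only the public modulus."""
    acc = encrypt(n, bias)
    for c, w in zip(enc_features, weights):
        acc = add(n, acc, scale(n, c, w))
    return acc


if __name__ == "__main__":
    n, priv = keygen()                                # data owner
    enc = [encrypt(n, v) for v in [3, 5, 2]]          # data owner encrypts
    result = encrypted_score(n, enc, [4, 1, 7], 10)   # cloud computes blindly
    print(decrypt(n, priv, result))                   # data owner sees 41
```

The cloud-side function touches only ciphertexts and the public modulus, which is the property the paragraph above describes. A full FHE scheme extends this to multiplications between ciphertexts, which is what makes neural networks and other non-linear models feasible over encrypted data.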
Homomorphic encryption as a concept isn’t new – it’s been discussed and researched in academic circles for over a decade [1]. But it is not yet widely integrated in production environments because of two stumbling blocks: usability and performance. IBM Research has been making great strides to address both and to transition homomorphic encryption from a theoretical concept to a production-ready solution.
Our most recent milestone is HE4Cloud, a Fully Homomorphic Encryption (FHE) Cloud Service. This platform is aimed at deploying privacy-preserving compute on the cloud. In a cloud-native SaaS experience, it allows our clients to deploy their machine learning models and use encrypted data either to train them or to simply run inference requests.
The platform uses IBM’s HELayers, a software development kit for FHE that lets data scientists and business analysts transition smoothly from cleartext to encrypted data analytics. This way, data scientists can continue to use tools that they are familiar with, such as PyTorch or Keras. HELayers takes care of the rest, automatically choosing the FHE scheme, implementation, hyperparameters, configurations, packing, and so on to provide the best experience.
IBM’s HE cloud service is aimed at helping our customers’ data science teams take the next step in their cloud journey. And it offers more than just using the cloud as a storage location for an encrypted data lake: With HE, users would be able to unlock the power of the cloud, storing data, analyzing it, and training models on it while it remains encrypted throughout the data science process.
When it comes to performance, we have continuously been pushing the limits of what’s possible. A few years ago, a simple two-layer neural network would have taken minutes to run, and deep neural networks were beyond reach. Now, we can run a two-layer neural network in less than 20 milliseconds, and a 20-layer neural network in just a few minutes. And we can run models such as XGBoost (for fraud detection for example), ARIMA (for time series data), and others over encrypted data, showing the breadth of possibilities and the practicality in terms of performance and accuracy.
We hope that this work will enable more companies to use the power of the cloud and lead to more industry collaborations on AI projects.
Date: 08 Dec 2022
[1] Gentry, Craig. A fully homomorphic encryption scheme. Stanford University, 2009.