Quality machine learning models require quality training data, which is sometimes difficult to acquire due to privacy or regulatory constraints. Federated learning addresses this issue by allowing multiple parties to collaboratively train a single machine learning model without sharing any of their training data. Federated learning not only improves model generalization, but also opens new doors for enterprises to train models on data they cannot directly access. In some cases, however, we may need more than the standard version of federated learning to keep data private.
Some federations have stringent privacy requirements, or are subject to regulation, that dictate additional protection mechanisms to prevent the inference of private information. In those cases, transmitting model updates in plaintext may allow potential adversaries to infer private data. Fully homomorphic encryption (FHE) can reduce this risk by hiding the model updates and the final model from the aggregator, revealing the aggregated version only to the parties.
FHE is a cryptographic system that allows an entity to perform computations on encrypted data without decrypting it. In other words, homomorphic encryption enables the computation of a function over encrypted inputs and produces the result in encrypted form.
In standard implementations of federated learning with FHE, the parties agree on a private key, which they use to encrypt all model updates before sending them to the aggregator.
We have integrated fully homomorphic encryption into our IBM federated learning framework, and you can now try it out. We believe this approach has some clear advantages:
- The aggregator does not get access to any of the model updates, preventing potential inference attacks. Parties can therefore trust a third-party aggregator to orchestrate the learning process without revealing the produced model or transmitting their raw data.
- Only the parties get access to the global model, preventing model-stealing attacks by the aggregator.
- Because the parties operate on the plaintext model, the type of model we can train is not restricted. For example, a neural network can use any activation function even though the overall system is protected by fully homomorphic encryption.
- Fusion algorithms that weight updates by the number of data samples, such as FedAvg, can be used without revealing the number of training samples in each party's dataset.
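To make the aggregation flow concrete, here is a minimal sketch using a toy Paillier cryptosystem. Paillier is only additively homomorphic (not fully homomorphic), and the tiny key sizes here are insecure; this is not the scheme or API used by IBM federated learning, just an illustration of how an aggregator can sum encrypted updates it cannot read.

```python
import math
import random

def keygen(p=293, q=433):
    """Generate a toy Paillier key pair from two small (insecure) primes."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because we fix the generator g = n + 1
    return (n, n * n), (lam, mu)

def encrypt(pub, m):
    n, n2 = pub
    r = random.randrange(1, n)  # fresh randomness per ciphertext
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2

def decrypt(pub, priv, c):
    n, n2 = pub
    lam, mu = priv
    u = pow(c, lam, n2)
    return (u - 1) // n * mu % n

# The parties share the key pair; the aggregator only sees ciphertexts.
pub, priv = keygen()

# Each party encrypts its (integer-encoded) model update.
updates = [3, 5, 7]
ciphertexts = [encrypt(pub, u) for u in updates]

# Multiplying Paillier ciphertexts adds the underlying plaintexts, so the
# aggregator can fuse the updates without learning any of them.
agg = 1
for c in ciphertexts:
    agg = agg * c % pub[1]

# Only the key-holding parties can decrypt the aggregated update.
print(decrypt(pub, priv, agg))  # 15 == 3 + 5 + 7
```

A real deployment would encode floating-point weights as fixed-point integers and use production key sizes; the protocol shape stays the same.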
The training times exceeded our expectations and are not as long as you might expect:
For those curious about the technical details of this process, check out our more detailed discussion:
The figure below shows how FHE keys are distributed among the parties and the aggregator, and how the federated learning process proceeds. Interestingly, the training process itself happens in plaintext, allowing any type of model to be used.
Fully homomorphic encryption is a lattice-based public key cryptosystem that allows operations on encrypted messages. Being a public key cryptosystem means there is a public key with which messages can be encrypted, and another key (the secret key) with which ciphertexts can be decrypted. With FHE, we can also perform operations on ciphertexts. This is done with two functions provided by the FHE scheme: add and mul. Given two ciphertexts, these functions output a ciphertext whose underlying message is the sum or product (respectively) of the underlying messages of the inputs.
This means that two numbers can be encrypted and sent to an untrusted party, who can compute their sum and product without decrypting and learning the values of the numbers. Since FHE is lattice-based, it is quantum-safe, and it comes with a security parameter that can be increased to make it mathematically proven secure (under standard assumptions) against any adversary with finite resources.
Although addition and multiplication seem like two simple primitives, they turn out to be very powerful. In fact, any polynomial can be computed with just these two operations, and it has been shown that the output of any algorithm can be expressed as a polynomial of its input. This is what we call being "Turing complete" (i.e., able to compute anything a Turing machine can compute). It is easy to be convinced of this if we remember that all our computers implement Boolean operations, and we observe that the Boolean operations or and and can be implemented with add and mul over binary data. Specifically, a and b = a * b, and a or b = a + b − a * b.
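These identities are easy to check directly. The sketch below evaluates AND and OR using only additions and multiplications over bits, exactly the two primitives an FHE scheme provides; for clarity it runs on plaintext bits rather than ciphertexts.

```python
# Boolean AND and OR expressed with only addition and multiplication over
# binary data, the two primitives an FHE scheme offers.
# (NOT also works: not a = 1 - a.)

def fhe_and(a, b):
    return a * b

def fhe_or(a, b):
    return a + b - a * b

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", fhe_and(a, b), fhe_or(a, b))
```

Since any Boolean circuit can be built from these gates, any computation can in principle be carried out under encryption this way.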
While this works well in theory, in practice some obstacles must be overcome for FHE to run in a reasonable amount of time. There are two main ones, and both can be addressed:
Binary operations, arithmetic operations, and approximations
The first obstacle to consider is the reduction to binary operations we made above. Think of how two 16-bit numbers are added: it requires a circuit that performs multi-digit addition, just like the method we learn as children. This is very inefficient, especially since every FHE operation is transformed into an operation on lattice points, something that eventually translates to operations on high-degree polynomials. Luckily, FHE supports operations on numbers, not only on bits.
The problem starts when we need non-polynomial operations, such as the inverse operation (required to compute a weighted average) or the sigmoid function (which requires exponentiation). In these cases, we can compute a polynomial approximation of the operation we are interested in. For example, we can use a Taylor expansion to approximate the inverse function, or use the polynomial 0.5 − 1.73496·(x/8) + 4.19407·(x/8)^3 − 5.43402·(x/8)^5 + 2.50739·(x/8)^7 as an approximation for the sigmoid function on the interval [-8, 8].
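As a sanity check, the quoted degree-7 polynomial can be evaluated with additions and multiplications alone. The sketch below does this in the clear (plain Python, not an FHE library) using Horner's rule, which keeps the multiplication count low, an important consideration under FHE where each multiplication increases ciphertext noise. Like the sigmoid, the polynomial satisfies p(0) = 0.5 and p(x) + p(−x) = 1.

```python
# Coefficients of t, t^3, t^5, t^7 where t = x / 8; the constant term is 0.5.
COEFFS = [-1.73496, 4.19407, -5.43402, 2.50739]

def sigmoid_poly(x):
    """Evaluate the degree-7 polynomial using only adds and muls."""
    t = x / 8.0
    t2 = t * t
    # Horner's rule in t^2 over the odd part:
    # ((c7 * t2 + c5) * t2 + c3) * t2 + c1
    acc = COEFFS[3]
    for c in (COEFFS[2], COEFFS[1], COEFFS[0]):
        acc = acc * t2 + c
    return 0.5 + acc * t

print(sigmoid_poly(0.0))  # 0.5
```

The same Horner pattern applies to any polynomial approximation one substitutes for a non-polynomial operation.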
Of course, we can also replace the non-polynomial operations with other operations, as done by several researchers who replace non-polynomial activation functions (such as ReLU or sigmoid) with a polynomial activation function (e.g., squaring).
Single Instruction Multiple Data (SIMD)
The second obstacle is the speed of individual FHE operations. To speed them up, cryptographers take advantage of the mathematical structure that lattice cryptography has to offer and pack a vector of messages into a single ciphertext. This effectively makes FHE operations equivalent to vector-machine operations.
Writing code for a vector machine is not always easy and sometimes requires a lot of ingenuity. To make writing efficient code easier, we have developed the HELayers library, which provides developers with tools for writing code that takes advantage of the SIMD feature.
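The packing idea can be sketched in a few lines of plain Python (this is an illustration of the SIMD concept, not the HELayers API): a whole vector of messages is "packed" into one object, and a single add or mul then acts on every slot at once, just as one homomorphic operation acts on every message packed into a ciphertext.

```python
class Packed:
    """Toy stand-in for a packed ciphertext: one object holds many slots."""

    def __init__(self, slots):
        self.slots = list(slots)

    def __add__(self, other):
        # One "operation" adds all slots at once, slot by slot.
        return Packed(a + b for a, b in zip(self.slots, other.slots))

    def __mul__(self, other):
        # Likewise, one multiplication acts on every slot.
        return Packed(a * b for a, b in zip(self.slots, other.slots))

a = Packed([1, 2, 3, 4])
b = Packed([10, 20, 30, 40])
print((a + b).slots)  # [11, 22, 33, 44]
print((a * b).slots)  # [10, 40, 90, 160]
```

The challenge in practice is laying out a computation (say, a matrix product) so that it decomposes into as few of these slotwise operations as possible, which is what HELayers helps with.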
Today, using our HELayers library, we are able to provide a privacy-preserving federated learning solution whose running time is similar to that of the non-private solution. With this new feature, organizations such as hospitals that could not collaborate before now can. Together, they can train on more data and derive new, more accurate models.