Federated Systems

Overview

In a traditional machine learning pipeline, all relevant data is stored in a single central location, where it is accessed to train a model. This is not always possible: data may be gathered in a decentralised manner by users, and communicating it to a central server can be infeasible because of privacy restrictions and the cost of transmitting large files. Federated learning offers an effective solution to this problem. In a federated learning scenario, users collaboratively learn a common model while keeping their respective data private. Data privacy is easier to maintain because the data never leaves the user's device, and communication costs fall because only the model, which is typically much smaller than the dataset, is transmitted. Additionally, from the server's perspective, this distributes the majority of the computation across the participating devices.
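The collaborative training loop described above can be illustrated with a minimal sketch of federated averaging, one common way of combining client models. Everything here is an assumption made for illustration: the 1-D linear model, the synthetic data, and the helper names local_update, federated_average, make_client_data and train are hypothetical and do not come from any particular framework.

```python
import random


def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few epochs of gradient descent on one client's private data.

    The model is a hypothetical 1-D linear regression y = w * x + b.
    Only the updated (w, b) pair is returned; the raw data stays local.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(client_weights, client_sizes):
    """Server step: average client models, weighted by local dataset size."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)


def make_client_data(n, rng):
    """Synthetic private data for one client, drawn from y = 3x + 1 + noise."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.05)) for x in xs]


def train(rounds=50, seed=0):
    """Simulate several rounds of federated training over three clients."""
    rng = random.Random(seed)
    clients = [make_client_data(n, rng) for n in (20, 50, 30)]
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        # Each client starts from the current global model and trains locally.
        updates = [local_update(global_weights, data) for data in clients]
        # The server only ever sees model parameters, never the data.
        global_weights = federated_average(updates, [len(d) for d in clients])
    return global_weights


if __name__ == "__main__":
    w, b = train()
    print(f"learned w={w:.3f}, b={b:.3f}")
```

In this sketch each round transmits two floating-point numbers per client regardless of how many samples the client holds, which is the communication saving the paragraph refers to.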

IBM Federated Learning

IBM Federated Learning (IBMFL) is a Python framework developed to enable federated learning (FL) in an enterprise environment. It provides a basic fabric for FL on which advanced features can be added. It does not depend on any specific machine learning framework and supports different learning topologies (e.g., a shared aggregator) and protocols. It is meant to provide a solid basis for federated learning that supports a large variety of learning models and topologies, in particular in enterprise and Hybrid Cloud settings.

MUSKETEER

MUSKETEER is an EU Horizon project for federated learning with an emphasis on privacy-preserving scenarios. The massive increase in data collected and stored worldwide calls for new ways to preserve privacy while still allowing data sharing among multiple data owners. Today, the lack of trusted and secure environments for data sharing inhibits the data economy, while concerns over legality, privacy, trustworthiness, data value and confidentiality hamper the free flow of data. MUSKETEER aims to create a validated, federated, privacy-preserving machine learning platform, tested on industrial data, that is interoperable, scalable and efficient enough to be deployed in real use cases. It aims to alleviate data sharing barriers by providing secure, scalable and privacy-preserving analytics over decentralised datasets using machine learning. Data can continue to be stored in different locations with different privacy constraints, yet be shared securely. The MUSKETEER cross-domain platform will validate progress in the industrial scenarios of smart manufacturing and health, with outcomes validated in an operational setting. A data economy is fostered by creating a rewarding model capable of fairly monetising datasets according to their real value.

Robustness of Federated Learning

While FL is an elegant framework for learning models across a variety of clients without explicitly sharing data, its vanilla form has significant shortcomings when faced with disruptive scenarios. These include cases where some participating clients send corrupted updates because of an accidental malfunction, and deliberate attacks where clients supply malicious updates to undermine the learning process. FL systems are also vulnerable to backdoor attacks, where the compromised model exhibits unexpected behaviour for inputs containing specific triggers, and to membership inference attacks, where the attacker tries to determine whether a data point was used to train the model.
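To make the effect of a corrupted or malicious update concrete, the sketch below compares plain averaging with a coordinate-wise median, a well-known robust aggregation rule. It is an illustration under stated assumptions, not a description of the methods developed by the team: the update values are made up, and the helper name aggregate is hypothetical.

```python
from statistics import mean, median


def aggregate(updates, reducer):
    """Combine client updates coordinate by coordinate using `reducer`."""
    return [reducer(coords) for coords in zip(*updates)]


# Four honest clients send similar two-parameter updates (illustrative values).
honest = [[1.0, -2.0], [1.1, -1.9], [0.9, -2.1], [1.0, -2.0]]
# One faulty or malicious client sends an extreme update.
malicious = [[100.0, 100.0]]
updates = honest + malicious

# Plain averaging: a single outlier drags every coordinate far from
# the honest consensus.
mean_update = aggregate(updates, mean)

# Coordinate-wise median: the result stays with the honest majority
# as long as fewer than half of the clients are corrupted.
median_update = aggregate(updates, median)

if __name__ == "__main__":
    print("mean:  ", mean_update)
    print("median:", median_update)
```

The median tolerates this particular outlier, but it is not a complete defence: it does not address backdoor or membership inference attacks, which manipulate or probe the model in ways that do not necessarily appear as extreme update values.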

Our team addresses these challenges and risks by devising methods that analyse and help mitigate threats against Federated Learning systems. We formally analyse the different threats in terms of the attack surface, the attacker's capabilities and the attacker's goals, and we use this analysis to build tools that help investigate the robustness of Federated Learning applications.