How to launch a federated learning program

This is the second article in a two-part series on federated learning (FL). Part 1, "How to know if federated learning should be part of your data strategy," will help you decide whether federated learning is right for a use case you have in mind. This article outlines the steps involved in adopting federated learning in your organization.

1. Start with a test case

The first step in adopting FL is to run a small-scale test on a single machine to determine whether your data is suitable for federated learning. This test also helps establish an initial business case by demonstrating (or not) that the model created by federated learning is accurate enough for your problem.

To conduct the test, collect a relatively small sample of data that is representative of the data distribution across your data silos, split it up the same way the real data is split, and then train a model on that split dataset using a federated learning algorithm. In essence, you simulate federated learning over distributed workers on a single machine and compare the performance of a model trained on the entire dataset with the performance of the model learned in a federated way. You can run this test locally because federated learning algorithms are agnostic to where the data physically resides; they assume only that each worker model is trained on its own separate dataset. If the test results are satisfactory, you can move on to the next step.
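As an illustration of such a single-machine test, here is a minimal sketch assuming a synthetic linear-regression problem, an even split across four simulated silos, and FedAvg-style weighted averaging of locally trained weights. The data generator, the learning rate, and the helper names (`make_data`, `train_local`, `simulate`) are all hypothetical stand-ins; in a real test you would load your representative sample and split it the way your actual silos split it.

```python
import numpy as np


def make_data(n, rng, true_w):
    # Hypothetical synthetic data: linear target plus small Gaussian noise.
    X = rng.normal(size=(n, len(true_w)))
    y = X @ true_w + 0.1 * rng.normal(size=n)
    return X, y


def train_local(w, X, y, lr=0.05, epochs=5):
    # Full-batch gradient descent on mean squared error, starting from w.
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w


def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))


def simulate(num_silos=4, rounds=50, seed=0):
    rng = np.random.default_rng(seed)
    true_w = np.array([2.0, -1.0, 0.5])
    X, y = make_data(2000, rng, true_w)
    X_test, y_test = make_data(500, rng, true_w)

    # Assumption: an even split; replace with how your real data is partitioned.
    silos = list(zip(np.array_split(X, num_silos), np.array_split(y, num_silos)))
    sizes = np.array([len(ys) for _, ys in silos])

    # Baseline: one model trained on the pooled sample.
    w_central = train_local(np.zeros(3), X, y, epochs=rounds * 5)

    # Federated: each round, every silo trains locally from the global model,
    # then the server takes a size-weighted average of the resulting weights.
    w_global = np.zeros(3)
    for _ in range(rounds):
        local_ws = [train_local(w_global, Xs, ys) for Xs, ys in silos]
        w_global = np.average(local_ws, axis=0, weights=sizes)

    return mse(w_central, X_test, y_test), mse(w_global, X_test, y_test)


central_mse, federated_mse = simulate()
print(f"centralized MSE: {central_mse:.4f}, federated MSE: {federated_mse:.4f}")
```

If the federated test error is close to the centralized one on data split the way your silos really are, that is early evidence for the business case; a large gap suggests the split itself (for example, very different distributions per silo) needs attention first.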

2. Get buy-in from data owners

The second step toward adopting federated learning is to get buy-in from the data owners for a production deployment or a proof of concept. The most common concern data owners have is that the FL process will accidentally expose personally identifiable information or confidential business secrets. You can address this concern by understanding the threat model the data owners have in mind. For instance, if they are worried about the models themselves revealing information, or they do not trust the entity controlling the central server, then technical deep dives into what kind of information the models retain will be useful. If they are still not convinced, you can adopt more advanced privacy-preserving technologies such as homomorphic encryption to hide the worker models from everybody without impeding federated learning. Researchers are actively working on these issues, so consult the relevant literature when addressing specific concerns.
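To show how homomorphic encryption can hide individual worker models while still allowing aggregation, here is a toy sketch using the Paillier cryptosystem, one additively homomorphic scheme (the article does not name a specific one). Multiplying Paillier ciphertexts adds the underlying plaintexts, so a server can sum encrypted weights without seeing any single worker's values. Assumptions, all loudly hypothetical: the hard-coded primes are publicly known Mersenne primes and therefore offer no security; weights are encoded as fixed-point integers with a made-up `SCALE`; and a single party other than the server holds the private key. A real deployment would use a vetted library and proper key management, such as threshold decryption.

```python
import math
import secrets

SCALE = 10**6  # hypothetical fixed-point scale for encoding float weights


def keygen():
    # TOY ONLY: 2**61 - 1 and 2**89 - 1 are publicly known primes, so this is NOT secure.
    p, q = 2**61 - 1, 2**89 - 1
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because we use the generator g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    # With g = n + 1, g**m mod n**2 simplifies to 1 + m*n.
    return ((1 + m * n) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add(n, c1, c2):
    # Multiplying ciphertexts adds the plaintexts (mod n).
    return (c1 * c2) % (n * n)


def encode(w, n):
    return round(w * SCALE) % n  # negatives wrap around modulo n


def decode(m, n):
    return (m - n if m > n // 2 else m) / SCALE  # upper half represents negatives


def encrypted_average(worker_models, n, priv):
    # Each worker encrypts its own weights with the public key n.
    encrypted = [[encrypt(n, encode(w, n)) for w in model] for model in worker_models]
    # The server combines ciphertexts coordinate-wise without seeing plaintext.
    agg = encrypted[0]
    for model in encrypted[1:]:
        agg = [add(n, a, b) for a, b in zip(agg, model)]
    # Only the private-key holder decrypts, and only the aggregate.
    return [decode(decrypt(n, priv, c), n) / len(worker_models) for c in agg]


n, priv = keygen()
print(encrypted_average([[0.5, -1.25], [1.5, 0.25], [1.0, 0.0]], n, priv))
```

The server in this sketch only ever handles ciphertexts; the trade-off is heavier computation and the need to decide who holds the decryption key.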

Another important concern data owners often have is that federated learning could expose their data to competitors. For example, imagine your organization is a consortium of additive manufacturers and you want to build AI-based process controllers and quality assurance systems using confidential build data from members of your consortium. In such a scenario, the data owners are often worried that if they participate in federated learning, then their data could end up helping their competitors gain an advantage over them.

This is a legitimate concern, and you can tackle it in one of several ways depending on the use case. For instance, federated learning may be focused on a problem that will not result in an advantage to any of the participants but that increases the overall market penetration of the group as a whole. Consider the additive manufacturer example: The goal is to learn optimal process controllers for additive manufacturing (AM) through federated learning. AM is a relatively new and promising sub-vertical but is not yet widely adopted. The complexity of AM processes means that machine learning is necessary to optimize various steps within the manufacturing pipeline to enable wider adoption of the technology. Using federated learning to share insights from confidential data and build better optimizers would thus appeal to a hypothetical consortium of additive manufacturers.

In general, dispelling this concern around competitive advantage requires understanding the broader context of the business and then developing the FL approach in that light.

3. Build your system

Once you've completed initial testing, developed the business case, and convinced the data owners to proceed, the final step is to actually build and deploy the federated learning solution. I will outline this step now, but please be forewarned that what follows will be more subjective than what we've covered so far, as it will be colored by my own experience building a federated learning library at my organization.

In general, you have two options when deciding how to go about building and deploying a federated learning solution: adopt an existing solution or build your own. Depending on your needs and the level of expertise available within your organization, the latter option might be the better choice. To understand why, I will briefly survey the current state of affairs.

There are many federated learning libraries available for popular machine learning platforms; however, most of them were designed for research and experimentation rather than deployment. This means they are either too immature or lack the feature set necessary for a robust real-world application.

The two main exceptions, from my perspective, are the Clara framework from Nvidia and the open source FATE framework from WeBank. Clara is actually a large SDK for machine learning that targets certain parts of the healthcare industry and offers built-in federated learning functionality. If your use case fits this vertical, then Clara and the community around it will be a great place for you to start. FATE is another fairly large and feature-rich framework, originally meant for the finance industry, that implements many different types of algorithms and other privacy-preserving technologies. Depending on the scenario you are considering, FATE may also be a great option for you. However, I would encourage you to do your own research to better understand what may be suitable for you; new frameworks are being developed all the time.

If your particular problem does not fit the applications that these frameworks target, you will need to put in a fair amount of effort to understand a framework and adapt it to your use case and technology stack. So it may be preferable to build your own library or solution, which, as it turns out, is feasible for a small group of experienced software and machine learning engineers.

This is feasible because, at a high level, a federated learning solution consists of three components, each of which is within the grasp of such a team: the federated learning algorithm, the communication infrastructure, and the security infrastructure.

The federated learning algorithm specifies how the models from the workers should be combined into one global model at the server and how the workers should then integrate the global model back into their local models. There are some well-established algorithms (such as FedAvg) that generally work without requiring significant modifications, and researchers are developing new algorithms all the time. As I mentioned above, these algorithms are agnostic to the underlying communication substrate, and any machine learning engineer should be able to implement them without too much effort.
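To make the server-side combination step concrete, here is a minimal sketch of FedAvg-style aggregation: each worker's parameters are averaged, weighted by the number of local training examples that worker used. The representation of a model as a dictionary of named parameter lists, and the function name `fedavg`, are illustrative assumptions rather than any particular library's API.

```python
from typing import Dict, List, Tuple

# Hypothetical model representation: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def fedavg(updates: List[Tuple[Weights, int]]) -> Weights:
    """Weighted average of worker models; each update is (weights, num_local_examples)."""
    if not updates:
        raise ValueError("need at least one worker update")
    total = sum(num for _, num in updates)
    if total == 0:
        raise ValueError("workers reported zero training examples")

    global_weights: Weights = {}
    for name, values in updates[0][0].items():
        # Each coordinate is a sum of worker values scaled by their data share.
        global_weights[name] = [
            sum(w[name][i] * num for w, num in updates) / total
            for i in range(len(values))
        ]
    return global_weights


new_global = fedavg([
    ({"w": [1.0, 2.0], "b": [0.0]}, 1),  # worker A trained on 1 example
    ({"w": [3.0, 4.0], "b": [1.0]}, 3),  # worker B trained on 3 examples
])
print(new_global)
```

In plain FedAvg, the "integrate back" step is simple: each worker overwrites its local parameters with the new global ones before its next round of local training. Weighting by example count keeps a silo with little data from pulling the global model as hard as a large one.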

The communication infrastructure is necessary to pass models from the workers to the server and then back again securely and reliably. If the number of potential workers/data owners is at consortium scale (in the thousands, which is the case for many of the examples I mentioned in the first article), then the communication infrastructure can be implemented using a web server, which experienced software engineers should be able to build.
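As a rough illustration of the web-server approach, here is a minimal sketch built only on Python's standard library. Workers download the current global model with `GET /model` and submit their locally trained weights with `POST /update`; once the expected number of workers has reported for the current round, the server averages the updates weighted by example count and advances the round. The endpoint paths, the JSON message fields (`round`, `weights`, `num_examples`), and the `FLState` class are all hypothetical. For brevity this sketch speaks plain HTTP and has no authentication, retries, or persistence; a real deployment would run behind HTTPS and handle workers that drop out mid-round.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class FLState:
    """Shared server state for one federated training job (hypothetical design)."""

    def __init__(self, initial_weights, expected_workers):
        self.lock = threading.Lock()
        self.weights = list(initial_weights)
        self.expected_workers = expected_workers
        self.round = 0
        self.pending = []  # (weights, num_examples) received this round

    def aggregate(self):
        # Example-count-weighted average; caller must hold self.lock.
        total = sum(num for _, num in self.pending)
        self.weights = [
            sum(w[i] * num for w, num in self.pending) / total
            for i in range(len(self.weights))
        ]
        self.pending = []
        self.round += 1


def make_handler(state):
    class Handler(BaseHTTPRequestHandler):
        def _send_json(self, code, payload):
            body = json.dumps(payload).encode()
            self.send_response(code)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def do_GET(self):
            if self.path != "/model":
                return self._send_json(404, {"error": "not found"})
            with state.lock:  # copy a consistent snapshot, then reply
                snapshot = {"round": state.round, "weights": list(state.weights)}
            self._send_json(200, snapshot)

        def do_POST(self):
            if self.path != "/update":
                return self._send_json(404, {"error": "not found"})
            length = int(self.headers.get("Content-Length", 0))
            msg = json.loads(self.rfile.read(length))
            with state.lock:
                if msg.get("round") != state.round:
                    status, reply = 409, {"error": "stale round", "round": state.round}
                else:
                    state.pending.append((msg["weights"], msg["num_examples"]))
                    if len(state.pending) == state.expected_workers:
                        state.aggregate()
                    status, reply = 202, {"accepted": True}
            self._send_json(status, reply)

        def log_message(self, *args):
            pass  # silence default per-request logging

    return Handler


if __name__ == "__main__":
    server = ThreadingHTTPServer(("127.0.0.1", 8000), make_handler(FLState([0.0, 0.0], 2)))
    server.serve_forever()
```

Rejecting updates tagged with an old round number is one simple way to keep a slow worker from mixing stale weights into a newer global model.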

Security infrastructure is the final component. The requirements for this vary across applications, but a standard version involves ensuring that communication between the server and the workers is secure and that the workers are properly authenticated and managed. This, again, can be accomplished with standard technologies: HTTPS (the protocol that secures websites) for the communication, and digital signatures for authentication. Both should be straightforward for experienced software engineers to implement.
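The authentication side can be sketched as follows. Note the swapped-in technique: Python's standard library has no asymmetric digital signatures, so this sketch uses HMAC, a shared-secret message authentication code, to show the same workflow of enrolling workers, tagging each update, verifying tags at the server, and revoking misbehaving workers. With real digital signatures the server would store only each worker's public key rather than a shared secret. The `WorkerRegistry` class, the message layout, and the helper names are hypothetical, and transport encryption (HTTPS/TLS) is assumed to be handled separately.

```python
import hashlib
import hmac
import json
import secrets


def canonical(payload):
    # Deterministic serialization so worker and server tag identical bytes.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()


def tag(key, payload):
    return hmac.new(key, canonical(payload), hashlib.sha256).hexdigest()


def sign_update(worker_id, key, payload):
    # Worker side: attach an HMAC tag (a stand-in for a digital signature).
    return {"worker_id": worker_id, "payload": payload, "tag": tag(key, payload)}


class WorkerRegistry:
    """Server-side registry of enrolled workers (hypothetical design)."""

    def __init__(self):
        self._keys = {}
        self._revoked = set()

    def enroll(self, worker_id):
        key = secrets.token_bytes(32)
        self._keys[worker_id] = key
        return key  # delivered to the worker over a secure out-of-band channel

    def revoke(self, worker_id):
        self._revoked.add(worker_id)

    def verify(self, message):
        worker_id = message.get("worker_id")
        key = self._keys.get(worker_id)
        if key is None or worker_id in self._revoked:
            return False
        # Constant-time comparison avoids leaking how many characters matched.
        return hmac.compare_digest(tag(key, message["payload"]), message["tag"])


registry = WorkerRegistry()
key = registry.enroll("silo-a")
update = sign_update("silo-a", key, {"round": 0, "weights": [1.0, 2.0]})
print(registry.verify(update))
```

Including the round number in the tagged payload means a captured update cannot simply be replayed in a later round without the server noticing the mismatch.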

An important issue I have glossed over is the process of actually building the machine learning model. This typically requires a lot of data analysis and exploration, setting up various pipelines, trying different model architectures and training approaches, and so on. All of these tasks become more challenging in a federated setting. In the best-case scenario, you can gather some data from across the silos at a central location and then proceed as usual to create the model architecture and data pipeline. You can then train the production model using federated learning. If this is not possible, then techniques such as federated data analysis and federated neural architecture search are possible options. Overall, federated model development is another big topic that warrants an article of its own.
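As one small example of federated data analysis, here is a sketch of computing a global mean and variance without pooling raw records: each silo shares only a count, a mean, and a sum of squared deviations, and the coordinator merges these summaries with the pairwise update of Chan et al. The `Summary` type and helper names are hypothetical. Keep in mind that even aggregates can leak information when a silo is very small, so in practice you may need minimum-count thresholds or added noise.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Summary:
    count: int
    mean: float
    m2: float  # sum of squared deviations from the mean


def local_summary(values: Iterable[float]) -> Summary:
    # Computed inside each silo; only this summary leaves the silo.
    vals = list(values)
    if not vals:
        return Summary(0, 0.0, 0.0)
    mean = sum(vals) / len(vals)
    return Summary(len(vals), mean, sum((v - mean) ** 2 for v in vals))


def merge(a: Summary, b: Summary) -> Summary:
    # Pairwise combination of two summaries (Chan et al.'s parallel update).
    if a.count == 0:
        return b
    if b.count == 0:
        return a
    n = a.count + b.count
    delta = b.mean - a.mean
    mean = a.mean + delta * b.count / n
    m2 = a.m2 + b.m2 + delta**2 * a.count * b.count / n
    return Summary(n, mean, m2)


def federated_stats(summaries: List[Summary]) -> Tuple[float, float]:
    total = Summary(0, 0.0, 0.0)
    for s in summaries:
        total = merge(total, s)
    variance = total.m2 / (total.count - 1) if total.count > 1 else 0.0
    return total.mean, variance  # global mean and sample variance


silos = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
print(federated_stats([local_summary(s) for s in silos]))
```

The same pattern, where each silo computes mergeable summary statistics locally, extends to histograms, feature ranges, and missing-value counts, which is often enough to design a preprocessing pipeline before any federated training begins.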

Hopefully this two-part series has whetted your appetite for adopting federated learning at your organization and will serve as a useful starting point.

M M Hassan Mahmud is a Senior AI and Machine Learning Technologist at Digital Catapult, with a background in machine learning within academia and industry.

VentureBeat is always looking for insightful guest posts related to enterprise data technology and strategy.
