LinkedIn open-sources toolkit to measure AI model fairness

LinkedIn today released the LinkedIn Fairness Toolkit (LiFT), an open source software library designed to enable the measurement of fairness in AI and machine learning workflows. The company says LiFT can be deployed during training and scoring to measure biases in training data sets, and to evaluate notions of fairness for models while detecting differences in their performance across subgroups.

There are countless definitions of fairness in AI, each capturing different aspects of fairness to users. Monitoring models along these definitions is a step toward ensuring fair experiences, but although several toolkits tackle fairness-related challenges, most don't address large-scale problems and are tied to specific cloud environments.

By contrast, LiFT can be leveraged for ad hoc fairness analysis or as a part of any large-scale A/B testing system. It's usable for exploratory analysis and in production, with bias measurement components that can be integrated into stages of a machine learning training and serving system. Moreover, it introduces a novel metric-agnostic testing framework that can detect statistically significant differences in performance as measured across different subgroups.
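To make the idea of metric-agnostic significance testing concrete, here is a minimal sketch in plain Python. It assumes a permutation test, one common way to check whether a gap in any metric between two subgroups is larger than chance would produce. The function names (`accuracy`, `permutation_test`) and their parameters are hypothetical, chosen for illustration; this is not LiFT's API.

```python
import random


def accuracy(labels, preds):
    """Fraction of predictions that match their labels."""
    return sum(1 for y, p in zip(labels, preds) if y == p) / len(labels)


def permutation_test(group_a, group_b, metric, n_permutations=1000, seed=0):
    """Two-sided permutation test on the gap in `metric` between two groups.

    group_a, group_b: lists of (label, prediction) pairs (hypothetical layout).
    metric: any function taking (labels, preds) and returning a number.
    Returns (observed absolute difference, estimated p-value).
    """
    rng = random.Random(seed)

    def score(rows):
        labels = [row[0] for row in rows]
        preds = [row[1] for row in rows]
        return metric(labels, preds)

    observed = abs(score(group_a) - score(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        # Shuffling breaks any real link between group membership and outcome.
        rng.shuffle(pooled)
        diff = abs(score(pooled[:n_a]) - score(pooled[n_a:]))
        if diff >= observed:
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value
```

Because the metric is passed in as a function, the same test can be reused for accuracy, precision, or any other per-group score; a small p-value suggests the observed gap between subgroups is unlikely to be due to chance alone.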

LiFT is reusable, LinkedIn says, with wrappers and a configuration language intended for deployment. At the highest level, the library provides a basic driver program powered by a simple configuration, enabling fairness measurement for data sets and models without the need to write code and related unit tests. But LiFT also provides access to higher-level and lower-level APIs that can be used to compute fairness metrics at all levels of granularity, with the ability to extend key classes to enable custom computation.

To achieve scalability, LiFT taps Apache Spark, loading data sets into an organized database with only the primary key, labels, predictions, and protected attributes. Data distributions are computed and stored in memory on a single system to speed up subsequent fairness metric computations; users can operate on these distributions or work with cached data sets for more involved metrics.
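The idea of computing a compact distribution once and deriving metrics from it can be sketched as follows. This is an illustrative example in plain Python rather than Spark, and every name in it (`label_distribution`, `positive_rates`, `demographic_parity_gap`) is hypothetical, not part of LiFT. It assumes binary labels and uses the gap in positive-label rates between groups as one simple example of a fairness metric.

```python
from collections import Counter


def label_distribution(rows):
    """Count occurrences of each (protected_attribute, label) pair.

    rows: iterable of (protected_attribute, label) tuples (assumed layout).
    The resulting Counter is small enough to hold in memory even when
    the underlying data set is large.
    """
    return Counter(rows)


def positive_rates(dist):
    """Share of positive labels (label == 1) within each group."""
    totals = Counter()
    positives = Counter()
    for (group, label), count in dist.items():
        totals[group] += count
        if label == 1:
            positives[group] += count
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_gap(dist):
    """Largest difference in positive-label rate between any two groups."""
    rates = positive_rates(dist)
    return max(rates.values()) - min(rates.values())
```

Once the distribution is built, further metrics can be computed from the counts alone without rescanning the full data set, which is the kind of saving the in-memory distributions are meant to provide.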

To date, LinkedIn says it has applied LiFT internally to measure fairness metrics on training data sets before models are trained. In the future, the company plans to increase the number of pipelines where it's measuring and mitigating bias on an ongoing basis through deeper integration of LiFT.

"News headlines and academic research have emphasized that widespread societal injustice based on human biases can be reflected both in the data that is used to train AI models and the models themselves. Research has also shown that models affected by these societal biases can ultimately serve to reinforce those biases and perpetuate discrimination against certain groups," LinkedIn senior software engineer Sriram Vasudevan, machine learning engineer Cyrus DiCiccio, and staff applied researcher Kinjal Basu wrote in a blog post. "We are working toward creating a more equitable platform by avoiding harmful biases in our models and ensuring that people with equal talent have equal access to job opportunities."