The AI industry is built on geographic and social inequality, research shows

Global inequality has a long reach, and it is particularly visible in the development of AI and machine learning systems. In a recent paper, researchers at Cornell, the Université de Montréal, the National Institute of Statistical Sciences (U.S.), and Princeton argue that this inequality in the AI industry involves a concentration of profits and raises the danger of ignoring the contexts to which AI is applied.

As AI systems become increasingly ingrained in society, they said, those responsible for developing and implementing such systems stand to profit substantially. And if these players are predominantly located in economic powerhouses like the U.S., China, and the E.U., a disproportionate share of economic benefit will fall within these regions, exacerbating the inequality.

Whether explicitly in response to this inequality or not, calls have been made for broader inclusion in the development of AI. At the same time, some have acknowledged the limitations of inclusion. For example, in an analysis of publications at two major machine learning conference venues, NeurIPS 2020 and ICML 2020, none of the top 10 countries in terms of publication index were located in Latin America, Africa, or Southeast Asia, the coauthors of this new study note. Moreover, the full lists of the top 100 universities and top 100 companies by publication index included no companies or universities based in Africa or Latin America.

This inequality manifests in part in data collection. Previous research has found that ImageNet and OpenImages, two large, publicly available image datasets, are U.S.- and Euro-centric. Models trained on these datasets perform worse on images from Global South countries. For example, images of grooms are classified with lower accuracy when they come from Ethiopia and Pakistan than when they come from the United States. In the same vein, because images associated with words like "wedding" or "spices" look distinctly different across cultures, publicly available object recognition systems fail to correctly classify many of these objects when they come from the Global South.

Labels, the annotations from which AI models learn relationships in data, also bear the hallmarks of inequality. A major venue for crowdsourcing labeling work is Amazon Mechanical Turk, but less than an estimated 2% of Mechanical Turk workers come from the Global South, with the vast majority originating from the U.S. and India. Not only are the tasks monotonous and the wages low -- on Samasource, another crowdsourcing platform, workers earn around $8 a day -- but a number of barriers to participation exist. A computer and reliable internet connection are required, and on Amazon Mechanical Turk, U.S. bank accounts and gift cards are the only forms of payment.

As the researchers point out, ImageNet, which has been essential to recent progress in computer vision, wouldn't have been possible without the work of data labelers. But the ImageNet workers themselves made a median wage of $2 per hour, with only 4% making more than the U.S. federal minimum wage of $7.25 per hour -- itself a far cry from a living wage.

"As [a] significant part of the data collection pipeline, data labeling is an extremely low-paying job involving rote, repetitive tasks that offer no room for upward mobility," the coauthors wrote. "Individuals may not require many technical skills to label data, but they do not develop any meaningful technical skills either. The anonymity of platforms like Amazon's Mechanical Turk inhibit the formation of social relationships between the labeler and the client that could otherwise have led to further educational opportunities or better remuneration. Although data is central to the AI systems of today, data labelers receive only a disproportionately tiny portion of the profits of building these systems."

The coauthors also find inequality in the AI research labs established by tech giants like Google, Microsoft, Facebook, and others. Despite these centers' presence throughout the Global South, they tend to be concentrated in certain countries, especially India, Brazil, Ghana, and Kenya. And the positions there often require technical expertise that the local population might not have, as illustrated by AI researchers' and practitioners' tendency to work and study in places outside of their home countries. The coauthors cite a recent report from Georgetown University's Center for Security and Emerging Technology that found that while 42 of the 62 major AI labs are located outside of the U.S., 68% of the staff are located within the United States.

"Even with long-term investment into regions in the Global South, the question remains of whether local residents are provided opportunities to join management and contribute to important strategic decisions," the coauthors wrote. "True inclusion necessitates that underrepresented voices can be found in all ranks of a company's hierarchy, including in positions of upper management. Tech companies which are establishing a footprint in these regions are uniquely positioned to offer this opportunity to natives of the region."

The coauthors are encouraged by the efforts of organizations like Khipu and Black in AI, which have identified students, researchers, and practitioners in the field of AI and helped increase the number of Latin American and Black scholars attending and publishing at premier AI conferences. Other communities based on the African continent, like Data Science Africa, Masakhane, and Deep Learning Indaba, have expanded their efforts with conferences, workshops, and dissertation awards and developed curricula for the wider African AI community.

Even so, the coauthors say a key component of future inclusion efforts should be to elevate the involvement and participation of those historically excluded from AI development. Currently, they argue, data labelers are often wholly detached from the rest of the machine learning pipeline, with workers often not knowing how their labor will be used or for what purpose. The coauthors say these workers should be provided with education opportunities that allow them to contribute to the models they are building in ways beyond labeling.

"Little sense of fulfillment comes from menial tasks [like labeling], and by exploiting these workers solely for their produced knowledge without bringing them into the fold of the product that they are helping to create, a deep chasm exists between workers and the downstream product," the coauthors wrote. "Similarly, where participation in the form of model development is the norm, employers should seek to involve local residents in the ranks of management and in the process of strategic decision-making."

While acknowledging that it isn't an easy task, the coauthors suggest embracing AI development as a path forward for economic development. Rather than relying upon foreign spearheading of AI systems for domestic application, where returns from these systems often aren't reinvested domestically, they encourage countries to create domestic AI development activity focused on "high-productivity" activities like model development, deployment, and research.

"As the development of AI continues to progress across the world, the exclusion of those from communities most likely to bear the brunt of algorithmic inequity only stands to worsen," the coauthors wrote. "We hope the actions we propose can help to begin the movement of communities in the Global South from being just beneficiaries or subjects of AI systems to being active, engaged participants. Having true agency over the AI systems integrated into the livelihoods of communities in the Global South will maximize the impact of these systems and lead the way for global inclusion of AI."