AI Weekly: Facial recognition, deepfakes, privacy, and jobs automation defined 2019

As the year draws to a close, it's instructive to look back at the months preceding to see what the future has in store. History is cyclical in nature, and this is true of the field of AI. Consider that backpropagation, an algorithm widely used in the training of machine learning systems, appeared as a theory as early as 1980, but it wasn't until the 2010s that it returned to prominence, thanks in part to cheap, powerful graphics card-based machines.

This year, four key issues in AI and machine learning rose to the fore: facial recognition; deepfakes and self-censorship in academia; privacy; and automation. In anticipation of 2020, here's a look back at some of the issues that defined the industry in 2019.

Facial recognition

Facial recognition found itself in the news this year perhaps more than any other application of AI.

In early January, a team of MIT scientists alleged that Amazon Web Services' facial recognition and analysis platform -- Rekognition -- distinguished gender among certain ethnicities less accurately than did competing solutions. Specifically, it failed to determine the sex of female and darker-skinned faces in select scenarios, mistakenly identifying pictures of women as men and darker-skinned women as men 19% and 31% of the time, respectively.

Amazon's disputations aside, the study presciently spotlighted the types of biases to which AI can easily become susceptible. Research published by the National Institute of Standards and Technology (NIST) just last week found that, when conducting a particular type of database search, a number of facial recognition algorithms falsely identified black and Asian faces 10 to 100 times more often than Caucasian faces.

Beyond the bias problem, facial recognition technology's scalability makes it ripe for abuse. This year, the NYPD ran a picture of actor Woody Harrelson through a facial recognition system because officers thought the suspect seen in drug store camera footage resembled the actor. We learned how China employs facial recognition to track the movements of its Uighur Muslim population. And AnyVision, a startup based outside of Tel Aviv, has come under scrutiny following reports that its products are used to watch Palestinians living in the West Bank.

A growing number of activists, academics, and lawmakers have called for restrictions or outright bans on facial recognition technology. This fall, California imposed a three-year moratorium on facial recognition use in law enforcement body cameras, and in May, San Francisco banned facial recognition use by police or city departments. Oakland followed suit in June, after which Berkeley passed a ban of its own. And in two House Oversight and Reform committee hearings last summer, some of the most prominent Republicans and Democrats in the U.S. Congress joined together in proposals for legislative reform, following the introduction of the Commercial Facial Recognition Privacy Act of 2019, which would require businesses to receive consent before using facial recognition software.

Given the fierceness of the debate in Congress, academia, statehouses, and public forums, it's fair to say that facial recognition was and will remain a hot-button topic.

Self-censorship and deepfakes

In a break from academic norms, OpenAI in February opted not to make public the corpus used to train its state-of-the-art natural language processing model, known as GPT-2, nor the training code that accompanied it. In a blog post justifying its decision, OpenAI expressed concern that they might be used to generate synthetic financial news about specific companies, for instance, or screeds of racist or sexist text and fake reviews on sites like Amazon or Yelp.

OpenAI subsequently released several smaller and less complex versions of GPT-2 and studied their reception as well as the data sets on which they were trained. After concluding that there was "no strong evidence" of misuse, it published the full model -- which was trained on eight million text documents scraped from the web -- last month.

Critics of OpenAI's decision argued that the firm exaggerated the danger posed by its work, and that it inadvertently stoked mass hysteria about AI and machine learning in the process. They also asserted that OpenAI disadvantaged researchers by depriving them of access to breakthrough AI techniques, and that it effectively prevented the research community from identifying faults in GPT-2 or coming up with potential countermeasures.

They have a point, but OpenAI's fears weren't entirely unfounded. Deepfakes, or media that takes a person in an existing image, audio recording, or video and replaces them with someone else’s likeness using AI, multiplied quickly in 2019. Deeptrace found 14,698 deepfake videos on the internet during its most recent tally in June and July, up 84% from last December. That's troubling not only because deepfakes might be used to sway public opinion during an election or to implicate someone in a crime they didn't commit, but because they've already been used to produce pornographic material and to swindle companies out of hundreds of millions of dollars.

Tech giants including Facebook, Microsoft, and Amazon have teamed up with academic partners including MIT and Cornell to help fight the spread of AI-originated misleading media, but OpenAI's hesitancy to release its model is a bellwether of the challenges ahead. Indeed, Experian predicts that in 2020, cyber criminals will use AI technology to disrupt commercial enterprises' operations and create geopolitical confusion among nations.

Privacy

For all the good they've done, AI and machine learning algorithms have a major privacy problem.

The Royal Free London NHS Foundation Trust, a division of the U.K.'s National Health Service based in London, provided Alphabet's DeepMind with data on 1.6 million patients without their consent. Google (whose health data-sharing partnership with Ascension became the subject of scrutiny in November) abandoned plans to publish scans of chest X-rays over concerns that they contained personally identifiable information. This past summer, Microsoft quietly removed a data set (MS Celeb) with more than 10 million images of people after it was revealed that some weren't aware they had been included. And ImageNet, an open source library commonly used to train computer vision algorithms, was revealed to have at some point contained depictions of intimate acts scraped from Google, Flickr, and elsewhere.

Separately, tech giants including Apple and Google have been the subject of reports uncovering the potential misuse of recordings collected to improve assistants like Siri and Google Assistant. In April, Bloomberg revealed that Amazon employs contract workers to annotate thousands of hours of audio from Alexa-powered devices, prompting the company to roll out user-facing tools that quickly delete cloud-stored data.

That's all problematic given that increasingly, privacy isn't merely a question of philosophy but table stakes in the course of business. Laws at the state, local, and federal levels aim to make privacy a mandatory part of compliance management. Hundreds of bills that address privacy, cybersecurity, and data breaches are pending or have already been passed in 50 U.S. states, territories, and the District of Columbia. Arguably the most comprehensive of them all, the California Consumer Privacy Act was signed into law roughly two years ago. That's not to mention the Health Insurance Portability and Accountability Act (HIPAA), which requires companies to seek authorization before disclosing individual health information.

In response, Google and others have released libraries such as TensorFlow Privacy and PySyft for machine learning frameworks including TensorFlow and PyTorch, which provide strong privacy guarantees with techniques like differential privacy. Simultaneously, they've pursued techniques including federated learning, which trains AI across decentralized devices or servers (i.e., nodes) holding data samples without exchanging those samples, and homomorphic encryption, a form of cryptography that enables computation on ciphertexts, or plaintext (file contents) that has been encrypted using an algorithm. And on the fully managed services side of the equation, tech giants like Amazon have moved to make their offerings comply with regulations like HIPAA.
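To make the federated learning idea concrete, here is a minimal sketch in plain Python. It is not how TensorFlow Privacy, PySyft, or any production system implements the technique; it assumes a toy one-parameter model (y = w * x), and every name in it (local_update, federated_average, the sample data) is hypothetical and chosen only for illustration. Each node trains on its own samples and shares only its updated weight, which a coordinator averages in proportion to the node's sample count.

```python
# Toy federated averaging sketch for a one-parameter linear model y = w * x.
# All function names and data below are hypothetical, for illustration only.

def local_update(w, samples, lr=0.01, epochs=5):
    """Run a few gradient-descent steps on one node's private samples."""
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in samples) / len(samples)
        w -= lr * grad
    return w


def federated_average(w_global, nodes, rounds=50):
    """Each round, nodes train locally and send back only their weight."""
    for _ in range(rounds):
        local = [(local_update(w_global, s), len(s)) for s in nodes]
        total = sum(n for _, n in local)
        # Average the local weights, weighted by each node's sample count.
        w_global = sum(w * n for w, n in local) / total
    return w_global


# Three nodes, each holding private samples generated from y = 3x.
# The raw (x, y) pairs never leave the node in this sketch.
nodes = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0)],
    [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)],
]

w = federated_average(0.0, nodes)
print(w)  # should land very close to 3
```

Note that sharing weights instead of raw samples does not by itself guarantee privacy, which is one reason the technique is often discussed alongside differential privacy and encryption.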

Automation

While fears of job-stealing AI might have been overblown, automation is eroding the need for human labor.

A McKinsey Global Institute report published earlier this year found that women predominate in occupations that will be adversely changed by AI and machine learning. About 40% of jobs where men make up the majority in the 10 economies contributing over 60% of GDP collectively could be displaced by automation by 2030, compared with 52% of women-dominated jobs with high automation potential.

These findings jibe with a March 2019 report from the U.K. Office for National Statistics (ONS), which found that 10% of the U.K.'s workforce (about 1.5 million workers) occupy jobs that are at "high risk" of automation. ONS forecasted that service workers -- chiefly waiters and waitresses, retail inventory restockers, and entry-level salespeople -- would be disproportionately affected, as well as those in agricultural, automotive, and service industries. And the department predicted that women, who in 2017 held 70.2% of high-risk jobs, would bear the brunt of the coming labor market shifts.

Whether they take up new work or acquire new skills in their current fields, it's anticipated that tens of millions of workers will have to make some sort of occupational transition by 2030. Forrester found that automation could eliminate 10% of U.S. jobs in the coming months. And the World Economic Forum, PricewaterhouseCoopers, McKinsey Global Institute, and Gartner have forecast that AI could make redundant as many as 75 million jobs by 2025.

Perhaps unsurprisingly, various forms of universal basic income, such as regular payments to citizens regardless of income, have the endorsements of luminaries such as Richard Branson and Elon Musk. U.S. presidential candidate Andrew Yang made it a central part of his campaign for the Democrats' nomination -- he asserts that payments furnished by a value-added tax could kick-start economic development in regions of the U.S. that haven't benefited from a wellspring of venture capital. As for Bill Gates, he's suggested imposing a "robot tax," whereby the government would extract a fee every time a business replaces an employee with automated software or machines.

Looking ahead

The challenges with AI are formidable. Facial recognition remains a potent and largely unregulated application of machine learning that's enhancing -- and in some cases creating -- surveillance states. Deepfakes weigh heavily on tech companies and academics, along with the general public. Definitive solutions to the privacy questions in AI are elusive. And no matter whether workers reskill, automation is predicted to impact the livelihoods of millions.

What answers might 2020 hold? Tough to say. But for all the dilemmas posed by AI, it's effected enormous positive change. AI this year achieved the state of the art in protein folding, which could pave the way for new therapies and medications. Various implementations of machine learning are being used to tackle global climate change. And AI has allowed people with speech and hearing impediments to use products that were previously unavailable to them.

As with any paradigm shift, there's invariably some bad with the good. The industry's task -- and indeed, our task -- is doing all within its power to advance the latter at the former's expense.

For AI coverage, send news tips to Khari Johnson and Kyle Wiggers and AI editor Seth Colaner — and be sure to subscribe to the AI Weekly newsletter and bookmark our AI Channel.

Thanks for reading,

Kyle Wiggers

AI Staff Writer