Why generative AI legal battles are brewing | The AI Beat

This morning, the New York Times’ Kevin Roose called what has been a big week for generative AI tools a “coming out” party [subscription required].

He detailed an actual party, on Monday night, which celebrated a massive funding round for Stability AI, the startup behind Stable Diffusion, the uber-popular image-generating algorithm that was only launched publicly two months ago.

But this week was chock-full of other significant news around generative AI (which refers to using unsupervised learning algorithms to learn from existing text, audio or images and create new content – and now includes popular tools such as GPT-3, DALL-E 2 and Imagen, as well as nascent text-to-video options from Meta and Google).

There was the news that Microsoft would add DALL-E to its Office suite and to Azure AI, while Adobe was planning to add generative AI tools to Photoshop and also committed to transparency in its use of generative AI. Then, besides Stability AI’s news, content generator Jasper also announced a massive funding round of $125 million, solidifying VC interest in the generative AI space.

There was also passionate online chatter about GitHub user Matthew Butterick, who claims GitHub Copilot, a generative AI tool that suggests computer code to developers, used his source code as training data. He is starting an investigation into Copilot with the intention of eventually starting a class-action lawsuit against GitHub and its parent company Microsoft.

This firehose of news (and I won’t even get into the whole “students are writing their papers with generative AI” thing) is leading to brewing legal battles far beyond GitHub Copilot, according to Bradford Newman, who leads the machine learning and AI practice of global law firm Baker McKenzie, in its Palo Alto office.

Will 'fair use' questions go to the Supreme Court?

I spoke to Newman back in August around issues of DALL-E image ownership – that is, the output of generative AI. Now, he said, legal questions are coming fast and furious around copyright and the “fair use” of the input – that is, the training data going into generative AI tools.

“Legally, right now, there is little guidance,” he warned. “There are the inevitable class actions, but the net net of it all is when you’re using the massive data sets that these AI applications are and you sprinkle on top of that open source licenses [as in the GitHub Copilot example], the arguments are going to be fair use versus infringement.”

Different courts, he predicted, will come to different conclusions. “Ultimately, I believe this is going to go to the Supreme Court.”

Newman isn’t the only one who thinks so: Legal scholar Andres Guadamuz, a reader in intellectual property law at the University of Sussex in the UK who has been studying legal issues around generative AI, said as much in a blog post this week – though he cautioned the legal battles could drag on for years.

The GitHub Copilot case, he said in the blog post, is “starting to look like the very first case dealing specifically with machine learning and fair use in the US.”

If it goes ahead, it could be the very first to test that theory, though he said he wouldn’t bet on a result. But one thing is clear, he emphasized: “If this case goes ahead it will take years, any lower court decision will be appealed, and the appeals could make it all the way to the US Supreme Court. So we’re talking years and years of uncertainty.”

Millions of images used for generative AI training data

Some copyright issues related to generative AI may well end up being straightforward, Newman pointed out. For example, Greg Rutkowski, a Polish digital artist, is said to have become a more popular prompt in Stable Diffusion than Picasso. A one-to-one situation like that, Newman said, may have a legal leg to stand on as far as copyright infringement.

But what if an AI or ML application is trained on millions of images of, say, Paris? Some of those images are clearly copyrighted, but how could anyone prove that their image, even if it were tagged in some way, led directly to the AI’s output?

“The laws weren’t really written for artificial intelligence,” said Newman. “Maybe we're going to need new IP laws to cover AI, which is something I've been shouting about for years – I believe we need new laws to account for these issues, which are about humans wanting money for things machines do.”

But Guadamuz, in his blog post, had a different take. “I agree in principle that there is no direct case law dealing with fair use in training an AI,” he said. “However, there is a good argument to be made that training data is fair use, and this includes Author’s Guild v Google, and the aforementioned Google v Oracle. It is true that this is not decided, and as with the first assumption, a court case could easily go in favor of those claiming copyright infringement, but I don’t think that it’s a slam dunk argument by any stretch of the imagination.”

The legal future of generative AI is uncertain

Just to make things even more interesting, Guadamuz pointed out that other countries have already enacted legislation that says training machine learning models is legal. The UK has had a text and data mining exception to copyright for research purposes since 2014, he wrote, and the EU passed the Digital Single Market Directive in 2019, which contains an exception for text and data mining “for all purposes as long as the author has not reserved their right.”

“The practical result of the existence of these provisions is that while litigation is ongoing in the US, most data mining and training operations will move to Europe, and US AI companies could just license the trained models,” he said. “The result would be to place the US at a disadvantage in the AI arms race.”

The bottom line, Newman said, is that as passionate as many are on either side of the generative AI legal debate, “no one knows the answer” to these complex questions. At the end of the day, he explained, “The courts are going to have to figure it out.”

Meanwhile, Stable Diffusion, and many others, will party on.
