Meta and Google news adds fuel to the open-source AI fire

The open-source AI debate is getting even hotter in Big Tech, thanks to recent headlines from Google and Meta.

On Tuesday evening, CNBC reported that Google's newest large language model (LLM) PaLM 2 "uses nearly five times more text data for training than its predecessor," even though when it announced the model last week, Google said it was smaller than the earlier PaLM but uses a more efficient “technique.” The article emphasized that "the company has been unwilling to publish the size or other details of its training data."

While a Google spokesperson declined to comment on the CNBC reporting, Google engineers were, to put it mildly, pissed off by the leak and eager to share their thoughts. In a now-removed tweet, Dmitry (Dima) Lepikhin, a senior staff software engineer at Google DeepMind, wrote: "whoever leaked PaLM2 details to cnbc, sincerely fuck you!"

And Alex Polozov, a senior staff research scientist at Google, also weighed in with what he called a "rant," pointing out that the leak sets a precedent for increased siloing of research.

Lucas Beyer, a Google AI researcher in Zurich, agreed, tweeting: "It's not the token count (which I don't even know if it's correct) that upsets me, it's the complete erosion of trust and respect. Leaks like this lead to corpspeak and less openness over time, and an overall worse work/research environment. And for what? FFS."

Meta's LeCun: "The platform that will win will be the open one"

Coincidentally, and not in response to the Google leak, Meta chief AI scientist Yann LeCun gave the New York Times an interview focusing on Meta's open-source AI efforts; the piece was published this morning.

The piece describes Meta's release of its LLaMA large language model in February as "giving away its AI crown jewels" — since it released the model's source code to "academics, government researchers and others who gave their email address to Meta [and could then] download the code once the company had vetted the individual."

“The platform that will win will be the open one,” LeCun said in the interview, later adding that the growing secrecy at Google and OpenAI is a “huge mistake” and a “really bad take on what is happening.”

In a Twitter thread, VentureBeat journalist Sean Michael Kerner pointed out that Meta "actually already gave away one of the most critical AI/ML tools ever created — PyTorch. The foundational stuff needs to be open/and it is. After all, where would OpenAI be without PyTorch?"

Meta's take on open source is nuanced

But even Meta and LeCun will only go so far in terms of openness. For example, Meta made LLaMA’s model weights available to academics and researchers on a case-by-case basis (including Stanford for its Alpaca project), but those weights were subsequently leaked on 4chan. It was that leak, not the Meta release, that actually allowed developers around the world to fully access a GPT-level LLM for the first time; Meta's release did not make the LLaMA model available for commercial use.

VentureBeat spoke to Meta last month about the nuances of its take on the open- vs. closed-source debate. Joelle Pineau, VP of AI research at Meta, said in our interview that accountability and transparency in AI models are essential.

"More than ever, we need to invite people to see the technology more transparently and lean into transparency," she said, explaining that the key is to balance the level of access, which can vary depending on the potential harm of the model.

“My hope, and it’s reflected in our strategy for data access, is to figure out how to allow transparency for verifiability audits of these models,” she said.

On the other hand, she said that some levels of openness go too far. “That’s why the LLaMA model had a gated release,” she explained. “Many people would have been very happy to go totally open. I don’t think that’s the responsible thing to do today.”

LeCun remains outspoken on AI risks being overblown

Still, LeCun remains outspoken in favor of open-source AI, and in the New York Times interview argued that the dissemination of misinformation on social media is more dangerous than the latest LLM technology.

“You can’t prevent people from creating nonsense or dangerous information or whatever,” he said. “But you can stop it from being disseminated.”

And while Google and OpenAI may become more closed with their AI research, LeCun insisted that he and Meta remain committed to open source, saying "progress is faster when it is open."