Senators send letter questioning Mark Zuckerberg over Meta's LLaMA leak

Two U.S. Senators sent a letter today to Meta CEO Mark Zuckerberg that questions the leak of Meta's popular open-source large language model LLaMA, saying they are concerned about the "potential for its misuse in spam, fraud, malware, privacy violations, harassment, and other wrongdoing and harms."

Senator Richard Blumenthal (D-CT), chair of the Senate's Subcommittee on Privacy, Technology, & the Law, and Senator Josh Hawley (R-MO), its ranking member, wrote that "we are writing to request information on how your company assessed the risk of releasing LLaMA, what steps were taken to prevent the abuse of the model, and how you are updating your policies and practices based on its unrestrained availability."

The subcommittee is the same one that questioned OpenAI CEO Sam Altman, AI critic Gary Marcus and IBM chief privacy and trust officer Christina Montgomery at a Senate hearing about AI rules and regulation on May 16.

Letter points to Meta's LLaMA release in February

The letter points to LLaMA's release in February, saying that Meta released LLaMA for download by approved researchers, "rather than centralizing and restricting access to the underlying data, software, and model."

The letter continues: "While LLaMA was reportedly trained on public data, it differed from past models available to the public based on its size and sophistication. Regrettably, but predictably, within days of the announcement, the full model appeared on BitTorrent, making it available to anyone, anywhere in the world, without monitoring or oversight. The open dissemination of LLaMA represents a significant increase in the sophistication of the AI models available to the general public, and raises serious questions about the potential for misuse or abuse."

Calling out the LLaMA leak seems to be a swipe at the open source community, which has been having both a moment and a red-hot debate over the past months — following a wave of recent large language model (LLM) releases and an effort by startups, collectives and academics to push back on the shift in AI to closed, proprietary LLMs and democratize access to LLMs.

LLaMA, on its release, was immediately hailed for its superior performance over models such as GPT-3, despite having 10 times fewer parameters. Several open-source models released since then are tied to LLaMA. For example, Databricks announced the ChatGPT-like Dolly, which was inspired by Alpaca, another open-source LLM released by Stanford in mid-March. Alpaca, in turn, used the weights from Meta’s LLaMA model. Vicuna is a fine-tuned version of LLaMA that, according to its developers, reaches about 90% of ChatGPT's quality as judged by GPT-4.

Senators criticize Meta's use of the word 'leak'

The Senators had harsh words for Zuckerberg regarding LLaMA's distribution and the use of the word "leak."

"The choice to distribute LLaMA in such an unrestrained and permissive manner raises important and complicated questions about when and how it is appropriate to openly release sophisticated AI models," the letter says.

"Given the seemingly minimal protections built into LLaMA’s release, Meta should have known that LLaMA would be broadly disseminated, and must have anticipated the potential for abuse," it continues. "While Meta has described the release as a leak, its chief AI scientist has stated that open models are key to its commercial success. Unfortunately, Meta appears to have failed to conduct any meaningful risk assessment in advance of release, despite the realistic potential for broad distribution, even if unauthorized."

Meta known as a particularly 'open' Big Tech company

Meta is known as a particularly “open” Big Tech company (thanks to FAIR, the Fundamental AI Research Team founded by Meta’s chief AI scientist Yann LeCun in 2013). It had made LLaMA’s model weights available for academics and researchers on a case-by-case basis — including Stanford for the Alpaca project — but those weights were subsequently leaked on 4chan. This allowed developers around the world to fully access a GPT-level LLM for the first time.

It’s important to note, however, that none of these open-source LLMs are available yet for commercial use, because the LLaMA model is not released for commercial use, and the OpenAI GPT-3.5 terms of use prohibit using the model to develop AI models that compete with OpenAI.

But those building models from the leaked model weights may not abide by those rules.

In an interview with VentureBeat in April, Joelle Pineau, VP of AI research at Meta, said that accountability and transparency in AI models are essential.

Meta VP of AI research cited need to 'lean into transparency'

“The pivots in AI are huge, and we are asking society to come along for the ride,” she said in the April interview. “That’s why, more than ever, we need to invite people to see the technology more transparently and lean into transparency.”

However, Pineau doesn’t fully align herself with statements from OpenAI that cite safety concerns as a reason to keep models closed. “I think these are valid concerns, but the only way to have conversations in a way that really helps us progress is by affording some level of transparency,” she told VentureBeat.

She pointed to Stanford’s Alpaca project as an example of “gated access” — where Meta made the LLaMA weights available for academic researchers, who fine-tuned the weights to create a model with slightly different characteristics.

“We welcome this kind of investment from the ecosystem to help with our progress,” she said. She did not comment to VentureBeat on the 4chan leak that led to the wave of other LLaMA models, but she told The Verge in a press statement, “While the [LLaMA] model is not accessible to all … some have tried to circumvent the approval process.”

Pineau did emphasize that Meta received complaints on both sides of the debate regarding its decision to partially open LLaMA. “On the one hand, we have many people who are complaining it’s not nearly open enough, they wish we would have enabled commercial use for these models,” she said. “But the data we train on doesn’t allow commercial usage of this data. We are respecting the data.”

However, there are also concerns that Meta was too open and that these models are fundamentally dangerous. “If people are equally complaining on both sides, maybe we didn’t do too bad in terms of making it a reasonable model,” she said. “I will say this is something we always monitor and with each of our releases, we carefully look at the trade-offs in terms of benefits and potential harm.”
