Open-source AI continues to celebrate as Big Tech mulls over moats

The founders of AI startup Together, which made news last month by replicating Meta's LLaMA dataset with the goal of building open-source LLMs, are celebrating today after raising a $20 million seed round to build an open-source AI and cloud platform.

These days, it seems like everyone in open-source AI is raising a toast to recent success. For example, a wave of new open-source LLMs has arrived, with performance close enough to that of proprietary models from Google and OpenAI (or at least good enough for many use cases) that some experts say most software developers will opt for the free versions. This has led the open-source AI community to cheer the pushback against the past year's shift in AI toward closed, proprietary LLMs, which experts say will lead to "industrial capture," in which the power of state-of-the-art AI technology is controlled by a few deep-pocketed Big Tech companies.

And then there are the actual parties: Open-source hub Hugging Face got the party started in early April with its "Woodstock of AI" get-together that drew more than 5,000 people to the Exploratorium in downtown San Francisco. And this Friday, Stability AI, which created the popular open-source image generator Stable Diffusion, and Lightning AI, which developed PyTorch Lightning, will host a "Unite to Keep AI Open Source" gathering in New York City at a so-far "secret location."

Big Tech considers its moat, or lack thereof

As open-source AI parties on, Big Tech is weighing its options. Last week a leaked Google memo from one of its engineers, titled "We have no moat," claimed that the "uncomfortable truth" is that neither Google nor OpenAI is positioned to "win this arms race."

That, the engineer said, was because of open-source AI. "Plainly put, they are lapping us," the memo continued. "While our models still hold a slight edge in terms of quality, the gap is closing astonishingly quickly."

Some are saying that these concerns may reduce the willingness of Big Tech companies to share their LLM research. But Lightning AI CEO William Falcon told VentureBeat in March that this was already happening. OpenAI's release of GPT-4, he explained, included a 98-page Technical Report that was "masquerading as research."

"Now, because they have this pressure to monetize, I think literally today is the day where they became really closed-source," Falcon said after the GPT-4 launch. "They just divorced themselves from the community."

Last month, Joelle Pineau, VP of AI research at Meta, told VentureBeat that accountability and transparency in AI models are essential. “My hope, and it’s reflected in our strategy for data access, is to figure out how to allow transparency for verifiability audits of these models,” she said.

But even Meta, which has been known as a particularly “open” Big Tech company (thanks to FAIR, the Fundamental AI Research Team founded by Meta’s chief AI scientist Yann LeCun in 2013), may have its limits. In an MIT Technology Review article by Will Douglas Heaven yesterday, Pineau said that the company may not open its code to outsiders forever. “Is this the same strategy that we’ll adopt for the next five years? I don’t know, because AI is moving so quickly,” she said.

How long can the open-source AI party last?

That's where the problem lies for the open-source AI community, and why its partying ways could suddenly screech to a halt. If Big Tech companies fully close off access to their models, their "secret recipes" could become even harder to suss out, Falcon told VentureBeat. In the past, he explained, even though Big Tech models might not have been exactly replicable, the open-source community knew what the basic ingredients of the recipe were. Now, there may be ingredients no one can identify.

"Think about if I give you a recipe for fried chicken — we all know how to make fried chicken," he said. "But suddenly I do something slightly different and you’re like wait, why is this different? And you can’t even identify the ingredient. Or maybe it’s not even fried. Who knows?"

This, he said, sets a bad precedent. "You are going to have all these companies who are not going to be incentivized anymore to make things open-source, to tell people what they’re doing," he said, adding that the dangers of unmonitored models are real.

"If this model goes wrong, and it will, you’ve already seen it with hallucinations and giving you false information, how is the community supposed to react?" he said. "How are ethical researchers supposed to go and actually suggest solutions and say, this way doesn’t work, maybe tweak it to do this other thing? The community’s losing out on all this."
