Data is not the new oil

Businesses that capture the biggest reserves of data hold a sustainable advantage that will leave competitors in the dust. So say luminaries including former Intel Chief Executive Brian Krzanich, China investment guru Kai-Fu Lee, and Silicon Valley venture star Ann Winblad. After all, the market capitalizations of data-rich behemoths like Amazon.com Inc., Google parent Alphabet Inc., and Facebook Inc. prove it: Data is the new oil.

Data may have been tantamount to petroleum when the internet was virgin wilderness. But the evolution of digital business has long since rendered that notion obsolete. Today, those who believe moats of data will keep rivals at bay risk being overrun.

If data bears any similarity to oil, it’s as a lubricant more than a fuel. Artificial intelligence does rely on real-world photos and measurements, and it needs a squirt of fresh data from time to time. But that doesn’t mean the gears will spin faster and faster. For that you need to overhaul the engine. What keeps the tech giants ahead is not their abundance of digital bits, but ceaseless cycles of product development and improvement.

British data scientist Clive Humby is widely credited with being the first, in 2006, to call data the crucial fuel of the information economy. But the analogy has always been strained. Unlike oil, bits don’t get used up; they persist and potentially become more useful over time. And they can’t be drained dry. In fact, they’re infinitely renewable.

Most important, bits aren’t scarce. Today, they’re a commodity.

In the early 2000s, anyone bent on amassing a large-scale data set first had to acquire a mountain of computing and data-storage equipment. The same year Mr. Humby made his comparison, though, Amazon launched its Web Services division, offering pay-as-you-go access to hardware to anyone with an internet connection. Today startups can choose among dozens of cloud-computing services that, for pennies an hour, let them cull and process data from the web on a grand scale. Meanwhile, researchers are busy assembling free data sets like ImageNet’s 14 million digital photos or the Linguistic Data Consortium’s library of 63,000 spoken-English sentences.

Of course, the internet giants do have advantages, like astronomical numbers of customers and sprawling cloud networks built to their own specifications. But even when they’ve pulled ahead in a data-intensive business, they haven’t been able to maintain the lead.

Take Apple Inc.’s Siri. The first consumer-grade AI assistant was a marvel of engineering in 2011, when it introduced consumers to voice-controlled computing. Siri had unique access to a mounting archive of users’ spoken queries as well as their reactions to its answers. Yet Amazon’s Alexa promptly eclipsed Siri upon arrival three years later.

Alexa didn’t overtake Siri because Amazon had more speech samples but because Amazon devised a way to converse with the computer hands-free. “Hey Alexa!” made it possible to put the assistant in a speaker and talk to it freely. Amazon then sidelined Siri by opening the technology to other companies, fostering a broad ecosystem of Alexa-equipped products from lightbulbs to cars.

So will the data flooding into Amazon from all those devices confer a long-term advantage? Don’t bet on it. Google’s own always-listening assistant, launched two years after Alexa, not only understands verbal commands but answers questions based on a so-called knowledge graph the search giant developed to respond to queries with facts in addition to web links. Amazon’s share of the smart-speaker market recently fell by more than a third from a year ago, while Google’s nearly doubled, according to market researcher Strategy Analytics.

Amazon is still ahead in smart speakers, with 42 percent share in the second quarter to Google’s 28 percent, but for how much longer?

The same story has played out time and time again. Flight data collected by drone maker DJI hasn’t kept it safe from Skydio, which devised better algorithms for avoiding obstacles. Uber’s bumper crop of data about drivers, passengers, and routes hasn’t fended off Lyft. Facebook, even with snapshots spanning nearly a third of humanity, had to buy Instagram to neutralize an existential threat. This isn’t new: Yahoo, which in 1998 had more web-search data than anyone, got crushed by then-upstart Google.

The data advantage is short-term and getting shorter all the time. That’s true even in specialized fields where data isn’t sloshing around the net. Certainly Paige.AI’s exclusive access to Memorial Sloan Kettering Cancer Center’s library of tissue slides gives it a head start in the race to automate cancer diagnosis. But soon enough the effort will face challengers that manage to obtain slides from other institutions. Then Paige.AI — like the internet giants and the unicorn startups — will have no choice but to keep innovating.

Sitting on a pool of data doesn’t turn a company into a high-tech Saudi Arabia. In a networked world, long-term advantage comes from maintaining a pace of innovation that keeps you abreast of tech trends and ahead of customer needs.

Reza Zadeh is an adjunct professor of artificial intelligence at Stanford University and founder and CEO of Matroid, a computer-vision startup.
