If data is the new gold, then today’s “gold” comes in the form of priceless insights into trends and customer behaviors for growth-seeking organizations. But possessing an abundance of data — though fortunate — remains problematic, at least for now.
Most organizations have a tremendous amount of data at their fingertips, yet lack the infrastructure or equipment to process all of it. Some 2.5 quintillion bytes of data are currently generated daily, and that rate is accelerating with the proliferation of IoT technologies on one end and centralized cloud services catering to billions of daily users on the other. Meanwhile, today’s standard computer chips, central processing units (CPUs), have reached a performance ceiling where the cost of computing outweighs the benefits.
As illustrated by the famous gold rush of the 19th century, there is a natural tendency to follow familiar paths, even at the cost of climbing a steep slope and achieving less-than-ideal results. Many gold miners may have fared far better by creating new paths. Similarly, forging a new path toward data analysis is essential in finding the ideal path to the “new” gold.
Make no mistake: data has already led to countless breakthroughs and provided incredible benefits. But if we are to truly squeeze all of the value out of this new gold, now is the time to move beyond CPUs and explore next-gen alternatives that unlock a whole universe of insights at unprecedented speeds.
To truly understand where and how big data processing is falling short, a look at the evolution of artificial intelligence (AI) can be extremely enlightening.
The prerequisite for the AI revolution
AI’s first landmark use cases trace back decades to the various research projects that explored algorithms and their applications. One of the earliest was the minimax algorithm designed for playing checkers. It has since evolved to play chess, becoming quite a formidable opponent.
But beyond the scope of board games, AI’s growing list of applications and use cases soon sparked its second breakthrough: the proliferation of entity services largely tasked with analyzing copious amounts of user data to help large-scale enterprises better understand customer needs.
Yet these algorithms and entities were ultimately only as good as the general-purpose processors they ran on. Although those processors excelled at logic- and memory-intensive workloads, their processing speeds were slow. This changed in 2009, when Stanford researchers discovered that graphics processing units (GPUs) were significantly better than CPUs at processing deep neural networks, thanks to their higher degree of compute parallelism: the ability to run multiple calculations or processes simultaneously. This novel computing infrastructure sparked AI’s third and most decisive breakthrough, the era of deep neural networks.
GPUs did more than accelerate existing AI algorithms. The shift toward neural networks created unprecedented levels of algorithmic performance, opening up a whole world of opportunity for new algorithms that had until then been impossible or inefficient because of the limitations of CPUs. These include the large language models that transformed our search engines and the now-popular generative AI services like DALL-E 2, Imagen, Stable Diffusion and Midjourney. The GPU revolution made it quite apparent that the right processing hardware was the key to sparking the modern AI revolution.
Big data’s missing element
The history of AI’s development can shed much light on the current state of data analytics.
First, like AI, Big Data research projects initially spawned a wide variety of algorithms and use cases. Second — again, similar to AI — a proliferation of data collection and analysis services followed. For example, there is an incredible amount of infrastructure built around big data analytics from all the major cloud providers such as Amazon, Google and Microsoft.
But unlike AI and its GPU “revolution,” Big Data has yet to mimic AI’s third breakthrough: the acquisition of its own unique computing infrastructure.
Currently, CPUs still serve as the basis for data analytics despite their inefficient processing rate, but unlike with AI, GPUs are not a suitable substitute. That means that as companies accumulate more data, they typically take on more servers to cope with the heavy load — until the cost of data analysis outweighs its benefits.
Forge a new path
If we can find a way to run data analytics workloads on dedicated processors with the efficiency that AI workloads now run on GPUs and other hardware accelerators, we can spark a similar “revolution,” cracking open the world of Big Data to create a new level of insights at previously unattainable speeds. But to do this, we must reexamine the hardware we use.
Failure to find a suitable computing infrastructure will prevent organizations from scaling their data utility, hindering their ability to cultivate new insights and foster additional innovations. Succeeding, on the other hand, could encourage a whole new era of Big Data.
The downfall of many gold-rush prospectors was their misguided urge to follow known paths to previously discovered gold. AI researchers, on the other hand, strayed from the common path and found a new one, the path toward GPUs and other accelerators, which continues to be the gold standard for deep learning. If Big Data researchers can forge their own path, they too may one day strike gold and push the boundaries of Big Data analytics far beyond anything anyone can imagine.
Adi Fuchs is lead core architect at Speedata.