Presented by VAST Data
With access to just a sliver of the 2.5 quintillion bytes of data created every day, AI produces feats that can seem beyond human intellect: identifying cancer on a medical scan, selecting a viable embryo for IVF, finding new ways to tackle climate change and the opioid crisis. But that isn’t true intelligence. These systems are designed to link data points and report conclusions, powering increasingly disruptive automation across industries.
Generative AI is trending and GPT models have taken the world by storm with their astonishing ability to respond to human prompts, but can they truly perform the reasoning tasks that humans find easy? It’s important to understand that the AI the world is working with today has little understanding of the world it exists in, and cannot build a mental model that goes beyond regurgitating information that is already known.
Yann LeCun, Chief AI Scientist at Meta, recently said that current artificial intelligence systems like ChatGPT “are not even as smart as a dog,” though the limited reasoning abilities of large language models (LLMs) are offset by their large associative memory capacity. This makes them “a bit like students who have learned the material by rote but haven’t really built deep mental models of the underlying reality.”
So, for all the hype, generative AI as we know it is only the beginning of the deep learning and automated discovery era. We are just starting to glimpse something greater than the ability to correlate and generate data with simple language models, says Jeff Denworth, co-founder of VAST Data.
“An AI that exists beyond the automation of routine tasks will be marked by machines that can understand the natural world—that can reason about that natural world,” he says, “and it will create mental models that will serve as the basis for entirely new discoveries.”
He points to AlphaDev, the AI system built by Google DeepMind that recently uncovered brand-new sorting algorithms that are up to 70% faster for shorter sorting sequences and about 1.7% faster for large ones, outperforming algorithms that data scientists and engineers have been fine-tuning for decades.
“That’s very different from asking a chatbot what the diameter of the earth is,” he adds. “Those are things that we know. But what you’re starting to see is that computers are starting to discover things that we don’t know.”
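To make concrete what kind of routine AlphaDev improved: its gains came from shaving instructions off short, fixed-length sorting routines (sort3, sort4, sort5) that were later merged into LLVM’s libc++ standard library. The sketch below is not AlphaDev’s discovered instruction sequence, which operates at the assembly level; it is just a conventional three-element sorting network in Python, shown to illustrate the kind of small, fixed-size routine being optimized.

```python
# A toy illustration of the kind of fixed-length sorting routine AlphaDev
# optimized at the assembly level. This is NOT the discovered algorithm,
# just a conventional three-element sorting network for comparison's sake.

def sort3(a: int, b: int, c: int) -> tuple[int, int, int]:
    """Sort three values with a fixed sequence of compare/swap steps."""
    # Each step is one comparator in the network; there are no loops over
    # data size, which is what makes such routines so tempting to optimize.
    if a > b:
        a, b = b, a      # comparator (0, 1)
    if b > c:
        b, c = c, b      # comparator (1, 2)
    if a > b:
        a, b = b, a      # comparator (0, 1) again
    return a, b, c


if __name__ == "__main__":
    print(sort3(3, 1, 2))  # -> (1, 2, 3)
```

Even a routine this small runs trillions of times a day across the world’s software, which is why instruction-level improvements discovered by a machine matter.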
We’re on the cusp of what he calls “AI-automated discovery”: the potential to evolve AI beyond LLMs, which today are limited to routine tasks like business reporting or collating and synthesizing known information, toward data-driven triggers in which AI autonomously seeks answers to questions unprompted by humans as new, rich, natural data enters a dataset.
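As a rough illustration of what such a data-driven trigger might look like in code, the sketch below (with hypothetical names; this is not VAST’s API or any vendor’s) shows a toy dataset that notifies an analysis routine every time a new record arrives, so the system reacts to incoming data rather than waiting for a human prompt.

```python
# A minimal, hypothetical sketch of a data-driven trigger: instead of waiting
# for a human prompt, the system reacts whenever new records land in a dataset.
# The names (Dataset, ingest, analyze) are illustrative, not any vendor API.

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Dataset:
    """A toy dataset that notifies subscribers when new data arrives."""
    records: list = field(default_factory=list)
    subscribers: list[Callable] = field(default_factory=list)

    def ingest(self, record) -> None:
        self.records.append(record)
        # Each ingest event triggers analysis automatically -- no prompt needed.
        for callback in self.subscribers:
            callback(record)


def analyze(record) -> None:
    """Stand-in for a model that updates its understanding from each event."""
    print(f"new observation: {record} -> re-evaluating hypotheses")


if __name__ == "__main__":
    ds = Dataset()
    ds.subscribers.append(analyze)
    ds.ingest({"sensor": "telescope-42", "reading": 0.97})  # fires analysis
```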
Unlocking brand-new knowledge at lightning speed
Humans can take 20 years to become domain specialists, and then apply that thinking toward solving real problems. That specialization can be achieved by an AI computer today in a matter of minutes or seconds.
A thousand data centers around the world, all working on the same problem, each with trillions of cores and hundreds of petabytes or exabytes of data, can become a global computer, playing through simulated scenarios at internet speed, dramatically accelerating how we learn and making discoveries faster than humans will ever be capable of on their own. This kind of data-driven, event-driven automation expedites AI discovery for the use cases that impact all of humanity, and expands the possibilities of discovery into areas humans have not yet charted or even imagined.
“Imagine these machines tackling crop production, new approaches to sustainable energy, the elimination of disease,” Denworth says. “We think that these machines will find and discover whole new domains of science and mathematics on their own that are beyond our current evolutionary step.”
But much has to change to make it happen, he adds. This new paradigm will require a brand-new way of approaching AI infrastructure.
Building a central corpus of the world’s data
This future of computing requires what Denworth refers to as a thinking machine (a nod to the 1980s parallel computing company), and it will demand several new computing paradigms, from the nature of data structures to the nature of computing on data, as well as a way to simplify and automate the process of implementing AI.
“It’s easy to say we have a lot of data and a lot of machines, and therefore we’re ready,” he explains. “But the hard job is bringing it all together, so that the machines can see and share the data, particularly when organizations deal with things like data gravity and data privacy. You need to build new approaches to extend our understanding of data on a global scale and to create a form of anti-gravity for data and data processors.”
The concept of a data platform also needs to change. Today’s leading data platform providers are largely layering machine learning solutions onto systems that were fundamentally designed for business reporting, but numbers and tables are not the constructs most humans use to interact with the world.
“Sight, sound, touch, smell and taste – these are the senses that humans use to perceive the natural world, and by synthesizing the real-time data that comes from these sensors with our neural networks we develop understandings and realizations,” he says. “We want to build a computer that acts like that, a system that understands data (not in tables) but one that creates structure and understanding from the rich natural data that comes to us from all over the world.”
Once this richer class of data is fed into a thinking system, the machine instantly and innately begins interpreting, correlating and building new realizations upon that data, perpetually getting smarter about what is happening around it rather than processing and learning only upon human request.
To give AI systems the greatest chance of making discoveries, we must put data at the center of the system as both a knowledge store and an experience trigger, where each data event builds on our past learnings and in turn creates new understandings.
“If you can give training models access to the world’s data and the world’s processors and provide mechanisms for organization and processing, then we should be able to reduce the time it takes for us to achieve new discoveries,” he says. “At that point, machines won’t just assist humans to achieve new discoveries—these systems will allow us to advance the rate of discovery from generational cycles to processor clock cycles.”
The need for an unstructured database
This future, however, depends on a next-generation approach to data management and database architecture, one that lays the foundation for intelligent computers to collect, process and collaborate on data at a global scale in one unified computing environment.
“The reality is that the next era of deep learning requires an integrated solution designed for the imperatives of tomorrow,” Denworth says.
But here in the present, data-driven companies are launching increasingly sophisticated AI initiatives, and today’s data management constructs cannot easily handle the wide variety of data types these initiatives ingest. Organizations are forced to stitch together databases, data warehouses, data lakes, file systems and streaming platforms to make sense of this data deluge, and the APIs that connect these systems often interoperate only at a lowest common denominator. VAST Data is simplifying the data management experience by breaking the tradeoffs that have produced this soup of infrastructure, and by rethinking the relationship between structured and unstructured data at a fundamental level.
“Unstructured data, GPUs, global data sets, a variety of on–premises and cloud computers — these are the hallmarks of the environments that are being deployed by leaders in deep learning,” Denworth says. “The biggest hyperscale organizations have been building infrastructure for decades, but this has been the property only of the computing elite. On August 1, VAST will take organizations to a new place where these systems won’t be built from independent technologies that have been designed upon legacy concepts. With a full rethink, we can democratize systems of AI-automated discovery for everyone.”
For a deep dive into VAST’s vision for the future of AI infrastructure, plus a look at how customers like Pixar, Zoom and The Allen Institute and partners like NVIDIA are harnessing this powerful new approach to deep learning, don’t miss VAST’s Build Beyond event on August 1st.
Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.