This article is part of a VB special issue. Read the full series here: The future of the data center: Handling greater and greater demands.
The launch of ChatGPT seven months ago was a watershed moment, shifting the world’s attention to generative artificial intelligence (AI). Built on OpenAI’s GPT series, the all-encompassing chatbot displayed an aptitude for dynamic conversations, giving individuals a hands-on way to explore machine intelligence and its potential to help with both professional and personal tasks.
For enterprises, ChatGPT was far from the first instance of AI exposure. Before the generative tool came to the fore, companies across sectors were already using AI and machine learning (ML) for different aspects of their work — in the form of computer vision, recommendation systems, predictive analytics and a lot more. If anything, the OpenAI bot only made sure that they doubled down on these efforts to remain competitive.
Today, enterprises are betting big on all sorts of next-gen workloads. However, this is no piece of cake. Take GPT-3, one of the models behind ChatGPT. The 175-billion-parameter model needed about 3,640 petaflop/s-days of compute for training: the equivalent of roughly one quadrillion floating-point operations per second, sustained continuously for 3,640 days.
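To make that figure concrete, the petaflop/s-day budget can be converted into total floating-point operations. The sketch below assumes the standard definitions (1 petaflop/s = 10^15 operations per second, 86,400 seconds per day):

```python
# Convert GPT-3's reported training budget of 3,640 petaflop/s-days
# into total floating-point operations.
PFLOP_PER_SECOND = 1e15        # one petaflop/s = 10^15 operations per second
SECONDS_PER_DAY = 86_400

total_flops = 3_640 * PFLOP_PER_SECOND * SECONDS_PER_DAY
print(f"{total_flops:.2e}")    # 3.14e+23
```

That is on the order of 10^23 operations, which is why general-purpose servers alone cannot realistically absorb such workloads.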
How can data centers meet these extensive computing demands?
To handle the calculations demanded by next-gen workloads quickly and effectively, enterprises need massively parallel processing (MPP) in their data centers. MPP is a technique used in high-performance computing (HPC) that takes a complex task (like querying a complex database) and breaks it down into many smaller tasks, which then run on separate nodes working simultaneously. The results are combined to get the final output.
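The divide/compute/merge pattern described above can be sketched in miniature with Python's standard library, using processes on one machine in place of the separate nodes an MPP system would use (the function names here are illustrative, not from any specific MPP product):

```python
from multiprocessing import Pool

def partial_sum(chunk):
    """Worker task: compute one shard of the overall result."""
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, workers=4):
    """Split the input into shards, fan them out to parallel workers,
    then combine the partial results -- the same divide/compute/merge
    pattern MPP systems apply across many nodes."""
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool(workers) as pool:
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(1_000))))  # 332833500
```

In a real data center the "workers" are whole servers coordinated over a network, but the structure of the computation is the same.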
Many data centers run on general-purpose processors, which can handle traditional workloads but are not fast enough to run many complex calculations, such as large matrix multiplications and vector operations, at the same time. This drawback is pushing enterprises to rethink their data centers and focus on specialized processors such as GPUs.
“One of the most notable shifts is a trend towards offloading workloads to specialized hardware,” said Brandon Wardlaw, associate director for technology consulting at consulting firm Protiviti. “General-purpose compute nodes with heavy CPU capacity simply aren’t sufficient nor cost-effective for these next-gen workloads, and there’s been massive innovation among GPU OEMs and providers of more specialized FPGA (field programmable gate array) and ASIC (application-specific integrated circuit) hardware to support the highly parallel computation necessary to train models.”
One of the companies driving the shift towards specialized hardware-accelerated data centers is Nvidia. Its data center platform provides a diverse range of GPU options — from the highest-performing H100 to the entry-level A30 — to meet the intense computing demands of modern workloads, from scientific computing and research to large language model (LLM) training, real-time analysis of machine efficiency and generation of legal material.
In one case, Ecuador-based telecom company Telconet is using Nvidia’s DGX H100, a system of eight H100 GPUs combined, to build intelligent video analytics for safe cities and language services to support customers across Spanish dialects. Similarly, in Japan, these high-performance GPUs are being used by CyberAgent, an internet services company, to create smart digital ads and celebrity avatars.
Mitsui & Co., one of Japan’s largest business conglomerates, is also leveraging DGX H100, using as many as 16 instances of this system (128 GPUs) to run high-resolution molecular dynamics simulations and generative AI models aimed at accelerating drug discovery.
GPU-based acceleration comes with challenges
While GPU-based acceleration meets workload demands across various sectors, it can’t be fully effective unless certain limitations are addressed.
The problem is two-fold. First, implementing these add-on cards brings a major physical challenge, as traditional one- or two-rack-unit “pizza-box” servers simply do not have the space to accommodate them. Second, this kind of dense computing hardware also results in high power draw (the DGX H100 has a projected maximum consumption of about 10.2 kW) and thermal output, creating operational bottlenecks and increasing the total cost of ownership of the data center.
To address this, Wardlaw suggested making compensatory accommodations elsewhere, like increasing compute density with high-core count x64 chipsets and migrating general-purpose workloads to these platforms. He also emphasized taking a more proactive approach to thermal management and optimizing data center layouts to increase cooling efficacy and efficiency.
According to Steve Conner, vice president of sales and solutions engineering at Vantage Data Centers, the key to supporting HPC will be getting away from an air-cooled footprint. That’s because one has to control the temperatures on the CPUs and GPUs, and the only way to do that is to go to some sort of medium that has a much better heat exchange profile than air — e.g. liquid-assisted cooling.
“What we’ve seen working with other platforms from an HPC standpoint [is that] the only way to get that maximum performance is to deliver that liquid to the heat sink, both on the GPU and CPU side of the house,” he told VentureBeat.
Along with specialized hardware, enterprises can consider emerging workarounds like software-based acceleration to support some next-gen workloads in their data centers.
For instance, Texas-based ThirdAI offers a hash-based algorithmic engine that reduces computations and enables commodity x86 CPUs to train deep learning models while matching the performance of certain GPUs. This can not only be more affordable (depending on the workload) but also create fewer operational and physical roadblocks.
There’s also the option of optimization, using techniques like knowledge distillation to reduce a model’s size and make it cheaper to serve.
Such methods can result in some accuracy loss. But Bars Juhasz, CTO and cofounder of content generator Undetectable AI, said the company’s distilled model was 65% faster than the base one, while retaining 90% of the accuracy — a worthwhile tradeoff.
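In knowledge distillation, a small "student" model is trained to match the softened output distribution of a larger "teacher." A minimal NumPy sketch of the temperature-scaled soft-target loss from Hinton et al. is shown below; the logits are hypothetical placeholders, and a real training loop would combine this term with the ordinary hard-label loss:

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T softens the distribution."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL divergence between the temperature-softened teacher and
    student distributions, scaled by T^2 as in Hinton et al."""
    p = softmax(teacher_logits, T)          # soft targets from the teacher
    q = softmax(student_logits, T)
    return (T ** 2) * np.sum(p * np.log(p / q), axis=-1).mean()
```

When the student's logits match the teacher's exactly, the loss is zero; training pushes the student toward that point with far fewer parameters than the teacher.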
“Scaling the performance of models can be thought of in a similar manner to existing technology stacks, i.e. horizontally and vertically. Adding more GPUs would be akin to horizontal scaling, whereas optimizing the model and using accelerated software is akin to vertical scaling. The key to revamping performance is understanding the technical specifics of the model [workload] and choosing the right acceleration option to match,” Juhasz noted.
According to Wardlaw, if the AI/ML workloads are an “always-on” operation for the business, owning and managing the specialized hardware locally in the data center would be cost-effective at scale.
However, if these workloads aren’t an “always-on” operation and the business won’t run them at the scale or frequency required to justify the investment, it would be better to opt for alternative acceleration methods, or for AI/ML-optimized hardware offered by a dedicated provider or cloud hyperscaler on an infrastructure-as-a-service (IaaS) model.
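The own-versus-rent decision Wardlaw describes boils down to a breakeven point in utilization hours. A simple back-of-the-envelope sketch, with all dollar figures as illustrative placeholders rather than real quotes:

```python
def breakeven_hours(capex, onprem_hourly_opex, cloud_hourly_rate):
    """Hours of utilization at which owning the hardware becomes
    cheaper than renting equivalent cloud capacity.
    All inputs are hypothetical planning figures, not vendor prices."""
    if cloud_hourly_rate <= onprem_hourly_opex:
        return float("inf")  # renting is always cheaper in this regime
    return capex / (cloud_hourly_rate - onprem_hourly_opex)

# Hypothetical numbers: $300k system, $4/hr power and ops, $30/hr cloud rate
hours = breakeven_hours(300_000, 4.0, 30.0)
print(round(hours))  # 11538
```

Under these made-up numbers, an always-on workload crosses breakeven in well under two years, while an occasional workload never does, which is the intuition behind the "always-on" rule of thumb.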