Vultr brings GPU options to a wider audience

GPU programming just got a bit easier and cheaper to explore. Today, Vultr announced it is adding GPU instances to its cloud so AI researchers and others in need of massively parallel computing can rent them by the hour. Its new machine options, called Vultr Talon, are built on the new Nvidia A100 Tensor Core GPU and promise to democratize GPU computing because they can be rented in smaller, less expensive slices.

Vultr is known for offering low-priced cloud computing for developers who want simple commodity services. Many of its basic instances start below one cent per hour and some of the smallest can be rented for $2.50 a month.

Still, the company prides itself on offering a broad platform with many options. Its footprint, for example, spans the globe and its instances can be found in 25 data centers. Indeed, Vultr bristles at the notion that it is smaller; it prefers the word "independent" to describe its company when compared to the biggest options.

Some other cloud providers also offer GPU options, but, in many cases, they're available only in much larger slices. For example, Amazon's machine with the A100 Tensor Core, the p4d.24xlarge, is listed in a single configuration at $32.77 per hour, with discounts for reserved instances. It comes with more than a terabyte of RAM and 8 GPUs.

"The big tech clouds are focused on meeting the needs of the richest and biggest-budgeted companies around the world," explained J.J. Kardwell, the CEO of Constant, the parent company of Vultr. "They're focused on those users who may want an eight card system delivered, essentially, as bare metal. That's something that could be, you know, 14 to 15,000 dollars a month."

Vultr is taking a different path. Instead of building fat machines for researchers with big datasets, it's finding a way to slice up or "fractionalize" the GPU into smaller parts for developers who may not need as much power. Instances start with as little as 10 gigabytes of RAM and may cost as little as $90 per month.
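To put the two price points quoted above on the same scale, here is a rough back-of-the-envelope sketch in Python. It assumes a 730-hour month (8,760 hours divided by 12), which is a common billing approximation and not a figure from either vendor, and it uses only the on-demand list price with no reserved-instance discount. The function names are illustrative.

```python
# Assumption: an average month has 730 hours (8,760 hours / 12).
HOURS_PER_MONTH = 730


def monthly_cost(hourly_rate: float, hours: float = HOURS_PER_MONTH) -> float:
    """Approximate monthly cost of running an instance around the clock."""
    return hourly_rate * hours


def hourly_rate(monthly_price: float, hours: float = HOURS_PER_MONTH) -> float:
    """Approximate hourly rate implied by a flat monthly price."""
    return monthly_price / hours


# Prices quoted in the article.
p4d_monthly = monthly_cost(32.77)  # p4d.24xlarge, on-demand list price
talon_hourly = hourly_rate(90.0)   # smallest fractional GPU instance

print(f"p4d.24xlarge on demand: ~${p4d_monthly:,.0f} per month")
print(f"Smallest fractional instance: ~${talon_hourly:.3f} per hour")
print(f"Entry-price ratio: ~{p4d_monthly / 90.0:.0f}x")
```

The comparison is about the cost of entry only. The two products are very different amounts of hardware: eight full GPUs on one side and a fraction of a single GPU on the other.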

The right size for the right developer

The company, though, isn't just targeting the small developer – larger versions are also available. Bare-metal servers with four A100 cards and 24-core Xeon CPUs are included on the price list.

Developers can choose the right size for their current project. They can start with smaller machines when debugging and then switch to larger machines in production. Or they might rent the largest versions when training the model and use much smaller machines when the model is deployed because it can be much more computationally intensive to create a model than to apply it to new data. Pricing and throughput are pretty much linear.

"What's important to know is that the full power, the full feature set of the card, is delivered," said David Gucker, COO at Constant. "It's not like there's any sacrifices made to GPU power by segmenting them. The feature accessibility is fully there."

Vultr worked closely with Nvidia to develop the product, which also bundles a license to some of Nvidia's CUDA software packages like Nvidia AI Enterprise. The company is also bundling the NGC Catalog, a collection of frameworks, pre-trained models, Helm charts and AI-focused software development kits (SDKs).

The A100 is built on Nvidia's Ampere Architecture and is optimized for data crunching and AI model building using 3rd generation Tensor Cores and TensorFloat-32 (TF32) precision.

"I don't view this as being like anything else in the market," said Ryan Pollock, the VP of product marketing and developer relations at Vultr. "I really think we're creating a new category here."

The virtualization foundation is already built into the chip, but other providers have focused on serving the customers with the biggest needs and the largest budgets. The A100 can be sliced into vGPUs, or virtual GPUs, that receive their own share of the RAM and behave just like an independent chip.

All but the smallest options from Vultr will be able to tap into Nvidia's Multi-Instance GPU (MIG) technology, which guarantees the compute throughput and quality of service by fully isolating the GPU's high-bandwidth memory, cache and compute cores. This will prevent what some cloud users call the "noisy neighbors" problem that comes when others sharing a machine suck away too many of the compute cycles.

GPU shortages

Vultr's move comes at a good time for the marketplace. Over the last few years, GPU chips and cards have occasionally been in short supply, leading some buyers to bid up the price on the secondary market.

The tension over access has led to many political battles. Nvidia itself started changing the architecture to prevent miners of cryptocurrency from using some boards, a move designed to prevent said miners from buying up all the hardware and starving AI and graphics users.

The sensitivity over this issue has even spilled over into battles over financial reports. In early May, the SEC announced that it was settling charges with Nvidia over the level of disclosure in some financial filings from 2018.

Vultr's move to democratize access with lower prices may be welcome for many with a pent-up demand for high-powered machines. Still, the size and scope of the new category are hard to measure or even estimate. Lowering the price point opens up the technology not just to smaller companies with tighter budgets, but also to R&D teams in large companies that lack the full commitment of their managers. Skunkworks projects become feasible when the line item on the budget is small enough to avoid too much scrutiny.

"It's super important for innovation globally for a lot of businesses and developers because often a single card cost can be too much to justify," said Kardwell. "This is truly a massive shift, a big change and a big disruption in the accessibility of arguably the most important technology for AI and machine learning innovation."
