At its re:Invent 2021 conference today, Amazon announced Graviton3, the next generation of its custom Arm-based server processor. Soon to be available in Amazon Web Services (AWS) C7g instances, the processors are, the company says, optimized for workloads including high-performance computing, batch processing, media encoding, scientific modeling, ad serving, and distributed analytics.
Alongside Graviton3, Amazon unveiled Trn1, a new instance for training deep learning models in the cloud — including models for apps like image recognition, natural language processing, fraud detection, and forecasting. It’s powered by Trainium, an Amazon-designed chip that the company last year claimed would offer the most teraflops of any machine learning instance in the cloud. (A teraflop corresponds to one trillion floating-point operations per second.)
As companies face pandemic headwinds including worker shortages and supply chain disruptions, they’re increasingly turning to AI for efficiency gains. According to a recent Algorithmia survey, 50% of enterprises plan to spend more on AI and machine learning in 2021, with 20% saying they will be “significantly” increasing their budgets for AI and ML. AI adoption is, in turn, driving cloud growth — a trend of which Amazon is acutely aware, hence the continued investments in technologies like Graviton3 and Trn1.
AWS CEO Adam Selipsky says that Graviton3 is up to 25% faster for general-compute workloads and, versus Graviton2, delivers two times faster floating-point performance for scientific workloads, two times faster performance for cryptographic workloads, and three times faster performance for machine learning workloads. Moreover, Graviton3 uses up to 60% less energy for the same performance compared with the previous generation, Selipsky claims.
Graviton3 also includes a new pointer authentication feature that’s designed to improve overall security. Before return addresses are pushed onto the stack, they’re first signed with a secret key and additional context information, including the current value of the stack pointer. When the signed addresses are popped off the stack, they’re validated before being used. An exception is raised if the address isn’t valid, blocking attacks that work by overwriting the stack contents with the address of harmful code.
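The sign-then-validate flow described above can be sketched in a few lines. This is a hypothetical, software-level illustration only — it models the idea (a keyed MAC over the return address plus context such as the stack-pointer value) rather than the actual hardware implementation, and all names here are invented for the example:

```python
import hmac
import hashlib

SECRET_KEY = b"per-process secret key"  # illustrative; held in hardware in practice

def sign(addr: int, stack_pointer: int) -> bytes:
    """MAC over the return address and its context (the stack-pointer value)."""
    msg = addr.to_bytes(8, "little") + stack_pointer.to_bytes(8, "little")
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).digest()[:8]

class AuthStack:
    """Toy call stack that signs return addresses on push, checks them on pop."""

    def __init__(self):
        self._frames = []

    def push_return_address(self, addr: int, stack_pointer: int) -> None:
        self._frames.append((addr, stack_pointer, sign(addr, stack_pointer)))

    def pop_return_address(self) -> int:
        addr, sp, tag = self._frames.pop()
        # Re-derive the tag; a mismatch means the saved address was overwritten.
        if not hmac.compare_digest(tag, sign(addr, sp)):
            raise RuntimeError("pointer authentication failed")
        return addr

stack = AuthStack()
stack.push_return_address(0x401000, 0x7FFF0000)
assert stack.pop_return_address() == 0x401000

# An attacker who overwrites the saved address cannot forge a valid tag:
stack.push_return_address(0x401000, 0x7FFF0000)
stack._frames[-1] = (0xDEADBEEF, 0x7FFF0000, stack._frames[-1][2])
try:
    stack.pop_return_address()
except RuntimeError:
    pass  # tampering detected before the address is used
```

In the real hardware scheme the tag is packed into otherwise-unused high bits of the pointer itself, so no extra storage is needed; the detection logic is the same.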
As with previous generations, Graviton3 processors include dedicated cores and caches for each virtual CPU, along with cloud-based security features. C7g instances will be available in multiple sizes, including bare metal, and Amazon claims that they’re the first in the cloud industry to be equipped with DDR5 memory, up to 30Gbps of network bandwidth, and Elastic Fabric Adapter support.
According to Selipsky, Trn1, Amazon’s instance for machine learning training, delivers up to 800Gbps of networking bandwidth, making it well suited for large-scale, multi-node distributed training. Customers can build clusters of up to tens of thousands of Trn1 instances to train models containing upwards of trillions of parameters.
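The data-parallel pattern that such multi-node training relies on can be sketched without any framework: each worker computes gradients on its shard of the batch, the gradients are averaged across workers (an all-reduce, which is where that network bandwidth matters), and the shared parameters are updated. Real Trn1 training would go through a framework like PyTorch or TensorFlow; the function names below are illustrative only:

```python
def local_gradient(weight, shard):
    """Mean-squared-error gradient for y ~ weight * x on one worker's data shard."""
    return sum(2 * (weight * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(grads):
    """Average gradients across workers (stand-in for a network collective op)."""
    return sum(grads) / len(grads)

def train_step(weight, shards, lr=0.05):
    # Each worker's gradient would be computed in parallel in practice.
    grads = [local_gradient(weight, shard) for shard in shards]
    return weight - lr * all_reduce_mean(grads)

# Two "workers", data generated from y = 3x; the shared weight converges toward 3.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
w = 0.0
for _ in range(200):
    w = train_step(w, shards)
```

Scaling this pattern to thousands of nodes makes the all-reduce step communication-bound, which is why instance-level network bandwidth is a headline figure for training hardware.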
Trn1 supports popular frameworks including Google’s TensorFlow, Facebook’s PyTorch, and MXNet, and uses the same Neuron SDK as Inferentia, the company’s cloud-hosted chip for machine learning inference. Amazon is quoting 30% higher throughput and 45% lower cost-per-inference compared with standard AWS GPU instances.