This article is part of a VB special issue. Read the full series here: The CIO agenda: The 2023 roadmap for IT leaders.
Enterprises everywhere have recognized the central role of artificial intelligence (AI) in driving transformation and business growth. In 2023, many CIOs will shift from the “why” of AI to “how?” More specifically: “What’s the best way to quickly and economically grow AI production at scale that creates value and business growth?”
It’s a high-stakes balancing act: CIOs must enable rapid, wider development, deployment and maintenance of impactful AI workloads. At the same time, enterprise IT leaders also need to more closely manage spending, including costly “shadow AI,” so they can better focus and maximize strategic investments in the technology. That, in turn, can help fund ongoing, profitable AI innovation, creating a virtuous cycle.
High-performance AI infrastructure — purpose-built platforms and clouds with optimized processors, accelerators, networks, storage and software — offers CIOs and their enterprises a powerful way to successfully balance these seemingly competing demands, enabling them to cost-effectively manage and accelerate orderly growth and “industrialization” of production AI.
In particular, standardizing on a public cloud-based, accelerated “AI-first” platform provides on-demand services that can be used to quickly build and deploy muscular, high-performing AI applications. This end-to-end environment can help enterprises manage related expenses, lower the barrier to AI, reuse valuable IP and, crucially, keep precious internal resources focused on data science and AI, not infrastructure.
Three major requirements for accelerating AI growth
A major benefit of focusing on AI infrastructure as a core enabler of AI and business growth is its ability to help enterprises successfully meet three major requirements. We and others have observed these in our own pioneering work in the area and, more broadly, in technology development and adoption over the last 20 years. They are: standardization, cost management and governance.
Let’s briefly look at each.
1. AI standardization
Enabling orderly, fast, cost-effective development and deployment
Like big data, cloud, mobile and PCs before it, AI is a transformative game-changer — with even greater potential impact, both inside and outside the organization. As with these earlier innovations — including virtualization, big data and databases, SaaS and many others — smart enterprises, after careful evaluation, will want to standardize on accelerated AI platforms and cloud infrastructure. Doing so brings a raft of well-understood benefits to this newest set of universal tools. Large banks, for example, owe much of their vaunted ability to quickly expand and grow to standardized, global platforms that enable fast development and deployment.
With AI, standardizing on optimized stacks, pre-integrated platforms and cloud environments helps enterprises avoid the host of negatives that often result from fielding a chaotic variety of products and services. Chief among them: unmanaged procurement, suboptimal development and model performance, duplicated efforts, inefficient workflows, pilots not easily replicated or scaled, more costly and complex support, and lack of specialist personnel. Perhaps most serious is the excessive time and expense associated with selecting, building, integrating, tuning, deploying and maintaining a complex stack of hardware, software, platforms and infrastructures.
To be clear: enterprise standardization of AI platform and cloud does not mean one-size-fits-all, exclusivity with one or two vendors, or a return to strictly centralized IT control.
To the contrary, modern AI cloud environments should offer tiered services optimized for a diverse range of use cases. The “standardized” AI platform and infrastructure should be purpose-built for different AI workloads, offering appropriate scalability, performance, software, networking and other capabilities. A cloud marketplace, familiar to many enterprise users, gives AI developers a variety of approved choices.
As for portability: containerization, Kubernetes and other open, cloud-native approaches offer easy movement across providers and multiclouds, easing concerns about lock-ins. And while enterprise standardization restores a CIO’s overall visibility and control, it can overlay on existing procurement policies and procedures, including decentralized approaches — a win-win.
2. AI cost management
Focusing and freeing funds for ongoing innovation and value
By various estimates, unauthorized spending, often by business groups, adds 30-50% to technology budgets. While specific figures for such “shadow AI” are hard to come by, surveys of enterprise IT priorities for 2023 suggest it’s a good bet that hidden investments in products and services will consume a good chunk of AI infrastructure costs. The good news is that centralized procurement and provisioning of enterprise-standard AI services restores institutional control and discipline, while providing flexibility for organizational consumers.
With AI, like any workload, cost is a function of how much infrastructure you must buy or rent. CIOs want to help groups developing AI avoid both over-provisioning (often with expensive but underutilized on-premises infrastructure) and under-provisioning (which can slow model development and deployment, and lead to unplanned capital purchases or overages of cloud services).
To avoid these extremes, it’s wise to think of AI costs in a new way. Accelerated processing for inference or training on a powerful, optimized platform may (or may not) initially cost more. Yet the work can be done more quickly, which means renting less infrastructure for less time, reducing the bill. And, importantly, the model can be deployed sooner, which can provide a competitive advantage. This accelerated time-to-value is analogous to the difference between driving to Dallas from Chicago (15 hours) and flying non-stop (5 hours). One might cost less (or with current gas prices, more); the other gets you there much faster. Which is more “valuable”?
In AI, reviewing development costs from a total cost of ownership standpoint can help you avoid the common mistake of looking just at raw expenses. In the road-trip analogy, arriving more quickly, with less wear and tear and fewer chances for detours, accidents, traffic jams or wrong turns, is the smarter choice. So it is with fast, optimized AI processing.
Faster training times speed time to insight, maximizing the productivity of an organization’s data science teams and getting the trained network deployed sooner. There’s also another important benefit: lower costs. Customers often experience a 40-60% cost reduction vs. a non-accelerated approach.
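The cost reasoning above can be sketched as simple arithmetic: the bill is roughly the hourly rate multiplied by the hours rented, so a pricier but faster platform comes out ahead whenever its speedup exceeds its price premium. The sketch below is illustrative only. The class and function names, the rates and the job durations are all hypothetical assumptions chosen to make the arithmetic easy to follow; they are not benchmarks or vendor prices.

```python
from dataclasses import dataclass


@dataclass
class RentalOption:
    """A hypothetical way to rent infrastructure for one training job."""
    name: str
    hourly_rate: float   # cost per hour of the rented cluster (made-up units)
    hours_needed: float  # wall-clock hours to finish the job

    def total_cost(self) -> float:
        # Rented infrastructure is billed by time: rate x hours.
        return self.hourly_rate * self.hours_needed


def cost_reduction(baseline: RentalOption, candidate: RentalOption) -> float:
    """Fractional saving of candidate vs. baseline (positive means cheaper)."""
    return 1.0 - candidate.total_cost() / baseline.total_cost()


def break_even_speedup(baseline: RentalOption, candidate: RentalOption) -> float:
    """Speedup the candidate needs just to match the baseline's total cost."""
    return candidate.hourly_rate / baseline.hourly_rate


# Assumed, illustrative numbers only: the accelerated option costs 2.5x more
# per hour but finishes the same job 5x faster.
standard = RentalOption("non-accelerated", hourly_rate=10.0, hours_needed=100.0)
accelerated = RentalOption("accelerated", hourly_rate=25.0, hours_needed=20.0)

print(standard.total_cost())                      # 1000.0
print(accelerated.total_cost())                   # 500.0
print(cost_reduction(standard, accelerated))      # 0.5
print(break_even_speedup(standard, accelerated))  # 2.5
```

With these assumed inputs the accelerated option halves the bill, and any speedup above 2.5x would make it the cheaper choice. Real savings depend on actual pricing and measured job times, and this sketch leaves out the value of deploying the model sooner.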
Training a sophisticated large language model (LLM) on thousands of GPUs? Optimizing an existing model on a handful of GPUs? Doing real-time inferencing across the globe for inventory? As we noted above, understanding and budgeting AI workloads beforehand helps ensure provisioning that’s well-matched to the job and budget.
3. AI governance
Ensuring accountability, measurability, transparency
The term AI governance has lately acquired varied meanings, from ethics to explainability. Here it refers to the ability to measure cost, value, auditability and compliance with regulatory standards, especially around data and customer information. As AI expands, the ability of enterprises to easily and transparently ensure ongoing accountability will become more crucial than ever.
Here again, a standardized AI cloud infrastructure can provide automations and metrics to support this crucial requirement. Moreover, multiple security mechanisms built into the various layers of purpose-built infrastructure services (from GPUs to networks, databases, developer kits and more, soon to include confidential computing) help provide defense in depth and vital secrecy for AI models and sensitive data.
A final reminder about roles and responsibilities: Quickly achieving profitable, compliant AI growth with maximum value and optimal TCO using advanced, AI-first infrastructure cannot be a solo act for the CIO. As with other AI initiatives, it requires close collaboration with the chief data officer (or equivalent), data science leader and, in some organizations, chief architect.
Bottom line: Focus on how. Now.
Most CIOs today know the “why” of AI. It’s time to make “how” a strategic priority.
Enterprises that master this crucial capability — accelerating easy development and deployment of AI — will be far better positioned to maximize the impact of their AI investments. That can mean speeding up innovation and development of new applications, enabling easier and wider AI adoption across the enterprise or generally accelerating time-to-production-value. Technology leaders who fail to do so risk creating AI that sprouts wildly in expensive patches, slowing development and adoption and losing advantage to faster, better-managed competitors.
Where do you want to be at the end of 2023?
Nidhi Chappell is general manager of Azure HPC, AI, SAP, and confidential computing at Microsoft.
Manuvir Das is VP of enterprise computing at Nvidia.
VB Lab Insights content is created in collaboration with a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact firstname.lastname@example.org.