Enterprise AI at scale: Faster, surer rollouts
Strategic efforts to expand AI into production systems are exploding. Here’s how you can accelerate operationalization.
Encouraged by industry success stories and the results of their own initial efforts, enterprises worldwide are investing heavily in expanding strategic AI initiatives into production. Top bosses and boards want low TCO with rapid ROI and time-to-value — including faster innovation, improved productivity, revenue growth, and ideally, a lift in stock price.
AI-based products and embedded capabilities will explode across the mainstream in 2023, many analysts predict. Indeed, aggressive AI expansion is already underway in manufacturing, healthcare, financial services and many other industries. Global researcher Omdia forecasts the number of companies actively developing live AI deployments will double by the end of 2025.
It’s go-time for mainstreaming AI. Ready?
More than ever, IT leaders and builders are being called upon to quickly develop, deploy and scale robust AI production systems to support key corporate goals like improving customer experience (CX) and service, developing new products, and driving sales and marketing.
In this in-depth coverage, we’ll look at current realities, challenges, technologies and strategies for successfully accelerating and expanding AI that can quickly deliver clear value. (Bottom line: A key to success is the ability to consistently scale pilots into production.)
A great place to start is a close-up look at Wayve. The U.K.-based global pioneer is working to scale next-generation autonomous vehicles (AVs) guided by embodied AI to 100 cities around the world.
Wayve: New road for AVs runs through the cloud
Becoming a cabbie in London requires memorizing the city’s famously bewildering labyrinth of 25,000 streets, along with every landmark and business. It’s considered to be a grueling test, taking years of learning. It’s an ideal place for Wayve to test and train its next-generation, fully autonomous driving system based on deep learning and embodied intelligence.
For the last six years, Wayve has pursued an ambitious mission to reimagine autonomous vehicles. It has pioneered the software, lean hardware and fleet-learning technology platform needed to support development of large, AI-based foundation models that can quickly and safely adapt to new conditions without explicit programming. Instead of relying on the traditional AV stack, high-definition (HD) maps and hand-coded rules, Wayve's data-driven driver relies on cloud-based supercomputing (high-performance computing, or HPC) that allows Wayve-powered AVs to scale, adapt and generalize their driving intelligence to places they have never seen before.
Commercial trials in London, including with grocery delivery partners, are a key step in the company’s plan to introduce Wayve-powered cars and vans to 100 cities across the globe. It’s AI expansion on a scale that few organizations will need or attempt. Yet Wayve’s innovations offer a dramatic lesson for any company about the importance of planning foundational AI infrastructure to enable growth.
“Can’t get there from here”
After Wayve set out in 2017, cofounder and CEO Alex Kendall gradually came to a realization that amounted to: “You can’t get there from here.” More specifically, it was becoming clear that on-premises AI infrastructure would not be up to the task of meeting company goals.
In a small bedroom of the house that served as the startup’s headquarters in Cambridge, U.K., 50 or so GPUs cranked away, busily training prototypes of the next-generation autonomous driving system at the core of the company’s mission.
Kendall noticed that the small staff of 15 machine learning (ML) and robotics experts working on the venture-funded project was burning precious time servicing the on-premises AI processing and infrastructure.
“In a disruptive startup,” says Kendall, who holds a PhD in Computer Vision and Robotics from the University of Cambridge, “timing and speed are everything.” Wayve’s strategy has been to keep its focus on its core product, an embodied AI for driving, and to partner with others for speed and cost-effective development.
In 2019, the company decided to shift to a cloud-based approach to accelerate development and scaling of the AI that is the engine of the business. Working with Microsoft, it developed a platform that uses the open-source PyTorch machine learning framework with Microsoft Azure Machine Learning and “AI-first” infrastructure to handle end-to-end machine learning, rapid prototyping and quick iteration. Expansion continued over the next few years, including the introduction of cloud-based NVIDIA GPUs (such as the T4) and development platforms.
90% faster training on neural nets with billions of parameters
Today, the entire Wayve ecosystem runs within the purpose-built AI environment – compute, storage, networking and software – gathering, managing and processing millions of samples of driving data per year, including petabytes of images, GPS and sensor data. Kendall says the scalable capacity of the cloud-based environment makes it faster to build, iterate and deploy large foundation models for driving in complex urban environments, adjust models more nimbly and adapt to new environments more readily.
Each week, according to Kendall, Wayve trains neural networks with billions of parameters, three orders of magnitude larger than before, and does so 90% faster than previously. “None of this would have been attainable if we were on premise,” he says.
Simulation leads to faster insights never before possible
It’s not just about faster development and scale. Working in a cloud-based AI environment also lets the company do things not previously possible. Take simulation. Real-world testing is a critical part of the development process but comes with major limitations. It costs significant time and money, edge cases can be rare, and real-world scenarios cannot be recreated exactly.
To overcome these challenges, Wayve developed its Infinity Simulator. With the push of a button, the simulator procedurally creates synthetic training data from diverse, large-scale virtual worlds. Using reinforcement learning and foundation models, Infinity generates complex and challenging driving scenarios that allow Wayve teams to train, understand and validate the AI model’s driving intelligence. Says Kendall: “We have orders-of-magnitude more ability to elastically create simulated scenarios, which gives us insights that would be vastly slower or even impossible to get from real-world testing.”
Countless variants of the same initial scenario can be run in parallel in Azure to provide huge numbers of training and test cases for driving intelligence models.
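The fan-out pattern described above, many independent variants of one starting scenario run side by side, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Wayve's Infinity Simulator: `Scenario`, `make_variants`, `run_scenario`, `simulate_all` and the toy risk score are all hypothetical, and a local thread pool stands in for the cloud-scale parallelism the article describes.

```python
"""Hypothetical sketch: fan out many variants of one base driving scenario.

Scenario, make_variants, run_scenario, simulate_all and the toy risk score are
illustrative assumptions, not Wayve's Infinity Simulator or any Azure API.
"""
from __future__ import annotations

import random
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Scenario:
    """Hypothetical parameters describing one simulated driving scenario."""
    name: str
    seed: int
    pedestrian_count: int
    rain_intensity: float  # 0.0 (dry) up to, but not including, 1.0


def make_variants(base: Scenario, n: int, master_seed: int = 0) -> list[Scenario]:
    """Procedurally perturb a base scenario into n reproducible variants."""
    rng = random.Random(master_seed)
    return [
        replace(
            base,
            seed=i,  # per-variant seed so each run can be replayed exactly
            pedestrian_count=rng.randint(0, 2 * base.pedestrian_count + 1),
            rain_intensity=rng.random(),
        )
        for i in range(n)
    ]


def run_scenario(s: Scenario) -> dict:
    """Placeholder simulation: a real system would roll out a driving policy here."""
    rng = random.Random(s.seed)
    # Toy risk score that rises with pedestrians and rain, plus a little noise.
    risk = 0.1 * s.pedestrian_count + 0.5 * s.rain_intensity + 0.05 * rng.random()
    return {"seed": s.seed, "risk": risk, "is_edge_case": risk > 1.0}


def simulate_all(variants: list[Scenario], workers: int = 8) -> list[dict]:
    """Run independent variants concurrently; results keep the input order."""
    # Threads keep the sketch portable; CPU-heavy simulators would use
    # processes or distributed cluster jobs instead.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_scenario, variants))


if __name__ == "__main__":
    base = Scenario(name="roundabout_merge", seed=0, pedestrian_count=3, rain_intensity=0.0)
    results = simulate_all(make_variants(base, n=1000))
    edge_cases = [r for r in results if r["is_edge_case"]]
    print(f"{len(results)} variants simulated, {len(edge_cases)} flagged as edge cases")
```

Because every variant is independent and carries its own seed, runs are reproducible and can be spread across as many workers as happen to be available, which is what makes elastic cloud capacity a good fit for this kind of workload.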
The purpose-built AI cloud infrastructure also extends to in-field data collection. Wayve deploys its hardware stack on vehicles operated by Wayve and its fleet partners, which send driving experiences back to the company through Azure cloud and IoT services.
Overall, Kendall says the ability to use ML to generate operating environments and run atop a Kubernetes service “lets us run at a fast distributed scale… and spin different systems up and down depending on internal demand.” Further, he says, the purpose-built, AI cloud infrastructure improves consumption flexibility and cost management by enabling quick switching to optimize for a particular GPU or to co-locate with data, for example.
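Kendall doesn't say which scaling mechanism Wayve uses. As one hedged illustration of spinning capacity up and down with demand on Kubernetes, the sketch below reimplements the core rule of the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas * current metric / target metric), skipped when the ratio is within a tolerance of 1.0 (0.1 by default). The function name, the `min_replicas`/`max_replicas` defaults and the GPU-utilization metric in the usage lines are illustrative assumptions, and the real controller adds stabilization windows and other behavior omitted here.

```python
"""Simplified Kubernetes HPA-style scaling rule (illustrative, not Wayve's setup)."""
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 100,
    tolerance: float = 0.1,
) -> int:
    """Return how many replicas to run for the observed load.

    Core HPA rule: desired = ceil(current_replicas * current_metric / target_metric).
    No change is made when the ratio is within `tolerance` of 1.0, and the result
    is clamped to [min_replicas, max_replicas]. Stabilization windows, pod
    readiness and multi-metric handling in the real controller are omitted.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: leave capacity alone
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Hypothetical metric: average GPU utilization (%) across training workers.
    print(desired_replicas(4, current_metric=90, target_metric=60))  # -> 6 (scale up)
    print(desired_replicas(8, current_metric=20, target_metric=60))  # -> 3 (scale down)
```

For example, four workers averaging 90% utilization against a 60% target scale up to ceil(4 * 1.5) = 6, while eight workers at 20% scale down to 3. The tolerance band keeps small fluctuations from causing constant churn.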
“It’s all about scaling, speed and leveraging the latest technology.”
Armed with these leading-edge technologies, Wayve is currently focusing on scaling AV2.0, using Azure to further increase the size and complexity of its neural networks by “many orders of magnitude”. Last year, the company announced it is working with Microsoft to leverage Azure supercomputing infrastructure and technologies to further accelerate and power its development.
Kendall is confident Wayve is on the right track. “Building and safely deploying autonomous driving technology on a global scale requires powerful AI infrastructure that one day can train models with trillions of parameters and exabytes of image data,” he says. “It’s all about scale, speed and leveraging the latest technology. Trying to run all this on-premise infrastructure would distract our focus and mean we have to hire 100 people just to build a data center. That’s not our business.”
Challenge: Prioritize AI production with clear value
Enterprise AI is stuck. A widely cited Gartner survey in 2019 found that 53% of enterprise AI projects made it from pilot to production. An update in mid-2022 reported an increase of just one percentage point, to 54%. “Scaling AI continues to be a significant challenge,” concludes Frances Karamouzis, distinguished VP analyst at Gartner.
Only 54% of AI pilots will make it into production (Gartner).
Large global studies by McKinsey and others also show that despite rapidly increasing investments, operationalization and adoption of AI have plateaued. Many enterprises are trapped in what the consulting firm calls “pilot purgatory”.
What’s behind the stubborn difficulty of turning “science experiments” and tactical pilots into production AI that can drive impactful business gains?
Stymied by complexity, talent and cost
For starters, says John Lee, AI Platforms & Infrastructure Principal Lead for Microsoft Azure, “the inherent difficulty, complexity and cost” make deployment of AI at scale challenging for all but the most advanced, deepest-pocketed enterprises like Tesla and hyperscalers.
“People see the opportunities, but not everybody has the same resources, capabilities, talent pool and understanding to implement and monetize it.” A lack of affordable tools, platforms and processes is a big barrier to adoption, he notes.
Beyond cost hurdles, industry studies say that enterprises trying to scale AI also struggle with inadequate data, slow model training, workforce resistance, poor alignment with key organizational goals and unclear ROI. In some cases, AI efforts divert technical resources and cycles from core operational systems, impeding and slowing both.
The key to avoiding stalled AI
Failure to advance from proof of concept (POC) to pilot to production is an important issue for IT and business teams. Stalled, unscalable, poorly targeted AI without clear value burns precious funding, dampens management and shareholder support, jeopardizes further investment, stifles innovation and endangers competitiveness.
The solution lies in improving the technological, process and organizational maturity of enterprise AI. Doing so enables rapid, reliable, cost-effective development and deployment that delivers greater value with fewer resources. Industry research is clear:
Enterprises that prioritize scaling pilots into production systems that solve critical business problems and spot opportunities (with an eye toward eventual “industrialization” of AI) will enjoy continuing advantage and funding for innovation.
AI growth needs a solid technology foundation
With greater demands for data volume, speed and processing power, AI development and deployment require different technology, processes and skills than traditional software. To quickly build and operate reliable solutions at scale, McKinsey and other experts underscore that enterprise leaders must make the right investments in tech stacks and teams. Unfortunately, few appear well-positioned.
Only 20% of companies have the technology infrastructure in place to make the most of AI’s potential (Bain & Co.).
End-to-end environments. For many organizations, efforts to mature and industrialize AI production will focus on a foundational, “AI-first” platform and infrastructure that includes cloud and data platforms and tools, GPUs and accelerators, software, architecture and services. As Accenture notes, this “AI core” works across the cloud continuum (e.g., migration, integration, growth and innovation), provides end-to-end data capabilities (foundation, management and governance), manages the machine learning lifecycle (workflow, model training, model deployment) and provides self-service capabilities. Pre-integrated, full-stack environments optimized for AI help accelerate building, operationalization, management and expansion.
The focus on the cloud is important. “Where” AI workloads run is rapidly evolving, says Robert Ober, Chief Platform Architect, NVIDIA. He notes that more enterprises are choosing public clouds and Infrastructure as a Service (IaaS) to build and deploy AI-enabled services and maximize infrastructure investments.
Cloud infrastructure from Microsoft and NVIDIA, purpose-built for AI, provides an end-to-end, full-stack solution that includes on-demand global access to the latest GPUs. This “AI-first” foundation combines the support, agility, simplified IT management and scalability of the cloud with a performance-optimized, enterprise-grade software stack, delivering scalable performance that accelerates the entire AI pipeline.
High-performing organizations have prioritized putting these foundational platforms in place for AI-related data science, data engineering and application development. Many have also adopted advanced scaling practices, such as using standardized tool sets to create production-ready data pipelines. AI workflow management can be simplified and accelerated with curated collections of use-case-based content, which speed identification of compatible frameworks, containers, models, Jupyter notebooks, detailed documentation and other resources.
AI services. Pay-as-you-go AI services offer another way for enterprises to speed delivery of cloud and intelligent edge AI applications and capabilities – in days instead of months. Microsoft Azure AI, for example, speeds development by giving data scientists and others access to vision, speech, language, decision-making models and built-in business logic through simple API calls.
The new Azure OpenAI Service lets organizations quickly create cutting-edge applications or modernize business processes without ML expertise. Access to advanced AI models, including GPT-3.5, Codex and DALL•E 2, is backed by trusted enterprise-grade capabilities and AI-optimized infrastructure.
Accelerated hardware. Custom combinations of new, energy-efficient GPUs and DPUs accelerate compute performance and deliver the parallelism, scale and efficiency needed to build language models, recommenders and other leading-edge AI applications more quickly — with lower TCO, reduced carbon footprint and faster ROI than legacy architectures.
Software. “Enterprises often must choose between cloud computing and hybrid architectures, which can stifle productivity and slow time-to-value and innovation,” says Manuvir Das, Vice President, Enterprise Computing at NVIDIA. Advances in AI platform software can help businesses unify AI pipelines across all infrastructure types — on-prem, private cloud, public cloud, or hybrid cloud — and deliver a single, connected experience, says Das.
This ability to run AI software across different infrastructures brings several benefits, according to Das: it ends AI silos; it lets enterprises balance costs against strategic objectives, regardless of project size or complexity; and it provides access to virtually unlimited capacity for flexible development.
Another advance: curated AI software stacks for horizontal use cases. These make it faster and easier for enterprises to bring AI into common business workflows via SaaS and other delivery modes. Similarly, industry-specific AI solutions for life sciences and other fast-growing verticals promise to speed go-live and expansion schedules by freeing enterprises from the time-consuming work of creating specialized but common capabilities and training models.
Accelerate production AI with autonomous systems
In addition to establishing a solid, end-to-end core foundation for AI, enterprises can use several fast-emerging autonomous technologies and approaches to significantly speed development, scaling and time-to-value of production AI.
Large language models (LLMs). OpenAI’s GPT-3.5 (with GPT-4 on the way) and DALL•E 2 thrust the powerful capabilities of LLMs and other large generative models into the spotlight in 2022. LLMs find hidden patterns in unstructured data to support healthcare breakthroughs, scientific advances, better customer engagement and even major progress in self-driving transportation. Wayve, for example, is creating a single large foundation model similar to today’s large language models, but one that takes images rather than text as input. The company believes the best approach to autonomous driving is a large-scale foundation neural network, trained with self-supervised learning, that can handle highly diverse data. Rapid R&D and commercialization will bring a universe of new business applications across multiple domains, from summarizing medical notes and generating catalog descriptions to more natural and helpful chatbots and instant translation into hundreds of languages.
96% of businesses surveyed plan to use AI simulations this year (PwC).
Of particular interest to AI scalers: the ability of these models to learn from unlabeled data through self-supervision, and to be acquired pre-trained and ready for customization if desired. Removing much of the time-consuming labeling and training work normally done by hard-to-find data specialists provides a huge advantage to enterprises pressed to show rapid progress and results. The emerging ability of LLMs to automate software coding and editing (including for AI) further increases their appeal.
Simulation. The ability of digital twins to accurately simulate the real world, capture and process massive amounts of data and encode autonomy, without the need for deep expertise, is another powerful tool for speeding testing, training and propagation of production AI.
Low-code/no-code programming. These popular development tools help enterprises sidestep talent bottlenecks and the delays they cause. Using visual authoring capabilities, engineers and others can quickly build and add AI to systems, equipment and processes without writing code or algorithms. The result: fast solutions to complex problems and accelerated innovation.
AI high performers are 1.6 times more likely than other organizations to engage nontechnical employees in creating AI applications by using low-code or no-code programs (McKinsey).
Low-code/no-code development tools let non-specialists transform whiteboards into AI, speeding go-live dates and innovation.
Bottom line: Scale or fail
As with any adolescence, becoming a more mature AI organization will inevitably bring awkward moments, dashed hopes and growing pains. Microsoft’s Lee stresses the importance of having realistic expectations for AI, and of patience.
“If you think you’re going to scale once and hit a home run, you need to reconsider,” he cautions. “Expect that your first deployments might be science projects. Learn from them, and develop the needed muscles internally, so that your organization gets really good at it.”
For perspective, Lee believes it’s helpful to look at other disruptive technologies. GM’s EV1, the first mass-produced electric car of the modern era, debuted in 1996. Yet it wasn’t until 2017, with Tesla’s Model 3, that a mass-market electric car became commercially feasible. Similarly, he notes, the pilot-to-production rate for another modern technology, IoT, is less than 40%, lower than AI’s.
A growing gap between achievers and laggards
Even so, Lee and other industry experts have no doubt that the race to develop AI-infused systems and applications will continue to accelerate as more companies and industries invest. Eventually, many believe, adopting AI will become easier and its benefits ever greater.
And as with any major technology change, companies advancing their AI maturity will need to look at processes and people to succeed. Fortunately, there’s a fast-growing body of best practices for deployment and other AI challenges. AI standards remain rare, but here again, various industry, professional and government groups are busily working on developing them.
Finally, take note: Several industry studies highlight a clear and growing gap between laggards and AI high achievers seeing high financial returns from AI. Top performers are making larger investments in AI, adopting advanced practices known to enable faster AI scaling and development, and show signs of faring better in the tight market for AI talent.
Cynics dismiss these warnings at their own peril, says Lee. “If AI is the future and you don’t invest, your organization risks becoming irrelevant. History is a graveyard littered with the tombs of famous companies that failed to capitalize on the next big thing like smartphones, digital photography and media streaming. What do you want your legacy to be?”