Amazon, we don’t need another AI tool or APl, we need an open AI platform for cloud and edge

After Amazon's three-week re:Invent conference, companies building AI applications may have the impression that AWS is the only game in town. Amazon announced improvements to SageMaker, its machine learning (ML) workflow service, and to Edge Manager -- improving AWS' ML capabilities on the edge at a time when serving the edge is considered increasingly critical for enterprises. Moreover, the company touted big customers like Lyft and Intuit.

But Mohammed Farooq believes there is a better alternative to the Amazon hegemon: an open AI platform that doesn't have any hooks back to the Amazon cloud. Until earlier this year, Farooq led IBM's Hybrid multi-cloud strategy, but he recently left to join the enterprise AI company Hypergiant.

Here is our Q&A with Farooq, who is Hypergiant's chair, global chief technology officer, and general manager of products. He has skin in the game and makes an interesting argument for open AI.

VentureBeat: With Amazon's momentum, isn't it game over for any other company hoping to be a significant service provider of AI services, or at the least for any competitor not named Google or Microsoft?

Mohammed Farooq: On the one hand, for the last three to five-plus years, AWS has delivered outstanding capabilities with SageMaker (Autopilot, Data Wrangler) to enable accessible analytics and ML pipelines for technical and nontechnical users. Enterprises have built strong-performing AI models with these AWS capabilities.

On the other hand, the enterprise production throughput of performing AI models is very low. The low throughput is a result of the complexity of deployment and operations management of AI models within consuming production applications that are running on AWS and other cloud/datacenter and software platforms.

Enterprises have not established an operations management system -- something referred to within the industry as ModelOps. ModelOps are required and should have things like lifecycle processes, best practices, and business management controls. These are necessary to evolve the AI models and data changes in the context of the underlying heterogeneous software and infrastructure stacks currently in operation.

AWS does a solid job of automating an AI ModelOps process within the AWS ecosystem. However, running enterprise ModelOps, as well as DevOps and DataOps, will need not only AWS, but multiple other cloud, network, and edge architectures. AWS is great as far as it goes, but what is required is seamless integration with enterprise ModelOps, hybrid/multi-cloud infrastructure architecture, and IT operations management system.

Failures in experimentation are the result of average time needed to create a model. Today, successful AI models that deliver value and that business leaders trust take 6-12 months to build. According to the Deloitte MLOps Industrialized AI Report (released in December 2020), an average AI team can build and deploy, at best, two AI models in a year. At this rate, industrializing and scaling AI in the enterprise will be a challenge. An enterprise ModelOps process integrated with the rest of enterprise IT is required to speed up and scale AI solutions in the enterprise.

I would argue that we are on the precipice of a new era in artificial intelligence -- one where AI will not only predict but recommend and take autonomous actions. But machines are still taking actions based on AI models that are poorly experimented with and fail to meet defined business goals (key performance indicators).

VentureBeat: So what is it that holds the industry back? Or asked a different way, what is that holds Amazon back from doing this?

Farooq: To improve development and performance of AI models, I believe we must address three challenges that are slowing down the AI model development, deployment, and production management in the enterprise. Amazon and other big players haven't been able to address these challenges yet. They are:

AI data: This is where everything starts and ends in performant AI models. Microsoft [Azure] Purview is a direct attempt to solve the data problems of the enterprise data governance umbrella. This will provide AI solution teams (consumers) valuable and trustworthy data.

AI operations processes: These are enabled for development and deployment in the cloud (AWS) and do not extend or connect to the enterprise DevOps, DataOps, and ITOps processes. AIOps processes to deploy, operate, manage, and govern need to be automated and integrated into enterprise IT processes. This will industrialize AI in the enterprise. It took DevOps 10 years to establish CI/CD processes and automation platforms. AI needs to leverage the assets in CI/CD and overlay the AI model lifecycle management on top of it.

AI architecture: Enterprises with native cloud and containers are accelerating on the path to hybrid and multi-cloud architectures. With edge adoption, we are moving to pure distributed architecture, which will connect the cloud and edge ecosystem. AI architecture will have to operate on distributed architectures across hybrid and multi-cloud infrastructure and data environments. AWS, Azure, Google, and VMWare are effectively moving towards that paradigm.

To develop the next phase of AI, which I am calling "industrialized AI in the enterprise," we need to address all of these. They can only be met with an open AI platform that has an integrated operations management system.

VentureBeat: Explain what you mean by an "open" AI platform.

Farooq: An open AI platform for ModelOps lets enterprise AI teams mix and match required AI stacks, data services, AI tools, and domain AI models for different providers. Doing so will result in powerful business solutions at speed and scale.

AWS, with all of its powerful cloud, AI, and edge offerings, has still not stitched together a ModelOps that can industrialize AI and cloud. Enterprises today are using a combination of ServiceNow, legacy systems management, DevOps tooling, and containers to bring this together. AI operations adds another layer of complexity to an already increasingly complex model.

An enterprise AI operations management system should be the master control point and system of record, intelligence, and security for all AI solutions in a federated model (AI models and data catalogs). AWS, Azure, or Google can provide data, process, and tech platforms and services to be consumed by enterprises.

But lock-in models, like those currently being offered, harm enterprise's ability to develop core AI capabilities. Companies like Microsoft, Amazon, and Google are hampering our ability to build high-caliber solutions by constructing moats around their products and services. The path to the best technology solutions, in the service of both AI providers and consumers, is one where choice and openness is prized as a pathway to innovation.

You have seen companies articulate a prominent vision for the future of AI. But I believe they are limited because they are not going far enough to democratize AI access and usage with the current enterprise IT Ops and governance process. To move forward, we need an enterprise ModelOps process and an open AI services integration platform that industrializes AI development, deployment, operations, and governance.

Without these, enterprises will be forced to choose vertical solutions that fail to integrate with enterprise data technology architectures and IT operations management systems.

VentureBeat: Has anyone tried to build this open AI platform?

Farooq: Not really. To manage AI ModelOps, we need a more open and connected AI services ecosystem, and to get there, we need an AI services integration platform. This essentially means that we need cloud provider operations management integrated with enterprise AI operations processes and a reference architecture framework (led by CTO and IT operations).

There are two options for enterprise CIOs, CTOs, CEOs, and architects. One is vertical, and the other one is horizontal.

Dataiku, Databricks, Snowflake, C3.AI, Palantir, and many others are building these horizontal AI stack options for the enterprise. Their solutions operate on top of AWS, Google, and Azure AI. It's a great start. However, C3.AI and Palantir are also moving towards lock-in options by using model-driven architectures.

VentureBeat: So how is the vision of what you're building at Hypergiant different to these efforts?

Farooq: The choice is clear: We have to enable an enterprise AI stack, ModelOps tooling, and governance capabilities enabled by an open AI services integration platform. This will integrate and operate customer ModelOps and governance processes internally that can work for each business unit and AI project.

What we need is not another AI company, but rather an AI services integrator and operator layer that improves how these companies work together for enterprise business goals.

A customer should be able to use Azure solutions, MongoDB, and Amazon Aurora, depending on what best suits their needs, price points, and future agenda. What this requires is a mesh layer for AI solution providers.

VentureBeat: Can you further define this "mesh layer"? Your figure shows it is a horizontal layer, but how does it work in practice? Is it as simple as plugging in your AI solution on top, and then having access to any cloud data source underneath? And does it have to be owned by a single company? Can it be open-sourced, or somehow shared, or at least competitive?

Farooq: The data mesh layer is the core component, not only for executing the ModelOps processes across cloud, edge, and 5G, but it is also a core architectural component for building, operating, and managing autonomous distributed applications.

Currently we have cloud data lakes and data pipelines (batch or steaming) as an input to build and train AI models. However, in production, data needs to be dynamically orchestrated across datacenters, cloud, 5G, and edge end points. This will ensure that the AI models and the consuming apps at all times have the required data feeds in production to execute.

AI/cloud developers and ModelOps teams should have access to data orchestration rules and policy APIs as a single interface to design, build, and operate AI solutions across distributed environments. This API should hide the complexity of the underlying distributed environments (i.e., cloud, 5G, or edge).

In addition, we need packaging and container specs that will help DevOps and ModelOps professionals use the portability of Kubernetes to quickly deploy and operate AI solutions at scale.

These data mesh APIs and packaging technologies need to be open sourced to ensure that we establish an open AI and cloud stack architecture for enterprises and not walled gardens from big providers.

By analogy, look at what Twilio has done for communications: Twilio strengthened customer relationships across businesses by integrating many technologies in one easy-to-manage interface. Examples in other industries include HubSpot in marketing and Squarespace for website development. These companies work by providing infrastructure that simplifies the experience of the user across the tools of many different companies.

VentureBeat: When are you launching this?

Farooq: We are planning to launch a beta version of a first step of that roadmap early next year [Q1/2020].

VentureBeat: AWS has a reseller policy. Could it could crack down on any mesh layer if they wanted to?

Farooq: AWS could build and offer their own mesh layer that is tied to its cloud and that interfaces with 5G and edge platforms of its partners. But this will not help its enterprise customers accelerate the development, deployment, and management of AI and hybrid/multi-cloud solutions at speed and scale. However, collaborating with the other cloud and ISV providers, as it has done with Kubernetes (CNCF-led open source project), will benefit AWS significantly.

As further innovation on centralized cloud computing models have stalled (based on current functionality and incremental releases across AWS, Azure, and Google), the data mesh and edge native architectures is where innovation will need to happen, and a distributed (declarative and runtime) data mesh architecture is a great place for AWS to contribute and lead the industry.

The digital enterprise will be the biggest beneficiary of a distributed data mesh architecture, and this will help industrialize AI and digital platforms faster -- thereby creating new economic opportunities and in return more spend on AWS and other cloud provider technologies.

VentureBeat: What impact would such a mesh-layer solution have on the leading cloud companies? I imagine it could influence user decisions on what underlying services to use. Could that middle mesh player reduce pricing for certain bundles, undercutting marketing efforts by the cloud players themselves?

Farooq: The data mesh layer will trigger massive innovation on the edge and 5G native (not cloud native) applications, middleware, and infra-architectures. This will drive the large providers to rethink their product roadmaps, architecture patterns, go-to-market offerings, partnerships, and investments.

VentureBeat: If the cloud companies see this coming, do you think they'll be more inclined to move toward an open ecosystem more rapidly and squelch you?

Farooq: The big providers in a first or second cycle of evolution of a technology or business model will always want to build a moat and lock in enterprise clients. For example, AWS never accepted that hybrid or multi-cloud was needed. But in the second cycle of cloud adoption by VMWare clients, VMWare started to preach an enterprise-outward hybrid cloud strategy connecting to AWS, Azure, and Google.

This led AWS to launch a private cloud offering (called Outposts), which is a replica for the AWS footprint on a dedicated hardware stack that has the same offerings. AWS executes its API across AWS public and Outposts. In short, they came around.

The same will happen to edge, 5G, and distributed computing. Right now, AWS, Google, and Azure are building their distributed computing platforms. However, the power of the open source community and the innovation speed is so great, the distributed computing architecture in the next cycle and beyond will have to move to an open ecosystem.

VentureBeat: What about lock-in at the mesh-layer level? If I choose to go with Hypergiant so I can access services across clouds, and then a competing mesh player emerges that offers better prices, how easy is it to move?

Farooq: We at Hypergiant believe in an open ecosystem, and our go-to-market business model depends on being at the intersection of enterprise consumption and provider offerings. We drive consumption economics, not provider economics. This will require us to support multiple data mesh technologies and create a fabric for interoperation with a single interface to our clients. The final goal is to ensure an open ecosystem, developer, and operator ease, and value to enterprise clients so that they are able to accelerate their business and revenue strategies by leveraging the best value and the best breed of technologies. We are looking at this from the point of view of the benefits to the enterprise, not the provider.