California-based MosaicML, a provider of generative AI infrastructure, has launched a fully managed inference service to help enterprises deploy generative AI models easily and affordably.
The offering comes as demand for large language models (LLMs) continues to grow across industries. According to MosaicML, the service can serve LLMs for up to 15 times less than comparable services on the market.
The launch expands MosaicML’s capabilities, making it a complete tool for generative AI training and deployment. Prior to this, the company had largely focused on providing the software infrastructure for training generative AI models.
MosaicML inference: How does it help?
Given the rise of LLMs like ChatGPT, enterprises have grown eager to add generative AI capabilities to their applications and products. However, owing to the privacy challenges (data going to a third party) and the high costs involved in building and deploying such models, the task has not exactly been a cakewalk.
With the new inference service, MosaicML is simplifying deployment by giving enterprises the option to query either their own custom-built LLMs or a curated selection of open-source models, including Instructor-XL, Databricks’ Dolly, GPT-NeoX and MosaicML’s foundation series models.
At its core, the service includes two tiers: starter and enterprise. The starter tier offers open-source models, curated and hosted by MosaicML as API endpoints, for an easy start when adding generative AI to applications. They can be deployed as is.
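As a rough illustration of what calling such a hosted endpoint from an application involves — note that the endpoint URL, field names and headers below are hypothetical, not MosaicML’s documented API — a text-completion query might be assembled like this:

```python
import json
import urllib.request

# Hypothetical values -- the actual service's endpoint and auth scheme may differ.
ENDPOINT = "https://models.example.com/v1/completions"
API_KEY = "YOUR_API_KEY"

def build_completion_request(prompt: str, max_tokens: int = 128) -> urllib.request.Request:
    """Build a POST request carrying a JSON completion payload to a hosted model endpoint."""
    payload = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

req = build_completion_request("Summarize this support ticket: ...")
# urllib.request.urlopen(req) would send the query; omitted here.
```

The appeal of the starter tier is exactly this shape of integration: the application only needs an HTTP client and an API key, with no model hosting or GPU provisioning on the customer’s side.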
The enterprise tier goes a step further, allowing teams to deploy any model they want, including custom ones developed for specific use cases, inside their own virtual private cloud (VPC). This way, inference data never leaves the secured environment of the user’s infrastructure, ensuring full privacy and security.
And, it saves money
More importantly, thanks to its low latency and high hardware utilization, MosaicML Inference can also be several times cheaper for deploying models than comparable offerings.
In a cost assessment, MosaicML said the starter edition of its inference service hosted curated text completion and embedding models for four times less than OpenAI’s offering, while the enterprise tier was found to be 15 times cheaper. All measurements were taken on 40GB NVIDIA A100s with standard 512-token input sequences or 512×512 images, the company added.
While MosaicML didn’t share the names of the companies using the new inference service, CEO Naveen Rao did note that customers are already seeing results with the offering.
“A publicly traded customer of ours in the financial compliance space is using the MosaicML inference service to deploy their custom GPT trained from scratch on MosaicML,” Rao told VentureBeat. “This customer experienced north of 10x inference savings compared to alternate providers. TCO (total cost of ownership) for their first model was less than $100,000.”