California-based MosaicML, a provider of generative AI infrastructure, has launched a fully managed inference service to help enterprises deploy generative AI models easily and affordably.
The offering comes as demand for large language models (LLMs) continues to grow across industries. According to MosaicML, the service can serve LLMs for up to 15 times less than comparable services on the market.
The launch expands MosaicML’s capabilities, making it a complete tool for generative AI training and deployment. Prior to this, the company had largely focused on providing the software infrastructure for training generative AI models.
MosaicML inference: How does it help?
Given the rise of LLMs like ChatGPT, enterprises have grown eager to add generative AI capabilities to their applications and products. However, owing to the privacy challenges (data going to a third party) and the high costs involved in building and deploying such models, the task has not exactly been a cakewalk.
With the new inference service, MosaicML is simplifying deployment by giving enterprises the option to query either their own custom-built LLMs or a curated selection of open-source models, including Instructor-XL, Databricks’ Dolly, GPT-NeoX and MosaicML’s foundation series models.
At its core, the service includes two tiers: starter and enterprise. The starter tier offers open-source models, curated and hosted by MosaicML as API endpoints, for an easy start when adding generative AI to applications. They can be deployed as is.
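As a rough illustration of what calling such a hosted endpoint from an application involves — note that the endpoint URL, field names and headers below are hypothetical, not MosaicML’s documented API — a text-completion query might be assembled like this:

```python
import json
import urllib.request

# Hypothetical values -- the actual service's endpoint and auth scheme may differ.
ENDPOINT = "https://models.example.com/v1/completions"
API_KEY = "YOUR_API_KEY"

def build_completion_request(prompt: str, max_tokens: int = 128) -> urllib.request.Request:
    """Build a POST request carrying a JSON completion payload to a hosted model endpoint."""
    payload = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

req = build_completion_request("Summarize this support ticket: ...")
# urllib.request.urlopen(req) would send the query; omitted here.
```

The appeal of the starter tier is exactly this shape of integration: the application only needs an HTTP client and an API key, with no model hosting or GPU provisioning on the customer’s side.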
The enterprise tier goes a step further, allowing teams to deploy any model they want, including custom ones developed for specific use cases, inside their own virtual private cloud (VPC). This way, inference data never leaves the secured environment of the user’s infrastructure, ensuring full privacy and security.
And, it saves money
More importantly, thanks to its low latency and high hardware utilization, MosaicML Inference can also be several times cheaper for deploying models than comparable offerings.
In a cost assessment, MosaicML said the starter edition of its inference service hosted curated text completion and embedding models for four times less than OpenAI’s offering, while the enterprise tier was found to be 15 times cheaper. All measurements were taken on 40GB NVIDIA A100s with standard 512-token input sequences or 512×512 images, the company added.
While MosaicML didn’t share the names of the companies using the new inference service, CEO Naveen Rao did note that customers are already seeing results with the offering.
“A publicly traded customer of ours in the financial compliance space is using the MosaicML inference service to deploy their custom GPT trained from scratch on MosaicML,” Rao told VentureBeat. “This customer experienced north of 10x inference savings compared to alternate providers. TCO (total cost of ownership) for their first model was less than $100,000.”