The latest generation of artificial intelligence (AI) models, known as transformers, has already changed our daily lives: taking the wheel for us, completing our thoughts when we compose an email and answering our questions in search engines.

However, right now, only the largest tech companies have the means and manpower to wield these massive models at consumer scale. To get a model into production, data scientists typically spend one to two weeks dealing with GPUs, containers, API gateways and the like, or have to ask a different team to do it, which can cause delays. The time-consuming work of harnessing this technology is a main reason why 87% of machine learning (ML) projects never make it to production.

To address this challenge, New York-based Hugging Face, which aims to democratize AI and ML via open source and open science, has launched Inference Endpoints. The AI-as-a-service offering is designed to handle large enterprise workloads, including in regulated industries that are heavy users of transformer models, such as financial services (e.g., air-gapped environments), healthcare services (e.g., HIPAA compliance) and consumer tech (e.g., GDPR compliance). The company claims that Inference Endpoints will enable more than 100,000 Hugging Face Hub users to go from experimentation to production in just a couple of minutes.

“Hugging Face Inference Endpoints is a few clicks to turn any model into your own API, so users can build AI-powered applications, on top of scalable, secure and fully managed infrastructure, instead of weeks of tedious work reinventing the wheel building and maintaining ad-hoc infrastructure (containers, Kubernetes, the works),” said Jeff Boudier, product director at Hugging Face.

Saving time and making room for new possibilities

The new feature can save data scientists time, which they can instead spend improving their models and building new AI features. With their custom models integrated into apps, they can see the impact of their work more quickly.

For software developers, Inference Endpoints makes it possible to build AI-powered features without needing machine learning expertise.

“We have over 70k off-the-shelf models available to do anything from article summarization to translation to speech transcription in any language, image generation with diffusers, like the cliché says the limit is your imagination,” Boudier told VentureBeat. 
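For a developer, using a deployed model comes down to an HTTP call. The following is a minimal sketch, not Hugging Face's official client: it assumes the endpoint accepts a POST request with a bearer token and a JSON body containing an `inputs` field, which should be checked against the official documentation. The URL and token are hypothetical placeholders.

```python
import json
import urllib.request

# Hypothetical placeholders for illustration only; a real endpoint URL and
# access token would come from your own Hugging Face account.
ENDPOINT_URL = "https://my-endpoint.example.com"
API_TOKEN = "hf_xxx"


def build_request(text, url=ENDPOINT_URL, token=API_TOKEN):
    """Build (but do not send) a POST request carrying the text to the model."""
    # Assumption: the endpoint expects a JSON object with an "inputs" field.
    payload = json.dumps({"inputs": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def query(text):
    """Send the request and decode the JSON response (needs a live endpoint)."""
    with urllib.request.urlopen(build_request(text), timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a live endpoint, `query("Summarize this article.")` would return whatever JSON the deployed model produces; the shape of that response depends on the model's task.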

So, how does it work? Users first select any of the more than 70,000 open-source models on the Hub, or a private model hosted on their Hugging Face account. They then choose a cloud provider and region, and can also specify security settings, compute type and autoscaling. After that, they can deploy any machine learning model, ranging from transformers to diffusers. Users can also build fully custom AI applications, for example to match lyrics or music and create original videos from just text. Compute use is billed by the hour and invoiced monthly.
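The choices described above, together with hourly billing, can be sketched as a small data structure. This is purely illustrative: the class, field names, example values and hourly rate are all hypothetical, and the estimate assumes cost scales linearly with the number of running replicas, which the article does not state.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EndpointConfig:
    """Hypothetical record of the choices a user makes before deploying."""
    model: str             # a Hub model or a private model on the account
    cloud_provider: str
    region: str
    compute_type: str
    private: bool          # stand-in for the security settings
    min_replicas: int      # autoscaling lower bound
    max_replicas: int      # autoscaling upper bound
    hourly_rate_usd: float # assumed price per replica per hour

    def __post_init__(self):
        if not 0 <= self.min_replicas <= self.max_replicas:
            raise ValueError("need 0 <= min_replicas <= max_replicas")


def monthly_cost_range(config: EndpointConfig, hours: float = 730) -> Tuple[float, float]:
    """Estimate the lowest and highest monthly invoice for hourly billing.

    Assumes every hour is billed at hourly_rate_usd per running replica.
    """
    low = config.hourly_rate_usd * config.min_replicas * hours
    high = config.hourly_rate_usd * config.max_replicas * hours
    return low, high


# Example values are placeholders, not real prices or identifiers.
config = EndpointConfig(
    model="my-org/my-model",
    cloud_provider="aws",
    region="us-east-1",
    compute_type="gpu-small",
    private=True,
    min_replicas=1,
    max_replicas=2,
    hourly_rate_usd=0.5,
)
low, high = monthly_cost_range(config, hours=100)
```

The range reflects autoscaling: the actual invoice would fall somewhere between the two figures depending on how often extra replicas were running.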

“We were able to choose an off the shelf model that’s common for our customers to get started with and set it so that it can be configured to handle over 100 requests per second just with a few button clicks,” said Gareth Jones, senior product manager at Pinecone, a company using Hugging Face’s new offering. “With the release of the Hugging Face Inference Endpoints, we believe there’s a new standard for how easy it can be to go build your first vector embedding-based solution, whether it be semantic search or question answering system.”

Hugging Face started its life as a chatbot and aims to become the GitHub of machine learning. Today, the platform offers 100,000 pre-trained models and 10,000 datasets for natural language processing (NLP), computer vision, speech, time-series, biology, reinforcement learning, chemistry and more.

With the launch of Inference Endpoints, the company hopes to bolster the adoption of the latest AI models in production for companies of all sizes.

“What is really novel and aligned with our mission as a company is that with Inference Endpoints even the smallest startup with no prior machine learning experience can bring the latest advancements in AI into their app or service,” said Boudier. 
