Hive's cloud-hosted machine learning models draw $85M

While cloud computing continues to gain favor, only a limited number of companies have embraced machine learning based in the cloud. Hive wants to change this by allowing enterprises to access hosted machine learning models via APIs.

Hive has had particular success in content moderation, thanks to deep learning models that help companies interpret unstructured data, like images, videos, and audio. But it's also expanding into advertising and sponsorship measurement as it looks for other fields that would benefit from intelligent automation.

In an interview with VentureBeat, Hive CEO Kevin Guo said the company kept relatively quiet as it sought to prove its models work. But its growth has started to accelerate, and the company is getting ready to make more noise.

"Now that we have enough tracking points and we have over 100 customers, we are quite confident what we have in the market does actually work," Guo said. "Now we're ready to scale up."

Investors are also excited by what they see. Today, Hive announced it has raised $85 million in funding over two rounds, putting its total raised at $121 million and its valuation at $2 billion. Glynn Capital led a $50 million round, which followed a $35 million round the company had not previously disclosed.

An unexpected journey

Guo said that when the company was founded in 2014, he and cofounder Dmitriy Karpman were at Stanford studying computer vision. They began by building consumer apps that used AI to improve things like content recommendations.

But along the way, they encountered issues around content moderation and couldn't find models that solved them. As they started building a solution, other companies heard about it and asked if they could try it. By the end of 2017, the company had become enterprise-focused and the current incarnation of Hive was born.

Even then, Guo said the founders took a slow and steady approach. They continued to deploy the service to partners who are now using it to monitor every piece of content shared by users. If Hive spots a problem, something like a video stream can be instantly shut down. Guo said Hive has fewer false positives than some alternatives on the market, which lowers the risk of blocking a non-violating piece of content.

"If your models are inaccurate and you're banning 30% of your users' content incorrectly, that's a real problem," Guo said.

Humans vs. machines

Guo said the key to Hive's accuracy is the massive amount of data fed into the models as they have been developed. To do that, Hive has built a distributed workforce of data labelers.

"They have fed in now billions of human judgments," Guo said. "And that's what makes this model work so well. At this point, our clients basically view that our models are pretty much at or even above human accuracy, which is why they can trust [them] 100% and use [them] in real-time in production."

Hive claims its models have been trained on 1 billion pieces of human-labeled data, the largest such public dataset. That allows Hive to screen for 40 classes of content, such as sexual content, violence, and hate speech.

This work has put the company in the middle of the debate over supervised versus semi-supervised and unsupervised learning. Guo said the right answer really depends on the nature of the company's service. But for Hive, the human element is essential.

"There's nothing quite like humans, truthfully," he said. "Humans are really good at generating labeling data, finding patterns, and solving hard problems. And so that's ground truth data for models. We've always believed that human training is necessary. Our approach has been generally to stick with [the] tried and true path of using humans to train our models, first and foremost."

The ability to offer the service as an API means clients just have to drop a few lines of code into their service to be up and running, Guo said. That ease of use has helped adoption.

According to Guo, the company has seen 300% revenue growth over the past year. Customers include NBCUniversal, Interpublic Group, Reddit, Walmart, Visa, Anheuser-Busch InBev, Comscore, and Cognizant.

Future plans

Hive intends to use the funding to continue development of the company's cloud-based deep learning models. It also plans to invest in building out its sales and marketing teams.

"Up until now, we've really been operating with a bit more of a conservative mindset," Guo said. "We didn't want to over-invest in sales and marketing until we knew our product [worked]. It took a while. It takes time to prove these models out."
