Join top executives in San Francisco on July 11-12, to hear how leaders are integrating and optimizing AI investments for success. Learn More

Seven years ago, Scott Stephenson was working as a postdoctoral researcher building dark matter detectors that were deployed deep under the surface of the Earth.

With the detectors, the goal was to pull signals out of noise to help solve the mysteries of the universe. As part of that process, technology was built to better understand sounds using machine learning techniques. Stephenson figured the approach had broader applicability for pulling meaning out of human speech, which led him to start Deepgram in 2015.

Deepgram is taking a somewhat nuanced approach to building natural language processing (NLP) capabilities, with its own foundation model that can perform transcription as well as summarization and sentiment analysis from audio.

“We have our own foundation model, where this model can be used to achieve several goals from audio,” Stephenson said.


Those goals include building out customized models for specific use cases and industry verticals. To that end, the company today announced that it has raised $47 million in funding to continue building out its technology and go-to-market efforts.

How Deepgram builds its NLP technology

The market for NLP and voice transcription technologies is increasingly crowded, with consumer services like Otter and large vendors including AWS, Google and IBM all providing services.

Stephenson said his company’s technology is built with a series of deep learning techniques including convolutional neural networks (CNN), recurrent neural networks (RNN) and transformers. The models that Deepgram has built are trained on audio waveforms to pull meaning from the spoken word.

Deepgram has also built its own data labeling technologies and workflow to identify what is being said in an audio file and how it can be classified. To keep its NLP models improving over time, Deepgram is taking a self-supervised approach to reinforcement learning.

“The model is aware of when it doesn’t know something, but it still will give you an answer,” Stephenson said. 

Answers that the model isn’t entirely confident about get logged. The Deepgram platform includes both automated elements and human data scientists who review each uncertain item and suggest further training within a specific vertical or area of expertise to help update the model.
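As a rough illustration of that kind of workflow, the Python sketch below always returns the model's answer but sets aside low-confidence ones for later review. It is not Deepgram's implementation: the `Prediction` and `ReviewQueue` names, the 0.8 threshold and the sample transcripts are all hypothetical, chosen only to show the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Prediction:
    """A hypothetical model output: the text plus a confidence in [0, 1]."""
    text: str
    confidence: float


@dataclass
class ReviewQueue:
    """Collects uncertain predictions for later human review (illustrative only)."""
    threshold: float = 0.8  # assumed cutoff, not a documented Deepgram value
    items: list = field(default_factory=list)

    def handle(self, prediction: Prediction) -> str:
        # The answer is always returned, even when confidence is low...
        if prediction.confidence < self.threshold:
            # ...but uncertain answers are logged so they can guide retraining.
            self.items.append(prediction)
        return prediction.text


queue = ReviewQueue(threshold=0.8)
predictions = [
    Prediction("refill my prescription", 0.95),
    Prediction("take two milligrams", 0.55),
]
outputs = [queue.handle(p) for p in predictions]
flagged = [p.text for p in queue.items]
```

In this sketch both transcripts are returned to the caller, and only the second lands in the review queue.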

Sentiment analysis might still struggle with sarcasm

A key challenge that faces transcription and NLP tools is the capability to actually understand the tone of the speaker with sentiment analysis.

A common way to do sentiment analysis today is purely with text. For example, if negative words are used in a review, the overall sentiment is not considered to be positive. With the spoken word, negative sentiment isn’t just about words; it’s also about tone.
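A minimal sketch of the text-only approach, in Python, might simply count words from small positive and negative lists. The word lists and the `text_sentiment` function are made up for illustration and are far simpler than any production system.

```python
import re

# Tiny, hypothetical word lists; real lexicons are much larger.
NEGATIVE_WORDS = {"bad", "terrible", "awful", "broken", "slow"}
POSITIVE_WORDS = {"good", "great", "excellent", "fast", "love"}


def text_sentiment(review: str) -> str:
    """Label a review using word counts only, ignoring tone entirely."""
    words = re.findall(r"[a-z']+", review.lower())
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    score = positives - negatives
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


plain = text_sentiment("The service was terrible and slow")
sarcastic = text_sentiment("Oh great, it broke again")
```

Here the first review is labeled negative, while the second is labeled positive because the only listed word it contains is "great". A word-counting approach has no access to how the sentence was said.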

“The easy version of supporting sentiment is to only look at the words but, of course, as humans with a couple of microphones in our head, we know that tone matters,” Stephenson said.  

Being able to understand users’ frustration is important for accurate sentiment analysis. The Deepgram system uses what Stephenson referred to as “acoustic cues” to understand the sentiment of the speaker, and it relies on a different model than one that would be used for text-only sentiment analysis.

While the Deepgram system can better determine sentiment than text-based methods alone, detecting sarcasm can be a little trickier.

“If you ask an American to figure out if somebody is being sarcastic or not, we can usually do a pretty good job,” Stephenson said. “The models are not tuned for that yet; I wouldn’t say that’s because of the expressiveness of the models, though, that really just has to do with the data labeling and the demand of customers asking for it.”

Stephenson said that if enough users wanted to detect sarcasm more accurately and were willing to pay for it, the technology would likely be developed faster. Either way, he expects that NLP’s ability to detect sarcasm accurately is likely to come within the next five years.

VentureBeat's mission is to be a digital town square for technical decision-makers to gain knowledge about transformative enterprise technology and transact. Discover our Briefings.