Video conferencing company Touchcast uses AI to add context to conversations

Video conferencing is becoming a commodity as technology giants like Microsoft and Google build the feature into their free services. Touchcast is trying to stay a step ahead of them by developing AI-powered services for premium users.

Special effects are important, but the key differentiator lies in creating more context to drive the next wave of communication, Touchcast CEO Edo Segal told VentureBeat. Touchcast is doing so by taking advantage of Nvidia Maxine, a software development kit for creating GPU-powered applications. The SDK includes primitives for tasks such as AI-powered background removal, simulating eye contact, and measuring body pose in sports.

"The fact that a company like Nvidia, the leader in AI powering hardware, has the foresight to invest in the research and development on the conceptual and software side helps companies like Touchcast accelerate time to market and focus on building on the shoulders of giants," said Segal.

Nvidia Maxine sets a new baseline of capabilities from which to innovate. "It allows us to focus on other areas where there is still no work being done as we chart this frontier," Segal said.

Better image effects

One big goal is to reduce the effort involved in creating quality events. Live presenters can be virtually teleported into mixed reality sets without a green screen. Live semantic segmentation uses AI to separate a person from the background in high quality, making it possible to automatically place people in a mixed reality set. "This literally used to take days or weeks of work and rendering and is now done live," Segal said.

Neural upscaling can clean up a basic webcam image and scale it to an ultra-HD 4K screen. This works much like an artist asked to paint a mural from a small picture, who has to intuit how to fill in the missing detail. Another new feature, called auto framing, keeps a speaker centered in the view even when they move.

The age of inference

Words can be automatically transcribed, translated, and dubbed into multiple languages. Maxine allows all of this to occur in a fraction of a second so that the audio appears in sync with the speaker. Another new feature is the ability to break up a video and better organize it with summaries, a table of contents, and short-form articles. A talk can be broken down by theme, with machine-generated titles and descriptions for each section.

"Humanity has long lost its ability to commit to long-form content, and by creating this AI article view, we allow the viewer to skim the content quickly in the same way you might do with a blog post," Segal said.

Segal is also excited about the potential for semantic vector search to help bring new context to content discovery. "We believe that the next generation of search and discovery will evolve to ambient streams of information that are contextualized to the task you are performing," he said. He has been working on this problem for decades and wrote about it in 2009.

Semantic vector search works more like human associative memory than like a traditional Boolean keyword search. It starts by translating content into concepts in a multi-dimensional space, such that closely related concepts are represented closer to each other.
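To make the idea concrete, here is a minimal Python sketch of ranking content by closeness in a vector space. It is an illustration only, not Touchcast's or Nvidia's implementation: the three-dimensional vectors, the phrase labels, and the `search` helper are all hypothetical, and a real system would get its vectors from a trained embedding model with hundreds of dimensions.

```python
import math

# Hypothetical, hand-picked 3-dimensional "concept" vectors (assumption:
# a real system would produce these with a trained embedding model).
EMBEDDINGS = {
    "video call": [0.9, 0.1, 0.0],
    "web conference": [0.8, 0.2, 0.1],
    "green screen": [0.3, 0.9, 0.1],
    "quarterly earnings": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, embeddings, top_k=2):
    """Rank stored items by similarity to the query and return the top_k names."""
    scored = [
        (cosine_similarity(query_vector, vector), name)
        for name, vector in embeddings.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]


# Hypothetical query vector standing in for a phrase such as "online meeting".
results = search([0.85, 0.15, 0.05], EMBEDDINGS)
print(results)
```

The query shares no keywords with the stored phrases, yet the two meeting-related entries rank highest because their vectors point in nearly the same direction as the query. That is the property a Boolean keyword match lacks.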

Video conferencing is a crowded market, but Segal believes it is still growing because the idea of what constitutes a communications platform is also expanding. Previous advances focused on better compression and noise reduction algorithms, but they didn't do much to help people make sense of the material being communicated. Segal is excited about features that aren't easy to see but that help make information more accessible, such as how neural networks can instantly add context and curate what we communicate to make information better and more relevant.

These innovations will usher in "the age of inferences" that could increase comprehension, accessibility, and insight, Segal said.
