Starting this summer, Cisco’s Webex will begin serving up insights for video calls to a select group of individuals, teams, and organizations. Engagement insights include things like how often you had your video on or showed up on time and the people or teams within an organization you speak with most often.

The goal, Cisco VP Jeetu Patel told VentureBeat in a phone interview, is to make video calls better for people living in the hybrid world between in-person meetings in the office and virtual meetings at home. The tricky part, he said, is considering what information is useful for an individual to know while not raising concern that Webex is, for example, alerting managers to employees who are routinely late to meetings.

“Let’s say you did 12 meetings today, and in six of those meetings with four people or fewer, you actually spoke for 90% of the time. That would be a really bad thing to give your boss but a really good thing for you to have so you can say, ‘Oh, I should probably do a better job listening,'” he said. “The privacy on that front is not at the organizational level. It’s at the individual level. So when we provide insights like that to an individual, the individual owns the data, not the organization, because we don’t believe that without your explicit permission, you’d want to have your boss see that.”

Webex has introduced a series of new features in recent months, some powered by artificial intelligence, to change how people share information in video calls. Toward this end, Patel said, “We’ve probably invested about a billion dollars or so in the past two years in AI.”

Above: Individual insights

Gesture recognition means people in video calls can now raise their hand to ask to speak or give a thumbs up or thumbs down to register feedback. Another AI-powered feature on the way will crop out the faces of people attending a meeting in person so that anyone working remotely can see each of them individually.

“Even though there are three people sitting in a conference room, we’ll actually break the stream into three separate boxes and show it to you, and our hardware will actually do that,” Patel said.

Patel has overseen the acquisition of three companies since joining Cisco last summer, after serving as chief product officer at Box. Last month, Cisco closed its acquisition of IMImobile for $730 million, in part to beef up its AI capabilities. Last summer, Cisco announced plans to acquire BabbleLabs, an AI startup focused on filtering audio so the sound of someone doing dishes, a lawnmower running, or other loud background noise can be reduced or eliminated. And earlier this year, Cisco acquired Slido, a startup that makes engagement features for video calls, like word clouds and upvoting questions. Such features can allow a meeting to take the structure of a town hall, with transparency around the top questions for employees within an organization, since everyone can see the questions being posted.

But Patel acknowledges that there are limits to how far the technology should go.

“Engagement should not be measured based on having a judgment on someone saying, ‘I’m judging that you look sad, and therefore I’m going to do certain things’ … at that point in time, in my mind, you could cross a boundary where there’s more bad that can come out of that than good,” he said.

In 2019, Cisco acquired Voicea to power speech-to-text transcription of meetings. Closed captioning and live translation are also available in Webex calls.

Deciding where to draw the line on which AI-powered features or insights to introduce in video calls can be a challenge. Earlier this year, Microsoft Research published a study on AffectiveSpotlight, an AI system for recognizing confusion, engagement, and head nods in meetings. Taken in the aggregate, cues picked up from the audience could be really helpful, particularly for large organizations. But if affective AI for video calls leads to a critique of how often a person smiles or has a certain expression on their face, it could be considered invasive, counterproductive, or even biased against certain groups of people.

Video analysis of expression today can have major shortcomings. A group of journalists in Germany recently demonstrated that placing a bookshelf in the background or wearing glasses can change affective AI evaluations of a person in a video.

And it shouldn’t matter whether a person is an extrovert or prefers not to talk in group settings as long as they fulfill their job duties. Some people talk a lot but have nothing much to say, while others speak less often but deliver sharp insights or sage advice. It all depends on the team, role, and scenario.

Monitoring such information also raises the question of consent.

“There’s a fine line between ‘This is super productive’ and ‘We can’t do this because it violates my privacy or it’s just outright creepy,'” Patel said.

Cisco plans to roll out Webex People Insights globally over the next year, starting with select users in the U.S. this summer. The company announced the news today as part of its Cisco Live virtual event. In other Cisco Live news, on Tuesday the company announced plans to combine networking, security, and IT infrastructure offerings and to work with the Duo authentication platform it acquired in 2018.
