Ambient's computer vision detects dangerous behaviors, but raises privacy concerns

Computer vision, a technology that uses algorithms to "see" and evaluate objects, people, and events in the real world, is a rapidly expanding market within the broader AI sector. That's because the applications are practically limitless, ranging from agricultural crop monitoring to medical diagnostics and driverless vehicle testing. Allied Market Research anticipates that computer vision product vendors will be worth a combined $144.46 billion by 2028.

Digital transformation in the enterprise, spurred by the pandemic, has further accelerated the growth. For example, Zebra Medical Vision, a computer vision startup focused on health care, was acquired in August 2021 in a deal worth $200 million. Meanwhile, Landing AI has secured tens of millions for its visual inspection dashboards that enable engineers to train, test, and deploy computer vision to devices such as laptops.

Another rising class of startups -- one focused on analyzing camera and sensor footage -- is attracting significant investment from institutional backers. Ambient is among the newest arrivals -- its computer vision software attempts to detect potentially dangerous situations and alert stakeholders. Launched in 2017, the Palo Alto, California-based company is emerging from stealth with $52 million in venture capital led by Andreessen Horowitz with participation from Y Combinator, Stanford, and others including Okta cofounder Frederic Kerrest, CrowdStrike CEO George Kurtz, and Microsoft CVP Charles Dietrich.

Computer vision for security

Ambient was cofounded by CEO Shikhar Shrestha, who was previously at Google helping the Project Tango team. Vikesh Khanna, the company’s CTO and other cofounder, worked at Dropbox building data analytics systems.

Ambient grew out of research Shrestha and Khanna did while at Stanford. Powered by what Shrestha calls a "context graph," the platform plugs into CCTV and sensor systems and assesses risk factors when looking at real-time or historical recordings -- namely different location contexts (e.g., the type of space and time of day), behaviors (the movement of an object and object interactions), and objects (people, vehicles, animals, and more).

"I founded Ambient in January 2017 alongside Khanna. However, the inspiration for Ambient came many years before," Shrestha told VentureBeat via email. "At 12 years old, I was robbed at gunpoint in a location that was monitored by a security camera. At the time, I was expecting a patrol officer to intervene, which never happened. From that experience, I learned that despite the ubiquity of security cameras in our world, few recordings of incidents lead to real-time response. It made me fascinated with security technology, tinkering with, designing, and building alarm and surveillance systems."

Shrestha asserts that Ambient's algorithms can identify threats like perimeter breaches and "tailgating" without facial recognition or profiling, as well as learn new behaviors and threats automatically over time. The platform captions videos' contents ranging from context about what’s taking place to individual actions, like saying "this is a busy street" or "there is a man walking."

"The four key components of the Ambient platform are video data processing; the detection of objects, events, and context; threat signature evaluation; and prioritization for human intervention," Shrestha said. "Ambient provides hundreds of threat signatures that customers can deploy out-of-the-box and we’re rapidly adding new threat signatures based on customer requests from the field. Today, we deliver ... over 100 threat signatures [and our funding] will enable us to build on that foundational library to quickly double the number of threat signatures that we deliver in the next year."
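To make the idea of a "threat signature" concrete, here is a minimal Python sketch of how detections made up of an object, a behavior, and a location context might be matched against a small signature library and ranked for human review. It is an illustration under stated assumptions, not Ambient's implementation: the class names, fields, signatures, business-hours window, and priority values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One observation extracted from video (all fields are hypothetical)."""
    obj: str       # e.g. "person", "vehicle"
    behavior: str  # e.g. "climbing", "following_closely"
    location: str  # e.g. "perimeter_fence", "badge_door"
    hour: int      # local hour of day, 0-23


@dataclass(frozen=True)
class ThreatSignature:
    """A named pattern over object, behavior, and location context."""
    name: str
    obj: str
    behavior: str
    location: str
    priority: int                  # higher means more urgent
    after_hours_only: bool = False

    def matches(self, d: Detection) -> bool:
        if (d.obj, d.behavior, d.location) != (self.obj, self.behavior, self.location):
            return False
        # Assumed business hours of 07:00-19:00; purely illustrative.
        if self.after_hours_only and 7 <= d.hour < 19:
            return False
        return True


# A tiny, made-up signature library.
SIGNATURES = [
    ThreatSignature("perimeter_breach", "person", "climbing", "perimeter_fence", priority=3),
    ThreatSignature("tailgating", "person", "following_closely", "badge_door", priority=2),
    ThreatSignature("after_hours_loitering", "person", "loitering", "loading_dock",
                    priority=1, after_hours_only=True),
]


def prioritize(detections):
    """Return (signature name, detection) pairs, most urgent first."""
    hits = [(s, d) for d in detections for s in SIGNATURES if s.matches(d)]
    hits.sort(key=lambda pair: pair[0].priority, reverse=True)
    return [(s.name, d) for s, d in hits]
```

In this sketch, a person walking through a lobby matches nothing and never reaches an operator, while a fence climb outranks a tailgating event at a badge door. Keeping the time-of-day check inside the signature is one way to let the same behavior count as a threat in one context and not in another.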

Ambient says it has processed over 20,000 hours of video from its customers, which it claims include five of the top 10 U.S. tech brands by market capitalization as well as "a number of" Fortune 500 companies.

"Our customers currently span a wide variety of industry verticals including education, finance, manufacturing, media and entertainment, retail, real-estate and residential security, and technology," Shrestha added. "We intend to expand our penetration of the enterprise market into a wide range of industries and types of buildings, from corporate campuses to datacenters, schools, and museums."

Potential challenges

Like most computer vision systems, Ambient's are trained on a combination of open source datasets and in-house generated images and videos showing examples of people, places, and things. The company claims that it takes steps to ensure that the dataset is sufficiently diverse, but history has shown that bias can creep into even the best-designed models.

For example, previous research has found that large, publicly available image datasets are U.S.- and Euro-centric, encoding humanlike biases about race, ethnicity, gender, weight, and more. Flaws can arise from other sources, like differences in sun paths between the northern and southern hemispheres and variations in background scenery. Studies show that particular camera models can cause an algorithm to be less effective in classifying objects that it was trained to detect. Even architectural design choices in algorithms can contribute to biased classifications.

These biases can lead to real-world harm. SN Technologies’ facial recognition and weapon-detecting platform was found to misidentify Black children at a higher rate and frequently mistook broom handles for guns. Meanwhile, Walmart’s AI- and camera-based anti-shoplifting technology, which is provided by Everseen, came under scrutiny over its reportedly poor detection rates. Facial recognition software used by the Detroit police falsely identified a Black man as a shoplifter. And Google’s Cloud Vision API at one time labeled thermometers held by Black people as "guns" while labeling thermometers held by light-skinned subjects as "electronic devices."

"This technology, which tends to involve object and behavior recognition, is far from accurate," Jennifer Lynch, surveillance litigation director at the Electronic Frontier Foundation, told Fast Company in a recent interview about gun-detecting AI technologies.

Ambient says that the data it uses to train its video processing algorithm is annotated using crowdsourcing services before being fed into the system. But labels, the annotations from which many computer vision models learn relationships in data, also bear the hallmarks of data imbalance. Annotators bring their own biases and shortcomings to the table, which can translate to imperfect annotations. For example, some labelers for MIT’s and NYU’s 80 Million Tiny Images dataset contributed racist, sexist, and otherwise offensive annotations, including nearly 2,000 images labeled with the N-word and labels such as "rape suspect" and "child molester."

In 2019, Wired reported on the susceptibility of platforms like Amazon Mechanical Turk -- where many researchers and companies recruit annotators -- to automated bots. Even when the crowdworkers are verifiably human, they’re motivated by pay rather than interest, which can result in low-quality data -- particularly when they’re treated poorly and paid a below-market rate. Being human, annotators naturally also make mistakes -- sometimes major ones. In an MIT analysis of popular benchmarks including ImageNet, the researchers found mislabeled images, like one breed of dog being confused for another.

Shrestha claims that Ambient's technology minimizes bias by taking a "system training" approach to computer vision. "System-level blocks" control which task an individual computer vision model is focused on and optimize the model for that narrow task, he says, so that a single model isn't making the final decision.

"[W]e’re breaking the problem down to system-level blocks which have very tightly described inferences. For example, [one] human interaction block can detect one of these 10 interactions, [while] this scene element block can detect one of these 20 scene elements," Shrestha added. "This architecture means that we are not asking data labelers to label based on unstructured assumptions. In our architecture, models have structured outputs associated with specific tasks. Examples would be: detect a person, a car, the color of a shirt, an interaction between people and a car. These structured outputs constrain the labeler appropriately so that they can not respond with an arbitrary label and bias the model."
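The constraint Shrestha describes, in which labelers pick from a fixed set of outputs per block rather than typing free-form labels, can be sketched in a few lines of Python. This is a hypothetical illustration, not Ambient's schema: the block names, the vocabularies, and the `validate_label` helper are assumptions made for the example.

```python
from enum import Enum


class Interaction(Enum):
    """Closed vocabulary for a hypothetical 'human interaction' block."""
    ENTERS_VEHICLE = "enters_vehicle"
    EXITS_VEHICLE = "exits_vehicle"
    HANDS_OVER_OBJECT = "hands_over_object"


class SceneElement(Enum):
    """Closed vocabulary for a hypothetical 'scene element' block."""
    DOOR = "door"
    FENCE = "fence"
    PARKING_SPACE = "parking_space"


# Each block accepts labels only from its own vocabulary.
BLOCKS = {"interaction": Interaction, "scene_element": SceneElement}


def validate_label(block: str, raw_label: str) -> Enum:
    """Accept a labeler's answer only if it is in the block's fixed vocabulary."""
    vocab = BLOCKS[block]
    try:
        return vocab(raw_label)
    except ValueError:
        allowed = ", ".join(member.value for member in vocab)
        raise ValueError(
            f"{raw_label!r} is not a valid {block} label; choose one of: {allowed}"
        ) from None
```

Here `validate_label("interaction", "enters_vehicle")` returns `Interaction.ENTERS_VEHICLE`, while a subjective free-text answer such as "suspicious person" raises a `ValueError`. A closed vocabulary keeps arbitrary labels out, though it cannot by itself guarantee that labelers apply the permitted labels evenly across different groups of people.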

Data privacy and surveillance

Anticipating that some customers might be wary of granting a vendor like Ambient access to CCTV footage, the company attempts to allay concerns in its terms of service agreement. Ambient reserves the right to use only "aggregated, de-identified data" from customers to improve, test, and market its services and claims that it doesn't use any sensitive customer data uploaded to its platform for these purposes.

"Our product has been architected from day one for data minimization. Essentially, this means that we eliminate personally identifiable information from our data collection efforts," Shrestha said. "Raw video data is not processed by Ambient computer vision algorithms. Instead, the algorithms only process raw footage metadata [and not] facial attributes, gender attributes, or identifiers of race. This comes with significant constraints. For example, we will not offer facial recognition analysis as part of our solution because it is impossible to deliver facial recognition without collecting and processing [personally identifiable information]."

Ambient doesn't make it clear in its terms of service under what circumstances it'll release customer data, such as when requested by law enforcement or served a subpoena. The company also doesn't say how long it retains data -- only that the data "may be irretrievably deleted" if a customer's account is terminated.

"We are committed to working with our customers to ensure that their use of the product is consistent with the requirements of applicable privacy and data protection laws," Shrestha said. "We have strong technical controls in the product that limit both what the product can do and who has access to the product, [and] we’re committed to putting appropriate technical constraints in place in the interest of preventing potential harm."

It's not just users who might be concerned about Ambient's AI-powered technology. Privacy advocates worry that systems like it -- including those from Umbo, Deep Sentinel, and other vendors -- could be co-opted for less benign purposes, potentially normalizing greater levels of surveillance.

In the U.S., each state has its own surveillance laws, but most give wide discretion to employers so long as the equipment they use to track employees is visible or disclosed in writing. There’s also no federal legislation that explicitly prohibits companies from video recording staff during the workday.

"Some of these techniques can be helpful but there are huge privacy issues when systems are designed to capture identity and make a determination based on personal data," Marc Rotenberg, president of the Electronic Privacy Information Center, told Phys.org in an interview. "That's where issues of secret profiling, bias and accuracy enter the picture."
