The utopian promise and dystopian potential of real-time detection of police, fire, and medical emergencies

In 2014, John Garofolo went to Baltimore to visit Lt. Samuel Hood of the Baltimore Police Department. Garofolo was previously head of Aladdin, a program in the Office of the Director of National Intelligence to automate analysis of a massive number of video clips. Garofolo began hosting workshops with members of the AI research community to promote multi-camera tracking systems in 2012. Then the Boston Marathon bombing happened in 2013, and Garofolo joined the White House Office of Science and Technology Policy to continue that work. This research focus led him to visit Baltimore to see the CitiWatch network of 700 cameras in action.

Garofolo said what he saw was horrifying: video of a woman falling into the harbor, where she drowned. Nobody saw the surveillance footage of her fall in time to rescue her.

"They had video of it the next day, but they didn't know what to look at. If they had known what to look at, she would be alive now," he said. "And so I thought, 'We can make technology that can start to address some of these issues -- where people are having emergencies -- and make it easier for [human] monitors to look at the right video and move from more forensic use of that video to real-time use for emergency response.'"

That's why Garofolo helped create the Automated Streams Analysis for Public Safety (ASAPS) Challenge. The two-year challenge is based on a large data set being assembled by the federal government to encourage people in the computer vision community to build AI that delivers automated insights for emergency operators working with police, fire, and medical personnel.

Computer-aided dispatch software that emergency operators use today often shows specific information, like reported emergency events, location of emergency service vehicles, and some forms of data visualization. But the goal is to soon enable emergency operators to spot emergencies in action and dispatch police, fire, or medical services. To train AI systems to do this, ASAPS sprinkles events like assaults, medical emergencies, and structure fires into a series of image, audio, and text data created by the U.S. Department of Commerce's National Institute of Standards and Technology (NIST) and its contractors.

As part of the ASAPS data set creation process, 150 people participated in staged emergencies in July at the Muscatatuck Urban Training Center (MUTC) in Butlerville, Indiana. The participants included 19 stunt actors and 14 public safety officials, Garofolo said. Typically used for military training, MUTC is the largest urban training facility for the Department of Defense in the U.S. The in-person staged emergencies produced footage for roughly 30 video cameras and contributed images and video to a collection of up to 15,000 social media posts in the data set.

ASAPS also includes simulated gunshot detection, text from emergency dispatch entries, and more than 50 hours of radio transmissions and 911 calls recorded by actors and actresses. All of the emergencies are set in a mock 200-acre town. The data set is entirely fabricated or staged to give challenge participants a full range of flexibility, NIST R&D program manager Craig Connelly told VentureBeat.

The full data set of synthetic and real emergency events is scheduled to be released this fall. A first look will be shared with challenge participants at virtual workshops scheduled to take place September 23-24.

ASAPS is also unique because it challenges AI practitioners to create systems that can take data from a range of sources and decide whether an emergency is in progress. Garofolo said ASAPS is the largest data set created for live video analysis.

"There's nothing out there like this right now. All of the challenges out there basically use canned data, and the entirety of the data is presented to the systems so that they can look at everything before they make a decision," he said. "I doubt that we will completely solve it in the two years of the program. That's a very short amount of time. But I think that we will create a seed for the growth of this technology and an interest in the community in real-time, multimodal analytics."

The ASAPS data set was assembled by NIST, a federal agency whose work includes evaluating facial recognition systems. NIST has also developed a plan for federal agencies to create standards for AI systems in concert with private entities.

The ASAPS challenge involves a set of four separate contests: The first two focus on analyzing the time, location, and nature of emergencies, while the last two aim to surface information for first responders in emergency operations centers. To win, teams must design a system with a confidence level of prediction appropriate for bringing an event to the attention of a human operator without raising too many false alarms.
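
The trade-off described above, between alerting an operator and avoiding false alarms, can be illustrated with a toy example. The sketch below is not the ASAPS metric, which is not specified here. It assumes a hypothetical system that emits one confidence value per time step, raises an alert the first time that value crosses a threshold, and is scored by an invented rule: a fixed penalty for an alert raised when no event is under way, and a reward that shrinks the longer a correct alert is delayed. The names `first_alert_time` and `toy_score` and all numbers are made up for illustration.

```python
from typing import Optional, Sequence


def first_alert_time(confidences: Sequence[float], threshold: float) -> Optional[int]:
    """Return the first time step whose confidence meets the threshold, or None."""
    for t, confidence in enumerate(confidences):
        if confidence >= threshold:
            return t
    return None


def toy_score(
    alert_time: Optional[int],
    event_start: Optional[int],
    horizon: int,
    false_alarm_penalty: float = 1.0,
) -> float:
    """Hypothetical scoring rule (an assumption, not the ASAPS metric).

    - No alert: 0.0 (no reward, no penalty).
    - Alert with no event, or before the event starts: -false_alarm_penalty.
    - Correct alert: 1.0 at zero delay, falling linearly to 0.0 at `horizon` steps.
    """
    if alert_time is None:
        return 0.0
    if event_start is None or alert_time < event_start:
        return -false_alarm_penalty
    latency = alert_time - event_start
    return max(0.0, 1.0 - latency / horizon)


if __name__ == "__main__":
    # Invented confidence trace; the staged event is assumed to start at step 2.
    confidences = [0.1, 0.4, 0.35, 0.6, 0.8, 0.95]
    for threshold in (0.3, 0.5, 0.9):
        alert = first_alert_time(confidences, threshold)
        score = toy_score(alert, event_start=2, horizon=10)
        print(f"threshold={threshold} alert_at={alert} score={score:.2f}")
```

With this invented trace, the lowest threshold fires one step before the event begins and is penalized as a false alarm, the middle threshold fires one step after the event starts, and the strictest threshold fires latest and earns the smallest positive score.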

"It's a little bit like the game of Clue," Garofolo said. "You run around the board and you have to make a strategic decision about when you declare that you think you know what the answer is. If they declare it too soon and they're wrong, they'll get dinged on the metric. If they declare it much later than other participants, they won't get as high a score on the metric."

Savior or dystopian surveillance state?

AI that calls for help if you're attacked in the street or your home is on fire sounds like a dream, but AI that tracks people across multiple camera systems and sends police to your location could be a dystopian nightmare.

Black Lives Matter protests that started in June and continue today are historic in their size and reach. A policy platform created by Black community organizations calls for a reduction in the surveillance of Black communities and recognition of the role surveillance plays in systemic racism. But you don't have to think far beyond Baltimore to understand how potential applications of AI like the kind ASAPS is looking to produce could raise concern.

AI has already been used in Baltimore for more than finding people who fall into the harbor. CitiWatch doesn't just use city-owned cameras installed in public places but also cameras from partners like Johns Hopkins University and even those owned by private businesses or citizens.

When protests and civil unrest broke out in Baltimore following the death of Freddie Gray in police custody in 2015, law enforcement used numerous forms of surveillance, such as cell phone tracking tech and Geofeedia for monitoring people on Facebook, Instagram, and Twitter. Working in tandem with CitiWatch cameras on the ground, a surveillance plane flew over the city. In a lawsuit filed earlier this year to stop police use of Aerial Investigation Research (AIR), the ACLU called the program "the most wide-reaching surveillance dragnet ever employed in an American city."

Police also used facial recognition to identify people from camera footage and social media photos. Former House Oversight and Reform committee chair Rep. Elijah Cummings (D-MD) said use of facial recognition on protesters and evidence of discriminatory bias in facial recognition systems were part of the reason he decided to call a series of Congressional hearings last year to discuss facial recognition regulation. According to a NIST study, facial recognition systems are more likely to misidentify Asian Americans and people with darker skin tones than they are white people.

Democrats and Republicans have decried use of facial recognition at protests or political rallies for its potentially chilling effect on people's constitutional right to free speech. But in recent weeks, police in Miami and New York have used facial recognition to identify protesters accused of crimes. Further inflaming fears of a mounting surveillance state, predictive policing tools from companies like Palantir, used in cities like Los Angeles and New Orleans, have been shown to demonstrate racial bias. Globally, projects like Sidewalk Labs in Toronto and the deployment of Huawei 5G smart city solutions to dozens of nations around the world have also sparked concerns about surveillance and the spread of authoritarianism.

Garofolo said facial recognition and license plate reading were purposely kept out of the challenge, due to privacy concerns. He also said he's already been approached by a surveillance company that wants to use ASAPS, but he turned down the request. Indeed, NIST requires challenge participants to only use the data for emergency analysis. Participants can track individuals across multiple cameras but are unable to identify their faces.

"We've gone to great pains to preserve privacy and the challenge. We realize that, like any technology, it can be used for good or bad. We need to start to see policy developed for the use of these technologies. That's beyond what we're doing in ASAPS, but I think ASAPS will illustrate the challenge, and hopefully we will get some good discussion about it," Garofolo said.

However, even if anonymized, an AI system that views an alleged assault caught on camera, for example, could increase the likelihood that a person of color comes into contact with police.

As we've seen this week when Jacob Blake was shot in the back seven times in Wisconsin, any scenario that puts people into contact with police can be deadly, especially for Black people. A Northeastern University study released earlier this year found that Black people are twice as likely to die from police shootings as white people are.

There's also the risk of mission creep, in which surveillance technology acquired for one purpose is later used for another. A recent example comes from San Diego, where smart street lamps were initially supposed to be used for gathering traffic and environmental data. Then police started requesting access to footage -- first only for serious, violent crimes, but eventually for smaller infractions, like illegal dumping. The San Diego Police Department put a policy in place to prohibit the use of facial recognition or license plate readers on camera footage, but police also requested video from Black Lives Matter protests.

The San Diego City Council is now considering whether to create a privacy advisory commission or enact a formal surveillance technology adoption policy that would review the adoption of new tech and government officials' use of existing tech. Surveillance technology review policies haven't yet become commonplace for city governments, but major California cities Oakland and San Francisco adopted such laws in 2018 and 2020, respectively.

China, computer vision, and surveillance systems

Garofolo started promoting use of multi-camera surveillance systems at conferences like the Conference on Computer Vision and Pattern Recognition (CVPR) in 2012. (CVPR is one of the largest annual AI research conferences in the world, according to the AI Index 2019 report.) To promote ASAPS among members of the computer vision community, Garofolo and Connelly joined the AI City Challenge workshop at CVPR in June.

The AI City Challenge was created to solve traffic operations challenges with AI and make smart public transportation systems. One 2020 challenge, for example, focuses on the detection of stalled cars or traffic accidents on the freeway. Roughly 30 teams participated in the inaugural challenge in 2017. This year saw 800 individual researchers on 300 teams from 36 nations; 72 teams ultimately submitted final code.

AI City Challenge has always been an international competition that welcomes teams from around the world. But since its launch, virtually all of the winning teams have been from China and the United States. Teams from the University of Washington and University of Illinois took top honors in 2017. In 2018, a University of Washington team took first place in two of three competitions, with a team from Beijing University in second place. This year, a team from Carnegie Mellon University won a single competition, but teams from Chinese universities and companies like Baidu won three out of four contests, and Chinese teams captured most runner-up spots, as well.

Garofolo said he believes the 2020 AI City Challenge results make "a statement in terms of where we are in terms of our competitiveness in the U.S. You go to CVPR and you can see that a great [number] of the minds in the workforce in AI are now coming from overseas. I think that's an important issue that concerns all of us. And so ASAPS is hopefully going to provide one of many different research venues for American scientists and American organizations to be competitive."

ASAPS challenges award up to $150,000, and since the prize money comes from the U.S. government, participating teams must be led by an individual, business, or university from the United States.

Researchers have made headlines in recent months as tensions mount between China and the U.S. Disputes over researcher activity led to the closure of a Chinese consulate in Texas, and Republicans in Congress have criticized Microsoft and Google in the past year for allegedly working with Chinese military researchers. Since the economy and China are key issues for the Trump 2020 reelection campaign, similar disputes may continue to emerge in the months ahead.

But despite tech nationalism on the political stage, cooperation between researchers has continued. At the close of the AI City Challenge workshop, organizers said they're considering a competition involving live video analysis that would be more like ASAPS.

The ASAPS challenge will take place over the next two years. Security for edge devices and privacy considerations for emergency detection challenges could motivate future challenges with the data set, Garofolo said.
