An estimated 285 million people worldwide are visually impaired, 39 million of whom are blind. In an effort to help, Microsoft in 2016 launched Project Tokyo, a partnership among researchers in the U.S., U.K., China, Japan, and India, to explore technologies that might help people with visual impairments interact with the world around them. Four years later, the effort has borne fruit in the form of a device, a modified version of Microsoft’s HoloLens augmented reality headset, running algorithms that provide information about people in the wearer’s surroundings.

According to a blog post published by Microsoft, the research group began by following athletes and spectators with varying levels of vision on a trip from the U.K. to the 2016 Paralympic Games in Rio de Janeiro, Brazil, observing how they interacted with other people as they navigated airports, attended sporting venues, and went sightseeing, among other activities. Machine learning experts on the Project Tokyo team then developed the algorithms, which run on graphics processing units housed in a PC connected to a HoloLens from which the front lenses have been removed.

An LED strip affixed above the HoloLens’ band of cameras tracks the person closest to the user and turns green when that person has been identified, letting communication partners or bystanders know they’ve been seen or cueing them to move out of the device’s field of view. One computer vision model detects the pose of people in the environment, providing a sense of where and how far away they are. Another analyzes footage from the headset’s camera to recognize people and determine if they’ve opted to make their names known to the system.

[Image: An early version of the Project Tokyo system. Image Credit: Microsoft]

All of this information is relayed to the wearer through audio cues. For example, if the modified HoloLens detects a person one meter away on the user’s left side, it’ll play a click that sounds as though it’s coming from roughly that distance to the left. If it recognizes the person’s face, it’ll play a “bump” sound, and if it spots a person who’s known to the system, it’ll announce their name. A separate, second layer of sound resembling a stretching elastic band guides the user’s gaze toward the person’s face.

When the HoloLens’ camera focuses on the person’s nose, the user hears a high-pitched click and, if the person is known to the system, their name. Users can alternatively ask for an overview and get a spatial readout of all the names of people who’ve given permission to be recognized by the system, and they’re alerted with a spatialized chime when someone’s looking directly at them.
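
The cue logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Microsoft’s implementation: the Person fields, the cue names, the bearing convention (negative values mean left of the wearer), and the audio_cues function are all hypothetical, and the actual spatial audio rendering is left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# A cue is (sound label, bearing in degrees, distance in meters).
# All names and conventions here are assumptions made for illustration.
Cue = Tuple[str, float, float]


@dataclass
class Person:
    distance_m: float               # estimated distance from the wearer
    bearing_deg: float              # 0 = straight ahead, negative = left (assumed)
    face_visible: bool = False      # the system has picked up the person's face
    name: Optional[str] = None      # set only if the person opted in
    centered_on_nose: bool = False  # the camera is focused on the person's nose


def audio_cues(p: Person) -> List[Cue]:
    """Return the cues to play for one detected person, in order."""
    where = (p.bearing_deg, p.distance_m)
    cues: List[Cue] = [("click", *where)]        # someone is there
    if p.face_visible:
        cues.append(("bump", *where))            # a face was recognized
    if p.centered_on_nose:
        cues.append(("high_click", *where))      # gaze is on the face
    if p.name is not None and (p.face_visible or p.centered_on_nose):
        cues.append(("say:" + p.name, *where))   # announce opted-in name once
    return cues


if __name__ == "__main__":
    # "Ada" is a made-up name for the example.
    print(audio_cues(Person(1.0, -90.0, face_visible=True, name="Ada")))
```

For a person one meter to the wearer’s left whose face is visible and who has opted in, the sketch returns a click, a bump, and a name announcement, all placed at the same bearing and distance. A person who has not opted in produces the same positional cues without the name.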

Microsoft says it’s using a scaled-down version of the tech to help blind and low-vision children develop social interaction skills.

Project Tokyo, which is still ongoing, follows on the heels of efforts like Microsoft’s Seeing AI, a mobile app designed to help blind and low-vision users navigate the world around them. More recently, the tech giant debuted Soundscape, a navigation app that uses binaural audio to help visually impaired users build mental maps and make personal route choices in unfamiliar spaces.

Through AI for Accessibility, which was announced in May 2018, Microsoft pledged $25 million over five years for universities, philanthropic organizations, and others developing AI tools for those with disabilities.

The program aims to reward the most promising cohort of candidates in three categories — work, life, and human connections — with seed grants and follow-on financing each fiscal quarter.

