3 steps to building more useful computer vision

"Computer vision" has a decidedly sci-fi ring, so it’s no surprise the idea of smart devices becoming all-seeing tools has captured people's imaginations. Yet the true potential of the tech is held back by a widespread misunderstanding of what great applications look like and what they can achieve.

Where did computer vision come from? It’s the product of a proliferation of cheap, high-quality cameras, which has expanded the scope for imagery captured in public, private, and commercial domains. At the same time, advances in machine learning and deep learning technology are allowing us to transform those images into digital signals that support a wide range of actions. To start with, visual inspection tasks have included things like tracking stock levels or monitoring a production line. But the future of computer vision goes far beyond these basic applications.

1. Seeing across time and space

While current automation functions continue to be useful, new algorithms give us the chance to think of computer vision as diverse and adaptable, even more so than the human eye. Sharper vision, backed by the power of a brain to interpret what is seen and turn it into action, could enhance our ability to tackle complex problems.

The tech’s potential is expanding massively in three key areas.

Seeing the imperceptible is about seeing and interpreting more than our brain alone will allow. Researchers at MIT have been working on detecting a person’s pulse from standard video footage of their face. Computer vision measures very subtle changes in the tone and color of the skin, and the derived signal allows users to take a pulse without being near the person, let alone touching them. Users can also apply this signal to distort the video image in real time and literally make the pulse visible. This has huge potential for ambient sensing, monitoring, and diagnosis in health care.
Seeing at scale is the ability to monitor and process enormous volumes of visual content. Today’s social media moderation relies on people who review flagged content for deletion, but this approach is reactive and very limited in scale. It also puts those moderators at risk of psychological stress.
Seeing across space and time allows us to capture footage and observe features that would otherwise be impossible. Ecological surveys of wildlife typically cost a fortune and are time-consuming and difficult to complete. Computer vision is making it all easier, as practitioners use the technology to map deforestation and biomass reduction using aerial and satellite imagery. Remote camera traps are also helping count wildlife populations in very isolated locations. In industry, companies like Reconstruct Inc. monitor progress on their large building sites by combining autonomously captured footage with building information management systems. The insights can automatically generate progress plans and detect deviations or irregularities in the construction process or design.
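To make the pulse-from-video idea above more concrete, here is a minimal sketch of one way such a signal could be turned into a heart rate. It is not the MIT researchers' method, just a simplified frequency analysis. It assumes a face region has already been located in each frame and reduced to one average green-channel value per frame. The function name `estimate_pulse_bpm`, its parameters, and the synthetic test signal are all hypothetical and chosen for illustration.

```python
import numpy as np


def estimate_pulse_bpm(green_means, fps, low_hz=0.7, high_hz=4.0):
    """Estimate a pulse (beats per minute) from per-frame mean skin color.

    green_means: one average green-channel value per video frame
                 (assumed to come from a face region found elsewhere).
    fps:         frames per second of the footage.
    low_hz/high_hz: assumed band of plausible heart rates (42 to 240 bpm).
    """
    signal = np.asarray(green_means, dtype=float)
    signal = signal - signal.mean()  # drop the constant skin tone

    # Strength of each frequency present in the color fluctuations.
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fps)

    # Only consider frequencies that could plausibly be a heartbeat.
    band = (freqs >= low_hz) & (freqs <= high_hz)
    if not band.any():
        raise ValueError("recording too short to resolve the heart-rate band")

    peak_hz = freqs[band][np.argmax(spectrum[band])]
    return peak_hz * 60.0


# Synthetic stand-in for real footage: 20 seconds at 30 frames per second,
# with a faint 1.2 Hz (72 beats per minute) flicker buried in noise.
fps = 30.0
t = np.arange(600) / fps
rng = np.random.default_rng(0)
green = 120 + 0.5 * np.sin(2 * np.pi * 1.2 * t) + rng.normal(0, 0.2, t.size)

bpm = estimate_pulse_bpm(green, fps)
print(f"Estimated pulse: {bpm:.0f} bpm")
```

Real footage would need more care, such as face tracking and compensation for lighting changes and movement, but the sketch shows why an ordinary camera can be enough: the pulse appears as a tiny periodic change in skin color that frequency analysis can pick out.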

2. Size up the data

What these examples indicate is how broad and mature the technology already is. But it may still feel like opportunities are limited to big tech companies, due to the sheer amount of data needed to make it work. Data is the fuel that feeds computer vision algorithms, so of course social media and web platform giants have a natural advantage when it comes to training their computer vision.

However, this shouldn’t stop others from getting started.

Many organizations fail to leverage the image data they already own. For instance, retailers sit on masses of CCTV footage that they usually only inspect after a security incident. If we apply computer vision to that footage, we could understand when queues start to build at checkouts, see when customers appear lost in the store, detect a missing child, or inform the redesign of store layouts.

If companies don’t have the benefit of existing image data, they can start to experiment with user-generated content. Google’s Quick Draw is a game-like experience that tries to guess what you’re sketching on your phone. In just a few months, the program generated 50 million doodles that Google can now harness to automatically interpret people’s hand-drawn scribbles.

Online fashion retailer Asos launched its “As seen on me” campaign that encourages users to upload fashion photos of themselves. While the initial motivation was customer engagement and loyalty, its potential for mass personalization through computer vision is clear.

These transitional data-generating services can play a key role in defining a product strategy that uses computer vision. For example, Google has now transformed the Quick Draw knowledge base into a tool that interprets and enhances your hand-drawn scribbles in real time.

Even without existing or user-generated data, off-the-shelf solutions are available that let you add computer vision to products and services.

3. Put human needs first

Issues around data collection only reinforce the importance of putting the customer first. As users grow increasingly concerned about the ways companies are using their data, we cannot implement a powerful new technology like computer vision without a very clear understanding of the problem it is meant to solve.

The answer is to apply service design principles that address human needs first and that tap into technology’s great potential for good. It all starts with identifying pain points, patterns of behavior, and unmet needs to establish the context for introducing new digital tools into people’s lives.

Applications using this approach have turned computer vision into a tool to let gardeners identify flowers, allow homeowners to visualize the impact of daylight, and help all of us navigate in the dark. Computer vision is a genuinely exciting technology, and we’re only just beginning to discover its potential for the digital products and services that could fuel our future.

So, what’s your vision?

Connor Upton is a data design director for Fjord, design & innovation from Accenture Interactive.
