What is computer vision (or machine vision)?

The process of identifying objects and understanding the world through the images collected from digital cameras is often referred to as “computer vision” or “machine vision.” It remains one of the most complicated and challenging areas of artificial intelligence (AI), in part because of the complexity of many scenes captured from the real world.

The area relies upon a mixture of geometry, statistics, optics, machine learning and sometimes lighting to construct a digital version of the area seen by the camera. Many algorithms deliberately focus on a very narrow and focused goal, such as identifying and reading license plates.

Key areas of computer vision

AI scientists often focus on particular goals, and these particular challenges have evolved into important subdisciplines. Often, this focus leads to better performance because the algorithms have a more clearly defined task. The general goal of machine vision may be insurmountable, but it may be feasible to answer simple questions like, say, reading every license plate going past a toll booth.

Some important areas are:

Best applications for computer vision

While the challenge of teaching computers to see the world remains large, some narrow applications are understood well enough to be deployed. They may not offer perfect answers but they are right enough to be useful. They achieve a level of trustworthiness that is good enough for the users.

How established players are tackling computer vision

The large technology companies all offer products with some machine vision algorithms, but these are largely focused on narrow and very applied tasks like sorting collections of photos or moderating social media posts. Some, like Microsoft, maintain a large research staff that is exploring new topics.

Google, Microsoft and Apple, for example, offer photography websites for their customers that store and catalog the users’ photos. Using facial recognition software to sort collections is a valuable feature that makes finding particular photos easier.

Some of these features are sold directly as APIs for other companies to implement. Microsoft also offers a database of celebrity facial features that can be used for organizing images collected by the news media over the years. People looking for their “celebrity twin” can also find the closest match in the collection.

Some of these tools offer more elaborate details. Microsoft’s API, for instance, offers a “describe image” feature that will search multiple databases for recognizable details in the image like the appearance of a major landmark. The algorithm will also return descriptions of the objects as well as a confidence score measuring how accurate the description might be.

Google’s Cloud Platform offers users the option of either training their own models or relying on a large collection of pretrained models. There’s also a prebuilt system focused on delivering visual product search for companies organizing their catalog.

The Rekognition service from AWS is focused on classifying images with facial metrics and trained object models. It also offers celebrity tagging and content moderation options for social media applications. One prebuilt application is designed to enforce workplace safety rules by watching video footage to ensure that every visible employee is wearing personal protective equipment (PPE).

The major computing companies are also heavily involved in exploring autonomous travel, a challenge that relies upon several AI algorithms, but especially machine vision algorithms. Google and Apple, for instance, are widely reported to be developing cars that use multiple cameras to plan a route and avoid obstacles. They rely on a mixture of traditional cameras as well some that use structured lighting such as lasers.

Machine vision startup scene

Many of the machine vision startups are concentrating on applying the topic to building autonomous vehicles. Startups like Waymo, Pony AI, Wayve, Aeye, Cruise Automation and Argo are a few of the startups with significant funding who are building the software and sensor systems that will allow cars and other platforms to navigate themselves through the streets.

Some are applying the algorithms to helping manufacturers enhance their production line by guiding robotic assembly or scrutinizing parts for errors. Saccade Vision, for instance, creates three-dimensional scans of products to look for defects. Veo Robotics created a visual system for monitoring “workcells” to watch for dangerous interactions between humans and robotic apparatuses.

Tracking humans as they move through the world is a big opportunity whether it be for reasons of safety, security or compliance. VergeSense, for instance, is building a “workplace analytics” solution that hopes to optimize how companies use shared offices and hot desks. Kairos builds privacy-savvy facial recognition tools that help companies know their customers and enhance the experience with options like more aware kiosks. AiCure identifies patients by their face, dispenses the correct drugs and watches them to make sure they take the drug. Trueface watches customers and employees to detect high temperatures and enforce mask requirements.

Other machine vision companies are focusing on smaller chores. Remini, for example, offers an “AI Photo Enhancer” as an online service that will add detail to enhance images by increasing their apparent resolution.

What machine vision can’t do

The gap between AI and human ability is, perhaps, greater for machine vision algorithms than some other areas like voice recognition. The algorithms succeed when they are asked to recognize objects that are largely unchanging. People’s faces, for instance, are largely fixed and the collection of ratios of distances between major features like the nose and corners of eyes rarely change very much. So image recognition algorithms are adept at searching vast collections of photos for faces that display the same ratios.

But even basic concepts like understanding what a chair might be are confounded by the variation. There are thousands of different types of objects where people might sit, and maybe even millions of examples. Some are building databases that look for exact replicas of known objects but it is often difficult for machines to correctly classify new objects.

A particular challenge comes from the quality of sensors. The human eye can work in an expansive range of light, but digital cameras have trouble matching performance when the light is lower. On the other hand, there are some sensors that can detect colors outside the range of the rods and cones in human eyes. An active area of research is exploiting this wider ability to allow machine vision algorithms to detect things that are literally invisible to the human eye.