Data labeling is an arduous, if necessary, part of the AI model training process. Currently, it takes roughly 200 to 500 annotated images for a model to learn to detect a single object. Fortunately, freely available tools help automate the most monotonous sub-tasks, and IBM has recently published a new one on GitHub. It’s part of the company’s Cloud Annotations project, which seeks to develop easy and collaborative open source image annotation tools for teams and individuals.

The new tool uses AI to help developers annotate data without having to manually draw labels on an entire data set of images. Simply selecting the “Auto label” button from the dashboard automatically labels uploaded image samples. And it’s backed by IBM Cloud Object Storage, which is optimized for data-hungry machine learning and cloud-native workloads.

https://twitter.com/bourdakos1/status/1201928317668089857?s=20

Here’s how to access and use the new Cloud Annotations tool:

  • Upload and label a subset of photos via the Cloud Annotations GUI.
  • Train a model following these instructions. The tool will use that model to label more photos.
  • Select “Auto label” in the GUI.
  • Review new labels.
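
The loop behind those steps (label a few images, train, let the model propose labels for the rest, then review) can be sketched in a few lines of Python. This is only an illustration of model-assisted labeling in general, not IBM's implementation: the `Box` schema, the `auto_label` and `split_for_review` helpers, the 0.5 confidence cutoff, and the `fake_detector` stand-in for a trained model are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class Box:
    """One proposed bounding box (hypothetical schema, not the tool's format)."""
    label: str
    x: float
    y: float
    width: float
    height: float
    score: float  # model confidence in [0, 1]


# A detector maps an image path to proposed boxes. In practice it would wrap
# the model trained on the hand-labeled subset; here it is just a type alias.
Detector = Callable[[str], List[Box]]


def auto_label(paths: Iterable[str], detector: Detector,
               min_score: float = 0.5) -> Dict[str, List[Box]]:
    """Propose labels for each image, dropping boxes below min_score."""
    return {
        path: [box for box in detector(path) if box.score >= min_score]
        for path in paths
    }


def split_for_review(
    proposals: Dict[str, List[Box]],
) -> Tuple[Dict[str, List[Box]], List[str]]:
    """Separate images with proposals to review from images left unlabeled."""
    to_review = {p: boxes for p, boxes in proposals.items() if boxes}
    to_label_by_hand = [p for p, boxes in proposals.items() if not boxes]
    return to_review, to_label_by_hand


def fake_detector(path: str) -> List[Box]:
    """Stand-in for a trained model, returning canned predictions."""
    canned = {
        "cat1.jpg": [Box("cat", 10, 20, 100, 80, 0.92)],
        "cat2.jpg": [Box("cat", 5, 5, 50, 40, 0.31)],
    }
    return canned.get(path, [])


if __name__ == "__main__":
    proposals = auto_label(["cat1.jpg", "cat2.jpg", "dog1.jpg"], fake_detector)
    to_review, to_label_by_hand = split_for_review(proposals)
    print("review:", list(to_review))
    print("label by hand:", to_label_by_hand)
```

Keeping the confidence cutoff as a parameter reflects the trade-off in the final review step: a lower threshold produces more proposals to correct, while a higher one leaves more images to label by hand.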

A number of companies offer tools that automatically label images for the purpose of machine learning model training. In March 2019, Intel open-sourced Computer Vision Annotation Tool (CVAT), a toolkit for data labeling that’s deployed via Docker and accessed through a browser-based interface (or optionally embedded into platforms like Onepanel). Roughly a year before that, Google released Fluid Annotation, which leverages AI to annotate class labels and outline every object and background region in a picture.

It’s estimated that the data annotation tools market could be worth $1.6 billion by 2025, and some companies are already cashing in.

San Francisco-based Scale employs a combination of human data labelers and machine learning algorithms to sort through raw, unlabeled streams for clients like Lyft, General Motors, Zoox, Voyage, nuTonomy, and Embark. Supervisely operates on the same model: a combination of deep learning models and crowd collaboration. Sweden-based Mapillary creates a database of street-level images and uses computer vision technology to analyze the data contained in those images. And Austin, Texas-based Alegion, which in August 2019 raised $12 million in venture capital, provides a range of labeling and annotation services for enterprise data science teams.

Companies like DefinedCrowd take a different tack. The three-year-old Seattle-based startup, which describes itself as a “smart” data curation platform, offers a bespoke model-training service to clients in customer service, automotive, retail, health care, and enterprise sectors.