Amid the boom of AI in application building, companies face a significant data-labeling problem, especially when it comes to labeling images or other media content they want to train deep learning algorithms on.
Today, data-labeling and infrastructure provider Scale AI launched Scale Rapid, a service that aims to address this problem by labeling a sample of a customer's data within one to three hours. Users can review the results to confirm the labeling is being done correctly, iterate on their labeling instructions if necessary, and then ramp up to have Scale AI label the rest of their dataset.
This is the latest in a series of products Scale AI has launched in the last year as it seeks to maintain its leadership in the labeling market. In April, the company raised $325 million, bringing its total funding to over $602 million. Scale AI says it has surpassed $100 million in annual recurring revenue and is on track to double its revenue year over year. Its $7.3 billion valuation tops the publicly known valuations of most competitors, which include Labelbox, Hive, Snorkel AI, Mighty AI, Appen, Tasq.AI, CloudFactory, Samasource, and SuperAnnotate.
Data-labeling process workloads
Some companies boast access to massive armies of contractors who stand ready to label data, but Scale AI chief technology officer Brad Porter said he does not see any of them promising the quality guarantees and turnaround speed that Scale Rapid offers.
Companies building AI applications usually do one of two things, Porter said. Either they use an existing, already-labeled dataset, which tends to be stale and hard to adapt to new applications, or they turn to Mechanical Turk, Appen, or another third-party labeling service that employs individuals to label data on the company's behalf.
Scale AI’s competitors may provide a labeling workflow tool, but it can take a customer weeks to set up an internal process that ensures labels are both accurate and suitable for training AI models that work correctly. Typically, companies using these tools remain responsible for data-labeling quality themselves. Scale Rapid, by contrast, is designed to deliver high-quality results by managing the labeling process from beginning to end, Porter said.
How does Scale Rapid work?
When a machine learning (ML) researcher or developer starts labeling a dataset, they first write instructions describing how the data should be labeled. The instructions can cover a range of tasks, such as identifying what is in an image, annotating an audio clip, or determining whether a review is positive or negative. The developer then uploads 10 to 50 examples of the data to check that the labelers are following the instructions correctly.
Scale AI says it returns those results within one to three hours, letting the developer verify that quality thresholds are being met. If they are not, the developer can revise the instructions and submit another 10 to 50 samples. Once the developer has confirmed that the instructions are being followed correctly, they can upload 500 to 1,000 images and scale up from there.
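The pilot-then-scale workflow Porter describes can be sketched as a simple loop: label a small batch, check it against a quality bar, refine the instructions if it falls short, and only then label the full dataset. This is a minimal illustration under stated assumptions, not Scale AI's API. `calibrate`, `label_batch`, `passes_review`, and `revise` are hypothetical names, and the toy "labelers" in the usage section simply call every bird a crow until the instructions mention grackles.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical callback signatures standing in for a labeling service (not Scale AI's API).
LabelBatch = Callable[[Sequence[str], str], List[str]]
Review = Callable[[Sequence[str], List[str]], bool]
Revise = Callable[[str, Sequence[str], List[str]], str]


def calibrate(
    items: Sequence[str],
    instructions: str,
    label_batch: LabelBatch,
    passes_review: Review,
    revise: Revise,
    pilot_size: int = 50,
    max_rounds: int = 5,
) -> Tuple[str, int]:
    """Send small pilot batches until one passes review; return (instructions, rounds used)."""
    for round_no in range(1, max_rounds + 1):
        # Each round takes a fresh slice of pilot_size items ("10 to 50 more samples").
        start = (round_no - 1) * pilot_size
        pilot = items[start:start + pilot_size]
        if not pilot:
            raise ValueError("ran out of items before a pilot batch passed review")
        labels = label_batch(pilot, instructions)
        if passes_review(pilot, labels):
            return instructions, round_no
        # Quality bar missed: refine the instructions before the next pilot.
        instructions = revise(instructions, pilot, labels)
    raise RuntimeError("no pilot batch met the quality bar")


# --- Toy usage: each item is the bird's true species, so accuracy is easy to check. ---
def toy_label_batch(items: Sequence[str], instructions: str) -> List[str]:
    # Toy labelers call everything a crow unless the instructions mention grackles.
    if "grackle" in instructions:
        return list(items)
    return ["crow" for _ in items]


def toy_review(items: Sequence[str], labels: List[str]) -> bool:
    correct = sum(item == label for item, label in zip(items, labels))
    return correct / len(items) >= 0.95


def toy_revise(instructions: str, items: Sequence[str], labels: List[str]) -> str:
    return instructions + " Distinguish grackles from crows."


birds = ["crow", "grackle"] * 300  # 600 items
final, rounds = calibrate(birds, "Label the bird species.", toy_label_batch, toy_review, toy_revise)
all_labels = toy_label_batch(birds, final)  # scale-up step with validated instructions
print(rounds, len(all_labels))  # 2 600
```

Keeping validation separate from the scale-up call reflects the idea that a developer confirms quality on a small batch before committing the whole dataset.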
Scale AI has a workforce of more than 100,000 labelers, according to Porter. The company determines whether a task requires expert labelers, which helps it avoid shortcomings found in some popular labeling processes, such as consensus voting. In consensus voting, a labeling task might be sent to five people, and the majority answer is taken as the valid label. The problem is that the majority can be wrong. For example, if a task requires distinguishing a crow from a grackle, four out of five labelers might mistake the grackle for the more familiar crow. To guard against this, Scale AI brings in what it calls “expert spotters.” It then tries to automate the labeling process with ML.
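To see how consensus voting can fail, consider a plain majority vote over five labels for a single image that is actually a grackle. The sketch below is a generic illustration; `vote_with_agreement` is a hypothetical helper, and it does not model how Scale AI's expert spotters work.

```python
from collections import Counter
from typing import Sequence, Tuple


def vote_with_agreement(labels: Sequence[str]) -> Tuple[str, float]:
    """Majority vote: return the most common label and the fraction of labelers who chose it.

    Ties go to the label encountered first (Counter.most_common ordering).
    """
    if not labels:
        raise ValueError("need at least one label")
    winner, count = Counter(labels).most_common(1)[0]
    return winner, count / len(labels)


# Five labelers look at one image whose true species is a grackle.
true_label = "grackle"
votes = ["crow", "crow", "grackle", "crow", "crow"]

winner, agreement = vote_with_agreement(votes)
print(winner, agreement, winner == true_label)  # crow 0.8 False
```

The wrong answer arrives with 80% agreement, so a rule that escalates only low-agreement items to a reviewer would not catch it: the error is systematic, not random noise that more votes would average out.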
Scale AI reports swift adoption of Scale Rapid
Scale AI reports strong adoption of Scale Rapid during the tool’s early-access private beta period, with more than 750,000 tasks already completed for customers that include SpaceX, Cornell, Epson, Adobe, Square, and TimberEye. (Scale AI recently published a case study from TimberEye.)
Scale AI’s advantage, Porter says, lies in its origins labeling data for the autonomous vehicle industry. The company’s 24-year-old founder and CEO, Alexandr Wang, dropped out of MIT and began building a lidar labeling tool to meet extremely rigorous labeling standards. As Scale AI grew to serve other industries, it took its labeling experience with it, offering companies service-level agreements (SLAs) to guarantee quality.
Last year, the company pivoted to assist companies with data needs at every stage of the AI development lifecycle — from data annotation to data debugging, model improvements, and fully managed services. Scale AI currently covers multiple industries and serves hundreds of customers, including Brex, OpenAI, the U.S. Army, SAP, Etsy, and PayPal.