Building, monitoring, and improving machine learning systems is no walk in the park, no matter the circumstances. Data scientists and engineers have to monitor fine-grained quality and diagnose errors in sophisticated apps, not to mention contend with contradictory or incomplete corpora. To ease the development burden somewhat, Apple developed Overton, a framework intended to automate AI system lifecycles by providing a set of novel high-level abstractions. Given the query “How tall is the president of the United States?”, for example, Overton generates a model capable of supplying an answer. (It only supports text processing currently, but Apple is prototyping image, video, and multimodal apps.)
Apple researchers say that Overton has been used in production to support “multiple applications” in both near-real-time and back-of-house processing, and in that time, Overton-based apps have answered “billions” of queries in multiple languages and processed “trillions” of records. “[The] vision is to shift developers to … higher-level tasks instead of lower-level machine learning tasks. [E]ngineers can build deep-learning-based applications without writing any code,” wrote the coauthors of a research paper describing Overton. “Overton [can] automate many of the traditional modeling choices, including deep learning architecture … and [it allows engineer] … to build, maintain, and monitor their application by manipulating data files.”
Overton takes as input a schema containing two elements: data payloads, which describe the input data used to train new or existing AI models, and model tasks, which describe the tasks the model needs to accomplish. The schema thus defines the input, output, and coarse-grained data flow of the target machine learning model, specifying what the model computes but not how it computes it.
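To make the two-part structure concrete, here is a minimal sketch of what such a schema might look like, expressed as a Python dictionary. The field names and task types are illustrative assumptions, not Apple's actual format.

```python
# Hypothetical sketch of an Overton-style schema. All names ("payloads",
# "tasks", the type strings) are illustrative assumptions, not Apple's format.
schema = {
    # Data payloads: the inputs available to the model.
    "payloads": [
        {"name": "query", "type": "text"},
        {"name": "query_tokens", "type": "sequence", "base": "query"},
    ],
    # Model tasks: what the model must predict from those payloads.
    "tasks": [
        {"name": "intent", "type": "multiclass", "input": "query"},
        {"name": "pos_tags", "type": "sequence_label", "input": "query_tokens"},
    ],
}

# The schema fixes inputs, outputs, and coarse data flow; the architecture
# and hyperparameters that realize it are left for Overton to search over.
for task in schema["tasks"]:
    print(task["name"], "<-", task["input"])
```

The point of the separation is that engineers edit only declarations like these, never model code.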
Overton compiles the schema into a model in one of several AI development frameworks, such as Google’s TensorFlow, Apple’s CoreML, or Facebook’s PyTorch, and it then performs a search for the appropriate architecture and hyperparameters (tunable variables that directly affect how well a model trains). On the monitoring side, Overton lets engineers provide tags associated with individual data points, indicating which should be used for training, testing, and development.
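The per-record tags described above can be pictured as a small field attached to each example. The following sketch assumes a simple list-of-dicts data file with a `tags` field; these names are hypothetical, not Overton's actual on-disk format.

```python
# Hedged sketch of per-record lifecycle tags, as the article describes.
# The "query" and "tags" field names are illustrative assumptions.
records = [
    {"query": "how tall is the president", "tags": ["train"]},
    {"query": "weather in cupertino",      "tags": ["train", "dev"]},
    {"query": "play some jazz",            "tags": ["test"]},
]

def select(records, tag):
    """Return the records carrying the given lifecycle tag."""
    return [r for r in records if tag in r["tags"]]

train_set = select(records, "train")  # 2 records
dev_set   = select(records, "dev")    # 1 record
test_set  = select(records, "test")   # 1 record
```

Because splits are just tags on data, engineers can re-partition or monitor a dataset by editing the data file rather than the training code.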
Overton employs other useful techniques like model slicing, which lets users identify subsets of the input data critical to the product and use them as a guide to increase representation and minimize bias. Additionally, it natively supports multitask learning, such that Overton predicts all of a model’s tasks (e.g., part-of-speech tagging or typing) concurrently.
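A slice, in this sense, can be thought of as a predicate over examples whose quality is tracked separately from the aggregate. The sketch below illustrates the idea under assumed names; it is not Overton's API.

```python
# Hedged illustration of "model slicing": a slice is a predicate over
# examples, used to monitor quality on a product-critical subset.
# All names here are illustrative assumptions, not Overton's API.
def nightlife_slice(example):
    """Hypothetical slice: queries about bars and clubs."""
    return "bar" in example["query"] or "club" in example["query"]

examples = [
    {"query": "bars near me",       "correct": True},
    {"query": "weather today",      "correct": True},
    {"query": "best club downtown", "correct": False},
]

def slice_accuracy(examples, predicate):
    """Accuracy restricted to the examples matching the slice predicate."""
    sliced = [e for e in examples if predicate(e)]
    return sum(e["correct"] for e in sliced) / len(sliced)

print(slice_accuracy(examples, nightlife_slice))  # 0.5 on this slice
```

Overall accuracy can look healthy while a critical slice like this lags behind, which is exactly the gap slice-level monitoring is meant to expose.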
Apple researchers say that in qualitative testing, Overton reduced errors 1.7 to 2.9 times versus production systems.
“In summary, Overton represents a first-of-its-kind machine-learning lifecycle management system that has a focus on monitoring and improving application quality,” wrote the paper’s coauthors. “A key idea is to separate the model and data, which is enabled by a code-free approach to deep learning. Overton repurposes ideas from the database community and the machine learning community to help engineers in supporting the lifecycle of machine learning toolkits.”
In many respects, Overton is merely another take — albeit a highly scalable one — on the raft of “auto ML” tools published by the likes of Uber, Facebook, and others. Databricks just last month launched a toolkit for model building and deployment, which can automate things like hyperparameter tuning, batch prediction, and model search. IBM’s Watson Studio AutoAI — which debuted in June — promises to automate enterprise AI model development, as does Microsoft’s recently enhanced Azure Machine Learning cloud service and Google’s AutoML suite.
But it’s a rare look at the inner workings of a company that’s been reluctant to pull back the curtains on its AI and machine learning research. With any luck, the Overton paper and last week’s Siri disclosures signal the start of a flood of publications.