Uber releases Ludwig 0.2 with audio and speech improvements, plus Comet.ml and BERT integration

Roughly five months following the debut of Ludwig, Uber's open source and no-code deep learning toolkit, the ride-hailing company today detailed improvements with the latest version: Ludwig 0.2. Among them are new tools and over 50 bug fixes, plus Comet.ml integration, the addition of Google's BERT natural language model, and support for new feature types including audio, speech, geospatial, time, and date.

"The simplicity and the declarative nature of Ludwig’s model definition files allows machine learning beginners to be productive very quickly, while its flexibility and extensibility enables even machine learning experts to use it for new tasks with custom models," wrote Uber engineers Piero Molino, Yaroslav Dudin, and Sai Sumanth Miryala. "Members of the broader open source community contributed many new features to enhance Ludwig's capabilities."

Support in Ludwig 0.2 for Comet.ml, a utility that facilitates AI code and experiment management, enables automatic monitoring of models from a unified dashboard. From customizable panels, users can compare experimental designs, capture model configuration changes, and record test results and details while charts track live training performance.

As for BERT, a pretrained language model that can be quickly fine-tuned on a relatively small corpus of data to obtain cutting-edge performance, it's now included in Ludwig's list of available encoders. The blog authors note that it can be used as a form of pretraining or transfer learning to train models to perform text-based tasks like classification or generation.
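To make the encoder idea concrete, here is a minimal sketch of how a Ludwig-style model definition might select BERT for a text input. The column names (`review_text`, `sentiment`) and the file paths are hypothetical, and the `config_path` and `checkpoint_path` option names are assumptions that should be checked against the Ludwig user guide. The snippet only builds the definition as a Python dictionary and does not call Ludwig itself.

```python
# Placeholder location of a downloaded pretrained BERT checkpoint (hypothetical).
BERT_DIR = "/path/to/bert"

# A text input feature that asks for the BERT encoder.
text_feature = {
    "name": "review_text",   # hypothetical column name
    "type": "text",
    "encoder": "bert",
    # Assumed option names for pointing at the pretrained weights.
    "config_path": BERT_DIR + "/bert_config.json",
    "checkpoint_path": BERT_DIR + "/bert_model.ckpt",
}

# A declarative model definition: inputs on one side, outputs on the other.
bert_model_definition = {
    "input_features": [text_feature],
    "output_features": [
        {"name": "sentiment", "type": "category"},  # hypothetical target
    ],
}
```

A dictionary like this would typically be written to a YAML file or handed to Ludwig's Python API for training, which is where the transfer learning described above would take place.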

Audio and speech features are also now available in Ludwig, supporting applications such as speaker identification and automatic speech recognition. Uber's H3 -- a spatial indexing system that divides the Earth's surface into hexagonal cells at different levels of granularity -- is now supported, enabling developers to feed such data to Ludwig models directly. And on the date and timestamp front, Ludwig now lets users input events that happened on specific days or at specific times to obtain predictions about them.
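As a rough illustration of how the new feature types might appear in a declarative model definition, the sketch below declares one audio, one H3, and one date input. All column names are hypothetical, and the output feature is an arbitrary choice for the example. The snippet builds and prints the definition without importing Ludwig.

```python
import json

# Hypothetical column names; the "type" values correspond to the
# feature types named in the article.
model_definition = {
    "input_features": [
        {"name": "voice_clip_path", "type": "audio"},
        {"name": "pickup_cell", "type": "h3"},
        {"name": "pickup_date", "type": "date"},
    ],
    "output_features": [
        {"name": "label", "type": "category"},
    ],
}


def feature_types(definition, section):
    """Return the feature types declared in one section of the definition."""
    return [feature["type"] for feature in definition[section]]


# Show the definition in a readable form.
print(json.dumps(model_definition, indent=2))
```

The point of the declarative style is that switching a column from one feature type to another is a one-word change in the definition rather than new model code.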

Ludwig 0.2 also builds the ability to serve trained AI models into the platform's core library, and it adds Italian, Spanish, German, French, Portuguese, Dutch, Greek, and multi-language tokenization courtesy of the newest version of the open source spaCy NLP library. Image and numeric features have been improved thanks to the addition of parameters for both preprocessing and prediction, and import performance has been boosted by an average of 50%.

The Ludwig development team's work isn't done yet. In the coming months, they plan to overhaul Ludwig's preprocessing pipeline to support Petastorm, Uber’s open source data access library for deep learning, to allow it to train on petabytes of data stored in Hadoop or Amazon S3. They also intend to explore an optimization policy that might obtain better-performing models with less effort and to add cutting-edge encoders for all feature types, including multivariate time series, vectors, and point clouds. Finally, they say they're working to integrate Ludwig with Snorkel, a system for programmatically building and managing training data sets.

Ludwig 0.2's debut follows the release of Uber's Pyro in 2017, a deep probabilistic programming language built on Facebook’s PyTorch machine learning framework. And it comes as no-code AI development tools -- like Baidu’s EZDL and Microsoft's AI model builder -- continue to gain steam.
