In a pair of papers accepted to the International Conference on Learning Representations (ICLR) 2020, MIT researchers investigated new ways to motivate software agents to explore their environment, along with pruning algorithms that make AI apps run faster. Taken together, the two approaches could foster the development of autonomous industrial, commercial, and home machines that require less computation but are more capable than products currently in the wild. (Think of an inventory-checking robot built atop a Raspberry Pi that swiftly learns to navigate grocery store aisles, for instance.)
One team created a meta-learning algorithm that generated 52,000 exploration algorithms, or algorithms that drive agents to widely explore their surroundings. Two they identified were entirely new and resulted in exploration that improved learning in a range of simulated tasks — from landing a moon rover and raising a robotic arm to moving an ant-like robot.
The team’s meta-learning system began by choosing a set of high-level operations (e.g., basic programs, machine learning models, etc.) to guide an agent to perform various tasks, like remembering previous inputs, comparing and contrasting current and past inputs, and using learning methods to change its own modules. Sourcing from nearly three dozen operations in total, the meta-learning system combined up to seven at a time to create computation graphs describing the aforementioned 52,000 algorithms.
Testing all of the algorithms would have required decades, so the coauthors limited their search for the best by eliminating algorithms predicted to perform poorly based on their code structure. Then the team tested the most promising candidates on a basic grid-level navigation task that required substantial exploration but minimal computation. The performance of candidates that did well became the new benchmark, eliminating even more candidates as time went on.
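The two-stage filtering described above can be illustrated with a toy sketch. This is not the researchers' code: the function names, the `keep_fraction` parameter, and the rule that a candidate survives only if it matches or beats the best score seen so far are all assumptions made for illustration.

```python
def search(candidates, predict_score, evaluate, keep_fraction=0.5):
    """Return candidates that beat a rising benchmark, best first.

    predict_score: cheap heuristic based on a candidate's structure (hypothetical).
    evaluate: inexpensive task run that returns a score (hypothetical).
    """
    # Stage 1: discard the candidates the cheap predictor ranks lowest.
    ranked = sorted(candidates, key=predict_score, reverse=True)
    shortlist = ranked[: max(1, int(len(ranked) * keep_fraction))]

    # Stage 2: evaluate the shortlist; every survivor raises the bar
    # that later candidates must clear.
    benchmark = float("-inf")
    survivors = []
    for cand in shortlist:
        score = evaluate(cand)
        if score >= benchmark:
            survivors.append((score, cand))
            benchmark = score

    survivors.sort(key=lambda pair: pair[0], reverse=True)
    return [cand for _, cand in survivors]
```

In this sketch the expensive `evaluate` call runs only on the shortlist, which is the point of filtering by structure first.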
According to the researchers, four machines searched for over 10 hours to find the best algorithms. Over 100 were high-performing, and the top 16 were both useful and novel, performing as well as (or better than) human-designed algorithms.
The team attributes the top 16 models’ performance to the two exploration functions they share. In the first, an agent is rewarded for visiting new places where it has a greater chance of making a move. In the second, one AI model learns to predict the future state of an agent while a second model recalls its past. The two work in tandem to predict the present, and if the prediction is erroneous, both are rewarded, a sign that they have discovered something new.
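For readers unfamiliar with exploration rewards, here is a minimal sketch of two generic bonuses in the same spirit: one that pays more for rarely visited states, and one that pays in proportion to prediction error. These are common textbook forms, not the two functions the MIT system discovered. The class and function names are hypothetical.

```python
import math
from collections import Counter


class CountBonus:
    """Novelty bonus that shrinks each time a state is revisited."""

    def __init__(self):
        self.visits = Counter()

    def reward(self, state):
        # Count this visit, then pay 1 / sqrt(number of visits so far).
        self.visits[state] += 1
        return 1.0 / math.sqrt(self.visits[state])


def prediction_error_bonus(predicted, observed):
    """Bonus equal to the squared error between predicted and observed state."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))
```

An agent would typically add a bonus like this to the reward it receives from the task itself, so that unfamiliar or surprising states become temporarily attractive.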
The researchers note that because the meta-learning process generates high-level computer code as output, both algorithms can be dissected to peer inside their decision-making processes. “The algorithms we generated could be read and interpreted by humans, but to actually understand the code we had to reason through each variable and operation and how they evolve with time,” said MIT graduate student Martin Schneider in a statement. He coauthored the study with fellow graduate student Ferran Alet and MIT professors of computer science and electrical engineering Leslie Kaelbling and Tomás Lozano-Pérez. “It’s an interesting open challenge to design algorithms and workflows that leverage the computer’s ability to evaluate lots of algorithms and our human ability to explain and improve on those ideas.”
Shrinking AI models
In the second of the two studies, an MIT team describes a framework that reliably compresses models so that they’re able to run on resource-constrained devices. While the researchers admit that they don’t understand why it works as well as it does, they claim it’s easier and faster to implement than other compression methods, including those that are considered state of the art.
The framework is an outgrowth of the “Lottery Ticket Hypothesis,” a paper showing that a model can perform well with 90% fewer elements if the right submodel is identified during training. The coauthors of this study, who not-so-coincidentally authored “Lottery Ticket Hypothesis,” propose “rewinding” a pruned model to an earlier training state, without the parameters that were pruned away (parameters being the configuration variables internal to the model whose values are estimated from data), before retraining it. Pruning typically causes models to become less accurate, but this method manages to restore them to nearly their original accuracy.
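The prune, rewind, and retrain sequence can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the coauthors' implementation: `train` is a hypothetical stand-in that nudges each weight toward a fixed target, and pruning by smallest magnitude is an assumed criterion.

```python
def train(weights, mask, steps, lr=0.1, target=1.0):
    """Toy stand-in for training: move each unpruned weight toward `target`."""
    w = list(weights)
    for _ in range(steps):
        w = [wi + lr * (target - wi) if keep else 0.0
             for wi, keep in zip(w, mask)]
    return w


def magnitude_mask(weights, mask, fraction):
    """Prune the given fraction of surviving weights with the smallest magnitude."""
    alive = sorted((abs(w), i) for i, (w, keep) in enumerate(zip(weights, mask)) if keep)
    pruned = {i for _, i in alive[: int(len(alive) * fraction)]}
    return [keep and i not in pruned for i, keep in enumerate(mask)]


def prune_with_rewinding(init, rewind_step, total_steps, fraction):
    mask = [True] * len(init)
    early = train(init, mask, rewind_step)                  # early checkpoint
    final = train(early, mask, total_steps - rewind_step)   # finish training
    mask = magnitude_mask(final, mask, fraction)            # decide what to prune
    # Rewind: survivors return to their early values, pruned weights stay at zero.
    rewound = [w if keep else 0.0 for w, keep in zip(early, mask)]
    retrained = train(rewound, mask, total_steps - rewind_step)
    return retrained, mask
```

The step that distinguishes rewinding from ordinary fine-tuning in this sketch is the `rewound` line: the surviving weights restart from the early checkpoint rather than from their final trained values.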
That’s good news for the broader AI research field, whose accessibility and sustainability issues remain for the most part unresolved. Last June, researchers at the University of Massachusetts at Amherst released a study estimating that training and searching a certain model can emit roughly 626,000 pounds of carbon dioxide, equivalent to nearly five times the lifetime emissions of the average U.S. car. And according to a recent Synced report, the University of Washington’s Grover machine learning model, which is designed to both generate and detect fake news, cost $25,000 to train over the course of two weeks.
“I’m happy to see new pruning and retraining techniques evolve,” said MIT assistant professor Song Han, who built the industry-standard pruning algorithm AMC but wasn’t involved with this particular study. He recently coauthored a paper describing an AI training technique that improves efficiency with a large model comprising many pretrained submodels that can be tailored to a range of platforms. “[It will give] more people access to high-performing AI applications.”
MIT Ph.D. student Alexa Renda coauthored the work with MIT assistant professor Michael Carbin and fellow Ph.D. student Jonathan Frankle. All are members of MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL).