Supervised learning is more widely used than reinforcement learning, in part because it is faster and cheaper. Given a labeled data set, a supervised learning model learns a mapping from inputs to outputs, which is how image recognition and machine translation models are built. A reinforcement learning algorithm, on the other hand, must learn by observing the results of its own actions, and that can take time, said UC Berkeley professor Ion Stoica.

Stoica works on robotics and reinforcement learning at UC Berkeley’s RISELab, and if you’re a developer working today, you’ve likely used or come across some of his work, which forms part of the modern infrastructure for machine learning. He spoke today as part of Transform 2020, an annual AI event hosted by VentureBeat that this year takes place online.

“With reinforcement learning, you have to learn almost like a program because reinforcement learning is actually about a sequence of decisions to get a desired result to maximize a desired reward, so I think these are some of the reasons” for the greater adoption of supervised learning, he said. “The reason we saw a lot of successes in gaming is because with gaming, it’s easy to simulate them, so you can do these trials very fast … but when you think about the robot which is navigating in the real world, the interactions are much slower. It can lead to some physical damage to the robot if you make the wrong decisions. So yeah, it’s more expensive and slower, and that’s why it takes much longer and is more typical.”

Reinforcement learning is a subfield of machine learning that draws on multiple disciplines, which began to coalesce in the 1980s. It involves an AI agent that interacts with an environment in order to learn a policy that maximizes a reward. Rewards earned for completing a task reinforce the actions or policy the agent should follow.
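To make the agent, environment, policy, and reward loop concrete, here is a minimal sketch using tabular Q-learning, one common reinforcement learning algorithm. It is not taken from Stoica’s talk or from Ray. The five-cell corridor environment, the `step`, `train`, and `greedy_policy` names, and every parameter value are hypothetical choices made for illustration.

```python
import random

# Hypothetical toy environment: a corridor of 5 cells, numbered 0..4.
# The agent starts in cell 0 and the episode ends when it reaches cell 4.
N_STATES = 5
ACTIONS = (-1, +1)  # step left or step right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0  # reward only for reaching the goal
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a table of action values by trial and error (Q-learning)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore at random sometimes (and on ties); otherwise act greedily.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # The reward, plus the discounted value of the next state,
            # reinforces the action that was just taken.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best learned action for each non-terminal cell."""
    return [ACTIONS[0 if row[0] > row[1] else 1] for row in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this sketch the learned policy steps right in every cell, since that is the shortest route to the reward. Each trial here costs almost nothing because the environment is simulated, which mirrors the point about games being easier to train on than physical robots.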

Popular reinforcement learning examples include game-playing AI like DeepMind’s AlphaGo and AlphaStar, which plays StarCraft 2. Engineers and researchers have also used reinforcement learning to train agents to walk, work together, and consider concepts like cooperation. Reinforcement learning has also been applied in manufacturing, in the design of language models, and even in generating tax policy.

While at RISELab’s predecessor AMPLab, Stoica helped develop Apache Spark, an open source big data and machine learning framework that can operate in a distributed fashion. He is also creator of the Ray framework for distributed reinforcement learning.

“We started Ray because we wanted to scale up some machine learning algorithms. So when we started Ray initially with distributed learning, we started to focus on reinforcement learning because it’s not only very promising, but it’s very demanding, a very difficult workload,” he said.

In addition to his AI research as a professor, Stoica has cofounded a number of companies, including Databricks, which he started with other Apache Spark creators. Following a funding round last fall, Databricks received a $6.2 billion valuation. Other prominent AI startups cofounded by UC Berkeley professors include Ambidextrous Robotics, Covariant, and DeepScale.

Last month, Stoica joined colleagues in publishing a paper about Dex-Net AR at the International Conference on Robotics and Automation (ICRA). The latest iteration of the Dex-Net robotics project from RISELab uses Apple’s ARKit and a smartphone to scan objects. The scan data is then used to train a robotic arm to pick up an object.