Salesforce's AI Economist taps reinforcement learning to generate optimal tax policies

Salesforce today announced the AI Economist, a research environment designed to elucidate how economic design might be improved with techniques from the field of AI and machine learning. The goal is to help economists, governments, and others design tax policies that not only optimize productivity and conservation but also promote widespread, whole-country social equality.

Studies have shown that income inequality gaps can negatively impact economic growth, economic opportunity, and even health. At the same time, over-taxation can discourage people from working, leading to lower productivity. But it's difficult to experiment with tax policies in the real world, at least in part because economic theory relies on stylized assumptions that are tough to validate, like people's sensitivity to taxes.

The AI Economist, then, learns the best tax policies from simulations in which citizens and a government adapt and learn. Moreover, it algorithmically compares the evolution of economies both in parallel and at scale, avoiding assumptions about the skill or behavior of workers while optimizing for desired social outcomes.

The AI Economist

Salesforce chief scientist Richard Socher and the rest of the AI Economist development team -- among them senior research scientist Stephan Zheng, lead research scientist Nikhil Naik, and research scientist Alex Trott -- collaborated with David Parkes, who leads research at the interface between economics and computer science at Harvard's Laboratory for Innovation Science, to arrive at the system's theoretical foundations. As they explain in a technical paper, the AI Economist is a two-level, deep reinforcement learning framework that uses a system of rewards to spur software agents to identify tax policies.

Building it was easier said than done. Classic tax theory focuses on people who earn income by performing labor, gaining utility from income but incurring the cost of labor effort. People are assumed to differ in their skill level, such that low-skilled workers are less productive and earn less money than high-skilled workers for the same amount of labor. This leads to inequality, and the dilemma for governments is that while the redistribution of income might be preferred to improve equality, higher taxation can reduce the amount that people choose to work and may have a particularly strong effect on high-skilled workers.

Analytical frameworks for balancing equality and productivity have been proposed, but those models are applicable only to simple and static environments. Other work has studied dynamic systems, but it often relies on simplifying assumptions in order to attain analytical solutions.

By contrast, the agents comprising the AI Economist are designed to simulate how real people might react to different taxes. They occupy a two-dimensional grid-world called Gather-and-Build in which they collect resources and earn coins by building houses of stone and wood. Agents can trade with other agents to exchange their resources for coins, where "exchange" in this context refers to an agent signaling the number of coins they're willing to accept or pay for units of resources. Additionally, agents can move around the environment to gather resources from populated resource tiles, which remain empty after they're harvested until new resources spawn.

Agents earn some number of coins for constructing a house, which requires exactly one unit of wood and one unit of stone. (Theoretically speaking, the coins earned through construction reflect the value the market places on the agent's house, while the total quantity of coins reflects the value created by the agents' collective labor.) The number of coins earned per house depends on the skill of the agent, and skill -- which is determined by a multiplier on the default number of coins earned from building a house and the probability of gaining bonus resources when harvesting -- is different across agents.

Agents start at different initial locations on the map, a perturbation that's intended to drive economic inequality and specialization in the simulation. Over the course of an episode consisting of 10 tax periods of equal length, the agents accumulate labor cost, which reflects the amount of effort associated with the actions -- moving, gathering, trading, and building -- taken by the agent. The rewards the agents receive in the end depend on the accumulated coin and accumulated labor; tax is collected at the end of each period and redistributed according to the model, at which point a new tax schedule is set (more on that later).
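As a rough illustration of how accumulated coin and accumulated labor might combine into a single reward, here is a minimal Python sketch. The article does not give the functional form, so everything here is an assumption: the concave (isoelastic) utility of coin, the `eta` curvature parameter, and the per-action labor costs in `LABOR_COST` are hypothetical values chosen for illustration only.

```python
from typing import Iterable

# Hypothetical effort cost per action type (not taken from the paper).
LABOR_COST = {"move": 0.2, "gather": 0.2, "trade": 0.05, "build": 2.0}


def accumulated_labor(actions: Iterable[str]) -> float:
    """Sum the effort cost of every action an agent took in an episode."""
    return sum(LABOR_COST[action] for action in actions)


def coin_utility(coins: float, eta: float = 0.5) -> float:
    """Assumed concave utility of coin: each extra coin is worth a bit less.

    eta = 0 makes utility linear in coin; larger eta flattens the curve.
    """
    if coins < 0:
        raise ValueError("coins must be non-negative")
    if not 0.0 <= eta < 1.0:
        raise ValueError("this sketch only handles 0 <= eta < 1")
    return (coins ** (1.0 - eta) - 1.0) / (1.0 - eta)


def agent_utility(coins: float, labor: float, eta: float = 0.5) -> float:
    """Reward at the end of an episode: utility of coin minus labor cost."""
    return coin_utility(coins, eta) - labor


if __name__ == "__main__":
    labor = accumulated_labor(["move", "move", "gather", "build"])
    print(agent_utility(coins=100.0, labor=labor))
```

The point of the concave shape, under these assumptions, is that an agent with many coins gains little from one more house while still paying the full labor cost, which is one way a simulated worker can "choose" to work less when taxed.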

While each agent in the simulation earns money by collecting, trading resources, and building houses -- all the while learning to maximize their utility or happiness by adjusting movement and behaviors -- an AI planner module (the economist) learns to enact taxes and subsidies to promote certain global objectives. Concretely, the planner learns a tax schedule analogous to the way in which U.S. federal income taxes are described. Taxes are computed by applying a tax rate to each part of an individual's income that falls within a tax bracket. The income brackets are fixed across tax policies, and the planner learns the tax rate for each bracket, so that each agent faces the same rates and bracket cutoffs.
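The bracketed computation described above can be sketched in a few lines of Python. This is an illustrative sketch, not the team's code: the function name `bracket_tax` and the cutoffs and rates in `CUTOFFS` and `RATES` are hypothetical. In the AI Economist, the cutoffs would stay fixed and the planner would choose the rates.

```python
from typing import Sequence


def bracket_tax(income: float, cutoffs: Sequence[float], rates: Sequence[float]) -> float:
    """Tax owed under a marginal-rate schedule.

    cutoffs: ascending lower bounds of each bracket, starting at 0.
    rates:   marginal rate applied to the slice of income inside each bracket.
    """
    if len(cutoffs) != len(rates):
        raise ValueError("need exactly one rate per bracket")
    tax = 0.0
    for i, (lower, rate) in enumerate(zip(cutoffs, rates)):
        # The top bracket has no upper bound.
        upper = cutoffs[i + 1] if i + 1 < len(cutoffs) else float("inf")
        # Only the slice of income between lower and upper is taxed at this rate.
        taxable = max(0.0, min(income, upper) - lower)
        tax += rate * taxable
    return tax


# Hypothetical schedule for illustration: three brackets with rising rates.
CUTOFFS = [0.0, 100.0, 500.0]
RATES = [0.10, 0.20, 0.40]

if __name__ == "__main__":
    for income in (50.0, 300.0, 600.0):
        print(income, bracket_tax(income, CUTOFFS, RATES))
```

Because each rate applies only to the income inside its own bracket, crossing a cutoff never reduces an agent's after-tax income as long as every rate is below 100%. A negative rate in a low bracket would act as a subsidy.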

The planner also incorporates a social welfare function that considers the trade-off between income equality and productivity, where "equality" is defined as the complement of an index on the distribution of wealth (in other words, the cumulative number of coins owned by an agent after taxation and redistribution). As it does all this, the agents learn to "game" the function and tax schedule to lower their effective tax rate, in part by exploiting loopholes like alternating between tax periods with high and low incomes.
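To make the equality-productivity trade-off concrete, here is a small Python sketch of one possible social welfare function. The article only says that equality is the complement of "an index" on the wealth distribution. The choice of the Gini coefficient as that index, the use of total coins as the productivity measure, and the product form `equality * productivity` are all assumptions made for this example.

```python
from typing import Sequence


def gini(coins: Sequence[float]) -> float:
    """Gini coefficient of non-negative coin holdings (0 = perfectly equal)."""
    n = len(coins)
    total = sum(coins)
    if n == 0 or total == 0:
        return 0.0
    # Sum of absolute differences over all ordered pairs of agents.
    diff_sum = sum(abs(a - b) for a in coins for b in coins)
    return diff_sum / (2 * n * total)


def equality(coins: Sequence[float]) -> float:
    """Equality as the complement of the (assumed) inequality index."""
    return 1.0 - gini(coins)


def productivity(coins: Sequence[float]) -> float:
    """Assumed productivity measure: total coins held by all agents."""
    return float(sum(coins))


def social_welfare(coins: Sequence[float]) -> float:
    """Assumed planner objective: reward being both equal and productive."""
    return equality(coins) * productivity(coins)


if __name__ == "__main__":
    print(social_welfare([10, 10, 10, 10]))  # same total, evenly spread
    print(social_welfare([0, 0, 0, 40]))     # same total, held by one agent
```

Under this product form, two economies with the same total output score differently depending on how the coins are spread, so a planner maximizing it cannot simply push output up while ignoring who ends up holding the coins.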

The AI planner and agents engage in this fiscal tug-of-war -- each self-improving in their abilities -- until a semblance of stability is achieved. In the course of a single experiment, millions of years of economies are simulated.

This predictably leads to interesting behaviors. For instance, low-skill agents largely focus on collecting wood and stone, while higher-skill agents focus on building houses. The low-skill agents earn their income by selling resources to the higher-skill agents, who choose to earn income through building, while the highest-skill agents build several houses early on before switching to solely collecting and selling.

Insights like these can be used to discover novel tax frameworks, notes Trott, and to study how existing frameworks can reduce inequality and improve productivity. "The AI Economist is a first step in broadening the application of [reinforcement learning] to areas with the most potential for positive impact," he said. "Our hope is that [it] can empower economists to make informed policy decisions, and in the future, politicians can use the tool to optimize for a specific social objective, like helping the middle class."

Experiments

To evaluate the AI Economist's performance, the team adopted a two-phase training approach. In the first phase, a collection of agent models was trained for 20 million steps without any taxes applied -- a sort of "free-market" scenario -- to net models well-adapted to the general environment dynamics. In the second phase, training was resumed but with one of the studied tax models active, so that the fraction of agent incomes per bracket was roughly aligned with that in the U.S. economy.

The AI Economist's performance was compared with three baseline policies: free-market, the U.S. federal single-filer 2018 tax schedule, and a prominent tax framework called the Saez tax formula. In experiments, it achieved a 16% improvement over Saez and a 47% gain compared with the free-market policy, at an 11% decrease in productivity. Redistribution improved equality across all policies at the cost of productivity.

Versus the progressive U.S. tax rates, the AI Economist recommended a blend of progressive and regressive tax schedules leading to higher subsidies (negative taxes) for low-income agents. In particular, it set a higher top tax rate on income above 510 coins, a lower tax rate for incomes between 160 and 510 coins, and both higher and lower tax rates on incomes below 160 coins.

Real-world experiments

To explore whether the AI Economist's policies might improve outcomes in simulations with people who earn real money, the Salesforce coauthors recruited subjects based in the U.S. through Amazon Mechanical Turk. They built a two-dimensional world to mimic Gather-and-Build -- one containing resources, but with trading disabled and with the cost of building a house set 50% higher -- and instructed the subjects to use a web-based interface to move characters in the environment.

Around 100 subjects were tasked with completing jobs consisting of a sequence of four five-minute episodes, for a total of 130 games. Each received $5 base pay and a variable bonus of at most $10, with the bonus proportional to the utility achieved (measured in coins), which reflected post-tax income and labor cost at the end of each episode.

The researchers acknowledge the limitations of the human study -- for instance, subjects tended to engage in adversarial behaviors like blocking other people, and they had different strategies that affected their payoff and hence implied skill. Nonetheless, Socher and team found that a "camelback" tax schedule informed by the AI Economist had an equality-productivity trade-off comparable to Saez, with better equality-productivity performance than the U.S. and free-market approaches, and that it significantly outperformed all the baselines for social welfare.

"The AI-driven tax model did not require knowledge of economic theory, did not require that we estimate the tax elasticity of labor, and was nevertheless able to learn a well-performing tax policy for use with human participants tabula rasa," Socher and colleagues concluded in the paper. "We were able to apply the model without requiring recalibration of tax rates: the only calibration was to scale down the income brackets by a factor of three to adjust for the relative productivity of human and AI agents, and enable all income brackets to be exercised ... The encouraging transfer performance suggests there is potential for building AI-driven tax models that can find application to the real world."

Future directions

Beyond speeding up experiments with proposals for tax systems and offering the ability to test ideas that come from economic theory, Socher believes that the AI Economist holds promise for more complex scenarios, like navigating the economic aftermath of COVID-19. To test that hypothesis and to promote future research, Salesforce plans to make both the AI Economist environment and sample training code available for a finite period of time.

"Currently, the AI Economist is solely focused on taxes," said Socher. "However, we think [reinforcement learning] is promising for economics ... Economic simulations can factor in human behavior by using real-world, human data. Together with our ... algorithms, this could lead to AI-designed economic policies that could help accelerate real-world economic recovery. We are already thinking of ways to approach this and encourage researchers thinking about this to reach out to us."

Broadly speaking, both Socher and Trott characterize the AI Economist as one of the stronger demonstrations of reinforcement learning's practical applications. While the AI technique has been employed by Uber, Google, Alphabet's DeepMind, OpenAI, Microsoft, Tencent, and others to great effect in the video and board game domains, as well as in fields like robotics and autonomous vehicles, Socher in particular asserts that the real-world benefits remain somewhat elusive.

"Reinforcement learning has made a number of breakthroughs through game-playing -- think [DeepMind's] AlphaGo for example. However, in the end, games are just games -- when chess was 'solved,' the rest of the world didn't really change that much afterwards," said Socher. "If instead of playing games, we as AI researchers focus on improving the realism and scale of these economic simulations and the abilities of the AI agents and the AI economist to improve the overall outcomes we can have a lot of positive impact."

Of course, history has proven that AI is no silver bullet where predictions about social outcomes and policies are concerned. A recent study found that machine learning models, when used to predict six life outcomes for children, parents, and households, weren't very accurate even when trained on 13,000 data points from over 4,000 families. Even the best of over 3,000 models was only marginally better than linear regression and logistic regression, which don't rely on any form of machine learning.

That's perhaps why in the paper, Socher and team explicitly caution against applying the AI-generated "camelback" schedule in a real economy. But Naik says that as a theoretical tool used ethically with sound scientific judgment, the AI Economist could give economists and governments unprecedented modeling capabilities to augment research. And for what it's worth, companies like Amazon appear to be on the same wavelength -- scientists at the tech giant earlier this year revealed that they're applying AI and machine learning to calculate inflation rates.

"Economists have previously relied on theorems, but theorems require simple math and are predicated on people behaving rationally. Our world today is getting more complex and economic theories of the future need to be able to seamlessly incorporate additional requirements such as environmental protection," Naik said. "AI helps to model such complexity and a broad spectrum of behaviors ... We want to partner with more economists and governments to help them run simulations on the AI Economist."