AI Weekly: Can AI predict labor market trends?

Perhaps more than any other part of the economy, the labor market is unpredictable. In June, to offer one recent example, U.S. weekly jobless claims increased unexpectedly even while layoffs eased. The challenge lies in accounting for anomalies like the pandemic, which led the White House to cancel the summer 2020 release of its updated economic projections. It's particularly difficult to get a real-time handle on hiring versus firing, especially when layoffs are temporary.

But a company called ThinkWhy claims its labor market prediction platform, LaborIQ, can deliver superior accuracy by tapping AI. ThinkWhy says its system, which relies on a model created from 20-year time series datasets of labor information for U.S. cities, learns to identify key performance attributes specific to job roles. Economists and analysts then review those attributes to arrive at results for over 20,000 job titles across U.S. metros.

Whether AI can be used to accurately predict labor market trends, however, is an open question. As with all models, data issues can throw estimates off track, and biases can emerge from setting algorithms to learn from historical examples. Models are also limited to the bounds of an observed period, losing predictive power the further they attempt to glimpse into the future.

Biases and data challenges

ThinkWhy, which provides five-year job salary forecasts as well as supply and demand volatility metrics, says it mitigates bias by training on data that spans gender, ethnicity, and age. The company says the dataset is "double curated" so that key features aren't dropped from the training set. It also uses "blind" survey records to keep the training data from producing predictions based on influencing variables, like the fact that women are underpaid in certain industries.

But Nik Dawson, a senior data scientist at FutureFit AI, a labor market intelligence firm, notes that data limitations can come in many flavors, such as representativeness, size, regularity of updates, and variety. For example, because the U.S. Bureau of Labor Statistics uses a random sample of households to arrive at its jobs numbers, it might obtain a bad draw that isn't very representative of the population from which the sample was taken. Moreover, employers don't always report accurate numbers, sometimes counting workers as being on the payroll even if their hours or pay were minimal.
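The "bad draw" problem Dawson describes can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the true unemployment rate, the sample size, and the number of repeated surveys are made-up values, not Bureau of Labor Statistics figures, and the function name is hypothetical.

```python
import random
import statistics

def sample_unemployment_rates(true_rate, sample_size, n_surveys, seed=0):
    """Simulate repeated random surveys and return each survey's estimated rate.

    All inputs are illustrative assumptions, not real survey parameters.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_surveys):
        # Each respondent is unemployed with probability true_rate.
        unemployed = sum(rng.random() < true_rate for _ in range(sample_size))
        estimates.append(unemployed / sample_size)
    return estimates

# Hypothetical setup: a 6% true rate, 2,000 respondents per survey, 200 surveys.
estimates = sample_unemployment_rates(true_rate=0.06, sample_size=2000, n_surveys=200)
print(f"mean estimate:    {statistics.mean(estimates):.4f}")
print(f"lowest estimate:  {min(estimates):.4f}")
print(f"highest estimate: {max(estimates):.4f}")
```

The average across surveys sits close to the true rate, but individual surveys land above or below it. A single unlucky sample can therefore misstate the labor market even when the methodology is sound.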

"The social cachet of economic models -- at least in policy circles -- has been high for a good century now. [But] the mythos of AI, as somehow 'beyond subjectivity and questioning,' seems like it's going to intensify the dependence on models and the authority of those who build them, which makes it more difficult to address problems they have," University of Washington AI researcher Os Keyes told VentureBeat via email. "There will be problems, because ... they need data, and if you talk to basically any economist about labor market modeling, you'll hear that before even discussing the 'best' models in some ideal universe, the issue is what data can you actually access and what are those variables a proxy for? Models are very lossy ... it's still difficult to get a grasp of individual-level decision-making and less quantifiable factors in labor decisions."

Dawson says the demand-side data fed into predictive models can come from real-time job ads, which reveal skills, education, experience levels, precise locations, and other factors employers are considering. But job ads require a lot of pre-processing using natural language processing (NLP) and even computer vision, so they're not readily available. A broader range of sources populates the supply side, including employment statistics, census collections, occupational surveys, household surveys, anonymized resumes, and online professional profiles. These aren't without flaws either. Because of time constraints and other confounders, the Census Bureau regularly undercounts populations in certain regions of the country. Some studies show, for example, that the undercount for Black men is much higher than the net undercount rate for the total male population.

"The challenge with predicting anomalies is simply that they're hard to predict! An anomaly is something that deviates from the norm. So, when you train machine learning models on historic data, the future predictions are a product of that past information," Dawson said. "This is [especially] problematic when 'black swan' events occur, like COVID-19 ... Supply-side data are important for understanding what's actually going on with workers, but they're lagging indicators -- it takes time for the data to reflect the crises that have occurred."

ThinkWhy says that it began testing its models against "new historical precedents" when the pandemic hit, as massive swings in the labor market took place. (In April 2020, the U.S. unemployment rate rose 10.3 percentage points to 14.7%, up from 4.4% in March -- the largest over-the-month increase in history.) The company's economists update model parameters as the market conditions that affect salaries and labor supply and demand change.

"AI can assist in the predictive modeling but does not permit a 'hands-off approach' to the final outcomes," ThinkWhy chief technology officer David Kramer told VentureBeat via email. "The ability for AI to process massive amounts of data and produce quantitative output reduces the probability of error and provides clarification of the key predictive characteristics that feed the final prediction sets. But AI has some very specific difficulties in intuition modeling that limits its ability to replace the human plus machine intelligence methodologies."

Looking toward the future

As Dawson notes, the risks are high when it comes to bias in labor market predictions. In HR settings, prejudicial algorithms have informed hiring, career development, and recruitment decisions. There are ways to help address the imbalances -- for example, by excluding sensitive information like race, gender, and sexual orientation from training datasets. But even this isn't a silver bullet, as these characteristics can be inferred from a combination of other features.
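To see why dropping a sensitive column isn't a silver bullet, consider a toy example with synthetic data. Everything here is an assumption made for illustration: the two groups, the `region` proxy feature, and the 85% correlation strength are invented, and real proxies are usually combinations of several features rather than a single one.

```python
import random

def make_records(n, proxy_strength, seed=0):
    """Build synthetic records where a retained feature correlates with a sensitive one."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        group = rng.choice(["A", "B"])  # sensitive attribute, to be dropped before training
        likely, other = ("north", "south") if group == "A" else ("south", "north")
        # Hypothetical proxy: region matches the group with probability proxy_strength.
        region = likely if rng.random() < proxy_strength else other
        records.append({"group": group, "region": region})
    return records

def guess_group_from_region(region):
    """Infer the dropped attribute using only the retained proxy feature."""
    return "A" if region == "north" else "B"

records = make_records(n=5000, proxy_strength=0.85)
hits = sum(guess_group_from_region(r["region"]) == r["group"] for r in records)
recovery_rate = hits / len(records)
print(f"group recovered from region alone: {recovery_rate:.1%}")
```

Random guessing would recover the group about half the time. The proxy does far better, which means a model trained without the sensitive column can still learn from it indirectly.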

Even Kramer admits it'll be some time -- he predicts 10 to 15 years -- before some of the inherent limitations in machine learning can be overcome in the labor trends prediction domain. "The models and characteristics of deep learning that permit [it] to be used reliably have yet to be developed," he said. "It will be several years before enough data and the cycle of 'fail and fix' in deep learning systems permit the replacement of humans."

Dawson believes that, despite their flaws, AI models may be superior to traditional economic models, if only because they're more sophisticated in their approach. Conventional models apply statistical techniques to economic theories, which works well for many tasks but can poorly represent economic activity. For example, many models assume people are rational, profit-maximizing agents seeking employment at the highest wage. While there's a degree of truth in that assumption, people make employment decisions for a range of reasons, he points out, such as skills, values, location, and family situations.

"It's in this high-dimensional complexity where I think AI can dramatically improve predictions and decision-making, especially in career planning," Dawson said.

AI has already been applied with some success to the study of taxation. Last April, Salesforce released the AI Economist, a research environment for understanding how AI could improve economic design. Leveraging a system of rewards to spur software agents to identify tax policies, the AI Economist is a two-level, deep reinforcement learning framework that simulates how people might react to taxes. While each agent in the simulation earns money, an AI planner module learns to enact taxes and subsidies, ultimately promoting certain global objectives.

During experiments, Salesforce says the AI Economist arrived at a more equitable tax policy than a free-market baseline, the U.S. federal single-filer 2018 tax schedule, and a prominent tax framework called the Saez tax formula.

While a Ph.D. candidate at the University of Technology Sydney, Dawson himself demonstrated that AI could be -- at least in theory -- used to predict skill shortages in labor markets with reasonable accuracy. He and coauthors compiled a dataset of both labor demand and labor supply occupational data in Australia from 2012 to 2018, including data from 7.7 million job advertisements and 20 official labor force measures. They used the data as explanatory variables and employed a classifier to predict yearly skills shortages for 132 different occupations. The models were about 83% accurate when measured by their chosen metric, Dawson and colleagues claimed.

Dawson said he's optimistic about what reinforcement learning might add to the mix of labor market predictions. Not only does it better reflect how job mobility actually occurs, but it also lessens the risks of bias and discrimination in job predictions because it's less reliant on aggregated historic training data, he asserts.

"[Reinforcement learning is a] goal-oriented approach, where an agent (say, an individual looking for a job) navigates their environment (e.g. job market) and performs actions to achieve their goal (e.g. takes a course to upskill for a target career)," Dawson said. "As the agent interacts with their environment, they learn and adjust their actions to better achieve their goal; they also respond to an environment that dynamically adjusts (e.g. a labor market crisis). This approach balances 'exploitation' of an individual's current state (e.g. recommending jobs strongly aligned with their skills and previous occupations) with 'exploration' of new paths that are different to an individual's state (e.g. recommending jobs that are new career paths)."
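The exploitation-versus-exploration balance Dawson describes can be sketched with an epsilon-greedy bandit, one of the simplest reinforcement learning setups. This is not Dawson's or FutureFit AI's method. The job options, their "fit" probabilities, and the function name are hypothetical values chosen for illustration.

```python
import random

def epsilon_greedy_recommender(true_fit, epsilon, rounds, seed=0):
    """Recommend jobs over many rounds, mostly exploiting but sometimes exploring.

    true_fit maps each job to the (hidden) chance that a recommendation works out.
    """
    rng = random.Random(seed)
    jobs = list(true_fit)
    counts = {job: 0 for job in jobs}
    avg_reward = {job: 0.0 for job in jobs}
    for _ in range(rounds):
        if rng.random() < epsilon:
            job = rng.choice(jobs)  # explore: try any path, including unfamiliar ones
        else:
            job = max(jobs, key=lambda j: avg_reward[j])  # exploit: best option seen so far
        reward = 1.0 if rng.random() < true_fit[job] else 0.0
        counts[job] += 1
        # Incremental update of the running average reward for this job.
        avg_reward[job] += (reward - avg_reward[job]) / counts[job]
    return counts, avg_reward

# Hypothetical options: the unfamiliar path is actually the best fit.
true_fit = {"current occupation": 0.40, "adjacent role": 0.45, "new career path": 0.75}
counts, avg_reward = epsilon_greedy_recommender(true_fit, epsilon=0.1, rounds=5000)
for job in true_fit:
    print(f"{job}: recommended {counts[job]} times, observed fit {avg_reward[job]:.2f}")
```

With `epsilon=0.1`, roughly one recommendation in ten is exploratory. That small share is what lets the agent discover that an unfamiliar path can outperform the option closest to the person's current state.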

Thanks for reading,

Kyle Wiggers

AI Staff Writer