LinkedIn's AI leader shares 3 traits of top data science talent

In a new interview with VentureBeat, Ya Xu, VP of engineering and head of data and artificial intelligence (AI) at LinkedIn, is more than happy to share her thoughts on everything from her passion for bringing science and engineering together to the top traits she looks for when interviewing data science talent.

She has far less to say about a New York Times article from last weekend. The piece focused on a study published in Science that "analyzed data from multiple large-scale randomized experiments on LinkedIn’s People You May Know algorithm, which recommends new connections to LinkedIn members, to test the extent to which weak ties increased job mobility in the world’s largest professional social network." The Times said that LinkedIn ran "experiments" on more than 20 million users over five years that, "while intended to improve how the platform worked for members, could have affected some people’s livelihoods."

According to Xu, who leads LinkedIn's centralized data team that includes all AI, data science and privacy engineering teams, the study involved "no experimenting." Instead, she told VentureBeat the research "was entirely based on observational causal study - this means we used cutting-edge social science methods (the same ones that won the 2021 Nobel Prize in Economics) to analyze historical data and discover causal patterns."

A bridge between research and product

Xu said she thinks a great deal about the ethical implications of LinkedIn research, especially when it comes to using new algorithms and machine learning architectures such as GPT and Transformers. At the same time, AI is core to LinkedIn products, as it is for so many of today's businesses. Her philosophy, she explained, is that research and product groups have to work hand in hand to meet the needs of the company's three customer ecosystems: job seekers and hiring companies; B2B buyers and sellers; and knowledge seekers and producers.

“True magic really comes when we can create a very tight connection and bridge between the research and the practical applications,” she said.

That starts with the organizational structure, with researchers and engineers working together.

"The problem itself should inform the research agenda, but at the same time the production constraints should actually inspire the research itself," she explained. "For example, if you don't have any scalability constraints, you can come up with the most complicated algorithm, but if [you] have to fit everything within this memory, you have to use this kind of computational constraint, you have these latency constraints, all of a sudden you actually inspire and motivate the research to be done in a different way."

3 top traits of LinkedIn data science talent

That collaborative culture requires the right data science talent, and Xu said there are three important things she looks for in candidates. First, is the individual mission-driven and impact-driven?

"They want to achieve something in the end," she explained. "They may have different approaches to achieving it...but ultimately they want to do right by members and customers."

Next, Xu wants to hire people who are — not surprisingly — collaborative. They should be those "who really care for each other, who really respect people who are coming with different skill sets," she said. "You don't want to hire individuals who are like, 'hey, I'm the smartest and the best and the brightest and no one else is right.'"

Finally, Xu said she wants people who are willing to learn, adapt and stay curious. "Nobody can come into this field and be like, 'I know everything,'" she said. "I mean, I had my Ph.D. in machine learning statistics 10 years ago, and if I compare what I did to what is [going on] today, oh my gosh, it's night and day."

LinkedIn's AI and data challenges

LinkedIn's three ecosystems create AI and data challenges, said Xu, because their heterogeneity makes it hard to define a "true north" value. "AI works the best if you can say 'This is the objective function' and optimize towards that," she said.

That means there needs to be a multi-objective optimization framework for AI, complicated further by the fact that there are so many different personas involved. "It's another challenging thing to understand what their needs are and how to balance those different needs," she said.
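
One common way to fold several objectives into a single objective function is weighted-sum scalarization. The Python sketch below is only an illustration of that general idea, not a description of LinkedIn's framework: the objective names (member_value, creator_value), the scores and the weights are all hypothetical.

```python
from typing import Dict, List


def scalarize(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Collapse per-objective scores into one number using a weighted sum."""
    return sum(weights[name] * scores[name] for name in weights)


def rank(candidates: List[dict], weights: Dict[str, float]) -> List[dict]:
    """Order candidates from highest to lowest combined score."""
    return sorted(
        candidates,
        key=lambda c: scalarize(c["scores"], weights),
        reverse=True,
    )


# Hypothetical candidates scored against two hypothetical objectives.
candidates = [
    {"id": "post_a", "scores": {"member_value": 0.9, "creator_value": 0.2}},
    {"id": "post_b", "scores": {"member_value": 0.6, "creator_value": 0.7}},
]

# Equal weights favor the candidate that balances both objectives.
balanced = rank(candidates, {"member_value": 0.5, "creator_value": 0.5})

# Putting all weight on one objective changes the ordering.
member_only = rank(candidates, {"member_value": 1.0, "creator_value": 0.0})

print([c["id"] for c in balanced])
print([c["id"] for c in member_only])
```

Changing the weights changes the ranking, which is why balancing the needs of different personas is itself a design decision rather than a purely technical one.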

Finally, from a technical standpoint, each of those personas comes with various problems at different scales: "We have a lot more posts on LinkedIn than we have on learning courses, for example," she said. "And they come with different latency requirements -- you have to return ads within milliseconds, but you have a lot more flexibility when it comes to, maybe, a search returned from our Sales Navigator, or recommendations by email."

AI opportunities and responsible AI

The latest AI advancements, such as large language models including GPT-3, offer opportunities for LinkedIn to tie its marketplaces together with common technology that can be used across the board, said Xu.

"Whether it's a feed post, a job description, or a member's profile, we can understand that text a lot better, and we can then map to topics that a post is about or maybe job skills and then connect that back to what this member is looking for," she said, adding that advances in algorithms, hardware and software will be a key focus overall in advancing LinkedIn's AI and data ambitions.

She added that improved methods now exist to measure AI fairness in LinkedIn's feed recommendations and connection recommendations.

However, fairness is just one area LinkedIn is investing in as part of its responsible AI work, which uses Microsoft's Responsible AI framework.

"In the fairness area, we are continuously pushing for both measurement and mitigation — how can we understand how our algorithm is doing relative to what it's intended to do?" she said. "And then mitigation is, if we identify areas that there are gaps, what are the approaches that we can do in order to mitigate it?"

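As a rough illustration of what the measurement half can look like, the Python sketch below computes a demographic parity gap: the difference between the highest and lowest rate at which groups receive a recommendation. This is one standard fairness metric chosen for the example, not necessarily the one LinkedIn uses, and the group labels and records are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def selection_rates(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return, per group, the share of members who were recommended."""
    recommended = defaultdict(int)
    total = defaultdict(int)
    for group, was_recommended in records:
        total[group] += 1
        recommended[group] += int(was_recommended)
    return {group: recommended[group] / total[group] for group in total}


def parity_gap(records: Iterable[Tuple[str, bool]]) -> float:
    """Difference between the highest and lowest group selection rates."""
    rates = selection_rates(list(records))
    return max(rates.values()) - min(rates.values())


# Hypothetical log of (group, was_recommended) pairs.
records = [
    ("group_a", True), ("group_a", True), ("group_a", False), ("group_a", True),
    ("group_b", True), ("group_b", False), ("group_b", False), ("group_b", False),
]

rates = selection_rates(records)
gap = parity_gap(records)
print(rates, gap)
```

A gap near zero means the groups are recommended at similar rates; a large gap is the kind of signal that would prompt the mitigation step Xu describes.
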
Transparency is another focus area. It is about explaining what algorithms are doing, she said: "Can the modelers who are building these algorithms explain them to the developers? Can we explain it then to the users who are interacting with algorithms?"

It's a "very challenging" space, she admits: "But it's really, really exciting from a technology standpoint."
