LinkedIn open-sources DeText, a framework for natural language processing tasks

LinkedIn today released DeText, an open source framework for natural language processing-related ranking, classification, and language generation tasks. It leverages semantic matching, using deep neural networks to understand member intents in search and recommender systems. Because it is a general framework, LinkedIn says, it can be applied to a range of tasks, including search and recommendation ranking, multi-class classification, and query understanding.

According to LinkedIn senior engineering manager Weiwei Guo, DeText was designed with enough flexibility to meet the requirements of different production services. It's powered by "state-of-the-art" algorithms incorporated into an end-to-end model in which all variables are jointly updated, and it aims to balance effectiveness with efficiency.

"The framework allows users to better utilize models and embeddings across real-world applications," Guo told VentureBeat via email. "It has been applied at LinkedIn across search and recommendation ranking, query intent classification, and query auto-completion, with significant improvements in relevance ranking for members searching people and jobs."

DeText contains multiple components, all of which can be customized via preloaded templates:

An embedding layer that converts a sequence of words into a matrix, a set of numbers arranged in rows and columns. (Matrices are often used to represent the data that feeds into AI models.)
Models for text encoding, which map text data into fixed-length embeddings, or numerical representations from which algorithms can learn.
An interaction layer that generates features based on the above-mentioned text embeddings.
Feature processing that combines traditional features with the interaction features (deep features) in jointly trained wide linear models and deep neural networks. (In this context, features refer to individual measurable properties and characteristics of phenomena being observed.)
A multilayer perceptron (MLP) layer that combines wide and deep features.
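
As a rough illustration of how these five components could fit together, here is a minimal Python sketch. It is not DeText's actual API. The function names (embed, encode, interact, mlp, score), the toy vocabulary, the mean-pooling encoder, the cosine-similarity interaction, the single "click rate" wide feature, and the hand-picked weights are all assumptions made for illustration. In a real system, the embeddings and weights would be learned jointly during training.

```python
import math
import random

random.seed(0)
EMBED_DIM = 4  # hypothetical embedding size, chosen for readability

# Toy vocabulary with random word vectors (a real model would learn these).
VOCAB = ["software", "engineer", "developer", "sales", "manager"]
WORD_VECTORS = {
    word: [random.uniform(-1, 1) for _ in range(EMBED_DIM)] for word in VOCAB
}
UNKNOWN = [0.0] * EMBED_DIM  # fallback vector for out-of-vocabulary words


def embed(text):
    """Embedding layer: sequence of words -> matrix with one row per word."""
    return [WORD_VECTORS.get(word, UNKNOWN) for word in text.lower().split()]


def encode(matrix):
    """Text encoder (assumed mean pooling): matrix -> fixed-length vector."""
    if not matrix:
        return [0.0] * EMBED_DIM
    return [sum(column) / len(matrix) for column in zip(*matrix)]


def interact(query_vec, doc_vec):
    """Interaction layer: cosine similarity as a single deep feature."""
    dot = sum(q * d for q, d in zip(query_vec, doc_vec))
    norm = math.sqrt(sum(q * q for q in query_vec)) * math.sqrt(
        sum(d * d for d in doc_vec)
    )
    return [dot / norm if norm else 0.0]


def mlp(features, w_hidden, w_out):
    """MLP layer: one ReLU hidden layer, then a linear output score."""
    hidden = [
        max(0.0, sum(f * w for f, w in zip(features, row))) for row in w_hidden
    ]
    return sum(h * w for h, w in zip(hidden, w_out))


def score(query, doc, wide_features, w_hidden, w_out):
    """Feature processing: concatenate deep and wide features, then score."""
    deep_features = interact(encode(embed(query)), encode(embed(doc)))
    return mlp(deep_features + wide_features, w_hidden, w_out)


# Hand-picked weights for illustration only (one row per hidden unit).
W_HIDDEN = [[1.0, 0.5], [0.5, 1.0]]
W_OUT = [1.0, 1.0]

# Rank two hypothetical job titles for a query; 0.3 and 0.6 stand in for a
# traditional (wide) feature such as a historical click rate.
for title, click_rate in [("software developer", 0.3), ("sales manager", 0.6)]:
    print(title, score("software engineer", title, [click_rate], W_HIDDEN, W_OUT))
```

The sketch scores each query-document pair independently, so ranking amounts to sorting candidates by the returned score. Swapping the body of encode for a different encoder would leave the rest of the pipeline unchanged, which is the kind of modularity the component list describes.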

Running DeText requires creating and launching a dev environment with the necessary dependencies, including Python. Once DeText is installed, an example model can be trained on the sample data set from the GitHub repository.

"Deep learning-based natural language processing has the potential to deepen how search and recommender systems understand human intent. Yet the ability to leverage models ... in commercial applications remains unwieldy due to its heavy computational load, especially when it comes to ranking results and classifying text," Guo continued. "DeText can be thought of as a cordless drill that allows users to easily swap and optimize natural language processing models, depending on the use case."

LinkedIn's use of AI is pervasive. In October 2019, the Microsoft-owned platform pulled back the curtain on a model that generates text descriptions for images uploaded to LinkedIn, built using Microsoft's Cognitive Services platform and a unique LinkedIn-derived data set. LinkedIn's Recommended Candidates feature learns the hiring criteria for a given role and automatically surfaces relevant candidates in a dedicated tab. Its AI-driven search engine draws on data such as what people post on their profiles and the searches candidates perform to predict best-fit jobs and job seekers. And LinkedIn's AI-driven moderation tool automatically spots and removes inappropriate user accounts.