How NASA is using knowledge graphs to find talent

One of NASA’s biggest challenges is identifying where data science skills reside within the organization. Not only is data science a new discipline – it’s also a fast-evolving one. Knowledge for each role is constantly shifting due to technological and business demands.

That’s where David Meza, acting branch chief of people analytics and senior data scientist at NASA, believes graph technology can help. His team is using Neo4j graph database technology to build a talent-mapping knowledge graph that shows the relationships between people, skills, and projects.

Meza and his team are currently working on the implementation phase of the project. They eventually plan to formalize the end user application and create an interface to help people in NASA search for talent and job opportunities. Meza told VentureBeat more about the project.

VentureBeat: What’s the broad aim of this data-led project?

David Meza: It’s about taking a look at how we can identify the skills, knowledge and abilities, tasks, and technology within an occupation or a work role. How do we translate that to an employee? How do we connect it to their training? And how do we connect that back to projects and programs? All of that work is a relationship issue that can be connected via certain elements that associate all of them together – and that’s where the graph comes in.

VentureBeat: Why did you decide to go with Neo4j rather than develop internally?

Meza: I think there was really nothing out there that provided what we were looking for, so that's part of it. The other part of the process is that we have specific information that we're looking for. It’s not very general. And so we needed to build something that was more geared towards our concepts, our thoughts, and our needs for very specific things that we do at NASA around spaceflights, operations, and things like that.

VentureBeat: What’s the timeline for the introduction of Neo4j?

Meza: We're still in the implementation phase. The first six to eight months were about research and development and making sure we had the right access to the data. Like any other project, that's probably our most difficult task – making sure we have the right access and the right information, and thinking about how everything is related. While we were looking at that, we also worked in parallel on other issues: what's the model going to look like, what algorithms are we going to use, and how are we going to train these models? We've got the data in the graph system now and we’re starting to produce a beta phase of an application. This summer through the end of the year, we're looking towards formalizing that application to make it more of an interface that an end user can use.

VentureBeat: What’s been the technical process behind the implementation of Neo4j?

Meza: The first part was trying to think about what's going to be our occupational taxonomy. We looked at: “How do we identify an occupation? What is the DNA of an occupation?” And similarly, we looked at that from an employee perspective, from a training perspective, and from a program or project perspective. So simply put, we broke everything down into three different categories for each occupation: a piece of knowledge, a skill, and a task.

VentureBeat: How are you using those categories to build a data model?

Meza: If you can start identifying people that have great knowledge in natural language processing, for example, and the skills they need to do a task, then from an occupation standpoint you can say that specific workers need particular skills and abilities. Fortunately, there’s a database from the Department of Labor called O*NET, which has details on hundreds of occupations and their elements. Those elements consist of knowledge, skills, abilities, tasks, workforce characteristics, licensing, and education. So that was the basis for our Neo4j graph database. We then did the same thing with training. Within training, you're going to learn a piece of knowledge; to learn that piece of knowledge, you're going to get a skill; and to get that skill, you're going to do exercises or tasks to get proficient in those skills. And it’s similar for programs: we can connect back to what knowledge, skills, and tasks a person needs for each project.
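
The model Meza describes, with occupations, training, and employees all tied to shared knowledge, skill, and task elements, can be sketched as a small labelled graph. The following is a minimal illustration in plain Python rather than Neo4j itself. The class name, the relationship names (REQUIRES, PROVIDES, COMPLETED, HAS), and all sample data are assumptions made for this example, not NASA's actual schema or O*NET content.

```python
from collections import defaultdict


class TalentGraph:
    """Tiny in-memory graph: (label, name) nodes joined by typed edges."""

    def __init__(self):
        # edges[(source_node, relationship)] -> set of target nodes
        self.edges = defaultdict(set)

    def relate(self, source, relationship, target):
        self.edges[(source, relationship)].add(target)

    def targets(self, source, relationship):
        # Return a copy so callers can modify the result safely.
        return set(self.edges.get((source, relationship), set()))


graph = TalentGraph()

# Hypothetical occupation broken into knowledge, skill, and task elements.
occupation = ("Occupation", "Data Scientist")
graph.relate(occupation, "REQUIRES", ("Knowledge", "Natural language processing"))
graph.relate(occupation, "REQUIRES", ("Skill", "Programming"))
graph.relate(occupation, "REQUIRES", ("Task", "Build text classifiers"))

# Hypothetical training course that provides some of the same elements.
course = ("Training", "Intro to NLP")
graph.relate(course, "PROVIDES", ("Knowledge", "Natural language processing"))
graph.relate(course, "PROVIDES", ("Task", "Build text classifiers"))

# Hypothetical employee with one directly recorded skill and one course.
employee = ("Employee", "A. Example")
graph.relate(employee, "HAS", ("Skill", "Programming"))
graph.relate(employee, "COMPLETED", course)


def elements_of(graph, employee):
    """Elements an employee holds directly or through completed training."""
    held = graph.targets(employee, "HAS")
    for completed in graph.targets(employee, "COMPLETED"):
        held |= graph.targets(completed, "PROVIDES")
    return held


def missing_elements(graph, employee, occupation):
    """Elements the occupation requires that the employee does not yet hold."""
    return graph.targets(occupation, "REQUIRES") - elements_of(graph, employee)
```

Because every entity links to the same element nodes, asking what an employee lacks for a given occupation becomes a set difference over shared neighbors. In a graph database the same question would be a path query rather than a series of table joins.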

VentureBeat: How will you train the model over time?

Meza: We’ve started looking at NASA-specific competencies and work roles to assign those to employees. Our next phase is to have employees validate and verify whether what the model infers about their knowledge, skills, abilities, tasks, and technologies is correct or incorrect. Then, we’ll use that feedback to train the model so it can do a little bit better. That's what we're hoping to do over the next few months.
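
The interview does not say which model NASA uses or how feedback is fed back into it. As a rough sketch of the validation step only, the example below pairs each inferred employee-element link with the employee's verdict and summarizes how often the inferences were confirmed. The `Inference` record, the function names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inference:
    employee: str
    element: str   # a skill, task, or similar element attributed to the employee
    score: float   # assumed model confidence between 0 and 1


def collect_feedback(inferences, responses):
    """Pair each inference with the employee's verdict (True means confirmed).

    Inferences the employee has not reviewed yet are skipped.
    """
    labeled = []
    for inference in inferences:
        key = (inference.employee, inference.element)
        if key in responses:
            labeled.append((inference, responses[key]))
    return labeled


def confirmed_share(labeled):
    """Fraction of reviewed inferences that employees confirmed."""
    if not labeled:
        return None
    return sum(1 for _, confirmed in labeled if confirmed) / len(labeled)


# Hypothetical model output and employee responses.
inferences = [
    Inference("employee_a", "natural language processing", 0.9),
    Inference("employee_a", "graph modeling", 0.6),
    Inference("employee_b", "python", 0.8),
    Inference("employee_b", "systems engineering", 0.4),
    Inference("employee_c", "python", 0.7),  # not reviewed yet
]
responses = {
    ("employee_a", "natural language processing"): True,
    ("employee_a", "graph modeling"): True,
    ("employee_b", "python"): True,
    ("employee_b", "systems engineering"): False,
}

labeled = collect_feedback(inferences, responses)
share = confirmed_share(labeled)
```

The labeled pairs are the kind of data that could later serve as training examples, and the confirmed share gives a simple way to check whether the inferences improve between rounds.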

VentureBeat: What will this approach mean for identifying talent at NASA?

Meza: I think it will give the employees an opportunity to see what's out there that may interest them to further their career. If they want to do a career change, for example, they can see where they are in that process. But I also think it will help us align our people better across our organization, and it will help us track and maybe predict where we might be losing skills, where we maybe need to modify skills based on the shifting of our programs and the shifting of our mission due to administration changes. So I think it'll make us a little bit more agile and it will be easier to move our workforce.
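
One crude way to picture the skill tracking Meza mentions is to compare how many projects need a skill with how many employees hold it. The sketch below does only that. The data, the names, and the counting rule are assumptions for illustration, and a real analysis would presumably weigh workload, proficiency, and attrition as well.

```python
from collections import Counter

# Hypothetical data: skills each employee holds and skills each project needs.
employee_skills = {
    "employee_a": {"natural language processing", "python"},
    "employee_b": {"python"},
    "employee_c": {"systems engineering"},
}
project_needs = {
    "project_x": {"natural language processing", "python"},
    "project_y": {"natural language processing", "graph modeling"},
    "project_z": {"natural language processing"},
}


def skill_gaps(employee_skills, project_needs):
    """Return skills needed by more projects than there are employees holding them.

    The value is the shortfall: projects needing the skill minus employees with it.
    """
    supply = Counter(s for skills in employee_skills.values() for s in skills)
    demand = Counter(s for needs in project_needs.values() for s in needs)
    return {s: demand[s] - supply[s] for s in demand if demand[s] > supply[s]}


gaps = skill_gaps(employee_skills, project_needs)
```

Rerunning a count like this as programs shift would show which skills are becoming scarce relative to demand.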

VentureBeat: Do you have any other best practice lessons for implementing Neo4j?

Meza: I guess the biggest lesson that I've learned over this time is to identify as many data sources as possible that can help you provide some of the information. Start small – you don't need to know everything right away. When I look at knowledge graphs and graph databases, the beauty is that you can add and remove information fairly easily compared to a relational database system, where you have to know the schema upfront. Within a graph database or knowledge graph, you can easily add information as you get it without messing up your schema or your data model. Adding more information just enhances your model. So start small, but think big in terms of what you're trying to do. Look at how you can develop relationships, and try to identify even latent relationships across your graphs based on the information you have about those data sources.
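
Meza's point about adding information without reworking the schema can be illustrated with a toy triple store. This is plain Python standing in for a graph database, and the relationship names and records are invented for the example. New kinds of relationships are simply new triples, so earlier queries keep working unchanged.

```python
# Each fact is a (source, relationship, target) triple; there is no fixed schema.
triples = set()


def add(source, relationship, target):
    triples.add((source, relationship, target))


def related(relationship):
    """All (source, target) pairs joined by the given relationship type."""
    return {(s, t) for s, r, t in triples if r == relationship}


# Start small: one employee skill and one project need.
add("employee_a", "HAS_SKILL", "python")
add("project_x", "NEEDS_SKILL", "python")
skills_before = related("HAS_SKILL")

# Later, a new data source introduces relationship types nobody planned for.
add("employee_a", "HOLDS_LICENSE", "hypothetical_license")
add("employee_a", "COMPLETED", "intro_to_nlp")

# The original query is unaffected by the new kinds of data.
skills_after = related("HAS_SKILL")
```

In a relational design, the license and training records would typically require new tables or columns before they could be stored. Here they only add edges to the existing graph.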
