So you want to build a data science team?

Internet companies looking to start a data science team often get overwhelmed by the challenges specific to hiring, building, and growing such a team.

They can become confused by all the jargon, hype, and buzzwords surrounding certain technologies, algorithms, and skills. Moreover, starting a team of this kind is not the same as starting an average software development team. Profiles are more specialized, the terminology is more exotic, and there is little consensus in the market on best practices and the state of the art.

A major international retailer recently approached me for advice on building an in-house team from scratch for its e-commerce division, and I would like to share with you the elements that I believe every company should clarify before embarking on this endeavour.

In this post, I will touch on these three key topics: accountability, resources, and team composition.

Accountability

From the beginning, it should be clear to everyone exactly where in the organizational chart the team will sit and who its main stakeholders will be.

There are multiple possible approaches. Some organizations put the data science team under the CTO, others under the CFO or even the CMO. Others prefer a federated system, with specialists distributed across departments and supervised by a project manager, while still others take the R&D route, where the team has no specific agenda or stakeholder and a free hand to set its own direction.

The right choice depends on the company's organization, culture, and resources, as well as on the team's mission. Failing to decide this from the beginning can lead to confusion in the team's daily activities.

Because data science is a sexy topic, more than one person in management will be happy to have the team under her command. If they are not addressed right from the start, these competing expectations can create friction and confusion that seriously affect the performance of the newly formed team.


Resources

Anyone familiar with the current job market knows that technical talent in this area does not come cheap, yet it is surprising how often budgets are not properly planned.

For an Internet company with 300 or more employees trying to create a centralized team with a specific mission (e.g., recommendation engines or customer reactivation), a good starting point is a team of 5 to 8 people: one technical project manager, 1-2 hardcore data scientists responsible for modeling, and 3-5 data engineers who deploy the production code.

Over time, teams can grow, and additional teams with different missions may emerge. A quality team therefore represents a significant commitment, and this should be clear to every stakeholder.

Team Composition

After determining the available resources and the expected team size, the next big question is whom to hire. For a typical HR department, this quickly becomes an impossible task: mailboxes are soon flooded with résumés listing all sorts of exotic qualifications and never-heard-before terms.

At this stage it is also very easy to be swayed by the media or by technology vendors. Hence, the company should define which hard skills and technologies are relevant, whether education weighs more than experience, whether big names on a résumé carry extra weight, and whether it is really necessary to hire very senior engineers or long-experienced postdocs. This is easier said than done, as many questions remain unanswered in the team's seed stage.

My advice is to start with solid basics rather than look for the über-exotic. The objective in the team's first year or two should be to lay a foundation and justify the team's existence through quick wins, harvesting the low-hanging fruit.

Taking the example above of a team of 8, and considering that the company might not be able to compete with the Googles and Facebooks of this world in prestige, remuneration, and perks, a good initial composition could look as follows:

Technical Project Manager: This person has 3 to 5 years of experience managing similar teams that deal with quantitative subjects. Preferably, she has a solid technical background and, although she is not expected to code, is capable of doing so. Beyond the skills expected of any project manager, she understands the algorithms and techniques the team uses; ideally, she can also do code reviews.

Data Scientist: Someone with a solid quantitative background. Ideally, she holds a Ph.D. in physics, mathematics, computer science, biology, or a related discipline. This person should be judged by the quality of her research, where she has published, and what she has contributed.

It is entirely possible to be an expert in machine learning and be really bad at software development. Hence, it is very important not to assume anything and to double-check her coding skills. Unless you want to build a more academic R&D team, somebody who cannot code will not be very helpful, especially in the team's early days. It is also important to verify how hands-on the candidate is, as people coming from academia sometimes have mistaken expectations of what industry needs from them.

Data Engineer: This person does not need to be very academic. She can be a solid software developer with an interest in quantitative topics. She must have a very solid understanding of algorithms, data structures, and software engineering in general. Double-check the algorithms part (especially computational complexity), as many engineers have a poor grasp of the subject, yet it is essential for every robust data team. Overall, her code must be excellent. Look for individuals who actively contribute to open source projects. Ideally, this person uses the same technology stack as your data scientists (e.g., Python or Scala).

Seniority for each of these positions depends on the company and its budget. However, I do not recommend hiring very senior individuals at the beginning. They often have very specific expectations, whereas in the team's early days its scope and nature can change dramatically.

In addition, data teams have to build their own platforms at the beginning, as the data they need might not exist yet or might not be in the format they need. This means doing unglamorous tasks and getting their hands dirty.

Therefore, it is preferable to have ambitious and adaptable individuals, even if they might not be very experienced.

Rodrigo Rivera is a Mexican-German data entrepreneur and founder of Emplido, an analytics recruiting company acquired by Experteer Inc. In Asia and Europe, he has built and led data science teams for Rocket Internet in the areas of product management, advertising technology, CRM, data insights, and sales.
