5 lessons we learned about data science in 2013

Most people know what marketing executives do every day. They try to catch people's attention through email, ads, tweets, and press releases. As for data scientists, well, their work is not nearly as well understood.

That's been changing this year as companies gradually loosen up about letting their hard-won data scientists talk about their work.

This year, VentureBeat has learned a lot about these fawned-over specimens. But that knowledge has come in bits and pieces rather than all at once. That's why we've brought together some of the lessons we've picked up in 2013:

Data scientists should be creative

This point became clear as Jeremy Howard, the former president of data science competition-holder Kaggle, spoke with fellow luminaries in the field at VentureBeat's 2013 DataBeat/Data Science Summit event a few weeks ago.

While popular algorithms such as Random Forests often have a hand in helping people win Kaggle competitions, "the best applied real-life data scientists ... are extremely creative," Howard said. They pore over the raw data for the competitions they enter and discern patterns. Howard prefers to just jump into the data without any context about it.

Choose a business problem and then the tools, not the other way around

During his talk at the DataBeat/Data Science Summit, MailChimp data scientist John Foreman poked fun at companies that hear about big data in headlines and then throw money at elaborate, expensive technology before they've decided what to do with it.

"This is kind of like when you decide, 'I'm going to get fit,'" Foreman said, alluding to the many people who then spend on workout clothes and a gym membership. "You just buy all the tools first because you get this illusion of making progress. 'I've spent some of my budget. Something must be happening.'"

But companies should first see what data they can work with, and then they should identify business problems that can be solved.

"Only then do you actually start work," Foreman said. "You choose data techniques and the technologies that solve the problem you've identified."

The process might seem boring, but it could save time and money by circumventing needless "research."

Low-cost educational resources abound

Universities from Columbia to the University of California, Berkeley have introduced data science programs, and they have their merits. At the same time, massive open online course (MOOC) sites such as Coursera provide introductions to data science, and at least one more, from Udacity, is on the way (although it won't be free).

Those wishing to learn more can find additional high-quality information in other corners of the Internet. The recently formed website DataTau aims to bring together many of these resources and provide a place for data scientists to discuss their work. Rohit Sivaprasad, who started DataTau earlier this month, has picked up data science skills online alongside his academic studies, and he's now ranked 532nd on Kaggle out of more than 135,000 users.

Developers can learn to think like data scientists

The wide availability of educational information online has created an environment in which people can pick up technological knowledge about data science. It wouldn't be impossible for some developers to start thinking like data scientists. They could develop the instinct to capture relevant data, apply the right algorithms, deal with the limitations of algorithms and incomplete data sets, and determine the next steps.

That's what Pivotal data scientist Hulya Farinas had in mind when she said at the DataBeat/Data Science Summit that it's "a very tiny jump" for developers with computer-science training to start working with data science algorithms.

Coverage of her remarks elicited pushback from some data scientists, Farinas said in an interview with VentureBeat. "I think they would rather keep the data-science community a little exclusive club, and no one else is allowed," said Farinas, who leads health care and life sciences on Pivotal's data science team.

In her opinion, it doesn't take a Ph.D. in statistics to be able to ask data-science questions. "Anyone is allowed, but it's not trivial," she said. It might be more realistic for developers to collaborate with data scientists, she said.

But Farinas' larger point is taken: Data science could help developers do their jobs better.

Data scientists should team up with colleagues -- sometimes

At LinkedIn, some data scientists are placed inside product teams, where they can ensure that specific pieces of the social career network perform optimally to achieve business goals, the company's senior director of data science, Jim Baer, said at the DataBeat/Data Science Summit this month.

At the same time, Baer said, "a central team" of data scientists handles the dirty work of cutting up and cleaning data before sending it to other teams. That leaves the company with a hybrid approach for using data scientists: keeping some separate from the rest of the company and integrating others with other departments.

But not every company follows LinkedIn's lead. Data scientists are more centralized at GE, said Anil Varma, the head of the data science center of excellence at GE Software, while Intuit favors cross-team collaboration, said Vineet Singh, its director of data innovation and advanced technology.

Even when it comes to setting up technology stacks, data scientists ought to have a say, to ensure good access to and easy use of data, Singh said.

Each company needs to figure out how to make the most of its data scientists. Lyft just hired former Netflix data science leader Chris Pouliot to build a data-science team. Just because hybrid works for LinkedIn doesn't mean it will work for Lyft.

What's coming in 2014

Data science covers more than these five points. But people are still learning the basics of big data and data science. Adoption is hardly a given at every company yet. As data science becomes more commonplace, more best practices should emerge. This time next year, we should have five more lessons to share -- if not many more.