At MailChimp, data science works behind the scenes

REDWOOD CITY, Calif. -- Data science doesn't need to look cool. It doesn't need to use trendy technologies, either. What it ought to do is solve problems.

And data science has done that for MailChimp, in a variety of applications, as company data scientist John Foreman showed during a presentation today at VentureBeat's DataBeat/Data Science Summit event.

Internally, scheduling support staff was such a burden for the e-mail marketing service that two people were spending their working hours managing workers' schedules. "And that's a really bad idea," Foreman said.

It was a classic optimization problem, he said. A plugin for Microsoft Excel called OpenSolver helped Foreman quickly build what he described as an "optimal schedule."
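To make the idea concrete, here is a minimal sketch of the kind of shift-covering problem involved. Everything in it is an assumption for illustration. The hourly demand figures, the three candidate shifts, and the helper names coverage and optimal_schedule are hypothetical and are not MailChimp's data. The sketch also replaces OpenSolver's solver with a brute-force search over small staff counts.

```python
from itertools import product

# Hypothetical demand: support agents needed in each of 8 hourly slots.
DEMAND = [2, 3, 4, 4, 3, 3, 2, 1]

# Hypothetical candidate shifts: (start slot, length in slots).
SHIFTS = [(0, 4), (2, 4), (4, 4)]


def coverage(counts):
    """Return how many agents are on duty in each slot for the given shift counts."""
    on_duty = [0] * len(DEMAND)
    for (start, length), n in zip(SHIFTS, counts):
        for slot in range(start, min(start + length, len(DEMAND))):
            on_duty[slot] += n
    return on_duty


def optimal_schedule(max_per_shift=5):
    """Try every combination of shift counts and keep the cheapest feasible one.

    A combination is feasible when every slot has at least as many agents
    on duty as DEMAND requires. "Cheapest" means fewest agents in total.
    """
    best = None
    for counts in product(range(max_per_shift + 1), repeat=len(SHIFTS)):
        if all(have >= need for have, need in zip(coverage(counts), DEMAND)):
            if best is None or sum(counts) < sum(best):
                best = counts
    return best


if __name__ == "__main__":
    plan = optimal_schedule()
    print(plan, sum(plan), coverage(plan))
```

With these made-up numbers, the search assigns 3, 1 and 2 agents to the three shifts, for six people in total. Brute force only works at toy sizes. A spreadsheet tool like OpenSolver instead passes the same kind of model to an integer programming solver, which scales to realistic rosters.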

Finding spammers among the people who come to MailChimp to send e-mail was another job for data science. One data set MailChimp receives is the lists of e-mail addresses its customers want to send to. That data is worth tapping to build a model that weeds out spammers, which solves a real business problem.

"We've been in business over 10 years," Foreman said. "We have a really great training set for determining who's a bad actor and who's not." Indeed, data science can tell a good bit about the people behind e-mail addresses.

A user-facing service Foreman highlighted is not exactly glitzy either. It simply recommends the best time for a customer to send out an e-mail. Customers don't have to send their e-mail at what MailChimp identifies as the "time for maximum engagement," but the option is available to paying customers in the latest version of MailChimp.

You won't find heat maps that tell you what you already know, or infographics packed with meaningless information. It's a no-questions-asked, one-line option that could help customers.

There is some complexity under the hood. For example, a time recommendation is good for only 24 hours so that it reflects changes in the data, as Foreman wrote in a recent MailChimp blog post. But that complexity is abstracted away, so customers can simply get the most out of every e-mail blast.

Foreman has found that a "not exotic stack" often works. For example, a good old PostgreSQL database can work just fine when the data is inherently structured. Hadoop and a NoSQL database might not be necessary, despite the continuing hype around them. The point is to avoid adopting a fancy new technology just because it gets name-dropped in a news article, and to sidestep the unnecessary risk and complexity that come with it.

"A data science team should align itself with the business and serve that business," Foreman said. "The purpose of the data science team is to lead from the back, not to make headlines."

That might be a hard concept for some executives to accept today, when simply hiring data scientists counts as an achievement. But in time the hype will subside, and Foreman's views stand to become common sense.
