Rayid Ghani, Obama’s chief data scientist, has just summoned 48 socially minded aspiring data scientists to Chicago this summer for a fellowship called Data Science for Social Good (DSSG).
Funded by Google chairman Eric Schmidt, the fellowship started last year and is now in its second year of working “to tackle problems that really matter.” This year’s project partners include the World Bank Group, the City of Memphis, and Montgomery County (MD) Public Schools.
The 48 fellows were selected from a pool of 300 applicants from around the world. Most are students: PhD candidates, master’s students and recent undergraduates. They usually come from a quantitative background such as computer science or statistics, though a few bring expertise in political science, sociology or public health.
These fellows will be working in teams of four on different projects, delivering results to the clients while learning in the process. The problems they will tackle range from “prediction and identification of collusion in international development projects” to “optimizing treatment for expectant mothers.”
Forty-eight data scientists trying to save the world? It sounds grand, but they may actually have a way to do it: building an infrastructure of big-data tools for nonprofits.
“One of the problems nonprofits face today is, even if they have the resources and the people’s capability to do this kind of work, they have to start from scratch,” Ghani said in an interview with VentureBeat.
“There aren’t many existing, easily available, free or cheap tools they can start building on top of, designed for their needs.”
The fellowship signs agreements with all of its project partners to open-source the code developed for the projects so that others can reuse it. To save clients and potential code users money on software licenses, the fellows will also try to use open-source software wherever possible.
One project this year that could help develop such a tool tackles “text analysis of government spending bills to understand pork spending.” In this project, DSSG mentor Joe Walsh, a political science PhD candidate from the University of Alabama, will assist a group of four students in “identifying earmarks using machine learning methods” to figure out where the money has been allocated. The Sunlight Foundation will provide the data sources, and the Harris School of Public Policy at the University of Chicago will provide additional technical expertise.
“There are lots of groups that have armies of people who just read lots and lots of congressional texts and try to identify earmarks that way,” said Walsh.
“We’ve spoken with some advocacy groups, and we’ve spoken with lots of researchers. And lots of people we have spoken with are really excited about this project, because it holds so much potential. No one, it doesn’t seem anyone has done this before.”
Some of the code that came out of last year’s projects has already been reused by nonprofits other than the fellowship’s clients.
One example is from Paul Meinshausen, a 2013 DSSG fellow who built a map of the City of Chicago Data Portal to simplify the design of and increase access to the city’s data.
“Within a day, a couple of Code for America fellows had reused the code to build maps of Boston and San Francisco’s data portals,” according to a post published by Meinshausen last July.
“Jason Lally, an urban planner with Place Matters, substantially improved the way the map collects data about portals and built a map for every Socrata data portal.”
On the first day of the program, the value of openness is instilled in the fellows, according to a post published by Carl Shan, a 2014 DSSG fellow. “We share what we are learning and doing to help more people learn to do what we’re doing.”