How IBM's life-saving tech tracks food poisoning to its source

IBM said it has created a system to predict the sources of contaminated foods after outbreaks of food-borne illnesses. The company believes it can help grocers, distributors, and public health officials accelerate the investigation of outbreaks of food poisoning using novel computer algorithms, visualization, and statistical techniques.

Tracing contaminated-food outbreaks to their source is a major public health issue. In the U.S., about one in six people is affected by food-borne disease each year. That results in 128,000 hospitalizations, 3,000 deaths, and $9 billion in medical costs. Add to that another $75 billion in contaminated food that has to be thrown away.

"People are getting sick, and commercial retailers need to know what is coming from where," James Kaufman, manager of Public Health Research for IBM Research at the IBM Almaden Research Center in San Jose, Calif., told VentureBeat in an interview.

After looking at as few as 10 cases, IBM can make educated guesses about the source of a food poisoning outbreak. At that stage of an outbreak, it is about 95 percent confident that the culprit food is on its list of suspect foods. That's a remarkable result, and it is the latest in a number of cases in which IBM is using its big data analysis capability to predict problems ranging from mining failures to malaria outbreaks.

IBM's computational tools can use information about the date and location of billions of supermarket food items sold each week to identify, with high probability, the set of potentially "guilty" products. The research was published today in the peer-reviewed journal PLOS Computational Biology. The predictive analytics are based on the same big data analytics techniques game developers use to predict which gamers are likely to drop out of a mobile game.

Kaufman said the work grew out of a collaboration with the paper's coauthors at Johns Hopkins University and the German Federal Institute for Risk Assessment (BfR). When a food-borne illness such as an E. coli infection is detected, identifying the culprit food is essential to limiting the spread of the illness and the economic losses. But the time required to detect the problem can stretch to days or weeks, straining the public health system. In 2011, an E. coli outbreak in Germany made more than 4,000 people sick and left 50 dead. In that case, it took two months to track down the source of the contamination, and German retailers suffered losses of more than 150 million euros, or $205 million.

Kaufman said that the petabytes of retail sales data have never before been used to accelerate the identification of contaminated food. But that data is already recorded in the computer inventory systems that retailers and distributors use. They manage up to 30,000 food items at any given time, and about 3,000 of those are likely to be perishable. So IBM built a system that automatically identifies, contextualizes, and displays data from multiple sources to reduce the time it takes to identify the prime culprits. It integrates the pre-computed retail data with the geo-coded public health data that investigators collect on victims. Using the victims' zip codes and the lab reports, the system predicts the source of the bad food. With each new case, the algorithm learns and recalculates the probability that each food is causing the illness.
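The article does not spell out IBM's model, but the idea of recalculating each food's probability as cases arrive can be illustrated with a simple Bayesian update. The sketch below is a minimal illustration, assuming that the likelihood of a case appearing in a given zip code, if a product is the culprit, equals that product's share of sales in that zip code. The names `SALES_SHARE`, `update`, `posterior`, and `suspects` are hypothetical, and the sales figures are invented for illustration only.

```python
from math import exp, log

# Hypothetical, invented data: the fraction of each product's sales
# that occur in each zip code. A real system would derive this from
# retailers' and distributors' sales records.
SALES_SHARE = {
    "lettuce": {"10001": 0.5, "94103": 0.3, "60601": 0.2},
    "sprouts": {"10001": 0.1, "94103": 0.8, "60601": 0.1},
    "cheese": {"10001": 0.34, "94103": 0.33, "60601": 0.33},
}


def update(log_scores, case_zip, sales_share, floor=1e-6):
    """Add one case's evidence: log P(case in this zip | product is the culprit).

    The floor keeps a product with no recorded sales in the zip code from
    dropping to log(0); it is penalized heavily instead.
    """
    return {
        product: score + log(max(sales_share[product].get(case_zip, 0.0), floor))
        for product, score in log_scores.items()
    }


def posterior(log_scores):
    """Normalize log scores into probabilities that sum to 1."""
    top = max(log_scores.values())
    weights = {p: exp(s - top) for p, s in log_scores.items()}
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}


def suspects(probs, k):
    """Return the k most likely culprit products, most likely first."""
    return sorted(probs, key=probs.get, reverse=True)[:k]


# Start from a uniform prior (all log scores equal), then recalculate
# after each new case, identified here only by the victim's zip code.
scores = {product: 0.0 for product in SALES_SHARE}
for zip_code in ["94103", "94103", "10001", "94103"]:
    scores = update(scores, zip_code, SALES_SHARE)

probs = posterior(scores)
print(suspects(probs, 2))  # ['sprouts', 'lettuce']
```

Working in log space keeps the scores from underflowing as cases accumulate, and returning a short ranked list rather than a single answer mirrors the article's description of a "list of suspect foods" whose accuracy improves with each reported case.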

But Kaufman was careful to say IBM can't predict everything. The system is suited to relatively large outbreaks, and the work so far represents about two years of research.

"I don't want to oversell this, as we are focused on the biggest outbreaks," Kaufman said.

IBM is working with public health organizations and retailers in the U.S. to scale the research prototype and begin processing information from 1.7 billion supermarket items that are sold each week in the country.

In the research, the scientists simulated 60,000 outbreaks of food-borne disease across 600 products using real-world food sales data from Germany. What slows investigators is that reports of illness trickle in over time, so it can take weeks for public health officials to identify the cause. IBM had about eight researchers working on the problem in San Jose, and it collaborated with a variety of other researchers.

IBM is working on other ways of detecting problems in the food chain, including genetic screening. Last year, for instance, horse meat was found in hamburgers. Screening for that kind of problem is possible, Kaufman said.
