Researchers propose platform for evaluating disease-forecasting AI methods

Since the start of the pandemic, there's been an influx of papers on epidemic forecasting. Indeed, as of February, a search for "COVID forecasting" on Google Scholar yields over 14,000 results. But while many researchers compare their approaches against traditional modeling strategies, forecasts are highly sensitive to implementation details, so meaningful comparisons require a well-defined benchmark.

Researchers at the University of Southern California propose such a benchmark, EpiBench, which focuses on retrospective forecasting -- i.e., forecasting for periods where ground truth is already available. While the platform is in the preliminary stages, the researchers believe that it could benefit real-time forecasting efforts by the U.S. Centers for Disease Control and Prevention and other agencies around the world that drive governments' responses.

A model is only one part of the forecasting process and can't, by itself, be treated as a method to compare against. It describes only the epidemic, not the data preprocessing or model architecture development that also shape a forecast. Therefore, the researchers say, without a platform like EpiBench, it isn't possible to credibly claim that a new AI forecasting approach performs better than another.

As a proof of concept, the researchers developed a prototype of EpiBench called "COVID-19 forecast-bench," which keeps a daily record of COVID-19 cases and deaths as reported by Johns Hopkins University. Researchers are required to provide details of their methodologies regarding data preprocessing, modeling techniques, and learning strategy, and these methodologies are evaluated for 1-, 2-, 3-, and 4-week-ahead forecasts.
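To make the multi-horizon evaluation concrete, here is a minimal sketch of how retrospective forecasts could be scored per horizon. It is not EpiBench's actual code: the function names, the data layout, the numbers, and the choice of mean absolute error as the metric are all assumptions made for illustration.

```python
from statistics import mean


def mean_absolute_error(forecasts, truth):
    """Average absolute gap between forecast and reported values."""
    return mean(abs(f - t) for f, t in zip(forecasts, truth))


def evaluate_by_horizon(submissions, ground_truth, horizons=(1, 2, 3, 4)):
    """Score forecasts separately for each week-ahead horizon.

    submissions:  {(forecast_week, horizon): predicted cumulative cases}
    ground_truth: {week: reported cumulative cases}
    Only forecasts whose target week already has ground truth are scored,
    which is what makes the evaluation retrospective.
    """
    scores = {}
    for h in horizons:
        pairs = [
            (pred, ground_truth[week + h])
            for (week, horizon), pred in submissions.items()
            if horizon == h and (week + h) in ground_truth
        ]
        if pairs:
            preds, actuals = zip(*pairs)
            scores[h] = mean_absolute_error(preds, actuals)
    return scores


# Synthetic numbers for illustration only (not real case counts).
ground_truth = {0: 100, 1: 120, 2: 150, 3: 190, 4: 240, 5: 300}
submissions = {
    (0, 1): 118, (0, 2): 140, (0, 3): 170, (0, 4): 200,  # forecasts made at week 0
    (1, 1): 155, (1, 2): 180, (1, 3): 250, (1, 4): 330,  # forecasts made at week 1
}

scores = evaluate_by_horizon(submissions, ground_truth)
for horizon, error in scores.items():
    print(f"{horizon}-week-ahead MAE: {error}")
```

Reporting one score per horizon, rather than a single aggregate, makes it visible when a methodology does well at short range but degrades at three or four weeks out.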

In an experiment, the researchers used EpiBench to compare three AI and machine learning forecasting methods and 30 methodologies pulled from published research. They found that while many of the forecasts reportedly used the same model (SEIR), they predicted "drastically" different outcomes. Moreover, two methodologies that were identical except for the smoothing window (14 days for one, 7 days for the other) varied "significantly" in their performance, suggesting that data preprocessing played a nontrivial role.
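The effect of the smoothing window can be illustrated with a toy example. The sketch below assumes a trailing moving average and a deliberately naive linear-extrapolation forecaster. Both are stand-ins chosen for brevity, not the methods evaluated in the paper, and the case counts are synthetic.

```python
def trailing_average(series, window):
    """Trailing moving average; early points average over the days seen so far."""
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def naive_forecast(smoothed, days_ahead=7):
    """Hypothetical toy forecaster: extend the latest daily change linearly."""
    slope = smoothed[-1] - smoothed[-2]
    return smoothed[-1] + slope * days_ahead


# Synthetic daily new-case counts with a weekend reporting dip (not real data).
daily_cases = [100 + 5 * d - (30 if d % 7 in (5, 6) else 0) for d in range(28)]

# Same data, same forecaster; only the smoothing window differs.
forecast_7 = naive_forecast(trailing_average(daily_cases, 7))
forecast_14 = naive_forecast(trailing_average(daily_cases, 14))

print(f"7-day smoothing:  {forecast_7:.1f}")
print(f"14-day smoothing: {forecast_14:.1f}")
```

Even in this simplified setup, the longer window lags further behind the rising trend and yields a lower one-week-ahead forecast, so a preprocessing choice alone shifts the prediction.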

The researchers believe that EpiBench will help the research community make decisions by analyzing methodologies according to how they arrive at their forecasts. Expanding from the prototype, the researchers hope to launch a website, a GitHub repository, and a Slack channel for discussion. If all goes according to plan, developers will be able to view evaluations for submitted models and upload sets of forecasts for benchmarking, as well as upload code to reproduce results and run that code on more datasets.

"EpiBench can help us identify which decisions (data preprocessing, modeling choice, learning strategy, hyper-parameter tuning, etc.) in the forecasting approach are critical in epidemic forecasting and help direct the research accordingly," the researchers wrote in a paper describing EpiBench. "In the future, we wish to expand EpiBench to various epidemic forecasting tasks. The platform will be available to the AI and machine learning researchers and epidemiologists."