Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) say they’ve developed a recommendation algorithm that predicts the probability a patient’s urinary tract infection (UTI) can be treated by first- or second-line antibiotics. With this information, the model makes a recommendation for a specific treatment that selects a first-line agent as frequently as possible, without leading to an excess of treatment failures.

UTIs, which affect half of all women, add almost $4 billion a year in health care costs. Doctors often treat UTIs using antibiotics called fluoroquinolones, but these drugs have been found to put women at risk of contracting other infections. They’re also associated with a higher risk of tendon injuries and life-threatening conditions like aortic tears, leading medical associations to issue guidelines recommending fluoroquinolones as “second-line treatments.” (A second-line treatment is a treatment for a disease employed after the initial treatment has failed, stopped working, or caused intolerable side effects.) Despite this, doctors with limited time and resources continue to prescribe fluoroquinolones at high rates.

The CSAIL team claims that their model, which was trained on data from more than 10,000 patients from Brigham and Women’s Hospital and Massachusetts General Hospital, would allow clinicians to reduce the use of second-line antibiotics by 67%. In cases where clinicians chose a second-line drug but the algorithm chose a first-line drug, the first-line drug ended up working more than 90% of the time. When clinicians chose an inappropriate first-line drug, the algorithm chose an appropriate first-line drug almost half of the time.

The system adopts a thresholding approach the team hopes will be intuitive for clinicians to apply to a range of drugs. Toward this goal, the model is structured to be directly embedded into electronic health records (EHRs). A doctor might set the acceptable treatment-failure threshold at a relatively high number like 10%, reflecting the fact that UTI treatments are unlikely to lead to life-threatening side effects. In contrast, certain bloodstream infections carry a much higher risk of death, so in those cases a doctor could set the threshold far lower (e.g., 1%).
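
To make the thresholding idea concrete, here is a minimal Python sketch of how such a rule might work. It is not the CSAIL team's implementation. The `recommend` function, the drug groupings, and the failure probabilities are all hypothetical and chosen only for illustration. The sketch assumes a model has already produced a predicted probability of treatment failure for each candidate drug.

```python
from typing import Dict, Optional

# Illustrative groupings only (assumption): two common first-line UTI agents
# and two fluoroquinolones standing in as second-line agents.
FIRST_LINE = ("nitrofurantoin", "trimethoprim-sulfamethoxazole")
SECOND_LINE = ("ciprofloxacin", "levofloxacin")


def recommend(failure_prob: Dict[str, float], threshold: float) -> Optional[str]:
    """Suggest a drug, preferring first-line agents.

    failure_prob maps drug name -> predicted probability of treatment failure
    (hypothetical model output). threshold is the highest failure probability
    the clinician is willing to accept.
    """
    # Try first-line agents before falling back to second-line agents.
    for group in (FIRST_LINE, SECOND_LINE):
        acceptable = [
            drug for drug in group
            if drug in failure_prob and failure_prob[drug] <= threshold
        ]
        if acceptable:
            # Within a group, pick the drug with the lowest predicted failure.
            return min(acceptable, key=lambda drug: failure_prob[drug])
    # No drug meets the threshold; leave the decision to the clinician.
    return None


# Made-up probabilities for two hypothetical patients.
patient_a = {
    "nitrofurantoin": 0.08,
    "trimethoprim-sulfamethoxazole": 0.22,
    "ciprofloxacin": 0.05,
    "levofloxacin": 0.06,
}
patient_b = {
    "nitrofurantoin": 0.25,
    "trimethoprim-sulfamethoxazole": 0.30,
    "ciprofloxacin": 0.04,
    "levofloxacin": 0.07,
}

choice_a = recommend(patient_a, threshold=0.10)  # a first-line agent qualifies
choice_b = recommend(patient_b, threshold=0.10)  # falls back to second-line
choice_strict = recommend(patient_b, threshold=0.01)  # nothing qualifies
```

With a 10% threshold, the first hypothetical patient gets a first-line agent even though a second-line drug has a slightly lower predicted failure rate, which is the trade-off the thresholding approach is meant to encode. Tightening the threshold to 1% leaves no acceptable option in this toy data, so the sketch returns nothing rather than forcing a choice.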

The team admits they haven’t tested their algorithm on more complicated forms of UTIs and that it hasn’t been assessed with a randomized controlled trial. More broadly, studies show that much of the data used to train algorithms for diagnosing diseases may perpetuate inequalities. A team of U.K. scientists found that almost all eye disease datasets come from patients in North America, Europe, and China, meaning eye disease-diagnosing algorithms are less certain to work well for racial groups from underrepresented countries. In another study, Stanford University researchers claimed that most of the U.S. data for studies involving medical uses of AI come from California, New York, and Massachusetts.

Moving forward, the MIT team says their efforts will focus on trials comparing usual practices to algorithm-supported medical decisions. They also plan to increase the diversity of their patient sample to improve recommendations across race, ethnicity, socioeconomic status, and more complex health backgrounds. “What’s exciting about this research is that it presents a blueprint for the right way to do retrospective evaluation,” research coauthor and MIT professor David Sontag said. “We do this by showing that one can do an apples-to-apples comparison within the existing clinical practice. When we say we can reduce second-line antibiotic use and inappropriate treatment by certain percentages, we have confidence in those numbers relative to clinicians.”

