Google's AI system can grade prostate cancer cells with 70% accuracy

Approximately one in nine men in the U.S. will develop prostate cancer in their lifetime, according to the National Cancer Institute, and more than 2.9 million patients diagnosed with it at some point are still alive today. And from a treatment perspective, it tends to be problematic -- prostate cancer is frequently non-aggressive, making it difficult to determine which, if any, procedures might be necessary.

Encouragingly, Google has made headway in diagnosing it with the help of artificial intelligence (AI). In a paper ("Development and Validation of a Deep Learning Algorithm for Improving Gleason Scoring of Prostate Cancer") and an accompanying blog post, Google AI researchers describe a system that uses the Gleason score -- a grading system that classifies cancer cells based on how closely they resemble normal prostate glands -- to detect problematic masses in samples.

The goal, according to technical lead Martin Stumpe and Google AI Healthcare product manager Craig Mermel, was to develop AI that could perform Gleason grading objectively -- and precisely. Studies show that human pathologists disagree on grades in as many as 53 percent of cases.

"We developed a deep learning system (DLS) that mirrors a pathologist's workflow by first categorizing each region in a slide into a Gleason pattern, with lower patterns corresponding to tumors that more closely resemble normal prostate glands," they wrote. "The higher the grade group, the greater the risk of further cancer progression and the more likely the patient is to benefit from treatment."
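The step from Gleason patterns to a grade group can be illustrated with the standard grade-group convention used in pathology. The following is a minimal sketch, not Google's implementation: it assumes the primary (most common) and secondary (next most common) patterns on a slide are already known, and the function name `grade_group` is hypothetical.

```python
def grade_group(primary: int, secondary: int) -> int:
    """Map primary and secondary Gleason patterns (3 to 5) to a grade group (1 to 5).

    Illustrative only: follows the standard grade-group convention, where the
    Gleason score is the sum of the two patterns and a score of 7 is split by
    which pattern is dominant.
    """
    for pattern in (primary, secondary):
        if pattern not in (3, 4, 5):
            raise ValueError(f"Gleason pattern must be 3, 4, or 5, got {pattern}")

    score = primary + secondary
    if score == 6:
        return 1                          # 3+3
    if score == 7:
        return 2 if primary == 3 else 3   # 3+4 vs. 4+3
    if score == 8:
        return 4                          # 4+4, 3+5, 5+3
    return 5                              # scores of 9 and 10


if __name__ == "__main__":
    for patterns in [(3, 3), (3, 4), (4, 3), (4, 4), (5, 4)]:
        print(patterns, "-> grade group", grade_group(*patterns))
```

Note that 3+4 and 4+3 share a Gleason score of 7 but fall into different grade groups, which is one reason assigning patterns consistently matters.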

The researchers developed the AI model by first collecting anonymized images of prostatectomy samples, which they noted contain a greater amount and diversity of prostate cancer than needle core biopsies. A group of 32 general pathologists provided annotations of Gleason patterns (resulting in over 112 million annotated image patches) and an overall Gleason grade group for each image. To mitigate variability, each slide was graded independently by 3 to 5 pathologists from a cohort of 29, in addition to a genitourinary-specialist pathologist.

The results were promising. In testing, the AI model achieved an overall accuracy of 70 percent, beating the 61 percent achieved by the U.S. board-certified pathologists who participated in the study. Moreover, it performed better than eight of the ten "high-performing" individual pathologists who graded the validation set's slides, and better identified patients at higher risk for disease recurrence after surgery. Finally, it was able to characterize tissues that straddled the line between two Gleason patterns -- e.g., Gleason pattern 3.3 or 3.7, between 3 and 4.
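For readers unfamiliar with the metric, overall accuracy here can be read as the share of slides whose assigned grade group matches a reference grade. The sketch below shows that calculation under that assumption; the `accuracy` helper and the toy labels are hypothetical and are not data from the study.

```python
from typing import Sequence


def accuracy(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """Return the fraction of slides whose predicted grade group equals the reference."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one slide is required")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Hypothetical toy labels for eight slides (grade groups 1 to 5), for illustration only.
reference_grades = [1, 2, 2, 3, 4, 5, 1, 3]
model_grades = [1, 2, 3, 3, 4, 5, 1, 2]

if __name__ == "__main__":
    print(f"accuracy: {accuracy(model_grades, reference_grades):.2f}")
```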

Future work will investigate how the system might be integrated into pathologists' diagnostic workflows; how it might be adapted to work on diagnostic needle core biopsies; and its overall impact on "efficiency, accuracy, and prognostic ability." The researchers caution, though, that its accuracy could be improved with additional training data.

"There is much more work to be done before systems like our DLS can be used to improve the care of prostate cancer patients," Stumpe and Mermel wrote. "Nonetheless, we are excited about the potential of technologies like this to significantly improve cancer diagnostics and patient care."

Google and DeepMind, its AI research subsidiary, are involved in several health-related AI projects, including an ongoing trial at the U.S. Department of Veterans Affairs that seeks to predict when patients’ conditions will deteriorate during a hospital stay. Previously, DeepMind partnered with the U.K.’s National Health Service to develop an algorithm that could search for early signs of blindness, and to improve breast cancer detection by applying machine learning to mammography.