Google claims its AI system can grade prostate cancer samples with 72% accuracy

In a study published today in the journal JAMA Oncology, Google researchers claim to have developed an AI system that accurately identifies signs of prostate cancer in biopsies. Building on an algorithm that grades large, surgically removed cancerous segments of prostates, they say their system -- which was developed with support from the Naval Medical Center in San Diego and Verily, Alphabet's life sciences division -- works on the smaller samples extracted early in cancer care to establish a diagnosis and prognosis.

Prostate cancer biopsies are commonly taken to better evaluate tumors' aggressiveness. The Gleason score, a grading system that classifies cancer cells based on how closely they resemble normal prostate gland tissue, is used to detect problematic masses. But determining which of three Gleason patterns a tumor falls into and assigning a grade based on the relative amounts of each pattern in the whole sample is a challenging task -- one that relies on subjective visual inspection and experience. By some estimates, pathologists disagree on the right grade for a tumor 50% of the time.

The researchers' system first "grades" each region of a biopsy and then summarizes the region-level classifications into an overall biopsy-level score, contending with the smaller amount of tissue and the changes to the sample introduced by the tissue extraction and preparation processes. In experiments with six pathologists specializing in prostate cancer with an average of 25 years of experience, the team sought to evaluate the system's accuracy on 498 deidentified tumor samples.
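
To make the two-stage idea concrete, here is a minimal sketch of how region-level labels might be rolled up into a biopsy-level score. It is not Google's method, which the article does not detail: the function names `biopsy_gleason` and `grade_group` are hypothetical, the rule of taking the most and second-most common patterns is a simplification of clinical practice, and the final step uses the standard mapping from Gleason score to grade groups 1 through 5.

```python
from collections import Counter

def biopsy_gleason(region_patterns):
    """Summarize per-region labels into a (primary, secondary) Gleason score.

    region_patterns: list of per-region labels, each 3, 4, or 5,
    or None for a benign region. Hypothetical input format.
    """
    counts = Counter(p for p in region_patterns if p is not None)
    if not counts:
        return None  # no tumor regions in this biopsy
    # Most common pattern first; ties go to the higher (worse) pattern.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], -kv[0]))
    primary = ranked[0][0]
    # With only one pattern present, it counts as both primary and secondary.
    secondary = ranked[1][0] if len(ranked) > 1 else primary
    return primary, secondary

def grade_group(primary, secondary):
    """Map a Gleason score to its grade group (1 = least aggressive)."""
    total = primary + secondary
    if total <= 6:
        return 1
    if total == 7:
        return 2 if primary == 3 else 3  # 3+4 is group 2, 4+3 is group 3
    if total == 8:
        return 4
    return 5

# Illustrative labels only, not data from the study.
score = biopsy_gleason([3, 3, 4, None, 3])
print(score, grade_group(*score))
```

In this toy example the biopsy is mostly pattern 3 with some pattern 4, so the sketch reports a score of 3+4 and grade group 2.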

According to the study results, the Google-developed system achieved 72% accuracy -- higher than the 58% achieved by a baseline cohort of general pathologists without prostate cancer training. Taking the ambiguous appearances of some prostate cancers into account, the system's agreement rate with experts was comparable to the agreement rate between the experts themselves, according to the Google researchers.

In addition to Gleason grading, the Google researchers compared the general pathologists' performance with the system's at differentiating specimens with and without cancer. Across a total of 752 samples, the pathologists and system were in agreement in 94.3% to 94.7% of cases; while the system caught more cancers, it also flagged more false positives.
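
The trade-off described here, more cancers caught at the cost of more false positives, can be illustrated with a small sketch. The function `compare_calls` and the five made-up specimens below are hypothetical and have no connection to the study's data; they only show how an agreement rate and the two error counts would be tallied.

```python
def compare_calls(system, pathologist, truth):
    """Compare two sets of cancer/no-cancer calls against reference labels.

    Each argument is a list of booleans, True meaning "cancer present".
    """
    if not (len(system) == len(pathologist) == len(truth)):
        raise ValueError("all three lists must have the same length")
    if not truth:
        raise ValueError("at least one specimen is required")

    # Fraction of specimens where the two readers made the same call.
    agreement = sum(s == p for s, p in zip(system, pathologist)) / len(truth)

    def caught(calls):
        # Cancers correctly flagged (true positives).
        return sum(1 for c, t in zip(calls, truth) if c and t)

    def false_positives(calls):
        # Benign specimens incorrectly flagged as cancer.
        return sum(1 for c, t in zip(calls, truth) if c and not t)

    return {
        "agreement": agreement,
        "system_caught": caught(system),
        "pathologist_caught": caught(pathologist),
        "system_false_positives": false_positives(system),
        "pathologist_false_positives": false_positives(pathologist),
    }

# Made-up labels for five specimens, for illustration only.
truth = [True, True, True, False, False]
system = [True, True, True, True, False]
pathologist = [True, True, False, False, False]
result = compare_calls(system, pathologist, truth)
print(result)
```

With these toy labels the system catches one cancer the pathologist misses but also flags one benign specimen, which mirrors the shape of the result reported above without reproducing its numbers.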

"These promising results indicate that the deep learning system has the potential to support expert-level diagnoses and expand access to high-quality cancer care. To evaluate if it could improve the accuracy and consistency of prostate cancer diagnoses, this technology needs to be validated as an assistive tool in further clinical studies and on larger and more diverse patient groups. However, we believe that AI-based tools could help pathologists in their work, particularly in situations where specialist expertise is limited," Google Health software engineer Kunal Nagpal and scientist Craig Mermel wrote in a blog post. "We look forward to future research and investigation into how our technology can be best validated, designed and used to improve patient care and cancer outcomes."
