Google AI claims 99% accuracy in metastatic breast cancer detection

Metastatic tumors -- cancerous cells that break away from their tissue of origin, travel through the body via the circulatory or lymph systems, and form new tumors in other parts of the body -- are notoriously difficult to detect. A 2009 study of 102 breast cancer patients at two Boston health centers found that one in four were affected by "process of care" failures such as inadequate physical examinations and incomplete diagnostic tests.

That's one reason why an estimated 90 percent of the half a million deaths worldwide caused by breast cancer are the result of metastasis. But researchers at the Naval Medical Center San Diego and Google AI, a division within Google dedicated to artificial intelligence (AI) research, have developed a promising solution employing cancer-detecting algorithms that autonomously evaluate lymph node biopsies.

Their AI system -- dubbed Lymph Node Assistant, or LYNA -- is described in a paper titled "Artificial Intelligence-Based Breast Cancer Nodal Metastasis Detection," published in The American Journal of Surgical Pathology. In tests, it achieved an area under the receiver operating characteristic curve (AUC) -- a measure of detection accuracy -- of 99 percent. That's superior to human pathologists, who according to one recent assessment miss small metastases on individual slides as often as 62 percent of the time when under time constraints.
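One way to read an AUC figure: it is the probability that a randomly chosen tumor slide receives a higher score than a randomly chosen normal slide. The Python sketch below computes it that way by brute force. The function name `auc_from_scores` and the scores are invented for illustration; this is not the evaluation code from the paper.

```python
def auc_from_scores(labels, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical slide-level scores (1 = tumor, 0 = normal); not data from the paper.
slide_labels = [1, 1, 1, 0, 0, 0, 0]
slide_scores = [0.95, 0.80, 0.40, 0.60, 0.30, 0.20, 0.10]
print(auc_from_scores(slide_labels, slide_scores))
```

In this toy example one tumor slide (score 0.40) ranks below one normal slide (score 0.60), so 11 of the 12 possible pairs are ordered correctly. An AUC of 99 percent means almost every such pair is ordered correctly.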

"Artificial intelligence algorithms can exhaustively evaluate every tissue patch on a slide," the authors of the paper wrote. "We provide a framework to aid practicing pathologists in assessing such algorithms for adoption into their workflow (akin to how a pathologist assesses immunohistochemistry results)."

LYNA is based on Inception-v3, an open source image recognition deep learning model that's been shown to achieve greater than 78.1 percent accuracy on Stanford's ImageNet dataset. As the researchers explained, it takes as input a 299-by-299-pixel image (Inception-v3's default input size), outlines tumors at the pixel level, and, in the course of training, extracts labels -- i.e., predictions -- of the tissue patch ("benign" or "tumor") and adjusts the model's algorithmic weights to reduce error.
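Because a whole-slide image is far larger than a single model input, a patch classifier has to be applied to many small tiles. The Python sketch below enumerates the top-left corners of such tiles. The function name `patch_origins`, the non-overlapping stride, and the image dimensions are assumptions made for illustration; the article does not describe the tiling scheme the researchers actually used.

```python
PATCH_SIZE = 299  # Inception-v3's default input size, per the article


def patch_origins(width, height, patch=PATCH_SIZE, stride=PATCH_SIZE):
    """Yield (x, y) top-left corners of every full patch that fits in the image."""
    if patch <= 0 or stride <= 0:
        raise ValueError("patch and stride must be positive")
    for y in range(0, height - patch + 1, stride):
        for x in range(0, width - patch + 1, stride):
            yield (x, y)


# A hypothetical 1000 x 700 pixel region; real slide scans are much larger.
origins = list(patch_origins(1000, 700))
print(len(origins), origins[:3])
```

A smaller stride would produce overlapping patches and a denser map of predictions, at the cost of more model evaluations per slide.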

The team improved on previously published algorithms by exposing LYNA to a 4:1 ratio of normal to tumor patches, and by increasing the "computational efficiency" of the training process, which in turn led to the algorithm "see[ing]" a greater diversity of tissues. Additionally, they normalized variations in the biopsy slide scans, which they say boosted the model's performance to an even greater degree.
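A fixed 4:1 ratio of normal to tumor patches can be enforced by drawing each training batch from two separate pools. The Python sketch below shows one way to do that. The function name `sample_batch`, the batch size, the pool contents, and sampling with replacement are all assumptions for illustration, not the authors' training pipeline.

```python
import random


def sample_batch(normal_pool, tumor_pool, batch_size=10, normal_per_tumor=4, rng=None):
    """Draw a shuffled batch with `normal_per_tumor` normal patches per tumor patch."""
    if rng is None:
        rng = random.Random()
    group = normal_per_tumor + 1
    if batch_size % group != 0:
        raise ValueError("batch_size must be a multiple of normal_per_tumor + 1")
    n_tumor = batch_size // group
    n_normal = batch_size - n_tumor
    # Sample with replacement so a small tumor pool can still fill its share.
    batch = [(rng.choice(normal_pool), "benign") for _ in range(n_normal)]
    batch += [(rng.choice(tumor_pool), "tumor") for _ in range(n_tumor)]
    rng.shuffle(batch)
    return batch


# Hypothetical patch identifiers; tumor patches are far rarer than normal ones.
normal_ids = [f"normal_{i}" for i in range(100)]
tumor_ids = [f"tumor_{i}" for i in range(5)]
batch = sample_batch(normal_ids, tumor_ids, batch_size=10, rng=random.Random(0))
batch_labels = [label for _, label in batch]
print(batch_labels.count("benign"), batch_labels.count("tumor"))  # prints: 8 2
```

Sampling by class rather than uniformly over all patches keeps the ratio stable even though tumor tissue makes up only a small share of most slides.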

The researchers applied LYNA to the Cancer Metastases in Lymph Nodes 2016 challenge dataset (Camelyon16) -- a collection of 399 whole-slide images of lymph node sections from Radboud University Medical Center (Nijmegen, the Netherlands) and University Medical Center Utrecht (Utrecht, the Netherlands) -- as well as a separate set of 108 images from 20 patients. LYNA trained on 270 of the Camelyon16 slides (160 normal, 110 tumorous), and two evaluation sets -- the remaining 129 Camelyon16 slides and the separate set of 108 slides -- were used to evaluate its performance.

In tests, LYNA achieved 99.3 percent slide-level accuracy. When the model's sensitivity threshold was adjusted to detect all tumors on every slide, it exhibited 69 percent sensitivity, accurately identifying all 40 metastases in the evaluation dataset without any false positives. Moreover, it was unaffected by artifacts in the slides such as air bubbles, poor processing, hemorrhage, and overstaining.

LYNA wasn't perfect -- it occasionally misidentified giant cells, germinal centers, and bone marrow-derived white blood cells known as histiocytes -- but managed to perform better than a practicing pathologist tasked with evaluating the same slides. And in a second paper published by Google AI and Verily, Google parent company Alphabet's life sciences subsidiary, the model halved the amount of time it took for a six-person team of board-certified pathologists to detect metastases in lymph nodes.

Future work will investigate whether the algorithm improves efficiency or diagnostic accuracy.

"[LYNA] achieves higher tumor-level sensitivity than, and comparable slide-level performance to, pathologists," the researchers wrote. "These techniques may improve the pathologist's productivity and reduce the number of false negatives associated with morphologic detection of tumor cells."

Google has invested broadly in AI health care applications. This spring, the Mountain View company’s Medical Brain team claimed to have created an AI system that could predict the likelihood of hospital readmission, and said that in June it had used the system to forecast mortality rates at two hospitals with 90 percent accuracy. And in February, scientists from Google and Verily created a machine learning network that could accurately deduce basic information about a person, including their age and blood pressure, and whether they were at risk of suffering a major cardiac event like a heart attack.

DeepMind, Google's London-based AI research division, is involved in several health-related AI projects, including an ongoing trial at the U.S. Department of Veterans Affairs that seeks to predict when patients’ conditions will deteriorate during a hospital stay. Previously, it partnered with the U.K.’s National Health Service to develop an algorithm that could search for early signs of blindness. And in a paper presented at the Medical Image Computing & Computer Assisted Intervention conference earlier this year, DeepMind researchers said they’d developed an AI system capable of segmenting CT scans with “near-human performance.”