How not to write about cancer diagnostics

Last week, a Johns Hopkins research team reported positive results from an experimental diagnostic for prostate cancer, leading to a fair bit of excitement and media attention. The first story I noticed was an effusive report in the Washington Post, although the test results — published in the journal Urology (no link available, although the abstract is here) — also led to stories in the LA Times, the CBS Web site and USA Today.

The gist was that an experimental test for a protein called EPCA-2 appears to find prostate tumors missed by current diagnostics, while also avoiding “false positives” that can lead to unnecessary biopsies. On its face, that certainly sounds like great news, particularly since the most widely used test today — which looks for a protein called prostate-specific antigen (PSA) — is indeed wildly inaccurate, although many doctors still consider it better than nothing.

So far, so good. There’s just one problem: The trial described in Urology was too small, and not designed in the right way, for anyone to conclude that EPCA-2 works any better than PSA — or better than any of what are likely dozens of other protein “biomarkers” that various research teams are studying as possible diagnostics for prostate cancer.

To understand why, it helps to recall two important things about clinical trials. The first is that the more subjects you enroll in a trial, the more weight the results will carry. (This is only true up to a point, but it’s a good approximation for our purposes.) If you’re only looking at a diagnostic’s performance in a limited number of patients, the odds are much higher that whatever you see is the product of sheer chance or some other bias.
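
To put some rough numbers on that point, here’s a quick Python sketch; the 90 percent detection rate and the sample sizes in it are purely hypothetical, not the study’s figures. With 30 patients, an observed 90 percent hit rate is statistically consistent with a true rate anywhere from the mid-70s to the mid-90s, while with 3,000 patients the plausible range narrows to roughly a percentage point on either side.

```python
# A rough sketch of why sample size matters: the 95% confidence interval
# around an observed detection rate shrinks as the number of patients grows.
# The 90% detection rate and the sample sizes are hypothetical, not the study's.
from math import sqrt

def wilson_ci(successes, n, z=1.96):
    """95% Wilson score interval for an observed proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half

for n in (30, 300, 3000):
    hits = round(0.9 * n)   # suppose the test flags 90% of the known cancers
    lo, hi = wilson_ci(hits, n)
    print(f"n={n:5d}: observed 90%, true rate plausibly {lo:.0%} to {hi:.0%}")
```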

The second point is that the only reliable way to know that a diagnostic like EPCA-2 works is to test it in a prospective, or “forward looking,” trial. In this case, a prospective trial would most likely involve testing a large group of healthy men regularly with both the PSA and EPCA-2 tests and watching to see which ones developed prostate cancer and which test better predicted it. (Again, I’ve simplified things somewhat for clarity.) By contrast, a retrospective, or “backward looking,” trial involves taking patients whose cancer status is already known and seeing whether the experimental test correctly identifies them. It’s not a bad way to get a rough-and-ready sense of whether an experimental diagnostic is worth pursuing further, but it’s nowhere near as statistically reliable as a prospective trial.

One more piece of background: To date, most studies of biomarkers that are supposed to predict cancer or heart disease have involved small groups of patients in retrospective studies. The track record for such studies is not great: One recent large and rigorous effort to “validate” more than 80 biomarkers thought to predict heart attacks and related problems found that only one of them even came close to working as advertised.

You can probably tell where I’m headed. The Urology study, of course, was retrospective, and its key findings involved tests in relatively small groups of men — 30 in one case, 40 in another, 18 in a third — whose cancer status was already known. (The person performing the diagnostic was “blinded” to the patient status, which eliminates one possible source of bias, but doesn’t affect the larger statistical problem.) Which puts the latest EPCA-2 results pretty much squarely in the realm of promising but unproven.

In fact, the Urology report itself notes that the test values reported “are not necessarily reflective of a screening population,” and Robert Getzenberg, the Johns Hopkins University researcher who led the study, confirmed to me via e-mail that “it is clear that further validation is required.” (Getzenberg went on to tell me that he thought the EPCA-2 test would produce “equal if not better” results in a broader population. He holds a patent on the test and has consulted for and received a grant from Onconome, a Seattle biotech developing EPCA-2 as a commercial product.)
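
That “screening population” caveat is worth unpacking, because part of it is simple arithmetic: in a study group where roughly half the men already have cancer, even a middling test will look sharp. In the general population, where only a small fraction of the men screened actually have the disease, most positives from the very same test would be false alarms. Here’s a minimal sketch of the effect; the sensitivity, specificity and prevalence figures below are assumptions for illustration, not numbers from the paper.

```python
# A minimal sketch of the "screening population" caveat: identical sensitivity
# and specificity produce very different positive predictive values depending
# on how common cancer is in the group being tested. All numbers below are
# assumptions for illustration, not figures from the Urology paper.
def positive_predictive_value(sensitivity, specificity, prevalence):
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

sens, spec = 0.94, 0.92                  # assumed test performance
for label, prevalence in [("case-control mix, half with cancer", 0.50),
                          ("screening population, ~3% with cancer", 0.03)]:
    ppv = positive_predictive_value(sens, spec, prevalence)
    print(f"{label}: {ppv:.0%} of positive results are true positives")
```

In other words, an accuracy figure measured in a group assembled from men whose status is already known says very little about how many unnecessary biopsies the test would actually prevent once it is turned loose on a screening population.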

To its credit, the USA Today story put those caveats squarely before the reader. Here’s the lead paragraph from reporter Liz Szabo:

Researchers at the Johns Hopkins University School of Medicine are trying to develop a more reliable way to find prostate cancer. While experts say the new test is promising, they say it’s too soon to know whether it really works better than older screening methods.

Contrast that with the following from the LAT’s Susan Brink:

A new prostate test that relies on measuring levels of a blood protein called EPCA-2 accurately found cancer 94% of the time, a significant improvement over the current PSA test, according to a study released Wednesday.

Or, worse, the WaPo’s David Brown, whose lead reports the researchers’ belief in the test as if it were fact:

An experimental blood test for prostate cancer may help eliminate tens of thousands of unnecessary biopsies at the same time that it detects many tumors that are now missed by the test commonly used, its developers said yesterday.

Somewhat more incredibly, Brown managed not to mention either of the study’s key limitations anywhere in the text of his story, although he did note uncritically that the EPCA-2 test could be commercially available by 2008. The LAT story hints at those limitations by noting that EPCA-2 must still undergo further testing, but only barely.

I don’t mean to pick on the WaPo and the LAT or their reporters, who are generally much better than this. Mostly, I think, the episode points up some of the difficulty today’s medical reporters have interpreting biomarker-test findings, which can look a whole lot more clear-cut than they actually are. Absent an understanding of the statistics and trial-design issues here, the simple fact that a new test appears to outperform an older one can look a lot like important news, even when it’s really not.

I also don’t have any particular beef with the way the Johns Hopkins folks designed and conducted this study. Large prospective trials are great, but they’re also expensive and time-consuming, so it makes perfect sense to take a rough cut at the question you’re hoping to answer with a faster but more limited trial. It’s just that you can’t read too much into the results when you do that.

I can’t help but wonder, though, why this story got picked up by so many major newspapers, as Urology isn’t exactly in the top tier of medical journals that reporters keep an eye on. (Check out this recent table of contents and you’ll understand why.) Although I don’t know for certain, I suspect that the Johns Hopkins PR office gave the story a big push, and that the university’s institutional authority lent it credibility so far as reporters were concerned. Check out the university’s press release on the subject and marvel at the similarities between its opening paragraph –

EPCA-2 testing curtails unnecessary biopsies and can differentiate disease that has spread outside the prostate from cancer within the prostate, Hopkins team says.

– and the lead of Brown’s WaPo story.

Onconome, of course, might have pushed for press coverage as well — and of course it put out its own press release on the findings — but as a tiny biotech no one’s ever heard of, it would have faced an uphill battle getting attention at a major newspaper.

Just to be clear, I’d be more than happy to see EPCA-2 eventually proven as a new and more reliable cancer diagnostic. (Just about anyone with a prostate — or who cares about someone who has one — probably feels the same.) It just makes no sense for our hopes to run way ahead of the science, or for the media to help them do so.