deCODEme and its questionable disease-risk predictions

(UPDATED: Original final paragraphs on 23andMe broken out as a separate post here.)

A few days ago, I noted that deCODEme, the personal-genomics spinoff of Iceland’s deCODE Genetics, looks to be offering disease-risk predictions based on surprisingly thin evidence. I looked into it a little more deeply, and while I’m not a geneticist or even a close approximation thereof, I’m still a little taken aback by how little evidence deCODEme currently seems to be relying on for many of these conditions.

To recap for a second, deCODEme — like the much better-publicized 23andMe (more on them in a moment) — offers a service for an “introductory” price of $985 that scans customer genomes in a million or so specific locations to yield a rough sense of their genetic inheritance and its potential influence on their health and physical characteristics. Using gene-chip technology, the company looks specifically for individual DNA “letters,” or nucleotides, that are known to vary between individuals. These single-letter variations, technically called single-nucleotide polymorphisms, or SNPs, essentially mark genes or other stretches of DNA whose altered function can contribute to (or protect against) disease or determine physical characteristics such as eye color.

deCODEme provides its customers with an analysis of SNPs that have been linked to 18 diseases, calculating a risk summary that compares an individual’s odds of getting sick to those for the population — well, a population — at large. The trouble, as we noted earlier, is that in many cases deCODEme bases this risk assessment on just one or two SNPs, when most diseases are thought to be influenced by tens or hundreds of different genes. That means the disease risks deCODEme calculates are very likely to be wildly inaccurate — potentially a serious state of affairs for the folks paying roughly $1,000 for this very analysis, even if deCODEme is careful to caution its users not to rely on the data as medical information. (Exactly what other use it might be isn’t entirely clear to me.)

Since I didn’t originally go through every one of the 18 diseases deCODEme analyzes, I decided to take a closer look at the scientific foundation for the company’s risk assessments. It turns out that for fully half of those conditions, including colon cancer and heart attack, deCODEme is relying on just one or two SNPs to calculate disease risk. Risk for three conditions (Alzheimer’s disease, asthma and obesity) is based on a single SNP. (I’ve put together a chart listing the number of SNPs used to assess risk in all these conditions below the fold.)

In several instances, the very scientific publications that deCODEme uses to justify the use of one SNP also provide evidence for others that deCODEme, for some reason, has so far chosen to overlook. In Alzheimer’s disease, for instance, deCODEme cites this publication in support of its choice of a SNP called rs4420638, which appears to affect the gene that produces apolipoprotein E, or ApoE, a protein linked to Alzheimer’s susceptibility. The same study, however, lists four additional SNPs, all meeting criteria of statistical significance.

In heart attack, deCODEme relies upon this New England Journal of Medicine study to implicate a SNP known as rs599839. The company, however, overlooks thirteen other SNPs linked to heart disease in the same study, including one called rs1333049 that carried “the strongest association with coronary artery disease” in two separate studies involving almost 7,400 patients. Of course, deCODEme doesn’t seem to explain why anywhere on its Web site.

Similarly, the study deCODEme cites to support its use of one of two SNPs in colon cancer notes explicitly that “[m]uch of the variation in inherited risk of colorectal cancer (CRC) is probably due to combinations of common low risk variants” — which, translated into English, essentially means that the genetic risk of colon cancer is most likely spread across a large number of common genetic variants, each of which increases risk of the disease by a small amount. Yet deCODEme uses only two SNPs to assess its customers’ risk of colon cancer, and outside of some boilerplate language, mostly leaves it to individuals themselves to interpret what the service is telling them. (The company does make “experts” available to answer questions, although unsurprisingly that feature isn’t available to demo users.)
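
To make the worry concrete, here is a rough sketch in Python of how a risk estimate built from two SNPs could differ from one that also counts many small-effect variants. It assumes a simple multiplicative model in which each variant contributes its own relative risk and 1.0 means the population average. The model, the function name and every number below are hypothetical illustrations, not deCODEme's actual method or data.

```python
from math import prod


def combined_relative_risk(per_snp_risks):
    """Multiply per-SNP relative risks; 1.0 means population-average risk.

    Assumption: variants act independently and multiplicatively. This is a
    simplification chosen for illustration only.
    """
    return prod(per_snp_risks)


# Hypothetical customer: the two SNPs on the report each raise risk slightly.
tested = [1.2, 1.1]

# Hypothetical variants NOT on the report, each slightly protective.
untested = [0.95] * 10

reported = combined_relative_risk(tested)
fuller = combined_relative_risk(tested + untested)

print(f"estimate from the 2 tested SNPs: {reported:.2f}")
print(f"estimate with 10 more variants:  {fuller:.2f}")
```

With these made-up numbers the two-SNP estimate sits above the population average while the fuller estimate falls below it. Leaving a variant out of the calculation amounts to giving it a factor of 1.0 for everyone, so the estimate can change direction once more variants are counted.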

To be fair, the whole field of genetic disease analysis is still an imperfect science, not to mention a work in progress. And there are some conditions — both types of diabetes and Crohn’s disease, in particular — for which deCODEme bases its calculations on eight or more SNPs, which at least should give a fuller picture of the situation. That said, though, at the moment the site looks very much like it was thrown up in a hurry (it launched just a few days before 23andMe), which may explain the “introductory” pricing and the, well, introductory level of service here.

A chart listing the number of SNPs deCODEme uses for each disease-risk calculation follows after the jump.

NOTE: Links are to the deCODEme “scientific details” page for each condition. Since these pages are maintained inside the company’s “demo user” account, you’ll first have to activate that account by clicking here for the links to work.

One SNP
Alzheimer’s disease
Asthma
Obesity

Two SNPs
Age-related macular degeneration (AMD)
Atrial fibrillation
Celiac disease
Colorectal cancer
Glaucoma
Myocardial infarction (heart attack)

Three SNPs
Multiple sclerosis
Psoriasis

Four SNPs
Restless legs syndrome

Five SNPs
Prostate cancer

Six SNPs
Rheumatoid arthritis

Seven SNPs
Breast cancer

Eight SNPs
Type 2 diabetes

Ten SNPs
Type 1 diabetes

Twelve SNPs
Crohn’s disease
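
For anyone who wants to check the arithmetic, the chart above can be tallied in a few lines of Python. The dictionary simply transcribes the chart; the variable names are my own.

```python
# Number of SNPs deCODEme uses per condition, transcribed from the chart above.
snps_per_condition = {
    "Alzheimer's disease": 1,
    "Asthma": 1,
    "Obesity": 1,
    "Age-related macular degeneration (AMD)": 2,
    "Atrial fibrillation": 2,
    "Celiac disease": 2,
    "Colorectal cancer": 2,
    "Glaucoma": 2,
    "Myocardial infarction (heart attack)": 2,
    "Multiple sclerosis": 3,
    "Psoriasis": 3,
    "Restless legs syndrome": 4,
    "Prostate cancer": 5,
    "Rheumatoid arthritis": 6,
    "Breast cancer": 7,
    "Type 2 diabetes": 8,
    "Type 1 diabetes": 10,
    "Crohn's disease": 12,
}

total = len(snps_per_condition)
one_or_two = [name for name, n in snps_per_condition.items() if n <= 2]
eight_or_more = [name for name, n in snps_per_condition.items() if n >= 8]

print(f"{len(one_or_two)} of {total} conditions rely on one or two SNPs")
print("Eight or more SNPs:", ", ".join(eight_or_more))
```

The tally gives nine of 18 conditions at one or two SNPs, which is the "fully half" figure, and the only conditions at eight or more are the two types of diabetes and Crohn's disease.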


About the Author, David P. Hamilton

David Hamilton has been writing for VentureBeat LifeScience since April 2007. He formerly spent 14 years as a reporter for the Wall Street Journal in its San Francisco and Tokyo bureaus. Prior to that, he spent several years as a reporter at Science Magazine and as a reporter/researcher for the New Republic, both in Washington.

  • David,
    Thank you for this well written post. Your investigative work is notable. This is what I have been shouting about for the last 2 months! This is why I am getting calls from the "Other" SNP service. I will make you an honorary Sherpa! Great Post!
    -Steve
    www.thegenesherpa.blogspot.com
    www.helixhealth.org
  • Ann Turner
    David,

    I also thank you for your thought-provoking posts.

    I may be able to answer one of your questions from my Sunday morning arm-chair position. You asked why rs1333049 was not included in the deCODEme report. DeCODEme uses the Illumina BeadChip array, and that particular SNP is not included. (23andMe also uses Illumina, with a smaller set of SNPs.) The NEJM article used an array from Illumina's competitor, Affymetrix, which has selected a different (but partially overlapping) set of SNPs for its product.

    It's quite possible that the Illumina set has a different SNP that correlates strongly with the Affymetrix SNP. Neither SNP need be causally related to the medical effect -- they just happened to be in the neighborhood of the actual location where one person had the critical mutation, and they go along for the ride in that person's descendants.

    Both deCODEme and 23andMe are being fairly conservative in their interpretive reports, since many initial "discoveries" turn out to be false positives when attempts are made to replicate them in different populations. The Illumina FAQ alludes briefly to their criteria; 23andMe has a more detailed white paper about the issue.

    https://www.23andmeobjects.com/res/1926/pdf/23-...
  • Dear David P. Hamilton,

    We read your recent review of deCODEme.com, posted in VentureBeat on Jan 23, 2008, with great interest. We noted that you had some concerns that the disease risk modeling provided in the deCODEme service was based on a limited number of genetic variants (SNPs), even though as you put it “… most diseases are thought to be influenced by tens or hundreds of different genes.” As examples, you specifically mention Alzheimer’s disease and heart attack, citing two references that seem to report more SNPs than are used to predict disease risk in the deCODEme service.

    Obviously, the opinion of someone like you is valuable, as it can help us to further improve the quality of our service and the information we provide about that service. However, we would like to point out that your concerns, although clearly well-intentioned, are unfounded. We hope the following explanation will shed some light on this matter.

    It is hypothesized (and very likely true) that the risk of developing any one common disease may be affected by numerous genetic variants, most of which are presently unidentified. Obviously, genetic variants cannot be used to estimate disease risk until they are discovered. This is a limitation faced equally by deCODEme and its competitors. Even though not all genetic risk variants have been discovered, there is considerable value and predictive power in the risk estimates provided by deCODEme based on the current set of identified and verified disease-associated genetic variants. When deCODEme reports the relative genetic risk, it is assumed that the impact of the still undiscovered or unconfirmed variants is the same for every person. This is equivalent to saying: if you don’t know a person’s cholesterol level, family history, or other currently known risk factors for heart attack, then your best estimate for his risk is the population’s average risk for heart attack.

    It is encouraging to note that, in many cases, the genetic risk variants that have already been discovered and are used in the deCODEme service will be those that contribute the greatest risk of the disease in the population – because these tend to be the easiest variants to detect. A good example of this is the variant in the TCF7L2 gene associated to type 2 diabetes (discovered by deCODE genetics in 2005), which is likely to be the single most important genetic risk factor in this disease in most populations. Indeed, many of the new genetic risk variants are being discovered by the scientists at deCODE genetics (www.decode.com), some of whom are involved in bringing such new discoveries to the public through the deCODEme service.

    It is imperative to note that deCODEme only reports risk based on well validated genetic variants (SNPs). Not only does deCODEme require that the association between genetic variant and a disease is truly statistically significant, it also requires that the association has been replicated in at least two independent studies. To include risk estimates based on unverified variants (i.e. those based on marginal evidence) is not only questionable from the point of view of our customers, it is scientifically unsound.

    In some cases, variants with a verified disease association are excluded from the genetic risk estimates in deCODEme service. This is done when multiple variants from the same chromosomal region are strongly correlated and therefore redundant. In such instances deCODEme uses the minimum number of SNPs that capture all the risk conferred by the full set of correlated SNPs. In this case no information about genetic risk is lost, even though some variants are not used in the risk prediction. When there is redundancy due to correlation between SNPs, quantity does not translate into quality! Thus, it is not the case, as you stated, that deCODEme “overlooks 13 other SNPs linked to heart disease in the same study”. Rather, some SNPs are excluded either because they are redundant and covered by other SNPs that are included in the risk estimate or they cannot be used because they are unverified. Significantly, the genetic variants that are used both by deCODEme and others to assess risk of the disease were discovered by scientists at deCODE genetics. These same scientists used their specialist knowledge to select the most informative subset of SNPs to estimate the genetic risk of heart attack for the deCODEme service. You can rest assured that they did a good job.

    In relation to Alzheimer’s disease, you state that deCODEme overlooks four variants that meet statistical criteria according to a paper cited in relation to the well established apolipoprotein E (APOE) variant. In fact, these other SNPs must be classified as unverified. They have only nominal significance based on a specific genetic model, such that the authors themselves point out that the effect is weak (p-values of 0.04 to 0.001) and that further evaluation is needed. In comparison, the disease association of the APOE variant is beyond any criticism. Indeed, it is the most cited and significant (p-value of 2.0×10^-44) association to a common human disease. As previously explained, it would be scientifically unsound and irresponsible to jump the gun by including unverified genetic variants in disease risk assessments.

    Given your obvious interest in the number of SNPs used to estimate the genetic risk of diseases, we were somewhat disappointed to note that your review did not mention the fact that for most of the diseases, deCODEme uses more SNPs than the competitor 23andMe (which you seem to favour). Moreover, as the deCODEme service is based on over 1 million SNPs, compared to only about 650 thousand measured by 23andme, it is considerably more likely that future genetic discoveries will be efficiently covered by the deCODEme service than by that of this competitor.

    Furthermore, deCODE genetics has contributed more than any other institution in the world to the recent surge of discoveries of genetic variants conferring risk of common diseases (www.decode.com/publications). Hence, when deCODE genetics scientists convert these discoveries into components of the deCODEme service, we are exploiting our core expertise and unique position in this scientific field. We are confident that we know what we are doing, but we welcome constructive criticism, because we are eager to do even better.

    The deCODEme team
  • chris peri
    After reading Mr. Hamilton's report I was a bit amazed at the amateurism this company (deCODEme) uses to identify such significant and life-changing results. After reading the comment from the deCODEme team, though, I think that Mr. Hamilton was unfair to them, and I also think he is a bit biased against them. Whether it is a conflict of interest or anything else, I don't know, but he conveniently forgot to mention that the rival's methods are even less satisfactory than deCODEme's. I even tried myself to find the methods 23andMe uses and how many SNPs they identify, but their site doesn't really help you in that direction (even though it is a bit more user-friendly than deCODEme.com).
    Anyway, this whole story can only lead to a good thing, but until more companies get in the game, things will be a bit cloudy.