Genetics roundup: The secrets of DNA, peering into Watson's genome -- and yawning, and more

What secrets lurk in the heart of DNA? -- Didn't the Human Genome Project answer that question? Think again. Last week, scientists involved in a project called Encode reported on a detailed analysis of one percent of the genome, and the findings have undercut a good deal of conventional wisdom about how genes and DNA work. Most notably, the Encode study suggests that much of the genome -- the long stretches of seemingly inactive genetic "letters" surrounding functional genes, sometimes unfairly called "junk DNA" -- is actually a beehive of complex and little-understood activity.

The WaPo's Rick Weiss, one of the few mainstream reporters to tackle the consequences of the Encode report (which was published in 29 separate papers), put it this way:

The new work also overturns the conventional notion that genes are discrete packets of information arranged like beads on a thread of DNA. Instead, many genes overlap one another and share stretches of molecular code. As with phone lines that carry many voices at once, that arrangement has prompted the evolution of complex switching, splicing and silencing mechanisms -- mostly located between genes -- to sort out the interwoven messages. The new picture of the inner workings of DNA probably will require some rethinking in the search for genetic patterns that dispose people to diseases such as diabetes, cancer and heart disease, the scientists said, but ultimately the findings are likely to speed the development of ways to prevent and treat a variety of illnesses. One implication is that many, and perhaps most, genetic diseases come from errors in the DNA between genes rather than within the genes, which have been the focus of molecular medicine. Complicating the picture, it turns out that genes and the DNA sequences that regulate their activity are often far apart along the six-foot-long strands of DNA intricately packaged inside each cell. How they communicate is still largely a mystery.

The implications are preliminary but profound, since so much of today's cutting-edge medical enterprise is based on the premise that understanding genes and their variation is the key to understanding disease. Increasingly, that appears to be only part of the story -- possibly not even a particularly large part. All of which suggests that understanding the genome wasn't the beginning of the end of the quest to understand the workings of life -- just the end of the beginning, and maybe not even that.

Scanning the genome of a DNA pioneer -- Much as he must have anticipated, the news last week that James Watson, the co-discoverer of DNA's double-helix structure, had his own genome sequenced drew a flurry of largely unenlightening media attention (see here and here for just two examples). Yes, it's interesting to know that Watson has some of the same reservations about knowing his genetic disease risks as many others -- the scientist didn't want to know the status of his apolipoprotein E gene, which can indicate the risk of Alzheimer's disease -- and yes, as the technology improves, more people are going to want their genome sequenced. As the Encode study referenced above suggests, however, sequencing the genome is probably just the start of actually understanding an individual's genetic makeup. So two cheers for Jim Watson and the low-cost genome sequencers, but no one should be under the illusion that this event is the "milestone" many have made it out to be.

Genetic-association overload -- Scientists employing the technique of "whole-genome association" recently announced that common genetic variations appear to underlie seven common diseases -- bipolar disorder, coronary artery disease, Crohn's disease, hypertension, rheumatoid arthritis, and Type 1 and Type 2 diabetes. (See the NYT piece here, although a subscription may be required.) The finding was noteworthy because it suggested that several of the diseases may share common origins -- and, of course, because knowing the effect of these genetic variations might provide clues to new treatments.

This study is the latest of several that have recently turned up much more solid evidence of links between DNA variants and disease (see my earlier coverage here, here and here), and it seems safe to say that we're just at the beginning of an avalanche of such announcements. Which, of course, means it's right about time for boredom to set in -- and fortunately Tom Goetz is on hand to deliver. Now it's time to anticipate the backlash.

Synthetic Genomics hits it big -- At least in terms of valuation. As Matt reported yesterday, the synthetic-biology startup founded by genomics pioneer Craig Venter raised an undisclosed sum of venture funding and is now valued at something close to $300 million -- according, that is, to Venter himself. Synthetic Genomics aims to create artificial microbes that could assist in the production of new clean-burning fuels -- for instance, by converting coal into natural gas.

In a separate but related effort, Venter's own research institute has been trying to determine the minimum number of genes necessary for life by systematically knocking genes out of a simple microbe. Earlier this month, a patent application from Venter's institute claimed ownership of the 381 genes that resulted from this effort. The idea here is that it should be possible to synthesize that short genome, insert it into a microbe from which the DNA has been removed, and "boot up" a largely synthetic organism. The synthetic genome would be designed so that additional genes could be easily inserted, theoretically making it an ideal platform for industrial use. The patent application, in fact, claims production of ethanol or hydrogen fuel as an initial use.

What's perhaps most striking about all this is how closely it parallels Venter's early attempts to lay claim to large chunks of the human genome. (Those never really worked out, but not for lack of trying.) As science writer Carl Zimmer points out in this post, Venter's approach to synthetic biology seems to embody the same sort of land-grab mentality, by attempting to lock up the basic genes necessary for creating synthetic organisms. That stands in sharp contrast to the "open-source biology" movement, in which researchers are building publicly available "genetic toolkits" for designing and building new synthetic organisms.

In any case, it's far from clear that Venter's attempted land-grab will work any better this time around, but this could easily turn into another epic battle between "open" and "closed" technology philosophies. So make some popcorn and grab a seat.

Number of human genes finally determined? -- One of the early conundrums created by the first human-genome map was the surprisingly small number of human genes turned up by the Human Genome Project. Although some initial estimates had ranged as high as 100,000 to 150,000, the first draft of the genome put the number at 30,000 to 40,000, and that number has been falling steadily ever since. (See here for details.)

Now, an MIT computational biologist named Michele Clamp has a new bottom-line answer: 20,488 genes. As Science's Elizabeth Pennisi writes in this news story (subscription required):

Clamp compared all the human genes in a database called Ensembl with those cataloged for dog and mouse. In all, 19,209 were the real, protein-coding McCoy, 3009 had been erroneously put on the gene list, and 1177 remained ambiguous, she reported. She rated the "geneness" of these leftovers by comparing them to random stretches of DNA. Almost all made the grade with respect to a genelike proportion of the bases G and C, but not for features such as the distribution of short insertions and deletions in their sequences. Overall, 1167 were "bogus" and lacked any independent evidence that they coded for proteins, she reported. She did a similar analysis with the other gene databases, then summed the unique genes of all of them to get her final count.

Given the tremendous genome complexity that's now coming into view, the low number of human genes isn't quite the shock it once was -- but it's still nice to have an answer.
