In a recent study conducted in collaboration with Calico Life Sciences, Google researchers built a “genome-wide” machine learning model for the regulation of gene expression — the process by which information from a gene is used to create functional protein or RNA — in a species of yeast. While the work focused on yeast, it could be applicable to humans because it reveals how genes work together as a system, a core and only partially understood piece of the microbiological puzzle.
As the team explains in a technical paper and a blog post, yeast — which are single-celled organisms — grow old and die after budding (i.e., producing almost genetically identical offspring) 30 times. Budding produces “scars” on yeast cells that are visible under a powerful microscope, making it possible to determine the age of a cell from its appearance.
Leveraging this, Google Research’s Ted Baltz and team trained a model on a yeast growth data set produced by Calico, which contained the results of over 200 experiments on different yeast strains. In each experiment, a single gene within the strain was activated and the expression levels of 6,000 genes were measured eight times over 90 minutes, yielding a total of almost 20 million individual measurements.
The Google researchers’ approach was to model the whole data set as a system of differential equations, such that the rate of change of the expression of a gene was proportional to a weighted sum of the expression levels of all genes. Baltz reports that in the end, the work amounted to more than 50 million regularization paths, which informed predictions about which genes would code for regulators (i.e., genes involved in controlling the expression of one or more other genes).
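To make the idea concrete, here is a minimal sketch of that kind of model on a toy three-gene system. It is not the researchers’ code. The function names, the forward-Euler simulation, the finite-difference derivative estimate, and the use of a ridge (L2) penalty are all assumptions chosen for illustration, since the article does not say which regularizer or solver the team used. Each gene’s rate of change is a weighted sum of all expression levels, and the weights are re-estimated at several penalty strengths to trace a small regularization path.

```python
import numpy as np

def simulate(W, x0, dt, steps):
    """Integrate dx/dt = W @ x with forward Euler; returns shape (steps + 1, n_genes)."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        xs.append(xs[-1] + dt * (W @ xs[-1]))
    return np.stack(xs)

def fit_weights(trajectories, dt, lam):
    """Ridge estimate of W from time series (hypothetical helper, not the paper's method)."""
    # Expression at each time point, stacked across all experiments.
    X = np.concatenate([xs[:-1] for xs in trajectories])
    # Finite-difference approximation of dx/dt at the same time points.
    dX = np.concatenate([(xs[1:] - xs[:-1]) / dt for xs in trajectories])
    n_genes = X.shape[1]
    # The model says dX ~= X @ W.T, so solve (X^T X + lam * I) W^T = X^T dX.
    Wt = np.linalg.solve(X.T @ X + lam * np.eye(n_genes), X.T @ dX)
    return Wt.T

def regularization_path(trajectories, dt, lams):
    """Refit the weights once per penalty strength."""
    return [fit_weights(trajectories, dt, lam) for lam in lams]

# Toy ground truth: gene 0 drives gene 1; every gene decays on its own.
W_true = np.array([[-0.5,  0.0,  0.0],
                   [ 0.8, -0.5,  0.0],
                   [ 0.0,  0.0, -0.5]])

# One "experiment" per gene: start with that gene switched on, record 8 time points.
trajectories = [simulate(W_true, x0, dt=0.1, steps=7) for x0 in np.eye(3)]

path = regularization_path(trajectories, dt=0.1, lams=[1e-8, 1.0, 100.0])
print(np.round(path[0], 3))  # weights at the weakest penalty
```

In this sketch, a large off-diagonal weight in row i, column j suggests that gene j influences gene i, which is the sense in which a fitted model can point to candidate regulators. A sparsity-inducing penalty such as the lasso would be a natural alternative to ridge here, but that choice is likewise an assumption.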
To verify the model’s predictions, the researchers tested it against a validation data set comprising 10 new yeast strains. They report that three out of 10 predictions held up in experiments, including one gene that hadn’t previously been identified by scientists.
“Based on exhaustive experiments, we built a genome-wide model for the regulation of gene expression in [yeast] and verified some of the results experimentally, enabling future investigations into less well understood biological systems,” wrote Baltz. “Our model was able to identify these without prior biological knowledge, demonstrating that these [machine learning] techniques might scale to other domains or organisms that are much less well studied.”
Google’s work in AI and gene expression follows the publication of a study describing a “massively parallel reporter assay” (MPRA), a framework designed to investigate DNA. The researchers claimed it could be used to create AI models that predict gene regulation for industrial and life science applications. An earlier study proposed a unified AI architecture to model and interpret how chromatin, a complex of DNA and protein found in eukaryotic cells, controls gene regulation.