How computational approaches are carving out new ways of understanding biological networks.
Colleen Smith
Computational biologist Chad Myers applies leading-edge techniques from computer science to the latest genome-scale technologies to understand the genetic architecture of life.
Despite an overabundance of genomic data, researchers remain stumped by many basic questions about how genes interact with each other and with the environment to create such astonishingly diverse organisms, and to present such perplexing obstacles to curing disease.
Myers’ lab tackles these fundamental challenges by developing new algorithms and computational tools both to map genetic interactions and to understand what those interactions mean, in organisms ranging from models like yeast and worms all the way across the tree of life to humans.
How do you define a genetic interaction?
“If you make mutations in two genes at the same time, and there is an interesting effect on the organism that you couldn’t predict, that’s called a genetic interaction (GI).
It’s like you’re flipping switches in the organism in a precise way, turning off certain components one by one.
The interesting thing is that flipping most of those switches off individually doesn’t have an effect. It’s been 15 years since researchers first knocked out every single one of yeast’s 6,000 genes. They found that yeast only needs 20% of those genes to grow.”
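The definition above is informal: a double-mutant effect you couldn’t predict. One common way to make that prediction quantitative in large yeast screens is a multiplicative model, where the expected double-mutant fitness is the product of the two single-mutant fitnesses and the interaction score is the deviation from that product. The sketch below assumes that model and fitness values normalized so wild type is 1.0; the function name gi_score is hypothetical, and this is not necessarily the exact scoring used by Myers’ lab.

```python
def gi_score(f_a: float, f_b: float, f_ab: float) -> float:
    """Genetic interaction score under an assumed multiplicative null model.

    f_a, f_b: fitness of each single mutant (wild type = 1.0)
    f_ab:     observed fitness of the double mutant
    Negative scores: double mutant is sicker than expected (e.g. synthetic lethal).
    Positive scores: double mutant is healthier than expected.
    """
    expected = f_a * f_b  # prediction if the two genes act independently
    return f_ab - expected


# Two individually harmless deletions whose combination kills the cell:
print(gi_score(1.0, 1.0, 0.0))  # -1.0: strong negative interaction
# Two mildly sick mutants that combine exactly as predicted:
print(gi_score(0.8, 0.5, 0.4))  # 0.0: no interaction
```

The first case is the switch-flipping scenario from the interview: neither deletion matters alone, so the model predicts a healthy double mutant, and the dead cell shows up as a large negative score.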
Does that mean most of yeast’s genes are redundant?
“Relatively few genes are redundant in sequence, but many are functionally redundant. By flipping combinations of switches, you start to see that you need one of two systems, but you don’t need both of them. Our lab has been heavily involved in mapping those types of interactions.”
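The “one of two systems” logic can be illustrated with a toy model. The sketch assumes two hypothetical, fully redundant pathways (the gene names are made up) and a cell that grows as long as at least one pathway is intact, where a pathway counts as intact only if none of its genes is deleted. Enumerating every single and double deletion shows that no single deletion is lethal, while every pair that removes one gene from each pathway is.

```python
from itertools import combinations

# Hypothetical pathways: the cell needs at least one of them fully intact.
PATHWAYS = {
    "A": {"a1", "a2", "a3"},
    "B": {"b1", "b2"},
}
GENES = sorted(set().union(*PATHWAYS.values()))


def viable(deleted: set[str]) -> bool:
    """The cell survives if any pathway has none of its genes deleted."""
    return any(not (genes & deleted) for genes in PATHWAYS.values())


lethal_singles = [g for g in GENES if not viable({g})]
lethal_pairs = [p for p in combinations(GENES, 2) if not viable(set(p))]

print(lethal_singles)     # [] : every single deletion is tolerated
print(len(lethal_pairs))  # 6 : each a-gene paired with each b-gene
```

All six lethal pairs cross between the two pathways; pairs within the same pathway are harmless because the other pathway still covers the function.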
What types of questions can you answer by mapping out GIs?
“There’s been a decade and a half of mapping projects in different contexts, and this yeast project is one of those. Everyone decided, since we can measure this on a large scale, let’s just measure it, and understand how much noise is in our measurements, with the expectation that it should be valuable. It was kind of like, build it, and they will come.
After almost ten years, we’ve now measured interactions for about 15 million combinations of genes in yeast, but what exactly does all this data tell us? Well, we’ve learned quite a bit, and much of it generalizes to other species.
For example, even though there are only about 1,000 ways to kill a yeast cell by deleting single genes, there are about 100,000 ways to kill a yeast cell by introducing combinations of two mutations. This has major implications beyond yeast because it reveals how complex genetic networks are and helps put a measure on the complexity we should expect from these networks.
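A back-of-the-envelope calculation puts these figures in context. It uses only the rounded numbers quoted in the interview (about 6,000 genes, about 15 million measured pairs, about 1,000 lethal single deletions, about 100,000 lethal pairs), and it assumes the 100,000 figure refers to all possible pairs, so every result is approximate.

```python
from math import comb

# Rounded figures quoted in the interview.
n_genes = 6_000
measured_pairs = 15_000_000
lethal_single_deletions = 1_000
lethal_double_mutants = 100_000

possible_pairs = comb(n_genes, 2)  # unordered gene pairs: n * (n - 1) / 2

print(f"possible gene pairs: {possible_pairs:,}")                # 17,997,000
print(f"share measured: {measured_pairs / possible_pairs:.0%}")  # 83%
print(f"lethal pairs per lethal single: "
      f"{lethal_double_mutants // lethal_single_deletions}")     # 100
print(f"lethal share of all pairs: "
      f"{lethal_double_mutants / possible_pairs:.2%}")           # 0.56%
```

Lethal combinations outnumber lethal single genes about 100 to 1, yet they are still well under 1% of all possible pairs, which is why mapping them requires measuring pairs at this scale.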
By mapping GIs on the scale we have in yeast, we now understand many of the rules, which lets us map them more efficiently in other species. For instance, genes with large numbers of GIs also tend to be enriched for certain functional roles, evolve more slowly, and be more conserved across the tree of life.”
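Identifying genes with many GIs starts with counting each gene’s interaction partners, its degree in the interaction network. A minimal sketch, assuming interactions are stored as a plain list of gene-pair edges; the edge list and the degree cutoff of 3 are made-up illustrative values, not real yeast measurements.

```python
from collections import Counter

# Hypothetical interaction edges (gene_x, gene_y); not real data.
edges = [
    ("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g1", "g5"),
    ("g2", "g3"), ("g4", "g6"),
]


def gi_degree(edges: list[tuple[str, str]]) -> Counter:
    """Count how many interactions each gene participates in."""
    degree = Counter()
    for x, y in edges:
        degree[x] += 1
        degree[y] += 1
    return degree


degree = gi_degree(edges)
hubs = [gene for gene, d in degree.most_common() if d >= 3]
print(hubs)  # ['g1']
```

In a real analysis the cutoff would be chosen from the data, for example the top few percent of genes by degree, and the resulting hub list could then be tested for functional enrichment or conservation.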
What’s the end goal?
“Computational techniques play a big role in the modern era of data-driven biology. First, they are very important for simply collecting reliable data, which often involves several computational steps to process and transform the data. After that, what we ultimately want is to squeeze information with predictive value out of the collected data, and again, computational methods are critical. For example, the goal of techniques like machine learning and data mining is to take a huge amount of data and distill it down to the important nuggets.”
What types of data are computational biologists trying to distill right now?
“The biggest developments in data are driven by genome sequencing technologies, which in recent years have been applied in hundreds of ways to measure various aspects of molecular biology. Several thousand species already have complete genome sequences. In humans, for example, there are probably at least half a million genome sequence profiles from a variety of normal and diseased tissues, many of them public. Despite having all of this data, we still don’t know what half of the genes do, even when they function normally.
The new way to measure gene expression quantitatively is RNA-seq, a sequencing-based technique that provides a snapshot of which genes are expressed in which tissues. That data is also huge, with thousands of different expression profiles available for humans or any model organism, including plants like maize. This data is really powerful if you want to understand how genes drive the development of tissues, or which genes distinguish one phenotype from another.
A lot of questions in biology are related to which genes work together to accomplish things, or which genes are involved in common processes that support life. So one type of data my lab cares a lot about is network data, which can tell us which genes are related. Gene expression information, especially when it’s collected across a variety of tissues or individuals, can be a valuable source for extracting relationships among genes.”
How do you visualize network data?
“Picture a number of nodes with lines, or edges, between them. The nodes are genes, and an edge between two genes means that deleting both of them in combination kills the cell.
What’s interesting is that in most cases where that happens, it happens within a larger structure, with multiple interactions crossing between two pathways. It’s actually a very highly structured bundle of genes.
Looking at individual edges isn’t very revealing, because computationally it’s difficult or impossible to interpret these data at the level of the individual gene pair. But when you integrate the data with our existing knowledge of pathways, the structure emerges. We’ve designed algorithms to find these local structures in the data.”
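The idea of looking for structure across pathways rather than at single gene pairs can be sketched as an enrichment test: for each pair of known pathways, count the interactions that cross between them and ask how surprising that count is given the overall interaction density. This is a simplified stand-in that assumes a hypergeometric test, assumes every gene pair was tested, and uses made-up pathways and edges; it is not the lab’s published algorithm.

```python
from itertools import combinations
from math import comb

# Hypothetical pathway annotations (disjoint gene sets).
PATHWAYS = {
    "P1": {"p1", "p2", "p3"},
    "P2": {"q1", "q2", "q3"},
    "P3": {"r1", "r2", "r3"},
}
# Hypothetical negative interactions: most P1 x P2 pairs, plus two stray edges.
INTERACTIONS = {
    frozenset(pair)
    for pair in [
        ("p1", "q1"), ("p1", "q2"), ("p1", "q3"),
        ("p2", "q1"), ("p2", "q2"), ("p2", "q3"),
        ("p3", "q1"), ("p3", "q2"),
        ("p1", "r2"), ("q3", "r1"),
    ]
}


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


def between_pathway_enrichment(pathways, interactions, alpha=0.01):
    """Report pathway pairs with more cross-pathway interactions than expected."""
    genes = set().union(*pathways.values())
    N = comb(len(genes), 2)  # all gene pairs (assumed all tested)
    K = len(interactions)    # observed interactions among them
    hits = []
    for (a, ga), (b, gb) in combinations(sorted(pathways.items()), 2):
        n = len(ga) * len(gb)  # candidate cross-pathway pairs
        k = sum(frozenset((x, y)) in interactions for x in ga for y in gb)
        p = hypergeom_sf(k, N, K, n)
        if p < alpha:
            hits.append((a, b, k, n, p))
    return hits


for a, b, k, n, p in between_pathway_enrichment(PATHWAYS, INTERACTIONS):
    print(f"{a}-{b}: {k}/{n} cross pairs interact, p = {p:.1e}")
    # P1-P2: 8/9 cross pairs interact, p = 1.3e-05
```

Testing whole pathway pairs pools many individually weak gene-level signals, which is one way structure can emerge even when single edges are hard to interpret. With many pathway pairs, a multiple-testing correction would also be needed.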
Does this only work in yeast?
“No, this is not yeast-specific. We’ve applied this computational approach in the context of human genome-wide association studies and found really interesting results.
For example, you might look at two patients and think they have nothing in common because they have mutations in totally different sets of genes. But if you take a step back and say, wait a second, these different mutations are actually affecting the same two pathways, now you can group together patients that otherwise look different. It’s a heterogeneity problem. There are a lot of different paths to the same exact phenotype.”
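The heterogeneity point can be illustrated by mapping each patient’s mutated genes onto pathways and grouping patients by the combination of pathways hit. The sketch assumes made-up patients and genes and a hypothetical gene-to-pathway table; real GWAS analyses work with genetic variants and statistical association rather than a clean list of mutated genes per patient.

```python
from collections import defaultdict

# Hypothetical gene-to-pathway annotation.
GENE_TO_PATHWAY = {
    "g1": "pathway_X", "g2": "pathway_X", "g3": "pathway_X",
    "g4": "pathway_Y", "g5": "pathway_Y", "g6": "pathway_Y",
    "g7": "pathway_Z",
}

# Hypothetical patients: no two share a mutated gene.
PATIENTS = {
    "patient_1": {"g1", "g4"},
    "patient_2": {"g2", "g5"},
    "patient_3": {"g3", "g6"},
    "patient_4": {"g7"},
}


def group_by_pathways(patients, gene_to_pathway):
    """Group patients whose mutations hit the same combination of pathways."""
    groups = defaultdict(list)
    for patient, genes in sorted(patients.items()):
        signature = frozenset(gene_to_pathway[g] for g in genes)
        groups[signature].append(patient)
    return groups


for signature, members in group_by_pathways(PATIENTS, GENE_TO_PATHWAY).items():
    print(sorted(signature), members)
# ['pathway_X', 'pathway_Y'] ['patient_1', 'patient_2', 'patient_3']
# ['pathway_Z'] ['patient_4']
```

At the gene level, patients 1 to 3 share nothing; at the pathway level, they all disrupt the same two pathways and fall into one group.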
Is that only true of disease states?
“A fundamental principle is that biological systems are modular. Very rarely do genes work in isolation. They work as part of larger sets of genes that together accomplish something.”
So, these techniques are enabling you to address deeply cross-cutting questions?
“Yes, we can solve one problem here, but by doing so we might also solve a thousand problems like it in different contexts. We might think about maize today, and how we might improve our ability to feed the world, and then tomorrow think about how to treat cancer. A lot of the same concepts are being used and reapplied across diverse biological settings where the computational approach is the bridge.”