Bioinformatics: sequences and stuff

Oct 03, 2014 15:21

Pinning down exactly where bioinformatics got its start is tricky business--you could make a good argument that it goes back a century or so, to Fisher's pioneering work in population genetics--but in the modern sense, it mainly goes back to gene sequencing, which started in the 1970s and has been getting faster ever since. Think about how much faster the computer you're reading this on is than its equivalent from forty years ago, and then consider that our ability to read DNA sequences has grown even faster than that.

For a long time, from that early work to the time I started studying the subject around the turn of the century (get off my lawn!), the focus was on DNA sequences: first of specific genes, then of entire genomes. The genome can be defined as the complete set of genes an organism has, although that definition gets blurry sometimes, as I'll discuss later. Phrases like "the human genome" refer to the consensus sequence for the species as a whole. You have your own genome, and I have mine, and mostly they're identical; in the places where they differ, they do so in ways that can be categorized and statistically described. The original Human Genome Project built its reference from a small number of anonymous donors (Celera's parallel private effort sequenced five people), and created a consensus sequence from them; the 1000 Genomes Project, as the name implies, is considerably more ambitious (and they're well over that original thousand now). The more samples we have to build a consensus sequence from, the more we know about what individual sequence variations mean.
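
If "consensus sequence" sounds abstract, here's a toy sketch of the idea in Python: line up several samples and take the most common base at each position. The sequences are invented, the function name `consensus` is mine, and real pipelines have to deal with alignment, gaps, and sequencing errors that this happily ignores.

```python
from collections import Counter


def consensus(aligned_seqs):
    """Return the most common base at each position of equal-length sequences."""
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must already be aligned to the same length")

    result = []
    # zip(*...) walks the alignment one column (position) at a time
    for column in zip(*aligned_seqs):
        counts = Counter(column)
        # Highest count wins; ties are broken alphabetically so the
        # output is deterministic (an arbitrary choice for this sketch).
        best = min(counts, key=lambda base: (-counts[base], base))
        result.append(best)
    return "".join(result)


# Made-up samples: mostly identical, with a few single-base differences
samples = ["ACGTAC", "ACGTTC", "ACCTAC", "ACGTAC", "ATGTAC"]
print(consensus(samples))
```

With only five samples, one odd base at a position is a fifth of your data; with thousands, you can start to say how rare a given variant really is.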

Applications of this technology include "DNA fingerprinting," since everyone except identical twins has their own unique sequence; evolutionary biology, since we share most of our DNA sequence not only with our fellow humans but also (in decreasing amounts) with monkeys, mice, ravens*, salamanders, fruit flies, and brewer's yeast, and a good chunk of it with mushrooms and bananas and our own gut bacteria; and understanding the genetic basis of heritable traits, including most obviously conditions like diabetes and schizophrenia, and susceptibility to various cancers. This is by no means an exhaustive list! The more genomic data we have, the more things we find to do with it. In my own field of research, we look for alleles (which are, remember, sequence variations in certain genes) that occur frequently in populations that have lived at high altitude for a long time in the Andes, the Himalayas, and the Ethiopian highlands, and compare their frequencies to those observed in populations that live closer to sea level (which is almost everyone else in the world--even if you live at high altitude, your ancestors probably didn't). We're looking for sequence variations that confer resistance to hypoxia ... and that's just the start.
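
To make the allele-frequency comparison concrete, here's a minimal sketch in Python. Everything in it is hypothetical: the variant names, the genotype counts, and the helper functions are made up for illustration, and this is not the actual analysis we run. It just shows the arithmetic of counting alleles in two populations and ranking variants by how much the frequencies differ.

```python
def allele_frequency(hom_ref, het, hom_alt):
    """Frequency of the alternate allele, given diploid genotype counts."""
    total_alleles = 2 * (hom_ref + het + hom_alt)  # two copies per person
    if total_alleles == 0:
        raise ValueError("no genotypes supplied")
    # Heterozygotes carry one alternate allele, alt homozygotes carry two
    return (het + 2 * hom_alt) / total_alleles


def frequency_gaps(variants):
    """Rank variants by the highland-minus-lowland frequency difference."""
    gaps = {}
    for name, populations in variants.items():
        high = allele_frequency(*populations["highland"])
        low = allele_frequency(*populations["lowland"])
        gaps[name] = high - low
    # Largest absolute difference first
    return sorted(gaps.items(), key=lambda item: abs(item[1]), reverse=True)


# Invented counts per population: (hom_ref, het, hom_alt)
variants = {
    "variant_1": {"highland": (10, 40, 50), "lowland": (80, 18, 2)},
    "variant_2": {"highland": (50, 40, 10), "lowland": (55, 38, 7)},
}

for name, gap in frequency_gaps(variants):
    print(f"{name}: {gap:+.2f}")
```

A big gap on its own doesn't prove anything about hypoxia; populations drift apart for all sorts of reasons, so a difference like this is a place to start looking, not an answer.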

But the DNA sequence itself, the familiar combination of adenine (A), cytosine (C), guanine (G), and thymine (T), only explains a part of the variation we see in heritable traits. A "gene" is more than just a stretch of DNA that codes for a protein. It's maybe best seen as a unit of inheritance, which includes the sequence of the coding portion, the sequences outside the coding portion that regulate transcription of the gene, the proteins (histones) that the DNA strand coils around like thread around a spool, and chemical modifications to the individual DNA bases that can affect their function. Those last two get lumped under the general category of epigenetics: things that sit "on top of" the genome rather than in its sequence. Really, though, we're learning that we need to expand our definition of genes and genomes to encompass everything we inherit from our parents and grandparents and great-grandparents ... and that we have all inherited from our common ancestors, the common threads that make up the tapestry of life.

Genes are pretty nifty things even when they're just sitting there, but when they start actually doing stuff, they get much more interesting. Next time I'll talk more about regulation, which is pretty much at the heart of my research.

*You knew I'd find a way to talk about dinosaurs here somehow, right? Of course you did.

Other entries in the series here.

bioinformatics, science
