Never mind the length, feel the bandwidth

Mar 22, 2011 12:24

A couple of days ago I saw a tweet from @TomCadwallader, claiming that 1 sperm has 37.5MB of DNA information in it. That means a normal ejaculation represents a data transfer of 1587GB in about 3 seconds.
That's cute, but there's a more interesting question lurking here: it may represent 1.5TB of data, but how much information is transferred?

beware the geek, maths, biology, sex

Comments 10

(The comment has been removed)

pozorvlak March 22 2011, 13:53:25 UTC
The human mutation rate is around 2.5 * 10^-8 per base pair per generation; at roughly 3*10^9 base pairs per sperm, I'd expect 75 mutations per sperm. You can fit the location of each mutation into 32 bits, plus two bits to store whatever it's mutated to: that gives 75*34 = 2550 bits per sperm, or 0.3kB per sperm. Much more than the "choose your chromosome" data, but still less than the densely-represented homologous recombination data. That brings us up to 3.3TB :-)
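For anyone who wants to check the arithmetic, here is a minimal sketch of the same estimate in Python. The mutation rate and base-pair count are the round figures quoted above; the function name and its parameters are made up for illustration, and the position width is just the smallest whole number of bits that can address 3*10^9 positions.

```python
import math

# Round figures taken from the comment above, not independently sourced.
MUTATION_RATE = 2.5e-8   # mutations per base pair per generation
BASE_PAIRS = 3e9         # approximate base pairs per sperm

def mutation_payload_bits(rate=MUTATION_RATE, base_pairs=BASE_PAIRS, bits_per_base=2):
    """Estimate the bits needed to list every new mutation in one sperm.

    Hypothetical helper: each mutation is stored as a position plus the
    replacement base, with no further compression.
    """
    expected_mutations = rate * base_pairs
    # Smallest whole number of bits that can index every base pair.
    position_bits = math.ceil(math.log2(base_pairs))
    total_bits = expected_mutations * (position_bits + bits_per_base)
    return expected_mutations, position_bits, total_bits

expected, position_bits, total_bits = mutation_payload_bits()
print(f"{expected:.0f} mutations, {position_bits} position bits each")
print(f"{total_bits:.0f} bits, about {total_bits / 8 / 1000:.1f} kB per sperm")
```

Note that the 32-bit position is not just a convenient word size: log2(3*10^9) is a little over 31, so 32 is the minimum for a fixed-width index.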

Imprinting I know from nothing. To the researchmobile!

pozorvlak March 22 2011, 14:36:24 UTC
From my reading of Wikipedia's article on genomic imprinting, it works by setting an im_the_daddy bit (in the form of a methyl group?) on somewhere between 80 and 3000 genes. But since the affected loci are the same for all sperm in the sample, we can just set the im_the_daddy bit once in the header, so it only adds one bit to the size of the compressed data :-)

On the other hand, I'm sure that other epigenetic mechanisms are more data-intensive.

(The comment has been removed)


atreic March 22 2011, 13:24:23 UTC
That's interesting...

755MB if Y, or 730MB if X - did you mean that the other way round?

pozorvlak March 22 2011, 13:44:23 UTC
I did, you're right. Fixed now. Thanks!


wormwood_pearl March 22 2011, 13:39:42 UTC
> [Also, three seconds? Either Tom's ignoring the time required for the initial protocol handshake, or he's doing it very, very wrong.]

*Snrk*

necaris March 22 2011, 20:35:56 UTC
I love that euphemism :-)


(The comment has been removed)

pozorvlak March 22 2011, 17:00:42 UTC
If your definitions are in a twist, then so are mine :-) I would indeed expect long repetitive sequences to reduce the entropy, which is why the entropy is 1.7 bits per base pair rather than 2. If you're saying that you're surprised it's that high, then obviously there aren't as many long repetitive sequences as you'd expected. Or Wikipedia's sources aren't reliable.
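To make the "2 bits per base pair" ceiling concrete, here is a rough sketch that computes the per-symbol (order-0) Shannon entropy of a sequence over the alphabet A, C, G, T. The function name and the toy sequences are invented for illustration. One big caveat: this only looks at base frequencies, so it does not capture the effect of long repeats, which would need a model with context. It will not reproduce the 1.7 figure; it only shows where the upper bound of 2 comes from and that any skew pushes the number down.

```python
import math
from collections import Counter

def entropy_bits_per_base(sequence):
    """Order-0 Shannon entropy of a sequence, in bits per symbol.

    Hypothetical helper: considers only how often each base occurs,
    not the order in which the bases appear.
    """
    counts = Counter(sequence)
    total = len(sequence)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

uniform = "ACGT" * 10           # all four bases equally common
skewed = "A" * 28 + "CGT" * 4   # toy example: mostly A

print(entropy_bits_per_base(uniform))  # hits the 2-bit ceiling
print(entropy_bits_per_base(skewed))   # strictly below 2 bits
```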


Gross overestimate ext_344306 March 23 2011, 09:45:46 UTC
Hmm, never mind the handshake, what about the 9 months (or 70 years?) it takes to decode the information? And that's for a properly formatted and syntactically viable sperm. Most aren't, and never get decoded. So, the data rate may be huge, but the transmission losses are pretty close to 100%.

Re: Gross overestimate pozorvlak March 23 2011, 11:01:56 UTC
I don't think decoding time should be accounted for in a bandwidth calculation (or do you count the time to play a video file as part of the time to transfer it?), but your point about transmission losses is well taken. At best you get 750MB/3TB = 0.02% data transferred :-)
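As a quick sanity check on that ratio, here is a small sketch assuming decimal units (1MB = 10^6 bytes, 1TB = 10^12 bytes) and the round figures of 750MB decoded out of 3TB sent. The function name is hypothetical.

```python
MB = 10**6   # assumption: decimal megabytes
TB = 10**12  # assumption: decimal terabytes

def transfer_efficiency(decoded_bytes, sent_bytes):
    """Fraction of the transmitted data that actually gets decoded."""
    return decoded_bytes / sent_bytes

efficiency = transfer_efficiency(750 * MB, 3 * TB)
print(f"{efficiency:.4%} of the data transferred is ever decoded")
```

With these round numbers the ratio is 0.00025, i.e. 0.025%; using the 3.3TB total mentioned earlier in the thread, or binary units, nudges it closer to the 0.02% quoted above. Either way, the losses are well over 99.9%.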

