Data Storage and DNA
Name: Craig R. G.
Date: July 2003
I am trying to figure out how much data can be stored in the human chromosomes in
order to get a firmer grasp of evolutionary theory.
I would like some input, if you would be so kind.
Textbooks I have read have put so much stress on the number of different COMBINATIONS
supplied by DNA and by the chromosome system. They emphasize that such a system provides
an enormous diversity, and thus it is valid. Oftentimes the number of combinations is used
to explain data storage as well, but is that the case?
I have read that a nanometer of DNA holds one bit of information. The longest DNA strand you
can get is 2 meters long.
Therefore, a chromosome holds 2x10^9 bits of information, or 200
megabytes of information, thereabouts. Multiply this by 26 chromosomes (right? I am rusty
on my biology) and you get 5200 megabytes of information, max, in an embryo, right? Or would
you use a combination method? And are my figures right? Any suggestions?
From a certain standpoint, there is NO limit to the amount of information that can be stored. This may
sound strange, but the number of different antibodies that can be generated from
a small number of genes is at least 5 billion. This is because each gene is a
stable code of bases which can then be modified in the various steps toward making the gene's
products: proteins. It is rather like asking how much information can be stored in a
hundred-page symphony score. One might think there are only so many notes that could
fit, but the order of the notes, the various accents on the notes, and even the variety of
interpretations and ways the orchestra could render the score into music are infinite.
Genes make up less than 2% of the base sequence in human DNA. There are somewhere between
30,000-40,000 genes with an average length of 3,000 bases/gene. Remember that the "non-coding
DNA" can have all sorts of information and can play a major role in gene regulation.
On a philosophical note, one might argue that bases, in and of themselves, are not
information; what they MEAN because of their sequence, and what they can produce, is.
There are 3 x 10^9 base pairs in the human genome, times two for the duplicate chromosomes
[with minor changes to the number to account for X and Y]. Each base pair counts as a bit in your analogy,
so you can calculate the megabytes from that.
The total length of the chromosomes together is then about 3 meters.
Jeanine M. Durdik
Associate Professor Biological Sciences
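The megabyte calculation suggested in the answer above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 3 x 10^9 base pairs and the doubling for duplicate chromosomes come from the answer, the one-bit-per-base-pair figure is the questioner's analogy, and the two-bits-per-base-pair case is an added assumption (four possible bases, so log2(4) = 2 bits). The function and variable names are hypothetical, and a megabyte is taken as 10^6 bytes.

```python
import math

# Figures taken from the answer above.
HAPLOID_BASE_PAIRS = 3_000_000_000   # 3 x 10^9 base pairs in one set of chromosomes
COPIES = 2                           # "times two for the duplicate chromosomes"


def storage_megabytes(base_pairs, bits_per_base_pair):
    """Convert a base-pair count to megabytes (1 MB = 10^6 bytes, 8 bits per byte)."""
    bits = base_pairs * bits_per_base_pair
    return bits / 8 / 1_000_000


diploid_base_pairs = HAPLOID_BASE_PAIRS * COPIES

# The questioner's analogy: one bit per base pair.
mb_one_bit = storage_megabytes(diploid_base_pairs, 1)

# Assumption added here: four possible bases allow log2(4) = 2 bits per base pair.
mb_two_bits = storage_megabytes(diploid_base_pairs, math.log2(4))

print(f"1 bit per base pair:  {mb_one_bit:.0f} MB")
print(f"2 bits per base pair: {mb_two_bits:.0f} MB")
```

Under these assumptions the estimate comes out at 750 MB for one bit per base pair and 1500 MB for two. As the other answers point out, a raw count like this says nothing about what the sequence means or how it is used.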
I am not sure about a nanometer being the size of a base, but not all chromosomes are the same
size. They are numbered, with #1 being the largest and #22 the smallest. In females
the X is about 4x as big as the Y, and there is also DNA in our mitochondria. So I would say
you cannot multiply by 26, because that assumes the same amount of DNA on each chromosome.
Update: June 2012