Molecular evolution

Molecular evolution is the change, from generation to generation, in the base sequences of DNA or RNA molecules and in the amino acid sequences and molecular configuration of proteins.

It is possible to measure differences between these molecules obtained from different organisms (such as humans, apes, monkeys, prosimians etc.) on a unit scale of amino acids or nucleotides and demonstrate their relationships. As the molecular sequences are heritable, their variations produce molecular records that have been transferred from generation to generation during evolution.

A codon is a triplet of three consecutive nucleotides. A codon changes if any one of its three bases changes, and the change may or may not alter the amino acid that the codon specifies. The majority of these changes are small and inconsequential, but they accumulate over long periods and bring about large alterations in gene frequencies in populations. Two kinds of change are possible:

Silent site substitution: These are changes in the DNA sequence that do not change the amino acid specified, so the composition of the protein is unchanged. They usually occur in the last base of the codon. For example, in an mRNA strand GCA codes for alanine; if adenine is replaced by guanine, the resulting GCG still codes for alanine. Silent site substitutions do not bring about any phenotypic changes.

Replacement substitution: These are changes in the bases of codons that cause a different amino acid to be incorporated. They can alter the structure of the protein that the gene encodes and thus change the phenotype.
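The distinction between the two kinds of substitution can be checked mechanically against the standard genetic code. The following is a minimal sketch, not a definitive tool: the names CODON_TABLE and classify_substitution are introduced here for illustration, and the codon GAA in the second call is an added example that does not come from the text above. A change to or from a stop codon (shown as "*") is simply reported as a replacement.

```python
# Standard genetic code, built from the conventional UCAG ordering of mRNA bases.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_substitution(codon_before: str, codon_after: str) -> str:
    """Return 'silent' or 'replacement' for a change between two mRNA codons."""
    before = CODON_TABLE[codon_before.upper()]
    after = CODON_TABLE[codon_after.upper()]
    # Same amino acid (one-letter code) means the protein is unchanged.
    return "silent" if before == after else "replacement"


print(classify_substitution("GCA", "GCG"))  # alanine -> alanine
print(classify_substitution("GCA", "GAA"))  # alanine -> glutamate
```

The first call reproduces the GCA to GCG example from the text; the second changes the middle base and therefore the amino acid.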

Silent site substitutions accumulate at a much higher rate than replacement substitutions, because the former do not produce changes that are exposed to natural selection whereas the latter do. For the same reason, genes that are less vital to the cell can change rapidly by replacement substitution without showing harmful effects. Pseudogenes, which are duplicated sequences that do not code for proteins and hence are not exposed to natural selection, are known to undergo evolutionary change at a higher rate.

Sequencing amino acids: Comparing the amino acid sequence of a protein across species by biochemical techniques is one of the most popular methods of determining phylogeny. For example, in haemoglobin two alpha and two beta polypeptide chains form a tetramer, and the chains differ in amino acid sequence from species to species. In vertebrates different types of globin chain appeared during evolution, and in each species they followed their own evolutionary path through changes in their amino acid sequences. They are all variations of a single ancestral globin, encoded by similar globin genes that are believed to have originated by duplication of the original gene.
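A simple way to put such comparisons on a numerical scale is to count the proportion of aligned positions at which two sequences differ. The sketch below assumes the sequences are already aligned to equal length; the three fragments are hypothetical strings made up for illustration, not real globin data, and the function name p_distance is an assumption of this example.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    differences = sum(a != b for a, b in zip(seq_a, seq_b))
    return differences / len(seq_a)


# Hypothetical aligned fragments (one-letter amino acid codes), for illustration only.
fragments = {
    "species_A": "VLSPADKTNV",
    "species_B": "VLSPADKTNI",
    "species_C": "VLSAADKSNI",
}

# Print the distance for every pair of species.
for name_1, name_2 in combinations(fragments, 2):
    d = p_distance(fragments[name_1], fragments[name_2])
    print(f"{name_1} vs {name_2}: {d:.2f}")
```

Pairs with smaller distances would be read as more closely related, which is the reasoning behind building a phylogeny from sequence differences.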

Neutral Theory Of Molecular Evolution

Motoo Kimura (1968) proposed that the vast majority of base substitutions that are preserved in a population are neutral with regard to natural selection. Positive substitutions are so rare that they are inconsequential in molecular evolution, while negative changes are quickly eliminated by natural selection. Neutral changes, which spread by random genetic drift rather than by selection, therefore determine the overall rate of sequence evolution. For instance, pseudogenes have the highest substitution rate among genes, yet the changes in them are completely neutral with regard to selection.

The theory was tested by J. McDonald and M. Kreitman (1991), who compared base sequences of the alcohol dehydrogenase gene in Drosophila melanogaster, D. simulans and D. yakuba.

Kimura’s theory not only contradicts classical Darwinism but also fails to explain the fixation of various types of alleles in populations of different sizes. The theory holds that the rate of fixation of neutral mutations does not depend on population size, and that such mutations are fixed or eliminated by genetic drift rather than by selection.
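The independence from population size follows from a standard textbook argument, sketched here under stated assumptions: a diploid population of N individuals carries 2N copies of a gene, new neutral mutations arise at a rate mu per gene copy per generation, and each new neutral mutation has a probability 1/(2N) of eventually being fixed by drift. The value of mu and the function name are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def neutral_substitution_rate(population_size: int, mutation_rate: Fraction) -> Fraction:
    """Expected neutral substitutions per generation in a diploid population."""
    gene_copies = 2 * population_size                 # 2N copies of the gene
    new_mutations = gene_copies * mutation_rate       # new neutral mutations per generation
    fixation_probability = Fraction(1, gene_copies)   # chance that drift fixes any one of them
    return new_mutations * fixation_probability


mu = Fraction(1, 1_000_000)  # hypothetical neutral mutation rate per gene copy per generation
for n in (100, 10_000, 1_000_000):
    print(n, neutral_substitution_rate(n, mu))
```

The 2N terms cancel, so the substitution rate equals the mutation rate whatever the population size. A larger population produces more mutations, but each one is proportionally less likely to drift to fixation.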

The neutral theory provides a theoretical framework for testing and predicting molecular evolution in the absence of positive selection.

The Molecular Clock

E. Zuckerkandl and Linus Pauling (1962) found that the rate of divergence in the amino acid sequences of haemoglobin and cytochrome c between different pairs of mammalian species remained constant. In other words, changes in the base sequences of DNA, and the resulting amino acid substitutions, accumulate in a population with clock-like regularity over time and hence can be used to date branching (cladogenetic) events. This is called the molecular clock. Before it can be put to use, it must be calibrated by matching the observed genetic divergence of living populations against the absolute time of divergence revealed by the fossil record. Differentiation in the haemolymph proteins of Hawaiian drosophilids gave an idea of the splitting of phyletic lines and of the colonization of the Hawaiian Islands by these flies from North America about 40 million years ago.

Sarich & Wilson (1967) used this method to date the divergence of hominids from apes. They calibrated the amount of molecular differentiation between the two groups against time, using the divergence of Old World and New World monkeys as the reference. The measured divergence in albumin placed the split of hominids and apes at 5 million years before the present, which is supported by other evidence.
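The calibration step amounts to simple arithmetic, sketched below under the assumption of a strictly constant rate. All the numbers are hypothetical and are not the values used by Sarich & Wilson; the function names are likewise introduced only for this illustration. Divergence is expressed as a pairwise proportion and time in millions of years.

```python
def calibrate_rate(divergence: float, split_time_my: float) -> float:
    """Pairwise divergence accumulated per million years, from a dated split."""
    return divergence / split_time_my


def estimate_split_time(divergence: float, rate_per_my: float) -> float:
    """Time since two lineages split, in million years, assuming a constant rate."""
    return divergence / rate_per_my


# Hypothetical calibration pair: fossils date the split, molecules give the divergence.
rate = calibrate_rate(divergence=0.30, split_time_my=30.0)

# Hypothetical pair of lineages with no fossil date of their own.
print(round(estimate_split_time(divergence=0.05, rate_per_my=rate), 2))
```

The estimate is only as good as the calibration point and the assumption that the same rate applies in both pairs of lineages.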

The following three types of changes are considered for the molecular clock.

  • Substitutions at the third position of a codon, since they are expected to be neutral.
  • Changes in pseudogenes, which are not exposed to natural selection and hence are likely to give better results.
  • Where natural selection is very strong, only neutral substitutions are likely to be fixed in a population, so only these should be considered.
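For the first type of change, the third-position sites have to be picked out of a coding sequence before differences are counted. A minimal sketch follows, assuming two aligned, in-frame coding sequences of equal length; the sequences and the function name are hypothetical and serve only as an illustration.

```python
from typing import Tuple


def third_position_differences(cds_a: str, cds_b: str) -> Tuple[int, int]:
    """Count differences at third codon positions of two aligned coding sequences.

    Returns (differences, sites_compared).
    """
    if len(cds_a) != len(cds_b) or len(cds_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, and of equal length")
    third_a = cds_a[2::3]  # every third base, starting at codon position 3
    third_b = cds_b[2::3]
    differences = sum(a != b for a, b in zip(third_a, third_b))
    return differences, len(third_a)


# Hypothetical aligned, in-frame mRNA fragments, for illustration only.
seq_1 = "GCAGGUCUAAAA"  # codons GCA GGU CUA AAA
seq_2 = "GCGGGUCUGAAG"  # codons GCG GGU CUG AAG
print(third_position_differences(seq_1, seq_2))
```

Not every third-position change is silent, so a stricter analysis would also check each changed codon against the genetic code.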

Changes in mitochondrial DNA (mtDNA) accumulate linearly and at a constant rate, like the ticking of a clock, and hence are commonly used for the molecular clock. In mammalian mtDNA the divergence is about 2% per million years. In sea urchins the rate is estimated at 1.8-2.2% per million years, remarkably similar to that in mammals. In sharks, which have a reliable molecular clock owing to a well-documented fossil record, the rate of change is estimated to be 8 times slower than in mammals. Therefore, there is no universal molecular clock, even for animal mtDNA.
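The practical consequence of lineage-specific rates can be seen by dating the same observed divergence with two different clocks. In the sketch below the two rates come from the figures quoted above; the observed divergence of 4% is a hypothetical value, and the function name is an assumption of this example.

```python
# Pairwise mtDNA divergence rates in percent per million years, taken from the text.
MAMMAL_RATE = 2.0
SHARK_RATE = MAMMAL_RATE / 8  # "8 times slower than in mammals"


def split_time_my(divergence_percent: float, rate_percent_per_my: float) -> float:
    """Time since divergence in million years, assuming a constant rate."""
    return divergence_percent / rate_percent_per_my


observed = 4.0  # hypothetical observed divergence, in percent
print("mammal clock:", split_time_my(observed, MAMMAL_RATE), "million years")
print("shark clock:", split_time_my(observed, SHARK_RATE), "million years")
```

The same divergence gives a date eight times older on the shark clock than on the mammal clock, which is why a clock calibrated in one group cannot simply be applied to another.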


Although the molecular clock is useful, it has serious drawbacks. Substitution rates vary among genes and among species because of differences in generation time and mutation rate. The same molecule may evolve at different rates in different evolutionary lines. Also, regulatory genes, introns, transposons and gene families may deviate considerably in their rates of divergence. Therefore, the clock might work in one lineage but not in others, or might run at different rates in different lineages.


Duplicate genes are produced by irregular or unequal crossing over and produce similar phenotypic expression but without cumulative effect. When duplicated genes diverge slightly in their function, they form gene families that become important sources of variations. Genes of a family share the following characteristics:

  • They originate by duplication of an existing gene due to unequal crossing over.
  • They show structural homology with one another.
  • They produce a distinct effect, but one that is related to that of the ancestral gene.

The human haemoglobin gene cluster is a gene family, or multigene family, consisting of two tandem duplicate alpha genes and 7 beta genes. Globin genes in mammals form an excellent example of gene families. The immunoglobulin gene family is a branching lineage of duplicated genes and T-cell receptor genes that produce very specific reactions against the huge diversity of viruses and bacteria that invade the body.

The genes for ribosomal RNA (rRNA), tRNA and histones are also examples of gene families, but their coding sequences are identical and they produce the same effect. This is because their products are required in large quantities in the cell.

Genes coding for histones exist in tandem clusters of over 100 copies each in humans and up to 1000 copies in sea urchins. They all line up in sequence along the chromosome and form gene families. The gene ancestral to the modern haemoglobin genes is believed to have duplicated about 350 million years ago. 

Unequal crossing over is probably the primary source of gene families, whose members are known to evolve in concert. Gene conversion is another source of gene families; it occurs between homologous chromatids when cross-over products are being repaired.