======================================================================
=                             Base pair                              =
======================================================================

                             Introduction
======================================================================
A base pair (bp) is a unit consisting of two nucleobases bound to each
other by hydrogen bonds.  They form the building blocks of the DNA
double helix and contribute to the folded structure of both DNA and
RNA. Dictated by specific hydrogen bonding patterns, Watson-Crick base
pairs (guanine-cytosine and adenine-thymine) allow the DNA helix to
maintain a regular helical structure that is subtly dependent on its
nucleotide sequence. The complementary nature of this base-paired
structure provides a redundant copy of the genetic information encoded
within each strand of DNA. The regular structure and data redundancy
provided by the DNA double helix make DNA well suited to the storage
of genetic information, while base-pairing between DNA and incoming
nucleotides provides the mechanism through which DNA polymerase
replicates DNA and RNA polymerase transcribes DNA into RNA. Many
DNA-binding proteins can recognize specific base-pairing patterns that
identify particular regulatory regions of genes.
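
The redundancy described above can be illustrated with a short Python
sketch. It is only an illustration: the names DNA_COMPLEMENT,
complement and reverse_complement are hypothetical, and the code
assumes an unambiguous DNA alphabet (A, C, G, T) with strands written
in the conventional 5'-to-3' direction.

```python
# Watson-Crick partners for DNA bases (A-T and G-C), as stated in the text.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base complement, read in the same direction."""
    return "".join(DNA_COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand, written 5'-to-3' (assumed convention)."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    original = "ATGCGT"
    partner = reverse_complement(original)
    print(partner)
    # Either strand can be rebuilt from the other, which is the redundancy
    # the paragraph above refers to.
    print(reverse_complement(partner) == original)
```

Because each base has exactly one Watson-Crick partner, applying the
operation twice returns the original strand.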

Intramolecular base pairs can occur within single-stranded nucleic
acids. This is particularly important in RNA molecules (e.g., transfer
RNA), where Watson-Crick base pairs (guanine-cytosine and
adenine-uracil) permit the formation of short double-stranded helices,
and a wide variety of non-Watson-Crick interactions (e.g., G-U or A-A)
allow RNAs to fold into a vast range of specific three-dimensional
structures. In addition, base-pairing between transfer RNA (tRNA) and
messenger RNA (mRNA) forms the basis for the molecular recognition
events that result in the nucleotide sequence of mRNA becoming
translated into the amino acid sequence of proteins via the genetic
code.

The size of an individual gene or an organism's entire genome is often
measured in base pairs because DNA is usually double-stranded. Hence,
the number of total base pairs is equal to the number of nucleotides
in one of the strands (with the exception of non-coding
single-stranded regions of telomeres). The haploid human genome (23
chromosomes) is estimated to be about 3.2 billion bases long and to
contain 20,000-25,000 distinct protein-coding genes. A kilobase (kb)
is a unit of measurement in molecular biology equal to 1000 base pairs
of DNA or RNA. The total number of DNA base pairs on Earth is
estimated at 5.0 × 10^37, with a combined mass of about 50 billion
tonnes. In comparison, the total mass of the biosphere has been
estimated to be as much as 4 TtC (trillion tons of carbon).
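
As a rough sketch of how these length units relate, the following
Python snippet converts a base-pair count into a readable unit. Only
the kilobase is defined in the text; the megabase (Mb) and gigabase
(Gb) constants are the conventional extensions and are included here
as assumptions, and format_length is a hypothetical helper name.

```python
BP_PER_KB = 1_000          # from the text: 1 kb = 1000 base pairs
BP_PER_MB = 1_000_000      # assumed conventional extension (megabase)
BP_PER_GB = 1_000_000_000  # assumed conventional extension (gigabase)


def format_length(bp: int) -> str:
    """Express a length in base pairs using the largest unit that fits."""
    for unit, size in (("Gb", BP_PER_GB), ("Mb", BP_PER_MB), ("kb", BP_PER_KB)):
        if bp >= size:
            return f"{bp / size:g} {unit}"
    return f"{bp} bp"


if __name__ == "__main__":
    # Haploid human genome size quoted in the text (about 3.2 billion bases).
    print(format_length(3_200_000_000))
    print(format_length(1_500))
```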


                    Hydrogen bonding and stability
======================================================================
Top, a G.C base pair with three hydrogen bonds. Bottom, an A.T base
pair with two hydrogen bonds. Non-covalent hydrogen bonds between the
bases are shown as dashed lines. The wiggly lines stand for the
connection to the pentose sugar and point in the direction of the
minor groove.

Hydrogen bonding is the chemical interaction that underlies the
base-pairing rules described above. Appropriate geometrical
correspondence of hydrogen bond donors and acceptors allows only the
"right" pairs to form stably. DNA with high GC-content is more stable
than DNA with low GC-content. Contrary to popular belief, however,
the hydrogen bonds themselves do not stabilize the DNA significantly;
stabilization is mainly due to base-stacking interactions.
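
GC-content, as used above, is simply the fraction of bases in a
sequence that are G or C. A minimal Python sketch follows; the
function name gc_content is hypothetical, and the code assumes the
sequence contains only unambiguous base letters.

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of bases that are G or C (between 0.0 and 1.0)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc = sum(1 for base in seq if base in "GC")
    return gc / len(seq)


if __name__ == "__main__":
    print(gc_content("ATGC"))    # half of the bases are G or C
    print(gc_content("GGCCAT"))  # four of six bases are G or C
```

The value says nothing by itself about why GC-rich DNA is more stable;
it is only the quantity that the comparison in the text refers to.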

The larger nucleobases, adenine and guanine, are members of a class
of double-ringed chemical structures called purines; the smaller
nucleobases, cytosine and thymine (and uracil), are members of a class
of single-ringed chemical structures called pyrimidines. Purines are
complementary only with pyrimidines: pyrimidine-pyrimidine pairings
are energetically unfavorable because the molecules are too far apart
for hydrogen bonding to be established; purine-purine pairings are
energetically unfavorable because the molecules are too close, leading
to overlap repulsion. Purine-pyrimidine base-pairing of AT, GC, or UA
(in RNA) results in proper duplex structure. The only other
purine-pyrimidine pairings would be AC, GT, and UG (in RNA); these
pairings are mismatches because the patterns of hydrogen bond donors
and acceptors do not correspond. The GU pairing, with two hydrogen
bonds, does nevertheless occur fairly often in RNA (see wobble base
pair).
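
The pairing rules in this paragraph can be summarized in a small
lookup, sketched here in Python. The function classify_pair and its
three labels are hypothetical names chosen for illustration; the
classification covers only the pairs named in the text and treats
every other combination as a mismatch, which is a simplification.

```python
# Unordered pairs, so that ("A", "T") and ("T", "A") are treated alike.
WATSON_CRICK = {frozenset("AT"), frozenset("GC"), frozenset("AU")}
WOBBLE = {frozenset("GU")}  # occurs fairly often in RNA, per the text


def classify_pair(base1: str, base2: str) -> str:
    """Label a pair of bases as 'Watson-Crick', 'wobble', or 'mismatch'."""
    pair = frozenset((base1.upper(), base2.upper()))
    if pair in WATSON_CRICK:
        return "Watson-Crick"
    if pair in WOBBLE:
        return "wobble"
    return "mismatch"


if __name__ == "__main__":
    for a, b in [("G", "C"), ("A", "U"), ("G", "U"), ("A", "C"), ("G", "T")]:
        print(a, b, classify_pair(a, b))
```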

Paired DNA and RNA molecules are comparatively stable at room
temperature, but the two nucleotide strands will separate above a
melting point that is determined by the length of the molecules, the
extent of mispairing (if any), and the GC content. Higher GC content
results in higher melting temperatures; it is, therefore, unsurprising
that the genomes of extremophile organisms such as 'Thermus
thermophilus' are particularly GC-rich. Conversely, regions of a