3.2 Why are genomes so large?

Key terms defined in this section
C-value is the quantity of DNA in the genome (per haploid set of chromosomes).
Nonrepetitive DNA shows reassociation kinetics expected of unique sequences.
Repetitive DNA behaves in a reassociation reaction as though many (related or identical) sequences are present in a component, allowing any pair of complementary sequences to reassociate.

The total amount of DNA in the (haploid) genome is a characteristic of each living species known as its C-value. There is enormous variation in the range of C-values, from <10^6 bp for a mycoplasma to >10^11 bp for some plants and amphibians.




Figure 3.1 DNA content of the haploid genome is related to the morphological complexity of lower eukaryotes, but varies extensively among the higher eukaryotes. The range of DNA values within a phylum is indicated by the shaded area.

Figure 3.1 summarizes the range of C-values found in different evolutionary phyla. There is an increase in the minimum genome size found in each group as the complexity increases. But as absolute amounts of DNA increase in the higher eukaryotes, we see some wide variations in the genome sizes within some phyla.




Figure 3.2 The minimum genome size found in each phylum increases from prokaryotes to mammals.

Figure 3.2 plots the minimum amount of DNA required for a member of each group, and suggests that an increase in genome size is required to make more complex prokaryotes and lower eukaryotes.


Mycoplasma are the smallest prokaryotes, and have genomes only ~3× the size of a large bacteriophage. Bacteria start at ~2 × 10^6 bp. Unicellular eukaryotes (whose lifestyles may resemble the prokaryotic) get by with genomes that are also small, although larger than those of the bacteria. Being eukaryotic per se does not imply a vast increase in genome size; a yeast may have a genome size of ~1.3 × 10^7 bp, only about twice the size of the largest bacterial genomes.


A further twofold increase in genome size is adequate to support the slime mold D. discoideum, able to live in either unicellular or multicellular modes. Another increase in complexity is necessary to produce the first fully multicellular organisms; the nematode worm C. elegans has a DNA content of 8 × 10^7 bp.




Figure 3.3 The genome sizes of some common experimental animals.

We can also see the steady increase in genome size with complexity in the listing in Figure 3.3 of some of the most commonly analyzed organisms. It is necessary to increase the genome size in order to make insects, birds or amphibians, and mammals. As we climb the evolutionary tree, however, the relationship between complexity of the organism and content of DNA becomes obscure.


The C-value paradox refers to the lack of a correlation between genome size and genetic complexity. We know that genes are much larger than the sequences needed to code for proteins, because exons (coding regions) may comprise only a small part of the total length of a gene. This explains why there is much more DNA than is needed to provide reading frames for all the proteins of the organism. Large parts of an interrupted gene may not be concerned with coding for protein. And there may also be significant lengths of DNA between genes. So it is not possible to deduce from the overall size of the genome anything about the number of genes.


There are some extremely curious variations in relative genome size. The toad Xenopus and man have genomes of essentially the same size. But we assume that man is more complex in terms of genetic development! And in some phyla there are extremely large variations in DNA content between organisms that do not vary much in complexity (see Figure 3.1). (This is especially marked in insects, amphibians, and plants, but does not occur in birds, reptiles, and mammals, which all show little variation within the group, with an ~2× range of genome sizes.) A cricket has a genome 11× the size of a fruit fly. In amphibians, the smallest genomes are <10^9 bp, while the largest are ~10^11 bp. There is unlikely to be a large difference in the number of genes needed to specify these amphibians. We do not understand why natural selection allows this variation and whether it has evolutionary consequences.
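The within-group spreads quoted above are easier to compare on a logarithmic scale. The following is a minimal sketch in Python, using only the figures given in the text; the function name is hypothetical, and the amphibian bounds are the rounded values quoted above (the smallest genomes are actually somewhat below 10^9 bp, so the true span is slightly larger).

```python
import math

def orders_of_magnitude(fold_range):
    """Convert a fold range (largest / smallest genome) to orders of magnitude."""
    return math.log10(fold_range)

# Fold ranges taken from the text; amphibian bounds are rounded approximations.
amphibian_fold = 1e11 / 1e9      # largest ~10^11 bp vs smallest <10^9 bp
cricket_vs_fly_fold = 11.0       # cricket genome is 11x the fruit fly genome
bird_reptile_mammal_fold = 2.0   # ~2x range within each of these groups

amphibian_span = orders_of_magnitude(amphibian_fold)
insect_span = orders_of_magnitude(cricket_vs_fly_fold)
mammal_span = orders_of_magnitude(bird_reptile_mammal_fold)

print(f"amphibians: ~{amphibian_fold:.0f}-fold ({amphibian_span:.2f} orders of magnitude)")
print(f"cricket vs fruit fly: {cricket_vs_fly_fold:.0f}-fold ({insect_span:.2f} orders)")
print(f"birds, reptiles, mammals: ~{bird_reptile_mammal_fold:.0f}-fold ({mammal_span:.2f} orders)")
```

On these figures the amphibian range alone covers about two orders of magnitude, against roughly a third of an order of magnitude within birds, reptiles, or mammals.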


Do larger genomes contain a greater number of different genes or instead contain more copies of the same genes that are present in smaller genomes? If the diversity of genes increases with genome size, we should expect the number of unique DNA sequences in the genome to increase. This will not happen if there are simply more copies of the same genes.


These questions will in due course be answered directly by genome sequences, but at present we have direct information for only a few individual (and relatively small) genomes. However, the general nature of the eukaryotic genome can be assessed by the kinetics of reassociation of denatured DNA. This technique was used extensively before large-scale DNA sequencing became possible (for review see the supplement on DNA reassociation kinetics).
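As a rough illustration of why reassociation kinetics can distinguish sequence components, the sketch below uses the standard ideal second-order form, in which the fraction of DNA remaining single-stranded is 1 / (1 + k × Cot), where Cot is the product of initial DNA concentration and time. This equation is not quoted in this section (it belongs to the supplement referred to above), and the rate constants are hypothetical values chosen only to show the contrast between a fast and a slow component.

```python
def fraction_single_stranded(cot, k):
    """Fraction of DNA still single-stranded at a given Cot (C0 * t).

    Assumes ideal second-order kinetics: C / C0 = 1 / (1 + k * Cot).
    """
    return 1.0 / (1.0 + k * cot)

def cot_half(k):
    """Cot at which half of the DNA has reassociated; equals 1 / k."""
    return 1.0 / k

# Hypothetical rate constants, for illustration only (not measured values).
k_repetitive = 100.0   # many copies per genome: partners are found quickly
k_unique = 0.01        # one copy per genome: partners are found slowly

for label, k in [("repetitive", k_repetitive), ("nonrepetitive", k_unique)]:
    print(f"{label}: Cot1/2 = {cot_half(k):g}")

# At an intermediate Cot the two components are well separated.
cot = 1.0
fast_remaining = fraction_single_stranded(cot, k_repetitive)
slow_remaining = fraction_single_stranded(cot, k_unique)
print(f"at Cot = {cot:g}: repetitive {fast_remaining:.3f}, nonrepetitive {slow_remaining:.3f} single-stranded")
```

Under these assumptions a component that reassociates at low Cot behaves as repetitive, and one that reassociates only at high Cot behaves as unique.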


Reassociation kinetics identify two general types of genomic sequences. Nonrepetitive DNA consists of sequences that are unique: there is only one copy in a haploid genome. Repetitive DNA describes sequences that are present in more than one copy in each genome. Repetitive DNA is often classed into two general types. Moderately repetitive DNA consists of relatively short sequences that are repeated typically 10-1000× in the genome. The sequences are dispersed throughout the genome, and are responsible for the high degree of secondary structure formation in pre-mRNA, when (inverted) repeats in the introns pair to form duplex regions. Highly repetitive DNA consists of very short sequences (typically <100 bp) that are present many thousands of times in the genome, often organized as long tandem repeats (see 4 Clusters and repeats). Neither class represents protein.
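The three classes can be summarized as a simple rule on copy number per haploid genome. This is a sketch only: the text gives typical ranges rather than sharp boundaries, so the single cutoff used here (1000 copies, the top of the "typically 10-1000" range) is an assumption, as is the decision to treat every repeated sequence at or below that cutoff as moderately repetitive. The function name is hypothetical, and sequence length and organization, which the text also uses to distinguish the classes, are ignored.

```python
def classify_by_copy_number(copies, moderate_max=1000):
    """Classify a sequence by its copy number per haploid genome.

    `moderate_max` is an assumed cutoff; the text gives only typical ranges.
    """
    if copies < 1:
        raise ValueError("copy number must be at least 1")
    if copies == 1:
        return "nonrepetitive"
    if copies <= moderate_max:
        return "moderately repetitive"
    return "highly repetitive"

# Illustrative copy numbers (not measurements from any particular genome).
for copies in (1, 300, 50000):
    print(copies, "->", classify_by_copy_number(copies))
```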




Figure 3.4 The proportions of different sequence components vary in eukaryotic genomes. The absolute content of nonrepetitive DNA increases with genome size, but reaches a plateau at ~2 × 10^9 bp.

The proportion of the genome occupied by nonrepetitive DNA varies widely. Figure 3.4 summarizes the genome organization of some representative organisms. Prokaryotes, of course, contain only nonrepetitive DNA. For lower eukaryotes, most of the DNA is nonrepetitive; <20% falls into one or more moderately repetitive components. In animal cells, up to half of the DNA often is occupied by moderately and highly repetitive components. In plants and amphibians, the nonrepetitive DNA may be reduced to a minority of the genome, with the moderately and highly repetitive components accounting for up to 80%.


The length of the nonrepetitive DNA component tends to increase with overall genome size, as we proceed up to a total genome size of ~3 × 10^9 bp (characteristic of mammals). Further increase in genome size, however, generally reflects an increase in the amount and proportion of the repetitive components, so that it is rare for an organism to have a nonrepetitive DNA component >2 × 10^9 bp. The nonrepetitive DNA content of genomes therefore accords better with our sense of the relative complexity of the organism. E. coli has 4.2 × 10^6 bp, C. elegans increases an order of magnitude to 6.6 × 10^7 bp, D. melanogaster increases further to ~10^8 bp, and mammals increase another order of magnitude to ~2 × 10^9 bp.
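The stepwise increases in nonrepetitive DNA can be checked with a few lines of arithmetic. The sketch below uses only the values quoted in this paragraph; the variable and function names are hypothetical, and the mammalian figures are the approximate values given above.

```python
# Nonrepetitive DNA content in bp, as quoted in the text (approximate values).
NONREPETITIVE_BP = [
    ("E. coli", 4.2e6),
    ("C. elegans", 6.6e7),
    ("D. melanogaster", 1e8),
    ("mammals", 2e9),
]

def stepwise_fold_increases(entries):
    """Return (organism, fold increase over the previous entry) for each step."""
    return [
        (later_name, later_bp / earlier_bp)
        for (_, earlier_bp), (later_name, later_bp) in zip(entries, entries[1:])
    ]

folds = stepwise_fold_increases(NONREPETITIVE_BP)
for name, fold in folds:
    print(f"{name}: {fold:.1f}x the previous entry")

# Share of a typical mammalian genome (~3 x 10^9 bp) that is nonrepetitive.
mammal_nonrepetitive_fraction = 2e9 / 3e9
print(f"mammalian nonrepetitive fraction: {mammal_nonrepetitive_fraction:.0%}")
```

On these numbers the nonrepetitive component grows by roughly 16-fold from E. coli to C. elegans and by about 20-fold from D. melanogaster to mammals, while about two thirds of a mammalian genome is nonrepetitive.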


What type of DNA corresponds to protein-coding genes? Reassociation kinetics typically show that mRNA is derived from nonrepetitive DNA. More detailed analysis based on genomic sequences shows that many exons have related sequences in other exons (see 2 From genes to genomes). Such exons evolve by a duplication to give copies that initially are identical, but which then diverge in sequence during evolution.


This section updated 2-29-2000




Genes VII
ISBN: B000R0CSVM
Year: 2005
Pages: 382
