2.10 How did interrupted genes evolve?

Key terms defined in this section
Superfamily is a set of genes all related by presumed descent from a common ancestor, but now showing considerable variation.

What was the original form of genes that today are interrupted?



  • The "introns early" model supposes that introns have always been an integral part of the gene. Genes originated as interrupted structures, and those without introns have lost them in the course of evolution.
  • The "introns late" model supposes that the ancestral protein-coding units consisted of uninterrupted sequences of DNA. Introns were subsequently inserted into them.

A test of the models is to ask whether the difference between eukaryotic and prokaryotic genes can be accounted for by the acquisition of introns in the eukaryotes or by the loss of introns from the prokaryotes.


The introns early model suggests that the mosaic structure of genes is a remnant of an ancient approach to the reconstruction of genes to make novel proteins. Suppose that an early cell had a number of separate protein-coding sequences. One aspect of its evolution is likely to have been the reorganization and juxtaposition of different polypeptide units to build up new proteins.


If the protein-coding unit must be a continuous series of codons, every such reconstruction would require a precise recombination of DNA to place the two protein-coding units in register, end to end in the same reading frame. Furthermore, if this combination is not successful, the cell has been damaged, because it has lost the original protein-coding units.


But if an approximate recombination of DNA could place the two protein-coding units within the same transcription unit, splicing patterns could be tried out at the level of RNA to combine the two proteins into a single polypeptide chain. And if these combinations are not successful, the original protein-coding units remain available for further trials. Such an approach essentially allows the cell to try out controlled deletions in RNA without suffering the damaging instability that could occur from applying this procedure to DNA.
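
The difference between the two routes can be sketched with toy sequences. The following Python fragment is only an illustration: the sequences, the spacer, and the helper names (`codons`, `reads_in_frame`, `splice`) are invented for this purpose, and the check considers the reading frame alone, not stop codons or splicing signals.

```python
def codons(seq):
    """Split a sequence into complete triplets, ignoring leftover bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def reads_in_frame(fused, unit_b):
    """True if the codons of unit_b are read intact at the end of `fused`."""
    expected = codons(unit_b)
    return codons(fused)[-len(expected):] == expected


def splice(transcript, start, end):
    """Remove transcript[start:end], as splicing removes an intron from RNA."""
    return transcript[:start] + transcript[end:]


# Hypothetical protein-coding units, three codons each.
# Sequences are written in the DNA alphabet throughout for simplicity.
unit_a = "ATGGCTGAA"
unit_b = "GGTCATTGC"

# An approximate recombination leaves 5 arbitrary bases between the units.
spacer = "TTAGC"
dna = unit_a + spacer + unit_b

# Read straight through, unit B falls out of frame.
direct_ok = reads_in_frame(dna, unit_b)

# Removing the spacer at the RNA level restores the frame; the DNA is untouched.
rna = splice(dna, len(unit_a), len(unit_a) + len(spacer))
spliced_ok = reads_in_frame(rna, unit_b)

print(direct_ok, spliced_ok)
```

In this sketch the direct read is out of frame and the spliced read is in frame, while `dna` still contains both original units, so a failed splicing trial costs nothing at the level of the genome.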


If current proteins evolved by combining ancestral proteins that were originally separate, the accretion of units is likely to have occurred sequentially over some period of time, with one exon added at a time. Can the different functions from which these genes were pieced together be seen in their present structures? In other words, can we equate particular functions of current proteins with individual exons (for review see Blake, 1985)?




Figure 2.29 Immunoglobulin light chains and heavy chains are coded by genes whose structures (in their expressed forms) correspond with the distinct domains in the protein. Each protein domain corresponds to an exon; introns are numbered 1-5.

In some cases, there is a clear relationship between the structures of the gene and protein. The example par excellence is provided by the immunoglobulin proteins, which are coded by genes in which every exon corresponds exactly with a known functional domain of the protein. Figure 2.29 compares the structure of an immunoglobulin with its gene.


An immunoglobulin is a tetramer of two light chains and two heavy chains, which aggregate to generate a protein with several distinct domains. Light chains and heavy chains differ in structure, and there are several types of heavy chain. Each type of chain is expressed from a gene that has a series of exons corresponding with the structural domains of the protein.


In many instances, some of the exons of a gene can be identified with particular functions. In secretory proteins, the first exon, coding for the N-terminal region of the polypeptide, often specifies the signal sequence involved in membrane secretion. An example is insulin.


Sometimes the evolution of a gene involves the duplication of exons, creating an internally repetitious sequence in the protein. In chicken collagen, a 54 bp exon appears to have been multiplied many times, generating a series of exons that are either 54 bp or multiples of 54 bp in length.
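
The arithmetic behind this observation is simple to check. The sketch below assumes a list of exon lengths that is purely hypothetical (it is not taken from the chicken collagen gene) and asks whether each length is an exact multiple of the 54 bp unit; the function name `repeat_units` is likewise an invention for this example.

```python
UNIT = 54  # bp, the repeat length given in the text

# Hypothetical exon lengths in bp, for illustration only.
exon_lengths = [54, 108, 54, 162, 54, 108]


def repeat_units(length, unit=UNIT):
    """Return the number of whole units in `length`, or None if not an exact multiple."""
    quotient, remainder = divmod(length, unit)
    return quotient if remainder == 0 else None


copies = [repeat_units(n) for n in exon_lengths]

# A 54 bp unit is a whole number of codons, so duplicating it
# never disturbs the reading frame.
codons_per_unit = UNIT // 3

print(copies, codons_per_unit)
```

Because 54 is divisible by 3, any exon built from whole copies of the unit codes for a whole number of amino acids (18 per copy).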




Figure 2.30 The LDL receptor gene consists of 18 exons, some of which are related to EGF precursor and some to the C9 blood complement gene. Triangles mark the positions of introns. Only some of the introns in the region related to EGF precursor are identical in position to those in the EGF gene.

Sequences held in common between genes that are related only in part may represent exons that have migrated or been recruited between genes. Figure 2.30 summarizes the relationship between the receptor for human LDL (plasma low density lipoprotein) and other proteins.


In the center of the LDL receptor gene is a series of exons related to the exons of the gene for the precursor for EGF (epidermal growth factor). In the N-terminal part of the protein, a series of exons codes for a sequence related to the blood protein complement factor C9. So the LDL receptor gene was created by assembling modules for its various functions. These modules are also used in other proteins.


The relationship between exons and protein domains is somewhat erratic in known genes. In some cases there is a clear 1:1 relationship; in others no pattern is to be discerned. One possibility is that removal of introns has fused the adjacent exons. This means that the intron must have been precisely removed, without changing the integrity of the coding region. An alternative is that some introns arose by insertion into a coherent domain; here the difficulty is that we must suppose that the intron carried with it the ability to be spliced out.




Figure 2.20 A special splicing vector is used for exon trapping. If an exon is present in the genomic fragment, its sequence will be recovered in the cytoplasmic RNA, but if the genomic fragment consists solely of an intron,

Exons tend to be fairly small (see Figure 2.20), around the size of the smallest polypeptide that can assume a stable folded structure, ~20-40 residues. Perhaps proteins were originally assembled from rather small modules. Each module need not necessarily correspond to a current function; several modules could have combined to generate a function. The number of exons in a gene tends to increase with the length of its protein, which is consistent with the view that proteins acquire multiple functions by successively adding appropriate modules.


This idea might explain another feature of protein structure: it seems that the sites represented at exon-intron boundaries often are located at the surface of a protein. As modules are added to a protein, the connections, at least of the most recently added modules, could tend to lie at the surface.


A fascinating case of evolutionary conservation is presented by the α- and β-globins and two other proteins related to them. Myoglobin is a monomeric oxygen-binding protein of animals, whose amino acid sequence suggests a common (though ancient) origin with the globin subunits. Leghemoglobins are oxygen-binding proteins present in the legume class of plants; like myoglobin, they are monomeric. They too share a common origin with the other heme-binding proteins. Together, the globins, myoglobin, and leghemoglobin constitute the globin superfamily, a set of gene families all descended from some (distant) common ancestor.




Figure 2.13 All functional globin genes have an interrupted structure with three exons. The lengths indicated in the figure apply to the mammalian β-globin genes.

Both α- and β-globin genes have three exons (see Figure 2.13). The two introns are located at constant positions relative to the coding sequence. The central exon represents the heme-binding domain of the globin chain.


Myoglobin is represented by a single gene in the human genome, whose structure is essentially the same as that of the globin genes. The three-exon structure therefore predates the evolution of separate myoglobin and globin functions.




Figure 2.31 The exon structure of globin genes corresponds with protein function, but leghemoglobin has an extra intron in the central domain.

Leghemoglobin genes contain three introns, the first and last of which occur at points in the coding sequence that are homologous to the locations of the two introns in the globin genes. This remarkable similarity suggests an exceedingly ancient origin for the heme-binding proteins in the form of a split gene, as illustrated in Figure 2.31.


The central intron of leghemoglobin separates two exons that together code for the sequence corresponding to the single central exon in globin. Could the central exon of the globin gene have been derived by a fusion of two central exons in the ancestral gene? Or is the single central exon the ancestral form, in which case an intron must have been inserted into it at the start of plant evolution?




Figure 2.32 The rat insulin gene with one intron evolved by losing an intron from an ancestor with two interruptions.

Cases in which homologous genes differ in structure may provide information about their evolution. An example is insulin. Mammals and birds have only one gene for insulin, except for the rodents, which have two genes. Figure 2.32 illustrates the structures of these genes.


The principle we use in comparing the organization of related genes in different species is that a common feature identifies a structure that predated the evolutionary separation of the two species. In chicken, the single insulin gene has two introns; one of the two rat genes has the same structure. The common structure implies that the ancestral insulin gene had two introns. However, the second rat gene has only one intron. It must have evolved by a gene duplication in rodents that was followed by the precise removal of one intron from one of the copies.
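
The reasoning in this paragraph can be written out as a small set calculation. This is a minimal sketch under stated assumptions: the intron labels "A" and "B" are placeholders, and which intron the one-intron rat gene retains is an arbitrary choice made for the illustration, not something stated in the text.

```python
# Intron content of each gene, keyed by (species, gene copy).
# "A" and "B" are placeholder labels; the intron kept by the one-intron
# rat gene is chosen arbitrarily for this illustration.
genes = {
    ("chicken", "insulin"): {"A", "B"},
    ("rat", "insulin, two-intron copy"): {"A", "B"},
    ("rat", "insulin, one-intron copy"): {"A"},
}

# Step 1: collect the introns seen in at least one gene of each species.
by_species = {}
for (species, _copy), introns in genes.items():
    by_species.setdefault(species, set()).update(introns)

# Step 2: a feature common to both species is taken to predate their separation.
ancestral = set.intersection(*by_species.values())

# Step 3: any ancestral intron missing from a gene is scored as a loss.
losses = {key: ancestral - introns for key, introns in genes.items()}

for (species, copy), lost in losses.items():
    print(species, copy, "lost:", sorted(lost))
```

With these inputs the inferred ancestor has two introns, and only the one-intron rat copy registers a loss, which matches the interpretation of duplication followed by intron removal.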


The organization of some genes shows extensive discrepancies between species. In these cases, there must have been extensive removal or insertion of introns during evolution.


A well characterized case is represented by the actin genes. The typical actin gene has a nontranslated leader of <100 bases, a coding region of ~1200 bases, and a trailer of ~200 bases. Most actin genes are interrupted; the positions of the introns can be aligned with regard to the coding sequence (except for a single intron sometimes found in the leader).




Figure 2.33 Actin genes vary widely in their organization. The sites of introns are indicated in purple; the number identifies the codon interrupted by the intron.

Figure 2.33 shows that almost every actin gene is different in its pattern of interruptions. Taking all the genes together, introns occur at 12 different sites. However, no individual gene has more than 6 introns; some genes have only one intron, and one is uninterrupted altogether. How did this situation arise? If we suppose that the primordial actin gene was interrupted, and all current actin genes are related to it by loss of introns, different introns have been lost in each evolutionary branch. Probably some introns have been lost entirely, so the primordial gene could well have had 20 or more. The alternative is to suppose that a process of intron insertion continued independently in the different lines of evolution. The relationships between the intron locations found in different species may be used ultimately to construct a tree for the evolution of the gene.
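
The bookkeeping behind the intron-loss interpretation can be sketched as follows. The gene names and intron positions are invented for the example and do not reproduce Figure 2.33; the sketch simply assumes that the primordial gene carried every site seen in any current gene, which is a lower bound under the intron-loss view.

```python
from itertools import combinations

# Hypothetical intron positions (codon numbers) for five made-up actin genes.
intron_sites = {
    "gene_1": {10, 50, 120, 200},
    "gene_2": {50, 150, 300},
    "gene_3": {10, 250},
    "gene_4": {330},
    "gene_5": set(),  # an uninterrupted gene
}

# Under the intron-loss view, the primordial gene had at least every observed site.
all_sites = set().union(*intron_sites.values())

# Each gene must then have lost every site it does not carry today.
losses = {gene: all_sites - sites for gene, sites in intron_sites.items()}

# Shared sites between pairs of genes are the raw material for a tree.
shared = {
    (a, b): len(intron_sites[a] & intron_sites[b])
    for a, b in combinations(sorted(intron_sites), 2)
}

print(len(all_sites), max(len(s) for s in intron_sites.values()))
print({gene: len(lost) for gene, lost in losses.items()})
```

Under the alternative view each site would instead be counted as an insertion, and deciding how many independent insertions occurred would require placing the genes on a tree, which this sketch does not attempt.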


The equation of at least some exons with protein domains, and the appearance of related exons in different proteins, leaves no doubt that the duplication and juxtaposition of exons has played an important role in evolution. It is possible that the number of ancestral exons, from which all proteins have been derived by duplication, variation, and recombination, could be relatively small (a few thousand or tens of thousands). By taking exons as the building blocks of evolution, this view implicitly accepts the introns early model for the origin of genes coding for proteins.


The highly interrupted structure of eukaryotic genes suggests a picture of the eukaryotic genome as a sea of introns (mostly but not exclusively unique in sequence), in which islands of exons (sometimes very short) are strung out in individual archipelagoes that constitute genes.


Alternative forms of genes for rRNA and tRNA are sometimes found, with and without introns. In the case of the tRNAs, where all the molecules conform to the same general structure, it seems unlikely that evolution brought together the two regions of the gene. After all, the different regions are involved in the base pairing that gives significance to the structure. So here it must be that the introns were inserted into continuous genes.


Organelle genomes provide some striking connections between the prokaryotic and eukaryotic worlds. Because of many general similarities between mitochondria or chloroplasts and bacteria, it seems likely that the organelles originated by an endosymbiosis in which an early bacterial prototype was inserted into eukaryotic cytoplasm. Yet in contrast with the resemblances to bacteria (for example, as seen in protein or RNA synthesis), some organelle genes possess introns, and therefore resemble eukaryotic nuclear genes.


Introns are found in several chloroplast genes, including some that have homologies with genes of E. coli. This suggests that the endosymbiotic event occurred before introns were lost from the prokaryotic line. If a suitable gene can be found, it may therefore be possible to trace gene lineage back to the period when endosymbiosis occurred.


The mitochondrial genome presents a particularly striking case. The genes of yeast and mammalian mitochondria code for virtually identical mitochondrial proteins, in spite of a considerable difference in gene organization. Vertebrate mitochondrial genomes are very small, with an extremely compact organization of continuous genes, whereas yeast mitochondrial genomes are larger and have some complex interrupted genes. Which is the ancestral form? The yeast mitochondrial introns often have the property of mobility (they are self-contained sequences that can splice out of the RNA and insert DNA copies elsewhere), which suggests that they may have arisen by insertions into the genome (see 16 Retroviruses and retroposons).



Reviews
Blake, C. C. (1985). Exons and the evolution of proteins. Int. Rev. Cytol. 93, 149-185.



Genes VII
ISBN: B000R0CSVM
EAN: N/A
Year: 2005
Pages: 382
