21.6 Homeodomains bind related targets in DNA
The homeobox is a sequence that codes for a domain of 60 amino acids present in proteins of many or even all eukaryotes. Its name derives from its original identification in Drosophila homeotic loci (whose genes determine the identity of body structures). It is present in many of the genes that regulate early development in Drosophila, and a related motif is found in genes in a wide range of higher eukaryotes. It is attractive to think that the homeodomain identifies (or at least is common in) genes concerned with developmental regulation (see 29 Gradients, cascades, and signaling pathways). Sequences related to the homeodomain are found in several types of animal transcription factors, but the further the comparison extends from the original Drosophila homeodomains toward mammalian transcription factors, the weaker the relationship between the conserved regions becomes.
Figure 21.10 The homeodomain may be the sole DNA-binding motif in a transcriptional regulator or may be combined with other motifs. It represents a discrete (60-residue) part of the protein.
In Drosophila homeotic genes, the homeodomain often (but not always) lies close to the C-terminal end. Some examples of genes containing homeoboxes are summarized in Figure 21.10. Often the genes show little sequence conservation outside the homeobox, and the degree of conservation within the homeobox itself varies. A major group of homeobox-containing genes in Drosophila has a well-conserved sequence, with 80 to 90% similarity in pairwise comparisons. Other genes have less closely related homeoboxes. The homeodomain is sometimes combined with other motifs in animal transcription factors. One example is provided by the Oct (octamer-binding) proteins, in which a conserved stretch of 75 amino acids called the POU region lies close to a region resembling the homeodomain. The homeobox sequences of the POU group of proteins are the least closely related to the original group, and thus represent the farthest extension of the family.
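Pairwise comparisons of this kind come down to counting matching positions in aligned sequences. The Python sketch below assumes the two sequences are already aligned with no gaps. It computes percent identity, which is stricter than the similarity scores quoted above, because similarity also credits conservative substitutions. The function name and the toy sequences are hypothetical and are not real homeodomains.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions carrying the same residue.

    Assumes a and b are pre-aligned, gap-free and of equal length.
    Conservative substitutions count as mismatches here, so this is
    identity, not similarity.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# Hypothetical 10-residue toy fragments (one-letter amino-acid codes).
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKM"))  # 90.0
```

A similarity score would substitute a scoring matrix for the simple equality test. The identity version keeps the arithmetic transparent.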
Figure 21.11 The homeodomain of the Antennapedia gene represents the major group of genes containing homeoboxes in Drosophila; engrailed (en) represents another type of homeotic gene; and the mammalian factor Oct-2 represents a distantly related group of transcription factors. The homeodomain is conventionally numbered from 1 to 60. It starts with the N-terminal arm, and the three helical regions occupy residues 10-22, 28-38, and 42-58.
The homeodomain is responsible for binding to DNA, and experiments that swap homeodomains between proteins suggest that the specificity of DNA recognition lies within the homeodomain; but (as with phage repressors) no simple code relating protein and DNA sequences can be deduced. The C-terminal region of the homeodomain shows homology with the helix-turn-helix motif of prokaryotic repressors. Recall from 11 Phage strategies that the λ repressor has a "recognition helix" (α-helix-3) that makes contacts in the major groove of DNA, while the other helix (α-helix-2) lies at an angle across the DNA. The homeodomain can be organized into three potential helical regions; the sequences of three examples are compared in Figure 21.11. The best-conserved part of the sequence lies in the third helix. The main difference between these structures and those of the prokaryotic repressors lies in the length of the helix that recognizes DNA, helix-3, which is 17 amino acids long in the homeodomain, compared with 9 residues in the λ repressor.
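The 17-residue length of helix-3 follows directly from the conventional 1-60 numbering given for Figure 21.11. The small Python check below uses only the helix boundaries stated in that caption and the 9-residue λ repressor value from the text. The dictionary and function names are illustrative, not taken from any source.

```python
# Inclusive helix boundaries in the conventional 1-60 homeodomain numbering
# (values from the Figure 21.11 caption).
HOMEODOMAIN_HELICES = {
    "helix 1": (10, 22),
    "helix 2": (28, 38),
    "helix 3": (42, 58),
}
LAMBDA_HELIX3_LENGTH = 9  # recognition helix of the λ repressor, per the text


def helix_length(start: int, end: int) -> int:
    """Residue count for an inclusive range start..end."""
    return end - start + 1


for name, (start, end) in HOMEODOMAIN_HELICES.items():
    print(f"{name}: residues {start}-{end} -> {helix_length(start, end)} residues")

extra = helix_length(*HOMEODOMAIN_HELICES["helix 3"]) - LAMBDA_HELIX3_LENGTH
print(f"helix 3 is {extra} residues longer than the λ recognition helix")
```

Running it gives lengths of 13, 11 and 17 residues for the three helices, and shows that helix 3 is 8 residues longer than its λ counterpart. The range is inclusive, so the count is end minus start plus one.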
Figure 21.12 Helix 3 of the homeodomain binds in the major groove of DNA, with helices 1 and 2 lying outside the double helix. Helix 3 contacts both the phosphate backbone and specific bases. The N-terminal arm lies in the minor groove, and makes additional contacts.
The crystal structure of the homeodomain of the product of the D. melanogaster engrailed gene is represented schematically in Figure 21.12. Helix 3 binds in the major groove of DNA and makes the majority of the contacts between protein and nucleic acid. Many of the contacts that orient the helix in the major groove are made with the phosphate backbone, so they are not specific for DNA sequence. These backbone contacts lie largely on one face of the double helix, and flank the bases with which specific contacts are made. The remaining contacts are made by the N-terminal arm of the homeodomain, the sequence that immediately precedes the first helix, which projects into the minor groove. So the N-terminal and C-terminal regions of the homeodomain are primarily responsible for contacting DNA (Wolberger et al., 1991).
A striking demonstration of the generality of this model comes from comparing the crystal structure of the engrailed homeodomain with that of the α2 mating protein of yeast. The DNA-binding domain of α2 resembles a homeodomain and forms three similar helices; its structure in the DNA groove can be superimposed almost exactly on that of the engrailed homeodomain. These similarities suggest that all homeodomains bind to DNA in the same manner. If so, a relatively small number of residues in helix-3 and in the N-terminal arm must be responsible for the specificity of contacts with DNA (for review see Gehring et al., 1994).
Figure 29.8 The posterior pathway has two branches, responsible for abdominal development and germ cell formation.
One group of homeodomain-containing proteins is the set of Hox proteins (see Figure 29.8). They bind to DNA with rather low sequence specificity, so it has been puzzling how different Hox proteins achieve distinct specificities. It turns out that Hox proteins often bind to DNA as heterodimers with a partner (called Exd in flies and Pbx in vertebrates). The heterodimer has a more restricted specificity in vitro than an individual Hox protein; typically it binds the 10 bp sequence TGATNNATNN (where N is any base). Still, this is not enough to account for the differences in the specificities of Hox proteins. A third protein, Hth, which is necessary to localize Exd in the nucleus, also forms part of the complex that binds DNA, and may restrict the binding sites further. But because the same partners (Exd and Hth) are present with each Hox protein in the trimeric complex, it remains puzzling how each Hox protein achieves sufficient specificity.
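A consensus such as TGATNNATNN can be located in a DNA sequence by treating each N as a wildcard. The minimal Python sketch below uses the standard `re` module. It scans both strands and reports overlapping hits. The helper names and the test sequences are hypothetical. This plain string match ignores flanking sequence and binding affinity, which is consistent with the point above that the consensus alone cannot explain Hox specificity.

```python
import re

# Hox-Exd/Pbx heterodimer consensus quoted in the text; N means any base.
CONSENSUS = "TGATNNATNN"


def consensus_to_regex(consensus: str) -> "re.Pattern[str]":
    """Compile a consensus into a regex; each N becomes [ACGT].

    Wrapping the pattern in a lookahead lets finditer report
    overlapping matches.
    """
    body = "".join("[ACGT]" if base == "N" else base for base in consensus)
    return re.compile(f"(?=({body}))")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq: str, consensus: str = CONSENSUS) -> list:
    """Return sorted (start, strand, site) tuples for both strands.

    start is the 0-based position of the site's leftmost base on the
    forward strand; site is read 5'->3' on the strand where it matched.
    """
    seq = seq.upper()
    pattern = consensus_to_regex(consensus)
    width = len(consensus)
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    for m in pattern.finditer(rc):
        # Map the reverse-strand start back to forward-strand coordinates.
        hits.append((len(seq) - m.start() - width, "-", m.group(1)))
    return sorted(hits)


# Hypothetical toy sequences, not real regulatory DNA.
print(find_sites("CCTGATTAATGGCC"))  # [(2, '+', 'TGATTAATGG')]
print(find_sites("CCATTAATCA"))      # [(0, '-', 'TGATTAATGG')]
```

The second example contains the site only on the opposite strand. That is why the scan also searches the reverse complement.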
Homeodomain proteins can be either transcriptional activators or repressors. The nature of the factor depends on its other domain(s); the homeodomain is responsible solely for binding to DNA. Both activator and repressor domains act by influencing the basal apparatus. Activator domains may interact with coactivators that in turn bind to components of the basal apparatus. Repressor domains also interact with the transcription apparatus (that is, they do not act simply by blocking access to DNA). The repressor Eve, for example, interacts directly with TFIID (Han et al., 1989).
Reviews
Gehring, W. J. et al. (1994). Homeodomain-DNA recognition. Cell 78, 211-223.
Research
Han, K., Levine, M. S., and Manley, J. L. (1989). Synergistic activation and repression of transcription by Drosophila homeobox proteins. Cell 56, 573-583.
Wolberger, C. et al. (1991). Crystal structure of a MATα2 homeodomain-operator complex suggests a general model for homeodomain-DNA interactions. Cell 67, 517-528.