THE IHE AND BEYOND
By Mike Gene (1/7/03)
I have argued that the genetic code may have been designed [1] to exploit the mutagenic bias that exists as a consequence of cytosine deamination. In my original analysis, I noted that the genetic code uses cytosine deamination to channel mutations such that they sample from a pool of amino acids that is almost exclusively hydrophobic (the IHE, or Increasing Hydrophobicity Effect). Furthermore, this pool is biased toward facilitating secondary structure formation. If coupled with carefully chosen initial proteomes, the potential exists that the first major evolutionary steps subsequent to the originally designed state were rigged such that the mutational bias could tap secondary designs that were front-loaded into the original state. For example, this might mean that something like the evolution of multicellularity (and perhaps more) was designed through this biased mutagenic effect.
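The channeling described above can be checked at the level of individual codons. The following is a minimal Python sketch, not the original analysis: it enumerates every single C-to-T change on the coding strand in the standard genetic code and reports the resulting amino acid substitution. The Kyte-Doolittle hydropathy scale is used here as an assumed measure of hydrophobicity, the function name `c_to_t_missense` is hypothetical, and deamination on the opposite strand (which reads as G-to-A on the coding strand) is deliberately left out.

```python
from itertools import product

# Standard genetic code, codons enumerated with base order T, C, A, G
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Kyte-Doolittle hydropathy values (assumed scale; higher = more hydrophobic)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def c_to_t_missense():
    """Return (codon, mutant, aa_before, aa_after) for each single C->T
    change that alters the amino acid without creating a stop codon."""
    changes = []
    for codon, aa in CODON_TABLE.items():
        for i, base in enumerate(codon):
            if base != "C":
                continue
            mutant = codon[:i] + "T" + codon[i + 1:]
            new_aa = CODON_TABLE[mutant]
            if new_aa != aa and "*" not in (aa, new_aa):
                changes.append((codon, mutant, aa, new_aa))
    return changes


if __name__ == "__main__":
    for codon, mutant, aa, new_aa in c_to_t_missense():
        delta = KD[new_aa] - KD[aa]
        print(f"{codon}->{mutant}  {aa}->{new_aa}  hydropathy change {delta:+.1f}")
```

Running the script lists each substitution with the sign of its hydropathy change, so the direction of the bias can be read off directly. A different hydrophobicity scale could be swapped in by replacing the `KD` dictionary.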
The first step in testing this hypothesis is to determine whether the IHE plays out in evolution. Since the genetic code is universal, we might expect to see residual traces of this effect even if the sole intention of the design was to guide the first major evolutionary steps. Two extensive analyses have indeed uncovered the IHE in action (although neither paper makes the connection I did in my original analysis).
I'll begin by discussing the first one: Gregory A. C. Singer and Donal A. Hickey. 2000. Nucleotide Bias Causes a Genomewide Bias in the Amino Acid Composition of Proteins. Molecular Biology and Evolution 17:1581-1588
Since evolution through cytosine deamination essentially entails replacing guanine and cytosine (G and C) with adenine and thymine (A and T), it would be interesting to compare the proteomes of GC-rich and AT-rich genomes. Luckily, this analysis has already been done by Singer and Hickey.
They begin by noting:
Some organisms, for example, have genomes that are disproportionately rich in guanine and cytosine (G and C), while others have DNA that is rich in adenine and thymine (A and T). Variation in nucleotide composition is usually most pronounced at the synonymous codon positions of genes, and, because of the redundancy in the genetic code, these variations in DNA content may have little effect on the amino acid content of the encoded proteins.
Singer and Hickey then partitioned the genetic code into GC-rich and AT-rich codons. They noted that the AT-rich codons encode phenylalanine, tyrosine, methionine, isoleucine, asparagine, and lysine (FYMINK) while the GC-rich codons encode glycine, alanine, arginine, and proline (GARP). While these amino acid pools are not the same ones I identified for the pre- and post-cytosine deamination codons (given that the codons they looked at were simply enriched in AT or GC), there is an overlap, in that the AT-rich codons encode mostly hydrophobic residues.
Singer and Hickey then looked at 22 completely sequenced genomes to determine whether GC-rich genomes have proteins enriched in GARP amino acids and AT-rich genomes have proteins enriched in FYMINK amino acids. This is exactly what they found.
They took a closer look at Borrelia burgdorferi and Mycobacterium tuberculosis, which have a 25.5% and 65.9% GC content, respectively. These thus represent the two extremes. They compared 305 genes common to both organisms and measured the synonymous nucleotide frequencies and amino acid contents of each one. They found, "For every gene, the GARP/FYMINK ratio in the M. tuberculosis homolog was higher than that of the corresponding gene in B. burgdorferi".
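The two quantities being compared here, GC content and the GARP/FYMINK ratio, are simple to compute. The sketch below is only an illustration of the definitions given above; it is not Singer and Hickey's code, the function names are hypothetical, and the sequences in the usage lines are made-up toy strings rather than real genes.

```python
GARP = set("GARP")      # amino acids encoded by GC-rich codons
FYMINK = set("FYMINK")  # amino acids encoded by AT-rich codons


def gc_content(dna: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    counted = [b for b in dna.upper() if b in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G or T")
    return sum(b in "GC" for b in counted) / len(counted)


def garp_fymink_ratio(protein: str) -> float:
    """Count of G, A, R, P residues divided by count of F, Y, M, I, N, K."""
    protein = protein.upper()
    garp = sum(aa in GARP for aa in protein)
    fymink = sum(aa in FYMINK for aa in protein)
    if fymink == 0:
        raise ValueError("protein contains no FYMINK residues")
    return garp / fymink


if __name__ == "__main__":
    # Toy inputs for illustration only
    print(gc_content("ATGGCGCCGCGTGGC"))
    print(garp_fymink_ratio("MAPRGKIFNY"))
```

Applied to a pair of homologous proteins, one from a GC-rich genome and one from an AT-rich genome, the claim quoted above amounts to the first ratio being larger than the second for every pair.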
The authors conclude, "Our main finding is not just that protein composition is affected by nucleotide bias, but also that this effect is both very large and very widespread." In fact, they observe:
quote:
When we plotted the relationship between nucleotide bias and amino acid content for the entire set of genomes examined, we were surprised to see that there was no "clumping" of major phylogenetic groups in these graphs and that the archaeal and eubacterial genomes behaved as a single homogenous data set. Moreover, the yeast genome data also fell at the predicted point on these graphs. This suggests that the effects of nucleotide bias on protein composition are operating in all major lineages.
Thus, if nucleotide bias does in large part determine protein composition, and nucleotide bias can be tweaked by cytosine deamination, it becomes clear the Increasing Hydrophobicity Effect I described could very well play out in evolution and thus be a component of the design mechanism.
They end their article with two very interesting observations.
quote:
The most parsimonious explanation of the observed patterns of amino acid composition in these genomes is an underlying mutational bias that varies between lineages. The resulting amino acid sequence changes are nonrandom, since the mutational bias is strongly directional, and yet they are not caused by natural selection acting directly on protein function. Consequently, their evolutionary dynamics cannot be described in terms of either Darwinian selection or random genetic drift. They may, however, result in secondary selective changes in the protein sequence. For example, amino acid bias could result in a change of the charge distribution within a protein, as well as an alteration of the protein's secondary and tertiary structures. Such proteins may then undergo positive selection at other sites to counter the potentially deleterious effects of these nucleotide bias-induced changes. The long-term result might be a cascade of compensatory changes to reduce the impact of amino acid bias on protein structure and function. The problem of distinguishing between functional constraint in protein sequences and mutation-driven biases in the composition of these same sequences will provide a future challenge for molecular evolutionists.
This line of reasoning nicely complements the mechanism I describe and outline in Figure 7 of Evolution's Design [1]:
quote:
Figure 7. The C-to-T transitions can be likened to a stream, constantly pushing amino acid content toward a more hydrophobic state. Given the context provided by the originally designed state (continually reflected by evolution's tendency to borrow from pre-existing states), buried secondary designs may be unmasked on a somewhat regular basis. If the act of unmasking occurs in an appropriate environment, selection will lock the secondary design into the biosphere.
Going back to Singer and Hickey, they note:
quote:
In conclusion, we recognize that other factors, such as selective constraint, adaptive change, and genetic drift, all play important roles in protein sequence evolution. The results presented here, however, demonstrate that mutational pressure on DNA composition can also be a very powerful and pervasive force in long-term protein evolution.
Such mutational pressure could very well help to unlock buried designs in front-loaded states.
Another implication of all this concerns convergent evolution, which is well situated in the hypothesis of front-loaded evolution. The authors offer their own take:
quote:
This result has implications for many studies that are based on the interpretation of amino acid sequence data. For instance, it has already been shown that nucleotide bias can affect the functional properties of proteins and that convergent amino acid composition can affect the construction of phylogenetic trees based on protein. For instance, Foster and Hickey showed that unrelated taxa were grouped together in a phylogenetic tree due to convergent amino acid sequences. Although this problem in phylogenetic reconstruction has been identified, a satisfactory method of dealing with the problem has not yet been found. Because of the relationship between primary amino acid sequence and secondary protein structure, nucleotide bias may also affect the evolution of protein structure.
This raises the fascinating question of whether any examples of convergent molecular evolution involve C-to-T transitions at key points [2].
Another way to detect the IHE would be to analyze the effects of RNA editing that exploits cytosine deamination. During such editing processes, specific bases in the synthesized RNA molecule are changed by cellular machinery. As a result, the RNA that is used by the cell is not directly encoded in the genome. A focus on RNA editing would allow us to see the effects of cytosine deamination all at once, rather than spread out across time through incremental evolution.
So let me now discuss the second article: Philippe Giegé and Axel Brennicke. 1999. RNA editing in Arabidopsis mitochondria effects 441 C to U changes in ORFs. PNAS 96, 15324-15329
Giegé and Brennicke (G&B) identified 456 instances of RNA editing. All of the edits were C-to-U changes and 441 occurred in open reading frames. The editing was rather extensive, with one in every 15 cytosines changed to uracil. Yet the editing was not evenly distributed, as genes coding for complex I (of the electron transport chain) and cytochrome c biogenesis were edited at a higher frequency than others. The effects of the editing nicely illustrate the IHE. According to G&B:
quote:
RNA Editing Increases the Overall Hydrophobicity of Mitochondrial Proteins.
Figure 1 from their study is shown below:

The interesting thing here is that the IHE may only be tweaking proteins and their function. For example, many of the subunits of NADH dehydrogenase are extensively edited, yet there is no reason to think any novel function has appeared. Instead, the effects of cytosine deamination may be utilized to accelerate the fine-tuning of any particular protein.
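The before-and-after comparison that RNA editing makes possible can be sketched in a few lines. The Python below applies C-to-U edits at chosen positions in an mRNA, translates both versions, and compares mean hydropathy. It is a sketch under stated assumptions: the open reading frame and edit positions are invented for illustration and are not taken from G&B's data, the Kyte-Doolittle scale is an assumed hydrophobicity measure, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code in the RNA alphabet, base order U, C, A, G
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Kyte-Doolittle hydropathy values (assumed scale)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def apply_c_to_u_edits(mrna, positions):
    """Return a copy of mrna with C changed to U at each 0-based position."""
    bases = list(mrna)
    for p in positions:
        if bases[p] != "C":
            raise ValueError(f"position {p} is {bases[p]}, not C")
        bases[p] = "U"
    return "".join(bases)


def translate(mrna):
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def mean_hydropathy(protein):
    """Average Kyte-Doolittle value over all residues."""
    return sum(KD[aa] for aa in protein) / len(protein)


if __name__ == "__main__":
    orf = "AUGCCUUCAACGGCUUAA"   # hypothetical ORF
    sites = [4, 7, 10, 13]       # hypothetical edit sites (second codon positions)
    edited = apply_c_to_u_edits(orf, sites)
    for label, seq in (("unedited", orf), ("edited", edited)):
        prot = translate(seq)
        print(label, prot, round(mean_hydropathy(prot), 2))
```

With real data, the genomic sequence and the list of edit sites would come from the published annotation, and the same comparison could be run gene by gene.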
The effects of cytosine deamination not only are channeled into the IHE, but also may play a role in modifying RNA structure and function. G&B report:
quote:
The Arabidopsis mitochondrial intron population is exclusively composed of organellar group II introns with a well-conserved secondary structure, which has been shown to be important for splicing. Some of the editing sites affecting group II intron sequences are predicted to improve the quality of the intron folding (Fig. 2) and thus very likely to improve functional splicing.
The fact that cytosine deamination produces uracil is quite intriguing. Just as the hydrophobic amino acids play a crucial role in protein structure, and thus function, uracil appears to play such a role in RNA structure and function.
In 1994, Stephen Holbrook, a chemist in the Structural Biology Division, determined that uracil had the ability to base pair with any of the bases in RNA. Here are some excerpts from Lynn Yarris' report on this [3]:
quote:
Holbrook subsequently determined the three-dimensional structure of an RNA molecule containing U-U base pairs. Unlike the U-G and U-C base pairs, the U-U partners formed two hydrogen bonds that were stable without the presence of tightly bound water molecules.
"Non-standard base pairs such as the U-G, U-C, and U-U partners we have observed are common in ribosomal RNA, viroids, messenger RNA, and retroviruses," says Holbrook. "Runs of these mismatched pairs in the middle of double helical RNA form internal loops."
"Uracil can now be called the universal partner in RNA structure," says Holbrook. "That it can pair with any other base helps explain why RNA is so flexible in terms of how it interacts with itself and why, unlike DNA, it can take on so many different shapes."
It is also worth noting that G-U is the most frequent non-canonical base pair found in RNA [4]. It also just happens to be the most stable mismatch (U-U being the most flexible), although it is less stable than the canonical G-C pair. If we imagine a hairpin structure in RNA that is important for function, G-C pairs are the ones most likely to be converted to G-U pairs because of cytosine deamination. This could, depending on the sequence context, slightly destabilize the hairpin, allowing the RNA the flexibility to interact in a novel way. However, since G-U mismatches are the most stable, the search stays close to the original RNA structure.
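The hairpin scenario can be made concrete with a small Python sketch. It pairs the two arms of a stem, deaminates one cytosine, and classifies each pair as canonical, wobble (G-U), or mismatch. The classification is purely qualitative and carries no thermodynamic values; the stem sequences and function names are hypothetical and chosen only for illustration.

```python
CANONICAL = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}
WOBBLE = {("G", "U"), ("U", "G")}


def classify_pair(a, b):
    """Label a base pair as 'canonical', 'wobble' (G-U) or 'mismatch'."""
    if (a, b) in CANONICAL:
        return "canonical"
    if (a, b) in WOBBLE:
        return "wobble"
    return "mismatch"


def stem_pairs(five_prime_arm, three_prime_arm):
    """Pair the 5' arm with the 3' arm read in reverse (antiparallel stem)."""
    if len(five_prime_arm) != len(three_prime_arm):
        raise ValueError("stem arms must have equal length")
    return list(zip(five_prime_arm, reversed(three_prime_arm)))


def deaminate(strand, position):
    """Return strand with the C at the given 0-based position changed to U."""
    if strand[position] != "C":
        raise ValueError(f"position {position} is {strand[position]}, not C")
    return strand[:position] + "U" + strand[position + 1:]


if __name__ == "__main__":
    arm5, arm3 = "GGCAC", "GUGCC"   # hypothetical stem, both arms written 5'->3'
    before = [classify_pair(a, b) for a, b in stem_pairs(arm5, arm3)]
    after = [classify_pair(a, b) for a, b in stem_pairs(deaminate(arm5, 2), arm3)]
    print("before:", before)
    print("after: ", after)
```

Because cytosine in a stem is normally paired with guanine, each deamination in this model turns a canonical pair into a wobble pair rather than an outright mismatch, which is the point made above about the search staying close to the original structure.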
SUMMARY
The Increasing Hydrophobicity Effect can be seen both in the genomic analyses of Singer and Hickey and in the RNA editing study of Giegé and Brennicke. It thus clearly has played out in evolution. However, it still remains an open question as to exactly how it has played out. There are at least three possibilities:
1) The IHE works only to tweak and adjust protein function, as may be the case with the mitochondrial proteins analyzed by Giegé and Brennicke. It would be helpful to biochemically characterize some of these edited proteins and compare their properties to unedited versions.
2) In addition to 1), the IHE may have been coupled to carefully chosen initial states to unlock front-loaded states as previously suggested. [1]
3) In addition to 1) and 2), the IHE has the ability to evolve novel proteins not front-loaded into the initial states. If this is true, how often has it occurred and what frequency of novel protein evolution was dependent on the IHE?
In addition to the IHE, cytosine deamination may also play a helpful role in RNA evolution. It is noteworthy that such a mutation would unleash uracil in a position originally housed by cytosine, and uracil appears to play an important role in increasing RNA flexibility, and thus function.
It is also worth commenting on the apparent conceptual tie between the effects of cytosine deamination on protein and RNA structure/function. In both cases, the most common base substitution appears to have significant functional potential, as both hydrophobic amino acids and uracil seem to make the greatest impact on protein and RNA structure, respectively. It's as if an engineer were trying to get the "most bang for the buck" when it comes to utilizing a nitrogenous base poised to change.