ERROR CORRECTION RUNS DEEP

By Mike Gene

One of the biological universals of Life is the presence of proof-reading and error-correction mechanisms at several levels. There are three primary nodes of information transfer (replication, transcription, translation) and all three are proof-read. At a deeper level, it appears the genetic code itself (where nucleotide codons represent amino acids) was designed to minimize deleterious mutations [1]. However, a recent study [2] suggests that error correction extends deeper yet, into the very fabric of the DNA molecule.

Nature's Philip Ball [3] explains it as follows:

Donall Mac Donaill of Trinity College, Dublin, has worked out that DNA code is like the parity code that information technologists use to minimize the probability of making mistakes.

Mac Donaill notes:

Investigations into nucleotide alphabet composition have tended to focus on physicochemical and related issues. Yet nucleotide replication is at heart an information transmission phenomenon, and it seems reasonable to postulate that the evolutionary pressures shaping the nucleotide alphabet might not have been confined to physicochemical issues alone, and that considerations relating to informatics might have had a constraining evolutionary role, acting concurrently but independently of the physics and chemistry. Surprisingly therefore, with the exception of Szathmáry's pioneering work recognising the importance of hydrogen donor-acceptor (D/A) patterns, informatics aspects of the problem have been largely neglected.

He then assigned binary notation to the hydrogen bonds of G:C and A:T base pairs. As Ball explains:

Mac Dónaill argues that the nucleotides' pairings are a kind of code. Each hydrogen bond has two components: chemical groups called donors and acceptors. If we denote a donor as 1 and an acceptor as 0, then C encodes the pattern 100, and G is 011. In other words, each nucleotide can be represented as a short sequence of binary code, like the 1's and 0's used to record information in computers. There is one more element in this code. A and G belong to a class of molecule called purines, and T and C are pyrimidines. Each pairing involves a purine and a pyrimidine. We can denote a purine by 0 and a pyrimidine by 1. Then C becomes 100,1 and G is 011,0.
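To make the encoding concrete, here is a minimal Python sketch of the two codewords Ball gives. The names `DA_PATTERN`, `IS_PYRIMIDINE`, `codeword` and `is_complement` are illustrative only and do not come from Mac Dónaill or Ball. Only C and G are encoded, because those are the only bit patterns quoted here.

```python
# Codewords as quoted above: three donor/acceptor bits (donor = 1,
# acceptor = 0) followed by one ring-type bit (purine = 0, pyrimidine = 1).
# Only C and G are given in the quoted text, so only they appear here.
DA_PATTERN = {"C": "100", "G": "011"}
IS_PYRIMIDINE = {"C": True, "G": False}


def codeword(base: str) -> str:
    """Return the 4-bit word for a base: D/A pattern plus ring-type bit."""
    ring_bit = "1" if IS_PYRIMIDINE[base] else "0"
    return DA_PATTERN[base] + ring_bit


def is_complement(word_a: str, word_b: str) -> bool:
    """True when every bit of word_a is the opposite of the matching bit in word_b."""
    return len(word_a) == len(word_b) and all(a != b for a, b in zip(word_a, word_b))


print(codeword("C"))  # 1001
print(codeword("G"))  # 0110
print(is_complement(codeword("C"), codeword("G")))  # True
```

The last line reflects the pairing itself: each donor on C faces an acceptor on G (and vice versa), and a pyrimidine faces a purine, so the two words are bitwise opposites.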

The significance of all this? Mac Donaill notes:

Thus, it would appear that in nature the purine/pyrimidine nature of a nucleotide is strictly and intriguingly related to the D/A pattern as a parity bit. The critical question is whether the parity-code structure is accidental, or shaped by selection through evolutionary advantage. In conventional error-coding theory the advantage afforded by a parity code structure lies in the number of features which must be changed to convert one codeword into another; a transmission error in any one bit changes the parity of the transmitted element whereby the error may be detected. The difference between codewords may be expressed in terms of the Hamming distance...defined as the number of bits in which two codewords differ. It is equivalent to the number of bits set 1 in the Boolean exclusive OR product XOR.

and

To summarise, error-coding considerations show how a parity code structure might offer a replication fidelity advantage. The natural alphabet appears to be structured like a parity code, and it would appear that the error-coding theory proposed by Hamming in 1950 was actually anticipated by nature.
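The Hamming distance mentioned in the first passage can be sketched in a few lines of Python. This is only an illustration of the definition quoted above (count the bits set to 1 in the XOR of two codewords). The function name `hamming_distance` and the sample corrupted word `1000` are assumptions of mine, while `1001` and `0110` are the C and G words quoted earlier.

```python
def hamming_distance(word_a: str, word_b: str) -> int:
    """Number of bit positions in which two equal-length binary words differ."""
    if len(word_a) != len(word_b):
        raise ValueError("codewords must have the same length")
    # XOR leaves a 1 wherever the two words disagree; count those 1s.
    xor = int(word_a, 2) ^ int(word_b, 2)
    return bin(xor).count("1")


C_WORD = "1001"  # C: D/A pattern 100, pyrimidine bit 1
G_WORD = "0110"  # G: D/A pattern 011, purine bit 0

print(hamming_distance(C_WORD, G_WORD))  # 4
print(hamming_distance(C_WORD, "1000"))  # 1 (hypothetical one-bit corruption of C)
```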

Philip Ball puts it this way:

Represented in this way, says Mac Dónaill, the permissible combinations of A,C,T and G correspond to what computer scientists call a parity code. Each nucleotide has an even number of 1's - it is said to have an even parity.

This makes it easier to spot errors such as non-natural nucleotides. If the error changes any one digit in a nucleotide, its parity changes from even to odd. Odd-parity nucleotides are clearly wrong.
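Ball's point about single-digit errors can be checked mechanically. The sketch below is mine, not Mac Dónaill's. It flips each bit of the quoted C and G words in turn and confirms that every result has odd parity, then enumerates all 4-bit even-parity words to show that no two of them are fewer than two bit-changes apart. The helper names `has_even_parity` and `single_bit_errors` are illustrative.

```python
from itertools import combinations, product


def has_even_parity(word: str) -> bool:
    """True when the word contains an even number of 1s."""
    return word.count("1") % 2 == 0


def single_bit_errors(word: str) -> list[str]:
    """All words obtained by flipping exactly one bit of the input."""
    flip = {"0": "1", "1": "0"}
    return [word[:i] + flip[bit] + word[i + 1:] for i, bit in enumerate(word)]


# The two codewords quoted in the text: C = 100,1 and G = 011,0.
for word in ("1001", "0110"):
    corrupted = single_bit_errors(word)
    # Every one-bit error lands on an odd-parity word, so it is detectable.
    print(word, has_even_parity(word), [has_even_parity(w) for w in corrupted])

# Across every 4-bit even-parity word, the closest pair is two flips apart.
even_words = ["".join(bits) for bits in product("01", repeat=4)
              if has_even_parity("".join(bits))]
min_distance = min(sum(a != b for a, b in zip(u, v))
                   for u, v in combinations(even_words, 2))
print(len(even_words), min_distance)  # 8 2
```

A minimum distance of two is what lets a parity code detect, though not repair, any single-bit error: one flip can never turn one valid word into another.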

The Significance

Although Mac Donaill does not explicitly spell it out, I think his insights are even more significant because they answer a question that I have not seen answered before: why does life use A, G, C, and T and not other nucleotides? For example, Mac Donaill notes that previous studies have shown that artificial nucleotides can be reliably used and replicated by polymerases. So why didn't Nature use any other nucleotides? As Mac Donaill recounts, Orgel explained this by suggesting that "nature had simply failed to discover them." Yet Mac Donaill seems to have a meatier explanation: "Error-coding analysis however suggests that mixed parity alphabets with interpurine or interpyrimidine distances of one have an inherently low fidelity." It is interesting to note a parallel with the genetic code. It was originally explained as a "frozen accident," yet now we are beginning to appreciate it was designed to minimize errors. [4] The same pattern repeats itself here, as pure contingency has been used to explain why DNA employs the four bases that it does and not some other mix. Yet thanks to Mac Donaill, the explanation again points to minimizing errors.

Thus it would seem that minimizing errors is a theme that repeats itself in a fractal pattern. It is error correction that explains why Life uses the nucleotides it does. It is error correction that explains why Life uses the Genetic Code it does. Error correction even seems to come into play in explaining the seemingly bizarre nature of lagging strand synthesis. [5] At many levels, Life revolves around correcting errors, a realization that is friendly to the concept of specified complexity.

The Designer

In trying to understand the origin of the parity-code, Mac Donaill writes, "The critical question is whether the parity-code structure is accidental, or shaped by selection through evolutionary advantage." It is understandable that non-teleologists would frame this question in a binary fashion - chance or selection. But teleologists have a third option. That is, while they can agree with the selectionists that chance is not a good explanation, they propose an intelligent designer rather than the blind watchmaker. Yet is there a way to distinguish between the two? Perhaps in the end, all that we will have are two alternative ways to look at the same data. But there is one consideration that speaks to intelligent design.

Ball's article states:

When life first emerged from simple molecular constituents, says Mac Dónaill, "selective pressure should have favoured parity-code-structured alphabets".

But might Mac Donaill be mistakenly extrapolating modern biology onto a pool of "simple molecular constituents"? I think Ball's essay lays out the thinking that underlies such opinions:

The consequences of wrongly read or copied information can be disastrous. Malfunctioning genes can cause diseases and defects. Errors can occasionally have beneficial effects - they create the mutations that drive the evolutionary process - but they are usually detrimental. So cells have evolved molecular machinery for checking transcription and replication. This greatly reduces the chances of errors, but does not eliminate them. Mac Dónaill says that there is another mechanism for detecting errors - in the chemistry of DNA itself.

It is understandable why mistakes in a high-information entity such as a human being can be disastrous. But are we sure that the same applies to primitive quasi-life forms? That is, while it makes sense that mutations are more likely to be detrimental than beneficial in modern-day, high-information organisms, does this really apply to primitive, low-information quasi-life forms? Consider the following points:

 

 

I would thus hypothesize that any original, simple quasi-life forms would be better served by maintaining the high error rates that appear to come with simplicity and low-information states.

One way to distinguish an intelligent designer from natural selection is that the former has foresight, while the latter is myopic, working only on immediate benefits. While the argument is fuzzy, it would seem that the error correction capabilities inherent in the DNA chemistry (and perhaps the genetic code) reflect foresight, since such capabilities would become essential in the high-information life forms that would exist hundreds of millions of years after the putative simple replicators. Natural selection, on the other hand, is concerned only with immediate benefits, which would seem to come from high error rates in such primitive life forms. And it is also known that evolution often boxes itself in, where a solution to an immediate problem becomes "locked in" and cannot be displaced in later evolution (in such cases, natural selection ends up jury-rigging around the problem posed by life's hard-wiring, making do simply with what is lying around).

One thing seems clear. Very early on, life became obsessed with error correction. The chemistry of DNA/RNA, the Genetic Code, and the proof-reading mechanisms behind information transfer are all biological universals. Apparently, one of the first "objectives" of evolution was to put a layer of constraints on evolution.

 

[1] BRno.15

[2] A parity code interpretation of nucleotide alphabet composition

[3] DNA codes own error correction

[4] Non-teleologists attribute the design to the "blind watchmaker."

[5] Brno.16

 

ID THINK