The Gigantic Improbability of Fine-Tuned Protein Molecules

Note: This post was published in October 2019, and refers to the text of the paper 'Evolution and Probability' as it was in that month.

In his scientific paper “Evolution and Probability,” professor Luca Peliti raises the question of how evolution as described by Darwin could produce such wonders as the human eye. He then considers the problem of the origin of proteins. He states the following:

“Living cells have a repertoire of 20 amino acids and proteins are made of linear chains of amino acids (called polypeptide chains). A typical enzyme (a kind of protein whose function is to fasten, sometimes by several orders of magnitude, the unfolding of chemical reactions) is about a hundred amino acids long. If we suppose that only one particular amino acid sequence can make a functional enzyme, the probability to obtain this sequence by randomly placing amino acids will be of the order of one over the total number of amino acid sequences of length 100, i.e., 20^100 ≈ 10^30.”

Peliti is talking about a very important reality here, the difficulty of protein molecules forming by any Darwinian process. The problem is very severe, partially because of the very great number of different types of proteins in the human body. Humans have more than 20,000 types of protein molecules, specified by about 20,000 genes, each of which lists the amino acid sequence used by a protein. Each protein uses a different sequence of amino acids.

Peliti has committed two very serious errors in the passage quoted above. The first is a careless math error: he tells us that 20 to the hundredth power is equal to about ten to the thirtieth power. Twenty to the hundredth power is actually equal to about ten to the one hundred and thirtieth power; more exactly, it is equal to 1.2676506 × 10^130. In other words, 20^100 ≈ 10^130 rather than 20^100 ≈ 10^30 as Peliti stated. This makes all the difference in the world, because 10^130 is 10^100 times larger than 10^30 (a factor of 1 followed by 100 zeros). If there were a chance of about 1 in 10^30 that a functional protein could appear by chance, that would be a very slim likelihood, but perhaps something we might expect to occur given countless millions of years of random trials. But if there were a chance of about 1 in 10^130 that a functional protein could appear by chance, that would be a likelihood so small that we would never expect it to happen in the history of the universe.

The screen shot below (from an online large exponents calculator) shows a conversion that corresponds to the one stated in the previous paragraph.
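The same conversion can be checked without an online calculator. The following is a minimal Python sketch that relies only on Python's built-in arbitrary-precision integers; the variable names are illustrative and not taken from any source.

```python
import math

# Number of possible sequences of length 100 drawn from 20 amino acids.
# Python integers are exact, so no precision is lost here.
sequences = 20 ** 100

# The power of ten is the number of decimal digits minus one.
exponent = len(str(sequences)) - 1
mantissa = sequences / 10 ** exponent

print(f"20^100 = {mantissa:.7f} x 10^{exponent}")

# Cross-check with logarithms: 100 * log10(20) is the exponent as a real number.
print(f"log10(20^100) = {100 * math.log10(20):.3f}")
```

The printed exponent is 130, not 30, which is the discrepancy discussed above.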

The second major error committed by Peliti in the passage quoted above is that he has seriously misstated the typical number of amino acids in a protein molecule or enzyme (an enzyme is a type of protein molecule). The number of amino acids in a human protein varies from about 50 to more than 800. The scientific paper here refers to "some 50,000 enzymes (of average length of 380 amino acids)." According to the page here, which cites a scientific paper, the median number of amino acids in a human protein is 375. The difference between Peliti's suggestion that protein molecules such as enzymes have about 100 amino acids and the reality that they have a median of about 375 amino acids turns out to be gigantic when you calculate the probability of a protein forming by chance, a difference vastly greater than the 3.75-fold ratio between 375 and 100.

Let us do a calculation using the correct numbers and correct math. There are 20 amino acids used by living things. The median amino acid length of a human protein is 375 amino acids. So to calculate the chance of a set of amino acids randomly forming into the exact set of amino acids used by a functional protein such as an enzyme, the correct figure is 1 in 20 to the three-hundred-and-seventy-fifth power. This is a probability of about 1 in 10 to the four-hundred-eighty-seventh power. Very precisely, we can say that the chance of a random sequence of amino acids exactly matching that of a protein with 375 amino acids is a probability of 1 in 7.695704335 × 10^487. That is a probability similar to the probability of you correctly guessing (with 100% accuracy) the ten-digit telephone numbers of 48 consecutive strangers. The calculation is shown in the visual below:
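For anyone who wants to reproduce the figure, here is a short Python sketch of the same arithmetic using logarithms. The constant names are my own, and the phone-number comparison simply treats each ten-digit number as a 1-in-10^10 guess.

```python
import math

AMINO_ACIDS = 20      # amino acids used by living things
MEDIAN_LENGTH = 375   # median human protein length cited in the text

# log10 of the number of possible sequences of that length.
log10_sequences = MEDIAN_LENGTH * math.log10(AMINO_ACIDS)

# Split into scientific notation: mantissa x 10^exponent.
exponent = math.floor(log10_sequences)
mantissa = 10 ** (log10_sequences - exponent)
print(f"20^{MEDIAN_LENGTH} is about {mantissa:.4f} x 10^{exponent}")

# Guessing 48 ten-digit phone numbers in a row is a 1 in 10^(10*48) event.
PHONE_NUMBERS = 48
log10_phone_guess = 10 * PHONE_NUMBERS
print(f"48 phone numbers: 1 in 10^{log10_phone_guess}")
```

The two exponents (487 and 480) are of similar size, which is the sense in which the two probabilities are comparable.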

Now, for a protein such as an enzyme to function properly, it must have a sequence of amino acids close to its actual sequence. Experiments have shown that it is easy to ruin a protein molecule by making minor changes in its sequence of amino acids. Such changes will typically “break” the protein so that it will no longer fold in the right way to achieve the function that it performs. A biology textbook tells us, "Proteins are so precisely built that the change of even a few atoms in one amino acid can sometimes disrupt the structure of the whole molecule so severely that all function is lost." And we read on a science site, "Folded proteins are actually fragile structures, which can easily denature, or unfold." Another science site tells us, "Proteins are fragile molecules that are remarkably sensitive to changes in structure." But we can imagine that a protein molecule might still be functional if some minor changes were made in its sequence of amino acids.

Let us assume that for a protein molecule to retain its function, at least half of the amino acids in a functional protein have to exist in the exact sequence found in the protein. Under such an assumption, to calculate the chance of the functional protein forming by chance, rather than calculating a probability of 1 in 20 to the three-hundred-and-seventy-fifth power, we might calculate a probability of 1 in 20 to the one-hundred-eighty-seventh power (187 being about half of 375). This would give us a probability equal to about 1 in 10 to the two hundred and forty-third power, a probability of about 1 in 10^243. The calculation is shown below.
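The half-sequence figure can be checked the same way. The sketch below wraps the arithmetic in a small helper; the function name `log10_sequence_count` is hypothetical, chosen only for this illustration.

```python
import math

def log10_sequence_count(positions: int, alphabet: int = 20) -> float:
    """Return log10 of the number of sequences with the given fixed positions."""
    return positions * math.log10(alphabet)

FULL_LENGTH = 375
required_positions = FULL_LENGTH // 2   # 187, about half of 375

log10_count = log10_sequence_count(required_positions)
print(f"20^{required_positions} is about 10^{log10_count:.1f}")

# Each ten-digit phone number is a 1 in 10^10 guess, so divide the exponent by 10.
print(f"Equivalent to guessing about {log10_count / 10:.1f} phone numbers")
```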

So even under the probably-too-generous assumption that perhaps only half of a protein's amino acid sequence has to be just as it is, we still are left with a calculation suggesting that a functional protein could never form by chance. The probability above is about the probability of you correctly guessing the 10-digit phone numbers of 24 consecutive strangers. It would seem that some miracle of luck would be required for a particular type of functional protein molecule to form. Since there are billions of different types of functional protein molecules in the biological world,  you need to believe in billions of miracles of luck to believe that such things appeared naturally. 

Peliti does get the math rather right in the excerpt below, in which he describes what he later calls a “garden of forking paths”:

“We can recast this argument in the following way. Let us suppose, following Dawkins... that 1000 evolution steps are necessary to obtain a working eye, and that at each step there are only two possibilities: the right or the wrong one. If the choice is made blindly, the probability of making the right choice at any step is equal to 1/2. Then the probability of always making the right choice at every step is equal to 1/2^1000 ≈ 1/10^300. It is clear that the fact that our lineage made the correct choice at each step looks nothing less than miraculous!”

The math here is reasonable. So we have two different ways of calculating things, and both have given us a likelihood of no more than about 1 chance in 10 to the two hundredth power. Such miracles of chance would need to happen not just once but very many times for the biological world to end up in its current state.
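A quick check of the forking-paths figure, again as a minimal Python sketch with illustrative variable names:

```python
# Number of distinct right/wrong paths through 1000 binary choices (exact integer).
paths = 2 ** 1000

digits = str(paths)
exponent = len(digits) - 1            # power of ten
leading = int(digits[:4]) / 1000      # first four digits as a mantissa

print(f"2^1000 is about {leading:.3f} x 10^{exponent}")
```

The exponent comes out at 301, so the quoted "1/10^300" is a reasonable round figure.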

After raising this protein origin problem and the problem of the origin of complex biological systems like the vision system, Peliti attempts to address such origin problems. Peliti begins to reason like this:

“In the argument we just discussed we assumed that one does not know if one is on the right path until the end is reached. Let us suppose instead that each step made in the good direction provides a small advantage in terms of survival or fecundity to the being that makes it. More precisely, let us imagine to send in this 'Garden of forking paths' a group of, say, 100 individuals, who perform their choice (right or wrong) at each step, and then reproduce. Let us assume that those who make the right choice at the right moment have slightly more offspring than those who make the wrong one: for instance, that for each offspring of an individual who made the wrong choice, there are 1.02 offspring (on average) of one who made the right choice. Thus, after the first step, we shall have 50 individuals on the right side and 50 on the wrong side and, after reproduction, 102 on the right side and 100 on the wrong side. This is surely a very small difference: the fraction of individuals on the right path is 51% rather than 50%. However, if we wait a few generations, the fraction of individuals on the right side will increase: after 10 generations the ratio of the number of offspring of an individual who had made the right choice to that of one who made the wrong choice will be about 1.22, and after 100 generations it will be about 7.25.”
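The compounding in the quoted passage is easy to verify. The sketch below assumes only what the quote states (a 1.02 offspring ratio per generation and an even 50/50 starting split); the variable names are mine.

```python
ADVANTAGE = 1.02   # offspring ratio per generation, from the quoted passage

ratios = {}
for generations in (1, 10, 100):
    # Ratio of right-path to wrong-path descendants after this many generations.
    ratio = ADVANTAGE ** generations
    # Starting from an even split, the right-path share is ratio / (1 + ratio).
    share = ratio / (1 + ratio)
    ratios[generations] = ratio
    print(f"{generations:3d} generations: ratio {ratio:.2f}, right-path share {share:.1%}")
```

The ratios match the quoted 1.22 and 7.25. After a single generation the right-path share works out to roughly 50.5% (102 out of 202), which the quoted passage rounds to 51%.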

Peliti then goes further and further with such reasoning, eventually reaching the conclusion that something like an eye might appear after 300,000 generations. I need not discuss his full line of reasoning, and will simply note that Peliti's entire line of reasoning is fallacious because of the bad assumption that he has made at the beginning, the assumption that he states as, “Let us suppose instead that each step made in the good direction provides a small advantage in terms of survival or fecundity to the being that makes it.”

An assumption like this is what we may call a “benefit-with-every-step” assumption. Such an assumption can be visually represented like this:

Graph 1: straight-line benefit calculation

The graph above shows a type of situation in which 10% completion yields 10% of the benefits of full completion, 20% completion yields 20% of the benefits of full completion, and so on and so forth, with 50% completion yielding 50% of the benefits of full completion, and 100% completion yielding 100% of the benefits of full completion.

A straight-line benefit calculation is suitable for a small minority of cases. For example, if you are a farmer who plants and harvests 10% of his farming field, you will get 10% of the benefit of planting and harvesting 100% of the field. And if you are a street vendor who sells 20% of the items in your vendor's wagon, you will get 20% of the benefit of selling 100% of the items in your wagon.

But it is obviously true that in life in general and in biology in particular, it is very often totally inappropriate to be using such a straight-line benefits calculation or to be making a benefit-with-every-step assumption. Here are some of the innumerable examples I could give to prove such a point:

  1. Completing 30% of your SAT examination doesn't give you 30% of the benefit of completing 100% of the examination; instead it leaves you with a very bad score that doesn't do you any good at all.
  2. Completing 50% of a suspension bridge across a river doesn't give you a bridge that is 50% as good as a full bridge; instead it leaves you with a bridge that is a suicide device because people driving across will find their cars falling into the river.
  3. Completing 20% of an airplane doesn't give you something that flies 20% as fast or as high as a full airplane, but merely gives you something that doesn't fly at all.
  4. Completing 20% of a television set doesn't give you something that gets 20% as many channels as a full TV set, but merely gives you something that doesn't display TV programs.
  5. If you have 20% of a functional protein molecule, the molecule won't perform its function, and won't do anything.
  6. If you build only 20% of a rocket designed to launch satellites, that rocket won't be able to reach orbit, and will produce no benefits at all.
  7. A woman with 20% or 30% of her reproductive system won't be able to have 20% of a baby; instead, she won't be able to have any baby at all.
  8. Having only one quarter of a circulatory system (such as only veins or only arteries or only half a heart) provides no benefit at all. 

We can call these types of cases late-yielding cases or “halfway yields nothing” cases. In all such cases, it is utterly inappropriate to use a benefit-with-every-step assumption or a straight-line benefit calculation, and it is utterly inappropriate to be invoking slogans such as “every step yields a benefit.” What type of graph would be appropriate to illustrate such cases? Rather than a straight-line graph, it would be a graph with a J-shaped line that stays flat for the first half of the graph, and then slopes up sharply. The graph would look rather like this:

Graph 2: late-yielding benefit calculation

The exact slope of such a curve would differ from one late-yielding situation to another, but in each case the curve would never be a straight-line curve but a curve rising sharply in the end stages, without any benefit in the earliest stages.
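To make the contrast between the two graphs concrete, here is a small Python sketch of the two kinds of benefit calculation. The functions and their parameters are hypothetical: the 50% threshold follows the "flat for the first half" description above, and the squared rise is just one assumed shape among many possible J-curves.

```python
def linear_benefit(completion: float) -> float:
    """Graph 1 style: benefit is proportional to the fraction completed."""
    return completion

def late_yielding_benefit(completion: float,
                          threshold: float = 0.5,
                          steepness: float = 2.0) -> float:
    """Graph 2 style: no benefit up to the threshold, then a sharp rise.

    threshold and steepness are illustrative assumptions, not measured values.
    """
    if completion <= threshold:
        return 0.0
    return ((completion - threshold) / (1.0 - threshold)) ** steepness

# Tabulate both curves at 10% steps of completion.
for percent in range(0, 101, 10):
    c = percent / 100
    print(f"{percent:3d}% complete: linear {linear_benefit(c):.2f}, "
          f"late-yielding {late_yielding_benefit(c):.2f}")
```

Changing `threshold` or `steepness` changes the slope of the curve, but under this model the early stages always yield zero benefit.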

Now, which one of these two calculations would be appropriate to use when considering the case of the evolution of a vision system? Clearly the appropriate calculation would be like that shown in the “late yielding benefit calculation” graph (Graph 2), and not a calculation like that shown in the “straight-line benefit calculation” graph (Graph 1).

Let us ignore the red herrings often thrown about when discussing such an issue, such as an appeal to the light-sensitive patches of earthworms. Humans are part of the chordate phylum (Chordata), which is an entirely different phylum from that of earthworms (the annelid phylum, Annelida). No one believes that humans are descended from earthworms.

If we look at the fossil record of the phylum that humans belong to (chordates), we see no evidence that chordate eyes evolved from some mere light-sensitive patch such as on an earthworm. When considering the possible evolution of vision systems, we should be using something like a late-yielding benefit calculation and a “halfway yields nothing” assumption.

People very often speak as if “eyes=vision,” but such an assumption is false. For an organism to have vision, it is necessary to have a complex system consisting of many different things, including the following:

  1. A functional eye requiring an intricate arrangement of many parts to achieve a functional end.
  2. One or more specialized protein molecules capable of capturing light, each consisting of many amino acids arranged in just the right way to achieve a functional end.
  3. An optic nerve stretching from the eye to the brain.
  4. Extremely specialized parts in the brain necessary for effective vision.

All of these parts are necessary for vision, so we clearly have here a situation that is not correctly described by an “every step yields a benefit” calculation. The correct assumption would be a “halfway yields nothing” assumption and a late-yielding benefits calculation.

We can give the name "the benefit-at-every-step fallacy" to the type of fallacy Peliti has committed by saying, "Let us suppose instead that each step made in the good direction provides a small advantage in terms of survival or fecundity to the being that makes it." I define the benefit-at-every-step fallacy as the fallacy of assuming that benefits will be yielded throughout a process that only yields benefits in its late stages. Here is an example of such a fallacy (I'm not quoting here from Peliti's paper):

Dave: "If I slowly buy up a whole bunch of auto parts at the  junkyard, and assemble them into a car, in about 5 weeks I'll have a car I can sell for $1000. 1000 divided by 5 is 200. So I'll get an average of $200 a week, and that will pay my rent while I'm working on the car." 

Dave has committed here the benefit-at-every-step fallacy, just like Peliti.  Dave's labors will not actually yield any benefits until he has finished, or almost finished, the car he is working on.

A vision system is only one very complex thing that we would need to explain to naturally explain the origin of creatures such as humans. By far the biggest explanatory problem is explaining the origin of protein molecules. There are more than 20,000 different types of protein molecules in the human body, most specialized for some particular functional purpose. In the animal kingdom there are more than 10 billion different types of protein molecules, most of which are properly considered as complex inventions. The science site here estimates that there are between 10 billion and 10 trillion “unique protein sequences” in the animal kingdom. Most protein molecules have very hard-to-achieve folds necessary for their functionality. The science site here says “there are few common folds” in the protein universe. This means we have billions of complex biological inventions that we need to account for, for there are billions of different types of protein molecules in the natural world, each its own distinct complex invention.

When considering the appearance of protein molecules, we would have to use a late-yielding benefit calculation as shown in Graph 2 above, and not at all a straight-line benefit calculation as shown in Graph 1 above. It is certainly not true that the formation of 10% of a protein molecule gives you 10% of the benefit of the full molecule, or that the formation of 20% of a protein molecule gives you 20% of the benefit of the full molecule. It is instead usually true that having 50% of a protein molecule leaves you with a completely useless molecule. To be functional, protein molecules require complex hard-to-achieve folds. If you remove any 50% of their amino acid sequence, the protein molecules will not be functional, for reasons such as their inability to achieve the folds on which their functionality depends. Almost certainly, a large percentage of protein molecules will be nonfunctional if they have only 70% of their amino acids. (See the end of this post for some scientific papers supporting these claims.) As a biology textbook tells us, "Proteins are fragile, are often only on the brink of stability."

Indeed, a large percentage of protein molecules are not even independently functional if they have 100% of their amino acids, because of external functional dependencies of such protein molecules. According to this source, between 20 and 30 percent of protein molecules require other protein molecules (called chaperone molecules) for them to fold properly. Very many other protein molecules are not independently functional, because their function only arises when they act as team members in complex teams of proteins called protein complexes. The answer to the question "what good is 50% of a protein molecule" is usually "no good at all," and in regard to a large fraction of protein molecules, the correct answer to the question "what good is 100% of a protein molecule" is "no good at all, unless there also exist one or more other protein molecules that the first protein molecule requires to be functional."

All over the place in biology we see extremely complex innovations and intricate systems requiring many correctly arranged parts for minimal functionality, particularly when we look at the byzantine complexities and many "chicken or the egg" cross-requirements of biochemistry. That's why a physicist recently told us that even the simplest bacterium is more complex and functionally impressive than a jet aircraft. It is in general true that the more complex or intricate an innovation or system (biological or otherwise), the more likely that it falls under a late-yielding benefit calculation (as depicted in Graph 2) in which something like a "halfway yields nothing" assumption is correct, and the less likely that such an innovation or system falls under a "benefit at every step" calculation (as depicted in Graph 1) in which we can assume that the addition of each part produces a benefit.  It is a gigantic mental error to assume or insinuate that very complex  biological innovations should be typically described by a simplistic "benefit at every step" calculation such as depicted in Graph 1. But our ideologically motivated biologists have been making such an error for well over a century, and Peliti has merely repeated their error. 

Later in his “Evolution and Probability” paper, Peliti appeals to the concept of “selective pressure.” That term (also called "selection pressure") is simply a variation of "natural selection," a variation designed to impress us with the idea that something like physics is going on (even though nothing like that is at play).  

Here is a case where something a little like “selection pressure” might actually occur. Let us imagine a population (a group of organisms of the same species) where 10% of the organisms have some favorable trait that makes it more likely for them to survive and reproduce, and 90% of the organisms do not have such a trait. Then, over many generations, this trait might become more common in such a population, because those with the trait might reproduce more. The 10% of the population with the trait might increase to 15%, then 20%, and so forth. One could use the term “selection pressure” to refer to such a tendency, although the term is actually doubly misleading; for there is no actual selection occurring (unconscious nature is not making a choice), and there is no actual pressure involved (pressure being a physical pushing force that is not occurring in this case). The fact that "natural selection" does not actually describe a choice or selection is why Charles Darwin wrote in 1869, "In the literal sense of the word, no doubt, natural selection is a false term."
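The scenario just described, in which a trait that already exists and already confers a benefit spreads through a population, can be sketched with a toy calculation. Everything here is an assumption made for illustration: a simple model with no chance fluctuations, a hypothetical reproductive advantage of 1.02 (borrowed from the figure in Peliti's quote), and function names of my own choosing.

```python
def next_frequency(p: float, relative_fitness: float) -> float:
    """Trait frequency in the next generation under a simple toy model."""
    carriers = p * relative_fitness          # weighted share of trait carriers
    return carriers / (carriers + (1.0 - p))

def generations_to_reach(target: float,
                         start: float = 0.10,
                         relative_fitness: float = 1.02) -> int:
    """Count generations until the trait frequency first reaches the target."""
    p = start
    generations = 0
    while p < target:
        p = next_frequency(p, relative_fitness)
        generations += 1
    return generations

for target in (0.15, 0.20):
    print(f"10% -> {target:.0%}: {generations_to_reach(target)} generations")
```

Note that this kind of calculation only applies once a beneficial trait is already present in part of the population; it says nothing about how the trait arose.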

But let us imagine some complex biological innovation that yields no benefit until more than 70% of it appears. It would be absolutely wrong and misleading to claim that the biological innovation had occurred because a “selection pressure” had pushed nature into forming such an innovation. During the first half of the imagined gradual formation process, there would have been no benefit from the appearance of 5% or 10% or 15% or 25% of such a biological innovation. So there would be no “selection pressure” at all that would have caused nature to pass through such stages.

But our biology dogmatists have constantly ignored such obvious truths, and have used the term “selection pressure” in an utterly inappropriate way. They will take some fantastically improbable biological innovation, and speak as if “selection pressure” pushed such a thing into existence. In reality, selection pressure would never occur in regard to any biological innovation until that innovation first appeared in some beneficial form. It is a complete fallacy to talk about “selection pressure” causing novel biological innovations whenever such innovations are late-yielding innovations that are nonfunctional in a fragmentary form.

Peliti is one of the biology theorists who have misused the concept of selection pressure. His error is shown in the passage below from his “Evolution and Probability” paper:

“We can summarize this discussion by stressing two main points: one, that an evolutionary history that appears extremely improbable a priori looks much more probable a posteriori, looking back from the arrival point, since a constant selective pressure has acted during the whole path, favoring advantageous variants with respect to disadvantageous ones.”

Here Peliti is again guilty of the "benefit-at-every-step" fallacy so often committed by biology dogmatists.  Because very complex biological innovations such as protein molecules and vision systems and reproductive systems usually do not provide any benefit when only small fractions of them exist (and very often do not provide any benefit when less than 60% or 70% of them exist), it is absolutely erroneous to assume that a “constant selective pressure has acted during the whole path” to produce complex innovations, while thinking along the lines of “every step provides a benefit.” There would typically be no selection pressure at all (and no benefit to survival or reproductive fecundity) until such extremely complex and hard-to-achieve biological innovations achieved a functional threshold, which would very often require fantastically improbable arrangements of matter that we would never expect to occur by chance (or by Darwinian evolution) in the history of the universe.  

Besides the fact that thinkers such as Peliti cannot credibly explain the origin of protein molecules such as those used in vision systems, there is the gigantic problem that no imaginable change in DNA can explain biological innovations such as vision systems, because the structure of such systems is nowhere specified in DNA. Contrary to the frequent mythical statements made about DNA being a blueprint for a human or a recipe for making a human, the fact is that DNA only specifies very low-level chemical information like the amino acid sequences of proteins. Not only does DNA not specify the structure of a human eye, it does not even specify the structure of any of the specialized cells used by the eye (such as the specialized cells used by cones and rods of the eye), or the structure of any other cells. So any claim to explain the appearance of vision systems based on changes in DNA (with or without natural selection and with or without lucky random mutations) is futile. The fact that DNA does not specify biological body plans or phenotypes is discussed very fully here.

Through abused phrases such as "selection pressure," our biology theorists confuse us about probabilities.  The flowchart below helps to clarify the reality.  Consider some imagined path of biological progression.  The first question to ask is: was there a plan or idea matching the end result? If you think the answer is no, then go to the second question: was there a will or intention to improve the end result? If you think the answer to that question is no, then assume the progression occurred through a "random walk" effect like that of typing monkeys.  In the case of Darwinian evolution, there is no plan or idea for any of the results, nor is there any will or intention to achieve the results.  Darwinian evolution is properly described as a "typing monkeys" random-walk type of effect, with all the mountainous improbabilities associated with such a thing.  Our biologists often tell us that "evolution is a tinkerer," although that analogy is false and deceptive.  A tinkerer is an agent willfully attempting to improve something by using trial and error. There is no such willful agent present in Darwinian evolution. 

evolution flowchart

From an information standpoint, the vast amount of functional information in genomes is comparable to books in a library. The most accurate analogy for Darwinian evolution is the analogy of typing monkeys producing a large library of impressive works of literature (or functional information such as computer programs),  despite prohibitive odds against such a result. In fact, at the end of this recent essay, an evolutionary biologist ends up candidly comparing Darwinian evolution to typing monkeys.  The keystrokes of the monkeys are analogous to random mutations.  If you imagine one or more skyscrapers filled with typing monkeys, and a roving editor in each skyscraper searching for useful typed prose made by the monkeys, and making copies of such miracles of chance if they ever occur, with such a roving editor being analogous to natural selection, you will have a good analogy for the theory of Darwinian evolution. 

I discussed how fine-tuned and sensitive to changes protein molecules are.  Further evidence for such claims can be found in this paper, which discusses very many ways in which a random mutation in a gene for a protein molecule can destroy or damage the function or stability of the protein.  An "active site" of an enzyme protein is a region of the protein molecule (about 10% to 20% of the volume of the molecule) which binds and undergoes a chemical reaction with some other molecule.  Below are only a fraction of the examples of protein sensitivity and fragility cited by the paper:

"If a mutation occurs in an active site, then it should be considered lethal since such substitution will affect critical components of the biological reaction, which, in turn, will alter the normal protein function...Even if the mutation does not occur at the active site, but quite close to it, the characteristics of the catalytic groups will be perturbed....Changing the reaction rate, the pH, or salt and temperature dependencies away from the native parameters can lead to a malfunctioning protein....An amino acid substitution at a critical folding position can prevent the forming of the folding nucleus, which makes the remainder of the structure rapidly condense. Protein stability is also a key characteristic of a functional protein, and as such, a mutation on a native protein amino acid can considerably affect its stability....When a protein is carrying its function, frequently the reaction requires a small or large conformational change to occur that is specific for the particular biochemical reaction. Thus, if a mutation makes the protein more rigid or flexible compared to the native structure, then it will affect the protein’s function."

A very relevant scientific paper is the paper "Protein tolerance to random amino acid change." The authors describe an "x factor" which they define as "the probability that a random amino acid change will lead to a protein's inactivation." Based on their data and experimental work, they estimate this "x factor" to be 34%. It would be a big mistake to confuse this "x factor" with what percentage of a protein's amino acids could be changed without making the protein non-functional.  An "x factor" of 34% actually suggests that almost all of a protein's amino acid sequence must exist in its current form for the protein to be functional.  

Consider a protein with 375 amino acids (the median number of amino acids in a human protein). If you were to randomly substitute 4% of those amino acids (15 amino acids) with random amino acids, then (assuming this "x factor" is 34% as the scientists estimated), there would be only about 2 chances in 1000 that such replacements would leave the protein functional. The calculation is shown below (I used the Stat Trek binomial probability calculator). So the paper in question suggests protein molecules are extremely fine-tuned, fragile and sensitive to changes, and that more than 90% of a protein's amino acid sequence has to be in place before the molecule is functional.
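The binomial figure can be reproduced in a few lines of Python. This sketch assumes, as the calculation above does, that each substitution independently inactivates the protein with probability 0.34; the helper name `binomial_pmf` is my own.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k 'successes' in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

X_FACTOR = 0.34       # chance a single random substitution inactivates the protein
SUBSTITUTIONS = 15    # 4% of a 375-amino-acid protein

# The protein stays functional only if none of the 15 substitutions inactivates it.
p_still_functional = binomial_pmf(0, SUBSTITUTIONS, X_FACTOR)
print(f"Chance the protein remains functional: {p_still_functional:.5f}")
```

The result is a little under 0.002, which is the "about 2 chances in 1000" figure.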

Figure 1 of the paper here suggests something similar, by indicating that after about 10 random mutations, the fitness of a protein molecule will drop to zero.