This should be controversial

The idea that primary sequence alone determines tertiary structure in protein folding should be controversial.

"a beautiful example of how an entirely acceptable conclusion can be reached that is entirely wrong because of the paucity of knowledge at that particular time. I spent the following 15 years or so completely disproving the conclusions reached in this communication."

Christian B. Anfinsen
Comment made in 1989 regarding earlier work on the structure of RNase

There is no doubt that Christian Anfinsen was a great contributor to the body of scientific knowledge. His main contribution was in the field of protein folding, and within that field one particular conclusion stands out: primary sequence determines tertiary structure. For this conclusion, first reached in 1954, Anfinsen shared a Nobel Prize in 1972. The origin of the idea, 'the thermodynamic hypothesis of protein folding,' can be glimpsed at the end of the famous 1954 paper:

This hypothesis should still be controversial. Where is the evidence to support the radical conclusions that were subsequently made?

Anfinsen did not provide the evidence to support all of his conclusions. It is unquestioned - even by him - that he did not have the ability to determine the shape of even a single protein. He certainly had no way to confirm a theory regarding the shapes of all proteins. This is in fact what he was looking for, and empiric confirmation is still lacking. It will be difficult to obtain, because his conclusions are fundamentally flawed.

The confusion (and I got caught by this initially as well) is due to the fact that Anfinsen actually proposed two radical ideas simultaneously. They are almost always taken to be the same idea, but they do not necessarily go together. I agree that the first idea was all but proven - but the other idea is vastly more radical, and wasn't proven by his experiments. Here's the breakdown of the two ideas:

1. Due to thermodynamic molecular forces, polypeptides automatically assume unique, stable conformational ensembles. This might also be termed the auto-assembly hypothesis of protein synthesis.

2. For every sequence of amino acids there is a unique, defining conformational ensemble to which it must auto-assemble.

Anfinsen did a fabulous job of validating point #1, but didn't even scratch the surface on point #2. Conversely, it should be simple to disprove the second idea, and in fact it might already be disproved. This is primarily because proving the idea requires proof of a negative: specifically, that polypeptides in physiologic conditions cannot consistently fold more than one way. Accepting that they can and do fold consistently in more than one way requires a mere handful of examples where this is known to happen. (For a nice overview of protein folding, see: Unraveling the Mystery of Protein Folding.)

The easiest proof of this 'multi-target' view of folding is a prion. Prions are proteins involved in bizarre infectious diseases, such as mad cow disease, where normal proteins are forced to assume shapes different from their 'native state'. Regardless of the mechanism, a prion is an example where the same sequence defines at least two different proteins, and in all probability many different proteins. This argument extends to other diseases as well, diseases generally described as amyloidosis. It is believed to be the process behind Alzheimer's, and behind several forms of cancer to boot.

When this is pointed out to Kool-Aid drinkers, the protests, excuses and apologies fly. For some reason, prions don't count, but the stark reality is that it is thermodynamically possible to make two distinct proteins from the same sequence of amino acids in physiologic conditions. They say, "there are exceptions to every rule." OK, show me a case where the rule actually holds. It should be simple to take a protein, pepper its coding sequence with copious (not just a few) 'silent mutations' and then thoroughly demonstrate that the protein remains completely unchanged. If this study exists, I have yet to find it, and in fact we can find the converse.
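To make the proposed experiment concrete, here is a minimal sketch of its first step only: rewriting a coding sequence with as many silent mutations as possible while leaving the residue sequence untouched. The toy gene, the function names and the fixed random seed are all invented for illustration; the codon table is the standard genetic code. Nothing here says anything about how either sequence would fold, which is exactly the part that needs a bench experiment.

```python
import random
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (TTT, TTC, TTA, TTG, TCT, ...); '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a coding sequence (length assumed to be a multiple of 3)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def synonyms(codon):
    """Return the other codons that encode the same residue (or stop)."""
    residue = CODON_TABLE[codon]
    return [c for c, aa in CODON_TABLE.items() if aa == residue and c != codon]


def pepper_with_silent_mutations(dna, seed=0):
    """Swap every codon that has a synonym for a randomly chosen synonym."""
    rng = random.Random(seed)
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return "".join(rng.choice(synonyms(c)) if synonyms(c) else c for c in codons)


# Hypothetical toy gene, invented for illustration: M A E K L F W stop
original = "ATGGCTGAAAAACTGTTTTGGTAA"
mutant = pepper_with_silent_mutations(original)

changed = sum(a != b for a, b in zip(original, mutant))
print(translate(original), translate(mutant), f"{changed} nucleotides changed")
```

The two translations are identical by construction, even though most codons differ. Methionine and tryptophan have only one codon each, so those positions cannot be touched.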

The single-target model itself flies in the face of common sense, and actually seems rather absurd, so shouldn't we require at least some indisputable empiric demonstrations of this cherished model, rather than handfuls of 'everyone knows' anecdotes?

Today, with a vastly larger amount of more sophisticated evidence, the accepted hypothesis fails. The whole issue must be placed back in the context of information theory. What Anfinsen essentially proposed was that the only information that must be extracted from nucleotide sequences in translation is residue sequence. The rest is autopilot. This theory is compelling not because it is consistent with the data, but because, in the words of Anfinsen, it is a 'considerable simplification.' In fact, it is an over-simplification. It effectively collapses the information content of a protein to residue configuration alone. Preposterous.
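The information-theory point can be put in rough numbers. The sketch below counts how many bits a codon can carry and how many of them survive when a codon is reduced to its residue. It assumes, purely for illustration, that all 64 codons are used equally often, which real genomes do not do; the variable names are mine.

```python
import math
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (TTT, TTC, TTA, TTG, TCT, ...); '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Assumption for illustration only: every codon is equally likely.
bits_per_codon = math.log2(len(CODON_TABLE))

# How many codons map to each residue (20 amino acids plus stop).
degeneracy = Counter(CODON_TABLE.values())

# Average number of bits spent choosing among synonymous codons.
# This is the part a sequence-only reading throws away.
bits_discarded = sum(n / 64 * math.log2(n) for n in degeneracy.values())
bits_kept = bits_per_codon - bits_discarded

print(f"per codon: {bits_per_codon:.2f} bits available, "
      f"{bits_kept:.2f} kept as residue identity, "
      f"{bits_discarded:.2f} discarded")
```

Under that idealized assumption, about 1.8 of the 6 bits in each codon never reach the residue sequence. The arithmetic only bounds how much extra information is available in the nucleotides; it does not by itself show that translation or folding makes use of it.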

Dogma recognizes two protein states: 1. random coil, 2. native state. This makes a tautology of 'the protein folding problem'. If the investigation of folding begins with the stipulation of a single uniform, high-energy state - random coil - and it is assumed that the result of folding will inevitably find a single, stable low-energy shape, what is there left to decide?

Sequence and structure are not equivalent!

The sequence of amino acids in a polypeptide string is a major component of the information in a protein structure, but it is clearly not the only component. The issue now should be to identify the other components and elucidate the information mechanisms that deliver them to the final structure. This must begin by questioning the unproven assumptions behind the two states of protein folding.

The three basic issues are:

1. How many target structural ensembles are thermodynamically available for an amino acid sequence when it folds?

2. How many distinct conformations emerge post translation - during or just prior to definitive folding?

3. What is the correlation, or what are the folding pathways between the two sets, the sets of possible initial and final conformations of amino acid sequences?

If there is only a single possible state to either the initial or final conformations of every sequence, then we can happily go about our business as usual. However, if there are in fact multiple initial conformations coming out of translation, multiple final conformations, and a correlation between the two, then investigations of protein folding must take a completely different tack.

The implications of accepting the dogmatic sequence=single structure viewpoint are paramount to our view of the genetic code. If it is correct that primary sequence and only primary sequence determines a single tertiary structure, then the model can be rehabilitated, but if it is false then today's one-dimensional model of the genetic code is beyond repair.

If the paradigm were actually correct we should expect to see certain things, namely irrefutable evidence to support it, but there is presently nothing to justify our blind faith in it. Anfinsen did not justify a belief in the linear paradigm of the genetic code. There are boundless accounts of investigators' experience that suggest the linear paradigm is secure, but conspicuously there is no disciplined proof. There are no well-designed studies to confirm the single-target hypothesis, whereas there absolutely should be a famous study, easily pointed to, reassuring us that the axiom sits on a rock-solid foundation. Anfinsen did not provide it - could not provide it - so where is the subsequent definitive study to fill this important void?

Common sense and the empiric evidence point in the other direction. There is in fact more than sequence information determining the native conformation of proteins, and ultimately the physiologic behavior of entire protein populations.

The logical view is that the genetic code is more subtle, more powerful, and more complex than the beloved, over-simplified paradigm has led us to believe. There is a tremendous amount of work to be done before we can claim that this code has in fact been cracked!

Good questions with links to a few empiric answers:

Besides folded conformations, what are some of the accepted ways that protein populations are known to change with 'silent mutations'?
http://nar.oupjournals.org/cgi/content/abstract/26/20/4778
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=11049749&dopt=Abstract

Is there any proof that synonymous substitutions are associated with structural differences?
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=9680473&dopt=Abstract
http://nar.oupjournals.org/cgi/content/full/27/1/268 Quote from the paper: "These results support the view that structure-related synonymous codon bias is a general phenomenon found in all major taxonomic groups of organisms."
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=8897597&dopt=Abstract

Does a 'silent mutation' actually change protein folding?
Silent mutations affect in vivo protein folding in Escherichia coli.

Does a synonymous substitution actually have an evolutionary impact?
http://www.american.edu/cas/bio/faculty_media/carlini/Carlini&Stephan2003.pdf
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=140546

More important than the overwhelming evidence against the idea of sequence-only folding is the lack of evidence for it. This theory, which is so cherished, should be proven before we continue to cherish it. Why do we think that translation and subsequent folding should be perceived as a sequence-only process? The genetic code has the structure in place to go well beyond sequence during translation. Why do we think it couldn't or doesn't?

There is more involved in making a protein than translating a sequence of amino acids, and there is more genetic information than residue identity stored in the nucleotide sequence. This information must exist in some form and somehow get translated to the native conformation. How?

Where is the intellectual curiosity?
Where is the skepticism of dogma that frankly seems absurd?
Where is the controversy and debate?
Why do people get so mad when these questions are raised?

If the proof that sequence really is the only determinant of structure is so certain, then it should be readily available. 'Everybody knows' is not an adequate defense of this position, because it is too easily rebutted by 'liar liar pants on fire'. More likely, science has fallen asleep at the switch, and we all do get a bit surly upon abrupt awakening.

My ears are open - send in the proof.

Material on this Website is copyright Rafiki, Inc. 2003 ©
Last updated October 9, 2003 9:06 AM