NASA - National Aeronautics and Space Administration
+ NASA Ames Research Center
+ NASA Astrobiology Institute

Ames Research Center

Linking Our Origins to Our Future - RESEARCH

Andrew Pohorille - Lead Co-Investigator - Investigation 3
Research - Origin and Early Evolution of Proteins and Metabolism


Hypotheses

We study the origin and early evolution of the peptides and proteins that putatively carried out metabolic functions in the ancestors of contemporary cells. There are two competing models that describe the possible origins of these molecules. According to one model, catalysis of biochemical reactions was initially carried out by RNA enzymes (ribozymes). The transition from "the RNA World" to the modern, protein-dominated biological world was bridged by the evolution of ribozymes that synthesized peptides in a non-coded fashion. This was followed by the coded evolution of short peptides, and then of longer proteins. An alternative model begins with an autocatalytic network of peptides, which evolved to a state of increasing metabolic complexity prior to the evolution of RNA-coded information. Although these two models are quite different, they share a common feature: both assume the existence of proteins not coded by nucleic acids. To evaluate the validity of this assumption and distinguish between the two models, it is critical to establish the probability that random peptides can perform protocellular functions, especially those that RNA is unlikely to perform by itself, such as the transport of specific ions across membranes. To understand the subsequent evolution towards modern proteins, it is also essential to determine the structure of the first functional peptides and the collective properties of the networks of chemical reactions they catalyzed.

Specific Objectives

We have three main objectives:

  1. To reconstruct in the laboratory the emergence of early catalytic functions by constructing, for the first time, models of protobiologically ubiquitous protein enzymes. We will identify their structures and properties, their relationships to known proteins performing similar or different functions, and their ability to evolve into proteins with altered functions.
  2. To select peptides that assemble into transmembrane channels capable of transporting ions across membranous cell walls, which was most likely one of the earliest cellular functions. This function is very difficult (and perhaps impossible) to achieve with RNA molecules, because they do not partition into membranes.
  3. To examine the evolutionary potential of a collection of proteins in the absence of self-replicating mechanisms. We will determine under what conditions evolution could progress and assess whether these conditions were protobiologically plausible.

The first two goals are pursued mostly experimentally. At the center of this effort is an in vitro selection approach that has already been applied to obtain ATP-binding proteins. The third goal is pursued through theoretical and computational modeling. The experimental and theoretical approaches complement each other. Experiments provide estimates of the efficiencies with which non-coded proteins can catalyze biochemical reactions; these estimates are critical parameters for modeling non-genomic evolution. Computer modeling, in turn, aids the interpretation of experimental results in terms of protein structure, evolution and mechanisms of action. Finally, the proposed modeling effort can guide future experiments aimed at demonstrating the emergence of coordination between different metabolic processes using the suite of protein enzymes created in the laboratory.

Significance

The proposed work will advance our understanding of the physical and chemical principles underlying the origins of life, as outlined in Goal 3 of the Astrobiology Roadmap. Specifically, it directly addresses Objective 3.4, which is devoted to investigating the origins and early coordination of key cellular processes such as metabolism, energy transduction and translation. We will follow the main approach outlined in the Roadmap: to create and study artificial chemical systems that undergo natural selection in the laboratory, without regard to how life actually emerged on Earth. Our research will be a step towards the development of a broader discipline, a "Universal Biology", as described in the Roadmap.

More concretely, the proposed studies will yield novel, essential information about the origin and evolutionary potential of the earliest biopolymers that facilitated the chemical reactions supporting life. Successful selection of novel enzymes that are not derived from biological proteins could shed new light on the potential for life beyond Earth and open new avenues for biotechnology.

In vitro Evolution of Model Protobiological Proteins

We have used a novel in vitro selection technique, mRNA display (Roberts and Szostak, 1997), to select ATP-binding proteins from six trillion random polypeptides (Keefe and Szostak, 2001). The selection yielded four new protein families that were unrelated to each other or to anything found in current protein databases; the proteins within each family have highly similar amino acid sequences. Because they were selected from a random-sequence library, these proteins can be considered the best currently available models of protocellular proteins.

Figure

In vitro selection of ATP-binding protein

Proteins from one family have been characterized in some detail and were found to form folded structures. The originally selected protein contained 80 amino acids, but deletion studies revealed that the minimal binding unit is fewer than 50 amino acids long, making it the smallest known ATP-binding protein. The proteins are highly selective for ATP: they bind neither guanosine triphosphate (GTP) nor cyclic AMP. However, their sequences do not contain any known ATP-binding motifs. They require zinc ions to function and contain conserved cysteine residues.

Recently, the high-resolution three-dimensional structure of a protein from this family was solved by X-ray crystallography (Lo Surdo et al., 2004). Like all biological water-soluble proteins, it has a hydrophobic core, but it exhibits a novel fold consisting of a three-stranded antiparallel β-sheet and two nonadjacent α-helices. ADP is stabilized in the binding pocket by stacking interactions with phenylalanine and tyrosine residues and by hydrogen bonds to several side chains of the protein. Binding selectivity appears to be ensured by hydrogen bonds between the N1, N3 and N6 atoms of adenine and methionine-45 and glycine-63. A zinc ion is coordinated by the conserved cysteines in a region not adjacent to the binding pocket.

Figure

Structure of 1uw1 shown as a ribbon. The secondary-structure elements are labeled (h1 and h2 refer to helices 1 and 2; S1-S3 are the β-strands; L1-L4 are loop regions). The ADP molecule is shown as sticks, and the Zn2+ ion as a white sphere.

Figure

Ribbon view of 1uw1 showing two hydrogen-bonding pairs that help attach h1 to the rest of the protein. The steady positioning of h1 is critical to the placement of L1, which contains half of the Zn-finger domain.

Figure

Color-coded ribbon view showing inter-residue hydrogen bonding; residues of the same color interact. The interconnected regions add stability to the protein and restrain loops L1 and L2, which are critical for Zn and ADP binding, respectively. The Zn atom is shown as a white sphere nestled between L1, L2 and h2. The hydrogen-bonding pairs CYS17-ALA23 and ARG25-PRO16 help form a lasso in the middle of L1, which provides one half of the Zn finger. Likewise, ASN38, GLY57, TYR55 and VAL18 form a hydrogen-bonded network through a combination of side-chain and backbone interactions. This network helps position helix h2, which contains one of the CYS residues in the Zn finger.

Figure

Close-up of the Zn-finger domain. The four CYS residues bind the Zn2+ ion tetrahedrally. These cysteines are located in L1, H1 and L3. The ADP ligand is tucked into a pocket formed by L1, L3, L4 and h2. The adenine ring lies very close to the Zn ion.


To determine whether the selected protein can be optimized for improved folding stability, we performed multiple rounds of mRNA-display selection under increasingly denaturing conditions (Chaput and Szostak, 2004). Starting from a pool of protein variants, we evolved a population of proteins capable of binding ATP in 3 M guanidine hydrochloride. One protein was chosen for further characterization. Circular dichroism, tryptophan fluorescence and ¹H-¹⁵N correlation NMR studies show that this protein has a unique folded structure.

We have used residual dipolar coupling NMR experiments to show that this protein has the same fold as the protein solved by crystallography, despite over 20% sequence divergence. We are currently analyzing the contributions of the individual mutations to the stability of the folded protein.

We have also initiated large-scale molecular dynamics (MD) computer simulations of the protein. Over the course of a 25 ns MD trajectory, the structure with bound ADP remained stable and close to the X-ray structure. In contrast, in the absence of ADP the binding pocket opened up. The protein did not unfold, however, and its structure remained essentially unchanged during the last 20 ns of a 50 ns trajectory.

What We Are Currently Doing

We started with the RXR DNA-binding domain, randomized the two loops of the zinc fingers, and generated an mRNA-display library of over 10^14 variants. From this library we have selected and characterized a series of ATP-binding proteins. This work shows that functional proteins are more abundant in libraries built on a protein scaffold with a stable folded structure than in random-sequence libraries.

From the same library, we have selected variants that catalyze an RNA ligation reaction. These novel catalysts are currently being optimized for greater activity and stability prior to detailed biophysical characterization.

We have simulated the original protein bound to GDP, and we are currently calculating the difference between the binding free energies of ADP and GDP. The ultimate goal is to redesign the protein's specificity from binding ADP to binding GDP.
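The page does not say which free energy method is being used. As a minimal sketch, assuming free energy perturbation (FEP) with Zwanzig's exponential-averaging estimator, one common choice for this kind of calculation, the relative binding free energy follows from a thermodynamic cycle: ADP is alchemically transformed into GDP once in the protein and once in water, and the two results are subtracted. The function names and the energy-difference samples they take are hypothetical.

```python
import math
from typing import Sequence


def zwanzig_delta_a(delta_u: Sequence[float], kT: float) -> float:
    """Free energy change A_1 - A_0 estimated by exponential averaging.

    delta_u holds U_1(x) - U_0(x) evaluated on configurations x sampled in
    state 0 (e.g. with ADP); the estimator is -kT * ln <exp(-dU/kT)>_0.
    A log-sum-exp shift keeps the exponentials from overflowing.
    """
    if not delta_u:
        raise ValueError("need at least one energy-difference sample")
    x = [-du / kT for du in delta_u]
    shift = max(x)
    mean_exp = sum(math.exp(xi - shift) for xi in x) / len(x)
    return -kT * (shift + math.log(mean_exp))


def relative_binding_free_energy(dA_bound: float, dA_water: float) -> float:
    """Thermodynamic cycle: ddG_bind(ADP -> GDP) = dA(in protein) - dA(in water).

    A positive value means GDP binds less favourably than ADP.
    """
    return dA_bound - dA_water
```

In practice the ADP-to-GDP change is usually split into several intermediate λ windows whose estimates are summed, because a single exponential average converges poorly when the two end states overlap weakly.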

Protein Evolution Without a Genome

The concept of non-genomic evolution. Our recent findings that truly different, simple peptides (Keefe and Szostak, 2001) can perform the same function (such as ATP binding) provide experimental support for a novel mechanism of early protobiological evolution without a genome. The central concept underlying this mechanism is that the reproduction of cellular functions alone was sufficient for the self-maintenance of protocells, and that self-replication of macromolecules was not required at this stage of evolution. The precise transfer of information between successive generations of the earliest protocells was unnecessary and, possibly, undesirable. The key requirement in the initial stage of protocellular evolution was the ability to rapidly explore a large number of protein sequences in order to "discover" a set of molecules capable of supporting the self-maintenance and growth of protocells. Undoubtedly, the essential protocellular functions were carried out by molecules not nearly as efficient or as specific as contemporary proteins. Many potentially unrelated sequences could have performed each of these functions at an evolutionarily acceptable level. As evolution progressed, however, proteins must have performed their functions with increasing efficiency and specificity. This, in turn, put additional constraints on protein sequences, and the fraction of proteins capable of performing their functions at the required level decreased. At some point, the likelihood of generating a sufficiently efficient set of proteins through non-coded synthesis became so small that further evolution was not possible without storing information about the sequences of these proteins. Beyond this point, further evolution required the coupling between proteins and informational polymers that is characteristic of all known forms of life. The emergence of such coupling must be postulated in any scenario of the origin of life, whether it starts with RNA or with proteins.
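The argument that a non-coded search eventually stops working can be made concrete with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that peptide efficiencies for a function are normally distributed (the same assumption the model below makes), that each of n randomly synthesized peptides independently performs each of k required functions at the required level with probability p, and that all parameter values are hypothetical.

```python
import math


def fraction_above(e_min: float, mean: float, sd: float) -> float:
    """Fraction of a normal(mean, sd) efficiency distribution at or above e_min."""
    return 0.5 * math.erfc((e_min - mean) / (sd * math.sqrt(2.0)))


def prob_all_functions_found(p: float, n: int, k: int) -> float:
    """Probability that every one of k functions is carried out by at least one
    of n random peptides, if each peptide succeeds at each function
    independently with probability p."""
    return (1.0 - (1.0 - p) ** n) ** k


# Raising the required efficiency e_min shrinks p, and the chance of assembling
# a complete functional set from n = 10**4 random peptides collapses.
for e_min in (1.0, 3.0, 5.0):
    p = fraction_above(e_min, mean=0.0, sd=1.0)
    print(e_min, p, prob_all_functions_found(p, n=10**4, k=10))
```

The steep drop comes from raising a small per-function coverage probability to the power k: once each function is rarely covered, needing all of them at once becomes vanishingly unlikely.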

Modeling Approach

To examine the evolutionary potential of a non-genomic system, we have developed a simple, computationally tractable model that is nevertheless capable of capturing the essential features of the real system. In this model, protocellular walls are permeable to small molecules and amino acids but not to oligopeptides of any length. Within the protocells, chemical reactions are catalyzed by peptides, albeit possibly with low efficiency and specificity. Protocells can grow either by acquiring amphiphilic material from the environment or by producing it internally. Once the protocells reach a sufficient size, they can divide, distributing their contents between the two "offspring" protocells.
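A minimal sketch of growth and division follows. It assumes that each enclosed molecule independently ends up in either offspring with probability 1/2, that the membrane is split evenly, and that a protocell divides once its amphiphile content reaches a fixed threshold; the page does not specify these rules. The names Protocell, grow, divide, step and divide_threshold are hypothetical.

```python
import random
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Protocell:
    membrane: float                                      # amphiphile content, arbitrary units
    contents: Counter = field(default_factory=Counter)  # peptide species -> copy number


def grow(cell: Protocell, uptake: float, internal: float) -> None:
    """Add amphiphiles acquired from the environment plus those made internally."""
    cell.membrane += uptake + internal


def divide(cell: Protocell, rng: random.Random) -> Tuple[Protocell, Protocell]:
    """Halve the membrane; each enclosed molecule goes to either offspring with p = 1/2."""
    left, right = Counter(), Counter()
    for species, count in cell.contents.items():
        k = sum(1 for _ in range(count) if rng.random() < 0.5)
        if k:
            left[species] = k
        if count - k:
            right[species] = count - k
    half = cell.membrane / 2.0
    return Protocell(half, left), Protocell(half, right)


def step(cell: Protocell, rng: random.Random, uptake: float = 1.0,
         internal: float = 0.5, divide_threshold: float = 20.0) -> List[Protocell]:
    """Grow one time step, then divide if the (hypothetical) size threshold is reached."""
    grow(cell, uptake, internal)
    if cell.membrane >= divide_threshold:
        return list(divide(cell, rng))
    return [cell]
```

Binomial partitioning means small copy numbers are split very unevenly, so an offspring can lose a rare catalyst entirely; this is one source of the stochasticity the model relies on.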

In our model, which is stochastic in nature, the specific identities of the amino acids forming peptides are not considered. Instead, the key quantity is the probability distribution of finding a peptide with a given efficiency of catalyzing a desired function, irrespective of its sequence. Here, efficiency can be thought of as the inverse of the turnover time, that is, the turnover rate. Biochemical considerations dictate that the efficiencies of short peptides increase only slightly with length. Only when peptides become long enough to adopt an ordered three-dimensional structure do the average efficiencies increase markedly with length. For even longer polymers, in which catalytic centers have already formed, additional length no longer produces significant improvement in catalytic properties. In the current formulation, the catalytic efficiencies of peptides of a given length are assumed to be normally distributed. Other distributions, such as a decaying exponential or a Gram-Charlier (distorted normal) distribution, can also be readily implemented.
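The length dependence described above can be sketched as a normal distribution whose mean follows a sigmoid in peptide length. The page gives only the qualitative shape, so the logistic form, its parameters (e_max, l_fold, width, sd) and the clipping of negative draws to zero are assumptions for illustration.

```python
import math
import random


def mean_efficiency(length: int, e_max: float = 1.0,
                    l_fold: float = 40.0, width: float = 5.0) -> float:
    """Hypothetical logistic mean efficiency: nearly flat for short chains,
    rising steeply around l_fold (where ordered structure becomes possible),
    and saturating at e_max for long chains."""
    return e_max / (1.0 + math.exp(-(length - l_fold) / width))


def sample_efficiency(length: int, rng: random.Random, sd: float = 0.1) -> float:
    """Draw an efficiency from a normal distribution centred on the mean for
    this length; negative draws are clipped to zero (no activity)."""
    return max(0.0, rng.gauss(mean_efficiency(length), sd))


rng = random.Random(0)
for length in (10, 30, 40, 50, 80):
    print(length, round(mean_efficiency(length), 3), round(sample_efficiency(length, rng), 3))
```

Swapping in another distribution, such as a decaying exponential, only requires replacing the rng.gauss call (for example with rng.expovariate).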

Considering the efficiencies of proteins without explicit reference to their sequences is also motivated by practical considerations. According to the canonical view of the structure-function relationship in proteins, the sequence of amino acids determines the three-dimensional structure of a protein, which, in turn, determines its function (Creighton, 1992). Thus, in principle, sequence determines function. This might suggest an approach in which large libraries of sequences are generated on the computer and each peptide is then assigned a function and an efficiency based on its sequence. However, despite extensive efforts, the sequence-function relationship in proteins has not yet been unraveled and therefore cannot be used in practice.

Central to our model of protein evolution is the emergence of protoenzymes that form peptide bonds (ligases). A simple peptide ligase has already been developed experimentally by Ghadiri and co-workers (Severin et al., 1997), and we expect to evolve other ligases, as described in section 3.3.3.2. Most of the peptides generated in the model are disordered. Since the ability to adopt an ordered structure is a prerequisite for efficient catalytic activity, these peptides are non-functional or only weakly functional. However, a few of the newly synthesized peptides are better ligases than the peptides that generated them. They, in turn, form even more peptide bonds and thereby expand the repertoire of peptides in the protocellular system. As a consequence, the likelihood of finding an even better ligase increases. When two peptides are joined to form a new peptide, the catalytic properties of the product are drawn from a probability distribution conditioned on the properties of the peptides from which it was formed. This formulation captures the biochemical intuition that when two functional peptides are joined, the catalytic center of the product is "inherited" from one of the parent peptides (although there is also a finite probability that a new catalytic center forms).
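One way to code this inheritance rule is sketched below. The page states only that the product's properties are drawn from a distribution conditioned on the parents; the specific choices here (the product keeps the catalytic center of a randomly chosen parent with Gaussian noise, and with a small probability p_new forms a fresh center drawn from a hypothetical half-normal baseline) are assumptions for illustration.

```python
import random


def ligation_product_efficiency(e_a: float, e_b: float, rng: random.Random,
                                p_new: float = 0.05, noise: float = 0.1,
                                baseline_sd: float = 0.2) -> float:
    """Catalytic efficiency of the peptide formed by joining parents a and b.

    With probability p_new a new catalytic centre forms and the efficiency is
    drawn from a half-normal baseline; otherwise the product inherits the
    centre of one parent (chosen at random) with small Gaussian noise.
    Efficiencies are clipped at zero (non-functional).
    """
    if rng.random() < p_new:
        return abs(rng.gauss(0.0, baseline_sd))
    inherited = e_a if rng.random() < 0.5 else e_b
    return max(0.0, rng.gauss(inherited, noise))
```

Choosing the parent uniformly is the simplest option; weighting the choice by parent length or activity would be an equally plausible variant.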

Some of the peptides generated by ligases act as proteases and hydrolyze already formed peptide bonds. Peptide bonds in disordered, and therefore non-functional, molecules are more likely to be exposed to the aqueous medium than bonds in structured peptides. Since proteolytic hydrolysis requires water, proteases preferentially destroy non-functional peptides. This property is incorporated into our model. As in the case of ligases, the catalytic efficiencies of protein fragments cleaved by proteases are related to the efficiency of their "parent". The conditional probabilities of ligation and hydrolysis are not independent, however. In real proteins, joining two peptides and then cutting the newly formed bond reproduces the original two peptides. If amino acid sequences are not explicitly considered, this property cannot be captured exactly. However, by relating the two conditional probability distributions through Bayes' theorem, we can preserve this relationship at the level of the peptide population.
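The Bayes' theorem step can be illustrated with discrete efficiency classes. Assuming a population distribution P(a, b) over ordered parent pairs and a ligation kernel P(c | a, b) for the class of the product, the hydrolysis kernel is P(a, b | c) = P(c | a, b) P(a, b) / P(c). Cutting a product of class c then returns parents distributed exactly as the population that could have produced it. The class labels and numbers in the example are hypothetical.

```python
from typing import Dict, Tuple

Pair = Tuple[int, int]  # (efficiency class of parent a, class of parent b)


def hydrolysis_kernel(prior: Dict[Pair, float],
                      ligation: Dict[Pair, Dict[int, float]]
                      ) -> Dict[int, Dict[Pair, float]]:
    """Return P(parents | product class) from P(parents) and P(product | parents)."""
    joint: Dict[int, Dict[Pair, float]] = {}
    for pair, p_pair in prior.items():
        for c, p_c in ligation[pair].items():
            joint.setdefault(c, {})[pair] = p_c * p_pair   # P(c, a, b)
    posterior: Dict[int, Dict[Pair, float]] = {}
    for c, row in joint.items():
        z = sum(row.values())                             # P(c)
        if z > 0:
            posterior[c] = {pair: v / z for pair, v in row.items()}
    return posterior


# Two classes: 0 = inactive, 1 = active; 10% of peptides are active.
prior = {(0, 0): 0.81, (0, 1): 0.09, (1, 0): 0.09, (1, 1): 0.01}
ligation = {(0, 0): {0: 1.0},
            (0, 1): {0: 0.5, 1: 0.5},
            (1, 0): {0: 0.5, 1: 0.5},
            (1, 1): {1: 1.0}}
kernel = hydrolysis_kernel(prior, ligation)
```

In this toy example an active product most likely came from one active and one inactive parent (posterior 0.45 for each ordering, 0.1 for two active parents), so cleaving an active peptide usually yields one active and one inactive fragment.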

Using Monte Carlo methods, we have already simulated in detail the behavior of a simple system composed only of ligases and proteases (New and Pohorille, 2000). That paper also provides the mathematical details of our model. We found that over a fairly wide range of parameters, the number, length and overall catalytic efficiency of peptides in the system increase and eventually reach a steady state. The increase is determined by the balance between ligating and proteolytic activities and by the bias towards the destruction of unstructured peptides. These conclusions were quite robust with respect to other parameters of the model, including the shape of the probability function.
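The overall structure of such a Monte Carlo simulation can be sketched as follows. This is not the New and Pohorille (2000) model: to keep it short, protease activity is a fixed per-bond propensity rather than being carried by evolving peptides, residues are conserved in a closed pool, and every constant (L_FOLD, BACKGROUND, K_PROT, BIAS and the inheritance probabilities) is hypothetical. It does keep the ingredients named above: ligation driven by peptide ligases, inheritance of catalytic efficiency, and preferential cleavage of non-functional peptides. Whether it reaches a steady state depends on these parameters.

```python
import random
from dataclasses import dataclass
from typing import List

L_FOLD = 8         # hypothetical minimum length of an ordered, possibly active peptide
BACKGROUND = 0.05  # hypothetical uncatalysed ligation propensity
K_PROT = 0.02      # hypothetical hydrolysis propensity per exposed bond
BIAS = 10.0        # bonds in inactive (disordered) peptides are cut BIAS times faster


@dataclass
class Peptide:
    length: int
    eff: float  # ligase efficiency in arbitrary units; 0.0 means non-functional


def product_eff(a: Peptide, b: Peptide, length: int, rng: random.Random) -> float:
    """Inheritance rule: keep a random parent's centre, rarely form a new one."""
    if length < L_FOLD:
        return 0.0
    parent = a if rng.random() < 0.5 else b
    if parent.eff > 0.0 and rng.random() < 0.9:
        return max(0.0, rng.gauss(parent.eff, 0.1))
    return abs(rng.gauss(0.0, 0.2)) if rng.random() < 0.05 else 0.0


def mc_step(pop: List[Peptide], rng: random.Random) -> None:
    """One event: ligation or hydrolysis, chosen in proportion to total propensity."""
    lig_rate = BACKGROUND + sum(p.eff for p in pop)
    bond_w = [(p.length - 1) * (1.0 if p.eff > 0.0 else BIAS) for p in pop]
    prot_rate = K_PROT * sum(bond_w)
    if len(pop) >= 2 and rng.random() * (lig_rate + prot_rate) < lig_rate:
        i, j = rng.sample(range(len(pop)), 2)
        a, b = pop[i], pop[j]
        for idx in sorted((i, j), reverse=True):
            pop.pop(idx)
        length = a.length + b.length
        pop.append(Peptide(length, product_eff(a, b, length, rng)))
    elif prot_rate > 0.0:
        k = rng.choices(range(len(pop)), weights=bond_w)[0]
        p = pop.pop(k)
        cut = rng.randint(1, p.length - 1)
        for frag in (cut, p.length - cut):
            # a fragment long enough to stay folded may keep the parent's activity
            keep = frag >= L_FOLD and rng.random() < frag / p.length
            pop.append(Peptide(frag, p.eff if keep else 0.0))


def simulate(n_monomers: int = 200, steps: int = 20000, seed: int = 0) -> List[Peptide]:
    """Start from free amino acids and apply `steps` ligation/hydrolysis events."""
    rng = random.Random(seed)
    pop = [Peptide(1, 0.0) for _ in range(n_monomers)]
    for _ in range(steps):
        mc_step(pop, rng)
    return pop


pop = simulate()
print(len(pop), max(p.length for p in pop), sum(p.eff for p in pop))
```

Tracking the number of peptides, their mean length and the summed efficiency over time is the simplest way to see whether a given parameter set settles into a steady state.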

Figure

Simulated probability, p(e), of finding ligating peptides with different efficiencies (in arbitrary units) vs. the same probability expected from the Inherited Efficiencies Model, p_exp(e). Most peptides exhibit no catalytic activity, but there is a small population of peptides whose catalytic efficiencies are approximately distributed according to a broad Gaussian. Specific numbers depend on the parameters of the model.

The simple two-function model is too restricted to describe the emergence of novelty (emergent properties) in the system. To capture these features and to provide a more biochemically faithful description of possible protein evolution, the current model has to be extended in two directions. First, we assume that some of the newly produced peptides can catalyze reactions other than ligation and hydrolysis. These reactions may form pathways and cycles that lead to the use of external energy for activating reactants with high-energy groups (e.g., thioesters), to the synthesis of amino acids, membrane-forming amphiphiles and nucleic acids, and to the metabolism of small molecules. Several such pathways and cycles have been postulated on experimental grounds. They may couple constructively to peptide synthesis, allow for protocellular growth and division, and provide links to systems that involve both proteins and nucleic acids.

Second, we have added new features to the model that allow longer peptides to increase not only their catalytic efficiency but also their specificity. Specificity toward different substrates is determined from a set of descriptors assigned to each peptide. A similar approach is commonly taken in Quantitative Structure-Activity Relationship (QSAR) studies. Of course, in our case suitable descriptors and their relation to functions and specificities are not known, so they have to be assigned somewhat arbitrarily. In principle, however, they could be established experimentally. During ligation and hydrolysis, the values of the descriptors propagate according to the same probabilistic rules as catalytic efficiency. Furthermore, we will systematically investigate the robustness of our results with respect to the mapping between descriptors and specificity.
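The page does not specify the descriptors or how they map to specificity. As a hypothetical sketch, assume each peptide carries a numeric descriptor vector, each substrate a vector of the same size, and specificity is a softmax over their dot products, sharpened by a factor beta that could be made to grow with peptide length. Descriptors are inherited from a random parent with small noise, mirroring the efficiency rule. The substrate names and vectors are illustrative only.

```python
import math
import random
from typing import Dict, List


def specificity(desc: List[float], substrates: Dict[str, List[float]],
                beta: float = 1.0) -> Dict[str, float]:
    """Hypothetical mapping: softmax of beta * (descriptor . substrate vector)."""
    scores = {name: beta * sum(d * s for d, s in zip(desc, vec))
              for name, vec in substrates.items()}
    top = max(scores.values())
    weights = {name: math.exp(v - top) for name, v in scores.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def inherit_descriptors(desc_a: List[float], desc_b: List[float],
                        rng: random.Random, noise: float = 0.1) -> List[float]:
    """Ligation product keeps a random parent's descriptors, slightly perturbed."""
    parent = desc_a if rng.random() < 0.5 else desc_b
    return [x + rng.gauss(0.0, noise) for x in parent]


substrates = {"amino_acid": [1.0, 0.0], "thioester": [0.0, 1.0]}
print(specificity([2.0, 0.5], substrates, beta=1.0))
print(specificity([2.0, 0.5], substrates, beta=3.0))  # larger beta: sharper preference
```

With beta = 0 every substrate is equally likely, so tying beta to length is one simple way to let specificity, like efficiency, emerge only in longer, ordered peptides.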

What We Expect to Establish

Our simulations are aimed at determining conditions that are necessary for evolution of a population of proteins in increasingly complicated systems. Examples of issues that we focus on are:

  • What are the frequencies of functional peptides that allow the system to evolve, and how do they compare with the frequencies estimated experimentally?
  • How does the balance between constructive and destructive processes (including substrate and product inhibition and the possible emergence of useless pathways) influence the evolutionary potential of the system?
  • Can we observe self-organized pathways and autocatalytic cycles, and at what degree of system complexity do they emerge?
  • How does compartmentalization of proteins in vesicles influence their evolution?
  • How robust are the results with respect to changes in the parameters of the model?

A simple model of reaction (metabolic) networks catalyzed by functional proteins present among random sequences has been developed and is being studied computationally. Biochemically plausible rules for identifying populations of functional proteins were formulated in previous years of this project. By investigating large populations of networks, we have demonstrated that a subset of them can self-organize and evolve towards increasing complexity even in the absence of a genome. Networks can be classified into families (species) that persist even though individual networks disintegrate or transform over time. As environmental conditions change, so do the relative populations of different families. Initial results indicate that many concepts developed in the context of genomic evolution, such as speciation, may also hold in the absence of a genome. We are currently refining the model to capture the main features of protobiological catalysis.

Ames Team Members Participating in this Investigation:

Name | Role | Organization | Email
Pohorille, Andrew | Lead Co-Investigator | NASA Ames Research Center | pohorill@raphael.arc.nasa.gov
Chaput, John | Collaborator | Arizona State University | john.chaput@asu.edu
Seelig, Burckhard | Collaborator | Harvard Medical School | seelig@molbio.mgh.harvard.edu
Szostak, Jack | Co-Investigator | Massachusetts General Hospital | szostak@molbio.mgh.harvard.edu
Wilson, Michael | Co-Investigator | University of California, San Francisco | mwilson@mail.arc.nasa.gov

See the following Ames Team research pages:

Formation and Evolution of Habitable Planets
Prebiotic Organics from Space
Origin and Early Evolution of Proteins and Metabolism
Biosignatures in Chemosynthetic and Photosynthetic Systems
Modeling Ecosystems and Biospheres
Hind-Casting Past Environments
Interplanetary Pioneers



Editor: Colleen Howell
NASA Official: David Des Marais

Last Updated: June 12, 2008
