In vitro selection of ATP-binding protein
Proteins from one family have been characterized in fair detail and were found to form folded structures. The originally selected protein contained 80 amino acids, but deletion studies revealed that the minimal binding unit is fewer than 50 amino acids long and is thus the smallest known ATP-binding protein. The proteins are highly selective for ATP: they bind neither guanosine triphosphate (GTP) nor cyclic AMP. However, their sequences do not contain any known ATP-binding motifs. To function, they require zinc ions, and they contain conserved cysteine residues.
Recently, the high-resolution three-dimensional structure of a protein from this family was solved using X-ray crystallography (LoSutro et al., 2004). Like all biological, water-soluble proteins, it has a hydrophobic core, but it exhibits a novel fold consisting of a three-stranded antiparallel β-sheet and two nonadjacent α-helices. ADP is stabilized in the binding pocket by stacking interactions with phenylalanine and tyrosine residues and by hydrogen bonds to several side chains in the protein. Selectivity of binding appears to be ensured by hydrogen bonds between the N1, N3 and N6 of adenine and methionine-45 and glycine-63. A zinc ion is coordinated by the conserved cysteines in a region not adjacent to the binding pocket.
To determine whether the selected protein can be optimized for improved folding stability, we performed multiple rounds of mRNA display selection under increasingly denaturing conditions (Chaput and Szostak, 2004). Starting from a pool of protein variants, we evolved a population of proteins capable of binding ATP in 3 M guanidine hydrochloride. One protein was chosen for further characterization. Circular dichroism, tryptophan fluorescence and 1H-15N correlation NMR studies show that this protein has a unique folded structure.
We have used residual dipolar coupling NMR experiments to show that this protein has the same fold as the protein solved by crystallography, despite over 20% sequence divergence. We are currently analyzing the contributions of the individual mutations to the stability of the folded protein.
We have also initiated large-scale molecular dynamics (MD) computer simulations of the protein. Over the course of a 25 ns MD trajectory, the structure with bound ADP remained stable and close to the X-ray structure. In contrast, in the absence of ADP the binding pocket opened up. The protein did not unfold, however, and its structure remained essentially unchanged in the last 20 ns of a 50 ns trajectory.
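One common way to quantify whether a simulated structure "remained stable and close to the X-ray structure" is the root-mean-square deviation (RMSD) of each trajectory frame from the reference coordinates. The sketch below is illustrative only and is not the analysis used in this work: the function name `rmsd` and the toy coordinates are assumptions, and the frames are presumed to be already superimposed on the reference.

```python
import math

def rmsd(frame, reference):
    """RMSD between two equal-length lists of (x, y, z) coordinates.

    Assumes the frame has already been aligned (superimposed) on the
    reference; no fitting is performed here.
    """
    if len(frame) != len(reference):
        raise ValueError("coordinate sets must have the same number of atoms")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(frame, reference):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(reference))

# Toy data (hypothetical): a 3-atom reference and two trajectory frames.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
frame_close = [(0.1, 0.0, 0.0), (1.0, 0.1, 0.0), (0.0, 1.0, 0.1)]
frame_open = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

rmsd_close = rmsd(frame_close, reference)  # small: frame stays near reference
rmsd_open = rmsd(frame_open, reference)    # larger: one atom has moved away
```

A plot of such values against simulation time is the usual way to judge whether a trajectory has drifted from its starting structure.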
We have started with the RXR DNA-binding domain, randomized the two loops of the zinc fingers, and generated an mRNA-display library of over 10^14 variants. From this library we have selected and characterized a series of ATP-binding proteins. This work shows that functional proteins are more abundant in libraries built from a protein scaffold that has a stable folded structure than in random-sequence libraries.
From the same library, we have selected variants that catalyze an RNA ligation reaction. These novel catalysts are currently being optimized for greater activity and stability prior to detailed biophysical characterization.
We have simulated the original protein bound to GDP, and we are currently calculating the difference between the binding free energies of ADP and GDP. The ultimate goal is to redesign the protein's specificity from binding ADP to binding GDP.
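A relative binding free energy of this kind is often obtained from a thermodynamic cycle, in which the ligand is transformed computationally from ADP to GDP both when bound to the protein and when free in solution. The relation below is a standard identity offered as a sketch; the text does not state which method is used, and the symbols are introduced here for illustration only.

```latex
\Delta\Delta G_{\mathrm{bind}}
  = \Delta G_{\mathrm{bind}}^{\mathrm{GDP}} - \Delta G_{\mathrm{bind}}^{\mathrm{ADP}}
  = \Delta G_{\mathrm{protein}}^{\mathrm{ADP}\to\mathrm{GDP}}
  - \Delta G_{\mathrm{water}}^{\mathrm{ADP}\to\mathrm{GDP}}
```

With this sign convention, a negative value means that GDP binds more tightly than ADP.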
The concept of non-genomic evolution. Our recent findings that truly different, simple peptides (Keefe and Szostak, 2001) can perform the same function (such as ATP binding) provide experimental support for a novel mechanism of early protobiological evolution without a genome. The central concept underlying this mechanism is that the reproduction of cellular functions alone was sufficient for self-maintenance of protocells, and that self-replication of macromolecules was not required at this stage of evolution. The precise transfer of information between successive generations of the earliest protocells was unnecessary and, possibly, undesirable. The key requirement in the initial stage of protocellular evolution was an ability to rapidly explore a large number of protein sequences in order to "discover" a set of molecules capable of supporting self-maintenance and growth of protocells. Undoubtedly, the essential protocellular functions were carried out by molecules not nearly as efficient or as specific as contemporary proteins. Many potentially unrelated sequences could have performed each of these functions at an evolutionarily acceptable level. As evolution progressed, however, proteins must have performed their functions with increasing efficiency and specificity. This, in turn, put additional constraints on protein sequences, and the fraction of proteins capable of performing their functions at the required level decreased. At some point, the likelihood of generating a sufficiently efficient set of proteins through non-coded synthesis was so small that further evolution was not possible without storing information about the sequences of these proteins. Beyond this point, further evolution required the coupling between proteins and informational polymers that is characteristic of all known forms of life. The emergence of such coupling must be postulated in any scenario of the origin of life, no matter whether it starts with RNA or proteins.
To examine the evolutionary potential of a non-genomic system, we have developed a simple, computationally tractable model that is still capable of capturing the essential features of the real system. In this model, protocellular walls are permeable to small molecules and amino acids but not to oligopeptides of any length. Within the protocells, chemical reactions are catalyzed by peptides, albeit possibly with low efficiency and specificity. Protocells can grow either by acquiring amphiphilic material from the environment or by producing it internally. Once the protocells reach sufficient size, they can divide, distributing their contents between the two "offspring" protocells.
In our model, which is stochastic in nature, the specific identities of the amino acids forming peptides are not considered. Instead, the key quantity is the probability distribution of finding a peptide with a given efficiency of catalyzing a desired function, irrespective of its sequence. In this case, efficiency can be thought of as the inverse of the turnover rate. Biochemical considerations dictate that the efficiencies of short peptides increase only slightly with the length of the polymer. Only when peptides reach lengths sufficient for them to adopt an ordered three-dimensional structure do the average efficiencies increase markedly with length. For even longer polymers, in which catalytic centers have already been formed, gaining additional length no longer produces significant improvement in catalytic properties. In the current formulation, it is assumed that the catalytic efficiencies of peptides of a given length are distributed normally. Other distributions, such as the decaying exponential or Gram-Charlier (distorted normal), can also be readily implemented.
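The length dependence described above can be sketched as a mean efficiency that rises slowly for short peptides, sharply near the length at which folding becomes possible, and then levels off, with normally distributed scatter around that mean. The following is a minimal sketch under stated assumptions: the logistic form, the parameter values (`midpoint`, `steepness`, `plateau`, `sigma`) and the clipping of negative draws to zero are all illustrative choices, not taken from the model itself.

```python
import math
import random

def mean_efficiency(length, midpoint=40, steepness=0.3, plateau=1.0):
    """Hypothetical logistic curve: low for short peptides, rising sharply
    near `midpoint` (where folding becomes possible), then levelling off."""
    return plateau / (1.0 + math.exp(-steepness * (length - midpoint)))

def sample_efficiency(length, rng, sigma=0.1):
    """Draw an efficiency from a normal distribution centred on the mean
    for this length; negative draws are clipped to zero (an assumption)."""
    return max(0.0, rng.gauss(mean_efficiency(length), sigma))

rng = random.Random(0)  # fixed seed so the sketch is reproducible
for n in (10, 40, 80):
    draws = [sample_efficiency(n, rng) for _ in range(1000)]
    print(n, round(sum(draws) / len(draws), 2))
```

Swapping `rng.gauss` for another sampler (for example `rng.expovariate`) would give one of the alternative distributions mentioned in the text.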
Considering efficiencies of proteins without explicit reference to their sequences is also motivated by practical reasons. According to the canonical view of the structure-function relationship in proteins, the sequence of amino acids determines the three-dimensional structure of a protein, which, in turn, determines its function (Creighton, 1992). Thus, in principle, there is good correspondence between sequence and function. This might suggest an approach in which large libraries of sequences are generated on the computer and each peptide is then assigned a function and an efficiency based on its sequence. However, despite extensive efforts, the nature of the sequence-function relationship in proteins has not yet been unraveled and, therefore, cannot be used in practice.
Central to our model of protein evolution is the emergence of protoenzymes that form peptide bonds (ligases). A simple ligase has already been developed experimentally by Ghadiri and coworkers (Severin et al., 1997), and we expect to evolve other ligases, as described in section 3.3.3.2. Most of the peptides generated in the model are disordered. Since the ability to adopt an ordered structure is the prerequisite for efficient catalytic activity, these peptides are non-functional or only weakly functional. However, a few of the newly synthesized peptides are better ligases than the peptides that generated them. They, in turn, form even more peptide bonds and, by doing so, increase the repertoire of peptides in the protocellular system. As a consequence, the likelihood of finding an even better ligase increases. When two peptides are joined to form a new peptide, the catalytic properties of the product are chosen from a probability distribution contingent upon the properties of the peptides from which the new peptide was formed. This formulation captures the biochemical intuition that when two functional peptides are joined, the catalytic center of the product will be "inherited" from one of the parent peptides (although there is also a finite probability of forming a new catalytic center).
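The inheritance rule for ligation can be illustrated with a short sketch. Everything specific here is an assumption made for illustration: peptides are represented as `(length, efficiency)` tuples, the parent whose catalytic center is inherited is chosen with equal probability, and the values of `p_new_center` and `sigma` are arbitrary.

```python
import random

def ligate(parent_a, parent_b, rng, p_new_center=0.05, sigma=0.05):
    """Return (length, efficiency) for the product of joining two peptides.

    With probability `p_new_center` the product acquires a new catalytic
    center with a freshly drawn efficiency; otherwise it inherits the
    efficiency of one parent, perturbed by small Gaussian noise.
    All numeric values are illustrative assumptions.
    """
    length = parent_a[0] + parent_b[0]
    if rng.random() < p_new_center:
        efficiency = rng.random()                       # new catalytic center
    else:
        inherited = rng.choice((parent_a, parent_b))[1]  # inherit from a parent
        efficiency = inherited + rng.gauss(0.0, sigma)
    # Keep the efficiency within the arbitrary [0, 1] scale used here.
    return length, min(1.0, max(0.0, efficiency))

rng = random.Random(42)
product = ligate((12, 0.0), (30, 0.6), rng)
```

The conditional draw is what makes the product's efficiency "contingent upon" its parents: a different inheritance kernel can be substituted without changing the rest of the sketch.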
Some of the peptides generated by ligases act as proteases and hydrolyze already formed peptide bonds. Peptide bonds in disordered and, therefore, non-functional molecules are more likely to be exposed to the aqueous medium than bonds in structured peptides. Since proteases require water for their function, they preferentially destroy non-functional peptides. This property is incorporated into our model. As in the case of ligases, the catalytic efficiencies of protein fragments cleaved by proteases are related to the efficiency of their "parent". The conditional probabilities of ligation and hydrolysis are not independent, however. In real proteins, joining two peptides and then cutting the newly formed bond reproduces the original two peptides. If the amino acid sequences are not explicitly considered, this property cannot be captured exactly. However, by relating the two conditional probability distributions through Bayes' theorem, we can preserve this relationship for the population of peptides.
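One way to write this Bayes' theorem relationship is shown below. The notation is introduced here for illustration and may differ from the published formulation: e is the efficiency of the joined peptide, e_1 and e_2 are the efficiencies of the two fragments, P_L is the conditional distribution used for ligation, P_H is the one used for hydrolysis, and P(e_1, e_2) and P(e) are the corresponding distributions in the peptide population.

```latex
P_{H}(e_1, e_2 \mid e) = \frac{P_{L}(e \mid e_1, e_2)\, P(e_1, e_2)}{P(e)}
```

Under this relation, ligating pairs drawn from the population and then hydrolyzing the products returns, on average, the same distribution of fragment efficiencies, even though individual sequences are not tracked.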
Using Monte Carlo methods, we have already simulated in detail the behavior of a simple system composed of only ligases and proteases (New and Pohorille, 2000). That paper also provides the mathematical details of our model. We found that, over a fairly wide range of parameters, the number, length and overall catalytic efficiency of peptides in the system increase and eventually reach a steady state. The increase is determined by the balance between ligating and proteolytic activities and by the bias towards the destruction of unstructured peptides. These conclusions were quite robust with respect to other parameters of the model, including the shape of the probability function.
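The flavor of such a simulation can be conveyed by a toy Monte Carlo loop. This is a deliberately simplified sketch and not the published model: peptides are `(length, efficiency)` tuples, a single efficiency value stands in for ligase activity, hydrolysis is represented by a background cutting rate biased against high-efficiency (structured) peptides rather than by explicit proteases, and all parameter values are arbitrary assumptions.

```python
import random

def clip(x):
    """Keep an efficiency within the arbitrary [0, 1] scale used here."""
    return min(1.0, max(0.0, x))

def simulate(pool, steps, rng, p_cut_max=0.3, sigma=0.05):
    """Run a toy ligation/hydrolysis simulation and return the final pool."""
    pool = list(pool)
    for _ in range(steps):
        # Ligation attempt: a random peptide acts as a ligase and succeeds
        # with probability equal to its efficiency.
        catalyst = rng.choice(pool)
        if len(pool) >= 2 and rng.random() < catalyst[1]:
            i, j = rng.sample(range(len(pool)), 2)
            (len_a, eff_a), (len_b, eff_b) = pool[i], pool[j]
            eff = clip(rng.choice((eff_a, eff_b)) + rng.gauss(0.0, sigma))
            for k in sorted((i, j), reverse=True):  # remove both substrates
                pool.pop(k)
            pool.append((len_a + len_b, eff))
        # Hydrolysis attempt: low-efficiency (unstructured) peptides are
        # cut more readily than high-efficiency (structured) ones.
        k = rng.randrange(len(pool))
        length, eff = pool[k]
        if length >= 2 and rng.random() < p_cut_max * (1.0 - eff):
            cut = rng.randint(1, length - 1)
            pool.pop(k)
            for frag_len in (cut, length - cut):
                # Fragment efficiencies are related to that of the parent.
                pool.append((frag_len, clip(eff + rng.gauss(0.0, sigma))))
    return pool

rng = random.Random(7)
start = [(rng.randint(2, 5), 0.2 * rng.random()) for _ in range(200)]
final = simulate(start, steps=5000, rng=rng)
mean_length = sum(n for n, _ in final) / len(final)
mean_eff = sum(e for _, e in final) / len(final)
```

Tracking `mean_length` and `mean_eff` over time, and varying `p_cut_max`, is one way to explore how the balance between ligation and biased hydrolysis shapes the population in this toy setting.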

Figure: Simulated probability, p(e), of finding ligating peptides with different efficiency (in arbitrary units) vs. the same probability expected from the Inherited Efficiencies Model, p_exp(e). Most peptides exhibit no catalytic activity, but there is a small population of peptides that act with catalytic efficiency distributed approximately according to a broad Gaussian. Specific numbers depend on the parameters of the model.
The simple, two-function model is too restricted to describe the emergence of novelty (emergent properties) in the system. To capture these features and to provide a more biochemically faithful description of possible protein evolution, the current model has to be extended in two directions. First, we assume that some of the newly produced peptides can catalyze reactions other than ligation and hydrolysis. Examples of such reactions are pathways and cycles that lead to the utilization of external energy for activating reactants with high-energy groups (e.g. thioesters), synthesis of amino acids, membrane-forming amphiphiles and nucleic acids, and metabolism of small molecules. Several such pathways and cycles have been postulated on experimental grounds. They may couple constructively to peptide synthesis, allow for protocellular growth and division and provide links to systems that involve both proteins and nucleic acids.
Second, we have added to the model new features that allow longer peptides to increase not only in catalytic efficiency but also in specificity. Specificity toward different substrates is determined from a set of descriptors assigned to each peptide. A similar approach is commonly taken in Quantitative Structure-Activity Relationship (QSAR) studies. Of course, in our case the appropriate descriptors and their relation to functions and specificities are not known, so they have to be assigned somewhat arbitrarily. However, they could, in principle, be established experimentally. Furthermore, we will systematically investigate the robustness of our results with respect to the mapping between descriptors and specificity. During ligation and hydrolysis, values of the descriptors propagate according to the same probabilistic rules as catalytic efficiency.
Our simulations are aimed at determining conditions that are necessary for evolution of a population of proteins in increasingly complicated systems. Examples of issues that we focus on are:
A simple model of reaction (metabolic) networks catalyzed by functional proteins existing among random sequences has been developed and is being studied computationally. Biochemically plausible rules for identifying populations of functional proteins were formulated in previous years of this project. By investigating large populations of networks, we have demonstrated that a subset of them can self-organize and evolve towards increasing complexity even in the absence of a genome. Networks can be classified into families (species) that persist even though individual networks disintegrate or transform with time. As the environmental conditions change, so do the relative populations of different families. Initial results indicate that many concepts developed in the context of genomic evolution, such as speciation, may also hold in the absence of a genome. We are currently refining the model to capture the main features of protobiological catalysis.
NAME | ROLE | ORGANIZATION
 | Lead Co-Investigator | NASA Ames Research Center
Chaput, John | Collaborator | Arizona State University
Seelig, Burckhard | Collaborator | Harvard Medical School
 | Co-Investigator | Massachusetts General Hospital
 | Co-Investigator | University of California, San Francisco