Search | Secretaria de Estado da Saúde (State Health Department)

Tertiary alphabet for the observable protein structural universe.

Mackenzie, Craig O; Zhou, Jianfu; Grigoryan, Gevorg.

Proc Natl Acad Sci U S A ; 113(47): E7438-E7447, 2016 Nov 22.

Article in English | MEDLINE | ID: mdl-27810958

ABSTRACT

Here, we systematically decompose the known protein structural universe into its basic elements, which we dub tertiary structural motifs (TERMs). A TERM is a compact backbone fragment that captures the secondary, tertiary, and quaternary environments around a given residue, comprising one or more disjoint segments (three on average). We seek the set of universal TERMs that capture all structure in the Protein Data Bank (PDB), finding remarkable degeneracy. Only ~600 TERMs are sufficient to describe 50% of the PDB at sub-Angstrom resolution. However, rarer geometries also exist, and the overall structural coverage grows logarithmically with the number of TERMs. We go on to show that universal TERMs provide an effective mapping between sequence and structure. We demonstrate that TERM-based statistics alone are sufficient to recapitulate close-to-native sequences given either NMR or X-ray backbones. Furthermore, sequence variability predicted from TERM data agrees closely with evolutionary variation. Finally, locations of TERMs in protein chains can be predicted from sequence alone based on sequence signatures emergent from TERM instances in the PDB. For multisegment motifs, this method identifies spatially adjacent fragments that are not contiguous in sequence, a major bottleneck in structure prediction. Although all TERMs recur in diverse proteins, some appear specialized for certain functions, such as interface formation, metal coordination, or even water binding. Structural biology has benefited greatly from previously observed degeneracies in structure. The decomposition of the known structural universe into a finite set of compact TERMs offers exciting opportunities toward better understanding, design, and prediction of protein structure.
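The abstract does not say how the universal TERM set is selected. The sketch below assumes a greedy set-cover heuristic (repeatedly pick the motif that explains the most not-yet-covered residues), which is a common approach to this kind of problem and not necessarily the authors' procedure. The function name `greedy_motif_cover`, the motif IDs, and the residue-hit sets are hypothetical; in practice each hit set would come from a structural search at some sub-Angstrom RMSD cutoff. The sketch also shows why coverage saturates: each additional motif tends to add fewer new residues than the one before.

```python
from typing import Dict, List, Set, Tuple


def greedy_motif_cover(
    motif_hits: Dict[str, Set[int]],
    residues: Set[int],
    target_fraction: float,
) -> Tuple[List[str], float]:
    """Greedily pick motifs until `target_fraction` of `residues` is covered.

    motif_hits: hypothetical map from a motif ID to the residue indices it
    describes (e.g. residues matched below some RMSD cutoff).
    Returns the chosen motif IDs (in pick order) and the covered fraction.
    """
    if not residues:
        return [], 0.0
    covered: Set[int] = set()
    chosen: List[str] = []
    remaining = dict(motif_hits)
    while remaining and len(covered) / len(residues) < target_fraction:
        # Pick the motif that adds the most not-yet-covered residues.
        best = max(remaining, key=lambda m: len((remaining[m] & residues) - covered))
        gain = (remaining.pop(best) & residues) - covered
        if not gain:
            break  # no remaining motif adds coverage
        covered |= gain
        chosen.append(best)
    return chosen, len(covered) / len(residues)


# Toy input: four hypothetical motifs over six residues.
hits = {"A": {0, 1, 2, 3}, "B": {3, 4}, "C": {5}, "D": {4, 5}}
print(greedy_motif_cover(hits, set(range(6)), 0.5))
print(greedy_motif_cover(hits, set(range(6)), 1.0))  # → (['A', 'D'], 1.0)
```

Greedy selection is chosen here for simplicity; exact minimum set cover is NP-hard, so a greedy pass is a typical practical approximation.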

Subjects

Proteins/chemistry , Proteins/genetics , Algorithms , Amino Acid Sequence , Computational Biology/methods , Databases, Protein , Magnetic Resonance Spectroscopy , Models, Molecular , Protein Conformation

Large-scale design and refinement of stable proteins using sequence-only models.

Singer, Jedediah M; Novotney, Scott; Strickland, Devin; Haddox, Hugh K; Leiby, Nicholas; Rocklin, Gabriel J; Chow, Cameron M; Roy, Anindya; Bera, Asim K; Motta, Francis C; Cao, Longxing; Strauch, Eva-Maria; Chidyausiku, Tamuka M; Ford, Alex; Ho, Ethan; Zaitzeff, Alexander; Mackenzie, Craig O; Eramian, Hamed; DiMaio, Frank; Grigoryan, Gevorg; Vaughn, Matthew; Stewart, Lance J; Baker, David; Klavins, Eric.

PLoS One ; 17(3): e0265020, 2022.

Article in English | MEDLINE | ID: mdl-35286324

ABSTRACT

Engineered proteins generally must possess a stable structure in order to achieve their designed function. Stable designs, however, are astronomically rare within the space of all possible amino acid sequences. As a consequence, many designs must be tested computationally and experimentally in order to find stable ones, which is expensive in terms of time and resources. Here we use a high-throughput, low-fidelity assay to experimentally evaluate the stability of approximately 200,000 novel proteins. These include a wide range of sequence perturbations, providing a baseline for future work in the field. We build a neural network model that predicts protein stability given only sequences of amino acids, and compare its performance to the assayed values. We also report another network model that is able to generate the amino acid sequences of novel stable proteins given requested secondary sequences. Finally, we show that the predictive model, despite weaknesses including a noisy data set, can be used to substantially increase the stability of both expert-designed and model-generated proteins.

Subjects

Neural Networks, Computer , Proteins , Amino Acid Sequence , Amino Acids , Protein Stability , Proteins/chemistry

Protein structural motifs in prediction and design.

Mackenzie, Craig O; Grigoryan, Gevorg.

Curr Opin Struct Biol ; 44: 161-167, 2017 Jun.

Article in English | MEDLINE | ID: mdl-28460216

ABSTRACT

The Protein Data Bank (PDB) has been an integral resource for shaping our fundamental understanding of protein structure and for the advancement of such applications as protein design and structure prediction. Over the years, information from the PDB has been used to generate models ranging from specific structural mechanisms to general statistical potentials. With accumulating structural data, it has become possible to mine for more complete and complex structural observations, deducing more accurate generalizations. Motif libraries, which capture recurring structural features along with their sequence preferences, have exposed modularity in the structural universe and found successful application in various problems of structural biology. Here we summarize recent achievements in this arena, focusing on subdomain-level structural patterns and their applications to protein design and structure prediction, and suggest promising future directions as the structural database continues to grow.

Subjects

Computational Biology/methods , Drug Design , Proteins/chemistry , Amino Acid Motifs , Databases, Protein

OrbId: Origin-based identification of microRNA targets.

Filshtein, Teresa J; Mackenzie, Craig O; Dale, Maurice D; Dela-Cruz, Paul S; Ernst, Dale M; Frankenberger, Edward A; He, Chunyan; Heath, Kaylee L; Jones, Andria S; Jones, Daniel K; King, Edward R; Maher, Maggie B; Mitchell, Travis J; Morgan, Rachel R; Sirobhushanam, Sirisha; Halkyard, Scott D; Tiwari, Kiran B; Rubin, David A; Borchert, Glen M; Larson, Erik D.

Mob Genet Elements ; 2(4): 184-192, 2012 Jul 01.

Article in English | MEDLINE | ID: mdl-23087843

ABSTRACT

MicroRNAs coordinate networks of mRNAs, but predicting specific sites of interaction is complicated by the very few bases of complementarity needed for regulation. Although efforts to characterize the specific requirements for microRNA (miR) regulation have made some advances, no general model of target recognition has been widely accepted. In this work, we describe an entirely novel approach to miR target identification. The genomic events responsible for the creation of individual miR loci have now been described, and many miRs are now known to have initially formed from transposable element (TE) sequences. In light of this, we propose that limiting miR target searches to transcripts containing a miR's progenitor TE can facilitate accurate target identification. In this report we outline the methodology behind OrbId (Origin-based identification of microRNA targets). In stark contrast to the principal miR target algorithms (which rely heavily on target site conservation across species and are therefore most effective at predicting targets for older miRs), we find OrbId is particularly efficacious at predicting the mRNA targets of miRs formed more recently in evolutionary time. After defining the TE origins of > 200 human miRs, OrbId successfully generated likely target sets for 191 predominantly primate-specific human miR loci. While only a handful of the loci examined were well enough conserved to have been previously evaluated by existing algorithms, we find that ~80% of the targets for the oldest miR in our analysis (miR-28) are contained within the principal Diana and TargetScan prediction sets. More importantly, four of the 15 putative OrbId miR-28 targets have been previously verified experimentally. Given that OrbId is best suited to predicting targets for more recently formed miRs, we suggest OrbId is a logical complement to existing, conservation-based miR target algorithms.
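The abstract states OrbId's core idea (restrict the target search to transcripts that contain the miR's progenitor TE) but not its sequence-matching rules. The sketch below assumes canonical 7-mer seed complementarity (miR nucleotides 2-8), a common convention in miRNA target prediction that may differ from OrbId's actual criteria. The function name `origin_filtered_targets`, the TE annotations, the transcript IDs, and all sequences are hypothetical toy inputs.

```python
from typing import Dict, List, Set

# RNA complement table (uppercase A/C/G/U only).
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an uppercase RNA sequence."""
    return seq.translate(_RNA_COMPLEMENT)[::-1]


def origin_filtered_targets(
    mirna_seq: str,
    progenitor_te: str,
    transcript_tes: Dict[str, Set[str]],
    transcript_seqs: Dict[str, str],
) -> List[str]:
    """Return transcripts that carry the miR's progenitor TE and a seed match.

    Assumption: a 7-mer seed (miR nucleotides 2-8) is the match criterion;
    this is illustrative and not necessarily OrbId's actual rule.
    """
    site = reverse_complement_rna(mirna_seq[1:8])  # target site pairing with the seed
    hits: List[str] = []
    for tx_id, tes in transcript_tes.items():
        if progenitor_te not in tes:
            continue  # origin filter: skip transcripts lacking the progenitor TE
        if site in transcript_seqs.get(tx_id, ""):
            hits.append(tx_id)
    return hits


# Toy data (hypothetical IDs, TE annotations, and sequences).
mirna = "UACCGAUAGGCUUAGCAUGCAA"
tes = {"tx1": {"L2"}, "tx2": {"AluY"}, "tx3": {"L2", "AluY"}}
seqs = {"tx1": "GGAUAUCGGUCC", "tx2": "AAUAUCGGUAA", "tx3": "CCCCCCCCCC"}
print(origin_filtered_targets(mirna, "L2", tes, seqs))  # ['tx1']
```

Here tx2 contains a seed match but is excluded because it lacks the progenitor TE, and tx3 carries the TE but has no match. Applying the origin filter first means sequence matching only runs on the TE-containing subset, and no cross-species conservation is required.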
