1.
Nat Commun ; 15(1): 8094, 2024 Sep 16.
Article in English | MEDLINE | ID: mdl-39294145

ABSTRACT

Our views of fold space implicitly rest upon many assumptions that impact how we analyze, interpret and understand protein structure, function and evolution. For instance, is there an optimal granularity in viewing protein structural similarities (e.g., architecture, topology or some other level)? Similarly, the discrete/continuous dichotomy of fold space is central, but remains unresolved. Discrete views of fold space bin similar folds into distinct, non-overlapping groups; unfortunately, such binning can miss remote relationships. While hierarchical systems like CATH are indispensable resources, less heuristic and more conceptually flexible approaches could enable more nuanced explorations of fold space. Building upon an Urfold model of protein structure, here we present a deep generative modeling framework, termed DeepUrfold, for analyzing protein relationships at scale. DeepUrfold's learned embeddings occupy high-dimensional latent spaces that can be distilled for a given protein in terms of an amalgamated representation uniting sequence, structure and biophysical properties. This approach is structure-guided, versus being purely structure-based, and DeepUrfold learns representations that, in a sense, define superfamilies. Deploying DeepUrfold with CATH reveals evolutionarily-remote relationships that evade existing methodologies, and suggests a mostly-continuous view of fold space: a view that extends beyond simple geometric similarity, towards the realm of integrated sequence ↔ structure ↔ function properties.


Subjects
Molecular Models, Protein Folding, Proteins, Proteins/chemistry, Proteins/metabolism, Protein Conformation, Protein Databases, Algorithms, Deep Learning, Computational Biology/methods
2.
BMC Bioinformatics ; 11: 310, 2010 Jun 09.
Article in English | MEDLINE | ID: mdl-20529369

ABSTRACT

BACKGROUND: Partitioning of a protein into structural components, known as domains, is an important initial step in protein classification and for functional and evolutionary studies. While systematic assignments of domains by human experts exist (CATH and SCOP), the introduction of high-throughput technologies for structure determination threatens to overwhelm expert approaches. A variety of algorithmic methods have been developed to expedite this process, allowing almost instant structural decomposition into domains. The performance of algorithmic methods can approach 85% agreement on the number of domains with the consensus reached by experts. However, each algorithm takes a somewhat different conceptual approach, each with unique strengths and weaknesses. Currently there is no simple way to automatically compare assignments from different structure-based domain assignment methods; such a comparison would provide a comprehensive understanding of possible structure partitioning as well as some insight into the tendencies of particular algorithms. Most importantly, a consensus assignment drawn from multiple assignment methods can provide a singular and presumably more accurate view. RESULTS: We introduce dConsensus (http://pdomains.sdsc.edu/dConsensus), a web resource that displays the results of calculations from multiple algorithmic methods and generates a domain assignment consensus with an associated reliability score. Domain assignments from seven structure-based algorithms (PDP, PUU, DomainParser2, NCBI method, DHcL, DDomains and Dodis) are available for analysis and comparison alongside assignments made by expert methods. The assignments are available for all protein chains in the Protein Data Bank (PDB). A consensus domain assignment is built by either allowing each algorithm to contribute equally (simple approach) or by weighting the contribution of each method by its prior performance and observed tendencies. An analysis of secondary structure around domain and fragment boundaries is also available for display and further analysis. CONCLUSION: dConsensus provides a comprehensive assignment of protein domains. For the first time, seven algorithmic methods are brought together with no need to access each method separately via a webserver or local copy of the software. This aggregation permits a consensus domain assignment to be computed. Comparison viewing of the consensus and choice methods provides the user with insights into the fundamental units of protein structure so important to the study of evolutionary and functional relationships.
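
The equal-versus-weighted consensus idea described in this abstract can be illustrated with a minimal sketch. This is not dConsensus's actual algorithm: the function name `consensus_domain_count`, the use of the domain count as the voted quantity, the vote-share "reliability" value, and the tie-breaking rule are all assumptions made for illustration only.

```python
from collections import defaultdict


def consensus_domain_count(assignments, weights=None):
    """Vote on the number of domains in a chain (illustrative only).

    assignments: dict mapping method name -> predicted number of domains.
    weights: optional dict mapping method name -> weight (hypothetical
             stand-in for "prior performance"); if omitted, every method
             contributes equally (the "simple approach").
    Returns (consensus_count, reliability), where reliability is the share
    of the total vote weight held by the winning count (an assumption,
    not the score defined by dConsensus).
    """
    if not assignments:
        raise ValueError("at least one assignment is required")
    if weights is None:
        weights = {method: 1.0 for method in assignments}

    votes = defaultdict(float)
    for method, n_domains in assignments.items():
        votes[n_domains] += weights.get(method, 1.0)

    total = sum(votes.values())
    # Iterate counts in ascending order so ties resolve to fewer domains
    # (an arbitrary choice for this sketch).
    best = max(sorted(votes), key=lambda n: votes[n])
    return best, votes[best] / total


if __name__ == "__main__":
    calls = {"PDP": 2, "PUU": 2, "DomainParser2": 3}
    print(consensus_domain_count(calls))
    print(consensus_domain_count(calls, {"PDP": 1, "PUU": 1, "DomainParser2": 3}))
```

With equal weights the two methods predicting two domains win; with the hypothetical weights shown, the single heavily weighted method overrides them, which is the trade-off the weighted approach introduces.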


Subjects
Proteins/chemistry, Software, Algorithms, Protein Databases, Molecular Models, Protein Conformation
3.
PLoS Comput Biol ; 5(3): e1000315, 2009 Mar.
Article in English | MEDLINE | ID: mdl-19282982

ABSTRACT

The spliceosome, a sophisticated molecular machine involved in the removal of intervening sequences from the coding sections of eukaryotic genes, appeared and subsequently evolved rapidly during the early stages of eukaryotic evolution. The last eukaryotic common ancestor (LECA) had both complex spliceosomal machinery and some spliceosomal introns, yet little is known about the early stages of evolution of the spliceosomal apparatus. The Sm/Lsm family of proteins has been suggested as one of the earliest components of the emerging spliceosome and hence provides a first in-depth glimpse into the evolving spliceosomal apparatus. An analysis of 335 Sm and Sm-like genes from 80 species across all three kingdoms of life reveals two significant observations. First, the eukaryotic Sm/Lsm family underwent two rapid waves of duplication with subsequent divergence resulting in 14 distinct genes. Each wave resulted in a more sophisticated spliceosome, reflecting a possible jump in the complexity of the evolving eukaryotic cell. Second, an unusually high degree of conservation in intron positions is observed within individual orthologous Sm/Lsm genes and between some of the Sm/Lsm paralogs. This suggests that functional spliceosomal introns existed before the emergence of the complete Sm/Lsm family of proteins; hence, spliceosomal machinery with considerably fewer components than today's spliceosome was already functional.


Subjects
Molecular Evolution, Genetic Models, Small Nuclear Ribonucleoproteins/genetics, DNA Sequence Analysis/methods, Spliceosomes/genetics, Base Sequence, Genetic Variation/genetics, Molecular Sequence Data, Species Specificity
4.
Protein Sci ; 28(12): 2119-2126, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31599042

ABSTRACT

We suspect that there is a level of granularity of protein structure intermediate between the classical levels of "architecture" and "topology," as reflected in such phenomena as extensive three-dimensional structural similarity above the level of (super)folds. Here, we examine this notion of architectural identity despite topological variability, starting with a concept that we call the "Urfold." We believe that this model could offer a new conceptual approach for protein structural analysis and classification: indeed, the Urfold concept may help reconcile various phenomena that have been frequently recognized or debated for years, such as the precise meaning of "significant" structural overlap and the degree of continuity of fold space. More broadly, the role of structural similarity in sequence↔structure↔function evolution has been studied via many models over the years; by addressing a conceptual gap that we believe exists between the architecture and topology levels of structural classification schemes, the Urfold eventually may help synthesize these models into a generalized, consistent framework. Here, we begin by qualitatively introducing the concept.


Subjects
Proteins/chemistry, Algorithms, Molecular Models, Protein Conformation, Protein Folding
5.
Structure ; 27(1): 6-26, 2019 Jan 02.
Article in English | MEDLINE | ID: mdl-30393050

ABSTRACT

The small β-barrel (SBB) is an ancient protein structural domain characterized by extremes: it features a broad range of structural varieties, a deeply intricate evolutionary history, and it is associated with a bewildering array of cellular pathways. Here, we present a thorough, survey-based analysis of the structural properties of SBBs. We first consider the defining properties of the SBB, including various systems of nomenclature used to describe it, and we introduce the unifying concept of an "urfold." To begin elucidating how vast functional diversity can be achieved by a relatively simple domain, we explore the anatomy of the SBB and its representative structural variants. Many SBB proteins assemble into cyclic oligomers as the biologically functional units; these oligomers often bind RNA, and typically exhibit great quaternary structural plasticity (homomeric and heteromeric rings, variable subunit stoichiometries, etc.). We conclude with three themes that emerge from the rich structure ↔ function versatility of the SBB.


Subjects
Proteins/chemistry, Animals, Binding Sites, Humans, Molecular Models, Protein Binding, Protein Secondary Structure
6.
J Mol Biol ; 361(3): 562-90, 2006 Aug 18.
Article in English | MEDLINE | ID: mdl-16863650

ABSTRACT

This analysis takes an in-depth look into the difficulties encountered by automatic methods for domain decomposition from three-dimensional structure. The analysis involves a multi-faceted set of criteria including the integrity of secondary structure elements, the tendency toward fragmentation of domains, domain boundary consistency and topology. The strength of the analysis comes from the use of a new comprehensive benchmark dataset, which is based on consensus among experts (CATH, SCOP and AUTHORS of the 3D structures) and covers 30 distinct architectures and 211 distinct topologies as defined by CATH. Furthermore, over 66% of the structures are multi-domain proteins; each domain combination occurring once per dataset. The performance of four automatic domain assignment methods, DomainParser, NCBI, PDP and PUU, is carefully analyzed using this broad spectrum of topology combinations and knowledge of rules and assumptions built into each algorithm. We conclude that it is practically impossible for an automatic method to achieve the level of performance of human experts. However, we propose specific improvements to automatic methods as well as broadening the concept of a structural domain. Such work is prerequisite for establishing improved approaches to domain recognition. (The benchmark dataset is available from http://pdomains.sdsc.edu).


Subjects
Computer Simulation, Molecular Models, Protein Secondary Structure, Protein Tertiary Structure, Computational Biology
7.
Nucleic Acids Res ; 31(1): 342-4, 2003 Jan 01.
Article in English | MEDLINE | ID: mdl-12520018

ABSTRACT

PlantsP and PlantsT allow users to quickly gain a global understanding of plant phosphoproteins and plant membrane transporters, respectively, from evolutionary relationships to biochemical function as well as a deep understanding of the molecular biology of individual genes and their products. As one database with two functionally different web interfaces, PlantsP and PlantsT are curated plant-specific databases that combine sequence-derived information with experimental functional-genomics data. PlantsP focuses on proteins involved in the phosphorylation process (i.e., kinases and phosphatases), whereas PlantsT focuses on membrane transport proteins. Experimentally, PlantsP provides a resource for information on a collection of T-DNA insertion mutants (knockouts) in each kinase and phosphatase, primarily in Arabidopsis thaliana, and PlantsT uniquely combines experimental data regarding mineral composition (derived from inductively coupled plasma atomic emission spectroscopy) of mutant and wild-type strains. Both databases provide extensive information on motifs and domains, detailed information contributed by individual experts in their respective fields, and descriptive information drawn directly from the literature. The databases incorporate a unique user annotation and review feature aimed at acquiring expert annotation directly from the plant biology community. PlantsP is available at http://plantsp.sdsc.edu and PlantsT is available at http://plantst.sdsc.edu.


Subjects
Genetic Databases, Membrane Transport Proteins/genetics, Phosphoprotein Phosphatases/genetics, Plant Proteins/genetics, Protein Kinases/genetics, Arabidopsis/enzymology, Arabidopsis/genetics, Arabidopsis/metabolism, Plant Genome, Genomics, Membrane Transport Proteins/classification, Membrane Transport Proteins/physiology, Phosphoprotein Phosphatases/physiology, Phosphoproteins/metabolism, Phosphorylation, Plant Proteins/classification, Plant Proteins/physiology, Protein Kinases/physiology
8.
J Mol Biol ; 339(3): 647-78, 2004 Jun 04.
Article in English | MEDLINE | ID: mdl-15147847

ABSTRACT

The assignment of protein domains from three-dimensional structure is critically important in understanding protein evolution and function, yet little quality assurance has been performed. Here, the differences in the assignment of structural domains are evaluated using six common assignment methods. Three human expert methods (AUTHORS (authors' annotation), CATH and SCOP) and three fully automated methods (DALI, DomainParser and PDP) are investigated by analysis of individual methods against the authors' assignment as well as analysis based on the consensus among groups of methods (only expert, only automatic, combined). The results demonstrate that caution is recommended in using current domain assignments, and indicate where additional work is needed. Specifically, the major factors responsible for conflicting domain assignments between methods, both expert and automatic, are: (1) the definition of very small domains; (2) splitting secondary structures between domains; (3) the size and number of discontinuous domains; (4) closely packed or convoluted domain-domain interfaces; (5) structures with large and complex architectures; and (6) the level of significance placed upon structural, functional and evolutionary concepts in considering structural domain definitions. A web-based resource that focuses on the results of benchmarking and the analysis of domain assignments is available at


Subjects
Proteins/chemistry, Algorithms, Molecular Models, Protein Conformation
10.
Plant Physiol ; 129(2): 908-25, 2002 Jun.
Article in English | MEDLINE | ID: mdl-12068129

ABSTRACT

Reversible protein phosphorylation is critically important in the modulation of a wide variety of cellular functions. Several families of protein phosphatases remove phosphate groups placed on key cellular proteins by protein kinases. The complete genomic sequence of the model plant Arabidopsis permits a comprehensive survey of the phosphatases encoded by this organism. Several errors in the sequencing project gene models were found via analysis of predicted phosphatase coding sequences. Structural sequence probes from aligned and unaligned sequence models, and all-against-all BLAST searches, were used to identify 112 phosphatase catalytic subunit sequences, distributed among the serine (Ser)/threonine (Thr) phosphatases (STs) of the protein phosphatase P (PPP) family, STs of the protein phosphatase M (PPM) family (protein phosphatases 2C [PP2Cs] subfamily), protein tyrosine (Tyr) phosphatases (PTPs), low-M(r) protein Tyr phosphatases, and dual-specificity (Tyr and Ser/Thr) phosphatases (DSPs). The Arabidopsis genome contains an abundance of PP2Cs (69) and a dearth of PTPs (one). Eight sequences were identified as new protein phosphatase candidates: five dual-specificity phosphatases and three PP2Cs. We used phylogenetic analyses to infer clustering patterns reflecting sequence similarity and evolutionary ancestry. These clusters, particularly for the largely unexplored PP2C set, will be a rich source of material for plant biologists, allowing the systematic sampling of protein function by genetic and biochemical means.


Subjects
Arabidopsis/enzymology, Plant Genome, Phosphoprotein Phosphatases/genetics, Phylogeny, Saccharomyces cerevisiae Proteins, Animals, Arabidopsis/genetics, Catalytic Domain/genetics, Nucleic Acid Databases, Dual-Specificity Phosphatases, Molecular Evolution, Humans, Mitogen-Activated Protein Kinase Phosphatases, Protein Phosphatase 2, Protein Phosphatase 2C, Protein Tyrosine Phosphatases/genetics