Results 1 - 20 of 30
1.
Protein Sci ; 33(8): e5116, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38979784

ABSTRACT

Interactions between proteins and small organic compounds play a crucial role in regulating protein functions. These interactions can modulate various aspects of protein behavior, including enzymatic activity, signaling cascades, and structural stability. By binding to specific sites on proteins, small organic compounds can induce conformational changes, alter protein-protein interactions, or directly affect catalytic activity. Therefore, many drugs available on the market today are small molecules (72% of all approved drugs in the last 5 years). Proteins are composed of one or more domains: evolutionary units that convey function or fitness either singly or in concert with others. Understanding which domain(s) of the target protein binds to a drug can lead to additional opportunities for discovering novel targets. The evolutionary classification of protein domains (ECOD) classifies domains into an evolutionary hierarchy that focuses on distant homology. Previously, no structure-based protein domain classification existed that included information about both the interaction between small molecules or drugs and the structural domains of a target protein. This data is especially important for multidomain proteins and large complexes. Here, we present the DrugDomain database that reports the interaction between ECOD of human target proteins and DrugBank molecules and drugs. The pilot version of DrugDomain describes the interaction of 5160 DrugBank molecules associated with 2573 human proteins. It describes domains for all experimentally determined structures of these proteins and incorporates AlphaFold models when such structures are unavailable. The DrugDomain database is available online: http://prodata.swmed.edu/DrugDomain/.


Subject(s)
Protein Domains; Proteins; Proteins/chemistry; Proteins/metabolism; Pharmaceutical Preparations/chemistry; Pharmaceutical Preparations/metabolism; Databases, Protein; Humans; Small Molecule Libraries/chemistry; Small Molecule Libraries/metabolism; Evolution, Molecular; Protein Binding
2.
Sci Rep ; 14(1): 12260, 2024 05 28.
Article in English | MEDLINE | ID: mdl-38806511

ABSTRACT

Salmonella enterica is a pathogenic bacterium known for causing severe typhoid fever in humans, making it important to study due to its potential health risks and significant impact on public health. This study provides an evolutionary classification of proteins from the Salmonella enterica pangenome. We classified 17,238 domains from 13,147 proteins from 79,758 Salmonella enterica strains and studied in detail the domains of 272 proteins from 14 characterized Salmonella pathogenicity islands (SPIs). Among SPI-related proteins, 90 function in the secretion machinery. Of the domains of SPI proteins, 41% have no previous sequence annotation. By comparing clinical and environmental isolates, we identified 3682 proteins that are overrepresented in the clinical group, which we consider potentially pathogenic. Among the domains of potentially pathogenic proteins, only 50% were previously annotated by sequence methods. Moreover, 36% (1330 out of 3682) of potentially pathogenic proteins cannot be classified into the Evolutionary Classification of Protein Domains database (ECOD). Among classified domains of potentially pathogenic proteins, the most populated homology groups include helix-turn-helix (HTH), immunoglobulin-related, and P-loop domain-related groups. Functional analysis revealed overrepresentation of these proteins in biological processes related to viral entry into the host cell, antibiotic biosynthesis, and DNA metabolism and conformation change, and underrepresentation in translational processes. Analysis of the potentially pathogenic proteins indicates that they form 119 clusters, or novel potential pathogenicity islands (NPPIs), within the Salmonella genome, suggesting their potential contribution to the bacterium's virulence. One of the NPPIs revealed significant overrepresentation of potentially pathogenic proteins. Overall, our analysis revealed that the identified potentially pathogenic proteins are poorly studied.


Subject(s)
Bacterial Proteins; Genome, Bacterial; Genomic Islands; Salmonella enterica; Genomic Islands/genetics; Salmonella enterica/genetics; Salmonella enterica/pathogenicity; Salmonella enterica/classification; Bacterial Proteins/genetics; Bacterial Proteins/chemistry; Bacterial Proteins/metabolism; Humans; Protein Domains
3.
PLoS Comput Biol ; 20(2): e1011586, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38416793

ABSTRACT

Protein structure prediction has now been deployed widely across several different large protein sets. Large-scale domain annotation of these predictions can aid in the development of biological insights. Using our Evolutionary Classification of Protein Domains (ECOD) from experimental structures as a basis for classification, we describe the detection and cataloging of domains from 48 whole proteomes deposited in the AlphaFold Database. On average, we can provide positive classification (either of domains or other identifiable non-domain regions) for 90% of residues in all proteomes. We classified 746,349 domains from 536,808 proteins comprised of over 226,424,000 amino acid residues. We examine the varying populations of homologous groups in both eukaryotes and bacteria. In addition to containing a higher fraction of disordered regions and unassigned domains, eukaryotes show a higher proportion of repeated proteins, both globular and small repeats. We enumerate those highly populated domains that are shared in both eukaryotes and bacteria, such as the Rossmann domains, TIM barrels, and P-loop domains. Additionally, we compare the sampling of homologous groups from this whole proteome set against our stable ECOD reference and discuss groups that have been enriched by structure predictions. Finally, we discuss the implication of these results for protein target selection for future classification strategies for very large protein sets.


Subject(s)
Biological Evolution; Proteome; Protein Domains; Evolution, Molecular; Bacteria; Databases, Protein
4.
mSystems ; 8(6): e0079623, 2023 Dec 21.
Article in English | MEDLINE | ID: mdl-38014954

ABSTRACT

IMPORTANCE: The pandemic Vpar strain RIMD causes seafood-borne illness worldwide. Previous comparative genomic studies have revealed pathogenicity islands in RIMD that contribute to the success of the strain in infection. However, not all virulence determinants have been identified, and many of the proteins encoded in known pathogenicity islands are of unknown function. Based on the ECOD database, we used evolution-based classification of structure models for the RIMD proteome to improve our functional understanding of virulence determinants acquired by the pandemic strain. We further identify and classify previously unknown mobile protein domains as well as fast-evolving residue positions in structure models that contribute to virulence and adaptation with respect to a pre-pandemic strain. Our work highlights key contributions of phage in mediating seafood-borne illness, suggesting this strain balances its avoidance of phage predators with its successful colonization of human hosts.


Subject(s)
Vibrio parahaemolyticus; Humans; Virulence/genetics; Vibrio parahaemolyticus/genetics; Virulence Factors/genetics; Genomics
5.
Protein Sci ; 32(9): e4750, 2023 09.
Article in English | MEDLINE | ID: mdl-37572333

ABSTRACT

Control of eukaryotic cellular function is heavily reliant on the phosphorylation of proteins at specific amino acid residues, such as serine, threonine, tyrosine, and histidine. Protein kinases that are responsible for this process comprise one of the largest families of evolutionarily related proteins. Dysregulation of protein kinase signaling pathways is a frequent cause of a large variety of human diseases including cancer, autoimmune, neurodegenerative, and cardiovascular disorders. In this study, we mapped all pathogenic mutations in 497 human protein kinase domains from the ClinVar database to the reference structure of Aurora kinase A (AURKA) and grouped them by the relevance to the disease type. Our study revealed that the majority of mutation hotspots associated with cancer are situated within the catalytic and activation loops of the kinase domain, whereas non-cancer-related hotspots tend to be located outside of these regions. Additionally, we identified a hotspot at residue R371 of the AURKA structure that has the highest number of exclusively non-cancer-related pathogenic mutations (21) and has not been previously discussed.


Subject(s)
Protein Kinases; Protein Serine-Threonine Kinases; Humans; Protein Kinases/chemistry; Protein Serine-Threonine Kinases/chemistry; Aurora Kinase A/genetics; Aurora Kinase A/chemistry; Aurora Kinase A/metabolism; Models, Molecular; Phosphorylation; Mutation
6.
Sci Rep ; 13(1): 11988, 2023 07 25.
Article in English | MEDLINE | ID: mdl-37491511

ABSTRACT

The recent progress in the prediction of protein structures marked a historical milestone. AlphaFold predicted 200 million protein models with an accuracy comparable to experimental methods. Protein structures are widely used to understand evolution and to identify potential drug targets for the treatment of various diseases, including cancer. Thus, these recently predicted structures might convey previously unavailable information about cancer biology. Evolutionary classification of protein domains is challenging and different approaches exist. Recently our team presented a classification of domains from human protein models released by AlphaFold. Here we evaluated the pan-cancer structurome, domains from over- and underexpressed proteins in 21 cancer types, using the broadest levels of the ECOD classification: the architecture (A-groups) and possible homology (X-groups) levels. Our analysis reveals that AlphaFold has greatly increased the three-dimensional structural landscape for proteins that are differentially expressed in these 21 cancer types. We show that beta sandwich domains are significantly overrepresented and alpha helical domains are significantly underrepresented in the majority of cancer types. Our data suggest that the prevalence of the beta sandwiches is due to the high levels of immunoglobulins and immunoglobulin-like domains that arise during tumor development-related inflammation. On the other hand, proteins with exclusively alpha domains are important elements of homeostasis, apoptosis and transmembrane transport. Therefore, cancer cells tend to reduce representation of these proteins to promote successful oncogenesis.


Subject(s)
Neoplasms; Proteins; Humans; Proteins/chemistry; Protein Domains; Protein Conformation, alpha-Helical
7.
Proc Natl Acad Sci U S A ; 120(12): e2214069120, 2023 03 21.
Article in English | MEDLINE | ID: mdl-36917664

ABSTRACT

Recent advances in protein structure prediction have generated accurate structures of previously uncharacterized human proteins. Identifying domains in these predicted structures and classifying them into an evolutionary hierarchy can reveal biological insights. Here, we describe the detection and classification of domains from the human proteome. Our classification indicates that only 62% of residues are located in globular domains. We further classify these globular domains and observe that the majority (65%) can be classified among known folds by sequence, with a smaller fraction (33%) requiring structural data to refine the domain boundaries and/or to support their homology. A relatively small number (966 domains) cannot be confidently assigned using our automatic pipelines, thus demanding manual inspection. We classify 47,576 domains, of which only 23% have been included in experimental structures. A portion (6.3%) of these classified globular domains lack sequence-based annotation in InterPro. A quarter (23%) have not been structurally modeled by homology, and they contain 2,540 known disease-causing single amino acid variations whose pathogenesis can now be inferred using AF models. A comparison of classified domains from a series of model organisms revealed expansions of several immune response-related domains in humans and a depletion of olfactory receptors. Finally, we use this classification to expand well-known protein families of biological significance. These classifications are presented on the ECOD website (http://prodata.swmed.edu/ecod/index_human.php).


Subject(s)
Amino Acids; Proteome; Humans; Proteome/genetics; Sequence Alignment; Databases, Protein
8.
Protein Sci ; 32(2): e4548, 2023 02.
Article in English | MEDLINE | ID: mdl-36539305

ABSTRACT

The recent breakthroughs in structure prediction, where methods such as AlphaFold demonstrated near-atomic accuracy, herald a paradigm shift in structural biology. The 200 million high-accuracy models released in the AlphaFold Database are expected to guide protein science in the coming decades. Partitioning these AlphaFold models into domains and assigning them to an evolutionary hierarchy provide an efficient way to gain functional insights into proteins. However, classifying such a large number of predicted structures challenges the infrastructure of current structure classifications, including our Evolutionary Classification of protein Domains (ECOD). Better computational tools are urgently needed to parse and classify domains from AlphaFold models automatically. Here we present a Domain Parser for AlphaFold Models (DPAM) that can automatically recognize globular domains from these models based on inter-residue distances in 3D structures, predicted aligned errors, and ECOD domains found by sequence (HHsuite) and structural (Dali) similarity searches. Based on a benchmark of 18,759 AlphaFold models, we demonstrate that DPAM can recognize 98.8% of domains and assign correct boundaries for 87.5%, significantly outperforming structure-based domain parsers and homology-based domain assignment using ECOD domains found by HHsuite or Dali. Application of DPAM to the massive AlphaFold models will enable efficient classification of domains, providing evolutionary contexts and facilitating functional studies.


Subject(s)
Proteins; Software; Databases, Protein; Proteins/chemistry; Protein Domains; Evolution, Molecular
9.
Proteins ; 89(12): 1700-1710, 2021 12.
Article in English | MEDLINE | ID: mdl-34455641

ABSTRACT

The high accuracy of some CASP14 models at the domain level prompted a more detailed evaluation of structure predictions on whole targets. For the first time in critical assessment of structure prediction (CASP), we evaluated accuracy of difficult domain assembly in models submitted for multidomain targets where the community predicted individual evaluation units (EUs) with greater accuracy than full-length targets. Ten proteins with domain interactions that did not show evidence of conformational change and were not involved in significant oligomeric contacts were chosen as targets for the domain interaction assessment. Groups were ranked using complementary interaction scores (F1, QS score, and Jaccard coefficient), and their predictions were evaluated for their ability to correctly model inter-domain interfaces and overall protein folds. Target performance was broadly grouped into two clusters. The first consisted primarily of targets containing two EUs wherein predictors more broadly predicted domain positioning and interfacial contacts correctly. The other consisted of complex two-EU and three-EU targets where few predictors performed well. The highest ranked predictor, AlphaFold2, produced high-accuracy models on eight out of 10 targets. Their interdomain scores on three of these targets were significantly higher than all other groups and were responsible for their overall outperformance in the category. We further highlight the performance of AlphaFold2 and the next best group, BAKER-experimental on several interesting targets.


Subject(s)
Models, Molecular; Protein Conformation; Protein Folding; Protein Interaction Domains and Motifs; Proteins; Computational Biology; Protein Binding; Proteins/chemistry; Proteins/metabolism; Sequence Analysis, Protein; Software
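
Two of the interaction scores named in the abstract of entry 9, F1 and the Jaccard coefficient, have standard set-based definitions. The sketch below applies those textbook definitions to sets of inter-domain residue contacts. The contact representation (pairs of residue indices) and the example values are assumptions made for illustration, not the assessors' exact protocol, and the QS score is not covered.

```python
from typing import Set, Tuple

# Hypothetical representation: a contact is a pair
# (residue index in domain A, residue index in domain B).
Contact = Tuple[int, int]


def f1_score(predicted: Set[Contact], reference: Set[Contact]) -> float:
    """Harmonic mean of precision and recall over two contact sets."""
    shared = len(predicted & reference)
    if shared == 0:
        return 0.0
    precision = shared / len(predicted)
    recall = shared / len(reference)
    return 2 * precision * recall / (precision + recall)


def jaccard(predicted: Set[Contact], reference: Set[Contact]) -> float:
    """Size of the intersection divided by size of the union."""
    union = predicted | reference
    if not union:
        return 0.0
    return len(predicted & reference) / len(union)


# Made-up example: 3 of 5 predicted contacts match 3 of 4 reference contacts.
reference = {(10, 105), (11, 105), (12, 108), (15, 110)}
predicted = {(10, 105), (11, 105), (12, 109), (15, 110), (16, 111)}

print(f"F1      = {f1_score(predicted, reference):.3f}")  # 0.667
print(f"Jaccard = {jaccard(predicted, reference):.3f}")   # 0.500
```

Both scores reward shared contacts, but F1 is always at least as large as the Jaccard coefficient for the same pair of sets, which may be one reason to report them together as complementary measures.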
10.
Proteins ; 89(12): 1618-1632, 2021 12.
Article in English | MEDLINE | ID: mdl-34350630

ABSTRACT

An evolutionary-based definition and classification of target evaluation units (EUs) is presented for the 14th round of the critical assessment of structure prediction (CASP14). CASP14 targets included 84 experimental models submitted by various structural groups (designated T1024-T1101). Targets were split into EUs based on the domain organization of available templates and performance of server groups. Several targets required splitting (19 out of 25 multidomain targets) due in part to observed conformation changes. All in all, 96 CASP14 EUs were defined and assigned to tertiary structure assessment categories (Topology-based FM or High Accuracy-based TBM-easy and TBM-hard) considering their evolutionary relationship to existing ECOD fold space: 24 family level, 50 distant homologs (H-group), 12 analogs (X-group), and 10 new folds. Principal component analysis and heatmap visualization of sequence and structure similarity to known templates as well as performance of servers highlighted trends in CASP14 target difficulty. The assigned evolutionary levels (i.e., H-groups) and assessment classes (i.e., FM) displayed overlapping clusters of EUs. Many viral targets diverged considerably from their template homologs and thus were more difficult for prediction than other homology-related targets. On the other hand, some targets did not have sequence-identifiable templates, but were predicted better than expected due to relatively simple arrangements of secondary structural elements. An apparent improvement in overall server performance in CASP14 further complicated traditional classification, which ultimately assigned EUs into high-accuracy modeling (27 TBM-easy and 31 TBM-hard), topology (23 FM), or both (15 FM/TBM).


Subject(s)
Models, Molecular; Protein Conformation; Proteins; Amino Acid Sequence; Computational Biology; Evolution, Molecular; Proteins/chemistry; Proteins/genetics; Sequence Analysis, Protein; Software
11.
Proteins ; 89(12): 1673-1686, 2021 12.
Article in English | MEDLINE | ID: mdl-34240477

ABSTRACT

This report describes the tertiary structure prediction assessment of difficult modeling targets in the 14th round of the Critical Assessment of Structure Prediction (CASP14). We implemented an official ranking scheme that used the same scores as the previous CASP topology-based assessment, but combined these scores with one that emphasized physically realistic models. The top performing AlphaFold2 group outperformed the rest of the prediction community on all but two of the difficult targets considered in this assessment. They provided high quality models for most of the targets (86% over GDT_TS 70), including larger targets above 150 residues, and they correctly predicted the topology of almost all the rest. AlphaFold2 performance was followed by two manual Baker methods, a Feig method that refined Zhang-server models, two notable automated Zhang server methods (QUARK and Zhang-server), and a Zhang manual group. Despite the remarkable progress in protein structure prediction of difficult targets, both the prediction community and AlphaFold2, to a lesser extent, faced challenges with flexible regions and obligate oligomeric assemblies. The official ranking of top-performing methods was supported by performance generated PCA and heatmap clusters that gave insight into target difficulties and the most successful state-of-the-art structure prediction methodologies.


Subject(s)
Computational Biology/methods; Models, Molecular; Protein Conformation; Protein Folding; Software; Databases, Protein; Proteins/chemistry; Proteins/metabolism; Sequence Analysis, Protein
12.
Science ; 373(6557): 871-876, 2021 08 20.
Article in English | MEDLINE | ID: mdl-34282049

ABSTRACT

DeepMind presented notably accurate predictions at the recent 14th Critical Assessment of Structure Prediction (CASP14) conference. We explored network architectures that incorporate related ideas and obtained the best performance with a three-track network in which information at the one-dimensional (1D) sequence level, the 2D distance map level, and the 3D coordinate level is successively transformed and integrated. The three-track network produces structure predictions with accuracies approaching those of DeepMind in CASP14, enables the rapid solution of challenging x-ray crystallography and cryo-electron microscopy structure modeling problems, and provides insights into the functions of proteins of currently unknown structure. The network also enables rapid generation of accurate protein-protein complex models from sequence information alone, short-circuiting traditional approaches that require modeling of individual subunits followed by docking. We make the method available to the scientific community to speed biological research.


Subject(s)
Deep Learning; Protein Conformation; Protein Folding; Proteins/chemistry; ADAM Proteins/chemistry; Amino Acid Sequence; Computer Simulation; Cryoelectron Microscopy; Crystallography, X-Ray; Databases, Protein; Membrane Proteins/chemistry; Models, Molecular; Multiprotein Complexes/chemistry; Neural Networks, Computer; Protein Subunits/chemistry; Proteins/physiology; Receptors, G-Protein-Coupled/chemistry; Sphingosine N-Acyltransferase/chemistry
13.
ACS Omega ; 6(24): 15698-15707, 2021 Jun 22.
Article in English | MEDLINE | ID: mdl-34179613

ABSTRACT

Domain classifications are a useful resource for computational analysis of protein structure, but elements of their composition are often opaque to potential users. We perform a comparative analysis of our classification ECOD against the SCOPe, SCOP2, and CATH domain classifications with respect to their constituent domain boundaries and hierarchical organization. The coverage of these domain classifications with respect to ECOD and to the PDB was assessed by structure and by sequence. We also conducted domain pair analysis to determine broad differences in hierarchy between domains shared by ECOD and other classifications. Finally, we present domains from the major facilitator superfamily (MFS) of transporter proteins and provide evidence that supports their split into domains and the presence of multiple conformations within these families. We find that ECOD and CATH provide the most extensive structural coverage of the PDB. ECOD and SCOPe have the most consistent domain boundary conditions, whereas CATH and SCOP2 both differ significantly.

14.
PLoS Comput Biol ; 15(12): e1007569, 2019 12.
Article in English | MEDLINE | ID: mdl-31869345

ABSTRACT

Rossmann folds are ancient, frequently diverged domains found in many biological reaction pathways where they have adapted for different functions. Consequently, discernment and classification of their homologous relations and function can be complicated. We define a minimal Rossmann-like structure motif (RLM) that corresponds to the common core of known Rossmann domains and use this motif to identify all RLM domains in the Protein Data Bank (PDB), thus finding they constitute about 20% of all known 3D structures. The Evolutionary Classification of protein structure Domains (ECOD) classifies RLM domains in a number of groups that lack evidence for homology (X-groups), which suggests that they could have evolved independently multiple times. Closely related, homologous RLM enzyme families can diverge to bind different ligands using similar binding sites and to catalyze different reactions. Conversely, non-homologous RLM domains can converge to catalyze the same reactions or to bind the same ligand with alternate binding modes. We discuss a special case of such convergent evolution that is relevant to the polypharmacology paradigm, wherein the same drug (methotrexate) binds to multiple non-homologous RLM drug targets with different topologies. Finally, assigning proteins with RLM domains to the Enzyme Commission classification suggests that RLM enzymes function mainly in metabolism (and comprise 38% of reference metabolic pathways) and are overrepresented in extant pathways that represent ancient biosynthetic routes such as nucleotide metabolism, energy metabolism, and metabolism of amino acids. In fact, RLM enzymes take part in five out of eight enzymatic reactions of the Wood-Ljungdahl metabolic pathway thought to be used by the last universal common ancestor (LUCA). The prevalence of RLM domains in this ancient metabolism might explain their wide distribution among enzymes.


Subject(s)
Evolution, Molecular; Protein Domains/genetics; Binding Sites/genetics; Catalytic Domain/genetics; Computational Biology; Databases, Protein; Enzymes/chemistry; Enzymes/genetics; Enzymes/metabolism; Humans; Ligands; Metabolic Networks and Pathways/genetics; Models, Molecular; Protein Binding/genetics; Software; Structural Homology, Protein
15.
BMC Mol Cell Biol ; 20(1): 18, 2019 06 21.
Article in English | MEDLINE | ID: mdl-31226926

ABSTRACT

The manual classification of protein domains is approaching its 20th anniversary. ECOD is our mixed manual-automatic domain classification. Over time, the types of proteins that require manual curation have changed. Depositions with complex multidomain and multichain arrangements are commonplace. Transmembrane domains are regularly classified. Repeatedly, domains that are initially believed to be novel are found to have homologous links to existing classified domains. Here we present a brief summary of recent manual curation efforts in ECOD generally, combined with specific case studies of transmembrane and multidomain proteins wherein manual curation was useful for discovering new homologous relationships. We present a new taxonomy for the classification of ABC transporter transmembrane domains. We examine alternate topologies of the leucine-specific (LS) domain of Leucine tRNA-synthetase. Finally, we elaborate on distant homologous links between two helical dimerization domains.


Subject(s)
ATP-Binding Cassette Transporters/chemistry; ATP-Binding Cassette Transporters/classification; Protein Domains; Structural Homology, Protein; Carrier Proteins/chemistry; Cell Cycle Proteins/chemistry; Crystallography, X-Ray; Databases, Protein; Endopeptidases/chemistry; Escherichia coli/chemistry; Humans; Leucine-tRNA Ligase/chemistry; Membrane Proteins/chemistry; Organic Cation Transport Proteins/chemistry; Protein Multimerization; Protein Structure, Secondary; ras Proteins/chemistry
16.
Curr Protoc Bioinformatics ; 61(1): e45, 2018 03.
Article in English | MEDLINE | ID: mdl-30040199

ABSTRACT

ECOD is a database of evolutionary domains from structures deposited in the PDB. Domains in ECOD are classified by a mixed manual/automatic method wherein the bulk of newly deposited structures are classified automatically by protein-protein BLAST. Those structures that cannot be classified automatically are referred to manual curators who use a combination of alignment results, functional analysis, and close reading of the literature to generate novel assignments. ECOD differs from other structural domain resources in that it is continually updated, classifying thousands of proteins per week. ECOD recognizes homology as its key organizing concept, rather than structural or sequence similarity alone. Such a classification scheme provides functional information about proteins of interest by placing them in the correct evolutionary context among all proteins of known structure. This unit demonstrates how to access ECOD via the Web and how to search the database by sequence or structure. It also details the distributable data files available for large-scale bioinformatics users. © 2018 by John Wiley & Sons, Inc.


Subject(s)
Computational Biology/methods; Databases, Protein; Protein Domains; Proteins/chemistry; Sequence Homology, Amino Acid; Structural Homology, Protein; Amino Acid Sequence; Sequence Alignment
17.
Bioinformatics ; 34(17): 2997-3003, 2018 09 01.
Article in English | MEDLINE | ID: mdl-29659718

ABSTRACT

Motivation: The ECOD database classifies protein domains based on their evolutionary relationships, considering both remote and close homology. The family group in ECOD provides classification of domains that are closely related to each other based on sequence similarity. Due to different perspectives on domain definition, direct application of existing sequence domain databases, such as Pfam, to ECOD struggles with several shortcomings. Results: We created multiple sequence alignments and profiles from ECOD domains with the help of structural information in alignment building and boundary delineation. We validated the alignment quality by scoring structure superposition to demonstrate that they are comparable to curated seed alignments in Pfam. Comparison to Pfam and CDD reveals that 27 and 16% of ECOD families are new, but they are also dominated by small families, likely because of the sampling bias from the PDB database. There are 35 and 48% of families whose boundaries are modified compared with counterparts in Pfam and CDD, respectively. Availability and implementation: The new families are now integrated in the ECOD website. The aggregate HMMER profile library and alignment are available for download on the ECOD website (http://prodata.swmed.edu/ecod). Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Databases, Protein; Proteins/chemistry; Protein Domains; Sequence Alignment; Software
18.
Nucleic Acids Res ; 45(D1): D296-D302, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899594

ABSTRACT

Evolutionary Classification Of protein Domains (ECOD) (http://prodata.swmed.edu/ecod) comprehensively classifies proteins with known spatial structures maintained by the Protein Data Bank (PDB) into evolutionary groups of protein domains. ECOD relies on a combination of automatic and manual weekly updates to achieve its high accuracy and coverage with a short update cycle. ECOD classifies the approximately 120 000 depositions of the PDB into more than 500 000 domains in ∼3400 homologous groups. We show the performance of the weekly update pipeline since the release of ECOD, describe improvements to the ECOD website and available search options, and discuss novel structures and homologous groups that have been classified in the recent updates. Finally, we discuss the future directions of ECOD and further improvements planned for the hierarchy and update process.


Subject(s)
Databases, Protein; Evolution, Molecular; Models, Molecular; Protein Domains; Proteins; Computational Biology/methods; Protein Conformation; Proteins/chemistry; Proteins/classification; Proteins/genetics
19.
PLoS One ; 11(5): e0154786, 2016.
Article in English | MEDLINE | ID: mdl-27149620

ABSTRACT

The Critical Assessment of techniques for protein Structure Prediction (or CASP) is a community-wide blind test experiment to reveal the best accomplishments of structure modeling. Assessors have been using the Global Distance Test (GDT_TS) measure to quantify prediction performance since CASP3 in 1998. However, identifying significant score differences between close models is difficult because of the lack of uncertainty estimations for this measure. Here, we utilized the atomic fluctuations caused by structure flexibility to estimate the uncertainty of GDT_TS scores. Structures determined by nuclear magnetic resonance are deposited as ensembles of alternative conformers that reflect the structural flexibility, whereas standard X-ray refinement produces the static structure averaged over time and space for the dynamic ensembles. To recapitulate the structural heterogeneous ensemble in the crystal lattice, we performed time-averaged refinement for X-ray datasets to generate structural ensembles for our GDT_TS uncertainty analysis. Using those generated ensembles, our study demonstrates that the time-averaged refinements produced structure ensembles with better agreement with the experimental datasets than the averaged X-ray structures with B-factors. The uncertainty of the GDT_TS scores, quantified by their standard deviations (SDs), increases for scores lower than 50 and 70, with maximum SDs of 0.3 and 1.23 for X-ray and NMR structures, respectively. We also applied our procedure to the high accuracy version of GDT-based score and produced similar results with slightly higher SDs. To facilitate score comparisons by the community, we developed a user-friendly web server that produces structure ensembles for NMR and X-ray structures and is accessible at http://prodata.swmed.edu/SEnCS. Our work helps to identify the significance of GDT_TS score differences, as well as to provide structure ensembles for estimating SDs of any scores.


Subject(s)
Models, Theoretical; Uncertainty; X-Rays
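
The idea in entry 19 of quantifying GDT_TS uncertainty as a standard deviation over a structural ensemble can be illustrated with a simplified sketch. It uses the common definition of GDT_TS as the average percentage of residues within 1, 2, 4 and 8 Å of the reference, and it assumes the coordinates are already superposed and matched residue by residue. A full GDT calculation searches over superpositions for each cutoff, so this is not the authors' procedure. The coordinates and the two-conformer "ensemble" are made up for illustration.

```python
import math
from statistics import stdev
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
CUTOFFS = (1.0, 2.0, 4.0, 8.0)  # distance cutoffs in angstroms


def gdt_ts(model: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Simplified GDT_TS for pre-superposed, residue-matched coordinates."""
    if len(model) != len(reference) or not model:
        raise ValueError("model and reference must be non-empty and equal length")
    distances = [math.dist(m, r) for m, r in zip(model, reference)]
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances) for cutoff in CUTOFFS
    ]
    return 100.0 * sum(fractions) / len(CUTOFFS)


def gdt_ts_spread(model: Sequence[Coord],
                  ensemble: Sequence[Sequence[Coord]]) -> Tuple[List[float], float]:
    """Score the model against each conformer; return scores and their sample SD."""
    scores = [gdt_ts(model, conformer) for conformer in ensemble]
    return scores, stdev(scores)


# Hypothetical four-residue model and two reference conformers.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
conformer_a = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 9.0, 0.0)]
conformer_b = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 7.0, 0.0)]

scores, sd = gdt_ts_spread(model, [conformer_a, conformer_b])
print(scores)            # [56.25, 62.5]
print(f"SD = {sd:.2f}")  # SD = 4.42
```

With only two conformers the spread is not meaningful; the point is that a score difference between two models smaller than the ensemble SD is hard to call significant.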
20.
Protein Sci ; 25(7): 1188-203, 2016 07.
Article in English | MEDLINE | ID: mdl-26833690

ABSTRACT

Proteins and their domains evolve by a set of events commonly including the duplication and divergence of small motifs. The presence of short repetitive regions in domains has generally constituted a difficult case for structural domain classifications and their hierarchies. We developed the Evolutionary Classification Of protein Domains (ECOD) in part to implement a new schema for the classification of these types of proteins. Here we document the ways in which ECOD classifies proteins with small internal repeats, widespread functional motifs, and assemblies of small domain-like fragments in its evolutionary schema. We illustrate the ways in which the structural genomics project impacted the classification and characterization of new structural domains and sequence families over the decade.


Subject(s)
Amino Acid Motifs; Proteins/chemistry; Proteomics/methods; Databases, Protein; Evolution, Molecular; Models, Molecular; Protein Domains; Proteins/genetics