1.
Proc Natl Acad Sci U S A ; 120(12): e2214069120, 2023 03 21.
Article in English | MEDLINE | ID: mdl-36917664

ABSTRACT

Recent advances in protein structure prediction have generated accurate structures of previously uncharacterized human proteins. Identifying domains in these predicted structures and classifying them into an evolutionary hierarchy can reveal biological insights. Here, we describe the detection and classification of domains from the human proteome. Our classification indicates that only 62% of residues are located in globular domains. We further classify these globular domains and observe that the majority (65%) can be classified among known folds by sequence, with a smaller fraction (33%) requiring structural data to refine the domain boundaries and/or to support their homology. A relatively small number (966 domains) cannot be confidently assigned using our automatic pipelines, thus demanding manual inspection. We classify 47,576 domains, of which only 23% have been included in experimental structures. A portion (6.3%) of these classified globular domains lack sequence-based annotation in InterPro. A quarter (23%) have not been structurally modeled by homology, and they contain 2,540 known disease-causing single amino acid variations whose pathogenesis can now be inferred using AF models. A comparison of classified domains from a series of model organisms revealed expansions of several immune response-related domains in humans and a depletion of olfactory receptors. Finally, we use this classification to expand well-known protein families of biological significance. These classifications are presented on the ECOD website (http://prodata.swmed.edu/ecod/index_human.php).


Subjects
Amino Acids , Proteome , Humans , Proteome/genetics , Sequence Alignment , Protein Databases
2.
PLoS Comput Biol ; 20(2): e1011586, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38416793

ABSTRACT

Protein structure prediction has now been deployed widely across several different large protein sets. Large-scale domain annotation of these predictions can aid in the development of biological insights. Using our Evolutionary Classification of Protein Domains (ECOD) from experimental structures as a basis for classification, we describe the detection and cataloging of domains from 48 whole proteomes deposited in the AlphaFold Database. On average, we can provide positive classification (either of domains or other identifiable non-domain regions) for 90% of residues in all proteomes. We classified 746,349 domains from 536,808 proteins comprised of over 226,424,000 amino acid residues. We examine the varying populations of homologous groups in both eukaryotes and bacteria. In addition to containing a higher fraction of disordered regions and unassigned domains, eukaryotes show a higher proportion of repeated proteins, both globular and small repeats. We enumerate those highly populated domains that are shared in both eukaryotes and bacteria, such as the Rossmann domains, TIM barrels, and P-loop domains. Additionally, we compare the sampling of homologous groups from this whole proteome set against our stable ECOD reference and discuss groups that have been enriched by structure predictions. Finally, we discuss the implication of these results for protein target selection for future classification strategies for very large protein sets.


Subjects
Biological Evolution , Proteome , Protein Domains , Molecular Evolution , Bacteria , Protein Databases
3.
Proteins ; 89(12): 1700-1710, 2021 12.
Article in English | MEDLINE | ID: mdl-34455641

ABSTRACT

The high accuracy of some CASP14 models at the domain level prompted a more detailed evaluation of structure predictions on whole targets. For the first time in critical assessment of structure prediction (CASP), we evaluated accuracy of difficult domain assembly in models submitted for multidomain targets where the community predicted individual evaluation units (EUs) with greater accuracy than full-length targets. Ten proteins with domain interactions that did not show evidence of conformational change and were not involved in significant oligomeric contacts were chosen as targets for the domain interaction assessment. Groups were ranked using complementary interaction scores (F1, QS score, and Jaccard coefficient), and their predictions were evaluated for their ability to correctly model inter-domain interfaces and overall protein folds. Target performance was broadly grouped into two clusters. The first consisted primarily of targets containing two EUs wherein predictors more broadly predicted domain positioning and interfacial contacts correctly. The other consisted of complex two-EU and three-EU targets where few predictors performed well. The highest ranked predictor, AlphaFold2, produced high-accuracy models on eight out of 10 targets. Their interdomain scores on three of these targets were significantly higher than all other groups and were responsible for their overall outperformance in the category. We further highlight the performance of AlphaFold2 and the next best group, BAKER-experimental on several interesting targets.
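
The abstract names F1 and the Jaccard coefficient among the interaction scores but does not define them. As a minimal sketch, assuming each score is computed over sets of inter-domain residue contacts (the contact representation, the function names, and the example contacts below are illustrative assumptions, not the assessors' published procedure), the generic set-based definitions look like this:

```python
from typing import Set, Tuple

# Hypothetical representation: a contact is a pair of residue indices,
# one from each of the two interacting domains.
Contact = Tuple[int, int]


def f1_score(predicted: Set[Contact], reference: Set[Contact]) -> float:
    """Harmonic mean of precision and recall over contact sets."""
    true_pos = len(predicted & reference)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(reference)
    return 2 * precision * recall / (precision + recall)


def jaccard(predicted: Set[Contact], reference: Set[Contact]) -> float:
    """Size of the intersection divided by size of the union."""
    union = predicted | reference
    if not union:
        return 0.0
    return len(predicted & reference) / len(union)


# Made-up contacts for illustration only.
reference_contacts = {(10, 120), (11, 121), (12, 125), (15, 130)}
predicted_contacts = {(10, 120), (11, 121), (13, 126)}

print(round(f1_score(predicted_contacts, reference_contacts), 3))
print(round(jaccard(predicted_contacts, reference_contacts), 3))
```

Both scores reward overlap with the reference interface, but the Jaccard coefficient penalizes extra and missing contacts more heavily than F1, which is one reason an assessment might report them side by side.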


Subjects
Molecular Models , Protein Conformation , Protein Folding , Protein Interaction Domains and Motifs , Proteins , Computational Biology , Protein Binding , Proteins/chemistry , Proteins/metabolism , Protein Sequence Analysis , Software
4.
Proteins ; 89(12): 1618-1632, 2021 12.
Article in English | MEDLINE | ID: mdl-34350630

ABSTRACT

An evolutionary-based definition and classification of target evaluation units (EUs) is presented for the 14th round of the critical assessment of structure prediction (CASP14). CASP14 targets included 84 experimental models submitted by various structural groups (designated T1024-T1101). Targets were split into EUs based on the domain organization of available templates and performance of server groups. Several targets required splitting (19 out of 25 multidomain targets) due in part to observed conformation changes. All in all, 96 CASP14 EUs were defined and assigned to tertiary structure assessment categories (Topology-based FM or High Accuracy-based TBM-easy and TBM-hard) considering their evolutionary relationship to existing ECOD fold space: 24 family level, 50 distant homologs (H-group), 12 analogs (X-group), and 10 new folds. Principal component analysis and heatmap visualization of sequence and structure similarity to known templates as well as performance of servers highlighted trends in CASP14 target difficulty. The assigned evolutionary levels (i.e., H-groups) and assessment classes (i.e., FM) displayed overlapping clusters of EUs. Many viral targets diverged considerably from their template homologs and thus were more difficult for prediction than other homology-related targets. On the other hand, some targets did not have sequence-identifiable templates, but were predicted better than expected due to relatively simple arrangements of secondary structural elements. An apparent improvement in overall server performance in CASP14 further complicated traditional classification, which ultimately assigned EUs into high-accuracy modeling (27 TBM-easy and 31 TBM-hard), topology (23 FM), or both (15 FM/TBM).
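
The abstract reports the 96 evaluation units twice, once by evolutionary level and once by assessment class. A short consistency check, using only the counts quoted above (the dictionary names are illustrative), confirms that both breakdowns sum to the same total:

```python
# Counts copied from the abstract; variable names are illustrative.
evolutionary_levels = {
    "family": 24,
    "distant homolog (H-group)": 50,
    "analog (X-group)": 12,
    "new fold": 10,
}
assessment_classes = {
    "TBM-easy": 27,
    "TBM-hard": 31,
    "FM": 23,
    "FM/TBM": 15,
}
total_eus = 96

level_total = sum(evolutionary_levels.values())
class_total = sum(assessment_classes.values())

# Share of EUs assessed purely as high-accuracy modeling (TBM-easy + TBM-hard).
tbm_share = (assessment_classes["TBM-easy"] + assessment_classes["TBM-hard"]) / total_eus

print(level_total, class_total, f"{tbm_share:.1%}")
```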


Subjects
Molecular Models , Protein Conformation , Proteins , Amino Acid Sequence , Computational Biology , Molecular Evolution , Proteins/chemistry , Proteins/genetics , Protein Sequence Analysis , Software
5.
Proteins ; 89(12): 1673-1686, 2021 12.
Article in English | MEDLINE | ID: mdl-34240477

ABSTRACT

This report describes the tertiary structure prediction assessment of difficult modeling targets in the 14th round of the Critical Assessment of Structure Prediction (CASP14). We implemented an official ranking scheme that used the same scores as the previous CASP topology-based assessment, but combined these scores with one that emphasized physically realistic models. The top performing AlphaFold2 group outperformed the rest of the prediction community on all but two of the difficult targets considered in this assessment. They provided high quality models for most of the targets (86% over GDT_TS 70), including larger targets above 150 residues, and they correctly predicted the topology of almost all the rest. AlphaFold2 performance was followed by two manual Baker methods, a Feig method that refined Zhang-server models, two notable automated Zhang server methods (QUARK and Zhang-server), and a Zhang manual group. Despite the remarkable progress in protein structure prediction of difficult targets, both the prediction community and AlphaFold2, to a lesser extent, faced challenges with flexible regions and obligate oligomeric assemblies. The official ranking of top-performing methods was supported by performance generated PCA and heatmap clusters that gave insight into target difficulties and the most successful state-of-the-art structure prediction methodologies.
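
The 86% figure above is stated against a GDT_TS threshold of 70. As a simplified sketch of how that score is commonly defined: GDT_TS averages the percentage of residues whose Cα atoms lie within 1, 2, 4, and 8 Å of the target. The function below assumes the per-residue distances come from a single fixed superposition; the standard calculation instead searches over many superpositions to maximize the count at each cutoff, so this sketch gives a lower bound. The function name and example distances are hypothetical.

```python
from typing import Sequence


def gdt_ts(distances: Sequence[float]) -> float:
    """Simplified GDT_TS from per-residue Cα distances (in Å).

    Assumes one fixed superposition of model onto target; the full
    procedure maximizes each cutoff's residue count over superpositions.
    """
    if not distances:
        raise ValueError("at least one residue distance is required")
    cutoffs = (1.0, 2.0, 4.0, 8.0)
    n = len(distances)
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Made-up distances for a six-residue toy model.
example_distances = [0.5, 0.9, 1.5, 3.0, 7.0, 12.0]
print(round(gdt_ts(example_distances), 2))
```

A model scoring above 70 on this scale places most residues within a few ångströms of the experimental structure.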


Subjects
Computational Biology/methods , Molecular Models , Protein Conformation , Protein Folding , Software , Protein Databases , Proteins/chemistry , Proteins/metabolism , Protein Sequence Analysis
6.
PLoS Comput Biol ; 15(12): e1007569, 2019 12.
Article in English | MEDLINE | ID: mdl-31869345

ABSTRACT

Rossmann folds are ancient, frequently diverged domains found in many biological reaction pathways where they have adapted for different functions. Consequently, discernment and classification of their homologous relations and function can be complicated. We define a minimal Rossmann-like structure motif (RLM) that corresponds to the common core of known Rossmann domains and use this motif to identify all RLM domains in the Protein Data Bank (PDB), thus finding they constitute about 20% of all known 3D structures. The Evolutionary Classification of protein structure Domains (ECOD) classifies RLM domains in a number of groups that lack evidence for homology (X-groups), which suggests that they could have evolved independently multiple times. Closely related, homologous RLM enzyme families can diverge to bind different ligands using similar binding sites and to catalyze different reactions. Conversely, non-homologous RLM domains can converge to catalyze the same reactions or to bind the same ligand with alternate binding modes. We discuss a special case of such convergent evolution that is relevant to the polypharmacology paradigm, wherein the same drug (methotrexate) binds to multiple non-homologous RLM drug targets with different topologies. Finally, assigning proteins with RLM domains to the Enzyme Commission classification suggests that RLM enzymes function mainly in metabolism (and comprise 38% of reference metabolic pathways) and are overrepresented in extant pathways that represent ancient biosynthetic routes such as nucleotide metabolism, energy metabolism, and metabolism of amino acids. In fact, RLM enzymes take part in five out of eight enzymatic reactions of the Wood-Ljungdahl metabolic pathway thought to be used by the last universal common ancestor (LUCA). The prevalence of RLM domains in this ancient metabolism might explain their wide distribution among enzymes.


Subjects
Molecular Evolution , Protein Domains/genetics , Binding Sites/genetics , Catalytic Domain/genetics , Computational Biology , Protein Databases , Enzymes/chemistry , Enzymes/genetics , Enzymes/metabolism , Humans , Ligands , Metabolic Networks and Pathways/genetics , Molecular Models , Protein Binding/genetics , Software , Structural Homology of Proteins
7.
Bioinformatics ; 34(17): 2997-3003, 2018 09 01.
Article in English | MEDLINE | ID: mdl-29659718

ABSTRACT

Motivation: The ECOD database classifies protein domains based on their evolutionary relationships, considering both remote and close homology. The family group in ECOD provides classification of domains that are closely related to each other based on sequence similarity. Due to different perspectives on domain definition, direct application of existing sequence domain databases, such as Pfam, to ECOD struggles with several shortcomings. Results: We created multiple sequence alignments and profiles from ECOD domains with the help of structural information in alignment building and boundary delineation. We validated the alignment quality by scoring structure superposition to demonstrate that they are comparable to curated seed alignments in Pfam. Comparison to Pfam and CDD reveals that 27 and 16% of ECOD families are new, but they are also dominated by small families, likely because of the sampling bias from the PDB database. The boundaries of 35 and 48% of families are modified compared to their counterparts in Pfam and CDD, respectively. Availability and implementation: The new families are now integrated in the ECOD website. The aggregate HMMER profile library and alignment are available for download on the ECOD website (http://prodata.swmed.edu/ecod). Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Protein Databases , Proteins/chemistry , Protein Domains , Sequence Alignment , Software
8.
Nucleic Acids Res ; 45(D1): D296-D302, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899594

ABSTRACT

Evolutionary Classification Of protein Domains (ECOD) (http://prodata.swmed.edu/ecod) comprehensively classifies proteins with known spatial structures maintained by the Protein Data Bank (PDB) into evolutionary groups of protein domains. ECOD relies on a combination of automatic and manual weekly updates to achieve its high accuracy and coverage with a short update cycle. ECOD classifies the approximately 120 000 depositions of the PDB into more than 500 000 domains in ∼3400 homologous groups. We show the performance of the weekly update pipeline since the release of ECOD, describe improvements to the ECOD website and available search options, and discuss novel structures and homologous groups that have been classified in the recent updates. Finally, we discuss the future directions of ECOD and further improvements planned for the hierarchy and update process.


Subjects
Protein Databases , Molecular Evolution , Molecular Models , Protein Domains , Proteins , Computational Biology/methods , Protein Conformation , Proteins/chemistry , Proteins/classification , Proteins/genetics
9.
Proteins ; 84 Suppl 1: 20-33, 2016 09.
Article in English | MEDLINE | ID: mdl-26756794

ABSTRACT

Protein target structures for the Critical Assessment of Structure Prediction round 11 (CASP11) and CASP ROLL were split into domains and classified into categories suitable for assessment of template-based modeling (TBM) and free modeling (FM) based on their evolutionary relatedness to existing structures classified by the Evolutionary Classification of Protein Domains (ECOD) database. First, target structures were divided into domain-based evaluation units. Target splits were based on the domain organization of available templates as well as the performance of servers on whole targets compared to split target domains. Second, evaluation units were classified into TBM and FM categories using a combination of measures that evaluate prediction quality and template detectability. Generally, target domains with sequence-related templates and good server prediction performance were classified as TBM, whereas targets without sequence-identifiable templates and low server performance were classified as FM. As in previous CASP experiments, the boundaries for classification were blurred due to the presence of significant insertions and deteriorations in the targets with respect to homologous templates, as well as the presence of templates with partial coverage of new folds. The FM category included 45 target domains, which represents an unprecedented number of difficult CASP targets provided for modeling. Proteins 2016; 84(Suppl 1):20-33. © 2016 Wiley Periodicals, Inc.


Subjects
Computational Biology/statistics & numerical data , Molecular Models , Statistical Models , Proteins/chemistry , Software , Animals , Bacteriophages/chemistry , Computational Biology/methods , Computer Graphics , Protein Databases , Humans , International Cooperation , Protein Folding , Protein Interaction Domains and Motifs , Protein Multimerization , Protein Secondary Structure , Proteins/classification , Amino Acid Sequence Homology
10.
Proteins ; 83(7): 1238-51, 2015 Jul.
Article in English | MEDLINE | ID: mdl-25917548

ABSTRACT

ECOD (Evolutionary Classification Of protein Domains) is a comprehensive and up-to-date protein structure classification database. The majority of new structures released from the PDB (Protein Data Bank) each week already have close homologs in the ECOD hierarchy and thus can be reliably partitioned into domains and classified by software without manual intervention. However, those proteins that lack confidently detectable homologs require careful analysis by experts. Although many bioinformatics resources rely on expert curation to some degree, specific examples of how this curation occurs and in what cases it is necessary are not always described. Here, we illustrate the manual classification strategy in ECOD by example, focusing on two major issues in protein classification: domain partitioning and the relationship between homology and similarity scores. Most examples show recently released and manually classified PDB structures. We discuss multi-domain proteins, discordance between sequence and structural similarities, difficulties with assessing homology with scores, and integral membrane proteins homologous to soluble proteins. By timely assimilation of newly available structures into its hierarchy, ECOD strives to provide a most accurate and updated view of the protein structure world as a result of combined computational and expert-driven analysis.


Subjects
Algorithms , Computational Biology/methods , Protein Databases , Terminology as Topic , Amino Acid Sequence , Animals , Dimethylallyltranstransferase/chemistry , Dimethylallyltranstransferase/classification , Molecular Evolution , Humans , Hydrogen Bonding , Hydrophobic and Hydrophilic Interactions , Molecular Models , Molecular Sequence Data , Neuropeptides/chemistry , Neuropeptides/classification , Neurotoxins/chemistry , Neurotoxins/classification , Protein Secondary Structure , Protein Tertiary Structure , Sequence Alignment , Amino Acid Sequence Homology , Software , Spider Venoms/chemistry , Spider Venoms/classification , Static Electricity
11.
PLoS Comput Biol ; 10(12): e1003926, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25474468

ABSTRACT

Understanding the evolution of a protein, including both close and distant relationships, often reveals insight into its structure and function. Fast and easy access to such up-to-date information facilitates research. We have developed a hierarchical evolutionary classification of all proteins with experimentally determined spatial structures, and presented it as an interactive and updatable online database. ECOD (Evolutionary Classification of protein Domains) is distinct from other structural classifications in that it groups domains primarily by evolutionary relationships (homology), rather than topology (or "fold"). This distinction highlights cases of homology between domains of differing topology to aid in understanding of protein structure evolution. ECOD uniquely emphasizes distantly related homologs that are difficult to detect, and thus catalogs the largest number of evolutionary links among structural domain classifications. Placing distant homologs together underscores the ancestral similarities of these proteins and draws attention to the most important regions of sequence and structure, as well as conserved functional sites. ECOD also recognizes closer sequence-based relationships between protein domains. Currently, approximately 100,000 protein structures are classified in ECOD into 9,000 sequence families clustered into close to 2,000 evolutionary groups. The classification is assisted by an automated pipeline that quickly and consistently classifies weekly releases of PDB structures and allows for continual updates. This synchronization with PDB uniquely distinguishes ECOD among all protein classifications. Finally, we present several case studies of homologous proteins not recorded in other classifications, illustrating the potential of how ECOD can be used to further biological and evolutionary studies.


Subjects
Computational Biology/methods , Protein Databases , Protein Tertiary Structure , Proteins/chemistry , Proteins/classification , Molecular Evolution , Molecular Models
12.
Protein Sci ; 33(8): e5116, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38979784

ABSTRACT

Interactions between proteins and small organic compounds play a crucial role in regulating protein functions. These interactions can modulate various aspects of protein behavior, including enzymatic activity, signaling cascades, and structural stability. By binding to specific sites on proteins, small organic compounds can induce conformational changes, alter protein-protein interactions, or directly affect catalytic activity. Therefore, many drugs available on the market today are small molecules (72% of all approved drugs in the last 5 years). Proteins are composed of one or more domains: evolutionary units that convey function or fitness either singly or in concert with others. Understanding which domain(s) of the target protein binds to a drug can lead to additional opportunities for discovering novel targets. The evolutionary classification of protein domains (ECOD) classifies domains into an evolutionary hierarchy that focuses on distant homology. Previously, no structure-based protein domain classification existed that included information about both the interaction between small molecules or drugs and the structural domains of a target protein. This data is especially important for multidomain proteins and large complexes. Here, we present the DrugDomain database that reports the interaction between ECOD domains of human target proteins and DrugBank molecules and drugs. The pilot version of DrugDomain describes the interaction of 5160 DrugBank molecules associated with 2573 human proteins. It describes domains for all experimentally determined structures of these proteins and incorporates AlphaFold models when such structures are unavailable. The DrugDomain database is available online: http://prodata.swmed.edu/DrugDomain/.


Subjects
Protein Domains , Proteins , Proteins/chemistry , Proteins/metabolism , Pharmaceutical Preparations/chemistry , Pharmaceutical Preparations/metabolism , Protein Databases , Humans , Small Molecule Libraries/chemistry , Small Molecule Libraries/metabolism , Molecular Evolution , Protein Binding
13.
Sci Rep ; 14(1): 12260, 2024 05 28.
Article in English | MEDLINE | ID: mdl-38806511

ABSTRACT

Salmonella enterica is a pathogenic bacterium known for causing severe typhoid fever in humans, making it important to study due to its potential health risks and significant impact on public health. This study provides an evolutionary classification of proteins from the Salmonella enterica pangenome. We classified 17,238 domains from 13,147 proteins from 79,758 Salmonella enterica strains and studied in detail domains of 272 proteins from 14 characterized Salmonella pathogenicity islands (SPIs). Among SPI-related proteins, 90 proteins function in the secretion machinery. 41% of the domains of SPI proteins have no previous sequence annotation. By comparing clinical and environmental isolates, we identified 3682 proteins that are overrepresented in the clinical group and that we consider potentially pathogenic. Among the domains of potentially pathogenic proteins, only 50% were previously annotated by sequence methods. Moreover, 36% (1330 out of 3682) of potentially pathogenic proteins cannot be classified into the Evolutionary Classification of Protein Domains database (ECOD). Among classified domains of potentially pathogenic proteins, the most populated homology groups include helix-turn-helix (HTH), Immunoglobulin-related, and P-loop domains-related. Functional analysis revealed overrepresentation of these proteins in biological processes related to viral entry into host cell, antibiotic biosynthesis, DNA metabolism and conformation change, and underrepresentation in translational processes. Analysis of the potentially pathogenic proteins indicates that they form 119 clusters or novel potential pathogenicity islands (NPPIs) within the Salmonella genome, suggesting their potential contribution to the bacterium's virulence. One of the NPPIs revealed significant overrepresentation of potentially pathogenic proteins. Overall, our analysis revealed that identified potentially pathogenic proteins are poorly studied.


Subjects
Bacterial Proteins , Bacterial Genome , Genomic Islands , Salmonella enterica , Genomic Islands/genetics , Salmonella enterica/genetics , Salmonella enterica/pathogenicity , Salmonella enterica/classification , Bacterial Proteins/genetics , Bacterial Proteins/chemistry , Bacterial Proteins/metabolism , Humans , Protein Domains
14.
Sci Rep ; 13(1): 11988, 2023 07 25.
Article in English | MEDLINE | ID: mdl-37491511

ABSTRACT

The recent progress in the prediction of protein structures marked a historical milestone. AlphaFold predicted 200 million protein models with an accuracy comparable to experimental methods. Protein structures are widely used to understand evolution and to identify potential drug targets for the treatment of various diseases, including cancer. Thus, these recently predicted structures might convey previously unavailable information about cancer biology. Evolutionary classification of protein domains is challenging and different approaches exist. Recently our team presented a classification of domains from human protein models released by AlphaFold. Here we evaluated the pan-cancer structurome, domains from over and under expressed proteins in 21 cancer types, using the broadest levels of the ECOD classification: the architecture (A-groups) and possible homology (X-groups) levels. Our analysis reveals that AlphaFold has greatly increased the three-dimensional structural landscape for proteins that are differentially expressed in these 21 cancer types. We show that beta sandwich domains are significantly overrepresented and alpha helical domains are significantly underrepresented in the majority of cancer types. Our data suggest that the prevalence of the beta sandwiches is due to the high levels of immunoglobulins and immunoglobulin-like domains that arise during tumor development-related inflammation. On the other hand, proteins with exclusively alpha domains are important elements of homeostasis, apoptosis and transmembrane transport. Therefore cancer cells tend to reduce representation of these proteins to promote successful oncogenesis.


Subjects
Neoplasms , Proteins , Humans , Proteins/chemistry , Protein Domains , Alpha-Helical Protein Conformation
15.
Protein Sci ; 32(2): e4548, 2023 02.
Article in English | MEDLINE | ID: mdl-36539305

ABSTRACT

The recent breakthroughs in structure prediction, where methods such as AlphaFold demonstrated near-atomic accuracy, herald a paradigm shift in structural biology. The 200 million high-accuracy models released in the AlphaFold Database are expected to guide protein science in the coming decades. Partitioning these AlphaFold models into domains and assigning them to an evolutionary hierarchy provide an efficient way to gain functional insights into proteins. However, classifying such a large number of predicted structures challenges the infrastructure of current structure classifications, including our Evolutionary Classification of protein Domains (ECOD). Better computational tools are urgently needed to parse and classify domains from AlphaFold models automatically. Here we present a Domain Parser for AlphaFold Models (DPAM) that can automatically recognize globular domains from these models based on inter-residue distances in 3D structures, predicted aligned errors, and ECOD domains found by sequence (HHsuite) and structural (Dali) similarity searches. Based on a benchmark of 18,759 AlphaFold models, we demonstrate that DPAM can recognize 98.8% of domains and assign correct boundaries for 87.5%, significantly outperforming structure-based domain parsers and homology-based domain assignment using ECOD domains found by HHsuite or Dali. Application of DPAM to the massive AlphaFold models will enable efficient classification of domains, providing evolutionary contexts and facilitating functional studies.


Subjects
Proteins , Software , Protein Databases , Proteins/chemistry , Protein Domains , Molecular Evolution
16.
Protein Sci ; 32(9): e4750, 2023 09.
Article in English | MEDLINE | ID: mdl-37572333

ABSTRACT

Control of eukaryotic cellular function is heavily reliant on the phosphorylation of proteins at specific amino acid residues, such as serine, threonine, tyrosine, and histidine. Protein kinases that are responsible for this process comprise one of the largest families of evolutionarily related proteins. Dysregulation of protein kinase signaling pathways is a frequent cause of a large variety of human diseases including cancer, autoimmune, neurodegenerative, and cardiovascular disorders. In this study, we mapped all pathogenic mutations in 497 human protein kinase domains from the ClinVar database to the reference structure of Aurora kinase A (AURKA) and grouped them by the relevance to the disease type. Our study revealed that the majority of mutation hotspots associated with cancer are situated within the catalytic and activation loops of the kinase domain, whereas non-cancer-related hotspots tend to be located outside of these regions. Additionally, we identified a hotspot at residue R371 of the AURKA structure that has the highest number of exclusively non-cancer-related pathogenic mutations (21) and has not been previously discussed.


Subjects
Protein Kinases , Protein Serine-Threonine Kinases , Humans , Protein Kinases/chemistry , Protein Serine-Threonine Kinases/chemistry , Aurora Kinase A/genetics , Aurora Kinase A/chemistry , Aurora Kinase A/metabolism , Molecular Models , Phosphorylation , Mutation
17.
mSystems ; 8(6): e0079623, 2023 Dec 21.
Article in English | MEDLINE | ID: mdl-38014954

ABSTRACT

IMPORTANCE: The pandemic Vpar strain RIMD causes seafood-borne illness worldwide. Previous comparative genomic studies have revealed pathogenicity islands in RIMD that contribute to the success of the strain in infection. However, not all virulence determinants have been identified, and many of the proteins encoded in known pathogenicity islands are of unknown function. Based on the ECOD database, we used evolution-based classification of structure models for the RIMD proteome to improve our functional understanding of virulence determinants acquired by the pandemic strain. We further identify and classify previously unknown mobile protein domains as well as fast evolving residue positions in structure models that contribute to virulence and adaptation with respect to a pre-pandemic strain. Our work highlights key contributions of phage in mediating seafood-borne illness, suggesting this strain balances its avoidance of phage predators with its successful colonization of human hosts.


Assuntos
Vibrio parahaemolyticus , Humanos , Virulência/genética , Vibrio parahaemolyticus/genética , Fatores de Virulência/genética , Genômica
18.
Bioinformatics ; 27(1): 46-54, 2011 Jan 01.
Article in English | MEDLINE | ID: mdl-21068000

ABSTRACT

MOTIVATION: The discovery of new protein folds is a relatively rare occurrence even as the rate of protein structure determination increases. This rarity reinforces the concept of folds as reusable units of structure and function shared by diverse proteins. If the folding mechanism of proteins is largely determined by their topology, then the folding pathways of members of existing folds could encompass the full set used by globular protein domains. RESULTS: We have used recent versions of three common protein domain dictionaries (SCOP, CATH and Dali) to generate a consensus domain dictionary (CDD). Surprisingly, 40% of the metafolds in the CDD are not composed of autonomous structural domains, i.e. they are not plausible independent folding units. This finding has serious ramifications for bioinformatics studies mining these domain dictionaries for globular protein properties. However, our main purpose in deriving this CDD was to generate an updated CDD to choose targets for MD simulation as part of our dynameomics effort, which aims to simulate the native and unfolding pathways of representatives of all globular protein consensus folds (metafolds). Consequently, we also compiled a list of representative protein targets of each metafold in the CDD. AVAILABILITY AND IMPLEMENTATION: This domain dictionary is available at www.dynameomics.org.


Subjects
Dictionaries as Topic , Protein Structure, Tertiary , Computational Biology , Models, Molecular , Molecular Sequence Annotation , Protein Folding
19.
Biochemistry ; 50(6): 1029-41, 2011 Feb 15.
Article in English | MEDLINE | ID: mdl-21190388

ABSTRACT

To provide insight into the role of local sequence in the nonrandom coil behavior of the denatured state, we have extended our measurements of histidine-heme loop formation equilibria for cytochrome c' to 6 M guanidine hydrochloride. We observe that there is some reduction in the scatter about the best fit line of loop stability versus loop size data in 6 M versus 3 M guanidine hydrochloride, but the scatter is not eliminated. The scaling exponent, ν(3), of 2.5 ± 0.2 is also similar to that found previously in 3 M guanidine hydrochloride (2.6 ± 0.3). Rates of histidine-heme loop breakage in the denatured state of cytochrome c' show that some histidine-heme loops are significantly more persistent than others at both 3 and 6 M guanidine hydrochloride. Rates of histidine-heme loop formation more closely approximate random coil behavior. This observation indicates that heterogeneity in the denatured state ensemble results mainly from contact persistence. When mapped onto the structure of cytochrome c', the histidine-heme loops with slow breakage rates coincide with chain reversals between helices 1 and 2 and between helices 2 and 3. Molecular dynamics simulations of the unfolding of cytochrome c' at 498 K show that these reverse turns persist in the unfolded state. Thus, these portions of the primary structure of cytochrome c' set up the topology of cytochrome c' in the denatured state, predisposing the protein to fold efficiently to its native structure.


Subjects
Bacterial Proteins/chemistry , Cytochromes c'/chemistry , Rhodopseudomonas/metabolism , Guanidine/metabolism , Hydrogen-Ion Concentration , Kinetics , Models, Molecular , Protein Conformation , Protein Denaturation , Protein Folding
20.
Curr Opin Struct Biol ; 18(1): 4-9, 2008 Feb.
Article in English | MEDLINE | ID: mdl-18242977

ABSTRACT

All-atom molecular dynamics (MD) simulations on increasingly powerful computers have been combined with experiments to characterize protein folding in detail over wider time ranges. The folding of small ultrafast-folding proteins is being simulated on microsecond timescales, leading to improved structural predictions and folding rates. To what extent is 'closing the gap' between simulation and experiment for such systems providing insights into general mechanisms of protein folding?


Subjects
Computer Simulation , Models, Molecular , Protein Conformation , Protein Folding , Animals , Humans , Magnetic Resonance Spectroscopy , Protein Structure, Tertiary , Proteins/chemistry , Urea/chemistry , Water/chemistry