Results 1 - 16 of 16
1.
Bioinformatics ; 36(17): 4643-4648, 2020 11 01.
Article in English | MEDLINE | ID: mdl-32399560

ABSTRACT

MOTIVATION: The number of protein records in the UniProt Knowledgebase (UniProtKB: https://www.uniprot.org) continues to grow rapidly as a result of genome sequencing and the prediction of protein-coding genes. Providing functional annotation for these proteins presents a significant and continuing challenge. RESULTS: In response to this challenge, UniProt has developed a method of annotation, known as UniRule, based on expertly curated rules, which integrates related systems (RuleBase, HAMAP, PIRSR, PIRNR) developed by the members of the UniProt consortium. UniRule uses protein family signatures from InterPro, combined with taxonomic and other constraints, to select sets of reviewed proteins which have common functional properties supported by experimental evidence. This annotation is propagated to unreviewed records in UniProtKB that meet the same selection criteria, most of which do not have (and are never likely to have) experimentally verified functional annotation. Release 2020_01 of UniProtKB contains 6496 UniRule rules which provide annotation for 53 million proteins, accounting for 30% of the 178 million records in UniProtKB. UniRule provides scalable enrichment of annotation in UniProtKB. AVAILABILITY AND IMPLEMENTATION: UniRule rules are integrated into UniProtKB and can be viewed at https://www.uniprot.org/unirule/. UniRule rules and the code required to run the rules, are publicly available for researchers who wish to annotate their own sequences. The implementation used to run the rules is known as UniFIRE and is available at https://gitlab.ebi.ac.uk/uniprot-public/unifire.


Subjects
Knowledge Bases , Proteins , Chromosome Mapping , Databases, Protein , Molecular Sequence Annotation , Proteins/genetics
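
As a rough illustration of the rule-based propagation described in the UniRule abstract of entry 1, the following minimal Python sketch applies one toy rule to two toy records. It is not the UniFIRE implementation: the `Rule` and `Protein` classes, the `matches` and `annotate` functions, and every identifier such as `UR_DEMO_1` or `IPR_DEMO_A` are hypothetical names invented for this example. The final lines only check the coverage figure quoted in the abstract (53 million of 178 million records, about 30%).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    """Toy annotation rule (hypothetical structure, not the UniRule format)."""
    rule_id: str
    required_signatures: frozenset  # signature accessions that must all be present
    taxon: str                      # taxonomic constraint: must appear in the lineage
    annotation: str                 # annotation to propagate on a match


@dataclass
class Protein:
    """Toy protein record (hypothetical structure)."""
    accession: str
    signatures: set
    lineage: list
    annotations: list = field(default_factory=list)


def matches(rule, protein):
    """True if the protein carries every required signature and satisfies the taxon constraint."""
    return rule.required_signatures <= protein.signatures and rule.taxon in protein.lineage


def annotate(rules, proteins):
    """Append (rule_id, annotation) to every protein that meets a rule's conditions."""
    for protein in proteins:
        for rule in rules:
            if matches(rule, protein):
                protein.annotations.append((rule.rule_id, rule.annotation))
    return proteins


# All identifiers below are made up for illustration.
rule = Rule("UR_DEMO_1", frozenset({"IPR_DEMO_A"}), "Bacteria", "Function: demo kinase activity")
proteins = [
    Protein("P_DEMO_1", {"IPR_DEMO_A", "IPR_DEMO_B"}, ["Bacteria", "Proteobacteria"]),
    Protein("P_DEMO_2", {"IPR_DEMO_A"}, ["Eukaryota", "Fungi"]),
]
annotate([rule], proteins)

# Coverage figure quoted in the abstract: 53 million of 178 million records.
coverage = 53_000_000 / 178_000_000
```

In this sketch only `P_DEMO_1` receives the annotation, because `P_DEMO_2` has the signature but fails the taxonomic constraint. Real rules are richer than a single signature set and taxon; the sketch shows only the select-then-propagate pattern.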
3.
Nucleic Acids Res ; 43(Database issue): D1064-70, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25348399

ABSTRACT

HAMAP (High-quality Automated and Manual Annotation of Proteins; available at http://hamap.expasy.org/) is a system for the automatic classification and annotation of protein sequences. HAMAP provides annotation of the same quality and detail as UniProtKB/Swiss-Prot, using manually curated profiles for protein sequence family classification and expert curated rules for functional annotation of family members. HAMAP data and tools are made available through our website and as part of the UniRule pipeline of UniProt, providing annotation for millions of unreviewed sequences of UniProtKB/TrEMBL. Here we report on the growth of HAMAP and updates to the HAMAP system since our last report in the NAR Database Issue of 2013. We continue to augment HAMAP with new family profiles and annotation rules as new protein families are characterized and annotated in UniProtKB/Swiss-Prot; the latest version of HAMAP (as of 3 September 2014) contains 1983 family classification profiles and 1998 annotation rules (up from 1780 and 1720). We demonstrate how the complex logic of HAMAP rules allows for precise annotation of individual functional variants within large homologous protein families. We also describe improvements to our web-based tool HAMAP-Scan which simplify the classification and annotation of sequences, and the incorporation of an improved sequence-profile search algorithm.


Subjects
Databases, Protein , Molecular Sequence Annotation , Sequence Homology, Amino Acid , Humans , Internet , Proteins/classification
4.
Nucleic Acids Res ; 41(Database issue): D584-9, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23193261

ABSTRACT

HAMAP (High-quality Automated and Manual Annotation of Proteins; available at http://hamap.expasy.org/) is a system for the classification and annotation of protein sequences. It consists of a collection of manually curated family profiles for protein classification, and associated annotation rules that specify annotations that apply to family members. HAMAP was originally developed to support the manual curation of UniProtKB/Swiss-Prot records describing microbial proteins. Here we describe new developments in HAMAP, including the extension of HAMAP to eukaryotic proteins, the use of HAMAP in the automated annotation of UniProtKB/TrEMBL, providing high-quality annotation for millions of protein sequences, and the future integration of HAMAP into a unified system for UniProtKB annotation, UniRule. HAMAP is continuously updated by expert curators with new family profiles and annotation rules as new protein families are characterized. The collection of HAMAP family classification profiles and annotation rules can be browsed and viewed on the HAMAP website, which also provides an interface to scan user sequences against HAMAP profiles.


Subjects
Databases, Protein , Molecular Sequence Annotation , Proteins/classification , Eukaryota/genetics , Internet
5.
Nucleic Acids Res ; 40(Database issue): D841-6, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22121220

ABSTRACT

IntAct is an open-source, open data molecular interaction database populated by data either curated from the literature or from direct data depositions. Two levels of curation are now available within the database, with both IMEx-level annotation and less detailed MIMIx-compatible entries currently supported. As from September 2011, IntAct contains approximately 275,000 curated binary interaction evidences from over 5000 publications. The IntAct website has been improved to enhance the search process and in particular the graphical display of the results. New data download formats are also available, which will facilitate the inclusion of IntAct's data in the Semantic Web. IntAct is an active contributor to the IMEx consortium (http://www.imexconsortium.org). IntAct source code and data are freely available at http://www.ebi.ac.uk/intact.


Subjects
Databases, Protein , Protein Interaction Mapping , Computer Graphics , Genes , Internet , Molecular Sequence Annotation , Sequence Analysis, Protein , Software
6.
Nucleic Acids Res ; 40(Database issue): D565-70, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22123736

ABSTRACT

The GO annotation dataset provided by the UniProt Consortium (GOA: http://www.ebi.ac.uk/GOA) is a comprehensive set of evidenced-based associations between terms from the Gene Ontology resource and UniProtKB proteins. Currently supplying over 100 million annotations to 11 million proteins in more than 360,000 taxa, this resource has increased 2-fold over the last 2 years and has benefited from a wealth of checks to improve annotation correctness and consistency as well as now supplying a greater information content enabled by GO Consortium annotation format developments. Detailed, manual GO annotations obtained from the curation of peer-reviewed papers are directly contributed by all UniProt curators and supplemented with manual and electronic annotations from 36 model organism and domain-focused scientific resources. The inclusion of high-quality, automatic annotation predictions ensures the UniProt GO annotation dataset supplies functional information to a wide range of proteins, including those from poorly characterized, non-model organism species. UniProt GO annotations are freely available in a range of formats accessible by both file downloads and web-based views. In addition, the introduction of a new, normalized file format in 2010 has made for easier handling of the complete UniProt-GOA data set.


Subjects
Databases, Protein , Molecular Sequence Annotation , Vocabulary, Controlled , Molecular Sequence Annotation/standards
7.
ArXiv ; 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38903736

ABSTRACT

Expert curation is essential to capture knowledge of enzyme functions from the scientific literature in FAIR open knowledgebases but cannot keep pace with the rate of new discoveries and new publications. In this work we present EnzChemRED, for Enzyme Chemistry Relation Extraction Dataset, a new training and benchmarking dataset to support the development of Natural Language Processing (NLP) methods such as (large) language models that can assist enzyme curation. EnzChemRED consists of 1,210 expert curated PubMed abstracts in which enzymes and the chemical reactions they catalyze are annotated using identifiers from the UniProt Knowledgebase (UniProtKB) and the ontology of Chemical Entities of Biological Interest (ChEBI). We show that fine-tuning pre-trained language models with EnzChemRED can significantly boost their ability to identify mentions of proteins and chemicals in text (Named Entity Recognition, or NER) and to extract the chemical conversions in which they participate (Relation Extraction, or RE), with average F1 score of 86.30% for NER, 86.66% for RE for chemical conversion pairs, and 83.79% for RE for chemical conversion pairs and linked enzymes. We combine the best performing methods after fine-tuning using EnzChemRED to create an end-to-end pipeline for knowledge extraction from text and apply this to abstracts at PubMed scale to create a draft map of enzyme functions in literature to guide curation efforts in UniProtKB and the reaction knowledgebase Rhea. The EnzChemRED corpus is freely available at https://ftp.expasy.org/databases/rhea/nlp/.
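
The F1 scores quoted in the abstract above combine precision and recall. As a minimal sketch of how such a score is computed, assuming exact-match comparison between gold and predicted annotations (the abstract does not state the matching criteria actually used), the helper `precision_recall_f1` and the toy annotation tuples below are hypothetical and do not come from EnzChemRED.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall and F1 from two collections of exact-match items."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy mentions as (start offset, end offset, entity type); invented for illustration.
gold = {(0, 7, "Protein"), (20, 27, "Chemical"), (40, 44, "Chemical")}
predicted = {(0, 7, "Protein"), (20, 27, "Chemical"), (50, 55, "Chemical")}

precision, recall, f1 = precision_recall_f1(gold, predicted)
```

Here two of three predictions are correct and two of three gold mentions are found, so precision, recall and F1 all equal 2/3.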

8.
Database (Oxford) ; 2022, 2022 04 12.
Article in English | MEDLINE | ID: mdl-35411389

ABSTRACT

SwissBioPics (www.swissbiopics.org) is a freely available resource of interactive, high-resolution cell images designed for the visualization of subcellular location data. SwissBioPics provides images describing cell types from all kingdoms of life, from the specialized muscle, neuronal and epithelial cells of animals, to the rods, cocci, clubs and spirals of prokaryotes. All cell images in SwissBioPics are drawn in Scalable Vector Graphics (SVG), with each subcellular location tagged with a unique identifier from the controlled vocabulary of subcellular locations and organelles of UniProt (https://www.uniprot.org/locations/). Users can search and explore SwissBioPics cell images through our website, which provides a platform for users to learn more about how cells are organized. A web component allows developers to embed SwissBioPics images in their own websites, using the associated JavaScript and a styling template, and to highlight subcellular locations and organelles by simply providing the web component with the appropriate identifier(s) from the UniProt-controlled vocabulary or the 'Cellular Component' branch of the Gene Ontology (www.geneontology.org), as well as an organism identifier from the National Center for Biotechnology Information taxonomy (https://www.ncbi.nlm.nih.gov/taxonomy). The UniProt website now uses SwissBioPics to visualize the subcellular locations and organelles where proteins function. SwissBioPics is freely available for anyone to use under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. DATABASE URL: www.swissbiopics.org.


Subjects
Proteins , Vocabulary, Controlled , Animals
9.
Metabolites ; 11(1), 2021 Jan 12.
Article in English | MEDLINE | ID: mdl-33445429

ABSTRACT

The UniProt Knowledgebase UniProtKB is a comprehensive, high-quality, and freely accessible resource of protein sequences and functional annotation that covers genomes and proteomes from tens of thousands of taxa, including a broad range of plants and microorganisms producing natural products of medical, nutritional, and agronomical interest. Here we describe work that enhances the utility of UniProtKB as a support for both the study of natural products and for their discovery. The foundation of this work is an improved representation of natural product metabolism in UniProtKB using Rhea, an expert-curated knowledgebase of biochemical reactions, that is built on the ChEBI (Chemical Entities of Biological Interest) ontology of small molecules. Knowledge of natural products and precursors is captured in ChEBI, enzyme-catalyzed reactions in Rhea, and enzymes in UniProtKB/Swiss-Prot, thereby linking chemical structure data directly to protein knowledge. We provide a practical demonstration of how users can search UniProtKB for protein knowledge relevant to natural products through interactive or programmatic queries using metabolite names and synonyms, chemical identifiers, chemical classes, and chemical structures and show how to federate UniProtKB with other data and knowledge resources and tools using semantic web technologies such as RDF and SPARQL. All UniProtKB data are freely available for download in a broad range of formats for users to further mine or exploit as an annotation source, to enrich other natural product datasets and databases.

10.
Gigascience ; 9(2), 2020 02 01.
Article in English | MEDLINE | ID: mdl-32034905

ABSTRACT

BACKGROUND: Genome and proteome annotation pipelines are generally custom built and not easily reusable by other groups. This leads to duplication of effort, increased costs, and suboptimal annotation quality. One way to address these issues is to encourage the adoption of annotation standards and technological solutions that enable the sharing of biological knowledge and tools for genome and proteome annotation. RESULTS: Here we demonstrate one approach to generate portable genome and proteome annotation pipelines that users can run without recourse to custom software. This proof of concept uses our own rule-based annotation pipeline HAMAP, which provides functional annotation for protein sequences to the same depth and quality as UniProtKB/Swiss-Prot, and the World Wide Web Consortium (W3C) standards Resource Description Framework (RDF) and SPARQL (a recursive acronym for the SPARQL Protocol and RDF Query Language). We translate complex HAMAP rules into the W3C standard SPARQL 1.1 syntax, and then apply them to protein sequences in RDF format using freely available SPARQL engines. This approach supports the generation of annotation that is identical to that generated by our own in-house pipeline, using standard, off-the-shelf solutions, and is applicable to any genome or proteome annotation pipeline. CONCLUSIONS: HAMAP SPARQL rules are freely available for download from the HAMAP FTP site, ftp://ftp.expasy.org/databases/hamap/sparql/, under the CC-BY-ND 4.0 license. The annotations generated by the rules are under the CC-BY 4.0 license. A tutorial and supplementary code to use HAMAP as SPARQL are available on GitHub at https://github.com/sib-swiss/HAMAP-SPARQL, and general documentation about HAMAP can be found on the HAMAP website at https://hamap.expasy.org.


Subjects
Genomics/methods , Molecular Sequence Annotation/methods , Sequence Analysis, DNA/methods , Sequence Analysis, Protein/methods , Software/standards , Animals , Genomics/standards , Humans , Molecular Sequence Annotation/standards , Sequence Analysis, DNA/standards , Sequence Analysis, Protein/standards
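
The approach summarized in entry 10, expressing annotation rules as SPARQL and applying them to sequences in RDF, can be loosely sketched as rendering a rule into a SPARQL 1.1 `CONSTRUCT` query. The snippet below is an assumption-laden illustration and not the published HAMAP SPARQL rules: the `ex:` namespace, the predicates `ex:matchesSignature`, `ex:inTaxon` and `ex:annotation`, the function `rule_to_sparql`, and the identifier `SIG_DEMO_1` are all hypothetical. The real rules are available from the FTP site and GitHub repository named in the abstract.

```python
from string import Template

# Hypothetical vocabulary under an example namespace; the real HAMAP rules
# use their own vocabulary, which is not reproduced here.
CONSTRUCT_TEMPLATE = Template("""\
PREFIX ex: <http://example.org/demo#>
CONSTRUCT {
  ?protein ex:annotation "$annotation" .
}
WHERE {
  ?protein ex:matchesSignature ex:$signature ;
           ex:inTaxon ex:$taxon .
}""")


def rule_to_sparql(signature, taxon, annotation):
    """Render a toy rule (signature + taxon -> annotation) as a SPARQL CONSTRUCT query string."""
    # Escape backslashes and double quotes so the annotation is a valid string literal.
    escaped = annotation.replace("\\", "\\\\").replace('"', '\\"')
    return CONSTRUCT_TEMPLATE.substitute(
        signature=signature, taxon=taxon, annotation=escaped
    )


query = rule_to_sparql("SIG_DEMO_1", "Bacteria", 'Binds "ATP"')
print(query)
```

The resulting string could then be submitted to any SPARQL 1.1 engine holding matching RDF data. The sketch covers only the translation step, and real rules involve more complex conditions than a single signature and taxon.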
11.
Mol Cell Biol ; 25(1): 488-98, 2005 Jan.
Article in English | MEDLINE | ID: mdl-15601868

ABSTRACT

The Ccr4-Not complex is a conserved global regulator of gene expression, which serves as a regulatory platform that senses and/or transmits nutrient and stress signals to various downstream effectors. Presumed effectors of this complex in yeast are TFIID, a general transcription factor that associates with the core promoter, and Msn2, a key transcription factor that regulates expression of stress-responsive element (STRE)-controlled genes. Here we show that the constitutively high level of STRE-driven expression in ccr4-not mutants results from two independent effects. Accordingly, loss of Ccr4-Not function causes a dramatic Msn2-independent redistribution of TFIID on promoters with a particular bias for STRE-controlled over ribosomal protein gene promoters. In parallel, loss of Ccr4-Not complex function results in an alteration of the posttranslational modification status of Msn2, which depends on the type 1 protein phosphatase Glc7 and its newly identified subunit Bud14. Tests of epistasis as well as transcriptional analyses of Bud14-dependent transcription support a model in which the Ccr4-Not complex prevents activation of Msn2 via inhibition of the Bud14/Glc7 module in exponentially growing cells. Thus, increased activity of STRE genes in ccr4-not mutants may result from both altered general distribution of TFIID and unscheduled activation of Msn2.


Subjects
Cell Cycle Proteins/physiology , DNA-Binding Proteins/physiology , Ribonucleases/physiology , Saccharomyces cerevisiae Proteins/physiology , Transcription Factors/physiology , Transcriptional Activation , Cross-Linking Reagents/pharmacology , DNA/metabolism , Gene Expression Regulation , Genotype , Glucose/metabolism , Immunoblotting , Immunoprecipitation , Models, Biological , Mutation , Nucleic Acid Hybridization , Phosphoprotein Phosphatases/metabolism , Plasmids/metabolism , Promoter Regions, Genetic , Protein Binding , Protein Phosphatase 1 , Protein Processing, Post-Translational , RNA, Messenger/metabolism , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Time Factors , Transcription Factor TFIID/chemistry , Transcription, Genetic , Two-Hybrid System Techniques
12.
Genome Biol ; 12(1): R7, 2011.
Article in English | MEDLINE | ID: mdl-21247460

ABSTRACT

BACKGROUND: Millions of humans and animals suffer from superficial infections caused by a group of highly specialized filamentous fungi, the dermatophytes, which exclusively infect keratinized host structures. To provide broad insights into the molecular basis of the pathogenicity-associated traits, we report the first genome sequences of two closely phylogenetically related dermatophytes, Arthroderma benhamiae and Trichophyton verrucosum, both of which induce highly inflammatory infections in humans. RESULTS: 97% of the 22.5 megabase genome sequences of A. benhamiae and T. verrucosum are unambiguously alignable and collinear. To unravel dermatophyte-specific virulence-associated traits, we compared sets of potentially pathogenicity-associated proteins, such as secreted proteases and enzymes involved in secondary metabolite production, with those of closely related onygenales (Coccidioides species) and the mould Aspergillus fumigatus. The comparisons revealed expansion of several gene families in dermatophytes and disclosed the peculiarities of the dermatophyte secondary metabolite gene sets. Secretion of proteases and other hydrolytic enzymes by A. benhamiae was proven experimentally by a global secretome analysis during keratin degradation. Molecular insights into the interaction of A. benhamiae with human keratinocytes were obtained for the first time by global transcriptome profiling. Given that A. benhamiae is able to undergo mating, a detailed comparison of the genomes further unraveled the genetic basis of sexual reproduction in this species. CONCLUSIONS: Our results enlighten the genetic basis of fundamental and putatively virulence-related traits of dermatophytes, advancing future research on these medically important pathogens.


Subjects
Arthrodermataceae/genetics , Arthrodermataceae/pathogenicity , Animals , Arthrodermataceae/classification , Arthrodermataceae/metabolism , Comparative Genomic Hybridization , Evolution, Molecular , Fungal Proteins/genetics , Fungal Proteins/metabolism , Gene Expression Regulation, Fungal , Genome, Fungal , Humans , Keratinocytes/metabolism , Keratinocytes/microbiology , Keratins/metabolism , Multigene Family , Peptide Hydrolases/genetics , Phylogeny , Transcriptome
13.
Cell Div ; 1: 3, 2006 Apr 03.
Article in English | MEDLINE | ID: mdl-16759348

ABSTRACT

In recent years, the general understanding of nutrient sensing and signalling, as well as the knowledge about responses triggered by altered nutrient availability have greatly advanced. While initial studies were directed to top-down elucidation of single nutrient-induced pathways, recent investigations place the individual signalling pathways into signalling networks and pursue the identification of converging effector branches that orchestrate the dynamical responses to nutritional cues. In this review, we focus on Rim15, a protein kinase required in yeast for the proper entry into stationary phase (G0). Recent studies revealed that the activity of Rim15 is regulated by the interplay of at least four intercepting nutrient-responsive pathways.

14.
EMBO J ; 24(24): 4271-8, 2005 Dec 21.
Article in English | MEDLINE | ID: mdl-16308562

ABSTRACT

Eukaryotic cell proliferation is controlled by growth factors and essential nutrients. In their absence, cells may enter into a quiescent state (G0). In Saccharomyces cerevisiae, the conserved protein kinase A (PKA) and rapamycin-sensitive TOR (TORC1) pathways antagonize G0 entry in response to carbon and/or nitrogen availability primarily by inhibiting the PAS kinase Rim15 function. Here, we show that the phosphate-sensing Pho80-Pho85 cyclin-cyclin-dependent kinase (CDK) complex also participates in Rim15 inhibition through direct phosphorylation, thereby effectively sequestering Rim15 in the cytoplasm via its association with 14-3-3 proteins. Inactivation of either Pho80-Pho85 or TORC1 causes dephosphorylation of the 14-3-3-binding site in Rim15, thus enabling nuclear import of Rim15 and induction of the Rim15-controlled G0 program. Importantly, we also show that Pho80-Pho85 and TORC1 converge on a single amino acid in Rim15. Thus, Rim15 plays a key role in G0 entry through its ability to integrate signaling from the PKA, TORC1, and Pho80-Pho85 pathways.


Subjects
Cyclin-Dependent Kinases/metabolism , Cyclins/chemistry , Repressor Proteins/chemistry , Resting Phase, Cell Cycle , Saccharomyces cerevisiae Proteins/chemistry , 14-3-3 Proteins/metabolism , Active Transport, Cell Nucleus , Binding Sites , Cell Nucleus/metabolism , Cyclic AMP-Dependent Protein Kinases/metabolism , Cyclin-Dependent Kinases/chemistry , Cyclins/metabolism , Cytoplasm/metabolism , Glutathione Transferase/metabolism , Green Fluorescent Proteins/metabolism , Immunoblotting , Immunoprecipitation , Models, Biological , Mutation , Phosphatidylinositol 3-Kinases/metabolism , Phosphorylation , Phosphotransferases (Alcohol Group Acceptor)/metabolism , Plasmids/metabolism , Protein Binding , Protein Kinases/metabolism , Protein Structure, Tertiary , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Signal Transduction , TOR Serine-Threonine Kinases , Time Factors
15.
EMBO J ; 24(17): 3000-11, 2005 Sep 07.
Article in English | MEDLINE | ID: mdl-16107882

ABSTRACT

Regulated interactions between microtubules (MTs) and the cell cortex control MT dynamics and position the mitotic spindle. In eukaryotic cells, the adenomatous polyposis coli/Kar9p and dynein/dynactin pathways are involved in guiding MT plus ends and MT sliding along the cortex, respectively. Here we identify Bud14p as a novel cortical activator of the dynein/dynactin complex in budding yeast. Bud14p accumulates at sites of polarized growth and the mother-bud neck during cytokinesis. The localization to bud and shmoo tips requires an intact actin cytoskeleton and the kelch-domain-containing proteins Kel1p and Kel2p. While cells lacking Bud14p function fail to stabilize the pre-anaphase spindle at the mother-bud neck, overexpression of Bud14p is toxic and leads to elongated astral MTs and increased dynein-dependent sliding along the cell cortex. Bud14p physically interacts with the type-I phosphatase Glc7p, and localizes Glc7p to the bud cortex. Importantly, the formation of Bud14p-Glc7p complexes is necessary to regulate MT dynamics at the cortex. Taken together, our results suggest that Bud14p functions as a regulatory subunit of the Glc7p type-I phosphatase to stabilize MT interactions specifically at sites of polarized growth.


Subjects
Dyneins/metabolism , Phosphoprotein Phosphatases/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Actin Cytoskeleton/metabolism , Adaptor Proteins, Signal Transducing , Amino Acid Motifs , Carrier Proteins/metabolism , Conserved Sequence , Cytokinesis , Dynactin Complex , Microtubule-Associated Proteins/metabolism , Microtubules/metabolism , Mitosis , Mutation , Phosphoprotein Phosphatases/genetics , Protein Binding , Protein Phosphatase 1 , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/genetics , Spindle Apparatus
16.
Mol Cell ; 12(6): 1607-13, 2003 Dec.
Article in English | MEDLINE | ID: mdl-14690612

ABSTRACT

The highly conserved Tor kinases (TOR) and the protein kinase A (PKA) pathway regulate cell proliferation in response to growth factors and/or nutrients. In Saccharomyces cerevisiae, loss of either TOR or PKA causes cells to arrest growth early in G(1) and to enter G(0) by mechanisms that are poorly understood. Here we demonstrate that the protein kinase Rim15 is required for entry into G(0) following inactivation of TOR and/or PKA. Induction of Rim15-dependent G(0) traits requires two discrete processes, i.e., nuclear accumulation of Rim15, which is negatively regulated both by a Sit4-independent TOR effector branch and the protein kinase B (PKB/Akt) homolog Sch9, and release from PKA-mediated inhibition of its protein kinase activity. Thus, Rim15 integrates signals from at least three nutrient-sensory kinases (TOR, PKA, and Sch9) to properly control entry into G(0), a key developmental process in eukaryotic cells.


Subjects
Cyclic AMP-Dependent Protein Kinases/metabolism , Phosphatidylinositol 3-Kinases/metabolism , Phosphotransferases (Alcohol Group Acceptor)/metabolism , Protein Kinases/metabolism , Resting Phase, Cell Cycle/physiology , Saccharomyces cerevisiae Proteins/metabolism , Signal Transduction/physiology , Animals , Antifungal Agents/metabolism , Gene Expression Regulation, Fungal , Glucose/metabolism , Phenotype , Phosphoprotein Phosphatases/metabolism , Phosphorylation , Protein Phosphatase 2 , Saccharomyces cerevisiae/physiology , Sirolimus/metabolism