Search | BVS Bolivia

1.

EnzChemRED, a rich enzyme chemistry relation extraction dataset.

Lai, Po-Ting; Coudert, Elisabeth; Aimo, Lucila; Axelsen, Kristian; Breuza, Lionel; de Castro, Edouard; Feuermann, Marc; Morgat, Anne; Pourcel, Lucille; Pedruzzi, Ivo; Poux, Sylvain; Redaschi, Nicole; Rivoire, Catherine; Sveshnikova, Anastasia; Wei, Chih-Hsuan; Leaman, Robert; Luo, Ling; Lu, Zhiyong; Bridge, Alan.

ArXiv ; 2024 Apr 22.

Article in English | MEDLINE | ID: mdl-38903736

ABSTRACT

Expert curation is essential to capture knowledge of enzyme functions from the scientific literature in FAIR open knowledgebases but cannot keep pace with the rate of new discoveries and new publications. In this work we present EnzChemRED, for Enzyme Chemistry Relation Extraction Dataset, a new training and benchmarking dataset to support the development of Natural Language Processing (NLP) methods such as (large) language models that can assist enzyme curation. EnzChemRED consists of 1,210 expert curated PubMed abstracts in which enzymes and the chemical reactions they catalyze are annotated using identifiers from the UniProt Knowledgebase (UniProtKB) and the ontology of Chemical Entities of Biological Interest (ChEBI). We show that fine-tuning pre-trained language models with EnzChemRED can significantly boost their ability to identify mentions of proteins and chemicals in text (Named Entity Recognition, or NER) and to extract the chemical conversions in which they participate (Relation Extraction, or RE), with average F1 score of 86.30% for NER, 86.66% for RE for chemical conversion pairs, and 83.79% for RE for chemical conversion pairs and linked enzymes. We combine the best performing methods after fine-tuning using EnzChemRED to create an end-to-end pipeline for knowledge extraction from text and apply this to abstracts at PubMed scale to create a draft map of enzyme functions in literature to guide curation efforts in UniProtKB and the reaction knowledgebase Rhea. The EnzChemRED corpus is freely available at https://ftp.expasy.org/databases/rhea/nlp/.
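The F1 scores quoted above are the harmonic mean of precision and recall. As a minimal sketch of how such a score could be computed for relation extraction, the following treats each extracted chemical conversion as a (substrate, product) pair and compares predicted pairs with gold-standard pairs. The function name and the example pairs are hypothetical placeholders, not EnzChemRED data, and the authors' actual evaluation protocol may differ.

```python
def precision_recall_f1(predicted, gold):
    """Score a collection of predicted items against a gold-standard collection."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical (substrate, product) pairs; placeholders, not real annotations.
gold_pairs = {("chem_A", "chem_B"), ("chem_C", "chem_D"), ("chem_E", "chem_F")}
predicted_pairs = {("chem_A", "chem_B"), ("chem_C", "chem_D"), ("chem_X", "chem_Y")}

precision, recall, f1 = precision_recall_f1(predicted_pairs, gold_pairs)
print(f"precision={precision:.2%} recall={recall:.2%} F1={f1:.2%}")
```

Scoring over sets means each distinct pair counts once, which is one possible convention; a mention-level evaluation would count repeated mentions separately.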

2.

ViralZone 2024 provides higher-resolution images and advanced virus-specific resources.

De Castro, Edouard; Hulo, Chantal; Masson, Patrick; Auchincloss, Andrea; Bridge, Alan; Le Mercier, Philippe.

Nucleic Acids Res ; 52(D1): D817-D821, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37897348

ABSTRACT

ViralZone (http://viralzone.expasy.org) is a knowledge repository for viruses that links biological knowledge and databases. It contains data on virion structure, genome, proteome, replication cycle and host-virus interactions. The new update provides better access to the data through contextual popups and higher resolution images in Scalable Vector Graphics (SVG) format. These images are designed to be dynamic and interactive with human viruses to give users better access to the data. In addition, a new coronavirus-specific resource provides regularly updated data on variants and molecular biology of SARS-CoV-2. Other virus-specific resources have been added to the database, particularly for HIV, herpesviruses and poxviruses.

Subject(s)

Knowledge Bases; Viruses; Humans; Virion/chemistry; Virion/genetics; Virion/growth & development; Viruses/chemistry; Viruses/genetics; Viruses/growth & development

3.

Annotation of biologically relevant ligands in UniProtKB using ChEBI.

Coudert, Elisabeth; Gehant, Sebastien; de Castro, Edouard; Pozzato, Monica; Baratin, Delphine; Neto, Teresa; Sigrist, Christian J A; Redaschi, Nicole; Bridge, Alan.

Bioinformatics ; 39(1), 2023 Jan 01.

Article in English | MEDLINE | ID: mdl-36484697

ABSTRACT

MOTIVATION: To provide high quality, computationally tractable annotation of binding sites for biologically relevant (cognate) ligands in UniProtKB using the chemical ontology ChEBI (Chemical Entities of Biological Interest), to better support efforts to study and predict functionally relevant interactions between protein sequences and structures and small molecule ligands. RESULTS: We structured the data model for cognate ligand binding site annotations in UniProtKB and performed a complete reannotation of all cognate ligand binding sites using stable unique identifiers from ChEBI, which we now use as the reference vocabulary for all such annotations. We developed improved search and query facilities for cognate ligands in the UniProt website, REST API and SPARQL endpoint that leverage the chemical structure data, nomenclature and classification that ChEBI provides. AVAILABILITY AND IMPLEMENTATION: Binding site annotations for cognate ligands described using ChEBI are available for UniProtKB protein sequence records in several formats (text, XML and RDF) and are freely available to query and download through the UniProt website (www.uniprot.org), REST API (www.uniprot.org/help/api), SPARQL endpoint (sparql.uniprot.org/) and FTP site (https://ftp.uniprot.org/pub/databases/uniprot/). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Knowledge Bases; Databases, Protein; Ligands; Amino Acid Sequence; Binding Sites; Molecular Sequence Annotation

4.

SwissBioPics: an interactive library of cell images for the visualization of subcellular location data.

Le Mercier, Philippe; Bolleman, Jerven; de Castro, Edouard; Gasteiger, Elisabeth; Bansal, Parit; Auchincloss, Andrea H; Boutet, Emmanuel; Breuza, Lionel; Casals-Casas, Cristina; Estreicher, Anne; Feuermann, Marc; Lieberherr, Damien; Rivoire, Catherine; Pedruzzi, Ivo; Redaschi, Nicole; Bridge, Alan.

Database (Oxford) ; 2022, 2022 Apr 12.

Article in English | MEDLINE | ID: mdl-35411389

ABSTRACT

SwissBioPics (www.swissbiopics.org) is a freely available resource of interactive, high-resolution cell images designed for the visualization of subcellular location data. SwissBioPics provides images describing cell types from all kingdoms of life, from the specialized muscle, neuronal and epithelial cells of animals to the rods, cocci, clubs and spirals of prokaryotes. All cell images in SwissBioPics are drawn in Scalable Vector Graphics (SVG), with each subcellular location tagged with a unique identifier from the controlled vocabulary of subcellular locations and organelles of UniProt (https://www.uniprot.org/locations/). Users can search and explore SwissBioPics cell images through our website, which provides a platform for users to learn more about how cells are organized. A web component allows developers to embed SwissBioPics images in their own websites, using the associated JavaScript and a styling template, and to highlight subcellular locations and organelles by simply providing the web component with the appropriate identifier(s) from the UniProt-controlled vocabulary or the 'Cellular Component' branch of the Gene Ontology (www.geneontology.org), as well as an organism identifier from the National Center for Biotechnology Information taxonomy (https://www.ncbi.nlm.nih.gov/taxonomy). The UniProt website now uses SwissBioPics to visualize the subcellular locations and organelles where proteins function. SwissBioPics is freely available for anyone to use under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. DATABASE URL: www.swissbiopics.org.

Subject(s)

Proteins; Vocabulary, Controlled; Animals

5.

UniRule: a unified rule resource for automatic annotation in the UniProt Knowledgebase.

MacDougall, Alistair; Volynkin, Vladimir; Saidi, Rabie; Poggioli, Diego; Zellner, Hermann; Hatton-Ellis, Emma; Joshi, Vishal; O'Donovan, Claire; Orchard, Sandra; Auchincloss, Andrea H; Baratin, Delphine; Bolleman, Jerven; Coudert, Elisabeth; de Castro, Edouard; Hulo, Chantal; Masson, Patrick; Pedruzzi, Ivo; Rivoire, Catherine; Arighi, Cecilia; Wang, Qinghua; Chen, Chuming; Huang, Hongzhan; Garavelli, John; Vinayaka, C R; Yeh, Lai-Su; Natale, Darren A; Laiho, Kati; Martin, Maria-Jesus; Renaux, Alexandre; Pichler, Klemens.

Bioinformatics ; 36(22-23): 5562, 2021 Apr 01.

Article in English | MEDLINE | ID: mdl-33821964

6.

Diverse Taxonomies for Diverse Chemistries: Enhanced Representation of Natural Product Metabolism in UniProtKB.

Feuermann, Marc; Boutet, Emmanuel; Morgat, Anne; Axelsen, Kristian B; Bansal, Parit; Bolleman, Jerven; de Castro, Edouard; Coudert, Elisabeth; Gasteiger, Elisabeth; Géhant, Sébastien; Lieberherr, Damien; Lombardot, Thierry; Neto, Teresa B; Pedruzzi, Ivo; Poux, Sylvain; Pozzato, Monica; Redaschi, Nicole; Bridge, Alan.

Metabolites ; 11(1), 2021 Jan 12.

Article in English | MEDLINE | ID: mdl-33445429

ABSTRACT

The UniProt Knowledgebase UniProtKB is a comprehensive, high-quality, and freely accessible resource of protein sequences and functional annotation that covers genomes and proteomes from tens of thousands of taxa, including a broad range of plants and microorganisms producing natural products of medical, nutritional, and agronomical interest. Here we describe work that enhances the utility of UniProtKB as a support for both the study of natural products and for their discovery. The foundation of this work is an improved representation of natural product metabolism in UniProtKB using Rhea, an expert-curated knowledgebase of biochemical reactions, that is built on the ChEBI (Chemical Entities of Biological Interest) ontology of small molecules. Knowledge of natural products and precursors is captured in ChEBI, enzyme-catalyzed reactions in Rhea, and enzymes in UniProtKB/Swiss-Prot, thereby linking chemical structure data directly to protein knowledge. We provide a practical demonstration of how users can search UniProtKB for protein knowledge relevant to natural products through interactive or programmatic queries using metabolite names and synonyms, chemical identifiers, chemical classes, and chemical structures and show how to federate UniProtKB with other data and knowledge resources and tools using semantic web technologies such as RDF and SPARQL. All UniProtKB data are freely available for download in a broad range of formats for users to further mine or exploit as an annotation source, to enrich other natural product datasets and databases.

7.

UniRule: a unified rule resource for automatic annotation in the UniProt Knowledgebase.

MacDougall, Alistair; Volynkin, Vladimir; Saidi, Rabie; Poggioli, Diego; Zellner, Hermann; Hatton-Ellis, Emma; Joshi, Vishal; O'Donovan, Claire; Orchard, Sandra; Auchincloss, Andrea H; Baratin, Delphine; Bolleman, Jerven; Coudert, Elisabeth; de Castro, Edouard; Hulo, Chantal; Masson, Patrick; Pedruzzi, Ivo; Rivoire, Catherine; Arighi, Cecilia; Wang, Qinghua; Chen, Chuming; Huang, Hongzhan; Garavelli, John; Vinayaka, C R; Yeh, Lai-Su; Natale, Darren A; Laiho, Kati; Martin, Maria-Jesus; Renaux, Alexandre; Pichler, Klemens.

Bioinformatics ; 36(17): 4643-4648, 2020 Nov 01.

Article in English | MEDLINE | ID: mdl-32399560

ABSTRACT

MOTIVATION: The number of protein records in the UniProt Knowledgebase (UniProtKB: https://www.uniprot.org) continues to grow rapidly as a result of genome sequencing and the prediction of protein-coding genes. Providing functional annotation for these proteins presents a significant and continuing challenge. RESULTS: In response to this challenge, UniProt has developed a method of annotation, known as UniRule, based on expertly curated rules, which integrates related systems (RuleBase, HAMAP, PIRSR, PIRNR) developed by the members of the UniProt consortium. UniRule uses protein family signatures from InterPro, combined with taxonomic and other constraints, to select sets of reviewed proteins which have common functional properties supported by experimental evidence. This annotation is propagated to unreviewed records in UniProtKB that meet the same selection criteria, most of which do not have (and are never likely to have) experimentally verified functional annotation. Release 2020_01 of UniProtKB contains 6496 UniRule rules which provide annotation for 53 million proteins, accounting for 30% of the 178 million records in UniProtKB. UniRule provides scalable enrichment of annotation in UniProtKB. AVAILABILITY AND IMPLEMENTATION: UniRule rules are integrated into UniProtKB and can be viewed at https://www.uniprot.org/unirule/. UniRule rules, and the code required to run them, are publicly available for researchers who wish to annotate their own sequences. The implementation used to run the rules is known as UniFIRE and is available at https://gitlab.ebi.ac.uk/uniprot-public/unifire.

Subject(s)

Knowledge Bases; Proteins; Chromosome Mapping; Databases, Protein; Molecular Sequence Annotation; Proteins/genetics
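The propagation step described in the abstract above (family signatures plus a taxonomic constraint select the unreviewed records that receive an annotation) can be illustrated with a toy sketch. Everything here is hypothetical: the `Rule` and `ProteinRecord` classes, the signature and accession strings, and the matching logic are simplified stand-ins and do not reflect the UniRule rule format or the UniFIRE implementation. The final lines only check the coverage figure quoted in the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    rule_id: str
    required_signatures: frozenset  # family signatures a protein must carry
    taxon: str                      # taxonomic constraint
    annotation: str                 # annotation to propagate on a match


@dataclass
class ProteinRecord:
    accession: str
    signatures: set
    lineage: tuple
    reviewed: bool = False
    annotations: list = field(default_factory=list)


def rule_applies(rule, protein):
    """A rule matches when all signatures are present and the taxon is in the lineage."""
    return rule.required_signatures <= protein.signatures and rule.taxon in protein.lineage


def propagate(rules, proteins):
    """Add rule annotations to unreviewed records; return how many records were annotated."""
    annotated = 0
    for protein in proteins:
        if protein.reviewed:
            continue  # reviewed records are the source of annotation, not the target
        matched = [rule.annotation for rule in rules if rule_applies(rule, protein)]
        if matched:
            protein.annotations.extend(matched)
            annotated += 1
    return annotated


# Hypothetical rule and records, for illustration only.
rules = [Rule("RULE_1", frozenset({"SIG_1"}), "Bacteria", "example function")]
proteins = [
    ProteinRecord("ACC_1", {"SIG_1", "SIG_2"}, ("Bacteria",)),
    ProteinRecord("ACC_2", {"SIG_1"}, ("Eukaryota",)),
    ProteinRecord("ACC_3", {"SIG_1"}, ("Bacteria",), reviewed=True),
]
annotated_count = propagate(rules, proteins)

# Coverage figure from the abstract: 53 million of 178 million records.
coverage = 53_000_000 / 178_000_000
print(f"{annotated_count} record(s) annotated; UniRule coverage is about {coverage:.0%}")
```

Only the first record is annotated in this sketch: the second fails the taxonomic constraint and the third is already reviewed.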

8.

HAMAP as SPARQL rules: a portable annotation pipeline for genomes and proteomes.

Bolleman, Jerven; de Castro, Edouard; Baratin, Delphine; Gehant, Sebastien; Cuche, Beatrice A; Auchincloss, Andrea H; Coudert, Elisabeth; Hulo, Chantal; Masson, Patrick; Pedruzzi, Ivo; Rivoire, Catherine; Xenarios, Ioannis; Redaschi, Nicole; Bridge, Alan.

Gigascience ; 9(2), 2020 Feb 01.

Article in English | MEDLINE | ID: mdl-32034905

ABSTRACT

BACKGROUND: Genome and proteome annotation pipelines are generally custom built and not easily reusable by other groups. This leads to duplication of effort, increased costs, and suboptimal annotation quality. One way to address these issues is to encourage the adoption of annotation standards and technological solutions that enable the sharing of biological knowledge and tools for genome and proteome annotation. RESULTS: Here we demonstrate one approach to generate portable genome and proteome annotation pipelines that users can run without recourse to custom software. This proof of concept uses our own rule-based annotation pipeline HAMAP, which provides functional annotation for protein sequences to the same depth and quality as UniProtKB/Swiss-Prot, and the World Wide Web Consortium (W3C) standards Resource Description Framework (RDF) and SPARQL (a recursive acronym for the SPARQL Protocol and RDF Query Language). We translate complex HAMAP rules into the W3C standard SPARQL 1.1 syntax, and then apply them to protein sequences in RDF format using freely available SPARQL engines. This approach supports the generation of annotation that is identical to that generated by our own in-house pipeline, using standard, off-the-shelf solutions, and is applicable to any genome or proteome annotation pipeline. CONCLUSIONS: HAMAP SPARQL rules are freely available for download from the HAMAP FTP site, ftp://ftp.expasy.org/databases/hamap/sparql/, under the CC-BY-ND 4.0 license. The annotations generated by the rules are under the CC-BY 4.0 license. A tutorial and supplementary code to use HAMAP as SPARQL are available on GitHub at https://github.com/sib-swiss/HAMAP-SPARQL, and general documentation about HAMAP can be found on the HAMAP website at https://hamap.expasy.org.

Subject(s)

Genomics/methods; Molecular Sequence Annotation/methods; Sequence Analysis, DNA/methods; Sequence Analysis, Protein/methods; Software/standards; Animals; Genomics/standards; Humans; Molecular Sequence Annotation/standards; Sequence Analysis, DNA/standards; Sequence Analysis, Protein/standards

9.

[ViralZone : digitalizing for knowledge sharing]. / ViralZone : le numérique au service du partage des savoirs.

Le Mercier, Philippe; Hulo, Chantal; Masson, Patrick; de Castro, Edouard.

Virologie (Montrouge) ; 24(6): 437-440, 2020 Dec 01.

Article in French | MEDLINE | ID: mdl-33441292

10.

Enzyme annotation in UniProtKB using Rhea.

Morgat, Anne; Lombardot, Thierry; Coudert, Elisabeth; Axelsen, Kristian; Neto, Teresa Batista; Gehant, Sebastien; Bansal, Parit; Bolleman, Jerven; Gasteiger, Elisabeth; de Castro, Edouard; Baratin, Delphine; Pozzato, Monica; Xenarios, Ioannis; Poux, Sylvain; Redaschi, Nicole; Bridge, Alan.

Bioinformatics ; 36(6): 1896-1901, 2020 Mar 01.

Article in English | MEDLINE | ID: mdl-31688925

ABSTRACT

MOTIVATION: To provide high quality computationally tractable enzyme annotation in UniProtKB using Rhea, a comprehensive expert-curated knowledgebase of biochemical reactions which describes reaction participants using the ChEBI (Chemical Entities of Biological Interest) ontology. RESULTS: We replaced existing textual descriptions of biochemical reactions in UniProtKB with their equivalents from Rhea, which is now the standard for annotation of enzymatic reactions in UniProtKB. We developed improved search and query facilities for the UniProt website, REST API and SPARQL endpoint that leverage the chemical structure data, nomenclature and classification that Rhea and ChEBI provide. AVAILABILITY AND IMPLEMENTATION: UniProtKB at https://www.uniprot.org; UniProt REST API at https://www.uniprot.org/help/api; UniProt SPARQL endpoint at https://sparql.uniprot.org/; Rhea at https://www.rhea-db.org.

Subject(s)

Rheiformes; Animals; Databases, Protein; Knowledge Bases

11.

Bacterial Virus Ontology; Coordinating across Databases.

Hulo, Chantal; Masson, Patrick; Toussaint, Ariane; Osumi-Sutherland, David; de Castro, Edouard; Auchincloss, Andrea H; Poux, Sylvain; Bougueleret, Lydie; Xenarios, Ioannis; Le Mercier, Philippe.

Viruses ; 9(6), 2017 May 23.

Article in English | MEDLINE | ID: mdl-28545254

ABSTRACT

Bacterial viruses, also called bacteriophages, display great genetic diversity and utilize unique processes for infecting and reproducing within a host cell. All these processes were investigated and indexed in the ViralZone knowledge base. To facilitate standardizing data, a simple ontology of viral life-cycle terms was developed to provide a common vocabulary for annotating data sets. New terminology was developed to address unique viral replication cycle processes, and existing terminology was modified and adapted. Classically, the viral life-cycle is described by schematic pictures. Using this ontology, it can be represented by a combination of successive events: entry, latency, transcription/replication, host-virus interactions and virus release. Each of these parts is broken down into discrete steps. For example, enterobacteria phage lambda entry is broken down into viral attachment to host adhesion receptor, viral attachment to host entry receptor, viral genome ejection and viral genome circularization. To demonstrate the utility of a standard ontology for virus biology, this work was completed by annotating virus data in the ViralZone, UniProtKB and Gene Ontology databases.

Subject(s)

Bacteriophages/genetics; Bacteriophages/physiology; Biological Ontologies; Bacteriophages/classification; Bacteriophages/growth & development; Databases, Factual; Host-Pathogen Interactions; Terminology as Topic
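The life-cycle representation described in the abstract above (an ordered series of events, each broken down into discrete steps) maps naturally onto a small data structure. The sketch below is one possible encoding, assuming a plain ordered list of events and a dictionary of steps. The event and step names are quoted from the abstract; the variable and function names are hypothetical, and the actual ontology format used by ViralZone is not shown in the source.

```python
# Successive life-cycle events, in the order given in the abstract.
LIFE_CYCLE_EVENTS = [
    "entry",
    "latency",
    "transcription/replication",
    "host-virus interactions",
    "virus release",
]

# Discrete steps per event for enterobacteria phage lambda.
# Only the "entry" steps are listed in the abstract; other events are left empty.
LAMBDA_STEPS = {
    "entry": [
        "viral attachment to host adhesion receptor",
        "viral attachment to host entry receptor",
        "viral genome ejection",
        "viral genome circularization",
    ],
}


def ordered_steps(events, steps_by_event):
    """Return (event, step) pairs in life-cycle order, skipping events with no steps."""
    return [
        (event, step)
        for event in events
        for step in steps_by_event.get(event, [])
    ]


lambda_cycle = ordered_steps(LIFE_CYCLE_EVENTS, LAMBDA_STEPS)
for event, step in lambda_cycle:
    print(f"{event}: {step}")
```

Keeping the event order in a separate list means a virus only needs to declare the steps it has, while the overall sequence stays shared across viruses.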

12.

The ins and outs of eukaryotic viruses: Knowledge base and ontology of a viral infection.

Hulo, Chantal; Masson, Patrick; de Castro, Edouard; Auchincloss, Andrea H; Foulger, Rebecca; Poux, Sylvain; Lomax, Jane; Bougueleret, Lydie; Xenarios, Ioannis; Le Mercier, Philippe.

PLoS One ; 12(2): e0171746, 2017.

Article in English | MEDLINE | ID: mdl-28207819

ABSTRACT

Viruses are genetically diverse, infect a wide range of tissues and host cells and follow unique processes for replicating themselves. All these processes were investigated and indexed in the ViralZone knowledge base. To facilitate standardizing data, a simple ontology of viral life-cycle terms was developed to provide a common vocabulary for annotating data sets. New terminology was developed to address unique viral replication cycle processes, and existing terminology was modified and adapted. The virus life-cycle is classically described by schematic pictures. Using this ontology, it can be represented by a combination of successive terms: "entry", "latency", "transcription", "replication" and "exit". Each of these parts is broken down into discrete steps. For example, Zika virus "entry" is broken down into successive steps: "Attachment", "Apoptotic mimicry", "Viral endocytosis/macropinocytosis", "Fusion with host endosomal membrane", "Viral factory". To demonstrate the utility of a standard ontology for virus biology, this work was completed by annotating virus data in the ViralZone, UniProtKB and Gene Ontology databases.

Subject(s)

Eukaryotic Cells/virology; Terminology as Topic; Virus Diseases/virology; Virus Physiological Phenomena; Databases, Genetic; Virus Replication; Viruses/genetics; Viruses/pathogenicity

13.

HAMAP in 2015: updates to the protein family classification and annotation system.

Pedruzzi, Ivo; Rivoire, Catherine; Auchincloss, Andrea H; Coudert, Elisabeth; Keller, Guillaume; de Castro, Edouard; Baratin, Delphine; Cuche, Béatrice A; Bougueleret, Lydie; Poux, Sylvain; Redaschi, Nicole; Xenarios, Ioannis; Bridge, Alan.

Nucleic Acids Res ; 43(Database issue): D1064-70, 2015 Jan.

Article in English | MEDLINE | ID: mdl-25348399

ABSTRACT

HAMAP (High-quality Automated and Manual Annotation of Proteins; available at http://hamap.expasy.org/) is a system for the automatic classification and annotation of protein sequences. HAMAP provides annotation of the same quality and detail as UniProtKB/Swiss-Prot, using manually curated profiles for protein sequence family classification and expert curated rules for functional annotation of family members. HAMAP data and tools are made available through our website and as part of the UniRule pipeline of UniProt, providing annotation for millions of unreviewed sequences of UniProtKB/TrEMBL. Here we report on the growth of HAMAP and updates to the HAMAP system since our last report in the NAR Database Issue of 2013. We continue to augment HAMAP with new family profiles and annotation rules as new protein families are characterized and annotated in UniProtKB/Swiss-Prot; the latest version of HAMAP (as of 3 September 2014) contains 1983 family classification profiles and 1998 annotation rules (up from 1780 and 1720). We demonstrate how the complex logic of HAMAP rules allows for precise annotation of individual functional variants within large homologous protein families. We also describe improvements to our web-based tool HAMAP-Scan which simplify the classification and annotation of sequences, and the incorporation of an improved sequence-profile search algorithm.

Subject(s)

Databases, Protein; Molecular Sequence Annotation; Sequence Homology, Amino Acid; Humans; Internet; Proteins/classification

14.

An integrated ontology resource to explore and study host-virus relationships.

Masson, Patrick; Hulo, Chantal; de Castro, Edouard; Foulger, Rebecca; Poux, Sylvain; Bridge, Alan; Lomax, Jane; Bougueleret, Lydie; Xenarios, Ioannis; Le Mercier, Philippe.

PLoS One ; 9(9): e108075, 2014.

Article in English | MEDLINE | ID: mdl-25233094

ABSTRACT

Our growing knowledge of viruses reveals how these pathogens manage to evade innate host defenses. A global scheme emerges in which many viruses usurp key cellular defense mechanisms and often inhibit the same components of antiviral signaling. To accurately describe these processes, we have generated a comprehensive dictionary for eukaryotic host-virus interactions. This controlled vocabulary has been detailed in 57 ViralZone resource web pages which contain a global description of all molecular processes. In order to annotate viral gene products with this vocabulary, an ontology has been built in a hierarchy of UniProt Knowledgebase (UniProtKB) keyword terms and corresponding Gene Ontology (GO) terms have been developed in parallel. The results are 65 UniProtKB keywords related to 57 GO terms, which have been used in 14,390 manual annotations and 908,723 automatic annotations, and propagated to an estimated 922,941 GO annotations. ViralZone pages, UniProtKB keywords and GO terms provide complementary tools to users, and the three resources have been linked to each other through host-virus vocabulary.

Subject(s)

Gene Ontology; Host-Pathogen Interactions/genetics; Adaptive Immunity/genetics; Animals; Databases, Nucleic Acid; Gene Expression Regulation/immunology; Humans; Immunity, Innate; Interferons/genetics; Virus Diseases/genetics; Virus Diseases/immunology; Virus Diseases/virology

15.

Genome and transcriptome sequencing identifies breeding targets in the orphan crop tef (Eragrostis tef).

Cannarozzi, Gina; Plaza-Wüthrich, Sonia; Esfeld, Korinna; Larti, Stéphanie; Wilson, Yi Song; Girma, Dejene; de Castro, Edouard; Chanyalew, Solomon; Blösch, Regula; Farinelli, Laurent; Lyons, Eric; Schneider, Michel; Falquet, Laurent; Kuhlemeier, Cris; Assefa, Kebebew; Tadele, Zerihun.

BMC Genomics ; 15: 581, 2014 Jul 09.

Article in English | MEDLINE | ID: mdl-25007843

ABSTRACT

BACKGROUND: Tef (Eragrostis tef), an indigenous cereal critical to food security in the Horn of Africa, is rich in minerals and protein, resistant to many biotic and abiotic stresses and safe for diabetics as well as sufferers of immune reactions to wheat gluten. We present the genome of tef, the first species in the grass subfamily Chloridoideae and the first allotetraploid assembled de novo. We sequenced the tef genome for marker-assisted breeding, to shed light on the molecular mechanisms conferring tef's desirable nutritional and agronomic properties, and to make its genome publicly available as a community resource. RESULTS: The draft genome contains 672 Mbp representing 87% of the genome size estimated from flow cytometry. We also sequenced two transcriptomes, one from a normalized RNA library and another from unnormalized RNASeq data. The normalized RNA library revealed around 38000 transcripts that were then annotated by the SwissProt group. The CoGe comparative genomics platform was used to compare the tef genome to other genomes, notably sorghum. Scaffolds comprising approximately half of the genome size were ordered by syntenic alignment to sorghum producing tef pseudo-chromosomes, which were sorted into A and B genomes as well as compared to the genetic map of tef. The draft genome was used to identify novel SSR markers, investigate target genes for abiotic stress resistance studies, and understand the evolution of the prolamin family of proteins that are responsible for the immune response to gluten. CONCLUSIONS: It is highly plausible that breeding targets previously identified in other cereal crops will also be valuable breeding targets in tef. The draft genome and transcriptome will be of great use for identifying these targets for genetic improvement of this orphan crop that is vital for feeding 50 million people in the Horn of Africa.

Subject(s)

Eragrostis/genetics; Genome, Plant; Transcriptome; Chromosome Mapping; Eragrostis/classification; Gene Library; High-Throughput Nucleotide Sequencing; Microsatellite Repeats/genetics; Molecular Sequence Annotation; Phosphoric Monoester Hydrolases/classification; Phosphoric Monoester Hydrolases/genetics; Phylogeny; Plant Proteins/classification; Plant Proteins/genetics; Prolamins/classification; Prolamins/genetics; RNA, Untranslated/genetics; RNA, Untranslated/metabolism; Sequence Analysis, RNA
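The tef abstract above states that the 672 Mbp draft assembly represents 87% of the genome size estimated from flow cytometry, but does not give that estimate directly. As a rough back-calculation, assuming both figures are rounded as reported, the implied estimate can be worked out as follows. The variable names are hypothetical and the result is only approximate.

```python
assembled_mbp = 672        # draft assembly size, from the abstract
assembled_fraction = 0.87  # share of the flow-cytometry estimate, from the abstract

# Implied total genome size and the portion not yet covered by the draft assembly.
estimated_genome_mbp = assembled_mbp / assembled_fraction
unassembled_mbp = estimated_genome_mbp - assembled_mbp

print(f"implied genome size estimate: about {estimated_genome_mbp:.0f} Mbp")
print(f"not covered by the draft assembly: about {unassembled_mbp:.0f} Mbp")
```

Because the 87% figure is itself rounded, the implied estimate carries an uncertainty of several Mbp in either direction.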

16.

HAMAP in 2013, new developments in the protein family classification and annotation system.

Pedruzzi, Ivo; Rivoire, Catherine; Auchincloss, Andrea H; Coudert, Elisabeth; Keller, Guillaume; de Castro, Edouard; Baratin, Delphine; Cuche, Béatrice A; Bougueleret, Lydie; Poux, Sylvain; Redaschi, Nicole; Xenarios, Ioannis; Bridge, Alan.

Nucleic Acids Res ; 41(Database issue): D584-9, 2013 Jan.

Article in English | MEDLINE | ID: mdl-23193261

ABSTRACT

HAMAP (High-quality Automated and Manual Annotation of Proteins; available at http://hamap.expasy.org/) is a system for the classification and annotation of protein sequences. It consists of a collection of manually curated family profiles for protein classification, and associated annotation rules that specify annotations that apply to family members. HAMAP was originally developed to support the manual curation of UniProtKB/Swiss-Prot records describing microbial proteins. Here we describe new developments in HAMAP, including the extension of HAMAP to eukaryotic proteins, the use of HAMAP in the automated annotation of UniProtKB/TrEMBL, providing high-quality annotation for millions of protein sequences, and the future integration of HAMAP into a unified system for UniProtKB annotation, UniRule. HAMAP is continuously updated by expert curators with new family profiles and annotation rules as new protein families are characterized. The collection of HAMAP family classification profiles and annotation rules can be browsed and viewed on the HAMAP website, which also provides an interface to scan user sequences against HAMAP profiles.

Subject(s)

Databases, Protein; Molecular Sequence Annotation; Proteins/classification; Eukaryota/genetics; Internet

17.

ViralZone: recent updates to the virus knowledge resource.

Masson, Patrick; Hulo, Chantal; De Castro, Edouard; Bitter, Hans; Gruenbaum, Lore; Essioux, Laurent; Bougueleret, Lydie; Xenarios, Ioannis; Le Mercier, Philippe.

Nucleic Acids Res ; 41(Database issue): D579-83, 2013 Jan.

Article in English | MEDLINE | ID: mdl-23193299

ABSTRACT

ViralZone (http://viralzone.expasy.org) is a knowledge repository that allows users to learn about viruses including their virion structure, replication cycle and host-virus interactions. The information is divided into viral fact sheets that describe virion shape, molecular biology and epidemiology for each viral genus, with links to the corresponding annotated proteomes of UniProtKB. Each viral genus page contains detailed illustrations, text and PubMed references. This new update provides a linked view of viral molecular biology through 133 new viral ontology pages that describe common steps of viral replication cycles shared by several viral genera. This viral cell-cycle ontology is also represented in UniProtKB in the form of annotated keywords. In this way, users can navigate from the description of a replication-cycle event, to the viral genus concerned, and the associated UniProtKB protein records.

Subject(s)

Databases, Genetic; Virus Physiological Phenomena; Genome, Viral; Hepatitis B virus/physiology; Host-Pathogen Interactions; Internet; Viral Proteins/genetics; Virus Internalization; Virus Replication; Vocabulary, Controlled

18.

New and continuing developments at PROSITE.

Sigrist, Christian J A; de Castro, Edouard; Cerutti, Lorenzo; Cuche, Béatrice A; Hulo, Nicolas; Bridge, Alan; Bougueleret, Lydie; Xenarios, Ioannis.

Nucleic Acids Res ; 41(Database issue): D344-7, 2013 Jan.

Article in English | MEDLINE | ID: mdl-23161676

ABSTRACT

PROSITE (http://prosite.expasy.org/) consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them. It is complemented by ProRule, a collection of rules that increases the discriminatory power of these profiles and patterns by providing additional information about functionally and/or structurally critical amino acids. PROSITE signatures, together with ProRule, are used for the annotation of domains and features of UniProtKB/Swiss-Prot entries. Here, we describe recent developments that allow users to perform whole-proteome annotation as well as a number of filtering options that can be combined to perform powerful targeted searches for biological discovery. The latest version of PROSITE (release 20.85, of 30 August 2012) contains 1308 patterns, 1039 profiles and 1041 ProRules.

Subject(s)

Amino Acid Motifs; Databases, Protein; Protein Structure, Tertiary; Sequence Analysis, Protein; Amino Acid Sequence; Conserved Sequence; Internet; Molecular Sequence Annotation; Proteins/chemistry; Proteins/classification; Proteome/chemistry

19.

ExPASy: SIB bioinformatics resource portal.

Artimo, Panu; Jonnalagedda, Manohar; Arnold, Konstantin; Baratin, Delphine; Csardi, Gabor; de Castro, Edouard; Duvaud, Séverine; Flegel, Volker; Fortier, Arnaud; Gasteiger, Elisabeth; Grosdidier, Aurélien; Hernandez, Céline; Ioannidis, Vassilios; Kuznetsov, Dmitry; Liechti, Robin; Moretti, Sébastien; Mostaguir, Khaled; Redaschi, Nicole; Rossier, Grégoire; Xenarios, Ioannis; Stockinger, Heinz.

Nucleic Acids Res ; 40(Web Server issue): W597-603, 2012 Jul.

Article in English | MEDLINE | ID: mdl-22661580

ABSTRACT

ExPASy (http://www.expasy.org) has a worldwide reputation as one of the main bioinformatics resources for proteomics. It has now evolved, becoming an extensible and integrative portal accessing many scientific resources, databases and software tools in different areas of life sciences. Scientists can now seamlessly access a wide range of resources in many different domains, such as proteomics, genomics, phylogeny/evolution, systems biology, population genetics, transcriptomics, etc. The individual resources (databases, web-based and downloadable software tools) are hosted in a 'decentralized' way by different groups of the SIB Swiss Institute of Bioinformatics and partner institutions. Specifically, a single web portal provides a common entry point to a wide range of resources developed and operated by different SIB groups and external institutions. The portal features a search function across 'selected' resources. Additionally, the availability and usage of resources are monitored. The portal is aimed at both expert users and people who are not familiar with a specific domain in life sciences. The new web interface provides, in particular, visual guidance for newcomers to ExPASy.

Subject(s)

Computational Biology; Proteomics; Software; Computer Graphics; Genomics; Internet; Systems Integration; User-Computer Interface

20.

InterPro in 2011: new developments in the family and domain prediction database.

Hunter, Sarah; Jones, Philip; Mitchell, Alex; Apweiler, Rolf; Attwood, Teresa K; Bateman, Alex; Bernard, Thomas; Binns, David; Bork, Peer; Burge, Sarah; de Castro, Edouard; Coggill, Penny; Corbett, Matthew; Das, Ujjwal; Daugherty, Louise; Duquenne, Lauranne; Finn, Robert D; Fraser, Matthew; Gough, Julian; Haft, Daniel; Hulo, Nicolas; Kahn, Daniel; Kelly, Elizabeth; Letunic, Ivica; Lonsdale, David; Lopez, Rodrigo; Madera, Martin; Maslen, John; McAnulla, Craig; McDowall, Jennifer; McMenamin, Conor; Mi, Huaiyu; Mutowo-Muellenet, Prudence; Mulder, Nicola; Natale, Darren; Orengo, Christine; Pesseat, Sebastien; Punta, Marco; Quinn, Antony F; Rivoire, Catherine; Sangrador-Vegas, Amaia; Selengut, Jeremy D; Sigrist, Christian J A; Scheremetjew, Maxim; Tate, John; Thimmajanarthanan, Manjulapramila; Thomas, Paul D; Wu, Cathy H; Yeats, Corin; Yong, Siew-Yit.

Nucleic Acids Res ; 40(Database issue): D306-12, 2012 Jan.

Article in English | MEDLINE | ID: mdl-22096229

ABSTRACT

InterPro (http://www.ebi.ac.uk/interpro/) is a database that integrates diverse information about protein families, domains and functional sites, and makes it freely available to the public via Web-based interfaces and services. Central to the database are diagnostic models, known as signatures, against which protein sequences can be searched to determine their potential function. InterPro has utility in the large-scale analysis of whole genomes and meta-genomes, as well as in characterizing individual protein sequences. Herein we give an overview of new developments in the database and its associated software since 2009, including updates to database content, curation processes and Web and programmatic interfaces.

Subject(s)

Databases, Protein; Protein Structure, Tertiary; Proteins/classification; Proteins/physiology; Sequence Analysis, Protein; Software; Terminology as Topic; User-Computer Interface
