Search | BVS IEC

CheNER: chemical named entity recognizer.

Usié, Anabel; Alves, Rui; Solsona, Francesc; Vázquez, Miguel; Valencia, Alfonso.

Bioinformatics ; 30(7): 1039-40, 2014 Apr 01.

Article in English | MEDLINE | ID: mdl-24227678

ABSTRACT

MOTIVATION: Chemical named entity recognition is used to automatically identify mentions of chemical compounds in text and is the basis for more elaborate information extraction. However, only a small number of applications are freely available to identify such mentions. Particularly challenging and useful is the identification of International Union of Pure and Applied Chemistry (IUPAC) chemical compound names, which, due to the complex morphology of IUPAC names, requires more advanced techniques than the identification of brand names. RESULTS: We present CheNER, a tool for automated identification of systematic IUPAC chemical mentions. We evaluated different systems using an established literature corpus to show that CheNER has a superior performance in identifying IUPAC names specifically, and that it makes better use of computational resources. AVAILABILITY AND IMPLEMENTATION: http://metres.udl.cat/index.php/9-download/4-chener, http://chener.bioinfo.cnio.es/

Subjects

Chemical Databases , Software , Information Storage and Retrieval

Database constraints applied to metabolic pathway reconstruction tools.

Vilaplana, Jordi; Solsona, Francesc; Teixido, Ivan; Usié, Anabel; Karathia, Hiren; Alves, Rui; Mateo, Jordi.

ScientificWorldJournal ; 2014: 967294, 2014.

Article in English | MEDLINE | ID: mdl-25202745

ABSTRACT

Our group developed two biological applications, Biblio-MetReS and Homol-MetReS, accessing the same database of organisms with annotated genes. Biblio-MetReS is a data-mining application that facilitates the reconstruction of molecular networks based on automated text-mining analysis of published scientific literature. Homol-MetReS allows functional (re)annotation of proteomes, to properly identify both the individual proteins involved in the process(es) of interest and their function. It also enables the sets of proteins involved in the process(es) in different organisms to be compared directly. The efficiency of these biological applications is directly related to the design of the shared database. We classified and analyzed the different kinds of access to the database. Based on this study, we tried to adjust and tune the configurable parameters of the database server to reach the best performance of the communication data link to/from the database system. Different database technologies were analyzed. We started the study with a public relational SQL database, MySQL. Then, the same database was implemented by a MapReduce-based database named HBase. The results indicated that the standard configuration of MySQL gives an acceptable performance for low or medium size databases. Nevertheless, tuning database parameters can greatly improve the performance and lead to very competitive runtimes.

Subjects

Computational Biology/methods , Genetic Databases , Metabolic Networks and Pathways , Software , Biological Models

Hybrid assembly and comparative genomics unveil insights into the evolution and biology of the red-legged partridge.

Eleiwa, Abderrahmane; Nadal, Jesus; Vilaprinyo, Ester; Marin-Sanguino, Alberto; Sorribas, Albert; Basallo, Oriol; Lucido, Abel; Richart, Cristobal; Pena, Ramona N; Ros-Freixedes, Roger; Usie, Anabel; Alves, Rui.

Sci Rep ; 14(1): 19531, 2024 Aug 22.

Article in English | MEDLINE | ID: mdl-39174643

ABSTRACT

The red-legged partridge Alectoris rufa plays a crucial role in the ecosystem of southwestern Europe, and understanding its genetics is vital for conservation and management. Here we sequence, assemble, and annotate a highly contiguous and nearly complete version of its genome. This assembly encompasses 96.9% of the avian genes flagged as essential in the BUSCO aves_odb10 dataset. Moreover, we pinpointed RNA and protein-coding genes, 95% of which had functional annotations. Notably, we observed significant chromosome rearrangements in comparison to quail (Coturnix japonica) and chicken (Gallus gallus). In addition, a comparative phylogenetic analysis of these genomes suggests that A. rufa and C. japonica diverged roughly 20 million years ago and that their common ancestor diverged from G. gallus 35 million years ago. Our assembly represents a significant advancement towards a complete reference genome for A. rufa, facilitating comparative avian genomics, and providing a valuable resource for future research and conservation efforts for the red-legged partridge.

Subjects

Galliformes , Genomics , Phylogeny , Animals , Galliformes/genetics , Galliformes/classification , Genomics/methods , Molecular Evolution , Genome , Molecular Sequence Annotation , Chickens/genetics

Biblio-MetReS: a bibliometric network reconstruction application and server.

Usié, Anabel; Karathia, Hiren; Teixidó, Ivan; Valls, Joan; Faus, Xavier; Alves, Rui; Solsona, Francesc.

BMC Bioinformatics ; 12: 387, 2011 Oct 05.

Article in English | MEDLINE | ID: mdl-21975133

ABSTRACT

BACKGROUND: Reconstruction of gene and/or protein networks from automated analysis of the literature is one of the current targets of text mining in biomedical research. Some user-friendly tools already perform this analysis on precompiled databases of abstracts of scientific papers. Other tools allow expert users to elaborate and analyze the full content of a corpus of scientific documents. However, to our knowledge, no user-friendly tool is available that simultaneously analyzes the latest set of scientific documents available online and reconstructs the set of genes referenced in those documents. RESULTS: This article presents such a tool, Biblio-MetReS, and compares its functioning and results to those of other widely used user-friendly applications (iHOP, STRING). Under similar conditions, Biblio-MetReS creates networks that are comparable to those of other user-friendly tools. Furthermore, analysis of full-text documents provides more complete reconstructions than those that result from using only the abstract of the document. CONCLUSIONS: Literature-based automated network reconstruction is still far from providing complete reconstructions of molecular networks. However, its value as an auxiliary tool is high and it will increase as standards for reporting biological entities and relationships become more widely accepted and enforced. Biblio-MetReS is an application that can be downloaded from http://metres.udl.cat/. It provides an easy-to-use environment for researchers to reconstruct their networks of interest from an always up-to-date set of scientific documents.

Subjects

Bibliometrics , Data Mining , Gene Regulatory Networks , Software , Animals , Internet , Publications , User-Computer Interface

Draft Genome Sequence of a Rare Pigmented Mycobacterium avium subsp. paratuberculosis Type C Strain.

Barbosa, Pedro; Leão, Célia; Usié, Anabel; Amaro, Ana; Botelho, Ana; Pinto, Carlos; Inácio, João; Stevenson, Karen; Ramos, António Marcos.

Genome Announc ; 5(41), 2017 Oct 12.

Article in English | MEDLINE | ID: mdl-29025941

ABSTRACT

Mycobacterium avium subsp. paratuberculosis is the causative agent of paratuberculosis. We report here the draft genome sequence of a rare pigmented M. avium subsp. paratuberculosis type C strain, comprising 58 contigs and having a genome size of 4,851,414 bp. The genome will assist in the execution of pigmentation and virulence studies on this mycobacterium.

CheNER: a tool for the identification of chemical entities and their classes in biomedical literature.

Usié, Anabel; Cruz, Joaquim; Comas, Jorge; Solsona, Francesc; Alves, Rui.

J Cheminform ; 7(Suppl 1 Text mining for chemistry and the CHEMDNER track): S15, 2015.

Article in English | MEDLINE | ID: mdl-25810772

ABSTRACT

BACKGROUND: Small chemical molecules regulate biological processes at the molecular level. Those molecules are often involved in causing or treating pathological states. Automatically identifying such molecules in biomedical text is difficult due to both the diverse morphology of chemical names and the alternative types of nomenclature that are simultaneously used to describe them. To address these issues, the last BioCreAtIvE challenge proposed a CHEMDNER task, which is a Named Entity Recognition (NER) challenge that aims at labelling different types of chemical names in biomedical text. METHODS: To address this challenge we tested various approaches to recognizing chemical entities in biomedical documents. These approaches range from linear Conditional Random Fields (CRFs) to a combination of CRFs with regular expression and dictionary matching, followed by a post-processing step to tag those chemical names in a corpus of Medline abstracts. We named our best-performing systems CheNER. RESULTS: We evaluate the performance of the various approaches using the F-score statistic. Higher F-scores indicate better performance. The highest F-score we obtain in identifying unique chemical entities is 72.88%. The highest F-score we obtain in identifying all chemical entities is 73.07%. We also evaluate the F-score of combining our system with ChemSpot, and find an increase from 72.88% to 73.83%. CONCLUSIONS: CheNER presents a valid alternative for automated annotation of chemical entities in biomedical documents. In addition, CheNER may be used to derive new features to train newer methods for tagging chemical entities. CheNER can be downloaded from http://metres.udl.cat and included in text annotation pipelines.
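For readers unfamiliar with the F-score mentioned in the abstract above, the following is a minimal sketch of how such a score is commonly computed from sets of gold-standard and predicted entity spans. It is not the CheNER or CHEMDNER evaluation code; the function name `f_score`, the span representation as `(start, end)` tuples, and the example spans are all hypothetical, and exact-match comparison of spans is an assumption.

```python
def f_score(gold, predicted, beta=1.0):
    """Compute the F-beta score for two sets of entity spans.

    `gold` and `predicted` are sets of hashable span identifiers,
    here hypothetical (start, end) character offsets. A prediction
    counts as correct only if it matches a gold span exactly.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical spans, invented purely for illustration.
gold_spans = {(0, 7), (15, 22), (30, 41), (50, 58)}
predicted_spans = {(0, 7), (15, 22), (30, 41), (60, 66)}

print(f"F1 = {f_score(gold_spans, predicted_spans):.2%}")
```

With `beta=1.0` the score is the harmonic mean of precision and recall, so a system is rewarded only when it both finds most entities and avoids spurious ones.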

The CHEMDNER corpus of chemicals and drugs and its annotation principles.

Krallinger, Martin; Rabal, Obdulia; Leitner, Florian; Vazquez, Miguel; Salgado, David; Lu, Zhiyong; Leaman, Robert; Lu, Yanan; Ji, Donghong; Lowe, Daniel M; Sayle, Roger A; Batista-Navarro, Riza Theresa; Rak, Rafal; Huber, Torsten; Rocktäschel, Tim; Matos, Sérgio; Campos, David; Tang, Buzhou; Xu, Hua; Munkhdalai, Tsendsuren; Ryu, Keun Ho; Ramanan, S V; Nathan, Senthil; Zitnik, Slavko; Bajec, Marko; Weber, Lutz; Irmer, Matthias; Akhondi, Saber A; Kors, Jan A; Xu, Shuo; An, Xin; Sikdar, Utpal Kumar; Ekbal, Asif; Yoshioka, Masaharu; Dieb, Thaer M; Choi, Miji; Verspoor, Karin; Khabsa, Madian; Giles, C Lee; Liu, Hongfang; Ravikumar, Komandur Elayavilli; Lamurias, Andre; Couto, Francisco M; Dai, Hong-Jie; Tsai, Richard Tzong-Han; Ata, Caglar; Can, Tolga; Usié, Anabel; Alves, Rui; Segura-Bedmar, Isabel.

J Cheminform ; 7(Suppl 1 Text mining for chemistry and the CHEMDNER track): S2, 2015.

Article in English | MEDLINE | ID: mdl-25810773

ABSTRACT

The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of different approaches that detect chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task. The abstracts of the CHEMDNER corpus were selected to be representative of all major chemical disciplines. Each of the chemical entity mentions was manually labeled according to its structure-associated chemical entity mention (SACEM) class: abbreviation, family, formula, identifier, multiple, systematic and trivial. The difficulty and consistency of tagging chemicals in text was measured using an agreement study between annotators, obtaining a percentage agreement of 91%. For a subset of the CHEMDNER corpus (the test set of 3,000 abstracts) we provide not only the Gold Standard manual annotations, but also mentions automatically detected by the 26 teams that participated in the BioCreative IV CHEMDNER chemical mention recognition task. In addition, we release the CHEMDNER silver standard corpus of automatically extracted mentions from 17,000 randomly selected PubMed abstracts. A version of the CHEMDNER corpus in the BioC format has been generated as well. We propose a standard for required minimum information about entity annotations for the construction of domain-specific corpora on chemical and drug entities. The CHEMDNER corpus and annotation guidelines are available at: http://www.biocreative.org/resources/biocreative-iv/chemdner-corpus/.

Biblio-MetReS for user-friendly mining of genes and biological processes in scientific documents.

Usie, Anabel; Karathia, Hiren; Teixidó, Ivan; Alves, Rui; Solsona, Francesc.

PeerJ ; 2: e276, 2014.

Article in English | MEDLINE | ID: mdl-24688854

ABSTRACT

One way to initiate the reconstruction of molecular circuits is by using automated text-mining techniques. Developing more efficient methods for such reconstruction is a topic of active research, and those methods are typically included by bioinformaticians in pipelines used to mine and curate large literature datasets. Nevertheless, experimental biologists have a limited number of available user-friendly tools that use text-mining for network reconstruction and require no programming skills to use. One of these tools is Biblio-MetReS. Originally, this tool permitted an on-the-fly analysis of documents contained in a number of web-based literature databases to identify co-occurrence of proteins/genes. This approach ensured results that were always up-to-date with the latest live version of the databases. However, this 'up-to-dateness' came at the cost of large execution times. Here we report an evolution of the application Biblio-MetReS that permits constructing co-occurrence networks for genes, GO processes, Pathways, or any combination of the three types of entities and graphically representing those entities. We show that the performance of Biblio-MetReS in identifying gene co-occurrence is at least as good as that of other comparable applications (STRING and iHOP). In addition, we also show that the identification of GO processes is on par with that reported in the latest BioCreAtIvE challenge. Finally, we also report the implementation of a new strategy that combines on-the-fly analysis of new documents with preprocessed information from documents that were encountered in previous analyses. This combination simultaneously decreases program run time and maintains 'up-to-dateness' of the results. AVAILABILITY: http://metres.udl.cat/index.php/downloads, CONTACT: metres.cmb@gmail.com.
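The idea of a co-occurrence network described in the abstract above can be illustrated with a minimal sketch. This is not the Biblio-MetReS implementation: the function name `cooccurrence_counts`, the placeholder entity names (`geneA`, `geneB`, `geneC`), the toy documents, and the choice to count co-occurrence once per document using simple alphanumeric tokenization are all assumptions made for illustration.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents, entities):
    """Count, for each pair of entities, the number of documents
    in which both entities appear as whole tokens.

    Matching is case-insensitive and uses a naive alphanumeric
    tokenizer; real tools rely on curated synonym dictionaries.
    """
    counts = Counter()
    for text in documents:
        tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
        present = sorted(e for e in entities if e.lower() in tokens)
        # Each unordered pair found in this document is one edge observation.
        for first, second in combinations(present, 2):
            counts[(first, second)] += 1
    return counts


# Toy documents and placeholder names, invented for illustration only.
docs = [
    "geneA and geneB were both mentioned in this abstract.",
    "This abstract mentions geneB together with geneC.",
    "Another abstract naming geneA and geneB.",
]
edges = cooccurrence_counts(docs, ["geneA", "geneB", "geneC"])
for (first, second), weight in sorted(edges.items()):
    print(first, second, weight)
```

The resulting pair counts can be read as weighted edges of a network, with entities as nodes. Caching the counts for documents already processed, and analysing only new documents, is one way to picture the combined on-the-fly and preprocessed strategy the abstract reports.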
