Results 1 - 20 of 40
1.
Database (Oxford) ; 2022, 2022 10 05.
Article in English | MEDLINE | ID: mdl-36197453

ABSTRACT

The coronavirus disease 2019 (COVID-19) pandemic has compelled biomedical researchers to communicate data in real time to establish more effective medical treatments and public health policies. Nontraditional sources such as preprint publications, i.e. articles not yet validated by peer review, have become crucial hubs for the dissemination of scientific results. Natural language processing (NLP) systems have been recently developed to extract and organize COVID-19 data in reasoning systems. Given this scenario, the BioCreative COVID-19 text mining tool interactive demonstration track was created to assess the landscape of the available tools and to gauge user interest, thereby providing a two-way communication channel between NLP system developers and potential end users. The goal was to inform system designers about the performance and usability of their products and to suggest new additional features. Considering the exploratory nature of this track, the call for participation solicited teams to apply for the track, based on their system's ability to perform COVID-19-related tasks and interest in receiving user feedback. We also recruited volunteer users to test systems. Seven teams registered systems for the track, and >30 individuals volunteered as test users; these volunteer users covered a broad range of specialties, including bench scientists, bioinformaticians and biocurators. The users, who had the option to participate anonymously, were provided with written and video documentation to familiarize themselves with the NLP tools and completed a survey to record their evaluation. Additional feedback was also provided by NLP system developers. The track was well received as shown by the overall positive feedback from the participating teams and the users. Database URL: https://biocreative.bioinformatics.udel.edu/tasks/biocreative-vii/track-4/.


Subject(s)
COVID-19 , COVID-19/epidemiology , Data Mining/methods , Databases, Factual , Documentation , Humans , Natural Language Processing
2.
Protein Sci ; 30(1): 187-200, 2021 01.
Article in English | MEDLINE | ID: mdl-33070389

ABSTRACT

The BioGRID (Biological General Repository for Interaction Datasets, thebiogrid.org) is an open-access database resource that houses manually curated protein and genetic interactions from multiple species including yeast, worm, fly, mouse, and human. The ~1.93 million curated interactions in BioGRID can be used to build complex networks to facilitate biomedical discoveries, particularly as related to human health and disease. All BioGRID content is curated from primary experimental evidence in the biomedical literature, and includes both focused low-throughput studies and large high-throughput datasets. BioGRID also captures protein post-translational modifications and protein or gene interactions with bioactive small molecules including many known drugs. A built-in network visualization tool combines all annotations and allows users to generate network graphs of protein, genetic and chemical interactions. In addition to general curation across species, BioGRID undertakes themed curation projects in specific aspects of cellular regulation, for example the ubiquitin-proteasome system, as well as specific disease areas, such as for the SARS-CoV-2 virus that causes COVID-19 severe acute respiratory syndrome. A recent extension of BioGRID, named the Open Repository of CRISPR Screens (ORCS, orcs.thebiogrid.org), captures single mutant phenotypes and genetic interactions from published high throughput genome-wide CRISPR/Cas9-based genetic screens. BioGRID-ORCS contains datasets for over 1,042 CRISPR screens carried out to date in human, mouse and fly cell lines. The biomedical research community can freely access all BioGRID data through the web interface, standardized file downloads, or via model organism databases and partner meta-databases.


Subject(s)
COVID-19/genetics , Databases, Factual , Protein Interaction Mapping , Proteins/genetics , Animals , COVID-19/virology , Humans , Mice , SARS-CoV-2/genetics , SARS-CoV-2/pathogenicity , User-Computer Interface
3.
Database (Oxford) ; 2019, 2019 01 01.
Article in English | MEDLINE | ID: mdl-30689846

ABSTRACT

The Precision Medicine Initiative is a multicenter effort aiming at formulating personalized treatments leveraging on individual patient data (clinical, genome sequence and functional genomic data) together with the information in large knowledge bases (KBs) that integrate genome annotation, disease association studies, electronic health records and other data types. The biomedical literature provides a rich foundation for populating these KBs, reporting genetic and molecular interactions that provide the scaffold for the cellular regulatory systems and detailing the influence of genetic variants in these interactions. The goal of BioCreative VI Precision Medicine Track was to extract this particular type of information and was organized in two tasks: (i) document triage task, focused on identifying scientific literature containing experimentally verified protein-protein interactions (PPIs) affected by genetic mutations and (ii) relation extraction task, focused on extracting the affected interactions (protein pairs). To assist system developers and task participants, a large-scale corpus of PubMed documents was manually annotated for this task. Ten teams worldwide contributed 22 distinct text-mining models for the document triage task, and six teams worldwide contributed 14 different text-mining systems for the relation extraction task. When comparing the text-mining system predictions with human annotations, for the triage task, the best F-score was 69.06%, the best precision was 62.89%, the best recall was 98.0% and the best average precision was 72.5%. For the relation extraction task, when taking homologous genes into account, the best F-score was 37.73%, the best precision was 46.5% and the best recall was 54.1%. Submitted systems explored a wide range of methods, from traditional rule-based, statistical and machine learning systems to state-of-the-art deep learning methods. 
Given the level of participation and the individual team results we find the precision medicine track to be successful in engaging the text-mining research community. In the meantime, the track produced a manually annotated corpus of 5509 PubMed documents developed by BioGRID curators and relevant for precision medicine. The data set is freely available to the community, and the specific interactions have been integrated into the BioGRID data set. In addition, this challenge provided the first results of automatically identifying PubMed articles that describe PPI affected by mutations, as well as extracting the affected relations from those articles. Still, much progress is needed for computer-assisted precision medicine text mining to become mainstream. Future work should focus on addressing the remaining technical challenges and incorporating the practical benefits of text-mining tools into real-world precision medicine information-related curation.
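
The precision, recall and F-score figures quoted in this abstract are related by a standard formula. The following is a minimal sketch, assuming the balanced F1 variant (beta = 1); the abstract does not state which variant the track used, and the function name `f_score` and the input values are illustrative only. Note that each "best" value in the abstract may come from a different submission, so the reported numbers should not be plugged into the formula together.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score: a weighted harmonic mean of precision and recall.

    beta = 1 gives the balanced F1 score (an assumption here, not taken
    from the abstract).
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing was retrieved correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical system with 60% precision and 80% recall.
example = f_score(0.60, 0.80)
print(f"{example:.4f}")  # prints 0.6857
```

Because the harmonic mean is pulled toward the lower of the two inputs, a system with very high recall but modest precision still receives a moderate F-score.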


Subject(s)
Data Mining/methods , Databases, Protein , Mutation , Precision Medicine/methods , Protein Interaction Maps , Software , Computational Biology/methods , Humans , Mutation/genetics , Mutation/physiology , Protein Interaction Mapping , Protein Interaction Maps/genetics , Protein Interaction Maps/physiology
4.
Nucleic Acids Res ; 47(D1): D529-D541, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30476227

ABSTRACT

The Biological General Repository for Interaction Datasets (BioGRID: https://thebiogrid.org) is an open access database dedicated to the curation and archival storage of protein, genetic and chemical interactions for all major model organism species and humans. As of September 2018 (build 3.4.164), BioGRID contains records for 1 598 688 biological interactions manually annotated from 55 809 publications for 71 species, as classified by an updated set of controlled vocabularies for experimental detection methods. BioGRID also houses records for >700 000 post-translational modification sites. BioGRID now captures chemical interaction data, including chemical-protein interactions for human drug targets drawn from the DrugBank database and manually curated bioactive compounds reported in the literature. A new dedicated aspect of BioGRID annotates genome-wide CRISPR/Cas9-based screens that report gene-phenotype and gene-gene relationships. An extension of the BioGRID resource called the Open Repository for CRISPR Screens (ORCS) database (https://orcs.thebiogrid.org) currently contains over 500 genome-wide screens carried out in human or mouse cell lines. All data in BioGRID is made freely available without restriction, is directly downloadable in standard formats and can be readily incorporated into existing applications via our web service platforms. BioGRID data are also freely distributed through partner model organism databases and meta-databases.


Subject(s)
Databases, Factual , Animals , CRISPR-Cas Systems , Data Curation , Drug Discovery , Genes , Humans , Mice , Protein Interaction Mapping
5.
Mol Cell ; 70(4): 568-571, 2018 05 17.
Article in English | MEDLINE | ID: mdl-29775575

ABSTRACT

The ubiquitin-proteasome system controls the stability of myriad protein substrates via short sequence motifs called degrons. Studies by Koren et al. (2018) and Lin et al. (2018) have uncovered a broad new class of degrons located at the extreme C terminus of substrates.


Subject(s)
Proteasome Endopeptidase Complex , Ubiquitin , Proteins , Ubiquitin-Protein Ligases
6.
Mol Cell Biol ; 38(1), 2018 Jan 01.
Article in English | MEDLINE | ID: mdl-29038160

ABSTRACT

To interrogate genes essential for cell growth, proliferation and survival in human cells, we carried out a genome-wide clustered regularly interspaced short palindromic repeat (CRISPR)/Cas9 screen in a B-cell lymphoma line using a custom extended-knockout (EKO) library of 278,754 single-guide RNAs (sgRNAs) that targeted 19,084 RefSeq genes, 20,852 alternatively spliced exons, and 3,872 hypothetical genes. A new statistical analysis tool called robust analytics and normalization for knockout screens (RANKS) identified 2,280 essential genes, 234 of which were unique. Individual essential genes were validated experimentally and linked to ribosome biogenesis and stress responses. Essential genes exhibited a bimodal distribution across 10 different cell lines, consistent with a continuous variation in essentiality as a function of cell type. Genes essential in more lines had more severe fitness defects and encoded the evolutionarily conserved structural cores of protein complexes, whereas genes essential in fewer lines formed context-specific modules and encoded subunits at the periphery of essential complexes. The essentiality of individual protein residues across the proteome correlated with evolutionary conservation, structural burial, modular domains, and protein interaction interfaces. Many alternatively spliced exons in essential genes were dispensable and were enriched for disordered regions. Fitness defects were observed for 44 newly evolved hypothetical reading frames. These results illuminate the contextual nature and evolution of essential gene functions in human cells.


Subject(s)
CRISPR-Cas Systems , Genes, Essential/genetics , Genome-Wide Association Study/methods , Proteome/genetics , Proteome/metabolism , Cell Line, Tumor , Cell Survival/genetics , Gene Library , Humans , Proteomics/methods
7.
Article in English | MEDLINE | ID: mdl-28077563

ABSTRACT

A great deal of information on the molecular genetics and biochemistry of model organisms has been reported in the scientific literature. However, this data is typically described in free text form and is not readily amenable to computational analyses. To this end, the BioGRID database systematically curates the biomedical literature for genetic and protein interaction data. This data is provided in a standardized computationally tractable format and includes structured annotation of experimental evidence. BioGRID curation necessarily involves substantial human effort by expert curators who must read each publication to extract the relevant information. Computational text-mining methods offer the potential to augment and accelerate manual curation. To facilitate the development of practical text-mining strategies, a new challenge was organized in BioCreative V for the BioC task, the collaborative Biocurator Assistant Task. This was a non-competitive, cooperative task in which the participants worked together to build BioC-compatible modules into an integrated pipeline to assist BioGRID curators. As an integral part of this task, a test collection of full text articles was developed that contained both biological entity annotations (gene/protein and organism/species) and molecular interaction annotations (protein-protein and genetic interactions (PPIs and GIs)). This collection, which we call the BioC-BioGRID corpus, was annotated by four BioGRID curators over three rounds of annotation and contains 120 full text articles curated in a dataset representing two major model organisms, namely budding yeast and human. The BioC-BioGRID corpus contains annotations for 6409 mentions of genes and their Entrez Gene IDs, 186 mentions of organism names and their NCBI Taxonomy IDs, 1867 mentions of PPIs and 701 annotations of PPI experimental evidence statements, 856 mentions of GIs and 399 annotations of GI evidence statements. 
The purpose, characteristics and possible future uses of the BioC-BioGRID corpus are detailed in this report. Database URL: http://bioc.sourceforge.net/BioC-BioGRID.html.


Subject(s)
Data Curation/methods , Data Mining/methods , Databases, Genetic , Proteins/genetics , Proteins/metabolism
8.
Nucleic Acids Res ; 45(D1): D369-D379, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27980099

ABSTRACT

The Biological General Repository for Interaction Datasets (BioGRID: https://thebiogrid.org) is an open access database dedicated to the annotation and archival of protein, genetic and chemical interactions for all major model organism species and humans. As of September 2016 (build 3.4.140), the BioGRID contains 1 072 173 genetic and protein interactions, and 38 559 post-translational modifications, as manually annotated from 48 114 publications. This dataset represents interaction records for 66 model organisms and represents a 30% increase compared to the previous 2015 BioGRID update. BioGRID curates the biomedical literature for major model organism species, including humans, with a recent emphasis on central biological processes and specific human diseases. To facilitate network-based approaches to drug discovery, BioGRID now incorporates 27 501 chemical-protein interactions for human drug targets, as drawn from the DrugBank database. A new dynamic interaction network viewer allows the easy navigation and filtering of all genetic and protein interaction data, as well as for bioactive compounds and their established targets. BioGRID data are directly downloadable without restriction in a variety of standardized formats and are freely distributed through partner model organism databases and meta-databases.


Subject(s)
Computational Biology , Databases, Genetic , Proteins , Animals , Computational Biology/methods , Data Curation , Data Mining , Humans , Protein Interaction Mapping , Protein Interaction Maps , Protein Processing, Post-Translational , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , Software
9.
Article in English | MEDLINE | ID: mdl-27589961

ABSTRACT

Fully automated text mining (TM) systems promote efficient literature searching, retrieval, and review but are not sufficient to produce ready-to-consume curated documents. These systems are not meant to replace biocurators, but instead to assist them in one or more literature curation steps. To do so, the user interface is an important aspect that needs to be considered for tool adoption. The BioCreative Interactive task (IAT) is a track designed for exploring user-system interactions, promoting development of useful TM tools, and providing a communication channel between the biocuration and the TM communities. In BioCreative V, the IAT track followed a format similar to previous interactive tracks, where the utility and usability of TM tools, as well as the generation of use cases, have been the focal points. The proposed curation tasks are user-centric and formally evaluated by biocurators. In BioCreative V IAT, seven TM systems and 43 biocurators participated. Two levels of user participation were offered to broaden curator involvement and obtain more feedback on usability aspects. The full level participation involved training on the system, curation of a set of documents with and without TM assistance, tracking of time-on-task, and completion of a user survey. The partial level participation was designed to focus on usability aspects of the interface and not the performance per se. In this case, biocurators navigated the system by performing pre-designed tasks and then were asked whether they were able to achieve the task and the level of difficulty in completing the task. In this manuscript, we describe the development of the interactive task, from planning to execution and discuss major findings for the systems tested. Database URL: http://www.biocreative.org.


Subject(s)
Data Curation/methods , Data Mining/methods , Electronic Data Processing/methods
10.
Article in English | MEDLINE | ID: mdl-27589962

ABSTRACT

BioC is a simple XML format for text, annotations and relations, and was developed to achieve interoperability for biomedical text processing. Following the success of BioC in BioCreative IV, the BioCreative V BioC track addressed a collaborative task to build an assistant system for BioGRID curation. In this paper, we describe the framework of the collaborative BioC task and discuss our findings based on the user survey. This track consisted of eight subtasks including gene/protein/organism named entity recognition, protein-protein/genetic interaction passage identification and annotation visualization. Using BioC as their data-sharing and communication medium, nine teams, world-wide, participated and contributed either new methods or improvements of existing tools to address different subtasks of the BioC track. Results from different teams were shared in BioC and made available to other teams as they addressed different subtasks of the track. In the end, all submitted runs were merged using a machine learning classifier to produce an optimized output. The biocurator assistant system was evaluated by four BioGRID curators in terms of practical usability. The curators' feedback was overall positive and highlighted the user-friendly design and the convenient gene/protein curation tool based on text mining. Database URL: http://www.biocreative.org/tasks/biocreative-v/track-1-bioc/.
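
To make the idea of an XML format for text, annotations and relations more concrete, here is a minimal sketch of reading annotations from a small BioC-style document with Python's standard library. The element and attribute names (`collection`, `document`, `passage`, `annotation`, `infon`, `location`) reflect a reading of the BioC format and should be checked against the official DTD; the sample content and the helper `list_annotations` are invented for illustration and are not taken from the track data.

```python
import xml.etree.ElementTree as ET

# Toy BioC-style collection (illustrative content, not from any real corpus).
SAMPLE = """<collection>
  <source>example</source>
  <document>
    <id>doc1</id>
    <passage>
      <offset>0</offset>
      <text>Cdc28 binds Cln2 in yeast.</text>
      <annotation id="T1">
        <infon key="type">Gene</infon>
        <location offset="0" length="5"/>
        <text>Cdc28</text>
      </annotation>
      <annotation id="T2">
        <infon key="type">Gene</infon>
        <location offset="12" length="4"/>
        <text>Cln2</text>
      </annotation>
    </passage>
  </document>
</collection>"""


def list_annotations(xml_text):
    """Return (doc_id, type, offset, length, text) for every annotation."""
    root = ET.fromstring(xml_text)
    results = []
    for doc in root.iter("document"):
        doc_id = doc.findtext("id")
        for passage in doc.iter("passage"):
            for ann in passage.iter("annotation"):
                # infon elements are key/value pairs attached to the annotation
                infons = {i.get("key"): i.text for i in ann.findall("infon")}
                loc = ann.find("location")
                results.append((
                    doc_id,
                    infons.get("type"),
                    int(loc.get("offset")),
                    int(loc.get("length")),
                    ann.findtext("text"),
                ))
    return results


for record in list_annotations(SAMPLE):
    print(record)
```

Because every annotation carries its own offset and length, tools written by different teams can exchange results by passing the same file along, which is the interoperability the abstract describes.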


Subject(s)
Data Curation/methods , Data Mining/methods , Electronic Data Processing/methods , Information Dissemination/methods
11.
Cold Spring Harb Protoc ; 2016(1): pdb.prot088880, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26729909

ABSTRACT

The BioGRID database is an extensive repository of curated genetic and protein interactions for the budding yeast Saccharomyces cerevisiae, the fission yeast Schizosaccharomyces pombe, and the yeast Candida albicans SC5314, as well as for several other model organisms and humans. This protocol describes how to use the BioGRID website to query genetic or protein interactions for any gene of interest, how to visualize the associated interactions using an embedded interactive network viewer, and how to download data files for either selected interactions or the entire BioGRID interaction data set.


Subject(s)
Databases, Genetic , Fungal Proteins/genetics , Fungal Proteins/metabolism , Gene Regulatory Networks , Animals , Internet , Protein Interaction Mapping , Yeasts/metabolism
12.
Cold Spring Harb Protoc ; 2016(1): pdb.top080754, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26729913

ABSTRACT

The Biological General Repository for Interaction Datasets (BioGRID) is a freely available public database that provides the biological and biomedical research communities with curated protein and genetic interaction data. Structured experimental evidence codes, an intuitive search interface, and visualization tools enable the discovery of individual gene, protein, or biological network function. BioGRID houses interaction data for the major model organism species--including yeast, nematode, fly, zebrafish, mouse, and human--with particular emphasis on the budding yeast Saccharomyces cerevisiae and the fission yeast Schizosaccharomyces pombe as pioneer eukaryotic models for network biology. BioGRID has achieved comprehensive curation coverage of the entire literature for these two major yeast models, which is actively maintained through monthly curation updates. As of September 2015, BioGRID houses approximately 335,400 biological interactions for budding yeast and approximately 67,800 interactions for fission yeast. BioGRID also supports an integrated posttranslational modification (PTM) viewer that incorporates more than 20,100 yeast phosphorylation sites curated through its sister database, the PhosphoGRID.


Subject(s)
Databases, Genetic/statistics & numerical data , Gene Regulatory Networks , Protein Interaction Mapping , Animals , Humans , Saccharomyces cerevisiae , Saccharomyces cerevisiae Proteins , Yeasts/genetics , Yeasts/metabolism
13.
Nucleic Acids Res ; 43(Database issue): D470-8, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25428363

ABSTRACT

The Biological General Repository for Interaction Datasets (BioGRID: http://thebiogrid.org) is an open access database that houses genetic and protein interactions curated from the primary biomedical literature for all major model organism species and humans. As of September 2014, the BioGRID contains 749,912 interactions as drawn from 43,149 publications that represent 30 model organisms. This interaction count represents a 50% increase compared to our previous 2013 BioGRID update. BioGRID data are freely distributed through partner model organism databases and meta-databases and are directly downloadable in a variety of formats. In addition to general curation of the published literature for the major model species, BioGRID undertakes themed curation projects in areas of particular relevance for biomedical sciences, such as the ubiquitin-proteasome system and various human disease-associated interaction networks. BioGRID curation is coordinated through an Interaction Management System (IMS) that facilitates the compilation of interaction records through structured evidence codes, phenotype ontologies, and gene annotation. The BioGRID architecture has been improved in order to support a broader range of interaction and post-translational modification types, to allow the representation of more complex multi-gene/protein interactions, to account for cellular phenotypes through structured ontologies, to expedite curation through semi-automated text-mining approaches, and to enhance curation quality control.


Subject(s)
Databases, Genetic , Gene Regulatory Networks , Protein Interaction Mapping , Arachidonic Acid/metabolism , Disease/genetics , Humans , Internet
14.
Article in English | MEDLINE | ID: mdl-25052701

ABSTRACT

The time-consuming nature of manual curation and the rapid growth of biomedical literature severely limit the number of articles that database curators can scrutinize and annotate. Hence, semi-automatic tools can be a valid support to increase annotation throughput. Although a handful of curation assistant tools are already available, to date, little has been done to formally evaluate their benefit to biocuration. Moreover, most curation tools are designed for specific problems. Thus, it is not easy to apply an annotation tool for multiple tasks. BioQRator is a publicly available web-based tool for annotating biomedical literature. It was designed to support general tasks, i.e. any task annotating entities and relationships. In the BioCreative IV edition, BioQRator was tailored for protein-protein interaction (PPI) annotation by migrating information from PIE the search. The results obtained from six curators showed that the precision on the top 10 documents doubled with PIE the search compared with PubMed search results. It was also observed that the annotation time for a full PPI annotation task decreased for a beginner-intermediate level annotator. This finding is encouraging because text-mining techniques were not directly involved in the full annotation task and BioQRator can be easily integrated with any text-mining resources. Database URL: http://www.bioqrator.org/.


Subject(s)
Data Curation/methods , Data Mining/methods , Internet , Protein Interaction Mapping/methods , Software , Humans
16.
Database (Oxford) ; 2013: bas056, 2013.
Article in English | MEDLINE | ID: mdl-23327936

ABSTRACT

In many databases, biocuration primarily involves literature curation, which usually involves retrieving relevant articles, extracting information that will translate into annotations and identifying new incoming literature. As the volume of biological literature increases, the use of text mining to assist in biocuration becomes increasingly relevant. A number of groups have developed tools for text mining from a computer science/linguistics perspective, and there are many initiatives to curate some aspect of biology from the literature. Some biocuration efforts already make use of a text mining tool, but there have not been many broad-based systematic efforts to study which aspects of a text mining tool contribute to its usefulness for a curation task. Here, we report on an effort to bring together text mining tool developers and database biocurators to test the utility and usability of tools. Six text mining systems presenting diverse biocuration tasks participated in a formal evaluation, and appropriate biocurators were recruited for testing. The performance results from this evaluation indicate that some of the systems were able to improve efficiency of curation by speeding up the curation task significantly (∼1.7- to 2.5-fold) over manual curation. In addition, some of the systems were able to improve annotation accuracy when compared with the performance on the manually curated set. In terms of inter-annotator agreement, the factors that contributed to significant differences for some of the systems included the expertise of the biocurator on the given curation task, the inherent difficulty of the curation and attention to annotation guidelines. After the task, annotators were asked to complete a survey to help identify strengths and weaknesses of the various systems. 
The analysis of this survey highlights how important task completion is to the biocurators' overall experience of a system, regardless of the system's high score on design, learnability and usability. In addition, strategies to refine the annotation guidelines and systems documentation, to adapt the tools to the needs and query types the end user might have and to evaluate performance in terms of efficiency, user interface, result export and traditional evaluation metrics have been analyzed during this task. This analysis will help to plan for a more intense study in BioCreative IV.


Subject(s)
Data Mining , Education , Databases as Topic , Documentation , Humans , Software , Time Factors
17.
Nucleic Acids Res ; 41(Database issue): D816-23, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23203989

ABSTRACT

The Biological General Repository for Interaction Datasets (BioGRID: http://thebiogrid.org) is an open access archive of genetic and protein interactions that are curated from the primary biomedical literature for all major model organism species. As of September 2012, BioGRID houses more than 500 000 manually annotated interactions from more than 30 model organisms. BioGRID maintains complete curation coverage of the literature for the budding yeast Saccharomyces cerevisiae, the fission yeast Schizosaccharomyces pombe and the model plant Arabidopsis thaliana. A number of themed curation projects in areas of biomedical importance are also supported. BioGRID has established collaborations and/or shares data records for the annotation of interactions and phenotypes with most major model organism databases, including Saccharomyces Genome Database, PomBase, WormBase, FlyBase and The Arabidopsis Information Resource. BioGRID also actively engages with the text-mining community to benchmark and deploy automated tools to expedite curation workflows. BioGRID data are freely accessible through both a user-defined interactive interface and in batch downloads in a wide variety of formats, including PSI-MI2.5 and tab-delimited files. BioGRID records can also be interrogated and analyzed with a series of new bioinformatics tools, which include a post-translational modification viewer, a graphical viewer, a REST service and a Cytoscape plugin.
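
As an illustration of how a tab-delimited interaction download might be turned into a network, the sketch below builds an undirected adjacency map with Python's standard library. The column names (`interactor_a`, `interactor_b`, `evidence`) and the three rows are hypothetical placeholders, not the actual BioGRID file layout or actual BioGRID records; a real file's header would need to be consulted before adapting this.

```python
import csv
import io
from collections import defaultdict

# Toy tab-delimited data with hypothetical column names (not the real
# BioGRID download schema).
SAMPLE_TSV = (
    "interactor_a\tinteractor_b\tevidence\n"
    "CDC28\tCLN2\tphysical\n"
    "CDC28\tSIC1\tphysical\n"
    "CLN2\tSIC1\tgenetic\n"
)


def build_network(tsv_text):
    """Return an undirected adjacency map: gene -> set of interaction partners."""
    network = defaultdict(set)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        a, b = row["interactor_a"], row["interactor_b"]
        # Record the edge in both directions so lookups work from either gene.
        network[a].add(b)
        network[b].add(a)
    return network


network = build_network(SAMPLE_TSV)
print(sorted(network["CDC28"]))  # prints ['CLN2', 'SIC1']
```

Using sets for the partners means that an interaction reported by several publications is stored once per gene pair, which is usually what a network view needs.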


Subject(s)
Databases, Genetic , Gene Regulatory Networks , Protein Interaction Mapping , Arabidopsis/genetics , Arabidopsis/metabolism , Humans , Internet , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Schizosaccharomyces/genetics , Schizosaccharomyces/metabolism , User-Computer Interface
18.
Database (Oxford) ; 2012: bas020, 2012.
Article in English | MEDLINE | ID: mdl-22513129

ABSTRACT

Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations took place for the workshop on 'Text Mining for the BioCuration Workflow' at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed considerable variability. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provides a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community.


Subject(s)
Biomedical Research , Data Mining , Natural Language Processing , Workflow , Animals , Databases, Factual , Humans
19.
Database (Oxford) ; 2012: bas017, 2012.
Article in English | MEDLINE | ID: mdl-22438567

ABSTRACT

There is an increasing interest in developing ontologies and controlled vocabularies to improve the efficiency and consistency of manual literature curation, to enable more formal biocuration workflow results and ultimately to improve analysis of biological data. Two ontologies that have been successfully used for this purpose are the Gene Ontology (GO) for annotating aspects of gene products and the Molecular Interaction ontology (PSI-MI) used by databases that archive protein-protein interactions. The examination of protein interactions has proven to be extremely promising for the understanding of cellular processes. Manual mapping of information from the biomedical literature to bio-ontology terms is one of the most challenging components in the curation pipeline. It requires that expert curators interpret the natural language descriptions contained in articles and infer their semantic equivalents in the ontology (controlled vocabulary). Since manual curation is a time-consuming process, there is strong motivation to implement text-mining techniques to automatically extract annotations from free text. A range of text mining strategies has been devised to assist in the automated extraction of biological data. These strategies either recognize technical terms used recurrently in the literature and propose them as candidates for inclusion in ontologies, or retrieve passages that serve as evidential support for annotating an ontology term, e.g. from the PSI-MI or GO controlled vocabularies. Here, we provide a general overview of current text-mining methods to automatically extract annotations of GO and PSI-MI ontology terms in the context of the BioCreative (Critical Assessment of Information Extraction Systems in Biology) challenge. Special emphasis is given to protein-protein interaction data and PSI-MI terms referring to interaction detection methods.


Subject(s)
Data Mining/methods , Databases, Protein , Molecular Sequence Annotation , Protein Interaction Mapping , Proteomics/methods , Natural Language Processing , Vocabulary, Controlled
20.
Nat Methods ; 9(4): 345-50, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22453911

ABSTRACT

The International Molecular Exchange (IMEx) consortium is an international collaboration between major public interaction data providers to share literature-curation efforts and make a nonredundant set of protein interactions available in a single search interface on a common website (http://www.imexconsortium.org/). Common curation rules have been developed, and a central registry is used to manage the selection of articles to enter into the dataset. We discuss the advantages of such a service to the user, our quality-control measures and our data-distribution practices.


Subject(s)
Databases, Protein , Protein Interaction Mapping , Proteins/metabolism , Periodicals as Topic , Protein Binding , Proteins/chemistry , Quality Control