Results 1 - 20 of 32
1.
Curr Protoc ; 3(6): e804, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37347557

ABSTRACT

The laboratory rat, Rattus norvegicus, is an important model of human health and disease, and experimental findings in the rat have relevance to human physiology and disease. The Rat Genome Database (RGD, https://rgd.mcw.edu) is a model organism database that provides access to a wide variety of curated rat data including disease associations, phenotypes, pathways, molecular functions, biological processes, cellular components, and chemical interactions for genes, quantitative trait loci, and strains. We present an overview of the database followed by specific examples that can be used to gain experience in employing RGD to explore the wealth of functional data available for the rat and other species. © 2023 Wiley Periodicals LLC.

Basic Protocol 1: Navigating the Rat Genome Database (RGD) home page
Basic Protocol 2: Using the RGD search functions
Basic Protocol 3: Searching for quantitative trait loci
Basic Protocol 4: Using the RGD genome browser (JBrowse) to find phenotypic annotations
Basic Protocol 5: Using OntoMate to find gene-disease data
Basic Protocol 6: Using MOET to find gene-ontology enrichment
Basic Protocol 7: Using OLGA to generate gene lists for analysis
Basic Protocol 8: Using the GA tool to analyze ontology annotations for genes
Basic Protocol 9: Using the RGD InterViewer tool to find protein interaction data
Basic Protocol 10: Using the RGD Variant Visualizer tool to find genetic variant data
Basic Protocol 11: Using the RGD Disease Portals to find disease, phenotype, and other information
Basic Protocol 12: Using the RGD Phenotypes & Models Portal to find qualitative and quantitative phenotype data and other rat strain-related information
Basic Protocol 13: Using the RGD Pathway Portal to find disease and phenotype data via molecular pathways


Subjects
Genomics; Quantitative Trait Loci; Humans; Animals; Rats; Databases, Protein; Phenotype; Oligopeptides
2.
Genetics ; 224(4), 2023 08 09.
Article in English | MEDLINE | ID: mdl-37119810

ABSTRACT

Rare diseases individually affect relatively few people, but as a group they impact considerable numbers of people. The Rat Genome Database (https://rgd.mcw.edu) is a knowledgebase that offers resources for rare disease research. This includes disease definitions, genes, quantitative trait loci (QTLs), genetic variants, annotations to published literature, links to external resources, and more. One important resource is identifying relevant cell lines and rat strains that serve as models for disease research. Diseases, genes, and strains have report pages with consolidated data, and links to analysis tools. Utilizing these globally accessible resources for rare disease research, potentiating discovery of mechanisms and new treatments, can point researchers toward solutions to alleviate the suffering of those afflicted with these diseases.


Subjects
Genome; Rare Diseases; Rats; Animals; Genome/genetics; Rare Diseases/genetics; Rare Diseases/therapy; Databases, Genetic
3.
Genes (Basel) ; 13(12), 2022 12 07.
Article in English | MEDLINE | ID: mdl-36553571

ABSTRACT

The COVID-19 pandemic spurred a parallel upsurge in the scientific literature about SARS-CoV-2 infection and its health burden. The Rat Genome Database (RGD) created a COVID-19 Disease Portal to leverage information from the scientific literature. In the COVID-19 Portal, gene-disease associations are established by manual curation of PubMed literature. The portal contains data for nine ontologies related to COVID-19, an embedded enrichment analysis tool, as well as links to a toolkit. Using this information and these tools, we performed analyses on the curated COVID-19 disease genes. As expected, Disease Ontology enrichment analysis showed that the COVID-19 gene set is highly enriched with coronavirus infectious disease and related diseases. However, other less related diseases were also highly enriched, such as liver and rheumatic diseases. Using the comparison heatmap tool, we found nearly 60 percent of the COVID-19 genes were associated with nervous system disease and 40 percent were associated with gastrointestinal disease. Our analysis confirms the role of the immune system in COVID-19 pathogenesis as shown by substantial enrichment of immune system related Gene Ontology terms. The information in RGD's COVID-19 disease portal can generate new hypotheses to potentiate novel therapies and prevention of acute and long-term complications of COVID-19.


Subjects
COVID-19; Nervous System Diseases; Rats; Animals; Humans; COVID-19/genetics; Pandemics; SARS-CoV-2/genetics; Oligopeptides
4.
Genetics ; 220(4), 2022 04 04.
Article in English | MEDLINE | ID: mdl-35380657

ABSTRACT

Biological interpretation of a large amount of gene or protein data is complex. Ontology analysis tools are imperative in finding functional similarities through overrepresentation or enrichment of terms associated with the input gene or protein lists. However, most tools are limited by their ability to do ontology-specific and species-limited analyses. Furthermore, some enrichment tools are not updated frequently with recent information from databases, thus giving users inaccurate, outdated or uninformative data. Here, we present MOET or the Multi-Ontology Enrichment Tool (v.1 released in April 2019 and v.2 released in May 2021), an ontology analysis tool leveraging data that the Rat Genome Database (RGD) integrated from in-house expert curation and external databases including the National Center for Biotechnology Information (NCBI), Mouse Genome Informatics (MGI), The Kyoto Encyclopedia of Genes and Genomes (KEGG), The Gene Ontology Resource, UniProt-GOA, and others. Given a gene or protein list, MOET analysis identifies significantly overrepresented ontology terms using a hypergeometric test and provides nominal and Bonferroni corrected P-values and odds ratios for the overrepresented terms. The results are shown as a downloadable list of terms with and without Bonferroni correction, and a graph of the P-values and number of annotated genes for each term in the list. MOET can be accessed freely from https://rgd.mcw.edu/rgdweb/enrichment/start.html.


Subjects
Databases, Genetic; Genome; Animals; Gene Ontology; Genome/genetics; Internet; Mice; Rats; Software
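
The enrichment calculation described in the MOET abstract above (a hypergeometric test per ontology term, reported with nominal and Bonferroni-corrected P-values) can be sketched as follows. This is a minimal illustration, not MOET's implementation: the function names `hypergeom_sf` and `enrich`, the toy gene identifiers, and the choice to count only terms with at least one overlapping gene as tests are all assumptions, and odds ratios are left out.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when N genes are drawn without replacement from a
    background of M genes, n of which are annotated to the term."""
    total = comb(M, N)
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / total


def enrich(gene_list, annotations, background_size):
    """Return (term, overlap, nominal_p, bonferroni_p) tuples, best first.

    `annotations` maps a term to the set of background genes annotated to it
    (a hypothetical input format chosen for this sketch).
    """
    genes = set(gene_list)
    tested = []
    for term, annotated in annotations.items():
        overlap = len(genes & annotated)
        if overlap == 0:
            continue  # assumption: terms with no overlap are not tested
        p = hypergeom_sf(overlap, background_size, len(annotated), len(genes))
        tested.append((term, overlap, p))
    n_tests = len(tested)
    # Bonferroni: multiply each nominal P-value by the number of tests, cap at 1.
    results = [(t, k, p, min(1.0, p * n_tests)) for t, k, p in tested]
    return sorted(results, key=lambda r: r[2])


# Toy input: 20 background genes, a 4-gene query list, two made-up terms.
toy_annotations = {
    "term A": {"g1", "g2", "g3", "g4", "g5"},
    "term B": {"g10", "g11"},
}
toy_results = enrich(["g1", "g2", "g3", "g10"], toy_annotations, 20)
# Nominal P for "term A" is 155/4845, about 0.032.
```

Computing the tail probability with exact integer binomials keeps the sketch dependency-free; a statistics library would be the usual choice for large backgrounds.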
5.
Article in English | MEDLINE | ID: mdl-34584774

ABSTRACT

Complex diseases such as hypertension, cancer, and diabetes cause nearly 70% of the deaths in the U.S. and involve multiple genes and their interactions with environmental factors. Therefore, identification of genetic factors to understand and decrease the morbidity and mortality from complex diseases is an important and challenging task. With the generation of an unprecedented amount of multi-omics datasets, network-based methods have become popular to represent the multilayered complex molecular interactions. In particular, node embeddings, the low-dimensional representations of nodes in a network, are utilized for gene function prediction. Integrated network analysis of multi-omics data alleviates the issues related to missing data and lack of context-specific datasets. Most of the node embedding methods, however, are unable to integrate multiple types of datasets from genes and phenotypes. To address this limitation, we developed a node embedding algorithm called Node Embeddings of Complex networks (NECo) that can utilize multilayered heterogeneous networks of genes and phenotypes. We evaluated the performance of NECo using genotypic and phenotypic datasets from rat (Rattus norvegicus) disease models to classify hypertension disease-related genes. Our method significantly outperformed the state-of-the-art node embedding methods, with an AUC of 94.97% compared with 85.98% for the second-best performer, and predicted genes not previously implicated in hypertension.
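
The AUC values quoted for NECo summarize how well predicted scores rank disease-related genes above other genes. A minimal sketch of that metric, assuming binary labels and using a hypothetical helper named `auc` (the scores below are made-up toy values, not results from the paper):

```python
def auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive has the higher score.
    Tied pairs count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: two disease genes (label 1) and two other genes (label 0).
toy_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])  # 3 of 4 pairs ranked correctly: 0.75
```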

6.
Nucleic Acids Res ; 48(D1): D731-D742, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31713623

ABSTRACT

Formed in late 1999, the Rat Genome Database (RGD, https://rgd.mcw.edu) will be 20 in 2020, the Year of the Rat. Because the laboratory rat, Rattus norvegicus, has been used as a model for complex human diseases such as cardiovascular disease, diabetes, cancer, neurological disorders and arthritis, among others, for >150 years, RGD has always been disease-focused and committed to providing data and tools for researchers doing comparative genomics and translational studies. At its inception, before the sequencing of the rat genome, RGD started with only a few data types localized on genetic and radiation hybrid (RH) maps and offered only a few tools for querying and consolidating that data. Since that time, RGD has expanded to include a wealth of structured and standardized genetic, genomic, phenotypic, and disease-related data for eight species, and a suite of innovative tools for querying, analyzing and visualizing this data. This article provides an overview of recent substantial additions and improvements to RGD's data and tools that can assist researchers in finding and utilizing the data they need, whether their goal is to develop new precision models of disease or to more fully explore emerging details within a system or across multiple systems.


Subjects
Chromosome Mapping; Computational Biology/methods; Databases, Genetic; Genome; Rats/genetics; Algorithms; Animals; Chinchilla/genetics; Disease Models, Animal; Dogs/genetics; Genetic Markers; Genetic Variation; Humans; Internet; Mice/genetics; Pan troglodytes/genetics; Phenotype; Protein Interaction Mapping; Retina/metabolism; Sciuridae/genetics; Software; Species Specificity; Swine/genetics; User-Computer Interface
7.
Methods Mol Biol ; 2018: 71-96, 2019.
Article in English | MEDLINE | ID: mdl-31228152

ABSTRACT

Resources for rat researchers are extensive, including strain repositories and databases all around the world. The Rat Genome Database (RGD) serves as the primary rat data repository, providing both manually curated data and data collected computationally from other databases.


Subjects
Databases, Factual; Genome; Models, Animal; Animals; Biomedical Research; Molecular Sequence Annotation; Phenotype; Quantitative Trait Loci; Rats
8.
Database (Oxford) ; 2019, 2019 01 01.
Article in English | MEDLINE | ID: mdl-30753478

ABSTRACT

Rats have been used as research models in biomedical research for over 150 years. These disease models arise from naturally occurring mutations, selective breeding and, more recently, genome manipulation. Through the innovation of genome-editing technologies, genome-modified rats provide precision models of disease by disrupting or complementing targeted genes. To facilitate the use of these data produced from rat disease models, the Rat Genome Database (RGD) organizes rat strains and annotates these strains with disease and qualitative phenotype terms as well as quantitative phenotype measurements. From the curated quantitative data, the expected phenotype profile ranges were established through a meta-analysis pipeline using inbred rat strains in control conditions. The disease and qualitative phenotype annotations are propagated to their associated genes and alleles if applicable. Currently, RGD has curated nearly 1300 rat strains with disease/phenotype annotations and about 11% of them have known allele associations. All of the annotations (disease and phenotype) are integrated and displayed on the strain, gene and allele report pages. Finding disease and phenotype models at RGD can be done by searching for terms in the ontology browser, browsing the disease or phenotype ontology branches or entering keywords in the general search. Use cases are provided to show different targeted searches of rat strains at RGD.


Subjects
Data Curation; Data Mining; Databases, Genetic; Disease/genetics; Genome; Animals; Cytochrome P-450 Enzyme System/genetics; Disease Models, Animal; Molecular Sequence Annotation; Phenotype; Rats
9.
Methods Mol Biol ; 1757: 163-209, 2018.
Article in English | MEDLINE | ID: mdl-29761460

ABSTRACT

The laboratory rat, Rattus norvegicus, is an important model of human health and disease, and experimental findings in the rat have relevance to human physiology and disease. The Rat Genome Database (RGD, http://rgd.mcw.edu) is a model organism database that provides access to a wide variety of curated rat data including disease associations, phenotypes, pathways, molecular functions, biological processes and cellular components for genes, quantitative trait loci, and strains. We present an overview of the database followed by specific examples that can be used to gain experience in employing RGD to explore the wealth of functional data available for the rat.


Subjects
Databases, Genetic; Genome; Genomics; Animals; Computational Biology/methods; Data Analysis; Data Mining; Gene Ontology; Genomics/methods; Phenotype; Quantitative Trait Loci; Rats; Search Engine; Software; User-Computer Interface; Web Browser
10.
Dis Model Mech ; 9(10): 1089-1095, 2016 10 01.
Article in English | MEDLINE | ID: mdl-27736745

ABSTRACT

Rattus norvegicus, the laboratory rat, has been a crucial model for studies of the environmental and genetic factors associated with human diseases for over 150 years. It is the primary model organism for toxicology and pharmacology studies, and has features that make it the model of choice in many complex-disease studies. Since 1999, the Rat Genome Database (RGD; http://rgd.mcw.edu) has been the premier resource for genomic, genetic, phenotype and strain data for the laboratory rat. The primary role of RGD is to curate rat data and validate orthologous relationships with human and mouse genes, and make these data available for incorporation into other major databases such as NCBI, Ensembl and UniProt. RGD also provides official nomenclature for rat genes, quantitative trait loci, strains and genetic markers, as well as unique identifiers. The RGD team adds enormous value to these basic data elements through functional and disease annotations, the analysis and visual presentation of pathways, and the integration of phenotype measurement data for strains used as disease models. Because much of the rat research community focuses on understanding human diseases, RGD provides a number of datasets and software tools that allow users to easily explore and make disease-related connections among these datasets. RGD also provides comprehensive human and mouse data for comparative purposes, illustrating the value of the rat in translational research. This article introduces RGD and its suite of tools and datasets to researchers - within and beyond the rat community - who are particularly interested in leveraging rat-based insights to understand human diseases.


Subjects
Databases, Genetic; Disease/genetics; Genome; Animals; Data Mining; Gene Ontology; Humans; Molecular Sequence Annotation; Rats
11.
Comput Struct Biotechnol J ; 14: 35-48, 2016.
Article in English | MEDLINE | ID: mdl-27602200

ABSTRACT

Understanding the pathogenesis of disease is instrumental in delineating its progression mechanisms and for envisioning ways to counteract it. In the process, animal models represent invaluable tools for identifying disease-related loci and their genetic components. Amongst them, the laboratory rat is used extensively in the study of many conditions and disorders. The Rat Genome Database (RGD, http://rgd.mcw.edu) has been established to house rat genetic, genomic and phenotypic data. Since its inception, it has continually expanded the depth and breadth of its content. Currently, in addition to rat genes, QTLs and strains, RGD houses mouse and human genes and QTLs and offers pertinent associated data, acquired through manual literature curation and imported via pipelines. A collection of controlled vocabularies and ontologies is employed for the standardized extraction and provision of biological data. The vocabularies/ontologies allow the capture of disease and phenotype associations of rat strains and QTLs, as well as disease and pathway associations of rat, human and mouse genes. A suite of tools enables the retrieval, manipulation, viewing and analysis of data. Genes associated with particular conditions or with altered networks underlying disease pathways can be retrieved. Genetic variants in humans or in sequenced rat strains can be searched and compared. Lists of rat strains and species-specific genes and QTLs can be generated for selected ontology terms and then analyzed, downloaded or sent to other tools. From many entry points, data can be accessed and results retrieved. To illustrate, diabetes is used as a case study to initiate and embark upon an exploratory journey.

12.
Physiol Genomics ; 48(8): 589-600, 2016 08 01.
Article in English | MEDLINE | ID: mdl-27287925

ABSTRACT

Cardiovascular diseases are complex diseases caused by a combination of genetic and environmental factors. To facilitate progress in complex disease research, the Rat Genome Database (RGD) provides the community with a disease portal where genome objects and biological data related to cardiovascular diseases are systematically organized. The purpose of this study is to present biocuration at RGD, including disease, genetic, and pathway data. The RGD curation team uses controlled vocabularies/ontologies to organize data curated from the published literature or imported from disease and pathway databases. These organized annotations are associated with genes, strains, and quantitative trait loci (QTLs), thus linking functional annotations to genome objects. Screen shots from the web pages are used to demonstrate the organization of annotations at RGD. The human cardiovascular disease genes identified by annotations were grouped according to data sources and their annotation profiles were compared by in-house tools and other enrichment tools available to the public. The analysis results show that the imported cardiovascular disease genes from ClinVar and OMIM are functionally different from the RGD manually curated genes in terms of pathway and Gene Ontology annotations. The inclusion of disease genes from other databases enriches the collection of disease genes not only in quantity but also in quality.


Subjects
Cardiovascular Diseases/genetics; Genome/genetics; Animals; Databases, Genetic; Gene Ontology; Genomics/methods; Humans; Molecular Sequence Annotation/methods; Quantitative Trait Loci/genetics; Rats
13.
Article in English | MEDLINE | ID: mdl-27009807

ABSTRACT

The Rat Genome Database (RGD; http://rgd.mcw.edu/) provides critical datasets and software tools to a diverse community of rat and non-rat researchers worldwide. To meet the needs of the many users whose research is disease oriented, RGD has created a series of Disease Portals and has prioritized its curation efforts on the datasets important to understanding the mechanisms of various diseases. Gene-disease relationships for three species, rat, human and mouse, are annotated to capture biomarkers, genetic associations, molecular mechanisms and therapeutic targets. To generate gene-disease annotations more effectively and in greater detail, RGD initially adopted the MEDIC disease vocabulary from the Comparative Toxicogenomics Database and adapted it for use by expanding this framework with the addition of over 1000 terms to create the RGD Disease Ontology (RDO). The RDO provides the foundation for, at present, 10 comprehensive disease area-related dataset and analysis platforms at RGD, the Disease Portals. Two major disease areas are the focus of data acquisition and curation efforts each year, leading to the release of the related Disease Portals. Collaborative efforts to realize a more robust disease ontology are underway. Database URL: http://rgd.mcw.edu.


Subjects
Databases, Genetic; Disease/genetics; Gene Ontology; Genome; Molecular Sequence Annotation; Animals; Genetic Predisposition to Disease; Humans; Mice; Rats; Software; Species Specificity
14.
Article in English | MEDLINE | ID: mdl-25619558

ABSTRACT

The Rat Genome Database (RGD) is the premier repository of rat genomic, genetic and physiologic data. Converting data from free text in the scientific literature to a structured format is one of the main tasks of all model organism databases. RGD spends considerable effort manually curating gene, Quantitative Trait Locus (QTL) and strain information. The rapidly growing volume of biomedical literature and the active research in the biological natural language processing (bioNLP) community have given RGD the impetus to adopt text-mining tools to improve curation efficiency. Recently, RGD has initiated a project to use OntoMate, an ontology-driven, concept-based literature search engine developed at RGD, as a replacement for the PubMed (http://www.ncbi.nlm.nih.gov/pubmed) search engine in the gene curation workflow. OntoMate tags abstracts with gene names, gene mutations, organism name and most of the 16 ontologies/vocabularies used at RGD. All terms/ entities tagged to an abstract are listed with the abstract in the search results. All listed terms are linked both to data entry boxes and a term browser in the curation tool. OntoMate also provides user-activated filters for species, date and other parameters relevant to the literature search. Using the system for literature search and import has streamlined the process compared to using PubMed. The system was built with a scalable and open architecture, including features specifically designed to accelerate the RGD gene curation process. With the use of bioNLP tools, RGD has added more automation to its curation workflow. Database URL: http://rgd.mcw.edu.


Subjects
Data Mining/methods; Databases, Nucleic Acid; Gene Ontology; Genome; Natural Language Processing; Animals; PubMed; Rats
15.
Article in English | MEDLINE | ID: mdl-25632109

ABSTRACT

Rats have been used extensively as animal models to study physiological and pathological processes involved in human diseases. Numerous rat strains have been selectively bred for certain biological traits related to specific medical interests. Recently, the Rat Genome Database (http://rgd.mcw.edu) has initiated the PhenoMiner project to integrate quantitative phenotype data from the PhysGen Program for Genomic Applications and the National BioResource Project in Japan as well as manual annotations from biomedical literature. PhenoMiner, the search engine for these integrated phenotype data, facilitates mining of data sets across studies by searching the database with a combination of terms from four different ontologies/vocabularies (Rat Strain Ontology, Clinical Measurement Ontology, Measurement Method Ontology and Experimental Condition Ontology). In this study, salt-induced hypertension was used as a model to retrieve blood pressure records of Brown Norway, Fawn-Hooded Hypertensive (FHH) and Dahl salt-sensitive (SS) rat strains. The records from these three strains served as a basis for comparing records from consomic/congenic/mutant offspring derived from them. We examined the cardiovascular and renal phenotypes of consomics derived from FHH and SS, and of SS congenics and mutants. The availability of quantitative records across laboratories in one database, such as those provided by PhenoMiner, can empower researchers to make the best use of publicly available data. Database URL: http://rgd.mcw.edu.


Subjects
Angiotensin Amide; Biological Ontologies; Data Mining/methods; Databases, Genetic; Kidney Diseases; Software; Angiotensin Amide/genetics; Angiotensin Amide/metabolism; Animals; Humans; Kidney Diseases/genetics; Kidney Diseases/metabolism; Rats
16.
Nucleic Acids Res ; 43(Database issue): D743-50, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25355511

ABSTRACT

The Rat Genome Database (RGD, http://rgd.mcw.edu) provides the most comprehensive data repository and informatics platform related to the laboratory rat, one of the most important model organisms for disease studies. RGD maintains and updates datasets for genomic elements such as genes, transcripts and increasingly in recent years, sequence variations, as well as map positions for multiple assemblies and sequence information. Functional annotations for genomic elements are curated from published literature, submitted by researchers and integrated from other public resources. Complementing the genomic data catalogs are those associated with phenotypes and disease, including strains, QTL and experimental phenotype measurements across hundreds of strains. Data are submitted by researchers, acquired through bulk data pipelines or curated from published literature. Innovative software tools provide users with an integrated platform to query, mine, display and analyze valuable genomic and phenomic datasets for discovery and enhancement of their own research. This update highlights recent developments that reflect an increasing focus on: (i) genomic variation, (ii) phenotypes and diseases, (iii) data related to the environment and experimental conditions and (iv) datasets and software tools that allow the user to explore and analyze the interactions among these and their impact on disease.


Subjects
Databases, Genetic; Genetic Variation; Genomics; Phenotype; Rats/genetics; Animals; Disease/genetics; Environment; Genome; Internet; Molecular Sequence Annotation
17.
Hum Genomics ; 8: 17, 2014 Sep 30.
Article in English | MEDLINE | ID: mdl-25265995

ABSTRACT

BACKGROUND: Biological systems are exquisitely poised to respond and adjust to challenges, including damage. However, sustained damage can overcome the ability of the system to adjust and result in a disease phenotype, its underpinnings many times elusive. Unraveling the molecular mechanisms of systems biology, of how and why it falters, is essential for delineating the details of the path(s) leading to the diseased state and for designing strategies to revert its progression. An important aspect of this process is not only to define the function of a gene but to identify the context within which gene functions act. It is within the network, or pathway context, that the function of a gene fulfills its ultimate biological role. Resolving the extent to which defective function(s) affect the proceedings of pathway(s) and how altered pathways merge into overpowering the system's defense machinery are key to understanding the molecular aspects of disease and envisioning ways to counteract it. A network-centric approach to diseases is increasingly being considered in current research. It also underlies the deployment of disease pathways at the Rat Genome Database Pathway Portal. The portal is presented with an emphasis on disease and altered pathways, associated drug pathways, pathway suites, and suite networks. RESULTS: The Pathway Portal at the Rat Genome Database (RGD) provides an ever-increasing collection of interactive pathway diagrams and associated annotations for metabolic, signaling, regulatory, and drug pathways, including disease and altered pathways. A disease pathway is viewed from the perspective of networks whose alterations are manifested in the affected phenotype. The Pathway Ontology (PW), built and maintained at RGD, facilitates the annotations of genes, the deployment of pathway diagrams, and provides an overall navigational tool. Pathways that revolve around a common concept and are globally connected are presented within pathway suites; a suite network combines two or more pathway suites. CONCLUSIONS: The Pathway Portal is a rich resource that offers a range of pathway data and visualization, including disease pathways and related pathway suites. Viewing a disease pathway from the perspective of underlying altered pathways is an aid for dissecting the molecular mechanisms of disease.


Subjects
Databases, Genetic; Gene Regulatory Networks/genetics; Genome; Metabolic Networks and Pathways/genetics; Systems Biology/methods; Animals; Disease Models, Animal; Female; Male; Molecular Sequence Annotation; Phenotype; Rats; Signal Transduction; User-Computer Interface
18.
Article in English | MEDLINE | ID: mdl-25157073

ABSTRACT

Gene ontology (GO) annotation is a common task among model organism databases (MODs) for capturing gene function data from journal articles. It is a time-consuming and labor-intensive task, and is thus often considered as one of the bottlenecks in literature curation. There is a growing need for semiautomated or fully automated GO curation techniques that will help database curators to rapidly and accurately identify gene function information in full-length articles. Despite multiple attempts in the past, few studies have proven to be useful with regard to assisting real-world GO curation. The shortage of sentence-level training data and opportunities for interaction between text-mining developers and GO curators has limited the advances in algorithm development and corresponding use in practical circumstances. To this end, we organized a text-mining challenge task for literature-based GO annotation in BioCreative IV. More specifically, we developed two subtasks: (i) to automatically locate text passages that contain GO-relevant information (a text retrieval task) and (ii) to automatically identify relevant GO terms for the genes in a given article (a concept-recognition task). With the support from five MODs, we provided teams with >4000 unique text passages that served as the basis for each GO annotation in our task data. Such evidence text information has long been recognized as critical for text-mining algorithm development but was never made available because of the high cost of curation. In total, seven teams participated in the challenge task. From the team results, we conclude that the state of the art in automatically mining GO terms from literature has improved over the past decade while much progress is still needed for computer-assisted GO curation. Future work should focus on addressing remaining technical challenges for improved performance of automatic GO concept recognition and incorporating practical benefits of text-mining tools into real-world GO annotation. Database URL: http://www.biocreative.org/tasks/biocreative-iv/track-4-GO/.


Subjects
Computational Biology/methods; Data Mining; Gene Ontology; Molecular Sequence Annotation/methods; Algorithms; Humans; Reproducibility of Results
19.
Article in English | MEDLINE | ID: mdl-25070993

ABSTRACT

Gene function curation via Gene Ontology (GO) annotation is a common task among Model Organism Database groups. Owing to its manual nature, this task is considered one of the bottlenecks in literature curation. There have been many previous attempts at automatic identification of GO terms and supporting information from full text. However, few systems have delivered an accuracy that is comparable with humans. One recognized challenge in developing such systems is the lack of marked sentence-level evidence text that provides the basis for making GO annotations. We aim to create a corpus that includes the GO evidence text along with the three core elements of GO annotations: (i) a gene or gene product, (ii) a GO term and (iii) a GO evidence code. To ensure our results are consistent with real-life GO data, we recruited eight professional GO curators and asked them to follow their routine GO annotation protocols. Our annotators marked up more than 5000 text passages in 200 articles for 1356 distinct GO terms. For evidence sentence selection, the inter-annotator agreement (IAA) results are 9.3% (strict) and 42.7% (relaxed) in F1-measures. For GO term selection, the IAAs are 47% (strict) and 62.9% (hierarchical). Our corpus analysis further shows that abstracts contain ∼ 10% of relevant evidence sentences and 30% distinct GO terms, while the Results/Experiment section has nearly 60% relevant sentences and >70% GO terms. Further, of those evidence sentences found in abstracts, less than one-third contain enough experimental detail to fulfill the three core criteria of a GO annotation. This result demonstrates the need of using full-text articles for text mining GO annotations. Through its use at the BioCreative IV GO (BC4GO) task, we expect our corpus to become a valuable resource for the BioNLP research community. Database URL: http://www.biocreative.org/resources/corpora/bc-iv-go-task-corpus/.


Subjects
Data Mining/methods; Databases, Genetic; Molecular Sequence Annotation; Software; Vocabulary, Controlled; Computational Biology/methods; Humans
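
The inter-annotator agreement figures in the abstract above are reported as F1-measures. One common way to compute such a score for two annotators is sketched below. It assumes, for illustration only, that "strict" agreement means both annotators selected exactly the same item; the function name `f1_agreement` and the (article, sentence index) keys are hypothetical, and the corpus may define its matching criteria differently.

```python
def f1_agreement(annotator_a, annotator_b):
    """Pairwise F1 between two sets of selected items, treating
    annotator A as the reference and annotator B as the prediction."""
    a, b = set(annotator_a), set(annotator_b)
    shared = len(a & b)
    if shared == 0:
        return 0.0
    precision = shared / len(b)
    recall = shared / len(a)
    return 2 * precision * recall / (precision + recall)


# Toy example: evidence sentences keyed by (article id, sentence index).
picked_by_a = {("doc1", 3), ("doc1", 7), ("doc2", 1), ("doc2", 4)}
picked_by_b = {("doc2", 1), ("doc2", 4), ("doc2", 9)}
toy_f1 = f1_agreement(picked_by_a, picked_by_b)  # 2 shared items: F1 = 4/7
```

Because precision and recall simply swap when the annotators swap roles, the score is symmetric and does not depend on which annotator is treated as the reference.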
20.
J Biomed Semantics ; 5(1): 7, 2014 Feb 05.
Article in English | MEDLINE | ID: mdl-24499703

ABSTRACT

BACKGROUND: The Pathway Ontology (PW), developed at the Rat Genome Database (RGD), covers all types of biological pathways, including altered and disease pathways, and captures the relationships between them within the hierarchical structure of a directed acyclic graph. The ontology allows for the standardized annotation of rat, and of human and mouse genes to pathway terms. It also constitutes a vehicle for easy navigation between gene and ontology report pages, between reports and interactive pathway diagrams, between pathways directly connected within a diagram and between those that are globally related in pathway suites and suite networks. Surveys of the literature and the development of the Pathway and Disease Portals are important sources for the ongoing development of the ontology. User requests and mapping of pathways in other databases to terms in the ontology further contribute to increasing its content. Recently built automated pipelines use the mapped terms to make available the annotations generated by other groups. RESULTS: The two released pipelines, the Pathway Interaction Database (PID) Annotation Import Pipeline and the Kyoto Encyclopedia of Genes and Genomes (KEGG) Annotation Import Pipeline, make available over 7,400 and 31,000 pathway gene annotations, respectively. Building the PID pipeline led to the addition of new terms within the signaling node, also augmented by the release of the RGD "Immune and Inflammatory Disease Portal" at that time. Building the KEGG pipeline led to a substantial increase in the number of disease pathway terms, such as those within the 'infectious disease pathway' parent term category. The 'drug pathway' node has also seen increases in the number of terms as well as a restructuring of the node. Literature surveys, disease portal deployments and user requests have contributed and continue to contribute additional new terms across the ontology. Since first presented, the content of PW has increased by over 75%. CONCLUSIONS: Ongoing development of the Pathway Ontology and the implementation of pipelines promote an enriched provision of pathway data. The ontology is freely available for download and use from the RGD ftp site at ftp://rgd.mcw.edu/pub/ontology/pathway/ or from the National Center for Biomedical Ontology (NCBO) BioPortal website at http://bioportal.bioontology.org/ontologies/PW.
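
Because the Pathway Ontology is organized as a directed acyclic graph, a term can have more than one parent, and finding every broader term above a given term means walking all parent links. A minimal sketch of that traversal follows. The function name `ancestors` and the parent links in `toy_parents` are illustrative assumptions and do not reproduce the actual structure of PW.

```python
from collections import deque


def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links.

    `parents` maps a term to the list of its direct parents; in a DAG a
    term may appear under several parents, so visited terms are tracked.
    """
    seen = set()
    queue = deque(parents.get(term, ()))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(parents.get(current, ()))
    return seen


# Hypothetical parent links, for illustration only.
toy_parents = {
    "infectious disease pathway": ["disease pathway"],
    "disease pathway": ["pathway"],
    "signaling pathway": ["pathway"],
    "altered signaling pathway": ["signaling pathway", "disease pathway"],
}
toy_ancestors = ancestors("altered signaling pathway", toy_parents)
# toy_ancestors == {"signaling pathway", "disease pathway", "pathway"}
```

The same walk is what lets a gene annotated to a specific term also be retrieved when a user browses any of its broader parent terms.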
