4.
Nat Biotechnol ; 37(1): 29-37, 2019 Jan.
Article in English | MEDLINE | ID: mdl-30556814

ABSTRACT

We present an extension of the Minimum Information about any (x) Sequence (MIxS) standard for reporting sequences of uncultivated virus genomes. Minimum Information about an Uncultivated Virus Genome (MIUViG) standards were developed within the Genomic Standards Consortium framework and include virus origin, genome quality, genome annotation, taxonomic classification, biogeographic distribution and in silico host prediction. Community-wide adoption of MIUViG standards, which complement the Minimum Information about a Single Amplified Genome (MISAG) and Metagenome-Assembled Genome (MIMAG) standards for uncultivated bacteria and archaea, will improve the reporting of uncultivated virus genomes in public databases. In turn, this should enable more robust comparative studies and a systematic exploration of the global virosphere.


Subjects
Genome, Viral; Genomics/methods; Virus Cultivation; Viruses/genetics; Viruses/isolation & purification; Databases, Genetic
5.
Nucleic Acids Res ; 47(D1): D955-D962, 2019 Jan 08.
Article in English | MEDLINE | ID: mdl-30407550

ABSTRACT

The Human Disease Ontology (DO) database (http://www.disease-ontology.org) has undergone significant expansion in the past three years. The DO disease classification includes specific formal semantic rules to express meaningful disease models and has expanded from a single asserted classification to include multiple inferred mechanistic disease classifications, thus providing novel perspectives on related diseases. Expansion of disease terms, alternative anatomy, cell type and genetic disease classifications and workflow automation highlight the updates for the DO since 2015. The enhanced breadth and depth of the DO's knowledgebase have expanded the DO's utility for exploring the multi-etiology of human disease, thus improving the capture and communication of health-related data across biomedical databases, bioinformatics tools, and genomic and cancer resources, as demonstrated by a 6.6× growth in the DO's user community since 2015. The DO's continual integration of human disease knowledge, evidenced by more than 200 SVN/GitHub releases/revisions since our DO 2015 NAR paper, includes the addition of 2650 new disease terms, a 30% increase in textual definitions, and an expanding suite of disease classification hierarchies constructed through defined logical axioms.

6.
Nucleic Acids Res ; 47(D1): D1186-D1194, 2019 Jan 08.
Article in English | MEDLINE | ID: mdl-30407590

ABSTRACT

The Evidence and Conclusion Ontology (ECO) contains terms (classes) that describe types of evidence and assertion methods. ECO terms are used in the process of biocuration to capture the evidence that supports biological assertions (e.g. gene product X has function Y as supported by evidence Z). Capture of this information allows tracking of annotation provenance, establishment of quality control measures and query of evidence. ECO contains over 1500 terms and is in use by many leading biological resources including the Gene Ontology, UniProt and several model organism databases. ECO is continually being expanded and revised based on the needs of the biocuration community. The ontology is freely available for download from GitHub (https://github.com/evidenceontology/) or the project's website (http://evidenceontology.org/). Users can request new terms or changes to existing terms through the project's GitHub site. ECO is released into the public domain under CC0 1.0 Universal.

7.
Dis Model Mech ; 11(3), 2018 Mar 12.
Article in English | MEDLINE | ID: mdl-29590633

ABSTRACT

Model organisms are vital to uncovering the mechanisms of human disease and developing new therapeutic tools. Researchers collecting and integrating relevant model organism and/or human data often apply disparate terminologies (vocabularies and ontologies), making comparisons and inferences difficult. A unified disease ontology is required that connects data annotated using diverse disease terminologies, and in which the terminology relationships are continuously maintained. The Mouse Genome Database (MGD, http://www.informatics.jax.org), Rat Genome Database (RGD, http://rgd.mcw.edu) and Disease Ontology (DO, http://www.disease-ontology.org) projects are collaborating to augment DO, aligning and incorporating disease terms used by MGD and RGD, and improving DO as a tool for unifying disease annotations across species. Coordinated assessment of MGD's and RGD's disease term annotations identified new terms that enhance DO's representation of human diseases. Expansion of DO term content and cross-references to clinical vocabularies (e.g. OMIM, ORDO, MeSH) has enriched the DO's domain coverage and utility for annotating many types of data generated from experimental and clinical investigations. The extension of the anatomy-based DO classification structure of disease improves accessibility of terms and facilitates application of DO for computational research. A consistent representation of disease associations across data types from cellular to whole organism, generated from clinical and model organism studies, will promote the integration, mining and comparative analysis of these data. The coordinated enrichment of the DO and adoption of DO by MGD and RGD demonstrate DO's usability across human data, MGD, RGD and the rest of the model organism database community.


Subjects
Disease/genetics; Gene Ontology; Molecular Sequence Annotation; Animals; Databases, Genetic; Mice; Rats; Species Specificity
8.
NPJ Sci Food ; 2: 23, 2018.
Article in English | MEDLINE | ID: mdl-31304272

ABSTRACT

The construction of high-capacity data sharing networks to support increasing government and commercial data exchange has highlighted a key roadblock: the content of existing Internet-connected information remains siloed due to a multiplicity of local languages and data dictionaries. This lack of a digital lingua franca is obvious in the domain of human food as materials travel from their wild or farm origin, through processing and distribution chains, to consumers. A well-defined, hierarchical vocabulary connected with logical relationships (in other words, an ontology) is urgently needed to help tackle data harmonization problems that span the domains of food security, safety, quality, production, distribution, and consumer health and convenience. FoodOn (http://foodon.org) is a consortium-driven project to build a comprehensive and easily accessible global farm-to-fork ontology about food that accurately and consistently describes foods commonly known in cultures from around the world. FoodOn addresses food product terminology gaps and supports food traceability. Focusing on human and domesticated animal food description, FoodOn contains animal and plant food sources, food categories and products, and other facets like preservation processes, contact surfaces, and packaging. Much of FoodOn's vocabulary comes from transforming LanguaL, a mature and popular food indexing thesaurus, into a World Wide Web Consortium (W3C) OWL Web Ontology Language-formatted vocabulary that provides system interoperability, quality control, and software-driven intelligence. FoodOn complements other technologies facilitating food traceability, which is becoming critical in this age of increasing globalization of food networks.

9.
Sci Data ; 4: 170030, 2017 Mar 28.
Article in English | MEDLINE | ID: mdl-28350385

ABSTRACT

The use of induced pluripotent stem cells (iPSC) derived from independent patients and sources holds considerable promise to improve the understanding of development and disease. However, optimized use of iPSC depends on our ability to develop methods to efficiently qualify cell lines and protocols, monitor genetic stability, and evaluate self-renewal and differentiation potential. To accomplish these goals, 57 stem cell lines from 10 laboratories were differentiated to 7 different states, resulting in 248 analyzed samples. Cell lines were differentiated and characterized at a central laboratory using standardized cell culture methodologies, protocols, and metadata descriptors. Stem cell and derived differentiated lines were characterized using RNA-seq, miRNA-seq, copy number arrays, DNA methylation arrays, flow cytometry, and molecular histology. All materials, including raw data, metadata, analysis and processing code, and methodological and provenance documentation are publicly available for re-use and interactive exploration at https://www.synapse.org/pcbc. The goal is to provide data that can improve our ability to robustly and reproducibly use human pluripotent stem cells to understand development and disease.


Subjects
Induced Pluripotent Stem Cells; Pluripotent Stem Cells; Animals; Cell Culture Techniques; Humans
10.
J Biomol Tech ; 28(1): 31-39, 2017 Apr.
Article in English | MEDLINE | ID: mdl-28337070

ABSTRACT

The Extreme Microbiome Project (XMP) is a project launched by the Association of Biomolecular Resource Facilities Metagenomics Research Group (ABRF MGRG) that focuses on whole genome shotgun sequencing of extreme and unique environments using a wide variety of biomolecular techniques. The goals are multifaceted, including development and refinement of new techniques for the following: 1) the detection and characterization of novel microbes, 2) the evaluation of nucleic acid techniques for extremophilic samples, and 3) the identification and implementation of the appropriate bioinformatics pipelines. Here, we highlight the different ongoing projects that we have been working on, as well as details on the various methods we use to characterize the microbiome and metagenome of these complex samples. In particular, we present data on a novel multienzyme extraction protocol that we developed, called Polyzyme or MetaPolyZyme. Presently, the XMP is characterizing sample sites around the world with the intent of discovering new species, genes, and gene clusters. Once a project site is complete, the resulting data will be publicly available. Sites include Lake Hillier in Western Australia, the "Door to Hell" crater in Turkmenistan, deep ocean brine lakes of the Gulf of Mexico, deep ocean sediments from Greenland, permafrost tunnels in Alaska, ancient microbial biofilms from Antarctica, the Blue Lagoon in Iceland, Ethiopian toxic hot springs, and the acidic hypersaline ponds in Western Australia.


Subjects
Environmental Microbiology; Microbiota/genetics; DNA, Bacterial/genetics; DNA, Bacterial/isolation & purification; Extreme Environments; Metagenome; Molecular Typing/standards; RNA, Bacterial/genetics; RNA, Bacterial/isolation & purification; Reference Standards; Sequence Analysis, DNA/standards
11.
Stem Cell Reports ; 7(1): 110-25, 2016 Jul 12.
Article in English | MEDLINE | ID: mdl-27293150

ABSTRACT

The rigorous characterization of distinct induced pluripotent stem cells (iPSC) derived from multiple reprogramming technologies, somatic sources, and donors is required to understand potential sources of variability and downstream potential. To achieve this goal, the Progenitor Cell Biology Consortium performed comprehensive experimental and genomic analyses of 58 iPSC from ten laboratories generated using a variety of reprogramming genes, vectors, and cells. Associated global molecular characterization studies identified functionally informative correlations in gene expression, DNA methylation, and/or copy-number variation among key developmental and oncogenic regulators as a result of donor, sex, line stability, reprogramming technology, and cell of origin. Furthermore, X-chromosome inactivation in PSC produced highly correlated differences in teratoma-lineage staining and regulator expression upon differentiation. All experimental results, and raw, processed, and metadata from these analyses, including powerful tools, are interactively accessible from a new online portal at https://www.synapse.org to serve as a reusable resource for the stem cell community.


Subjects
Cell Differentiation/genetics; DNA Methylation/genetics; Genome, Human; Induced Pluripotent Stem Cells; Cellular Reprogramming; Gene Expression/genetics; Genomics; Humans; Stem Cells/metabolism
12.
Article in English | MEDLINE | ID: mdl-26989148

ABSTRACT

Open biological data are distributed over many resources making them challenging to integrate, to update and to disseminate quickly. Wikidata is a growing, open community database which can serve this purpose and also provides tight integration with Wikipedia. In order to improve the state of biological data, facilitate data management and dissemination, we imported all human and mouse genes, and all human and mouse proteins into Wikidata. In total, 59,721 human genes and 73,355 mouse genes have been imported from NCBI and 27,306 human proteins and 16,728 mouse proteins have been imported from the Swissprot subset of UniProt. As Wikidata is open and can be edited by anybody, our corpus of imported data serves as the starting point for integration of further data by scientists, the Wikidata community and citizen scientists alike. The first use case for these data is to populate Wikipedia Gene Wiki infoboxes directly from Wikidata with the data integrated above. This enables immediate updates of the Gene Wiki infoboxes as soon as the data in Wikidata are modified. Although Gene Wiki pages are currently only on the English language version of Wikipedia, the multilingual nature of Wikidata allows for usage of the data we imported in all 280 different language Wikipedias. Apart from the Gene Wiki infobox use case, a SPARQL endpoint and exporting functionality to several standard formats (e.g. JSON, XML) enable use of the data by scientists. In summary, we created a fully open and extensible data resource for human and mouse molecular biology and biochemistry data. This resource enriches all the Wikipedias with structured information and serves as a new linking hub for the biological semantic web. Database URL: https://www.wikidata.org/.


Subjects
Databases, Nucleic Acid; Semantics; Animals; Humans; Mice; Models, Theoretical; Search Engine
13.
Mamm Genome ; 26(9-10): 584-9, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26093607

ABSTRACT

The Disease Ontology (DO) enables cross-domain data integration through a common standard of human disease terms and their etiological descriptions. Standardized disease descriptors that are integrated across mammalian genomic resources provide a human-readable, machine-interpretable, community-driven disease corpus that unifies the representation of human common and rare diseases. The DO is populated by consensus-driven disease data descriptors that incorporate disease terms utilized by genomic and genetic projects and resources engaged in studies to understand the genetics of human disease through the study of model organisms. The DO project serves multiple roles for the model organism community by providing: (1) a structured "backbone" of disease concepts represented among the model organism databases; (2) authoritative disease curation services to researchers and resource providers; and (3) development of subsets of the DO representative of human diseases annotated to animal models curated within the model organism databases.


Subjects
Databases, Genetic; Disease Models, Animal; Genetic Diseases, Inborn/classification; Animals; Genetic Diseases, Inborn/genetics; Genome; Humans; Phenotype
14.
Am J Hum Genet ; 97(1): 111-24, 2015 Jul 02.
Article in English | MEDLINE | ID: mdl-26119816

ABSTRACT

The Human Phenotype Ontology (HPO) is widely used in the rare disease community for differential diagnostics, phenotype-driven analysis of next-generation sequence-variation data, and translational research, but a comparable resource has not been available for common disease. Here, we have developed a concept-recognition procedure that analyzes the frequencies of HPO disease annotations as identified in over five million PubMed abstracts by employing an iterative procedure to optimize precision and recall of the identified terms. We derived disease models for 3,145 common human diseases comprising a total of 132,006 HPO annotations. The HPO now comprises over 250,000 phenotypic annotations for over 10,000 rare and common diseases and can be used for examining the phenotypic overlap among common diseases that share risk alleles, as well as between Mendelian diseases and common diseases linked by genomic location. The annotations, as well as the HPO itself, are freely available.


Subjects
Gene Ontology/trends; Genetic Diseases, Inborn/classification; Genetic Diseases, Inborn/genetics; Phenotype; Terminology as Topic; Genetic Diseases, Inborn/pathology; Humans; MEDLINE; Models, Biological
15.
Database (Oxford) ; 2015: bav032, 2015.
Article in English | MEDLINE | ID: mdl-25841438

ABSTRACT

Bio-ontologies provide terminologies for the scientific community to describe biomedical entities in a standardized manner. There are multiple initiatives that are developing biomedical terminologies for the purpose of providing better annotation, data integration and mining capabilities. Terminology resources devised for multiple purposes inherently diverge in content and structure. A major issue of biomedical data integration is the development of overlapping terms, ambiguous classifications and inconsistencies represented across databases and publications. The disease ontology (DO) was developed over the past decade to address data integration, standardization and annotation issues for human disease data. We have established a DO cancer project to be a focused view of cancer terms within the DO. The DO cancer project mapped 386 cancer terms from the Catalogue of Somatic Mutations in Cancer (COSMIC), The Cancer Genome Atlas (TCGA), International Cancer Genome Consortium, Therapeutically Applicable Research to Generate Effective Treatments, Integrative Oncogenomics and the Early Detection Research Network into a cohesive set of 187 DO terms represented by 63 top-level DO cancer terms. For example, the COSMIC term 'kidney, NS, carcinoma, clear_cell_renal_cell_carcinoma' and TCGA term 'Kidney renal clear cell carcinoma' were both grouped to the term 'Disease Ontology Identification (DOID):4467 / renal clear cell carcinoma' which was mapped to the TopNodes_DOcancerslim term 'DOID:263 / kidney cancer'. Mapping of diverse cancer terms to DO and the use of top level terms (DO slims) will enable pan-cancer analysis across datasets generated from any of the cancer term sources where pan-cancer means including or relating to all or multiple types of cancer. The terms can be browsed from the DO web site (http://www.disease-ontology.org) and downloaded from the DO's Apache Subversion or GitHub repositories. Database URL: http://www.disease-ontology.org


Subjects
Biological Ontologies; Data Mining; Databases, Factual; Neoplasms; Animals; Humans
16.
Nucleic Acids Res ; 43(Database issue): D1071-8, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25348409

ABSTRACT

The current version of the Human Disease Ontology (DO) (http://www.disease-ontology.org) database expands the utility of the ontology for the examination and comparison of genetic variation, phenotype, protein, drug and epitope data through the lens of human disease. DO is a biomedical resource of standardized common and rare disease concepts with stable identifiers organized by disease etiology. The content of DO has had 192 revisions since 2012, including the addition of 760 terms. Thirty-two percent of all terms now include definitions. DO has expanded the number and diversity of research communities and community members by 50+ during the past two years. These community members actively submit term requests, coordinate biomedical resource disease representation and provide expert curation guidance. Since the DO 2012 NAR paper, there have been hundreds of term requests and a steady increase in the number of DO listserv members, Twitter followers and DO website usage. DO is moving to a multi-editor model utilizing Protégé to curate DO in the Web Ontology Language (OWL). This will enable closer collaboration with the Human Phenotype Ontology, EBI's Ontology Working Group, Mouse Genome Informatics and the Monarch Initiative, among others, and enhance DO's current asserted view and multiple inferred views through reasoning.


Subjects
Biological Ontologies; Databases, Factual; Disease; Genetic Diseases, Inborn; Humans; Internet; Rare Diseases/genetics
17.
Stand Genomic Sci ; 9(3): 599-601, 2014 Jun 15.
Article in English | MEDLINE | ID: mdl-25197446

ABSTRACT

The Genomic Standards Consortium (GSC) is an open-membership community that was founded in 2005 to work towards the development, implementation and harmonization of standards in the field of genomics. Starting with the defined task of establishing a minimal set of descriptions, the GSC has evolved into an active standards-setting body that currently has 18 ongoing projects, with additional projects regularly proposed from within and outside the GSC. Here we describe our recently enacted policy for proposing new activities that are intended to be taken on by the GSC, along with the template for proposing such new activities.

18.
PLoS One ; 9(6): e99979, 2014.
Article in English | MEDLINE | ID: mdl-24936976

ABSTRACT

High throughput sequencing has accelerated the determination of genome sequences for thousands of human infectious disease pathogens and dozens of their vectors. The scale and scope of these data are enabling genotype-phenotype association studies to identify genetic determinants of pathogen virulence and drug/insecticide resistance, and phylogenetic studies to track the origin and spread of disease outbreaks. To maximize the utility of genomic sequences for these purposes, it is essential that metadata about the pathogen/vector isolate characteristics be collected and made available in organized, clear, and consistent formats. Here we report the development of the GSCID/BRC Project and Sample Application Standard, developed by representatives of the Genome Sequencing Centers for Infectious Diseases (GSCIDs), the Bioinformatics Resource Centers (BRCs) for Infectious Diseases, and the U.S. National Institute of Allergy and Infectious Diseases (NIAID), part of the National Institutes of Health (NIH), informed by interactions with numerous collaborating scientists. It includes mapping to terms from other data standards initiatives, including the Genomic Standards Consortium's minimal information (MIxS) and NCBI's BioSample/BioProjects checklists and the Ontology for Biomedical Investigations (OBI). The standard includes data fields about characteristics of the organism or environmental source of the specimen, spatial-temporal information about the specimen isolation event, phenotypic characteristics of the pathogen/vector isolated, and project leadership and support. By modeling metadata fields into an ontology-based semantic framework and reusing existing ontologies and minimum information checklists, the application standard can be extended to support additional project-specific data fields and integrated with other data represented with comparable standards. The use of this metadata standard by all ongoing and future GSCID sequencing projects will provide a consistent representation of these data in the BRC resources and other repositories that leverage these data, allowing investigators to identify relevant genomic sequences and perform comparative genomics analyses that are both statistically meaningful and biologically relevant.


Subjects
Databases, Genetic/standards; Animals; Communicable Diseases/microbiology; Communicable Diseases/parasitology; Datasets as Topic; Disease Vectors; Gene Ontology; Genome; Humans; Reference Standards; Sequence Analysis, DNA; Virulence/genetics
20.
Stand Genomic Sci ; 6(2): 276-86, 2012 May 25.
Article in English | MEDLINE | ID: mdl-22768370

ABSTRACT

This report details the outcome of the 13th Meeting of the Genomic Standards Consortium. The three-day conference was held at the Kingkey Palace Hotel, Shenzhen, China, on March 5-7, 2012, and was hosted by the Beijing Genomics Institute. The meeting, titled "From Genomes to Interactions to Communities to Models", highlighted the role of data standards associated with genomic, metagenomic, and amplicon sequence data and the contextual information associated with the sample. To this end, the meeting focused on genomic projects for animals, plants, fungi, and viruses; metagenomic studies in host-microbe interactions; and the dynamics of microbial communities. In addition, the meeting hosted a Genomic Observatories Network session, a Genomic Standards Consortium biodiversity working group session, and a Microbiology of the Built Environment session sponsored by the Alfred P. Sloan Foundation.
