Results 1 - 20 of 35
1.
Genetics ; 224(1), 2023 05 04.
Article in English | MEDLINE | ID: mdl-36866529

ABSTRACT

The Gene Ontology (GO) knowledgebase (http://geneontology.org) is a comprehensive resource concerning the functions of genes and gene products (proteins and noncoding RNAs). GO annotations cover genes from organisms across the tree of life as well as viruses, though most gene function knowledge currently derives from experiments carried out in a relatively small number of model organisms. Here, we provide an updated overview of the GO knowledgebase, as well as the efforts of the broad, international consortium of scientists that develops, maintains, and updates the GO knowledgebase. The GO knowledgebase consists of three components: (1) the GO, a computational knowledge structure describing the functional characteristics of genes; (2) GO annotations, evidence-supported statements asserting that a specific gene product has a particular functional characteristic; and (3) GO Causal Activity Models (GO-CAMs), mechanistic models of molecular "pathways" (GO biological processes) created by linking multiple GO annotations using defined relations. Each of these components is continually expanded, revised, and updated in response to newly published discoveries and receives extensive QA checks, reviews, and user feedback. For each of these components, we provide a description of the current contents, recent developments to keep the knowledgebase up to date with new discoveries, and guidance on how users can best make use of the data that we provide. We conclude with future directions for the project.


Subject(s)
Databases, Genetic , Proteins , Gene Ontology , Proteins/genetics , Molecular Sequence Annotation , Computational Biology
2.
Nucleic Acids Res ; 51(D1): D1075-D1085, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36318260

ABSTRACT

Scalable technologies to sequence the transcriptomes and epigenomes of single cells are transforming our understanding of cell types and cell states. The Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative Cell Census Network (BICCN) is applying these technologies at unprecedented scale to map the cell types in the mammalian brain. In an effort to increase data FAIRness (Findable, Accessible, Interoperable, Reusable), the NIH has established repositories to make data generated by the BICCN and related BRAIN Initiative projects accessible to the broader research community. Here, we describe the Neuroscience Multi-Omic Archive (NeMO Archive; nemoarchive.org), which serves as the primary repository for genomics data from the BRAIN Initiative. Working closely with other BRAIN Initiative researchers, we have organized these data into a continually expanding, curated repository, which contains transcriptomic and epigenomic data from over 50 million brain cells, including single-cell genomic data from all of the major regions of the adult and prenatal human and mouse brains, as well as substantial single-cell genomic data from non-human primates. We make available several tools for accessing these data, including a searchable web portal, a cloud-computing interface for large-scale data processing (implemented on Terra, terra.bio), and a visualization and analysis platform, NeMO Analytics (nemoanalytics.org).


Subject(s)
Brain , Databases, Genetic , Epigenomics , Multiomics , Transcriptome , Animals , Mice , Genomics , Mammals , Primates , Brain/cytology , Brain/metabolism
3.
Gigascience ; 11, 2022 11 21.
Article in English | MEDLINE | ID: mdl-36409836

ABSTRACT

The Common Fund Data Ecosystem (CFDE) has created a flexible system of data federation that enables researchers to discover datasets from across the US National Institutes of Health Common Fund without requiring that data owners move, reformat, or rehost those data. This system is centered on a catalog that integrates detailed descriptions of biomedical datasets from individual Common Fund Programs' Data Coordination Centers (DCCs) into a uniform metadata model that can then be indexed and searched from a centralized portal. This Crosscut Metadata Model (C2M2) supports the wide variety of data types and metadata terms used by individual DCCs and can readily describe nearly all forms of biomedical research data. We detail its use to ingest and index data from 11 DCCs.


Subject(s)
Ecosystem , Financial Management , Metadata
4.
Nucleic Acids Res ; 50(D1): D1515-D1521, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34986598

ABSTRACT

The Evidence and Conclusion Ontology (ECO) is a community resource that provides an ontology of terms used to capture the type of evidence that supports biomedical annotations and assertions. Consistent capture of evidence information with ECO allows tracking of annotation provenance, establishment of quality control measures, and evidence-based data mining. ECO is in use by dozens of data repositories and resources with both specific and general areas of focus. ECO is continually being expanded and enhanced in response to user requests as well as our aim to adhere to community best practices for ontology development. The ECO support team engages in multiple collaborations with other ontologies and annotating groups. Here we report on recent updates to the ECO ontology itself as well as associated resources that are available through this project. ECO project products are freely available for download from the project website (https://evidenceontology.org/) and GitHub (https://github.com/evidenceontology/evidenceontology). ECO is released into the public domain under a CC0 1.0 Universal license.


Subject(s)
Computational Biology/standards , Databases, Genetic , Gene Ontology , Software , Humans , Molecular Sequence Annotation
5.
Nucleic Acids Res ; 50(D1): D1255-D1261, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34755882

ABSTRACT

The Human Disease Ontology (DO) database (www.disease-ontology.org) has significantly expanded the disease content and enhanced our userbase and website since the DO's 2018 Nucleic Acids Research DATABASE issue paper. Conservatively, based on available resource statistics, terms from the DO have been annotated to over 1.5 million biomedical data elements and citations, a 10× increase in the past 5 years. The DO, funded as an NHGRI Genomic Resource, plays a key role in disease knowledge organization, representation, and standardization, serving as a reference framework for multiscale biomedical data integration and analysis across thousands of clinical, biomedical and computational research projects and genomic resources around the world. This update reports on the addition of 1,793 new disease terms, a 14% increase of textual definitions and the integration of 22,137 new SubClassOf axioms defining disease to disease connections representing the DO's complex disease classification. The DO's updated website provides multifaceted etiology searching, enhanced documentation and educational resources.


Subject(s)
Biological Ontologies , Databases, Factual , Databases, Genetic , Genetic Diseases, Inborn/classification , Genetic Diseases, Inborn/genetics , Genomics/classification , Humans
6.
Nucleic Acids Res ; 50(D1): D480-D487, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34850135

ABSTRACT

The Database of Intrinsically Disordered Proteins (DisProt, URL: https://disprot.org) is the major repository of manually curated annotations of intrinsically disordered proteins and regions from the literature. We report here recent updates of DisProt version 9, including a restyled web interface, refactored Intrinsically Disordered Proteins Ontology (IDPO), improvements in the curation process and significant content growth of around 30%. Higher quality and consistency of annotations is provided by a newly implemented reviewing process and training of curators. The increased curation capacity is fostered by the integration of DisProt with APICURON, a dedicated resource for the proper attribution and recognition of biocuration efforts. Better interoperability is provided through the adoption of the Minimum Information About Disorder (MIADE) standard, an active collaboration with the Gene Ontology (GO) and Evidence and Conclusion Ontology (ECO) consortia and the support of the ELIXIR infrastructure.


Subject(s)
Databases, Protein , Intrinsically Disordered Proteins/metabolism , Molecular Sequence Annotation , Software , Amino Acid Sequence , DNA/genetics , DNA/metabolism , Datasets as Topic , Gene Ontology , Humans , Internet , Intrinsically Disordered Proteins/chemistry , Intrinsically Disordered Proteins/genetics , Protein Binding , RNA/genetics , RNA/metabolism
7.
Nature ; 598(7879): 103-110, 2021 10.
Article in English | MEDLINE | ID: mdl-34616066

ABSTRACT

Single-cell transcriptomics can provide quantitative molecular signatures for large, unbiased samples of the diverse cell types in the brain [1-3]. With the proliferation of multi-omics datasets, a major challenge is to validate and integrate results into a biological understanding of cell-type organization. Here we generated transcriptomes and epigenomes from more than 500,000 individual cells in the mouse primary motor cortex, a structure that has an evolutionarily conserved role in locomotion. We developed computational and statistical methods to integrate multimodal data and quantitatively validate cell-type reproducibility. The resulting reference atlas, containing over 56 neuronal cell types that are highly replicable across analysis methods, sequencing technologies and modalities, is a comprehensive molecular and genomic account of the diverse neuronal and non-neuronal cell types in the mouse primary motor cortex. The atlas includes a population of excitatory neurons that resemble pyramidal cells in layer 4 in other cortical regions [4]. We further discovered thousands of concordant marker genes and gene regulatory elements for these cell types. Our results highlight the complex molecular regulation of cell types in the brain and will directly enable the design of reagents to target specific cell types in the mouse primary motor cortex for functional analysis.


Subject(s)
Epigenomics , Gene Expression Profiling , Motor Cortex/cytology , Neurons/classification , Single-Cell Analysis , Transcriptome , Animals , Atlases as Topic , Datasets as Topic , Epigenesis, Genetic , Female , Male , Mice , Motor Cortex/anatomy & histology , Neurons/cytology , Neurons/metabolism , Organ Specificity , Reproducibility of Results
8.
Database (Oxford) ; 2021, 2021 07 09.
Article in English | MEDLINE | ID: mdl-34244718

ABSTRACT

The Ontology for Biomedical Investigations (OBI) underwent a focused review of assay term annotations, logic and hierarchy with the goal of improving and standardizing these terms. As a result, inconsistencies in W3C Web Ontology Language (OWL) expressions were identified and corrected, and standardized design patterns and a formalized template to maintain them were developed. Here we describe this informative and productive process, along with the specific benefits and obstacles for OBI and the universal lessons for similar projects.


Subject(s)
Biological Ontologies , Language , Reference Standards
9.
Front Res Metr Anal ; 6: 674205, 2021.
Article in English | MEDLINE | ID: mdl-34327299

ABSTRACT

Analysis of high-throughput experiments in the life sciences frequently relies upon standardized information about genes, gene products, and other biological entities. To provide this information, expert curators are increasingly relying on text mining tools to identify, extract and harmonize statements from biomedical journal articles that discuss findings of interest. For determining reliability of the statements, curators need the evidence used by the authors to support their assertions. It is important to annotate the evidence directly used by authors to qualify their findings rather than simply annotating mentions of experimental methods without the context of what findings they support. Text mining tools require tuning and adaptation to achieve accurate performance. Many annotated corpora exist to enable developing and tuning text mining tools; however, none currently provides annotations of evidence based on the extensive and widely used Evidence and Conclusion Ontology. We present the ECO-CollecTF corpus, a novel, freely available, biomedical corpus of 84 documents that captures high-quality, evidence-based statements annotated with the Evidence and Conclusion Ontology.

10.
Nucleic Acids Res ; 49(D1): D734-D742, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33305317

ABSTRACT

The Human Microbiome Project (HMP) explored microbial communities of the human body in both healthy and disease states. Two phases of the HMP (HMP and iHMP) together generated >48TB of data (public and controlled access) from multiple, varied omics studies of both the microbiome and associated hosts. The Human Microbiome Project Data Coordination Center (HMPDACC) was established to provide a portal to access data and resources produced by the HMP. The HMPDACC provides a unified data repository, multi-faceted search functionality, analysis pipelines and standardized protocols to facilitate community use of HMP data. Recent efforts have been put toward making HMP data more findable, accessible, interoperable and reusable. HMPDACC resources are freely available at www.hmpdacc.org.


Subject(s)
Databases, Genetic , Microbiota , Humans , Internet , Search Engine
11.
Microorganisms ; 8(3), 2020 Feb 25.
Article in English | MEDLINE | ID: mdl-32106460

ABSTRACT

Despite significant interest and past work to elucidate the phylogeny and photochemistry of species of the Heliobacteriaceae, genomic analyses of heliobacteria to date have been limited to just one published genome, that of the thermophilic species Heliobacterium (Hbt.) modesticaldum str. Ice1T. Here we present an analysis of the complete genome of a second heliobacterium, Heliorestis (Hrs.) convoluta str. HHT, an alkaliphilic, mesophilic, and morphologically distinct heliobacterium isolated from an Egyptian soda lake. The genome of Hrs. convoluta is a single circular chromosome of 3.22 Mb with a GC content of 43.1% and 3263 protein-encoding genes. In addition to culture-based observations and insights gleaned from the Hbt. modesticaldum genome, an analysis of enzyme-encoding genes from key metabolic pathways supports an obligately photoheterotrophic lifestyle for Hrs. convoluta. A complete set of genes encoding enzymes for propionate and butyrate catabolism and the absence of a gene encoding lactate dehydrogenase distinguish the carbon metabolism of Hrs. convoluta from that of its close relatives. Comparative analyses of key proteins in Hrs. convoluta, including cytochrome c553 and the Fo alpha subunit of ATP synthase, with those of related species reveal variations in specific amino acid residues that likely contribute to the success of Hrs. convoluta in its highly alkaline environment.

12.
J Biomed Semantics ; 10(1): 13, 2019 07 15.
Article in English | MEDLINE | ID: mdl-31307550

ABSTRACT

BACKGROUND: Microbial genetics has formed a foundation for understanding many aspects of biology. Systematic annotation that supports computational data mining should reveal further insights for microbes, microbiomes, and conserved functions beyond microbes. The Ontology of Microbial Phenotypes (OMP) was created to support such annotation. RESULTS: We define standards for an OMP-based annotation framework that supports the capture of a variety of phenotypes and provides flexibility for different levels of detail based on a combination of pre- and post-composition using OMP and other Open Biomedical Ontology (OBO) projects. A system for entering and viewing OMP annotations has been added to our online, public, web-based data portal. CONCLUSIONS: The annotation framework described here is ready to support projects to capture phenotypes from the experimental literature for a variety of microbes. Defining the OMP annotation standard should support the development of new software tools for data mining and analysis in comparative phenomics.


Subject(s)
Biological Ontologies , Data Curation/methods , Microbiology , Phenotype , Metadata
13.
Nucleic Acids Res ; 47(D1): D955-D962, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30407550

ABSTRACT

The Human Disease Ontology (DO) database (http://www.disease-ontology.org) has undergone significant expansion in the past three years. The DO disease classification includes specific formal semantic rules to express meaningful disease models and has expanded from a single asserted classification to include multiple-inferred mechanistic disease classifications, thus providing novel perspectives on related diseases. Expansion of disease terms, alternative anatomy, cell type and genetic disease classifications and workflow automation highlight the updates for the DO since 2015. The enhanced breadth and depth of the DO's knowledgebase have expanded the DO's utility for exploring the multi-etiology of human disease, thus improving the capture and communication of health-related data across biomedical databases, bioinformatics tools, and genomic and cancer resources, as demonstrated by a 6.6× growth in the DO's user community since 2015. The DO's continual integration of human disease knowledge, evidenced by the more than 200 SVN/GitHub releases/revisions since previously reported in our DO 2015 NAR paper, includes the addition of 2650 new disease terms, a 30% increase of textual definitions, and an expanding suite of disease classification hierarchies constructed through defined logical axioms.


Subject(s)
Biological Ontologies , Databases, Factual , Disease , Disease/classification , Disease/etiology , Humans , Workflow
14.
Nucleic Acids Res ; 47(D1): D1186-D1194, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30407590

ABSTRACT

The Evidence and Conclusion Ontology (ECO) contains terms (classes) that describe types of evidence and assertion methods. ECO terms are used in the process of biocuration to capture the evidence that supports biological assertions (e.g. gene product X has function Y as supported by evidence Z). Capture of this information allows tracking of annotation provenance, establishment of quality control measures and query of evidence. ECO contains over 1500 terms and is in use by many leading biological resources including the Gene Ontology, UniProt and several model organism databases. ECO is continually being expanded and revised based on the needs of the biocuration community. The ontology is freely available for download from GitHub (https://github.com/evidenceontology/) or the project's website (http://evidenceontology.org/). Users can request new terms or changes to existing terms through the project's GitHub site. ECO is released into the public domain under CC0 1.0 Universal.


Subject(s)
Computational Biology/methods , Databases, Genetic , Gene Ontology , Proteins/genetics , Animals , Humans , Information Storage and Retrieval/methods , Internet , Proteins/metabolism , Sequence Analysis, Protein , User-Computer Interface
15.
Genes (Basel) ; 9(2), 2018 Jan 24.
Article in English | MEDLINE | ID: mdl-29364862

ABSTRACT

Rhizobium leguminosarum bv. viciae is a soil α-proteobacterium that establishes a diazotrophic symbiosis with different legumes of the Fabeae tribe. The number of genome sequences from rhizobial strains available in public databases is constantly increasing, although complete, fully annotated genome structures from rhizobial genomes are scarce. In this work, we report and analyse the complete genome of R. leguminosarum bv. viciae UPM791. Whole genome sequencing can provide new insights into the genetic features contributing to symbiotically relevant processes such as bacterial adaptation to the rhizosphere, mechanisms for efficient competition with other bacteria, and the ability to establish a complex signalling dialogue with legumes, to enter the root without triggering plant defenses, and, ultimately, to fix nitrogen within the host. Comparison of the complete genome sequences of two strains of R. leguminosarum bv. viciae, 3841 and UPM791, highlights the existence of different symbiotic plasmids and a common core chromosome. Specific genomic traits, such as plasmid content or a distinctive regulation, define differential physiological capabilities of these endosymbionts. Among them, strain UPM791 presents unique adaptations for recycling the hydrogen generated in the nitrogen fixation process.

17.
Nature ; 550(7674): 61-66, 2017 10 05.
Article in English | MEDLINE | ID: mdl-28953883

ABSTRACT

The characterization of baseline microbial and functional diversity in the human microbiome has enabled studies of microbiome-related disease, diversity, biogeography, and molecular function. The National Institutes of Health Human Microbiome Project has provided one of the broadest such characterizations so far. Here we introduce a second wave of data from the study, comprising 1,631 new metagenomes (2,355 total) targeting diverse body sites with multiple time points in 265 individuals. We applied updated profiling and assembly methods to provide new characterizations of microbiome personalization. Strain identification revealed subspecies clades specific to body sites; it also quantified species with phylogenetic diversity under-represented in isolate genomes. Body-wide functional profiling classified pathways into universal, human-enriched, and body site-enriched subsets. Finally, temporal analysis decomposed microbial variation into rapidly variable, moderately variable, and stable subsets. This study furthers our knowledge of baseline human microbial diversity and enables an understanding of personalized microbiome function and dynamics.


Subject(s)
Microbiota/physiology , Phylogeny , Datasets as Topic , Humans , Metagenome/genetics , Metagenome/physiology , Microbiota/genetics , Molecular Sequence Annotation , National Institutes of Health (U.S.) , Organ Specificity , Spatio-Temporal Analysis , Time Factors , United States
18.
Methods Mol Biol ; 1446: 245-259, 2017.
Article in English | MEDLINE | ID: mdl-27812948

ABSTRACT

The Evidence and Conclusion Ontology (ECO) is a community resource for describing the various types of evidence that are generated during the course of a scientific study and which are typically used to support assertions made by researchers. ECO describes multiple evidence types, including evidence resulting from experimental (i.e., wet lab) techniques, evidence arising from computational methods, statements made by authors (whether or not supported by evidence), and inferences drawn by researchers curating the literature. In addition to summarizing the evidence that supports a particular assertion, ECO also offers a means to document whether a computer or a human performed the process of making the annotation. Incorporating ECO into an annotation system makes it possible to leverage the structure of the ontology such that associated data can be grouped hierarchically, users can select data associated with particular evidence types, and quality control pipelines can be optimized. Today, over 30 resources, including the Gene Ontology, use the Evidence and Conclusion Ontology to represent both evidence and how annotations are made.


Subject(s)
Gene Ontology , Molecular Sequence Annotation/methods , Animals , Computational Biology/methods , Data Curation/methods , Databases, Genetic , Humans , Internet , Software
19.
Database (Oxford) ; 2015: bav043, 2015.
Article in English | MEDLINE | ID: mdl-25957950

ABSTRACT

Biocuration has become a cornerstone for analyses in biology, and to meet needs, the number of annotations has grown considerably in recent years. However, the reliability of these annotations varies; it has thus become necessary to be able to assess the confidence in annotations. Although several resources already provide confidence information about the annotations that they produce, a standard way of providing such information has yet to be defined. This lack of standardization undermines the propagation of knowledge across resources, as well as the credibility of results from high-throughput analyses. Seeded at a workshop during the Biocuration 2012 conference, a working group has been created to address this problem. We present here the elements that were identified as essential for assessing confidence in annotations, as well as a draft ontology, the Confidence Information Ontology, to illustrate how the problems identified could be addressed. We hope that this effort will provide a home for discussing this major issue among the biocuration community. Tracker URL: https://github.com/BgeeDB/confidence-information-ontology Ontology URL: https://raw.githubusercontent.com/BgeeDB/confidence-information-ontology/master/src/ontology/cio-simple.obo


Subject(s)
Biological Ontologies , Data Curation/standards , Congresses as Topic
20.
BMC Microbiol ; 14: 294, 2014 Nov 30.
Article in English | MEDLINE | ID: mdl-25433798

ABSTRACT

BACKGROUND: Phenotypic data are routinely used to elucidate gene function in organisms amenable to genetic manipulation. However, prior to this work, there was no generalizable system in place for the structured storage and retrieval of phenotypic information for bacteria. RESULTS: The Ontology of Microbial Phenotypes (OMP) has been created to standardize the capture of such phenotypic information from microbes. OMP has been built on the foundations of the Basic Formal Ontology and the Phenotype and Trait Ontology. Terms have logical definitions that can facilitate computational searching of phenotypes and their associated genes. OMP can be accessed via a wiki page as well as downloaded from SourceForge. Initial annotations with OMP are being made for Escherichia coli using a wiki-based annotation capture system. New OMP terms are being concurrently developed as annotation proceeds. CONCLUSIONS: We anticipate that diverse groups studying microbial genetics and associated phenotypes will employ OMP for standardizing microbial phenotype annotation, much as the Gene Ontology has standardized gene product annotation. The resulting OMP resource and associated annotations will facilitate prediction of phenotypes for unknown genes and result in new experimental characterization of phenotypes and functions.


Subject(s)
Bacterial Physiological Phenomena , Computational Biology/methods , Software , Phenotype