1 - 20 of 23
1.
Genetics ; 224(1), 2023 05 04.
Article En | MEDLINE | ID: mdl-36866529

The Gene Ontology (GO) knowledgebase (http://geneontology.org) is a comprehensive resource concerning the functions of genes and gene products (proteins and noncoding RNAs). GO annotations cover genes from organisms across the tree of life as well as viruses, though most gene function knowledge currently derives from experiments carried out in a relatively small number of model organisms. Here, we provide an updated overview of the GO knowledgebase, as well as the efforts of the broad, international consortium of scientists that develops, maintains, and updates the GO knowledgebase. The GO knowledgebase consists of three components: (1) the GO, a computational knowledge structure describing the functional characteristics of genes; (2) GO annotations, evidence-supported statements asserting that a specific gene product has a particular functional characteristic; and (3) GO Causal Activity Models (GO-CAMs), mechanistic models of molecular "pathways" (GO biological processes) created by linking multiple GO annotations using defined relations. Each of these components is continually expanded, revised, and updated in response to newly published discoveries and receives extensive QA checks, reviews, and user feedback. For each of these components, we provide a description of the current contents, recent developments to keep the knowledgebase up to date with new discoveries, and guidance on how users can best make use of the data that we provide. We conclude with future directions for the project.


Databases, Genetic , Proteins , Gene Ontology , Proteins/genetics , Molecular Sequence Annotation , Computational Biology
2.
Nucleic Acids Res ; 51(D1): D1075-D1085, 2023 01 06.
Article En | MEDLINE | ID: mdl-36318260

Scalable technologies to sequence the transcriptomes and epigenomes of single cells are transforming our understanding of cell types and cell states. The Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative Cell Census Network (BICCN) is applying these technologies at unprecedented scale to map the cell types in the mammalian brain. In an effort to increase data FAIRness (Findable, Accessible, Interoperable, Reusable), the NIH has established repositories to make data generated by the BICCN and related BRAIN Initiative projects accessible to the broader research community. Here, we describe the Neuroscience Multi-Omic Archive (NeMO Archive; nemoarchive.org), which serves as the primary repository for genomics data from the BRAIN Initiative. Working closely with other BRAIN Initiative researchers, we have organized these data into a continually expanding, curated repository, which contains transcriptomic and epigenomic data from over 50 million brain cells, including single-cell genomic data from all of the major regions of the adult and prenatal human and mouse brains, as well as substantial single-cell genomic data from non-human primates. We make available several tools for accessing these data, including a searchable web portal, a cloud-computing interface for large-scale data processing (implemented on Terra, terra.bio), and a visualization and analysis platform, NeMO Analytics (nemoanalytics.org).


Brain , Databases, Genetic , Epigenomics , Multiomics , Transcriptome , Animals , Mice , Genomics , Mammals , Primates , Brain/cytology , Brain/metabolism
3.
Gigascience ; 11, 2022 11 21.
Article En | MEDLINE | ID: mdl-36409836

The Common Fund Data Ecosystem (CFDE) has created a flexible system of data federation that enables researchers to discover datasets from across the US National Institutes of Health Common Fund without requiring that data owners move, reformat, or rehost those data. This system is centered on a catalog that integrates detailed descriptions of biomedical datasets from individual Common Fund Programs' Data Coordination Centers (DCCs) into a uniform metadata model that can then be indexed and searched from a centralized portal. This Crosscut Metadata Model (C2M2) supports the wide variety of data types and metadata terms used by individual DCCs and can readily describe nearly all forms of biomedical research data. We detail its use to ingest and index data from 11 DCCs.


Ecosystem , Financial Management , Metadata
4.
Nucleic Acids Res ; 50(D1): D1515-D1521, 2022 01 07.
Article En | MEDLINE | ID: mdl-34986598

The Evidence and Conclusion Ontology (ECO) is a community resource that provides an ontology of terms used to capture the type of evidence that supports biomedical annotations and assertions. Consistent capture of evidence information with ECO allows tracking of annotation provenance, establishment of quality control measures, and evidence-based data mining. ECO is in use by dozens of data repositories and resources with both specific and general areas of focus. ECO is continually being expanded and enhanced in response to user requests as well as our aim to adhere to community best practices for ontology development. The ECO support team engages in multiple collaborations with other ontologies and annotating groups. Here we report on recent updates to the ECO ontology itself as well as associated resources that are available through this project. ECO project products are freely available for download from the project website (https://evidenceontology.org/) and GitHub (https://github.com/evidenceontology/evidenceontology). ECO is released into the public domain under a CC0 1.0 Universal license.


Computational Biology/standards , Databases, Genetic , Gene Ontology , Software , Humans , Molecular Sequence Annotation
5.
Nucleic Acids Res ; 50(D1): D480-D487, 2022 01 07.
Article En | MEDLINE | ID: mdl-34850135

The Database of Intrinsically Disordered Proteins (DisProt, URL: https://disprot.org) is the major repository of manually curated annotations of intrinsically disordered proteins and regions from the literature. We report here recent updates of DisProt version 9, including a restyled web interface, a refactored Intrinsically Disordered Proteins Ontology (IDPO), improvements in the curation process, and significant content growth of around 30%. Higher annotation quality and consistency are provided by a newly implemented review process and curator training. The increased curation capacity is fostered by the integration of DisProt with APICURON, a dedicated resource for the proper attribution and recognition of biocuration efforts. Better interoperability is provided through the adoption of the Minimum Information About Disorder (MIADE) standard, an active collaboration with the Gene Ontology (GO) and Evidence and Conclusion Ontology (ECO) consortia, and the support of the ELIXIR infrastructure.


Databases, Protein , Intrinsically Disordered Proteins/metabolism , Molecular Sequence Annotation , Software , Amino Acid Sequence , DNA/genetics , DNA/metabolism , Datasets as Topic , Gene Ontology , Humans , Internet , Intrinsically Disordered Proteins/chemistry , Intrinsically Disordered Proteins/genetics , Protein Binding , RNA/genetics , RNA/metabolism
6.
Front Res Metr Anal ; 6: 674205, 2021.
Article En | MEDLINE | ID: mdl-34327299

Analysis of high-throughput experiments in the life sciences frequently relies upon standardized information about genes, gene products, and other biological entities. To provide this information, expert curators increasingly rely on text mining tools to identify, extract, and harmonize statements from biomedical journal articles that discuss findings of interest. To determine the reliability of these statements, curators need the evidence the authors used to support their assertions. It is important to annotate the evidence that authors directly use to qualify their findings, rather than simply annotating mentions of experimental methods without the context of the findings they support. Text mining tools require tuning and adaptation to perform accurately. Many annotated corpora exist to support the development and tuning of text mining tools; however, none currently provides evidence annotations based on the extensive and widely used Evidence and Conclusion Ontology. We present the ECO-CollecTF corpus, a novel, freely available biomedical corpus of 84 documents that captures high-quality, evidence-based statements annotated with the Evidence and Conclusion Ontology.

7.
Microbiol Resour Announc ; 10(23): e0045221, 2021 Jun 10.
Article En | MEDLINE | ID: mdl-34110239

Neisseria musculi is an oral commensal of wild-caught mice. Here, we report the complete genome sequence of N. musculi strain NW831, generated using a combination of the Illumina and PacBio platforms.

8.
Microbiol Resour Announc ; 9(30), 2020 Jul 23.
Article En | MEDLINE | ID: mdl-32703831

The 13,647-bp complete mitochondrial genome of Mansonella perstans was sequenced and is syntenic to the mitochondrial genome of Mansonella ozzardi. Phylogenetic analysis of the mitochondrial genome is consistent with the known phylogeny of ONC5 group filarial nematodes.

9.
Microbiol Resour Announc ; 9(27), 2020 Jul 02.
Article En | MEDLINE | ID: mdl-32616635

Brugia pahangi is a zoonotic parasite that is closely related to human-infecting filarial nematodes. Here, we report the nearly complete genome of Brugia pahangi, including assemblies of four autosomes and an X chromosome, with only seven gaps. The Y chromosome is still not completely assembled.

10.
Microbiol Resour Announc ; 9(27), 2020 Jul 02.
Article En | MEDLINE | ID: mdl-32616636

Lymphatic filariasis is a devastating disease caused by filarial nematode roundworms, which contain obligate Wolbachia endosymbionts. Here, we assembled the genome of wBp, the Wolbachia endosymbiont of the filarial nematode Brugia pahangi, from Illumina, Pacific Biosciences, and Oxford Nanopore data. The complete, circular genome is 1,072,967 bp.

11.
Microorganisms ; 8(3), 2020 Feb 25.
Article En | MEDLINE | ID: mdl-32106460

Despite significant interest and past work to elucidate the phylogeny and photochemistry of species of the Heliobacteriaceae, genomic analyses of heliobacteria to date have been limited to just one published genome, that of the thermophilic species Heliobacterium (Hbt.) modesticaldum str. Ice1T. Here we present an analysis of the complete genome of a second heliobacterium, Heliorestis (Hrs.) convoluta str. HHT, an alkaliphilic, mesophilic, and morphologically distinct heliobacterium isolated from an Egyptian soda lake. The genome of Hrs. convoluta is a single circular chromosome of 3.22 Mb with a GC content of 43.1% and 3263 protein-encoding genes. In addition to culture-based observations and insights gleaned from the Hbt. modesticaldum genome, an analysis of enzyme-encoding genes from key metabolic pathways supports an obligately photoheterotrophic lifestyle for Hrs. convoluta. A complete set of genes encoding enzymes for propionate and butyrate catabolism and the absence of a gene encoding lactate dehydrogenase distinguish the carbon metabolism of Hrs. convoluta from that of its close relatives. Comparative analyses of key proteins in Hrs. convoluta, including cytochrome c553 and the Fo alpha subunit of ATP synthase, with those of related species reveal variations in specific amino acid residues that likely contribute to the success of Hrs. convoluta in its highly alkaline environment.

12.
Microbiol Resour Announc ; 8(43), 2019 Oct 24.
Article En | MEDLINE | ID: mdl-31649084

Here, we present the complete genome sequence of the Wolbachia endosymbiont wAna, isolated from Drosophila ananassae and derived from Oxford Nanopore and Illumina sequencing. We anticipate that this will aid in Wolbachia comparative genomics and the assembly of D. ananassae specifically in regions containing extensive lateral gene transfer events.

13.
Nat Commun ; 10(1): 3313, 2019 07 25.
Article En | MEDLINE | ID: mdl-31346170

The FDA proactively invests in tools to support innovation in emerging technologies, such as infectious disease next generation sequencing (ID-NGS). Here, we introduce FDA-ARGOS quality-controlled reference genomes as a public database for diagnostic purposes and demonstrate its utility in two use cases. We provide quality control metrics for the FDA-ARGOS genomic database resource and outline the need to fill genome quality gaps in the public domain. In the first use case, we show more accurate microbial identification of Enterococcus avium from metagenomic samples with FDA-ARGOS reference genomes compared to non-curated GenBank genomes. In the second use case, we demonstrate the utility of FDA-ARGOS reference genomes for Ebola virus target sequence comparison as part of a composite validation strategy for ID-NGS diagnostic tests. The use of FDA-ARGOS as an in silico target sequence comparator tool combined with representative clinical testing could reduce the burden of completing ID-NGS clinical trials.


Communicable Diseases/diagnosis , Databases, Nucleic Acid/standards , Genome , Access to Information , Communicable Diseases/microbiology , Databases, Nucleic Acid/organization & administration , High-Throughput Nucleotide Sequencing , Humans , United States , United States Food and Drug Administration
14.
Nucleic Acids Res ; 47(D1): D1186-D1194, 2019 01 08.
Article En | MEDLINE | ID: mdl-30407590

The Evidence and Conclusion Ontology (ECO) contains terms (classes) that describe types of evidence and assertion methods. ECO terms are used in the process of biocuration to capture the evidence that supports biological assertions (e.g. gene product X has function Y as supported by evidence Z). Capture of this information allows tracking of annotation provenance, establishment of quality control measures and query of evidence. ECO contains over 1500 terms and is in use by many leading biological resources including the Gene Ontology, UniProt and several model organism databases. ECO is continually being expanded and revised based on the needs of the biocuration community. The ontology is freely available for download from GitHub (https://github.com/evidenceontology/) or the project's website (http://evidenceontology.org/). Users can request new terms or changes to existing terms through the project's GitHub site. ECO is released into the public domain under CC0 1.0 Universal.


Computational Biology/methods , Databases, Genetic , Gene Ontology , Proteins/genetics , Animals , Humans , Information Storage and Retrieval/methods , Internet , Proteins/metabolism , Sequence Analysis, Protein , User-Computer Interface
15.
Article En | MEDLINE | ID: mdl-30533624

Erwinia dacicola is a dominant endosymbiont of the pestiferous olive fly. Its genome is similar in size and GC content to those of free-living Erwinia species, including the plant pathogen Erwinia amylovora. The E. dacicola genome encodes the metabolic capability to supplement and detoxify the olive fly's diet in larval and adult stages.

16.
Article En | MEDLINE | ID: mdl-30533936

Enterobacter sp. strain OLF colonizes laboratory-reared and wild individuals of the olive fruit fly Bactrocera oleae. The 5.07-Mbp genome sequence of Enterobacter sp. strain OLF encodes metabolic pathways that allow the bacterium to partially supplement the diet of the olive fly when its dominant endosymbiont, Erwinia dacicola, is absent.

17.
Genes (Basel) ; 9(2), 2018 Jan 24.
Article En | MEDLINE | ID: mdl-29364862

Rhizobium leguminosarum bv. viciae is a soil α-proteobacterium that establishes a diazotrophic symbiosis with different legumes of the Fabeae tribe. The number of genome sequences from rhizobial strains available in public databases is constantly increasing, although complete, fully annotated genome structures from rhizobial genomes are scarce. In this work, we report and analyse the complete genome of R. leguminosarum bv. viciae UPM791. Whole genome sequencing can provide new insights into the genetic features contributing to symbiotically relevant processes such as bacterial adaptation to the rhizosphere, mechanisms for efficient competition with other bacteria, and the ability to establish a complex signalling dialogue with legumes, to enter the root without triggering plant defenses, and, ultimately, to fix nitrogen within the host. Comparison of the complete genome sequences of two strains of R. leguminosarum bv. viciae, 3841 and UPM791, highlights the existence of different symbiotic plasmids and a common core chromosome. Specific genomic traits, such as plasmid content or a distinctive regulation, define differential physiological capabilities of these endosymbionts. Among them, strain UPM791 presents unique adaptations for recycling the hydrogen generated in the nitrogen fixation process.

18.
Genome Announc ; 5(40), 2017 Oct 05.
Article En | MEDLINE | ID: mdl-28983003

Here, we report the complete genome sequence of Bifidobacterium pseudolongum strain UMB-MBP-01, isolated from the feces of C57BL/6J mice. This strain was identified in microbiome profiling studies and associated with improved transplant outcome in a murine model of cardiac heterotypic transplantation.

19.
Stand Genomic Sci ; 10: 89, 2015.
Article En | MEDLINE | ID: mdl-26516405

Members of the 'Mycoplasma mycoides cluster' represent important livestock pathogens worldwide. Mycoplasma mycoides subsp. mycoides is the etiologic agent of contagious bovine pleuropneumonia (CBPP), which is still endemic in many parts of Africa. We report the genome sequences and annotation of two frequently used challenge strains of Mycoplasma mycoides subsp. mycoides, Afadé and B237. The information provided will enable downstream 'omics' applications such as proteomics, transcriptomics, and reverse vaccinology approaches. Despite the absence of Mycoplasma pneumoniae-like cytoadhesion-encoding genes, the two strains showed the presence of protrusions. This phenotype is likely encoded by another set of genes.

20.
Genome Announc ; 1(1)2013 Jan.
Article En | MEDLINE | ID: mdl-23469346

Members of the "Mycoplasma mycoides cluster" represent important livestock pathogens worldwide. We report the genome sequence of Mycoplasma feriruminatoris sp. nov., the closest relative to the "Mycoplasma mycoides cluster" and the fastest-growing Mycoplasma species described to date.

...