RESUMO
Scalable technologies to sequence the transcriptomes and epigenomes of single cells are transforming our understanding of cell types and cell states. The Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative Cell Census Network (BICCN) is applying these technologies at unprecedented scale to map the cell types in the mammalian brain. In an effort to increase data FAIRness (Findable, Accessible, Interoperable, Reusable), the NIH has established repositories to make data generated by the BICCN and related BRAIN Initiative projects accessible to the broader research community. Here, we describe the Neuroscience Multi-Omic Archive (NeMO Archive; nemoarchive.org), which serves as the primary repository for genomics data from the BRAIN Initiative. Working closely with other BRAIN Initiative researchers, we have organized these data into a continually expanding, curated repository, which contains transcriptomic and epigenomic data from over 50 million brain cells, including single-cell genomic data from all of the major regions of the adult and prenatal human and mouse brains, as well as substantial single-cell genomic data from non-human primates. We make available several tools for accessing these data, including a searchable web portal, a cloud-computing interface for large-scale data processing (implemented on Terra, terra.bio), and a visualization and analysis platform, NeMO Analytics (nemoanalytics.org).
Assuntos
Encéfalo , Bases de Dados Genéticas , Epigenômica , Multiômica , Transcriptoma , Animais , Camundongos , Genômica , Mamíferos , Primatas , Encéfalo/citologia , Encéfalo/metabolismoRESUMO
This corrects the article DOI: 10.1038/nature23889.
RESUMO
The characterization of baseline microbial and functional diversity in the human microbiome has enabled studies of microbiome-related disease, diversity, biogeography, and molecular function. The National Institutes of Health Human Microbiome Project has provided one of the broadest such characterizations so far. Here we introduce a second wave of data from the study, comprising 1,631 new metagenomes (2,355 total) targeting diverse body sites with multiple time points in 265 individuals. We applied updated profiling and assembly methods to provide new characterizations of microbiome personalization. Strain identification revealed subspecies clades specific to body sites; it also quantified species with phylogenetic diversity under-represented in isolate genomes. Body-wide functional profiling classified pathways into universal, human-enriched, and body site-enriched subsets. Finally, temporal analysis decomposed microbial variation into rapidly variable, moderately variable, and stable subsets. This study furthers our knowledge of baseline human microbial diversity and enables an understanding of personalized microbiome function and dynamics.
Assuntos
Microbiota/fisiologia , Filogenia , Conjuntos de Dados como Assunto , Humanos , Metagenoma/genética , Metagenoma/fisiologia , Microbiota/genética , Anotação de Sequência Molecular , National Institutes of Health (U.S.) , Especificidade de Órgãos , Análise Espaço-Temporal , Fatores de Tempo , Estados UnidosRESUMO
The NCI Cancer Research Data Commons (CRDC) is a collection of data commons, analysis platforms, and tools that make existing cancer data more findable and accessible by the cancer research community. In practice, the two biggest hurdles to finding and using data for discovery are the wide variety of models and ontologies used to describe data, and the dispersed storage of that data. Here, we outline core CRDC services to aggregate descriptive information from multiple studies for findability via a single interface and to provide a single access method that spans multiple data commons. See related articles by Wang et al., p. 1388, Pot et al., p. 1396, and Kim et al., p. 1404.
Assuntos
National Cancer Institute (U.S.) , Neoplasias , Humanos , Estados Unidos , Neoplasias/terapia , Pesquisa Biomédica/normas , Bases de Dados FactuaisRESUMO
Many datasets are being produced by consortia that seek to characterize healthy and disease tissues at single-cell resolution. While biospecimen and experimental information is often captured, detailed metadata standards related to data matrices and analysis workflows are currently lacking. To address this, we develop the matrix and analysis metadata standards (MAMS) to serve as a resource for data centers, repositories, and tool developers. We define metadata fields for matrices and parameters commonly utilized in analytical workflows and developed the rmams package to extract MAMS from single-cell objects. Overall, MAMS promotes the harmonization, integration, and reproducibility of single-cell data across platforms.
Assuntos
Metadados , Análise de Célula Única , Análise de Célula Única/métodos , Análise de Célula Única/normas , Reprodutibilidade dos Testes , Humanos , SoftwareRESUMO
A large number of genomic and imaging datasets are being produced by consortia that seek to characterize healthy and disease tissues at single-cell resolution. While much effort has been devoted to capturing information related to biospecimen information and experimental procedures, the metadata standards that describe data matrices and the analysis workflows that produced them are relatively lacking. Detailed metadata schema related to data analysis are needed to facilitate sharing and interoperability across groups and to promote data provenance for reproducibility. To address this need, we developed the Matrix and Analysis Metadata Standards (MAMS) to serve as a resource for data coordinating centers and tool developers. We first curated several simple and complex "use cases" to characterize the types of feature-observation matrices (FOMs), annotations, and analysis metadata produced in different workflows. Based on these use cases, metadata fields were defined to describe the data contained within each matrix including those related to processing, modality, and subsets. Suggested terms were created for the majority of fields to aid in harmonization of metadata terms across groups. Additional provenance metadata fields were also defined to describe the software and workflows that produced each FOM. Finally, we developed a simple list-like schema that can be used to store MAMS information and implemented in multiple formats. Overall, MAMS can be used as a guide to harmonize analysis-related metadata which will ultimately facilitate integration of datasets across tools and consortia. MAMS specifications, use cases, and examples can be found at https://github.com/single-cell-mams/mams/.
RESUMO
The Common Fund Data Ecosystem (CFDE) has created a flexible system of data federation that enables researchers to discover datasets from across the US National Institutes of Health Common Fund without requiring that data owners move, reformat, or rehost those data. This system is centered on a catalog that integrates detailed descriptions of biomedical datasets from individual Common Fund Programs' Data Coordination Centers (DCCs) into a uniform metadata model that can then be indexed and searched from a centralized portal. This Crosscut Metadata Model (C2M2) supports the wide variety of data types and metadata terms used by individual DCCs and can readily describe nearly all forms of biomedical research data. We detail its use to ingest and index data from 11 DCCs.
Assuntos
Ecossistema , Administração Financeira , MetadadosRESUMO
The Institute for Genome Sciences (IGS) has developed a prokaryotic annotation pipeline that is used for coding gene/RNA prediction and functional annotation of Bacteria and Archaea. The fully automated pipeline accepts one or many genomic sequences as input and produces output in a variety of standard formats. Functional annotation is primarily based on similarity searches and motif finding combined with a hierarchical rule based annotation system. The output annotations can also be loaded into a relational database and accessed through visualization tools.