2.
Nature ; 583(7818): 744-751, 2020 07.
Article in English | MEDLINE | ID: mdl-32728240

ABSTRACT

The Encyclopedia of DNA Elements (ENCODE) project has established a genomic resource for mammalian development, profiling a diverse panel of mouse tissues at 8 developmental stages from 10.5 days after conception until birth, including transcriptomes, methylomes and chromatin states. Here we systematically examined the state and accessibility of chromatin in the developing mouse fetus. In total we performed 1,128 chromatin immunoprecipitation with sequencing (ChIP-seq) assays for histone modifications and 132 assay for transposase-accessible chromatin using sequencing (ATAC-seq) assays for chromatin accessibility across 72 distinct tissue-stages. We used integrative analysis to develop a unified set of chromatin state annotations, infer the identities of dynamic enhancers and key transcriptional regulators, and characterize the relationship between chromatin state and accessibility during developmental gene regulation. We also leveraged these data to link enhancers to putative target genes and demonstrate tissue-specific enrichments of sequence variants associated with disease in humans. The mouse ENCODE data sets provide a compendium of resources for biomedical researchers and achieve, to our knowledge, the most comprehensive view of chromatin dynamics during mammalian fetal development to date.


Subject(s)
Chromatin/genetics , Chromatin/metabolism , Datasets as Topic , Fetal Development/genetics , Histones/metabolism , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid/genetics , Animals , Chromatin/chemistry , Chromatin Immunoprecipitation Sequencing , Disease/genetics , Enhancer Elements, Genetic/genetics , Female , Gene Expression Regulation, Developmental/genetics , Genetic Variation , Histones/chemistry , Humans , Male , Mice , Mice, Inbred C57BL , Organ Specificity/genetics , Reproducibility of Results , Transposases/metabolism
3.
Mol Cell ; 61(6): 903-13, 2016 Mar 17.
Article in English | MEDLINE | ID: mdl-26990993

ABSTRACT

Transcriptome-wide maps of RNA binding protein (RBP)-RNA interactions by immunoprecipitation (IP)-based methods such as RNA IP (RIP) and crosslinking and IP (CLIP) are key starting points for evaluating the molecular roles of the thousands of human RBPs. A significant bottleneck to the application of these methods in diverse cell lines, tissues, and developmental stages is the availability of validated IP-quality antibodies. Using IP followed by immunoblot assays, we have developed a validated repository of 438 commercially available antibodies that interrogate 365 unique RBPs. In parallel, 362 short-hairpin RNA (shRNA) constructs against 276 unique RBPs were also used to confirm specificity of these antibodies. These antibodies can characterize subcellular RBP localization. With the burgeoning interest in the roles of RBPs in cancer, neurobiology, and development, these resources are invaluable to the broad scientific community. Detailed information about these resources is publicly available at the ENCODE portal (https://www.encodeproject.org/).


Subject(s)
Databases, Genetic , RNA-Binding Proteins/genetics , RNA/metabolism , Transcriptome/genetics , Binding Sites , Humans , Protein Binding , RNA/genetics , RNA, Small Interfering/classification , RNA, Small Interfering/genetics , RNA-Binding Proteins/metabolism
5.
Am J Hum Genet ; 102(3): 494-504, 2018 03 01.
Article in English | MEDLINE | ID: mdl-29478781

ABSTRACT

ATP synthase, H+ transporting, mitochondrial F1 complex, δ subunit (ATP5F1D; formerly ATP5D) is a subunit of mitochondrial ATP synthase and plays an important role in coupling proton translocation and ATP production. Here, we describe two individuals, each with homozygous missense variants in ATP5F1D, who presented with episodic lethargy, metabolic acidosis, 3-methylglutaconic aciduria, and hyperammonemia. Subject 1, homozygous for c.245C>T (p.Pro82Leu), presented with recurrent metabolic decompensation starting in the neonatal period, and subject 2, homozygous for c.317T>G (p.Val106Gly), presented with acute encephalopathy in childhood. Cultured skin fibroblasts from these individuals exhibited impaired assembly of F1FO ATP synthase and subsequent reduced complex V activity. Cells from subject 1 also exhibited a significant decrease in mitochondrial cristae. Knockdown of Drosophila ATPsynδ, the ATP5F1D homolog, in developing eyes and brains caused a near complete loss of the fly head, a phenotype that was fully rescued by wild-type human ATP5F1D. In contrast, expression of the ATP5F1D c.245C>T and c.317T>G variants rescued the head-size phenotype but recapitulated the eye and antennae defects seen in other genetic models of mitochondrial oxidative phosphorylation deficiency. Our data establish c.245C>T (p.Pro82Leu) and c.317T>G (p.Val106Gly) in ATP5F1D as pathogenic variants leading to a Mendelian mitochondrial disease featuring episodic metabolic decompensation.


Subject(s)
Alleles , Metabolic Diseases/genetics , Mitochondrial Proton-Translocating ATPases/genetics , Mutation/genetics , Protein Subunits/genetics , Amino Acid Sequence , Base Sequence , Child , Child, Preschool , Female , Humans , Infant , Infant, Newborn , Loss of Function Mutation/genetics , Male , Mitochondria/metabolism , Mitochondria/ultrastructure , Mitochondrial Proton-Translocating ATPases/chemistry , Protein Subunits/chemistry
6.
Nucleic Acids Res ; 46(D1): D794-D801, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29126249

ABSTRACT

The Encyclopedia of DNA Elements (ENCODE) Data Coordinating Center has developed the ENCODE Portal database and website as the source for the data and metadata generated by the ENCODE Consortium. Two principles have motivated the design. First, experimental protocols, analytical procedures and the data themselves should be made publicly accessible through a coherent, web-based search and download interface. Second, the same interface should serve carefully curated metadata that record the provenance of the data and justify its interpretation in biological terms. Since its initial release in 2013 and in response to recommendations from consortium members and the wider community of scientists who use the Portal to access ENCODE data, the Portal has been regularly updated to better reflect these design principles. Here we report on these updates, including results from new experiments, uniformly-processed data from other projects, new visualization tools and more comprehensive metadata to describe experiments and analyses. Additionally, the Portal is now home to meta(data) from related projects including Genomics of Gene Regulation, Roadmap Epigenome Project, Model organism ENCODE (modENCODE) and modERN. The Portal now makes available over 13000 datasets and their accompanying metadata and can be accessed at: https://www.encodeproject.org/.


Subject(s)
DNA/genetics , Databases, Genetic , Gene Components , Genomics , High-Throughput Nucleotide Sequencing , Metadata , Animals , Caenorhabditis elegans/genetics , Data Display , Datasets as Topic , Drosophila melanogaster/genetics , Forecasting , Genome, Human , Humans , Mice/genetics , User-Computer Interface
7.
Nucleic Acids Res ; 44(D1): D726-32, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26527727

ABSTRACT

The Encyclopedia of DNA Elements (ENCODE) Project is in its third phase of creating a comprehensive catalog of functional elements in the human genome. This phase of the project includes an expansion of assays that measure diverse RNA populations, identify proteins that interact with RNA and DNA, probe regions of DNA hypersensitivity, and measure levels of DNA methylation in a wide range of cell and tissue types to identify putative regulatory elements. To date, results for almost 5000 experiments have been released for use by the scientific community. These data are available for searching, visualization and download at the new ENCODE Portal (www.encodeproject.org). The revamped ENCODE Portal provides new ways to browse and search the ENCODE data based on the metadata that describe the assays as well as summaries of the assays that focus on data provenance. In addition, it is a flexible platform that allows integration of genomic data from multiple projects. The portal experience was designed to improve access to ENCODE data by relying on metadata that allow reusability and reproducibility of the experiments.


Subject(s)
Databases, Genetic , Genome, Human , Genomics , Animals , DNA/metabolism , Genes , Humans , Mice , Proteins/metabolism , RNA/metabolism
8.
Genes Dev ; 23(21): 2461-77, 2009 Nov 01.
Article in English | MEDLINE | ID: mdl-19884253

ABSTRACT

A great many cell types are necessary for the myriad capabilities of complex, multicellular organisms. One interesting aspect of this diversity of cell type is that many cells in diploid organisms are polyploid. This is called endopolyploidy and arises from cell cycles that are often characterized as "variant," but in fact are widespread throughout nature. Endopolyploidy is essential for normal development and physiology in many different organisms. Here we review how both plants and animals use variations of the cell cycle, termed collectively as endoreplication, resulting in polyploid cells that support specific aspects of development. In addition, we discuss briefly how endoreplication occurs in response to certain physiological stresses, and how it may contribute to the development of cancer. Finally, we describe the molecular mechanisms that support the onset and progression of endoreplication.


Subject(s)
Cell Cycle/physiology , DNA Replication/physiology , Polyploidy , Animals , Cell Cycle/genetics , Cell Differentiation , Cell Proliferation , DNA Replication/genetics , Humans , Neoplasms/pathology , Plant Cells , Plant Development , Stress, Physiological/physiology
9.
PLoS Genet ; 8(8): e1002831, 2012.
Article in English | MEDLINE | ID: mdl-22916021

ABSTRACT

Precise control of cell cycle regulators is critical for normal development and tissue homeostasis. E2F transcription factors are activated during G1 to drive the G1-S transition and are then inhibited during S phase by a variety of mechanisms. Here, we genetically manipulate the single Drosophila activator E2F (E2f1) to explore the developmental requirement for S phase-coupled E2F down-regulation. Expression of an E2f1 mutant that is not destroyed during S phase drives cell cycle progression and causes apoptosis. Interestingly, this apoptosis is not exclusively the result of inappropriate cell cycle progression, because a stable E2f1 mutant that cannot function as a transcription factor or drive cell cycle progression also triggers apoptosis. This observation suggests that the inappropriate presence of E2f1 protein during S phase can trigger apoptosis by mechanisms that are independent of E2F acting directly at target genes. The ability of S phase-stabilized E2f1 to trigger apoptosis requires an interaction between E2f1 and the Drosophila pRb homolog, Rbf1, and involves induction of the pro-apoptotic gene, hid. Simultaneously blocking E2f1 destruction during S phase and inhibiting the induction of apoptosis results in tissue overgrowth and lethality. We propose that inappropriate accumulation of E2f1 protein during S phase triggers the elimination of potentially hyperplastic cells via apoptosis in order to ensure normal development of rapidly proliferating tissues.


Subject(s)
Drosophila melanogaster/metabolism , E2F1 Transcription Factor/metabolism , Gene Expression Regulation, Developmental , Homeostasis/genetics , Larva/metabolism , Animals , Apoptosis/genetics , Cell Proliferation , DNA/biosynthesis , Drosophila Proteins/genetics , Drosophila Proteins/metabolism , Drosophila melanogaster/genetics , E2F1 Transcription Factor/genetics , G1 Phase/genetics , Larva/genetics , Mutation , Neuropeptides/genetics , Neuropeptides/metabolism , Proteolysis , Retinoblastoma Protein , S Phase/genetics , Transcription Factors/genetics , Transcription Factors/metabolism
10.
Microbiol Spectr ; : e0351423, 2024 Feb 09.
Article in English | MEDLINE | ID: mdl-38334378

ABSTRACT

Microbiomes have gained significant attention in ecological research, owing to their diverse interactions and essential roles within different organismal ecosystems. Microorganisms, such as bacteria, archaea, and viruses, have profound impact on host health, influencing digestion, metabolism, immune function, tissue development, and behavior. This study investigates the microbiome diversity and function of Kellet's whelk (Kelletia kelletii) perivitelline fluid (PVF), which sustains thousands of developing K. kelletii embryos within a polysaccharide and protein matrix. Our core microbiome analysis reveals a diverse range of bacteria, with the Roseobacter genus being the most abundant. Additionally, genes related to host-microbe interactions, symbiosis, and quorum sensing were detected, indicating a potential symbiotic relationship between the microbiome and Kellet's whelk embryos. Furthermore, the microbiome exhibits gene expression related to antibiotic biosynthesis, suggesting a defensive role against pathogenic bacteria and potential discovery of novel antibiotics. Overall, this study sheds light on the microbiome's role in Kellet's whelk development, emphasizing the significance of host-microbe interactions in vulnerable life history stages. To our knowledge, ours is the first study to use 16S sequencing coupled with RNA sequencing (RNA-seq) to profile the microbiome of an invertebrate PVF. IMPORTANCE: This study provides novel insight to an encapsulated system with strong evidence of symbiosis between the microbial inhabitants and developing host embryos. The Kellet's whelk perivitelline fluid (PVF) contains microbial organisms of interest that may be providing symbiotic functions and potential antimicrobial properties during this vulnerable life history stage. This study, the first to utilize a comprehensive approach to investigating Kellet's whelk PVF microbiome, couples 16S rRNA gene long-read sequencing with RNA-seq. This research contributes to and expands our knowledge on the roles of beneficial host-associated microbes.

11.
PeerJ ; 11: e16510, 2023.
Article in English | MEDLINE | ID: mdl-38077446

ABSTRACT

Next-generation sequencing technologies, such as Nanopore MinION, Illumina HiSeq and NovaSeq, and PacBio Sequel II, hold immense potential for advancing genomic research on non-model organisms, including the vast majority of marine species. However, application of these technologies to marine invertebrate species is often impeded by challenges in extracting and purifying their genomic DNA due to high polysaccharide content and other secondary metabolites. In this study, we help resolve this issue by developing and testing DNA extraction protocols for Kellet's whelk (Kelletia kelletii), a subtidal gastropod with ecological and commercial importance, by comparing four DNA extraction methods commonly used in marine invertebrate studies. In our comparison of extraction methods, the Salting Out protocol was the least expensive, produced the highest DNA yields, produced consistent high DNA quality, and had low toxicity. We validated the protocol using an independent set of tissue samples, then applied it to extract high-molecular-weight (HMW) DNA from over three thousand Kellet's whelk tissue samples. The protocol demonstrated scalability and, with added clean-up, suitability for RAD-seq, GT-seq, as well as whole genome sequencing using both long read (ONT MinION) and short read (Illumina NovaSeq) sequencing platforms. Our findings offer a robust and versatile DNA extraction and clean-up protocol for supporting genomic research on non-model marine organisms, to help mediate the under-representation of invertebrates in genomic studies.


Subject(s)
Gastropoda , Animals , Gastropoda/genetics , Genome/genetics , Genomics , DNA/genetics , Sequence Analysis, DNA/methods
12.
G3 (Bethesda) ; 13(10), 2023 09 30.
Article in English | MEDLINE | ID: mdl-37555394

ABSTRACT

Ascidians have the potential to reveal fundamental biological insights related to coloniality, regeneration, immune function, and the evolution of these traits. This study implements a hybrid assembly technique to produce a genome assembly and annotation for the botryllid ascidian, Botrylloides violaceus. A hybrid genome assembly was produced using Illumina, Inc. short and Oxford Nanopore Technologies long-read sequencing technologies. The resulting assembly is comprised of 831 contigs, has a total length of 121 Mbp, N50 of 1 Mbp, and a BUSCO score of 96.1%. Genome annotation identified 13 K protein-coding genes. Comparative genomic analysis with other tunicates reveals patterns of conservation and divergence within orthologous gene families even among closely related species. Characterization of the Wnt gene family, encoding signaling ligands involved in development and regeneration, reveals conserved patterns of subfamily presence and gene copy number among botryllids. This supports the use of genomic data from nonmodel organisms in the investigation of biological phenomena.


Subject(s)
Urochordata , Animals , Urochordata/genetics , Genomics/methods , Genome , Gene Dosage , High-Throughput Nucleotide Sequencing/methods , Molecular Sequence Annotation
13.
bioRxiv ; 2023 Apr 06.
Article in English | MEDLINE | ID: mdl-37066421

ABSTRACT

The Encyclopedia of DNA elements (ENCODE) project is a collaborative effort to create a comprehensive catalog of functional elements in the human genome. The current database comprises more than 19000 functional genomics experiments across more than 1000 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All experimental data, metadata, and associated computational analyses created by the ENCODE consortium are submitted to the Data Coordination Center (DCC) for validation, tracking, storage, and distribution to community resources and the scientific community. The ENCODE project has engineered and distributed uniform processing pipelines in order to promote data provenance and reproducibility as well as allow interoperability between genomic resources and other consortia. All data files, reference genome versions, software versions, and parameters used by the pipelines are captured and available via the ENCODE Portal. The pipeline code, developed using Docker and Workflow Description Language (WDL; https://openwdl.org/) is publicly available in GitHub, with images available on Dockerhub (https://hub.docker.com), enabling access to a diverse range of biomedical researchers. ENCODE pipelines maintained and used by the DCC can be installed to run on personal computers, local HPC clusters, or in cloud computing environments via Cromwell. Access to the pipelines and data via the cloud allows small labs the ability to use the data or software without access to institutional compute clusters. Standardization of the computational methodologies for analysis and quality control leads to comparable results from different ENCODE collections - a prerequisite for successful integrative analyses.

14.
Res Sq ; 2023 Jul 19.
Article in English | MEDLINE | ID: mdl-37503119

ABSTRACT

The Encyclopedia of DNA elements (ENCODE) project is a collaborative effort to create a comprehensive catalog of functional elements in the human genome. The current database comprises more than 19000 functional genomics experiments across more than 1000 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All experimental data, metadata, and associated computational analyses created by the ENCODE consortium are submitted to the Data Coordination Center (DCC) for validation, tracking, storage, and distribution to community resources and the scientific community. The ENCODE project has engineered and distributed uniform processing pipelines in order to promote data provenance and reproducibility as well as allow interoperability between genomic resources and other consortia. All data files, reference genome versions, software versions, and parameters used by the pipelines are captured and available via the ENCODE Portal. The pipeline code, developed using Docker and Workflow Description Language (WDL; https://openwdl.org/) is publicly available in GitHub, with images available on Dockerhub (https://hub.docker.com), enabling access to a diverse range of biomedical researchers. ENCODE pipelines maintained and used by the DCC can be installed to run on personal computers, local HPC clusters, or in cloud computing environments via Cromwell. Access to the pipelines and data via the cloud allows small labs the ability to use the data or software without access to institutional compute clusters. Standardization of the computational methodologies for analysis and quality control leads to comparable results from different ENCODE collections - a prerequisite for successful integrative analyses.

15.
Comput Biol Med ; 138: 104850, 2021 11.
Article in English | MEDLINE | ID: mdl-34536702

ABSTRACT

Deep learning neural networks have improved performance in many cancer informatics problems, including breast cancer subtype classification. However, many networks experience underspecification where multiple combinations of parameters achieve similar performance, both in training and validation. Additionally, certain parameter combinations may perform poorly when the test distribution differs from the training distribution. Embedding prior knowledge from the literature may address this issue by boosting predictive models that provide crucial, in-depth information about a given disease. Breast cancer research provides a wealth of such knowledge, particularly in the form of subtype biomarkers and genetic signatures. In this study, we draw on past research on breast cancer subtype biomarkers, label propagation, and neural graph machines to present a novel methodology for embedding knowledge into machine learning systems. We embed prior knowledge into the loss function in the form of inter-subject distances derived from a well-known published breast cancer signature. Our results show that this methodology reduces predictor variability on state-of-the-art deep learning architectures and increases predictor consistency leading to improved interpretation. We find that pathway enrichment analysis is more consistent after embedding knowledge. This novel method applies to a broad range of existing studies and predictive models. Our method moves the traditional synthesis of predictive models from an arbitrary assignment of weights to genes toward a more biologically meaningful approach of incorporating knowledge.


Subject(s)
Breast Neoplasms , Deep Learning , Breast Neoplasms/genetics , Female , Humans , Machine Learning , Neural Networks, Computer
16.
Nat Med ; 25(6): 911-919, 2019 06.
Article in English | MEDLINE | ID: mdl-31160820

ABSTRACT

It is estimated that 350 million individuals worldwide suffer from rare diseases, which are predominantly caused by mutation in a single gene. The current molecular diagnostic rate is estimated at 50%, with whole-exome sequencing (WES) among the most successful approaches. For patients in whom WES is uninformative, RNA sequencing (RNA-seq) has shown diagnostic utility in specific tissues and diseases. This includes muscle biopsies from patients with undiagnosed rare muscle disorders, and cultured fibroblasts from patients with mitochondrial disorders. However, for many individuals, biopsies are not performed for clinical care, and tissues are difficult to access. We sought to assess the utility of RNA-seq from blood as a diagnostic tool for rare diseases of different pathophysiologies. We generated whole-blood RNA-seq from 94 individuals with undiagnosed rare diseases spanning 16 diverse disease categories. We developed a robust approach to compare data from these individuals with large sets of RNA-seq data for controls (n = 1,594 unrelated controls and n = 49 family members) and demonstrated the impacts of expression, splicing, gene and variant filtering strategies on disease gene identification. Across our cohort, we observed that RNA-seq yields a 7.5% diagnostic rate, and an additional 16.7% with improved candidate gene resolution.


Subject(s)
Rare Diseases/genetics , Acid Ceramidase/genetics , Case-Control Studies , Child , Child, Preschool , Cohort Studies , Female , Genetic Variation , Humans , Male , Models, Genetic , Mutation , Oxidoreductases Acting on CH-CH Group Donors/genetics , Potassium Channels/genetics , RNA/blood , RNA/genetics , RNA Splicing/genetics , Rare Diseases/blood , Sequence Analysis, RNA , Exome Sequencing
17.
Med Educ ; 42(3): 286-93, 2008 Mar.
Article in English | MEDLINE | ID: mdl-18275416

ABSTRACT

OBJECTIVE: To determine whether graduate and non-graduate entrants to medical school differ in their views on the first year spent in medical practice as a pre-registration house officer. METHODS: We carried out postal questionnaire surveys of medical qualifiers of 1999, 2000 and 2002 from all UK medical schools, 1 year after qualification. The timing of the study slightly pre-dates the recent major expansion in graduate entry fast-track courses. RESULTS: Differences between graduate and non-graduate entrants were few and, even when statistically significant, were small in scale. Graduate entrants viewed their working hours, pay and living conditions at work, such as hospital accommodation and food, a little less favourably than did non-graduate entrants. Graduate entrants were also less satisfied than non-graduates with time available for family, social and recreational activities. However, graduate entrants were more likely than non-graduate entrants to feel positive about their future career prospects. There were no differences between graduate and non-graduate entrants in whether they felt they had been well prepared by their medical schools for the jobs they undertook as house officers. Levels of job satisfaction expressed by graduate and non-graduate entrants were similar, as were their responses to most other statements about attitudes to clinical work. CONCLUSIONS: 'Quality of life' issues, a sense of being fairly rewarded, and expectations about one's physical working environment seem a little more important to graduate than to non-graduate entrants. Apart from these, the findings suggest that graduate status, at entry to medical school, has no appreciable influence on attitudes to the work of a junior hospital doctor.


Subject(s)
Attitude of Health Personnel , Career Choice , Job Satisfaction , Medical Staff, Hospital/psychology , Students, Medical/psychology , Quality of Life , Schools, Medical , Surveys and Questionnaires
19.
PLoS One ; 12(4): e0175310, 2017.
Article in English | MEDLINE | ID: mdl-28403240

ABSTRACT

The Encyclopedia of DNA elements (ENCODE) project is an ongoing collaborative effort to create a comprehensive catalog of functional elements initiated shortly after the completion of the Human Genome Project. The current database exceeds 6500 experiments across more than 450 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the H. sapiens and M. musculus genomes. All ENCODE experimental data, metadata, and associated computational analyses are submitted to the ENCODE Data Coordination Center (DCC) for validation, tracking, storage, unified processing, and distribution to community resources and the scientific community. As the volume of data increases, the identification and organization of experimental details becomes increasingly intricate and demands careful curation. The ENCODE DCC has created a general purpose software system, known as SnoVault, that supports metadata and file submission, a database used for metadata storage, web pages for displaying the metadata and a robust API for querying the metadata. The software is fully open-source, code and installation instructions can be found at: http://github.com/ENCODE-DCC/snovault/ (for the generic database) and http://github.com/ENCODE-DCC/encoded/ to store genomic data in the manner of ENCODE. The core database engine, SnoVault (which is completely independent of ENCODE, genomic data, or bioinformatic data) has been released as a separate Python package.


Subject(s)
Databases, Genetic , Genomics/methods , Metadata , Software , Animals , DNA/genetics , Genome , Humans , Mice
20.
Article in English | MEDLINE | ID: mdl-26980513

ABSTRACT

The Encyclopedia of DNA Elements (ENCODE) Data Coordinating Center (DCC) is responsible for organizing, describing and providing access to the diverse data generated by the ENCODE project. The description of these data, known as metadata, includes the biological sample used as input, the protocols and assays performed on these samples, the data files generated from the results and the computational methods used to analyze the data. Here, we outline the principles and philosophy used to define the ENCODE metadata in order to create a metadata standard that can be applied to diverse assays and multiple genomic projects. In addition, we present how the data are validated and used by the ENCODE DCC in creating the ENCODE Portal (https://www.encodeproject.org/). Database URL: www.encodeproject.org.


Subject(s)
Computational Biology/methods , DNA/genetics , Databases, Genetic , Algorithms , Animals , Caenorhabditis elegans , Computational Biology/standards , Data Collection , Drosophila melanogaster , High-Throughput Nucleotide Sequencing , Humans , Mice , Nucleic Acids/genetics , Quality Control , Reproducibility of Results , Sequence Alignment