1.
Nature ; 583(7818): 744-751, 2020 07.
Article in English | MEDLINE | ID: mdl-32728240

ABSTRACT

The Encyclopedia of DNA Elements (ENCODE) project has established a genomic resource for mammalian development, profiling a diverse panel of mouse tissues at 8 developmental stages from 10.5 days after conception until birth, including transcriptomes, methylomes and chromatin states. Here we systematically examined the state and accessibility of chromatin in the developing mouse fetus. In total we performed 1,128 chromatin immunoprecipitation with sequencing (ChIP-seq) assays for histone modifications and 132 assay for transposase-accessible chromatin using sequencing (ATAC-seq) assays for chromatin accessibility across 72 distinct tissue-stages. We used integrative analysis to develop a unified set of chromatin state annotations, infer the identities of dynamic enhancers and key transcriptional regulators, and characterize the relationship between chromatin state and accessibility during developmental gene regulation. We also leveraged these data to link enhancers to putative target genes and demonstrate tissue-specific enrichments of sequence variants associated with disease in humans. The mouse ENCODE data sets provide a compendium of resources for biomedical researchers and achieve, to our knowledge, the most comprehensive view of chromatin dynamics during mammalian fetal development to date.


Subjects
Chromatin/genetics , Chromatin/metabolism , Datasets as Topic , Fetal Development/genetics , Histones/metabolism , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid/genetics , Animals , Chromatin/chemistry , Chromatin Immunoprecipitation Sequencing , Disease/genetics , Enhancer Elements, Genetic/genetics , Female , Gene Expression Regulation, Developmental/genetics , Genetic Variation , Histones/chemistry , Humans , Male , Mice , Mice, Inbred C57BL , Organ Specificity/genetics , Reproducibility of Results , Transposases/metabolism
3.
Mol Cell ; 61(6): 903-13, 2016 Mar 17.
Article in English | MEDLINE | ID: mdl-26990993

ABSTRACT

Transcriptome-wide maps of RNA binding protein (RBP)-RNA interactions by immunoprecipitation (IP)-based methods such as RNA IP (RIP) and crosslinking and IP (CLIP) are key starting points for evaluating the molecular roles of the thousands of human RBPs. A significant bottleneck to the application of these methods in diverse cell lines, tissues, and developmental stages is the availability of validated IP-quality antibodies. Using IP followed by immunoblot assays, we have developed a validated repository of 438 commercially available antibodies that interrogate 365 unique RBPs. In parallel, 362 short-hairpin RNA (shRNA) constructs against 276 unique RBPs were also used to confirm specificity of these antibodies. These antibodies can characterize subcellular RBP localization. With the burgeoning interest in the roles of RBPs in cancer, neurobiology, and development, these resources are invaluable to the broad scientific community. Detailed information about these resources is publicly available at the ENCODE portal (https://www.encodeproject.org/).


Subjects
Databases, Genetic , RNA-Binding Proteins/genetics , RNA/metabolism , Transcriptome/genetics , Binding Sites , Humans , Protein Binding , RNA/genetics , RNA, Small Interfering/classification , RNA, Small Interfering/genetics , RNA-Binding Proteins/metabolism
5.
Am J Hum Genet ; 102(3): 494-504, 2018 03 01.
Article in English | MEDLINE | ID: mdl-29478781

ABSTRACT

ATP synthase, H+ transporting, mitochondrial F1 complex, δ subunit (ATP5F1D; formerly ATP5D) is a subunit of mitochondrial ATP synthase and plays an important role in coupling proton translocation and ATP production. Here, we describe two individuals, each with homozygous missense variants in ATP5F1D, who presented with episodic lethargy, metabolic acidosis, 3-methylglutaconic aciduria, and hyperammonemia. Subject 1, homozygous for c.245C>T (p.Pro82Leu), presented with recurrent metabolic decompensation starting in the neonatal period, and subject 2, homozygous for c.317T>G (p.Val106Gly), presented with acute encephalopathy in childhood. Cultured skin fibroblasts from these individuals exhibited impaired assembly of F1FO ATP synthase and subsequent reduced complex V activity. Cells from subject 1 also exhibited a significant decrease in mitochondrial cristae. Knockdown of Drosophila ATPsynδ, the ATP5F1D homolog, in developing eyes and brains caused a near complete loss of the fly head, a phenotype that was fully rescued by wild-type human ATP5F1D. In contrast, expression of the ATP5F1D c.245C>T and c.317T>G variants rescued the head-size phenotype but recapitulated the eye and antennae defects seen in other genetic models of mitochondrial oxidative phosphorylation deficiency. Our data establish c.245C>T (p.Pro82Leu) and c.317T>G (p.Val106Gly) in ATP5F1D as pathogenic variants leading to a Mendelian mitochondrial disease featuring episodic metabolic decompensation.


Subjects
Alleles , Metabolic Diseases/genetics , Mitochondrial Proton-Translocating ATPases/genetics , Mutation/genetics , Protein Subunits/genetics , Amino Acid Sequence , Base Sequence , Child , Child, Preschool , Female , Humans , Infant , Infant, Newborn , Loss of Function Mutation/genetics , Male , Mitochondria/metabolism , Mitochondria/ultrastructure , Mitochondrial Proton-Translocating ATPases/chemistry , Protein Subunits/chemistry
6.
Nucleic Acids Res ; 46(D1): D794-D801, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29126249

ABSTRACT

The Encyclopedia of DNA Elements (ENCODE) Data Coordinating Center has developed the ENCODE Portal database and website as the source for the data and metadata generated by the ENCODE Consortium. Two principles have motivated the design. First, experimental protocols, analytical procedures and the data themselves should be made publicly accessible through a coherent, web-based search and download interface. Second, the same interface should serve carefully curated metadata that record the provenance of the data and justify its interpretation in biological terms. Since its initial release in 2013 and in response to recommendations from consortium members and the wider community of scientists who use the Portal to access ENCODE data, the Portal has been regularly updated to better reflect these design principles. Here we report on these updates, including results from new experiments, uniformly-processed data from other projects, new visualization tools and more comprehensive metadata to describe experiments and analyses. Additionally, the Portal is now home to (meta)data from related projects including Genomics of Gene Regulation, Roadmap Epigenome Project, Model organism ENCODE (modENCODE) and modERN. The Portal now makes available over 13000 datasets and their accompanying metadata and can be accessed at: https://www.encodeproject.org/.


Subjects
DNA/genetics , Databases, Genetic , Gene Components , Genomics , High-Throughput Nucleotide Sequencing , Metadata , Animals , Caenorhabditis elegans/genetics , Data Display , Datasets as Topic , Drosophila melanogaster/genetics , Forecasting , Genome, Human , Humans , Mice/genetics , User-Computer Interface
7.
Nucleic Acids Res ; 44(D1): D726-32, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26527727

ABSTRACT

The Encyclopedia of DNA Elements (ENCODE) Project is in its third phase of creating a comprehensive catalog of functional elements in the human genome. This phase of the project includes an expansion of assays that measure diverse RNA populations, identify proteins that interact with RNA and DNA, probe regions of DNA hypersensitivity, and measure levels of DNA methylation in a wide range of cell and tissue types to identify putative regulatory elements. To date, results for almost 5000 experiments have been released for use by the scientific community. These data are available for searching, visualization and download at the new ENCODE Portal (www.encodeproject.org). The revamped ENCODE Portal provides new ways to browse and search the ENCODE data based on the metadata that describe the assays as well as summaries of the assays that focus on data provenance. In addition, it is a flexible platform that allows integration of genomic data from multiple projects. The portal experience was designed to improve access to ENCODE data by relying on metadata that allow reusability and reproducibility of the experiments.


Subjects
Databases, Genetic , Genome, Human , Genomics , Animals , DNA/metabolism , Genes , Humans , Mice , Proteins/metabolism , RNA/metabolism
8.
Genes Dev ; 23(21): 2461-77, 2009 Nov 01.
Article in English | MEDLINE | ID: mdl-19884253

ABSTRACT

A great many cell types are necessary for the myriad capabilities of complex, multicellular organisms. One interesting aspect of this diversity of cell type is that many cells in diploid organisms are polyploid. This is called endopolyploidy and arises from cell cycles that are often characterized as "variant," but in fact are widespread throughout nature. Endopolyploidy is essential for normal development and physiology in many different organisms. Here we review how both plants and animals use variations of the cell cycle, termed collectively as endoreplication, resulting in polyploid cells that support specific aspects of development. In addition, we discuss briefly how endoreplication occurs in response to certain physiological stresses, and how it may contribute to the development of cancer. Finally, we describe the molecular mechanisms that support the onset and progression of endoreplication.


Subjects
Cell Cycle/physiology , DNA Replication/physiology , Polyploidy , Animals , Cell Cycle/genetics , Cell Differentiation , Cell Proliferation , DNA Replication/genetics , Humans , Neoplasms/pathology , Plant Cells , Plant Development , Stress, Physiological/physiology
9.
PLoS Genet ; 8(8): e1002831, 2012.
Article in English | MEDLINE | ID: mdl-22916021

ABSTRACT

Precise control of cell cycle regulators is critical for normal development and tissue homeostasis. E2F transcription factors are activated during G1 to drive the G1-S transition and are then inhibited during S phase by a variety of mechanisms. Here, we genetically manipulate the single Drosophila activator E2F (E2f1) to explore the developmental requirement for S phase-coupled E2F down-regulation. Expression of an E2f1 mutant that is not destroyed during S phase drives cell cycle progression and causes apoptosis. Interestingly, this apoptosis is not exclusively the result of inappropriate cell cycle progression, because a stable E2f1 mutant that cannot function as a transcription factor or drive cell cycle progression also triggers apoptosis. This observation suggests that the inappropriate presence of E2f1 protein during S phase can trigger apoptosis by mechanisms that are independent of E2F acting directly at target genes. The ability of S phase-stabilized E2f1 to trigger apoptosis requires an interaction between E2f1 and the Drosophila pRb homolog, Rbf1, and involves induction of the pro-apoptotic gene, hid. Simultaneously blocking E2f1 destruction during S phase and inhibiting the induction of apoptosis results in tissue overgrowth and lethality. We propose that inappropriate accumulation of E2f1 protein during S phase triggers the elimination of potentially hyperplastic cells via apoptosis in order to ensure normal development of rapidly proliferating tissues.


Subjects
Drosophila melanogaster/metabolism , E2F1 Transcription Factor/metabolism , Gene Expression Regulation, Developmental , Homeostasis/genetics , Larva/metabolism , Animals , Apoptosis/genetics , Cell Proliferation , DNA/biosynthesis , Drosophila Proteins/genetics , Drosophila Proteins/metabolism , Drosophila melanogaster/genetics , E2F1 Transcription Factor/genetics , G1 Phase/genetics , Larva/genetics , Mutation , Neuropeptides/genetics , Neuropeptides/metabolism , Proteolysis , Retinoblastoma Protein , S Phase/genetics , Transcription Factors/genetics , Transcription Factors/metabolism
10.
Microbiol Spectr ; : e0351423, 2024 Feb 09.
Article in English | MEDLINE | ID: mdl-38334378

ABSTRACT

Microbiomes have gained significant attention in ecological research, owing to their diverse interactions and essential roles within different organismal ecosystems. Microorganisms, such as bacteria, archaea, and viruses, have a profound impact on host health, influencing digestion, metabolism, immune function, tissue development, and behavior. This study investigates the microbiome diversity and function of Kellet's whelk (Kelletia kelletii) perivitelline fluid (PVF), which sustains thousands of developing K. kelletii embryos within a polysaccharide and protein matrix. Our core microbiome analysis reveals a diverse range of bacteria, with the Roseobacter genus being the most abundant. Additionally, genes related to host-microbe interactions, symbiosis, and quorum sensing were detected, indicating a potential symbiotic relationship between the microbiome and Kellet's whelk embryos. Furthermore, the microbiome exhibits gene expression related to antibiotic biosynthesis, suggesting a defensive role against pathogenic bacteria and potential discovery of novel antibiotics. Overall, this study sheds light on the microbiome's role in Kellet's whelk development, emphasizing the significance of host-microbe interactions in vulnerable life history stages. To our knowledge, ours is the first study to use 16S sequencing coupled with RNA sequencing (RNA-seq) to profile the microbiome of an invertebrate PVF.
IMPORTANCE: This study provides novel insight into an encapsulated system with strong evidence of symbiosis between the microbial inhabitants and developing host embryos. The Kellet's whelk perivitelline fluid (PVF) contains microbial organisms of interest that may be providing symbiotic functions and potential antimicrobial properties during this vulnerable life history stage. This study, the first to utilize a comprehensive approach to investigating Kellet's whelk PVF microbiome, couples 16S rRNA gene long-read sequencing with RNA-seq. This research contributes to and expands our knowledge on the roles of beneficial host-associated microbes.

11.
G3 (Bethesda) ; 13(10)2023 09 30.
Article in English | MEDLINE | ID: mdl-37555394

ABSTRACT

Ascidians have the potential to reveal fundamental biological insights related to coloniality, regeneration, immune function, and the evolution of these traits. This study implements a hybrid assembly technique to produce a genome assembly and annotation for the botryllid ascidian, Botrylloides violaceus. A hybrid genome assembly was produced using Illumina, Inc. short and Oxford Nanopore Technologies long-read sequencing technologies. The resulting assembly is comprised of 831 contigs, has a total length of 121 Mbp, N50 of 1 Mbp, and a BUSCO score of 96.1%. Genome annotation identified 13 K protein-coding genes. Comparative genomic analysis with other tunicates reveals patterns of conservation and divergence within orthologous gene families even among closely related species. Characterization of the Wnt gene family, encoding signaling ligands involved in development and regeneration, reveals conserved patterns of subfamily presence and gene copy number among botryllids. This supports the use of genomic data from nonmodel organisms in the investigation of biological phenomena.


Subjects
Urochordata , Animals , Urochordata/genetics , Genomics/methods , Genome , Gene Dosage , High-Throughput Nucleotide Sequencing/methods , Molecular Sequence Annotation
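
The assembly statistics quoted in the abstract above include an N50 of 1 Mbp. As a reading aid, here is a minimal sketch of how N50 is conventionally computed from contig lengths: sort the contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembly size. The function name `n50` and the example lengths are illustrative assumptions and do not come from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point halves.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (arbitrary units), not data from the paper.
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(example_lengths))
```

With the hypothetical lengths above the total is 54, and the three longest contigs (10, 9, 8) already sum to 27, so the function returns 8.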
12.
PeerJ ; 11: e16510, 2023.
Article in English | MEDLINE | ID: mdl-38077446

ABSTRACT

Next-generation sequencing technologies, such as Nanopore MinION, Illumina Hiseq and Novaseq, and PacBio Sequel II, hold immense potential for advancing genomic research on non-model organisms, including the vast majority of marine species. However, application of these technologies to marine invertebrate species is often impeded by challenges in extracting and purifying their genomic DNA due to high polysaccharide content and other secondary metabolites. In this study, we help resolve this issue by developing and testing DNA extraction protocols for Kellet's whelk (Kelletia kelletii), a subtidal gastropod with ecological and commercial importance, by comparing four DNA extraction methods commonly used in marine invertebrate studies. In our comparison of extraction methods, the Salting Out protocol was the least expensive, produced the highest DNA yields, produced consistent high DNA quality, and had low toxicity. We validated the protocol using an independent set of tissue samples, then applied it to extract high-molecular-weight (HMW) DNA from over three thousand Kellet's whelk tissue samples. The protocol demonstrated scalability and, with added clean-up, suitability for RAD-seq, GT-seq, as well as whole genome sequencing using both long read (ONT MinION) and short read (Illumina NovaSeq) sequencing platforms. Our findings offer a robust and versatile DNA extraction and clean-up protocol for supporting genomic research on non-model marine organisms, to help mediate the under-representation of invertebrates in genomic studies.


Subjects
Gastropoda , Animals , Gastropoda/genetics , Genome/genetics , Genomics , DNA/genetics , Sequence Analysis, DNA/methods
13.
Res Sq ; 2023 Jul 19.
Article in English | MEDLINE | ID: mdl-37503119

ABSTRACT

The Encyclopedia of DNA elements (ENCODE) project is a collaborative effort to create a comprehensive catalog of functional elements in the human genome. The current database comprises more than 19000 functional genomics experiments across more than 1000 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All experimental data, metadata, and associated computational analyses created by the ENCODE consortium are submitted to the Data Coordination Center (DCC) for validation, tracking, storage, and distribution to community resources and the scientific community. The ENCODE project has engineered and distributed uniform processing pipelines in order to promote data provenance and reproducibility as well as allow interoperability between genomic resources and other consortia. All data files, reference genome versions, software versions, and parameters used by the pipelines are captured and available via the ENCODE Portal. The pipeline code, developed using Docker and Workflow Description Language (WDL; https://openwdl.org/) is publicly available in GitHub, with images available on Dockerhub (https://hub.docker.com), enabling access to a diverse range of biomedical researchers. ENCODE pipelines maintained and used by the DCC can be installed to run on personal computers, local HPC clusters, or in cloud computing environments via Cromwell. Access to the pipelines and data via the cloud allows small labs the ability to use the data or software without access to institutional compute clusters. Standardization of the computational methodologies for analysis and quality control leads to comparable results from different ENCODE collections - a prerequisite for successful integrative analyses.

14.
bioRxiv ; 2023 Apr 06.
Article in English | MEDLINE | ID: mdl-37066421

ABSTRACT

The Encyclopedia of DNA elements (ENCODE) project is a collaborative effort to create a comprehensive catalog of functional elements in the human genome. The current database comprises more than 19000 functional genomics experiments across more than 1000 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All experimental data, metadata, and associated computational analyses created by the ENCODE consortium are submitted to the Data Coordination Center (DCC) for validation, tracking, storage, and distribution to community resources and the scientific community. The ENCODE project has engineered and distributed uniform processing pipelines in order to promote data provenance and reproducibility as well as allow interoperability between genomic resources and other consortia. All data files, reference genome versions, software versions, and parameters used by the pipelines are captured and available via the ENCODE Portal. The pipeline code, developed using Docker and Workflow Description Language (WDL; https://openwdl.org/) is publicly available in GitHub, with images available on Dockerhub (https://hub.docker.com), enabling access to a diverse range of biomedical researchers. ENCODE pipelines maintained and used by the DCC can be installed to run on personal computers, local HPC clusters, or in cloud computing environments via Cromwell. Access to the pipelines and data via the cloud allows small labs the ability to use the data or software without access to institutional compute clusters. Standardization of the computational methodologies for analysis and quality control leads to comparable results from different ENCODE collections - a prerequisite for successful integrative analyses.

15.
Comput Biol Med ; 138: 104850, 2021 11.
Article in English | MEDLINE | ID: mdl-34536702

ABSTRACT

Deep learning neural networks have improved performance in many cancer informatics problems, including breast cancer subtype classification. However, many networks experience underspecification, where multiple combinations of parameters achieve similar performance, both in training and validation. Additionally, certain parameter combinations may perform poorly when the test distribution differs from the training distribution. Embedding prior knowledge from the literature may address this issue by boosting predictive models that provide crucial, in-depth information about a given disease. Breast cancer research provides a wealth of such knowledge, particularly in the form of subtype biomarkers and genetic signatures. In this study, we draw on past research on breast cancer subtype biomarkers, label propagation, and neural graph machines to present a novel methodology for embedding knowledge into machine learning systems. We embed prior knowledge into the loss function in the form of inter-subject distances derived from a well-known published breast cancer signature. Our results show that this methodology reduces predictor variability on state-of-the-art deep learning architectures and increases predictor consistency leading to improved interpretation. We find that pathway enrichment analysis is more consistent after embedding knowledge. This novel method applies to a broad range of existing studies and predictive models. Our method moves the traditional synthesis of predictive models from an arbitrary assignment of weights to genes toward a more biologically meaningful approach of incorporating knowledge.


Subjects
Breast Neoplasms , Deep Learning , Breast Neoplasms/genetics , Female , Humans , Machine Learning , Neural Networks, Computer
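
The abstract above describes embedding prior knowledge into the loss function as inter-subject distances. The sketch below shows one generic way such a term can look, in the style of graph-regularized objectives: a cross-entropy term plus a penalty that pulls together the learned representations of subjects that the prior considers close. This is an illustration under stated assumptions, not the authors' implementation: the weighting `exp(-distance)`, the parameter `alpha`, and all function names are hypothetical.

```python
import math


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class for one subject."""
    return -math.log(probs[label])


def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def knowledge_regularized_loss(probs, labels, embeddings, prior_dist, alpha=0.1):
    """Mean cross-entropy plus a prior-weighted pairwise embedding penalty.

    prior_dist[i][j] is an assumed inter-subject distance taken from prior
    knowledge (for example, computed on a published gene signature).
    Pairs that are close under the prior get a weight near 1, so the model
    is penalized for placing them far apart in its own embedding space.
    """
    n = len(labels)
    ce = sum(cross_entropy(p, y) for p, y in zip(probs, labels)) / n

    penalty = 0.0
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            weight = math.exp(-prior_dist[i][j])  # assumed weighting scheme
            penalty += weight * squared_distance(embeddings[i], embeddings[j])
            pairs += 1
    if pairs:
        penalty /= pairs

    return ce + alpha * penalty
```

Setting `alpha=0` recovers the plain classification loss, which makes it straightforward to compare predictor variability with and without the prior term.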
16.
Nat Med ; 25(6): 911-919, 2019 06.
Article in English | MEDLINE | ID: mdl-31160820

ABSTRACT

It is estimated that 350 million individuals worldwide suffer from rare diseases, which are predominantly caused by mutation in a single gene [1]. The current molecular diagnostic rate is estimated at 50%, with whole-exome sequencing (WES) among the most successful approaches [2-5]. For patients in whom WES is uninformative, RNA sequencing (RNA-seq) has shown diagnostic utility in specific tissues and diseases [6-8]. This includes muscle biopsies from patients with undiagnosed rare muscle disorders [6,9], and cultured fibroblasts from patients with mitochondrial disorders [7]. However, for many individuals, biopsies are not performed for clinical care, and tissues are difficult to access. We sought to assess the utility of RNA-seq from blood as a diagnostic tool for rare diseases of different pathophysiologies. We generated whole-blood RNA-seq from 94 individuals with undiagnosed rare diseases spanning 16 diverse disease categories. We developed a robust approach to compare data from these individuals with large sets of RNA-seq data for controls (n = 1,594 unrelated controls and n = 49 family members) and demonstrated the impacts of expression, splicing, gene and variant filtering strategies on disease gene identification. Across our cohort, we observed that RNA-seq yields a 7.5% diagnostic rate, and an additional 16.7% with improved candidate gene resolution.


Subjects
Rare Diseases/genetics , Acid Ceramidase/genetics , Case-Control Studies , Child , Child, Preschool , Cohort Studies , Female , Genetic Variation , Humans , Male , Models, Genetic , Mutation , Oxidoreductases Acting on CH-CH Group Donors/genetics , Potassium Channels/genetics , RNA/blood , RNA/genetics , RNA Splicing/genetics , Rare Diseases/blood , Sequence Analysis, RNA , Exome Sequencing
17.
Med Educ ; 42(3): 286-93, 2008 Mar.
Article in English | MEDLINE | ID: mdl-18275416

ABSTRACT

OBJECTIVE: To determine whether graduate and non-graduate entrants to medical school differ in their views on the first year spent in medical practice as a pre-registration house officer. METHODS: We carried out postal questionnaire surveys of medical qualifiers of 1999, 2000 and 2002 from all UK medical schools, 1 year after qualification. The timing of the study slightly pre-dates the recent major expansion in graduate entry fast-track courses. RESULTS: Differences between graduate and non-graduate entrants were few and, even when statistically significant, were small in scale. Graduate entrants viewed their working hours, pay and living conditions at work, such as hospital accommodation and food, a little less favourably than did non-graduate entrants. Graduate entrants were also less satisfied than non-graduates with time available for family, social and recreational activities. However, graduate entrants were more likely than non-graduate entrants to feel positive about their future career prospects. There were no differences between graduate and non-graduate entrants in whether they felt they had been well prepared by their medical schools for the jobs they undertook as house officers. Levels of job satisfaction expressed by graduate and non-graduate entrants were similar, as were their responses to most other statements about attitudes to clinical work. CONCLUSIONS: 'Quality of life' issues, a sense of being fairly rewarded, and expectations about one's physical working environment seem a little more important to graduate than to non-graduate entrants. Apart from these, the findings suggest that graduate status, at entry to medical school, has no appreciable influence on attitudes to the work of a junior hospital doctor.


Subjects
Attitude of Health Personnel , Career Choice , Job Satisfaction , Medical Staff, Hospital/psychology , Students, Medical/psychology , Quality of Life , Schools, Medical , Surveys and Questionnaires
19.
PLoS One ; 12(4): e0175310, 2017.
Article in English | MEDLINE | ID: mdl-28403240

ABSTRACT

The Encyclopedia of DNA elements (ENCODE) project is an ongoing collaborative effort to create a comprehensive catalog of functional elements initiated shortly after the completion of the Human Genome Project. The current database exceeds 6500 experiments across more than 450 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the H. sapiens and M. musculus genomes. All ENCODE experimental data, metadata, and associated computational analyses are submitted to the ENCODE Data Coordination Center (DCC) for validation, tracking, storage, unified processing, and distribution to community resources and the scientific community. As the volume of data increases, the identification and organization of experimental details becomes increasingly intricate and demands careful curation. The ENCODE DCC has created a general purpose software system, known as SnoVault, that supports metadata and file submission, a database used for metadata storage, web pages for displaying the metadata and a robust API for querying the metadata. The software is fully open-source; code and installation instructions can be found at: http://github.com/ENCODE-DCC/snovault/ (for the generic database) and http://github.com/ENCODE-DCC/encoded/ (to store genomic data in the manner of ENCODE). The core database engine, SnoVault (which is completely independent of ENCODE, genomic data, or bioinformatic data) has been released as a separate Python package.


Subjects
Databases, Genetic , Genomics/methods , Metadata , Software , Animals , DNA/genetics , Genome , Humans , Mice
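
The abstract above mentions an API for querying metadata. As a hedged illustration of what a metadata query against the ENCODE Portal might look like, the sketch below only builds a search URL and does not contact the server. It assumes the Portal exposes a `/search/` endpoint that accepts `type`, `format=json` and `limit` query parameters; the helper name `build_search_url` and the `assay_title` filter are illustrative assumptions that should be checked against the Portal's own REST documentation.

```python
from urllib.parse import urlencode

# Base URL of the ENCODE Portal, as given in the abstracts on this page.
ENCODE_BASE_URL = "https://www.encodeproject.org"


def build_search_url(object_type, limit=5, **filters):
    """Build a metadata search URL (hypothetical helper, no network access).

    object_type: kind of record to search for, e.g. "Experiment" (assumed).
    filters: extra field=value pairs; field names are assumptions.
    """
    params = {"type": object_type, "format": "json", "limit": str(limit)}
    params.update(filters)
    return f"{ENCODE_BASE_URL}/search/?{urlencode(params)}"


print(build_search_url("Experiment", assay_title="ATAC-seq"))
```

Keeping URL construction separate from the HTTP request makes the query easy to inspect and test before any data are downloaded.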
20.
Article in English | MEDLINE | ID: mdl-26980513

ABSTRACT

The Encyclopedia of DNA Elements (ENCODE) Data Coordinating Center (DCC) is responsible for organizing, describing and providing access to the diverse data generated by the ENCODE project. The description of these data, known as metadata, includes the biological sample used as input, the protocols and assays performed on these samples, the data files generated from the results and the computational methods used to analyze the data. Here, we outline the principles and philosophy used to define the ENCODE metadata in order to create a metadata standard that can be applied to diverse assays and multiple genomic projects. In addition, we present how the data are validated and used by the ENCODE DCC in creating the ENCODE Portal (https://www.encodeproject.org/). Database URL: www.encodeproject.org.


Subjects
Computational Biology/methods , DNA/genetics , Databases, Genetic , Algorithms , Animals , Caenorhabditis elegans , Computational Biology/standards , Data Collection , Drosophila melanogaster , High-Throughput Nucleotide Sequencing , Humans , Mice , Nucleic Acids/genetics , Quality Control , Reproducibility of Results , Sequence Alignment