Results 1 - 13 of 13
1.
Res Sq ; 2023 Jul 19.
Article in English | MEDLINE | ID: mdl-37503119

ABSTRACT

The Encyclopedia of DNA Elements (ENCODE) project is a collaborative effort to create a comprehensive catalog of functional elements in the human genome. The current database comprises more than 19,000 functional genomics experiments across more than 1,000 cell lines and tissues, using a wide array of experimental techniques to study the chromatin structure and the regulatory and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All experimental data, metadata, and associated computational analyses created by the ENCODE consortium are submitted to the Data Coordination Center (DCC) for validation, tracking, storage, and distribution to community resources and the scientific community. The ENCODE project has engineered and distributed uniform processing pipelines to promote data provenance and reproducibility and to allow interoperability between genomic resources and other consortia. All data files, reference genome versions, software versions, and parameters used by the pipelines are captured and available via the ENCODE Portal. The pipeline code, developed using Docker and Workflow Description Language (WDL; https://openwdl.org/), is publicly available on GitHub, with images available on Docker Hub (https://hub.docker.com), enabling access to a diverse range of biomedical researchers. ENCODE pipelines maintained and used by the DCC can be installed to run on personal computers, local HPC clusters, or in cloud computing environments via Cromwell. Access to the pipelines and data via the cloud lets small labs use the data or software without access to institutional compute clusters. Standardization of the computational methodologies for analysis and quality control leads to comparable results from different ENCODE collections, a prerequisite for successful integrative analyses.

2.
bioRxiv ; 2023 Apr 06.
Article in English | MEDLINE | ID: mdl-37066421

ABSTRACT

The Encyclopedia of DNA Elements (ENCODE) project is a collaborative effort to create a comprehensive catalog of functional elements in the human genome. The current database comprises more than 19,000 functional genomics experiments across more than 1,000 cell lines and tissues, using a wide array of experimental techniques to study the chromatin structure and the regulatory and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All experimental data, metadata, and associated computational analyses created by the ENCODE consortium are submitted to the Data Coordination Center (DCC) for validation, tracking, storage, and distribution to community resources and the scientific community. The ENCODE project has engineered and distributed uniform processing pipelines to promote data provenance and reproducibility and to allow interoperability between genomic resources and other consortia. All data files, reference genome versions, software versions, and parameters used by the pipelines are captured and available via the ENCODE Portal. The pipeline code, developed using Docker and Workflow Description Language (WDL; https://openwdl.org/), is publicly available on GitHub, with images available on Docker Hub (https://hub.docker.com), enabling access to a diverse range of biomedical researchers. ENCODE pipelines maintained and used by the DCC can be installed to run on personal computers, local HPC clusters, or in cloud computing environments via Cromwell. Access to the pipelines and data via the cloud lets small labs use the data or software without access to institutional compute clusters. Standardization of the computational methodologies for analysis and quality control leads to comparable results from different ENCODE collections, a prerequisite for successful integrative analyses.

3.
Sci Rep ; 12(1): 8458, 2022 05 19.
Article in English | MEDLINE | ID: mdl-35589867

ABSTRACT

A cell-free DNA (cfDNA) assay would be a promising approach to early cancer diagnosis, especially for patients with dense tissues. Consistent cfDNA signatures have been observed for many carcinogens. Recently, investigations of cfDNA as a reliable early detection bioassay have presented a powerful opportunity for detecting dense tissue screening complications early. We performed a prospective study to evaluate the potential of characterizing cfDNA as a central element in the early detection of dense tissue breast cancer (BC). Plasma samples were collected from 32 consenting subjects with dense tissue and positive mammograms, 20 with positive biopsies and 12 with negative biopsies. After screening and before biopsy, cfDNA was extracted, and whole-genome next-generation sequencing (NGS) was performed on all samples. Copy number alteration (CNA) and single nucleotide polymorphism (SNP)/insertion/deletion (Indel) analyses were performed to characterize cfDNA. In the positive-positive subjects (cases), a total of 5 CNAs overlapped with 5 previously reported BC-related oncogenes (KSR2, MAP2K4, MSI2, CANT1 and MSI2). In addition, 1 SNP was detected in KMT2C, a BC oncogene, and 9 others were detected in or near 10 genes (SERAC1, DAGLB, MACF1, NVL, FBXW4, FANK1, KCTD4, CAVIN1; ATP6V0A1 and ZBTB20-AS1) previously associated with non-BC cancers. For the positive-negative subjects (screening), 3 CNAs were detected in BC genes (ACVR2A, CUL3 and PIK3R1), and 5 SNPs were identified in 6 non-BC cancer genes (SNIP1, TBC1D10B, PANK1, PRKCA and RUNX2; SUPT3H). This study presents evidence of the potential of using cfDNA somatic variants as dense tissue BC biomarkers from a noninvasive liquid bioassay for early cancer detection.


Subject(s)
Breast Neoplasms; Cell-Free Nucleic Acids; Adaptor Proteins, Signal Transducing/genetics; Biological Assay; Biomarkers, Tumor/genetics; Breast Neoplasms/diagnosis; Breast Neoplasms/genetics; Cell-Free Nucleic Acids/genetics; Early Detection of Cancer; Female; Humans; Mutation; Prospective Studies; RNA-Binding Proteins/genetics
4.
Sci Rep ; 11(1): 6078, 2021 03 23.
Article in English | MEDLINE | ID: mdl-33758256

ABSTRACT

As a means to understand human neuropsychiatric disorders from human brain samples, we compared the transcription patterns and histological features of postmortem brain to fresh human neocortex isolated immediately following surgical removal. Compared to a number of neuropsychiatric disease-associated postmortem transcriptomes, the fresh human brain transcriptome had an entirely unique transcriptional pattern. To understand this difference, we measured genome-wide transcription as a function of time after fresh tissue removal to mimic the postmortem interval. Within a few hours, a selective reduction in the number of neuronal activity-dependent transcripts occurred with relative preservation of housekeeping genes commonly used as a reference for RNA normalization. Gene clustering indicated a rapid reduction in neuronal gene expression with a reciprocal time-dependent increase in astroglial and microglial gene expression that continued to increase for at least 24 h after tissue resection. Predicted transcriptional changes were confirmed histologically on the same tissue, demonstrating that while neurons were degenerating, glial cells underwent an outgrowth of their processes. The rapid loss of neuronal genes and reciprocal expression of glial genes highlight highly dynamic transcriptional and cellular changes that occur during the postmortem interval. Understanding these time-dependent changes in gene expression in postmortem brain samples is critical for the interpretation of research studies on human brain disorders.


Subject(s)
Biomarkers; Brain/metabolism; Brain/pathology; Gene Expression; Autopsy; Computational Biology/methods; Gene Expression Profiling; Humans; Immunohistochemistry; Neurons/metabolism; Organ Specificity/genetics; Transcriptome
5.
BMC Cancer ; 19(1): 832, 2019 Aug 23.
Article in English | MEDLINE | ID: mdl-31443703

ABSTRACT

BACKGROUND: Blood-based methods using cell-free DNA (cfDNA) are under development as an alternative to existing screening tests. However, early-stage detection of cancer using tumor-derived cfDNA has proven challenging because of the small proportion of cfDNA derived from tumor tissue in early-stage disease. A machine learning approach to discover signatures in cfDNA, potentially reflective of both tumor and non-tumor contributions, may represent a promising direction for the early detection of cancer. METHODS: Whole-genome sequencing was performed on cfDNA extracted from plasma samples (N = 546 colorectal cancer and 271 non-cancer controls). Reads aligning to protein-coding gene bodies were extracted, and read counts were normalized. cfDNA tumor fraction was estimated using IchorCNA. Machine learning models were trained using k-fold cross-validation and confounder-based cross-validations to assess generalization performance. RESULTS: In a colorectal cancer cohort heavily weighted towards early-stage cancer (80% stage I/II), we achieved a mean AUC of 0.92 (95% CI 0.91-0.93) with a mean sensitivity of 85% (95% CI 83-86%) at 85% specificity. Sensitivity generally increased with tumor stage and increasing tumor fraction. Stratification by age, sequencing batch, and institution demonstrated the impact of these confounders and provided a more accurate assessment of generalization performance. CONCLUSIONS: A machine learning approach using cfDNA achieved high sensitivity and specificity in a large, predominantly early-stage, colorectal cancer cohort. The possibility of systematic technical and institution-specific biases warrants similar confounder analyses in other studies. Prospective validation of this machine learning method and evaluation of a multi-analyte approach are underway.
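
The two evaluation ideas named in this abstract, confounder-based cross-validation and sensitivity at a fixed specificity, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `grouped_folds` and `sensitivity_at_specificity`, the greedy fold assignment, and the "score > threshold" calling rule are all assumptions made for illustration.

```python
from collections import defaultdict


def grouped_folds(groups, k):
    """Split sample indices into k folds so that every sample sharing a
    group label (a hypothetical confounder such as institution or batch)
    lands in the same fold."""
    by_group = defaultdict(list)
    for idx, label in enumerate(groups):
        by_group[label].append(idx)
    folds = [[] for _ in range(k)]
    # Greedy assignment: largest groups first, each into the smallest fold.
    ordered = sorted(by_group, key=lambda g: (-len(by_group[g]), str(g)))
    for label in ordered:
        smallest = min(range(k), key=lambda i: len(folds[i]))
        folds[smallest].extend(by_group[label])
    return folds


def sensitivity_at_specificity(scores, labels, target_specificity):
    """Return sensitivity at the lowest threshold whose specificity on the
    controls (label 0) reaches target_specificity. A sample is called
    positive when its score is strictly above the threshold."""
    controls = sorted(s for s, y in zip(scores, labels) if y == 0)
    cases = [s for s, y in zip(scores, labels) if y == 1]
    if not controls or not cases:
        raise ValueError("need at least one case and one control")
    for threshold in controls:
        specificity = sum(s <= threshold for s in controls) / len(controls)
        if specificity >= target_specificity:
            return sum(s > threshold for s in cases) / len(cases)
    raise ValueError("target specificity must be at most 1.0")


if __name__ == "__main__":
    # Toy data: six samples from three hypothetical institutions.
    print(grouped_folds(["A", "A", "B", "B", "B", "C"], k=2))
    # Toy scores: four controls followed by four cases.
    toy_scores = [0.1, 0.2, 0.3, 0.9, 0.25, 0.8, 0.95, 0.6]
    toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
    print(sensitivity_at_specificity(toy_scores, toy_labels, 0.75))
```

Holding out whole groups means a model is never tested on samples from an institution or batch it was trained on, which is one plausible way to expose the confounder effects the abstract reports.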


Subject(s)
Biomarkers, Tumor; Circulating Tumor DNA; Colorectal Neoplasms/genetics; Colorectal Neoplasms/pathology; Genome, Human; Genomics; Machine Learning; Aged; Aged, 80 and over; Colorectal Neoplasms/blood; Computational Biology/methods; Female; Gene Expression Profiling; Genomics/methods; Humans; Male; Middle Aged; Neoplasm Staging; ROC Curve; Reproducibility of Results; Transcriptome
6.
Bioinformatics ; 34(16): 2701-2707, 2018 08 15.
Article in English | MEDLINE | ID: mdl-29554289

ABSTRACT

Motivation: The three-dimensional organization of chromatin plays a critical role in gene regulation and disease. High-throughput chromosome conformation capture experiments such as Hi-C are used to obtain genome-wide maps of three-dimensional chromatin contacts. However, robust estimation of data quality and systematic comparison of these contact maps is challenging due to the multi-scale, hierarchical structure of chromatin contacts and the resulting properties of experimental noise in the data. Measuring concordance of contact maps is important for assessing reproducibility of replicate experiments and for modeling variation between different cellular contexts. Results: We introduce a concordance measure called DIfferences between Smoothed COntact maps (GenomeDISCO) for assessing the similarity of a pair of contact maps obtained from chromosome conformation capture experiments. The key idea is to smooth contact maps using random walks on the contact map graph, before estimating concordance. We use simulated datasets to benchmark GenomeDISCO's sensitivity to different types of noise that affect chromatin contact maps. When applied to a large collection of Hi-C datasets, GenomeDISCO accurately distinguishes biological replicates from samples obtained from different cell types. GenomeDISCO also generalizes to other chromosome conformation capture assays, such as HiChIP. Availability and implementation: Software implementing GenomeDISCO is available at https://github.com/kundajelab/genomedisco. Supplementary information: Supplementary data are available at Bioinformatics online.
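
The key idea described above, smoothing each contact map with a random walk on its graph before comparing the two maps, can be sketched in a few lines of Python. This is an illustrative sketch only, not the GenomeDISCO implementation: the function names, the default of three walk steps, and the normalization of the L1 difference by the number of bins are assumptions, and both maps are assumed to be square matrices of equal size.

```python
def row_normalize(matrix):
    """Turn a contact map into a transition matrix: each non-empty row sums to 1."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def smooth(contact_map, steps):
    """Smooth a contact map by taking `steps` random-walk steps (steps >= 1)."""
    transition = row_normalize(contact_map)
    result = transition
    for _ in range(steps - 1):
        result = matmul(result, transition)
    return result


def concordance(map_a, map_b, steps=3):
    """Score similarity as 1 minus the L1 difference of the smoothed maps,
    averaged over bins (assumed normalization). Identical maps score 1.0."""
    smooth_a, smooth_b = smooth(map_a, steps), smooth(map_b, steps)
    n = len(smooth_a)
    l1 = sum(abs(smooth_a[i][j] - smooth_b[i][j])
             for i in range(n) for j in range(n))
    return 1.0 - l1 / n


if __name__ == "__main__":
    off_diagonal = [[0, 1], [1, 0]]
    diagonal = [[1, 0], [0, 1]]
    print(concordance(off_diagonal, off_diagonal))  # identical maps
    print(concordance(off_diagonal, diagonal))      # maximally different toy maps
```

Because every non-empty row of a smoothed map sums to 1, the per-bin L1 difference lies between 0 and 2, so this toy score ranges from -1 to 1.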


Subject(s)
Chromatin/metabolism; Computational Biology/methods; Software; Cell Line; Chromatin/ultrastructure; Humans; Molecular Conformation; Reproducibility of Results
7.
Nature ; 512(7515): 445-8, 2014 Aug 28.
Article in English | MEDLINE | ID: mdl-25164755

ABSTRACT

The transcriptome is the readout of the genome. Identifying common features in it across distant species can reveal fundamental principles. To this end, the ENCODE and modENCODE consortia have generated large amounts of matched RNA-sequencing data for human, worm and fly. Uniform processing and comprehensive annotation of these data allow comparison across metazoan phyla, extending beyond earlier within-phylum transcriptome comparisons and revealing ancient, conserved features. Specifically, we discover co-expression modules shared across animals, many of which are enriched in developmental genes. Moreover, we use expression patterns to align the stages in worm and fly development and find a novel pairing between worm embryo and fly pupae, in addition to the embryo-to-embryo and larvae-to-larvae pairings. Furthermore, we find that the extent of non-canonical, non-coding transcription is similar in each organism, per base pair. Finally, we find in all three organisms that the gene-expression levels, both coding and non-coding, can be quantitatively predicted from chromatin features at the promoter using a 'universal model' based on a single set of organism-independent parameters.


Subject(s)
Caenorhabditis elegans/genetics; Drosophila melanogaster/genetics; Gene Expression Profiling; Transcriptome/genetics; Animals; Caenorhabditis elegans/embryology; Caenorhabditis elegans/growth & development; Chromatin/genetics; Cluster Analysis; Drosophila melanogaster/growth & development; Gene Expression Regulation, Developmental/genetics; Histones/metabolism; Humans; Larva/genetics; Larva/growth & development; Models, Genetic; Molecular Sequence Annotation; Promoter Regions, Genetic/genetics; Pupa/genetics; Pupa/growth & development; RNA, Untranslated/genetics; Sequence Analysis, RNA
8.
Genome Res ; 24(7): 1209-23, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24985915

ABSTRACT

Accurate gene model annotation of reference genomes is critical for making them useful. The modENCODE project has improved the D. melanogaster genome annotation by using deep and diverse high-throughput data. Since transcriptional activity that has been evolutionarily conserved is likely to have an advantageous function, we have performed large-scale interspecific comparisons to increase confidence in predicted annotations. To support comparative genomics, we filled in divergence gaps in the Drosophila phylogeny by generating draft genomes for eight new species. For comparative transcriptome analysis, we generated mRNA expression profiles on 81 samples from multiple tissues and developmental stages of 15 Drosophila species, and we performed cap analysis of gene expression in D. melanogaster and D. pseudoobscura. We also describe conservation of four distinct core promoter structures composed of combinations of elements at three positions. Overall, each type of genomic feature shows a characteristic divergence rate relative to neutral models, highlighting the value of multispecies alignment in annotating a target genome that should prove useful in the annotation of other high priority genomes, especially human and other mammalian genomes that are rich in noncoding sequences. We report that the vast majority of elements in the annotation are evolutionarily conserved, indicating that the annotation will be an important springboard for functional genetic testing by the Drosophila community.


Subject(s)
Computational Biology/methods; Drosophila melanogaster/genetics; Gene Expression Profiling; Molecular Sequence Annotation; Transcriptome; Animals; Cluster Analysis; Drosophila melanogaster/classification; Evolution, Molecular; Exons; Female; Genome, Insect; Humans; Male; Nucleotide Motifs; Phylogeny; Position-Specific Scoring Matrices; Promoter Regions, Genetic; RNA Editing; RNA Splice Sites; RNA Splicing; Reproducibility of Results; Transcription Initiation Site
9.
Nat Biotechnol ; 32(4): 341-6, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24633242

ABSTRACT

The identification of full length transcripts entirely from short-read RNA sequencing data (RNA-seq) remains a challenge in the annotation of genomes. Here we describe an automated pipeline for genome annotation that integrates RNA-seq and gene-boundary data sets, which we call Generalized RNA Integration Tool, or GRIT. Applying GRIT to Drosophila melanogaster short-read RNA-seq, cap analysis of gene expression (CAGE) and poly(A)-site-seq data collected for the modENCODE project, we recovered the vast majority of previously annotated transcripts and doubled the total number of transcripts cataloged. We found that 20% of protein coding genes encode multiple protein-localization signals and that, in 20-d-old adult fly heads, genes with multiple polyadenylation sites are more common than genes with alternative splicing or alternative promoters. GRIT demonstrates 30% higher precision and recall than the most widely used transcript assembly tools. GRIT will facilitate the automated generation of high-quality genome annotations without the need for extensive manual annotation.


Subject(s)
Chromosome Mapping/methods; Genomics/methods; Molecular Sequence Annotation/methods; RNA/chemistry; RNA/genetics; Sequence Analysis, RNA/methods; Animals; Drosophila melanogaster/genetics; Genome, Insect/genetics; RNA/analysis
10.
Methods ; 68(1): 38-47, 2014 Jun 15.
Article in English | MEDLINE | ID: mdl-24636835

ABSTRACT

modENCODE was a 5-year, NHGRI-funded project (2007-2012) to map the function of every base in the genomes of worms and flies, characterizing positions of modified histones and other chromatin marks, origins of DNA replication, RNA transcripts and the transcription factor binding sites that control gene expression. Here we describe the Drosophila modENCODE datasets and how best to access and use them for genome-wide and individual gene studies.


Subject(s)
DNA Replication/genetics; Databases, Genetic; Developmental Biology/methods; Animals; Chromatin/genetics; Data Mining; Drosophila melanogaster/genetics; Drosophila melanogaster/growth & development; Genome, Insect
11.
Nature ; 512(7515): 393-9, 2014 Aug 28.
Article in English | MEDLINE | ID: mdl-24670639

ABSTRACT

Animal transcriptomes are dynamic, with each cell type, tissue and organ system expressing an ensemble of transcript isoforms that give rise to substantial diversity. Here we have identified new genes, transcripts and proteins using poly(A)+ RNA sequencing from Drosophila melanogaster in cultured cell lines, dissected organ systems and under environmental perturbations. We found that a small set of mostly neural-specific genes has the potential to encode thousands of transcripts each through extensive alternative promoter usage and RNA splicing. The magnitudes of splicing changes are larger between tissues than between developmental stages, and most sex-specific splicing is gonad-specific. Gonads express hundreds of previously unknown coding and long non-coding RNAs (lncRNAs), some of which are antisense to protein-coding genes and produce short regulatory RNAs. Furthermore, previously identified pervasive intergenic transcription occurs primarily within newly identified introns. The fly transcriptome is substantially more complex than previously recognized, with this complexity arising from combinatorial usage of promoters, splice sites and polyadenylation sites.


Subject(s)
Drosophila melanogaster/genetics; Gene Expression Profiling; Transcriptome/genetics; Alternative Splicing/genetics; Animals; Drosophila melanogaster/anatomy & histology; Drosophila melanogaster/cytology; Female; Male; Molecular Sequence Annotation; Nerve Tissue/metabolism; Organ Specificity; Poly A/genetics; Polyadenylation; Promoter Regions, Genetic/genetics; RNA, Long Noncoding/genetics; RNA, Messenger/genetics; RNA, Messenger/metabolism; Sex Characteristics; Stress, Physiological/genetics
12.
Genome Res ; 21(2): 182-92, 2011 Feb.
Article in English | MEDLINE | ID: mdl-21177961

ABSTRACT

Core promoters are critical regions for gene regulation in higher eukaryotes. However, the boundaries of promoter regions, the relative rates of initiation at the transcription start sites (TSSs) distributed within them, and the functional significance of promoter architecture remain poorly understood. We produced a high-resolution map of promoters active in the Drosophila melanogaster embryo by integrating data from three independent and complementary methods: 21 million cap analysis of gene expression (CAGE) tags, 1.2 million RNA ligase mediated rapid amplification of cDNA ends (RLM-RACE) reads, and 50,000 cap-trapped expressed sequence tags (ESTs). We defined 12,454 promoters of 8037 genes. Our analysis indicates that, due to non-promoter-associated RNA background signal, previous studies have likely overestimated the number of promoter-associated CAGE clusters by fivefold. We show that TSS distributions form a complex continuum of shapes, and that promoters active in the embryo and adult have highly similar shapes in 95% of cases. This suggests that these distributions are generally determined by static elements such as local DNA sequence and are not modulated by dynamic signals such as histone modifications. Transcription factor binding motifs are differentially enriched as a function of promoter shape, and peaked promoter shape is correlated with both temporal and spatial regulation of gene expression. Our results contribute to the emerging view that core promoters are functionally diverse and control patterning of gene expression in Drosophila and mammals.


Subject(s)
Computational Biology; Drosophila melanogaster/genetics; Genome, Insect/genetics; Promoter Regions, Genetic; 3' Untranslated Regions/genetics; Animals; Chromosome Mapping; Drosophila melanogaster/embryology; Expressed Sequence Tags; Gene Expression Profiling; Gene Expression Regulation/genetics; Genome-Wide Association Study; Transcription Initiation Site
13.
Nature ; 471(7339): 473-9, 2011 Mar 24.
Article in English | MEDLINE | ID: mdl-21179090

ABSTRACT

Drosophila melanogaster is one of the most well studied genetic model organisms; nonetheless, its genome still contains unannotated coding and non-coding genes, transcripts, exons and RNA editing sites. Full discovery and annotation are pre-requisites for understanding how the regulation of transcription, splicing and RNA editing directs the development of this complex organism. Here we used RNA-Seq, tiling microarrays and cDNA sequencing to explore the transcriptome in 30 distinct developmental stages. We identified 111,195 new elements, including thousands of genes, coding and non-coding transcripts, exons, splicing and editing events, and inferred protein isoforms that previously eluded discovery using established experimental, prediction and conservation-based approaches. These data substantially expand the number of known transcribed elements in the Drosophila genome and provide a high-resolution view of transcriptome dynamics throughout development.


Subject(s)
Drosophila melanogaster/growth & development; Drosophila melanogaster/genetics; Gene Expression Profiling; Gene Expression Regulation, Developmental/genetics; Transcription, Genetic/genetics; Alternative Splicing/genetics; Animals; Base Sequence; Drosophila Proteins/genetics; Drosophila melanogaster/embryology; Exons/genetics; Female; Genes, Insect/genetics; Genome, Insect/genetics; Male; MicroRNAs/genetics; Oligonucleotide Array Sequence Analysis; Protein Isoforms/genetics; RNA Editing/genetics; RNA, Messenger/analysis; RNA, Messenger/genetics; RNA, Small Untranslated/analysis; RNA, Small Untranslated/genetics; Sequence Analysis; Sex Characteristics