ABSTRACT
Population-scale single-cell RNA-seq (scRNA-seq) data sets create unique opportunities for quantifying expression variation across individuals at the gene coexpression network level. Estimation of coexpression networks is well established for bulk RNA-seq; however, single-cell measurements pose novel challenges owing to technical limitations and noise levels of this technology. Gene-gene correlation estimates from scRNA-seq tend to be severely biased toward zero for genes with low and sparse expression. Here, we present Dozer to debias gene-gene correlation estimates from scRNA-seq data sets and accurately quantify network-level variation across individuals. Dozer corrects correlation estimates in the general Poisson measurement model and provides a metric to quantify genes measured with high noise. Computational experiments establish that Dozer estimates are robust to mean expression levels of the genes and the sequencing depths of the data sets. Compared with alternatives, Dozer results in fewer false-positive edges in the coexpression networks, yields more accurate estimates of network centrality measures and modules, and improves the faithfulness of networks estimated from separate batches of the data sets. We showcase unique analyses enabled by Dozer in two population-scale scRNA-seq applications. Coexpression network-based centrality analysis of multiple differentiating human induced pluripotent stem cell (iPSC) lines yields biologically coherent gene groups that are associated with iPSC differentiation efficiency. Application with population-scale scRNA-seq of oligodendrocytes from postmortem human tissues of Alzheimer's disease and controls uniquely reveals coexpression modules of innate immune response with distinct coexpression levels between the diagnoses. Dozer represents an important advance in estimating personalized coexpression networks from scRNA-seq data.
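The attenuation described above can be illustrated with a small simulation. This is a minimal sketch, not Dozer's estimator: it assumes counts are Poisson draws around a latent expression level, ignores sequencing depth, and uses a simple moment correction (sample variance minus sample mean). The function name and all simulation parameters are hypothetical.

```python
import numpy as np


def naive_and_corrected_corr(x, y):
    """Return (naive, noise-corrected) Pearson correlation of two count vectors.

    Assumption: x | lam ~ Poisson(lam), independently of y given the latent
    levels, so Var(x) = Var(lam) + E[lam] while Cov(x, y) = Cov(lam_x, lam_y).
    Subtracting the sample mean from the sample variance removes the Poisson
    noise term. Illustrative moment correction only, not Dozer's method.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.cov(x, y)
    naive = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
    latent_var_x = cov[0, 0] - x.mean()
    latent_var_y = cov[1, 1] - y.mean()
    if latent_var_x <= 0 or latent_var_y <= 0:
        # Noise dominates: the correction is undefined for this gene pair.
        return naive, float("nan")
    corrected = cov[0, 1] / np.sqrt(latent_var_x * latent_var_y)
    return naive, corrected


# Hypothetical simulation: two lowly expressed, correlated genes.
rng = np.random.default_rng(0)
n_cells = 50_000
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=n_cells)
lam = 0.5 * np.exp(0.8 * z)          # latent expression levels
counts = rng.poisson(lam)            # observed sparse counts
true_corr = np.corrcoef(lam[:, 0], lam[:, 1])[0, 1]
naive, corrected = naive_and_corrected_corr(counts[:, 0], counts[:, 1])
print(f"latent={true_corr:.3f} naive={naive:.3f} corrected={corrected:.3f}")
```

The naive estimate shrinks toward zero because Poisson noise inflates each variance but not the covariance. The corrected value can exceed 1 in small samples, which is one reason a dedicated method is needed in practice.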
Subject(s)
Gene Expression Profiling , Induced Pluripotent Stem Cells , Humans , Gene Expression Profiling/methods , Single-Cell Gene Expression Analysis , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods
ABSTRACT
The pyruvate kinase M2 isoform (PKM2) is preferentially expressed in cancer cells to regulate anabolic metabolism. Although PKM2 was recently reported to regulate lipid homeostasis, the molecular mechanism remains unclear. Herein, we discovered an ER transmembrane protein 33 (TMEM33) as a downstream effector of PKM2 that regulates activation of SREBPs and lipid metabolism. Loss of PKM2 leads to up-regulation of TMEM33, which recruits RNF5, an E3 ligase, to promote SREBP-cleavage activating protein (SCAP) degradation. TMEM33 is transcriptionally regulated by nuclear factor erythroid 2-like 1 (NRF1), whose cleavage and activation are controlled by PKM2 levels. Total plasma cholesterol levels are elevated by either treatment with PKM2 tetramer-promoting agent TEPP-46 or by global PKM2 knockout in mice, highlighting the essential function of PKM2 in lipid metabolism. Although depletion of PKM2 decreases cancer cell growth, global PKM2 knockout accelerates allografted tumor growth. Together, our findings reveal the cell-autonomous and systemic effects of PKM2 in lipid homeostasis and carcinogenesis, as well as TMEM33 as a bona fide regulator of lipid metabolism.
Subject(s)
Carrier Proteins/metabolism , Intracellular Signaling Peptides and Proteins/metabolism , Lipid Metabolism/physiology , Membrane Proteins/metabolism , Thyroid Hormones/metabolism , Animals , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Carrier Proteins/genetics , Cell Line, Tumor , Cholesterol/blood , Female , Gene Expression Regulation, Neoplastic , Homeostasis , Humans , Intracellular Signaling Peptides and Proteins/genetics , Membrane Proteins/genetics , Mice, Knockout , Sterol Regulatory Element Binding Protein 1/metabolism , Thyroid Hormones/genetics , Xenograft Model Antitumor Assays , Thyroid Hormone-Binding Proteins
ABSTRACT
Mouse knockouts of Cntnap2 show altered neurodevelopmental behavior, deficits in striatal GABAergic signaling, and a genome-wide disruption of an environmentally sensitive DNA methylation modification (5-hydroxymethylcytosine [5hmC]) in the orthologs of a significant number of genes implicated in human neurodevelopmental disorders. We tested adult Cntnap2 heterozygous mice (Cntnap2 +/-; lacking behavioral or neuropathological abnormalities) subjected to a prenatal stress and found that prenatally stressed Cntnap2 +/- female mice show repetitive behaviors and altered sociability, similar to the homozygote phenotype. Genomic profiling revealed disruptions in hippocampal and striatal 5hmC levels that are correlated to altered transcript levels of genes linked to these phenotypes (e.g., Reln, Dst, Trio, and Epha5). Chromatin immunoprecipitation coupled with high-throughput sequencing and hippocampal nuclear lysate pull-down data indicated that 5hmC abundance alters the binding of the transcription factor CLOCK near the promoters of these genes (e.g., Palld, Gigyf1, and Fry), providing a mechanistic role for 5hmC in gene regulation. Together, these data support gene-by-environment hypotheses for the origins of mental illness and provide a means to identify the elusive factors contributing to complex human diseases.
Subject(s)
Gene-Environment Interaction , Neurodevelopmental Disorders , 5-Methylcytosine/analogs & derivatives , 5-Methylcytosine/metabolism , Animals , DNA Methylation , Epigenesis, Genetic , Female , Membrane Proteins/metabolism , Mice , Nerve Tissue Proteins/metabolism , Pregnancy
ABSTRACT
MOTIVATION: Gene-enhancer interactions are central to transcriptional regulation. Current multi-modal single cell datasets that profile transcriptome and chromatin accessibility simultaneously in a single cell are yielding opportunities to infer gene-enhancer associations in a cell type specific manner. Computational efforts for such multi-modal single cell datasets thus far focused on methods for identification and refinement of cell types and trajectory construction. While initial attempts for inferring gene-enhancer interactions have emerged, these have not been evaluated against benchmark datasets that materialized from bulk genomic experiments. Furthermore, existing approaches are limited to inferring gene-enhancer associations at the level of grouped cells as opposed to individual cells, thereby ignoring regulatory heterogeneity among the cells. RESULTS: We present a new approach, GEEES for "Gene EnhancEr IntEractions from Multi-modal Single Cell Data", for inferring gene-enhancer associations at the single cell level using multi-modal single cell transcriptome and chromatin accessibility data. We evaluated GEEES alongside several multivariate regression-based alternatives we devised and state-of-the-art methods using a large number of benchmark datasets, providing a comprehensive assessment of current approaches. This analysis revealed significant discrepancies between gold-standard interactions and gene-enhancer associations derived from multi-modal single-cell data. Notably, incorporating gene-enhancer distance into the analysis markedly improved performance across all methods, positioning GEEES as a leading approach in this domain. While the overall improvement in performance metrics by GEEES is modest, it provides enhanced cell representation learning which can be leveraged for more effective downstream analysis. 
Furthermore, our review of existing experimentally driven benchmark datasets uncovers their limited concordance, underscoring the necessity for new high-throughput experiments to validate gene-enhancer interactions inferred from single-cell data. AVAILABILITY: https://github.com/keleslab/GEEES. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
ABSTRACT
MOTIVATION: The ENCODE project generated a large collection of eCLIP-seq RNA binding protein (RBP) profiling data with accompanying RNA-seq transcriptomes of shRNA knockdown of RBPs. These data could have utility in understanding the functional impact of genetic variants; however, their potential has not been fully exploited. We implement INCA (Integrative annotation scores of variants for impact on RBP activities) as a multi-step genetic variant scoring approach that leverages the ENCODE RBP data together with ClinVar and integrates multiple computational approaches to aggregate evidence. RESULTS: INCA evaluates variant impacts on RBP activities by leveraging genotypic differences in cell lines used for eCLIP-seq. We show that INCA provides critical specificity, beyond generic scoring for RBP binding disruption, for candidate variants and their linkage-disequilibrium partners. As a result, it can, on average, augment scoring of 46.2% of the candidate variants beyond generic scoring for RBP binding disruption and aid in variant prioritization for follow-up analysis. AVAILABILITY AND IMPLEMENTATION: INCA is implemented in R and is available at https://github.com/keleslab/INCA.
Subject(s)
RNA-Binding Proteins , Humans , RNA-Binding Proteins/metabolism , RNA-Binding Proteins/genetics , Software , Genetic Variation , Computational Biology/methods , Molecular Sequence Annotation/methods
ABSTRACT
MOTIVATION: Elucidating functionally similar orthologous regulatory regions for human and model organism genomes is critical for exploiting model organism research and advancing our understanding of results from genome-wide association studies (GWAS). Sequence conservation is the de facto approach for finding orthologous non-coding regions between human and model organism genomes. However, existing methods for mapping non-coding genomic regions across species are challenged by multi-mapping, low precision, and low mapping rates. RESULTS: We develop Adaptive liftOver (AdaLiftOver), a large-scale computational tool for identifying functionally similar orthologous non-coding regions across species. AdaLiftOver builds on the UCSC liftOver framework to extend the query regions and prioritizes the resulting candidate target regions based on the conservation of the epigenomic and the sequence grammar features. Evaluations of AdaLiftOver with multiple case studies, spanning both genomic intervals from epigenome datasets across a wide range of model organisms and GWAS SNPs, establish AdaLiftOver as a versatile method for deriving hard-to-obtain human epigenome datasets as well as reliably identifying orthologous loci for GWAS SNPs. AVAILABILITY AND IMPLEMENTATION: The R package and the data for AdaLiftOver are available from https://github.com/keleslab/AdaLiftOver.
Subject(s)
Genome-Wide Association Study , Regulatory Sequences, Nucleic Acid , Humans , Genome , Genomics/methods , Software
ABSTRACT
MOTIVATION: With the wide availability of single-cell RNA-seq (scRNA-seq) technology, population-scale scRNA-seq datasets across multiple individuals and time points are emerging. While the initial investigations of these datasets tend to focus on standard analysis of clustering and differential expression, leveraging the power of scRNA-seq data at the personalized dynamic gene co-expression network level has the potential to unlock subject and/or time-specific network-level variation, which is critical for understanding phenotypic differences. Community detection from co-expression networks of multiple time points or conditions has been well-studied; however, none of the existing settings included networks from multiple subjects and multiple time points simultaneously. To address this, we develop Multi-subject Dynamic Community Detection (MuDCoD) for multi-subject community detection in personalized dynamic gene networks from scRNA-seq. MuDCoD builds on the spectral clustering framework and promotes information sharing among the networks of the subjects as well as networks at different time points. It clusters genes in the personalized dynamic gene networks and reveals gene communities that are variable or shared not only across time but also among subjects. RESULTS: Evaluation and benchmarking of MuDCoD against existing approaches reveal that MuDCoD effectively leverages apparent shared signals among networks of the subjects at individual time points, and performs robustly when there is no or little information sharing among the networks. Applications to population-scale scRNA-seq datasets of human-induced pluripotent stem cells during dopaminergic neuron differentiation and CD4+ T cell activation indicate that MuDCoD enables robust inference for identifying time-varying personalized gene modules. Our results illustrate how personalized dynamic community detection can aid in the exploration of subject-specific biological processes that vary across time. 
AVAILABILITY AND IMPLEMENTATION: MuDCoD is publicly available at https://github.com/bo1929/MuDCoD as a Python package. Implementation includes simulation and real-data experiments together with extensive documentation.
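The spectral clustering framework that MuDCoD builds on can be sketched for a single network as follows. This is only the plain single-network building block, under stated assumptions: it splits one weighted gene network into two communities using the second eigenvector of the normalized Laplacian. It does not implement the information sharing across subjects or time points described above, and the function name and toy network are hypothetical.

```python
import numpy as np


def spectral_bipartition(adjacency):
    """Split a weighted, undirected network into two communities.

    Uses the eigenvector of the second-smallest eigenvalue of the symmetric
    normalized Laplacian L = I - D^{-1/2} A D^{-1/2}. Assumes every node has
    positive degree. Plain spectral clustering, not MuDCoD itself.
    """
    a = np.asarray(adjacency, dtype=float)
    degree = a.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    laplacian = np.eye(len(a)) - d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(laplacian)   # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]
    return (fiedler > 0).astype(int)


# Hypothetical toy co-expression network: two modules of five genes each,
# strong within-module weights (1.0) and weak between-module weights (0.1).
m = 5
within = np.full((m, m), 1.0)
between = np.full((m, m), 0.1)
adjacency = np.block([[within, between], [between, within]])
np.fill_diagonal(adjacency, 0.0)

labels = spectral_bipartition(adjacency)
print(labels)
```

Which module receives label 0 or 1 is arbitrary, because an eigenvector is defined only up to sign.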
Subject(s)
Gene Expression Profiling , Gene Regulatory Networks , Humans , Gene Expression Profiling/methods , Software , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Cluster Analysis
ABSTRACT
Recent advances in consortium-scale genome-wide association studies (GWAS) have highlighted the involvement of common genetic variants in autism spectrum disorder (ASD), but our understanding of their etiologic roles, especially the interplay with rare variants, is incomplete. In this work, we introduce an analytical framework to quantify the transmission disequilibrium of genetically regulated gene expression from parents to offspring. We applied this framework to conduct a transcriptome-wide association study (TWAS) on 7,805 ASD proband-parent trios, and replicated our findings using 35,740 independent samples. We identified 31 associations at the transcriptome-wide significance level. In particular, we identified POU3F2 (p = 2.1E-7), a transcription factor mainly expressed in developmental brain. Gene targets regulated by POU3F2 showed a 2.7-fold enrichment for known ASD genes (p = 2.0E-5) and a 2.7-fold enrichment for loss-of-function de novo mutations in ASD probands (p = 7.1E-5). These results provide a novel connection between rare and common variants, whereby ASD genes affected by very rare mutations are regulated by an unlinked transcription factor affected by common genetic variations.
Subject(s)
Autism Spectrum Disorder/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study/methods , Hippocampus/metabolism , Homeodomain Proteins/genetics , POU Domain Factors/genetics , Transcriptome/genetics , Alleles , Databases, Genetic , Gene Expression Profiling , Humans , Mutation , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Risk Factors , Spatio-Temporal Analysis
ABSTRACT
INTRODUCTION: DNA microarray-based studies report differentially methylated positions (DMPs) in blood between late-onset dementia due to Alzheimer's disease (AD) and cognitively unimpaired individuals, but interrogate < 4% of the genome. METHODS: We used whole genome methylation sequencing (WGMS) to quantify DNA methylation levels at 25,409,826 CpG loci in 281 blood samples from 108 AD and 173 cognitively unimpaired individuals. RESULTS: WGMS identified 28,038 DMPs throughout the human methylome, including 2707 differentially methylated genes (e.g., SORCS3, GABA, and PICALM) encoding proteins in biological pathways relevant to AD such as synaptic membrane, cation channel complex, and glutamatergic synapse. One hundred seventy-three differentially methylated blood-specific enhancers interact with the promoters of 95 genes that are differentially expressed in blood from persons with and without AD. DISCUSSION: WGMS identifies differentially methylated CpGs in known and newly detected genes and enhancers in blood from persons with and without AD. HIGHLIGHTS: Whole genome DNA methylation levels were quantified in blood from persons with and without Alzheimer's disease (AD). Twenty-eight thousand thirty-eight differentially methylated positions (DMPs) were identified. Two thousand seven hundred seven genes comprise DMPs. Forty-eight of 75 independent genetic risk loci for AD have DMPs. One thousand five hundred sixty-eight blood-specific enhancers comprise DMPs, 173 of which interact with the promoters of 95 genes that are differentially expressed in blood from persons with and without AD.
Subject(s)
Alzheimer Disease , DNA Methylation , Humans , Alzheimer Disease/genetics , Alzheimer Disease/metabolism , Epigenesis, Genetic , Whole Genome Sequencing
ABSTRACT
Publicly available RNA-seq data is routinely used for retrospective analysis to elucidate new biology. Novel transcript discovery enabled by joint analysis of large collections of RNA-seq data sets has emerged as one such analysis. Current methods for transcript discovery rely on a '2-Step' approach where the first step encompasses building transcripts from individual data sets, followed by the second step that merges predicted transcripts across data sets. To increase the power of transcript discovery from large collections of RNA-seq data sets, we developed a novel '1-Step' approach named Pooling RNA-seq and Assembling Models (PRAM) that builds transcript models from pooled RNA-seq data sets. We demonstrate in a computational benchmark that 1-Step outperforms 2-Step approaches in predicting overall transcript structures and individual splice junctions, while performing competitively in detecting exonic nucleotides. Applying PRAM to 30 human ENCODE RNA-seq data sets identified unannotated transcripts with epigenetic and RAMPAGE signatures similar to those of recently annotated transcripts. In a case study, we discovered and experimentally validated new transcripts through the application of PRAM to mouse hematopoietic RNA-seq data sets. We uncovered new transcripts that share a differential expression pattern with a neighboring gene Pik3cg implicated in human hematopoietic phenotypes, and we provided evidence for the conservation of this relationship in human. PRAM is implemented as an R/Bioconductor package.
Subject(s)
RNA-Seq/methods , Animals , Class Ib Phosphatidylinositol 3-Kinase/genetics , DNA, Intergenic , Genomics , Hematopoietic Stem Cells/metabolism , Humans , Mice , RNA/metabolism , Software
ABSTRACT
The ability to simulate high-throughput chromatin conformation (Hi-C) data is foundational for benchmarking Hi-C data analysis methods. Here we present a nonparametric strategy named FreeHi-C to simulate Hi-C data from the interacting genome fragments. Data from FreeHi-C exhibit high fidelity to biological Hi-C data. FreeHi-C boosts the precision and power of differential chromatin interaction detection through data augmentation under preserved false discovery rate control.
Subject(s)
Benchmarking , Chromatin/genetics , Chromosome Mapping/methods , Computational Biology/methods , Genome, Human , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Humans , Software
ABSTRACT
SUMMARY: Quantitative tools are needed to leverage the unprecedented resolution of single-cell high-throughput chromatin conformation (scHi-C) data and integrate it with other single-cell data modalities. We present single-cell gene associating domain (scGAD) scores as a dimension reduction and exploratory analysis tool for scHi-C data. scGAD enables summarization at the gene unit while accounting for inherent gene-level genomic biases. Low-dimensional projections with scGAD capture clustering of cells based on their 3D structures. Significant chromatin interactions within and between cell types can be identified with scGAD. We further show that scGAD facilitates the integration of scHi-C data with other single-cell data modalities by enabling its projection onto reference low-dimensional embeddings. This multi-modal data integration provides an automated and refined cell-type annotation for scHi-C data. AVAILABILITY AND IMPLEMENTATION: scGAD is part of the BandNorm R package at https://sshen82.github.io/BandNorm/articles/scGAD-tutorial.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Genomics , Software , Genome , Chromosomes , Chromatin , Single-Cell Analysis
ABSTRACT
Thousands of cis-elements in genomes are predicted to have vital functions. Although conservation, activity in surrogate assays, polymorphisms, and disease mutations provide functional clues, deletion from endogenous loci constitutes the gold-standard test. A GATA-2-binding, Gata2 intronic cis-element (+9.5) required for hematopoietic stem cell genesis in mice is mutated in a human immunodeficiency syndrome. Because +9.5 is the only cis-element known to mediate stem cell genesis, we devised a strategy to identify functionally comparable enhancers ("+9.5-like") genome-wide. Gene editing revealed +9.5-like activity to mediate GATA-2 occupancy, chromatin opening, and transcriptional activation. A +9.5-like element resided in Samd14, which encodes a protein of unknown function. Samd14 increased hematopoietic progenitor levels/activity and promoted signaling by a pathway vital for hematopoietic stem/progenitor cell regulation (stem cell factor/c-Kit), and c-Kit rescued Samd14 loss-of-function phenotypes. Thus, the hematopoietic stem/progenitor cell cistrome revealed a mediator of a signaling pathway that has broad importance for stem/progenitor cell biology.
Subject(s)
GATA2 Transcription Factor/genetics , Hematopoietic Stem Cells/metabolism , Proteins/genetics , Proto-Oncogene Proteins c-kit/genetics , Transcriptional Activation/genetics , Amino Acid Sequence , Animals , Cell Differentiation/genetics , Cell Line , Mice , Molecular Sequence Data , Proteins/metabolism , RNA Interference , RNA, Small Interfering , Signal Transduction , Transcription, Genetic/genetics
ABSTRACT
Single-cell transcriptome sequencing (scRNA-seq) has enabled investigations of cellular heterogeneity at exceedingly high resolution. Identification of novel cell types or transient developmental stages across multiple experimental conditions is one of its key applications. Linear and non-linear dimensionality reduction for data integration has become a foundational tool in inference from scRNA-seq data. We present multilayer graph clustering (MLG) as an integrative approach for combining multiple dimensionality reductions of multi-condition scRNA-seq data. MLG generates a multilayer shared nearest neighbor cell graph with higher signal-to-noise ratio and outperforms current best practices in terms of clustering accuracy across large-scale benchmarking experiments. Application of MLG to a wide variety of datasets from multiple conditions highlights how MLG boosts signal-to-noise ratio for fine-grained sub-population identification. MLG is widely applicable to settings with single cell data integration via dimension reduction.
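A multilayer shared nearest neighbor (SNN) graph of the kind described above might be assembled roughly as follows. This is a sketch under assumptions rather than the MLG implementation: each layer is an SNN graph built from one low-dimensional embedding, edge weights are the Jaccard overlap of k-nearest-neighbor sets, and layers are combined by summing weights. The function names, the weighting scheme, and the toy embeddings are all hypothetical.

```python
import numpy as np


def knn_sets(embedding, k):
    """Return, for each cell, the set of indices of its k nearest neighbors."""
    x = np.asarray(embedding, dtype=float)
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)            # a cell is not its own neighbor
    idx = np.argsort(dist, axis=1)[:, :k]
    return [set(row.tolist()) for row in idx]


def snn_graph(embedding, k):
    """Weighted SNN graph: Jaccard overlap of the two cells' k-NN sets."""
    neighbors = knn_sets(embedding, k)
    n = len(neighbors)
    weights = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            shared = len(neighbors[i] & neighbors[j])
            if shared:
                jaccard = shared / len(neighbors[i] | neighbors[j])
                weights[i, j] = weights[j, i] = jaccard
    return weights


def multilayer_snn(embeddings, k):
    """Combine one SNN layer per embedding by summing edge weights (assumed rule)."""
    return sum(snn_graph(e, k) for e in embeddings)


# Hypothetical toy data: 8 cells in two groups, seen through two embeddings.
layer_a = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                    [10, 10], [10, 11], [11, 10], [11, 11]])
layer_b = np.array([[0], [1], [2], [3], [20], [21], [22], [23]])
graph = multilayer_snn([layer_a, layer_b], k=3)
print(graph[0, 1], graph[0, 4])
```

Edges supported by several embeddings accumulate weight, which is one way to read the signal-to-noise gain mentioned in the abstract. A community detection step on the combined graph would follow in a full pipeline.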
Subject(s)
RNA-Seq/methods , Single-Cell Analysis/methods , Algorithms , Animals , Cluster Analysis , Hematopoietic Stem Cells/metabolism , Humans , Mice
ABSTRACT
Developmental-regulatory networks often include large gene families encoding mechanistically-related proteins like G-protein-coupled receptors, zinc finger transcription factors and solute carrier (SLC) transporters. In principle, a common mechanism may confer expression of multiple members integral to a developmental process, or diverse mechanisms may be deployed. Using genetic complementation and enhancer-mutant systems, we analyzed the 456 member SLC family that establishes the small molecule constitution of cells. This analysis identified SLC gene cohorts regulated by GATA1 and/or GATA2 during erythroid differentiation. As >50 SLC genes shared GATA factor regulation, a common mechanism established multiple members of this family. These genes included Slc29a1 encoding an equilibrative nucleoside transporter (Slc29a1/ENT1) that utilizes adenosine as a preferred substrate. Slc29a1 promoted erythroblast survival and differentiation ex vivo. Targeted ablation of murine Slc29a1 in erythroblasts attenuated erythropoiesis and erythrocyte regeneration in response to acute anemia. Our results reveal a GATA factor-regulated SLC ensemble, with a nucleoside transporter component that promotes erythropoiesis and prevents anemia, and establish a mechanistic link between GATA factor and adenosine mechanisms. We propose that integration of the GATA factor-adenosine circuit with other components of the GATA factor-regulated SLC ensemble establishes the small molecule repertoire required for progenitor cells to efficiently generate erythrocytes.
Subject(s)
Equilibrative Nucleoside Transporter 1/metabolism , Erythropoiesis , GATA Transcription Factors/metabolism , Adenosine/metabolism , Animals , Cells, Cultured , Equilibrative Nucleoside Transporter 1/genetics , Mice , Mice, Inbred C57BL
ABSTRACT
Ribonucleotidyl transferases (rNTases) add untemplated ribonucleotides to diverse RNAs. We have developed TRAID-seq, a screening strategy in Saccharomyces cerevisiae to identify sequences added to a reporter RNA at single-nucleotide resolution by overexpressed candidate enzymes from different organisms. The rNTase activities of 22 previously unexplored enzymes were determined. In addition to poly(A)- and poly(U)-adding enzymes, we identified a cytidine-adding enzyme that is likely to be part of a two-enzyme system that adds CCA to tRNAs in a eukaryote; a nucleotidyl transferase that adds nucleotides to RNA without apparent nucleotide preference; and a poly(UG) polymerase, Caenorhabditis elegans MUT-2, that adds alternating uridine and guanosine nucleotides to form poly(UG) tails. MUT-2 is known to be required for certain forms of RNA silencing, and mutants of the enzyme that result in defective silencing did not add poly(UG) tails in our assay. We propose that MUT-2 poly(UG) polymerase activity is required to promote genome integrity and RNA silencing.
Subject(s)
Caenorhabditis elegans Proteins/genetics , Caenorhabditis elegans/genetics , Nucleotidyltransferases/genetics , RNA Interference , RNA Nucleotidyltransferases/genetics , Saccharomyces cerevisiae/genetics , Animals , Caenorhabditis elegans/enzymology , Mutation , Saccharomyces cerevisiae/enzymology , Saccharomyces cerevisiae Proteins/genetics
ABSTRACT
High-throughput genome-wide chromatin conformation capture assay (Hi-C) is routinely used to profile long-range genomic interactions and three-dimensional organization of genomes. A key application of Hi-C is the comparative analysis of genomic interactions across different time points, cellular conditions, or multiple stimuli. While operating characteristics of methods for Hi-C data processing such as normalization, pairwise interaction and higher-order organization detection have been relatively well studied, properties of methods for differential chromatin interaction detection are less investigated. We have recently developed FreeHi-C to enable data-driven non-parametric simulations from Hi-C experiments. Here, we extend FreeHi-C with a user/data-driven spike-in module to facilitate comparisons of differential chromatin interaction detection methods where the ground truth differential chromatin interactions are known under a wide variety of settings. We use FreeHi-C to benchmark four differential chromatin interaction detection methods, namely HiCcompare, multiHiCcompare, diffHic, and Selfish, using three comparative analysis settings with different sequencing depths and spike-in proportions. This comparison reveals distinct performance profiles in terms of standard metrics such as false discovery rate control, detection power, significance order, precision-recall curve, and receiver operating characteristic curve, as well as in the overall genomic properties of the types of differential chromatin interactions detectable by each method. Furthermore, it highlights the lack of power for all methods in small replication settings.
Subject(s)
Chromatin/metabolism , Epigenomics/methods , Software , Animals , Chromosome Mapping , Computational Biology/methods , Computer Simulation , Humans
ABSTRACT
Hemoglobin-expressing erythrocytes (red blood cells) act as fundamental metabolic regulators by providing oxygen to cells and tissues throughout the body. Whereas the vital requirement for oxygen to support metabolically active cells and tissues is well established, almost nothing is known regarding how erythrocyte development and function impact regeneration. Furthermore, many questions remain unanswered relating to how insults to hematopoietic stem/progenitor cells and erythrocytes can trigger a massive regenerative process termed 'stress erythropoiesis' to produce billions of erythrocytes. Here, we review the cellular and molecular mechanisms governing erythrocyte development and regeneration, and discuss the potential links between these events and other regenerative processes.
Subject(s)
Cell Differentiation/physiology , Erythrocytes/metabolism , Erythropoiesis/physiology , Hematopoietic Stem Cells/metabolism , Regeneration/physiology , Animals , Biological Transport, Active/physiology , Erythrocytes/cytology , Hematopoietic Stem Cells/cytology , Humans , Oxygen/metabolism
ABSTRACT
Genome-wide association studies (GWAS) have successfully identified thousands of genetic variants contributing to disease and other phenotypes. However, significant obstacles hamper our ability to elucidate causal variants, identify genes affected by causal variants, and characterize the mechanisms by which genotypes influence phenotypes. The increasing availability of genome-wide functional annotation data is providing unique opportunities to incorporate prior information into the analysis of GWAS to better understand the impact of variants on disease etiology. Although there have been many advances in incorporating prior information into prioritization of trait-associated variants in GWAS, functional annotation data have played a secondary role in the joint analysis of GWAS and molecular (i.e., expression) quantitative trait loci (eQTL) data in assessing evidence for association. To address this, we develop a novel mediation framework, iFunMed, to integrate GWAS and eQTL data with the utilization of publicly available functional annotation data. iFunMed extends the scope of standard mediation analysis by incorporating information from multiple genetic variants at a time and leveraging variant-level summary statistics. Data-driven computational experiments convey how informative annotations improve single-nucleotide polymorphism (SNP) selection performance while emphasizing robustness of iFunMed to noninformative annotations. Application to Framingham Heart Study data indicates that iFunMed is able to boost detection of SNPs with mediation effects that can be attributed to regulatory mechanisms.
Subject(s)
Genome-Wide Association Study , Quantitative Trait Loci/genetics , Software , Base Sequence , Computer Simulation , Erythrocyte Count , Genotype , Humans , Molecular Sequence Annotation , Phenotype , Polymorphism, Single Nucleotide/genetics , Probability
ABSTRACT
SUMMARY: Understanding the regulatory roles of non-coding genetic variants has become a central goal for interpreting results of genome-wide association studies. The regulatory significance of the variants may be interrogated by assessing their influence on transcription factor binding. We have developed atSNP Search, a comprehensive web database for evaluating motif matches to the human genome with both reference and variant alleles and assessing the overall significance of the variant alterations on the motif matches. Convenient search features, comprehensive search outputs and a useful help menu are key components of atSNP Search. atSNP Search enables convenient interpretation of regulatory variants by statistical significance testing and composite logo plots, which are graphical representations of motif matches with the reference and variant alleles. Existing motif-based regulatory variant discovery tools only consider a limited pool of variants due to storage or other limitations. In contrast, atSNP Search users can test more than 37 billion variant-motif pairs with marginal significance in motif matches or match alteration. Computational evidence from atSNP Search, when combined with experimental validation, may help with the discovery of underlying disease mechanisms. AVAILABILITY AND IMPLEMENTATION: atSNP Search is freely available at http://atsnp.biostat.wisc.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
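The idea of comparing motif matches for reference and variant alleles can be sketched as below. This is a minimal illustration, not atSNP's scoring or significance testing: it assumes a position weight matrix (PWM) of base probabilities, scores every motif-length window that overlaps the variant by log-likelihood, and reports the change in the best score. The PWM, sequences, and function name are hypothetical.

```python
import numpy as np

BASES = "ACGT"


def best_match_score(sequence, pwm, snp_index):
    """Best log-likelihood over motif windows that overlap position snp_index.

    pwm has shape (4, width): rows follow BASES, each column sums to 1.
    """
    width = pwm.shape[1]
    first = max(0, snp_index - width + 1)
    last = min(snp_index, len(sequence) - width)
    scores = []
    for start in range(first, last + 1):
        window = sequence[start:start + width]
        score = sum(np.log(pwm[BASES.index(base), pos])
                    for pos, base in enumerate(window))
        scores.append(score)
    return max(scores)


# Hypothetical 4-position motif with consensus ACGT.
pwm = np.array([
    [0.7, 0.1, 0.1, 0.1],   # A
    [0.1, 0.7, 0.1, 0.1],   # C
    [0.1, 0.1, 0.7, 0.1],   # G
    [0.1, 0.1, 0.1, 0.7],   # T
])

ref_seq = "TTACGTAA"
alt_seq = "TTACATAA"         # G>A substitution at index 4
snp_index = 4

ref_score = best_match_score(ref_seq, pwm, snp_index)
alt_score = best_match_score(alt_seq, pwm, snp_index)
print(f"ref={ref_score:.3f} alt={alt_score:.3f} change={alt_score - ref_score:.3f}")
```

A negative change indicates that the variant allele weakens the best motif match. Judging whether such a change is statistically significant requires a null model, which this sketch omits.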