ABSTRACT
Epstein-Barr virus (EBV) is associated with multiple human malignancies. To evade immune detection, EBV switches between latent and lytic programs. How viral latency is maintained in tumors or in memory B cells, the reservoir for lifelong EBV infection, remains incompletely understood. To gain insights, we performed a human genome-wide CRISPR/Cas9 screen in Burkitt lymphoma B cells. Our analyses identified a network of host factors that repress lytic reactivation, centered on the transcription factor MYC, including cohesins, FACT, STAGA, and Mediator. Depletion of MYC or factors important for MYC expression reactivated the lytic cycle, including in Burkitt xenografts. MYC bound the EBV genome origin of lytic replication and suppressed its looping to the lytic cycle initiator BZLF1 promoter. Notably, MYC abundance decreases with plasma cell differentiation, a key lytic reactivation trigger. Our results suggest that EBV senses MYC abundance as a readout of B cell state and highlight Burkitt latency reversal therapeutic targets.
Subject(s)
Burkitt Lymphoma/pathology , Epstein-Barr Virus Infections/virology , Herpesvirus 4, Human/physiology , Host-Pathogen Interactions , Proto-Oncogene Proteins c-myc/metabolism , Virus Activation , Virus Latency , Animals , B-Lymphocytes/metabolism , B-Lymphocytes/pathology , B-Lymphocytes/virology , Burkitt Lymphoma/metabolism , Burkitt Lymphoma/virology , Cell Proliferation , Epstein-Barr Virus Infections/genetics , Epstein-Barr Virus Infections/metabolism , Female , Gene Expression Regulation, Viral , Humans , Mice , Mice, Inbred NOD , Mice, SCID , Promoter Regions, Genetic , Proto-Oncogene Proteins c-myc/genetics , Tumor Cells, Cultured , Xenograft Model Antitumor Assays
ABSTRACT
Epstein-Barr virus (EBV) uses latency programs to colonize the memory B-cell reservoir, and each program is associated with human malignancies. However, knowledge remains incomplete of epigenetic mechanisms that maintain the highly restricted latency I program, present in memory and Burkitt lymphoma cells, in which EBNA1 is the only EBV-encoded protein expressed. Given increasing appreciation that higher order chromatin architecture is an important determinant of viral and host gene expression, we investigated roles of Wings Apart-Like Protein Homolog (WAPL), a host factor that unloads cohesin to control DNA loop size and that was discovered as an EBNA2-associated protein. WAPL knockout (KO) in Burkitt cells de-repressed LMP1 and LMP2A expression, but not other EBV oncogenes, to yield a viral program reminiscent of EBV latency II, which is rarely observed in B-cells. WAPL KO also increased LMP1/2A levels in latency III lymphoblastoid cells. WAPL KO altered EBV genome architecture, triggering formation of DNA loops between the LMP promoter region and the EBV origins of lytic replication (oriLyt). Hi-C analysis further demonstrated that WAPL KO reprogrammed EBV genomic DNA looping. LMP1 and LMP2A de-repression correlated with decreased histone repressive marks at their promoters. We propose that EBV coopts WAPL to negatively regulate latent membrane protein expression to maintain Burkitt latency I.
Subject(s)
Epstein-Barr Virus Infections , Gene Expression Regulation, Viral , Herpesvirus 4, Human , Viral Matrix Proteins , Virus Latency , Humans , Herpesvirus 4, Human/genetics , Virus Latency/physiology , Viral Matrix Proteins/metabolism , Viral Matrix Proteins/genetics , Epstein-Barr Virus Infections/virology , Epstein-Barr Virus Infections/metabolism , Epstein-Barr Virus Infections/genetics , B-Lymphocytes/virology , B-Lymphocytes/metabolism , Burkitt Lymphoma/virology , Burkitt Lymphoma/genetics , Burkitt Lymphoma/metabolism , Cell Line, Tumor
ABSTRACT
MOTIVATION: Integrative analysis of heterogeneous expression data remains challenging due to variations in platform, RNA quality, sample processing, and other unknown technical effects. Selecting the approach for removing unwanted batch effects can be a time-consuming and tedious process, especially for more biologically focused investigators. RESULTS: Here, we present BatchFLEX, a Shiny app that can facilitate visualization and correction of batch effects using several established methods. BatchFLEX can visualize the variance contribution of a factor before and after correction. As an example, we have analyzed ImmGen microarray data and enhanced the expression signals that distinguish each immune cell type. Moreover, our analysis revealed the impact of the batch correction in altering the gene expression rank and single-sample GSEA pathway scores in immune cell types, highlighting the importance of real-time assessment of the batch correction for optimal downstream analysis. AVAILABILITY AND IMPLEMENTATION: Our tool is available through GitHub https://github.com/shawlab-moffitt/BATCH-FLEX-ShinyApp with an online example on Shiny.io https://shawlab-moffitt.shinyapps.io/batch_flex/.
Subject(s)
Software , Gene Expression Profiling/methods , Humans , Computational Biology/methods
ABSTRACT
Super enhancers (SE), large genomic elements that activate transcription and drive cell identity, have been found with cancer-specific gene regulation in human cancers. Recent studies reported the importance of understanding the cooperation and function of SE internal components, i.e., the constituent enhancers (CE). However, there are no pan-cancer studies to identify cancer-specific SE signatures at the constituent level. Here, by revisiting pan-cancer SE activities with H3K27Ac ChIP-seq datasets, we report fingerprint SE signatures for 28 cancer types in the NCI-60 cell panel. We implement a mixture model to discriminate active CEs from inactive CEs by taking into consideration ChIP-seq variabilities between cancer samples and across CEs. We demonstrate that the model-based estimation of CE states provides improved functional interpretation of SE-associated regulation. We identify cancer-specific CEs by balancing their active prevalence with their capability of encoding cancer type identities. We further demonstrate that cancer-specific CEs have the strongest per-base enhancer activities in independent enhancer sequencing assays, suggesting their importance in understanding critical SE signatures. We summarize fingerprint SEs based on the cancer-specific statuses of their component CEs and build an easy-to-use R package to facilitate the query, exploration, and visualization of fingerprint SEs across cancers.
Subject(s)
Neoplasms , Super Enhancers , Humans , Epigenomics , Enhancer Elements, Genetic/genetics , Gene Expression Regulation , Neoplasms/genetics
ABSTRACT
BACKGROUND: Cancer-related deaths for people with human immunodeficiency virus (PWH) are increasing due to longer life expectancies and disparately poor cancer-related outcomes. We hypothesize that advanced biological aging contributes to cancer-related morbidity and mortality for PWH and cancer. We sought to determine the impact of clonal hematopoiesis (CH) on cancer disparities in PWH. METHODS: We conducted a retrospective study to compare the prevalence and clinical outcomes of CH in PWH and people without HIV (PWoH) and cancer. Included in the study were PWH and similar PWoH based on tumor site, age, tumor sequence, and cancer treatment status. Biological aging was also measured using epigenetic methylation clocks. RESULTS: In 136 patients with cancer, PWH had twice the prevalence of CH compared to similar PWoH (23% vs 11%, P = .07). After adjusting for patient characteristics, PWH were 4 times more likely than PWoH to have CH (odds ratio, 4.1 [95% confidence interval, 1.3-13.9]; P = .02). The effect of CH on survival was most pronounced in PWH, who had a 5-year survival rate of 38% if they had CH (vs 59% if no CH), compared to PWoH who had a 5-year survival rate of 75% if they had CH (vs 83% if no CH). CONCLUSIONS: This study provides the first evidence that PWH may have a higher prevalence of CH than PWoH with the same cancers. CH may be an independent biological aging risk factor contributing to inferior survival for PWH and cancer.
Subject(s)
Clonal Hematopoiesis , HIV Infections , Neoplasms , Humans , Male , Female , HIV Infections/virology , HIV Infections/complications , Middle Aged , Clonal Hematopoiesis/genetics , Retrospective Studies , Neoplasms/virology , Adult , Prevalence , Aged
ABSTRACT
Super enhancers (SEs) are broad enhancer domains usually containing multiple constituent enhancers that hold elevated activities in gene regulation. Disruption in one or more constituent enhancers causes aberrant SE activities that lead to gene dysregulation in diseases. To quantify SE aberrations, differential analysis is performed to compare SE activities between cell conditions. The state-of-the-art strategy in estimating differential SEs relies on overall activities and neglects the changes in length and structure of SEs. Here, we propose a novel computational method to identify differential SEs by weighting the combinatorial effects of constituent-enhancer activities and locations (i.e. internal dynamics). In addition to overall activity changes, our method identified four novel classes of differential SEs with distinct enhancer structural alterations. We demonstrate that these structural alterations hold distinct regulatory impact, such as regulating different numbers of genes and modulating gene expression with different strengths, highlighting the differentiated regulatory roles of these unexplored SE features. When compared to the existing method, our method showed improved identification of differential SEs that were linked to better discernment of cell-type-specific SE activity and functional interpretation.
Subject(s)
Enhancer Elements, Genetic , Gene Expression Regulation , Cell Differentiation
ABSTRACT
Pathway-level survival analysis offers the opportunity to examine molecular pathways and immune signatures that influence patient outcomes. However, available survival analysis algorithms are limited in pathway-level function and lack a streamlined analytical process. Here we present a comprehensive pathway-level survival analysis suite, PATH-SURVEYOR, which includes a Shiny user interface with extensive features for systematic exploration of pathways and covariates in a Cox proportional-hazard model. Moreover, our framework offers an integrative strategy for performing Hazard Ratio ranked Gene Set Enrichment Analysis and pathway clustering. As an example, we applied our tool in a combined cohort of melanoma patients treated with checkpoint inhibition (ICI) and identified several immune populations and biomarkers predictive of ICI efficacy. We also analyzed gene expression data of pediatric acute myeloid leukemia (AML) and performed an inverse association of drug targets with the patient's clinical endpoint. Our analysis derived several drug targets in high-risk KMT2A-fusion-positive patients, which were then validated in AML cell lines in the Genomics of Drug Sensitivity database. Altogether, the tool offers a comprehensive suite for pathway-level survival analysis and a user interface for exploring drug targets, molecular features, and immune populations at different resolutions.
Subject(s)
Leukemia, Myeloid, Acute , Melanoma , Child , Humans , Drug Repositioning , Medical Oncology , Melanoma/drug therapy , Melanoma/genetics , Algorithms , Leukemia, Myeloid, Acute/drug therapy , Leukemia, Myeloid, Acute/genetics
ABSTRACT
Epstein-Barr virus (EBV) causes endemic Burkitt lymphoma, the leading childhood cancer in sub-Saharan Africa. Burkitt cells retain aspects of germinal center B-cell physiology with MYC-driven B-cell hyperproliferation; however, little is presently known about their iron metabolism. CRISPR/Cas9 analysis highlighted the little-studied ferrireductase CYB561A3 as critical for Burkitt proliferation but not for that of the closely related EBV-transformed lymphoblastoid cells or nearly all other Cancer Dependency Map cell lines. Burkitt CYB561A3 knockout induced profound iron starvation, despite ferritinophagy and plasma membrane transferrin upregulation. Elevated concentrations of ascorbic acid, a key CYB561 family electron donor, or the labile iron source ferrous citrate rescued Burkitt CYB561A3 deficiency. CYB561A3 knockout caused catastrophic lysosomal and mitochondrial damage and impaired mitochondrial respiration. Conversely, lymphoblastoid B cells with the transforming EBV latency III program were instead dependent on the STEAP3 ferrireductase. These results highlight CYB561A3 as an attractive therapeutic Burkitt lymphoma target.
Subject(s)
Burkitt Lymphoma/pathology , Cytochromes b/genetics , Gene Expression Regulation, Neoplastic , Lysosomes/pathology , B-Lymphocytes/metabolism , B-Lymphocytes/pathology , Burkitt Lymphoma/genetics , CRISPR-Cas Systems , Cell Line, Tumor , Cell Proliferation , Epstein-Barr Virus Infections/complications , FMN Reductase/genetics , HEK293 Cells , Herpesvirus 4, Human/isolation & purification , Humans , Lysosomes/genetics , Mitochondria/genetics , Mitochondria/pathology
ABSTRACT
PURPOSE: Estrogen-receptor (ER) and progesterone-receptor (PR) expression levels in breast cancer, which have been principally compared via binomial descriptors, can vary widely across tumors. We sought to characterize ER and PR expression levels using semi-quantitative analyses of receptor staining in germline pathogenic variant (PV) carriers of cancer predisposition genes. METHODS: We conducted a retrospective chart review of patients who underwent germline genetic testing for cancer predisposition genes at a tertiary cancer center genetics clinic. We performed comparisons of semi-quantitative ER and PR percentage staining levels across carriers and non-carriers of cancer predisposition genes. RESULTS: Breast cancers from BRCA1 PV carriers expressed significantly lower ER (15.2% vs 78.2%, p < 0.001) and lower PR (6.8% vs 41.1%, p < 0.001) staining compared to non-PV carriers. Similarly, breast cancers of BRCA2 (66.7% vs 78.2%, p = 0.005) and TP53 (50.6% vs 78.2%, p = 0.015) PV tumors also displayed moderate decreases in ER staining. Conversely, CHEK2 tumors displayed higher ER (93.1% vs 78.2%, p = 0.005) and PR (72% vs 48.8%, p = 0.001) staining when compared to non-PV carriers. We observed a wide range of dispersion across the ER and PR staining levels of the carriers and noncarriers. ER and PR ranges of dispersion of CHEK2 tumors were uniquely narrower than all other groups. CONCLUSION: The findings of our study suggest that precise expression levels of ER and PR in breast cancers can vary widely. These differences are further augmented when comparing expression staining across PV and non-PV carriers, suggesting potentially unique tumorigenesis and progression pathways influenced by germline cancer predisposition genes.
Subject(s)
Breast Neoplasms , Breast Neoplasms/pathology , Checkpoint Kinase 2/genetics , Female , Genetic Predisposition to Disease , Germ Cells/metabolism , Germ-Line Mutation , Hormones , Humans , Mutation , Receptors, Progesterone/genetics , Receptors, Progesterone/metabolism , Retrospective Studies
ABSTRACT
SUMMARY: The heterogeneous cell types of the tumor-immune microenvironment (TIME) play key roles in determining cancer progression, metastasis and response to treatment. We report the development of TIMEx, a novel TIME deconvolution method that emphasizes the estimation of infiltrating immune cells from bulk transcriptomics using pan-cancer single-cell RNA-seq signatures. We also implemented a comprehensive, user-friendly web-portal for users to evaluate TIMEx and other deconvolution methods with bulk transcriptomic profiles. AVAILABILITY AND IMPLEMENTATION: TIMEx web-portal is freely accessible at http://timex.moffitt.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
ABSTRACT
Early diagnosis of nasopharyngeal carcinoma (NPC) is difficult because of a lack of specific symptoms. Many patients have advanced disease at diagnosis, and these patients respond poorly to treatment. New treatments are therefore needed to improve the outcome of NPC. To better understand the molecular pathogenesis of NPC, here we used an NPC cell line in a genome-wide CRISPR-based knockout screen to identify the cellular factors and pathways essential for NPC (i.e. dependence factors). This screen identified the Moz, Ybf2/Sas3, Sas2, Tip60 histone acetyl transferase complex, NF-κB signaling, purine synthesis, and linear ubiquitination pathways; and MDM2 proto-oncogene as NPC dependence factors/pathways. Using gene knock out, complementary DNA rescue, and inhibitor assays, we found that perturbation of these pathways greatly reduces the growth of NPC cell lines but does not affect growth of SV40-immortalized normal nasopharyngeal epithelial cells. These results suggest that targeting these pathways/proteins may hold promise for achieving better treatment of patients with NPC.
Subject(s)
Biomarkers, Tumor/genetics , CRISPR-Cas Systems , Cell Proliferation , Gene Knockout Techniques/methods , Genome, Human , Nasopharyngeal Carcinoma/genetics , Nasopharyngeal Neoplasms/genetics , Biomarkers, Tumor/antagonists & inhibitors , Humans , Nasopharyngeal Carcinoma/pathology , Nasopharyngeal Neoplasms/pathology , Proto-Oncogene Mas , Signal Transduction , Tumor Cells, Cultured
ABSTRACT
The main application of ChIP-seq technology is the detection of genomic regions that bind to a protein of interest. A large part of functional genomics' public catalogs is based on ChIP-seq data. These catalogs rely on peak calling algorithms that infer protein-binding sites by detecting genomic regions associated with more mapped reads (coverage) than expected by chance, as a result of the experimental protocol's lack of perfect specificity. We find that GC-content bias accounts for substantial variability in the observed coverage for ChIP-seq experiments and that this variability leads to false-positive peak calls. More concerning is that the GC effect varies across experiments, with the effect strong enough to result in a substantial number of peaks called differently when different laboratories perform experiments on the same cell line. However, accounting for GC content bias in ChIP-seq is challenging because the binding sites of interest tend to be more common in high GC-content regions, which confounds real biological signals with unwanted variability. To address this challenge, we introduce a statistical approach that accounts for GC effects on both nonspecific noise and signal induced by the binding site. The method can be used to account for this bias in binding quantification as well as to improve existing peak calling algorithms. We use this approach to show a reduction in false-positive peaks as well as improved consistency across laboratories.
Subject(s)
Base Composition , DNA/metabolism , Sequence Analysis, DNA/methods , Algorithms , Binding Sites , Chromatin Immunoprecipitation , DNA/chemistry , False Positive Reactions , Genomics , High-Throughput Nucleotide Sequencing
ABSTRACT
Epstein-Barr virus (EBV) infection of human primary resting B lymphocytes (RBLs) leads to the establishment of lymphoblastoid cell lines (LCLs) that can grow indefinitely in vitro. EBV transforms RBLs through the expression of viral latency genes, and these genes alter host transcription programs. To globally measure the transcriptome changes during EBV transformation, primary human RBLs were infected with B95.8 EBV for 0, 2, 4, 7, 14, 21, and 28 days, and poly(A) plus RNAs were analyzed by transcriptome sequencing (RNA-seq). Analyses of variance (ANOVAs) found 3,669 protein-coding genes that were differentially expressed (false-discovery rate [FDR] < 0.01). Ninety-four percent of LCL genes that are essential for LCL growth and survival were differentially expressed. Pathway analyses identified a significant enrichment of pathways involved in cell proliferation, DNA repair, metabolism, and antiviral responses. RNA-seq also identified long noncoding RNAs (lncRNAs) differentially expressed during EBV infection. Clustered regularly interspaced short palindromic repeat (CRISPR) interference (CRISPRi) and CRISPR activation (CRISPRa) found that CYTOR and NORAD lncRNAs were important for LCL growth. During EBV infection, type III EBV latency genes were expressed rapidly after infection. Immediately after LCL establishment, EBV lytic genes were also expressed in LCLs, and ~4% of the LCLs expressed gp350. Chromatin immune precipitation followed by deep sequencing (ChIP-seq) and POLR2A chromatin interaction analysis followed by paired-end tag sequencing (ChIA-PET) data linked EBV enhancers to 90% of EBV-regulated genes. Many genes were linked to enhancers occupied by multiple EBNAs or NF-κB subunits. Incorporating these assays, we generated a comprehensive EBV regulome in LCLs.
IMPORTANCE: Epstein-Barr virus (EBV) immortalization of resting B lymphocytes (RBLs) is a useful model system to study EBV oncogenesis.
By incorporating transcriptome sequencing (RNA-seq), chromatin immune precipitation followed by deep sequencing (ChIP-seq), chromatin interaction analysis followed by paired-end tag sequencing (ChIA-PET), and genome-wide clustered regularly interspaced short palindromic repeat (CRISPR) screen, we identified key pathways that EBV usurps to enable B cell growth and transformation. Multiple layers of regulation could be achieved by cooperations between multiple EBV transcription factors binding to the same enhancers. EBV manipulated the expression of most cell genes essential for lymphoblastoid cell line (LCL) growth and survival. In addition to proteins, long noncoding RNAs (lncRNAs) regulated by EBV also contributed to LCL growth and survival. The data presented in this paper not only allowed us to further define the molecular pathogenesis of EBV but also serve as a useful resource to the EBV research community.
Subject(s)
B-Lymphocytes/virology , Epstein-Barr Virus Infections/genetics , Epstein-Barr Virus Infections/metabolism , Gene Expression Regulation, Viral , Herpesvirus 4, Human/genetics , Herpesvirus 4, Human/physiology , Sequence Analysis, RNA , Analysis of Variance , Cell Line , Chromatin/metabolism , Chromatin Immunoprecipitation , DNA-Directed RNA Polymerases , Epstein-Barr Virus Infections/virology , Epstein-Barr Virus Nuclear Antigens/genetics , Epstein-Barr Virus Nuclear Antigens/metabolism , Herpesvirus 4, Human/pathogenicity , High-Throughput Nucleotide Sequencing , Host-Pathogen Interactions/genetics , Host-Pathogen Interactions/physiology , Humans , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , Transcription Factors/metabolism , Transcriptome , Virus Latency/genetics
ABSTRACT
Super-enhancers (SEs) are clusters of enhancers marked by extraordinarily high and broad chromatin immunoprecipitation followed by deep sequencing (ChIP-seq) signals for H3K27ac or transcription factors (TFs). SEs play pivotal roles in development and oncogenesis. Epstein-Barr virus (EBV) super-enhancers (ESEs) are co-occupied by all essential EBV oncogenes and EBV-activated NF-κB subunits. Perturbation of ESEs stops lymphoblastoid cell line (LCL) growth. To further characterize ESEs and identify proteins critical for ESE function, MYC ESEs were cloned upstream of a green fluorescent protein (GFP) reporter. Reporters driven by MYC ESEs 525 kb and 428 kb upstream of MYC (525ESE and 428ESE) had very high activities in LCLs but not in EBV-negative BJAB cells. EBNA2 activated MYC ESE-driven luciferase reporters. CRISPRi targeting 525ESE significantly decreased MYC expression. Genome-wide CRISPR screens identified factors essential for ESE activity. TBP-associated factor (TAF) family proteins, including TAF8, TAF11, and TAF3, were essential for the activity of the integrated 525ESE-driven reporter in LCLs. TAF8 and TAF11 knockout significantly decreased 525ESE activity and MYC transcription. MEF2C was also identified to be essential for 525ESE activity. Depletion of MEF2C decreased 525ESE reporter activity, MYC expression, and LCL growth. MEF2C cDNA resistant to CRISPR cutting rescued MEF2C knockout and restored 525ESE reporter activity and MYC expression. MEF2C depletion decreased IRF4, EBNA2, and SPI1 binding to 525ESE in LCLs. MEF2C depletion also affected the expression of other ESE target genes, including the ETS1 and BCL2 genes. These data indicated that in addition to EBNA2, TAF family members and MEF2C are essential for ESE activity, MYC expression, and LCL growth.
IMPORTANCE: SEs play critical roles in cancer development.
Since SEs assemble much bigger protein complexes on enhancers than typical enhancers (TEs), they are more sensitive than TEs to perturbations. Understanding the protein composition of SEs that are linked to key oncogenes may identify novel therapeutic targets. A genome-wide CRISPR screen specifically identified proteins essential for the activity of the MYC ESE but not of the simian virus 40 (SV40) enhancer. These proteins not only were essential for the reporter activity but also were important for MYC expression and LCL growth. Targeting these proteins may lead to new therapies for EBV-associated cancers.
Subject(s)
Epstein-Barr Virus Infections/virology , Gene Expression Regulation, Viral , Herpesvirus 4, Human/physiology , TATA-Binding Protein Associated Factors/metabolism , CRISPR-Cas Systems , Cell Line, Tumor , Cell Survival/genetics , Enhancer Elements, Genetic , Gene Editing , Gene Expression , Gene Knockout Techniques , Genes, myc , Histones/metabolism , Host-Pathogen Interactions , Humans , MEF2 Transcription Factors/genetics , MEF2 Transcription Factors/metabolism
ABSTRACT
Until recently, high-throughput gene expression technology, such as RNA-Sequencing (RNA-seq) required hundreds of thousands of cells to produce reliable measurements. Recent technical advances permit genome-wide gene expression measurement at the single-cell level. Single-cell RNA-Seq (scRNA-seq) is the most widely used and numerous publications are based on data produced with this technology. However, RNA-seq and scRNA-seq data are markedly different. In particular, unlike RNA-seq, the majority of reported expression levels in scRNA-seq are zeros, which could be either biologically-driven, genes not expressing RNA at the time of measurement, or technically-driven, genes expressing RNA, but not at a sufficient level to be detected by sequencing technology. Another difference is that the proportion of genes reporting the expression level to be zero varies substantially across single cells compared to RNA-seq samples. However, it remains unclear to what extent this cell-to-cell variation is being driven by technical rather than biological variation. Furthermore, while systematic errors, including batch effects, have been widely reported as a major challenge in high-throughput technologies, these issues have received minimal attention in published studies based on scRNA-seq technology. Here, we use an assessment experiment to examine data from published studies and demonstrate that systematic errors can explain a substantial percentage of observed cell-to-cell expression variability. Specifically, we present evidence that some of these reported zeros are driven by technical variation by demonstrating that scRNA-seq produces more zeros than expected and that this bias is greater for lower expressed genes. In addition, this missing data problem is exacerbated by the fact that this technical variation varies cell-to-cell. Then, we show how this technical cell-to-cell variability can be confused with novel biological results. 
Finally, we demonstrate and discuss how batch-effects and confounded experiments can intensify the problem.
Subject(s)
Gene Expression Profiling/standards , High-Throughput Nucleotide Sequencing/standards , Sequence Analysis, RNA/standards , Single-Cell Analysis/standards , Transcriptome , Animals , Humans
ABSTRACT
BACKGROUND: Somatic copy number alterations (SCNAs) can be used to infer tumor subclonal populations in whole-genome sequencing studies, where their read count ratios between tumor-normal paired samples usually serve as the proxy for inference. Existing SCNA-based subclonal population inference tools assume that the GC bias of the tumor and normal samples has the same features and can be fully offset by the read count ratio. However, we found that the read count ratio on SCNA segments presents a log-linear biased pattern, which influences the performance of existing read count ratio-based subclonal inference tools. Currently, no correction tools take the read ratio bias into account. RESULTS: We present Pre-SCNAClonal, a tool that improves tumor subclonal population inference by correcting GC bias at the SCNA level. Pre-SCNAClonal first corrects GC bias using a Markov chain Monte Carlo probability model, then accurately locates baseline DNA segments (not containing any SCNAs) with a hierarchical clustering model. We show Pre-SCNAClonal's superiority to existing GC-bias correction methods at any level of subclonal population. CONCLUSIONS: Pre-SCNAClonal can be run independently or serve as a pre-processing/GC-correction step in conjunction with existing SCNA-based subclonal inference tools.
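Editorial illustration: the log-linear pattern described in this abstract can be sketched in a few lines of Python. This is a minimal sketch, not the Pre-SCNAClonal method. It assumes the bias is linear in segment GC content on the log scale and swaps in an ordinary least-squares fit for the Markov chain Monte Carlo model named in the abstract. The function names `fit_log_linear` and `correct_ratios` and the reference GC value are hypothetical.

```python
import math

def fit_log_linear(gc, ratio):
    """Least-squares fit of log(ratio) = intercept + slope * gc.

    gc    : GC fraction of each segment (assumed baseline, SCNA-free)
    ratio : tumor/normal read count ratio of each segment
    """
    y = [math.log(r) for r in ratio]
    n = len(gc)
    mean_x = sum(gc) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in gc)
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(gc, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def correct_ratios(gc, ratio, slope, ref_gc):
    """Remove the fitted GC trend, keeping each ratio as if its GC were ref_gc."""
    return [r * math.exp(-slope * (x - ref_gc)) for x, r in zip(gc, ratio)]
```

Usage: fit the slope on segments believed to carry no copy number change, then apply `correct_ratios` to all segments so that remaining differences in the ratio are not driven by GC content.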
Subject(s)
Base Composition/genetics , DNA Copy Number Variations/genetics , Models, Genetic , Neoplasms/genetics , Neoplasms/pathology , Whole Genome Sequencing , Bias , Cell Line, Tumor , Clone Cells , Heterozygote , Humans , Markov Chains , Monte Carlo Method , Polymorphism, Single Nucleotide/genetics
ABSTRACT
MOTIVATION: Single Molecule Real-Time (SMRT) sequencing has been widely applied in cutting-edge genomic studies. However, it is still an expensive task to align the noisy long SMRT reads to a reference genome with state-of-the-art aligners, which is becoming a bottleneck in applications with SMRT sequencing. Novel approaches are needed to improve the efficiency and effectiveness of SMRT read alignment. RESULTS: We propose Regional Hashing-based Alignment Tool (rHAT), a seed-and-extension-based read alignment approach specifically designed for noisy long reads. rHAT indexes the reference genome with a regional hash table (RHT), a hash table-based index that describes the short tokens within local windows of the reference genome. In the seeding phase, rHAT utilizes the RHT to efficiently calculate the occurrences of short token matches between a partial read and local genomic windows to find highly probable candidate sites. In the extension phase, a sparse dynamic programming-based heuristic approach is used to reduce the cost of aligning the read to the candidate sites. By benchmarking on real and simulated datasets from various prokaryote and eukaryote genomes, we demonstrated that rHAT can effectively align SMRT reads with outstanding throughput. AVAILABILITY AND IMPLEMENTATION: rHAT is implemented in C++; the source code is available at https://github.com/HIT-Bioinformatics/rHAT CONTACT: ydwang@hit.edu.cn SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
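Editorial illustration: the seeding idea in this abstract (index short tokens per local window, then count token matches between a read and each window) can be sketched as a toy Python example. This is not rHAT's implementation, which is in C++. The window size, step, token length, and the function names `build_regional_index` and `rank_windows` are illustrative assumptions, and the sparse dynamic programming extension phase is not shown.

```python
from collections import defaultdict

def build_regional_index(reference, window=16, step=8, k=4):
    """Map each k-mer (short token) to the ids of the windows that contain it.

    Windows overlap: window i covers reference[i*step : i*step + window].
    """
    index = defaultdict(set)
    starts = range(0, max(len(reference) - k + 1, 1), step)
    for win_id, start in enumerate(starts):
        region = reference[start:start + window]
        for i in range(len(region) - k + 1):
            index[region[i:i + k]].add(win_id)
    return index

def rank_windows(read, index, k=4):
    """Count how many distinct read k-mers occur in each window.

    Returns (window id, hit count) pairs, best candidate first.
    """
    hits = defaultdict(int)
    read_tokens = {read[i:i + k] for i in range(len(read) - k + 1)}
    for token in read_tokens:
        for win_id in index.get(token, ()):
            hits[win_id] += 1
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))
```

Usage: build the index once per reference, then call `rank_windows` for each read (or read fragment) and pass only the top-ranked windows on to a more expensive alignment step.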
Subject(s)
Software , Algorithms , Genomics , Sequence Alignment , Sequence Analysis, DNA
ABSTRACT
MOTIVATION: Families with inherited diseases are widely used in Mendelian/complex disease studies. Owing to the advances in high-throughput sequencing technologies, family genome sequencing is becoming increasingly prevalent. Visualizing family genomes can greatly facilitate human genetics studies and personalized medicine. However, due to the complex genetic relationships and high similarities among genomes of consanguineous family members, family genomes are difficult to visualize in traditional genome visualization frameworks. How to visualize the family genome variants and their functions with integrated pedigree information remains a critical challenge. RESULTS: We developed the Family Genome Browser (FGB) to provide comprehensive analysis and visualization for family genomes. The FGB can visualize family genomes effectively at both the individual level and the variant level by integrating genome data with pedigree information. Family genome analysis, including determination of parental origin of the variants, detection of de novo mutations, identification of potential recombination events and identical-by-descent segments, etc., can be performed flexibly. Diverse annotations for the family genome variants, such as dbSNP memberships, linkage disequilibrium, genes, variant effects, potential phenotypes, etc., are illustrated as well. Moreover, the FGB can automatically search for de novo mutations and compound heterozygous variants for a selected individual, and guide investigators to find high-risk genes with flexible navigation options. These features enable users to investigate and understand family genomes intuitively and systematically. AVAILABILITY AND IMPLEMENTATION: The FGB is available at http://mlg.hit.edu.cn/FGB/.
Subject(s)
Genome, Human , Pedigree , Software , Computer Graphics , Genetic Variation , Genomics , High-Throughput Nucleotide Sequencing , Humans , Molecular Sequence Annotation
ABSTRACT
Advances in high-throughput sequencing technologies have brought us into the individual genome era. Projects such as the 1000 Genomes Project have made individual genome sequencing increasingly popular. How to visualize, analyse and annotate individual genomes with knowledge bases to support genome studies and personalized healthcare is still a major challenge. The Personal Genome Browser (PGB) was developed to provide comprehensive functional annotation and visualization for individual genomes based on the genetic-molecular-phenotypic model. Investigators can easily view individual genetic variants, such as single nucleotide variants (SNVs), INDELs and structural variations (SVs), as well as the genomic features and phenotypes associated with those variants. In particular, the PGB highlights potential functional variants using the PGB built-in method or SIFT/PolyPhen2 scores. Moreover, the functional risks of genes can be evaluated by scanning individual genetic variants across the whole genome, a chromosome, or a cytoband, based on the functional implications of the variants. Investigators can then navigate to high-risk genes on the scanned individual genome. The PGB accepts Variant Call Format (VCF) and Genetic Variation Format (GVF) files as input. The functional annotation of the input individual genome variants can be visualized in real time using well-defined symbols and shapes. The PGB is available at http://www.pgbrowser.org/.
Subject(s)
Genetic Variation , Genome, Human , Software , Computer Graphics , Genomics , Humans , Internet
ABSTRACT
Epstein-Barr virus (EBV) uses latency programs to colonize the memory B-cell reservoir, and each program is associated with human malignancies. However, knowledge remains incomplete of epigenetic mechanisms that maintain the highly restricted latency I program, present in memory and Burkitt lymphoma cells, in which EBNA1 is the only EBV-encoded protein expressed. Given increasing appreciation that higher order chromatin architecture is an important determinant of viral and host gene expression, we investigated roles of Wings Apart-Like Protein Homolog (WAPL), a host factor that unloads cohesins to control DNA loop size and that was discovered as an EBNA2-associated protein. WAPL knockout (KO) in Burkitt cells de-repressed LMP1 and LMP2A expression, but not other EBV oncogenes, to yield a viral program reminiscent of EBV latency II, which is rarely observed in B-cells. WAPL KO also increased LMP1/2A levels in latency III lymphoblastoid cells. WAPL KO altered EBV genome architecture, triggering formation of DNA loops between the LMP promoter region and the EBV origins of lytic replication (oriLyt). Hi-C analysis further demonstrated that WAPL KO reprogrammed EBV genomic DNA looping. LMP1 and LMP2A de-repression correlated with decreased histone repressive marks at their promoters. We propose that EBV coopts WAPL to negatively regulate latent membrane protein expression to maintain Burkitt latency I. Author Summary: EBV is a highly prevalent herpesvirus etiologically linked to multiple lymphomas, gastric and nasopharyngeal carcinomas, and multiple sclerosis. EBV persists in the human host in B-cells that express a series of latency programs, each of which is observed in a distinct type of human lymphoma. The most restricted form of EBV latency, called latency I, is observed in memory cells and in most Burkitt lymphomas. In this state, EBNA1 is the only EBV-encoded protein expressed, which facilitates immunoevasion by the infected cell.
However, epigenetic mechanisms that repress expression of the other eight EBV-encoded latency proteins remain to be fully elucidated. We hypothesized that the host factor WAPL might have a role in restriction of EBV genes, as it is a major regulator of long-range DNA interactions by negatively regulating cohesin proteins that stabilize DNA loops, and WAPL was found in a yeast 2-hybrid screen for EBNA2-interacting host factors. Using CRISPR together with Hi-ChIP and Hi-C DNA architecture analyses, we uncovered WAPL roles in suppressing expression of LMP1 and LMP2A, which mimic signaling by CD40 and B-cell immunoglobulin receptors, respectively. These proteins are expressed together with EBNA1 in the latency II program. We demonstrate that WAPL KO changes EBV genomic architecture, including allowing the formation of DNA loops between the oriLyt enhancers and the LMP promoter regions. Collectively, our study suggests that WAPL reinforces Burkitt latency I by preventing the formation of DNA loops that may instead support the latency II program.