Search | VHL Regional Portal

Cross-sectional Ct distributions from qPCR tests can provide an early warning signal for the spread of COVID-19 in communities.

Sharmin, Mahfuza; Manivannan, Mani; Woo, David; Sorel, Océane; Auclair, Jared R; Gandhi, Manoj; Mujawar, Imran.

Front Public Health ; 11: 1185720, 2023.

Article in English | MEDLINE | ID: mdl-37841738

ABSTRACT

Background: SARS-CoV-2 PCR testing data has been widely used for COVID-19 surveillance. Existing COVID-19 forecasting models mainly rely on case counts obtained from qPCR results, even though the binary PCR results provide a limited picture of the pandemic trajectory. Most forecasting models have failed to accurately predict the COVID-19 waves before they occur. Recently a model utilizing cross-sectional population cycle threshold (Ct, the number of cycles required for the fluorescent signal to cross the background threshold) values obtained from PCR tests (Ct-based model) was developed to overcome the limitations of using only binary PCR results. In this study, we aimed to improve on COVID-19 forecasting models using features derived from the Ct-based model, to detect epidemic waves earlier than case-based trajectories. Methods: PCR data was collected weekly at Northeastern University (NU) between August 2020 and January 2022. Campus and county epidemic trajectories were generated from case counts. A novel forecasting approach was developed by enhancing a recent deep learning model with Ct-based features and applied to Suffolk County and the NU campus. For this, cross-sectional Ct values from PCR data were used to generate Ct-based epidemic trajectories, including effective reproductive rate (Rt) and incidence. The improvement in forecasting performance was compared using absolute errors and residual squared errors with respect to actual observed cases at the 7-day and 14-day forecasting horizons. The model was also tested prospectively over the period January 2022 to April 2022. Results: Rt curves estimated from the Ct-based model indicated epidemic waves 12 to 14 days earlier than Rt curves from NU campus and Suffolk County cases, with a correlation of 0.57. Enhancing the forecasting models with Ct-based information significantly decreased absolute error (decrease of 49.4 and 221.5 for the 7- and 14-day forecasting horizons) and residual squared error (40.6 and 217.1 for the 7- and 14-day forecasting horizons) compared to the original model without Ct features. Conclusion: Ct-based epidemic trajectories can herald an earlier signal for impending epidemic waves in the community and forecast transmission peaks. Moreover, COVID-19 forecasting models can be enhanced using these Ct features to improve their forecasting accuracy. In this study, we make the case that public health agencies should publish Ct values along with the binary positive/negative PCR results. Early and accurate forecasting of epidemic waves can inform public health policies and countermeasures which can mitigate spread.
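The abstract compares forecasts by absolute error and residual squared error against observed cases. As a rough illustration only, the sketch below computes a mean absolute error and a mean squared residual for two forecasts of the same series. The function name `forecast_errors`, the averaging over days, and all case counts are assumptions made for this example; the abstract does not state the exact error definitions or data used in the study.

```python
from statistics import mean


def forecast_errors(observed, predicted):
    """Return (mean absolute error, mean squared residual) for one forecast.

    Hypothetical helper for illustration; not the study's evaluation code.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    # Residual = observed case count minus forecast case count, per day.
    residuals = [o - p for o, p in zip(observed, predicted)]
    mae = mean([abs(r) for r in residuals])
    mse = mean([r * r for r in residuals])
    return mae, mse


# Made-up daily case counts and forecasts at a single horizon (e.g. 7 days).
observed = [100, 120, 150, 200]
baseline_forecast = [90, 100, 120, 150]      # model without Ct features
ct_enhanced_forecast = [95, 115, 140, 190]   # model with Ct features

base_mae, base_mse = forecast_errors(observed, baseline_forecast)
ct_mae, ct_mse = forecast_errors(observed, ct_enhanced_forecast)

# A positive difference means the Ct-enhanced forecast had lower error.
print("decrease in absolute error:", base_mae - ct_mae)
print("decrease in squared error:", base_mse - ct_mse)
```

Repeating the same comparison for the 14-day horizon would give the second pair of differences of the kind the abstract reports.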

Subject(s)

COVID-19 , Humans , COVID-19/diagnosis , COVID-19/epidemiology , SARS-CoV-2 , Pandemics , Public Health

BEENE: deep learning-based nonlinear embedding improves batch effect estimation.

Rahman, Md Ashiqur; Tutul, Abdullah Aman; Sharmin, Mahfuza; Bayzid, Md Shamsuzzoha.

Bioinformatics ; 39(8), 2023 08 01.

Article in English | MEDLINE | ID: mdl-37561107

ABSTRACT

MOTIVATION: Analyzing large-scale single-cell transcriptomic datasets generated using different technologies is challenging due to the presence of batch-specific systematic variations known as batch effects. Since biological and technological differences are often interspersed, detecting and accounting for batch effects in RNA-seq datasets are critical for effective data integration and interpretation. Low-dimensional embeddings, such as principal component analysis (PCA), are widely used in visual inspection and estimation of batch effects. Linear dimensionality reduction methods like PCA are effective in assessing the presence of batch effects, especially when batch effects exhibit linear patterns. However, batch effects are inherently complex, and existing linear dimensionality reduction methods could be inadequate and imprecise in the presence of sophisticated nonlinear batch effects. RESULTS: We present Batch Effect Estimation using Nonlinear Embedding (BEENE), a deep nonlinear auto-encoder network which is specially tailored to generate an alternative lower-dimensional embedding suitable for both linear and nonlinear batch effects. BEENE simultaneously learns the batch and biological variables from RNA-seq data, resulting in an embedding that is more robust and sensitive than PCA embedding in terms of detecting and quantifying batch effects. BEENE was assessed on a collection of carefully controlled simulated datasets as well as biological datasets, including two technical replicates of mouse embryogenesis cells, peripheral blood mononuclear cells from three largely different experiments and five studies of pancreatic islet cells. AVAILABILITY AND IMPLEMENTATION: BEENE is freely available as an open source project at https://github.com/ashiq24/BEENE.

Subject(s)

Deep Learning , Animals , Mice , Sequence Analysis, RNA/methods , Leukocytes, Mononuclear , RNA-Seq , Gene Expression Profiling , Single-Cell Analysis/methods

Chromatin accessibility dynamics of neurogenic niche cells reveal defects in neural stem cell adhesion and migration during aging.

Yeo, Robin W; Zhou, Olivia Y; Zhong, Brian L; Sun, Eric D; Navarro Negredo, Paloma; Nair, Surag; Sharmin, Mahfuza; Ruetz, Tyson J; Wilson, Mikaela; Kundaje, Anshul; Dunn, Alexander R; Brunet, Anne.

Nat Aging ; 3(7): 866-893, 2023 07.

Article in English | MEDLINE | ID: mdl-37443352

ABSTRACT

The regenerative potential of brain stem cell niches deteriorates during aging. Yet the mechanisms underlying this decline are largely unknown. Here we characterize genome-wide chromatin accessibility of neurogenic niche cells in vivo during aging. Interestingly, chromatin accessibility at adhesion and migration genes decreases with age in quiescent neural stem cells (NSCs) but increases with age in activated (proliferative) NSCs. Quiescent and activated NSCs exhibit opposing adhesion behaviors during aging: quiescent NSCs become less adhesive, whereas activated NSCs become more adhesive. Old activated NSCs also show decreased migration in vitro and diminished mobilization out of the niche for neurogenesis in vivo. Using tension sensors, we find that aging increases force-producing adhesions in activated NSCs. Inhibiting the cytoskeletal-regulating kinase ROCK reduces these adhesions, restores migration in old activated NSCs in vitro, and boosts neurogenesis in vivo. These results have implications for restoring the migratory potential of NSCs and for improving neurogenesis in the aged brain.

Subject(s)

Chromatin , Neural Stem Cells , Chromatin/genetics , Neurogenesis/genetics , Brain

The dynamic, combinatorial cis-regulatory lexicon of epidermal differentiation.

Kim, Daniel S; Risca, Viviana I; Reynolds, David L; Chappell, James; Rubin, Adam J; Jung, Namyoung; Donohue, Laura K H; Lopez-Pajares, Vanessa; Kathiria, Arwa; Shi, Minyi; Zhao, Zhixin; Deep, Harsh; Sharmin, Mahfuza; Rao, Deepti; Lin, Shin; Chang, Howard Y; Snyder, Michael P; Greenleaf, William J; Kundaje, Anshul; Khavari, Paul A.

Nat Genet ; 53(11): 1564-1576, 2021 11.

Article in English | MEDLINE | ID: mdl-34650237

ABSTRACT

Transcription factors bind DNA sequence motif vocabularies in cis-regulatory elements (CREs) to modulate chromatin state and gene expression during cell state transitions. A quantitative understanding of how motif lexicons influence dynamic regulatory activity has been elusive due to the combinatorial nature of the cis-regulatory code. To address this, we undertook multiomic data profiling of chromatin and expression dynamics across epidermal differentiation to identify 40,103 dynamic CREs associated with 3,609 dynamically expressed genes, then applied an interpretable deep-learning framework to model the cis-regulatory logic of chromatin accessibility. This analysis framework identified cooperative DNA sequence rules in dynamic CREs regulating synchronous gene modules with diverse roles in skin differentiation. Massively parallel reporter assay analysis validated temporal dynamics and cooperative cis-regulatory logic. Variants linked to human polygenic skin disease were enriched in these time-dependent combinatorial motif rules. This integrative approach shows the combinatorial cis-regulatory lexicon of epidermal differentiation and represents a general framework for deciphering the organizational principles of the cis-regulatory code of dynamic gene regulation.

Subject(s)

Epidermis/physiology , Models, Genetic , Regulatory Elements, Transcriptional , Cell Differentiation/genetics , Chromatin/genetics , Epigenome , Gene Expression Regulation , Genes, Reporter , Genome-Wide Association Study , Humans , Keratinocytes/cytology , Keratinocytes/physiology , Neural Networks, Computer , Skin Diseases/genetics , Transcription Factors/genetics

Beyond Synthetic Lethality: Charting the Landscape of Pairwise Gene Expression States Associated with Survival in Cancer.

Magen, Assaf; Das Sahu, Avinash; Lee, Joo Sang; Sharmin, Mahfuza; Lugo, Alexander; Gutkind, J Silvio; Schäffer, Alejandro A; Ruppin, Eytan; Hannenhalli, Sridhar.

Cell Rep ; 28(4): 938-948.e6, 2019 07 23.

Article in English | MEDLINE | ID: mdl-31340155

ABSTRACT

The phenotypic effect of perturbing a gene's activity depends on the activity level of other genes, reflecting the notion that phenotypes are emergent properties of a network of functionally interacting genes. In the context of cancer, contemporary investigations have primarily focused on just one type of functional relationship between two genes: synthetic lethality (SL). Here, we define the more general concept of "survival-associated pairwise gene expression states" (SPAGEs) as gene pairs whose joint expression levels are associated with survival. We describe a data-driven approach called SPAGE-finder that, when applied to The Cancer Genome Atlas (TCGA) data, identified 71,946 SPAGEs spanning 12 distinct types, only a minority of which are SLs. The detected SPAGEs explain cancer driver genes' tissue specificity and differences in patients' response to drugs and stratify breast cancer tumors into refined subtypes. These results expand the scope of cancer SPAGEs and lay a conceptual basis for future studies of SPAGEs and their translational applications.

Subject(s)

Gene Expression Regulation, Neoplastic , Neoplasms/genetics , Synthetic Lethal Mutations/genetics , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use , Carcinogenesis/drug effects , Gene Expression Regulation, Neoplastic/drug effects , Genes, Neoplasm , Humans , Neoplasms/drug therapy , Organ Specificity/genetics , Survival Analysis

Prediction and Subtyping of Hypertension from Pan-Tissue Transcriptomic and Genetic Analyses.

Basu, Mahashweta; Sharmin, Mahfuza; Das, Avinash; Nair, Nishanth Ulhas; Wang, Kun; Lee, Joo Sang; Chang, Yen-Pei Christy; Ruppin, Eytan; Hannenhalli, Sridhar.

Genetics ; 207(3): 1121-1134, 2017 11.

Article in English | MEDLINE | ID: mdl-28899996

ABSTRACT

Hypertension (HT) is a complex systemic disease involving transcriptional changes in multiple organs. Here we systematically investigate the pan-tissue transcriptional and genetic landscape of HT spanning dozens of tissues in hundreds of individuals. We find that in several tissues, previously identified HT-linked genes are dysregulated and the gene expression profile is predictive of HT. Importantly, many expression quantitative trait loci (eQTL) SNPs associated with the population variance of the dysregulated genes are linked with blood pressure in an independent genome-wide association study, suggesting that the functional effect of HT-associated SNPs may be mediated through tissue-specific transcriptional dysregulation. Analyses of the pan-tissue transcriptional dysregulation profile, as well as of the eQTL SNPs underlying the dysregulated genes, reveal substantial heterogeneity among the HT patients, with two broad groupings: a Diffused group where several tissues exhibit HT-associated molecular alterations and a Localized group where such alterations are localized to very few tissues. These two patient subgroups differ in several clinical phenotypes including respiratory, cerebrovascular, diabetes, and heart disease. These findings suggest that the Diffused and Localized subgroups may be driven by different molecular mechanisms and have different genetic underpinnings.

Subject(s)

Genetic Predisposition to Disease , Hypertension/genetics , Models, Genetic , Polymorphism, Single Nucleotide , Transcriptome , Gene Regulatory Networks , Genetic Heterogeneity , Genome-Wide Association Study/methods , Humans , Hypertension/classification , Organ Specificity , Quantitative Trait Loci

Heterogeneity of transcription factor binding specificity models within and across cell lines.

Sharmin, Mahfuza; Bravo, Héctor Corrada; Hannenhalli, Sridhar.

Genome Res ; 26(8): 1110-23, 2016 08.

Article in English | MEDLINE | ID: mdl-27311443

ABSTRACT

Complex gene expression patterns are mediated by the binding of transcription factors (TFs) to specific genomic loci. The in vivo occupancy of a TF is, in large part, determined by the TF's DNA binding interaction partners, motivating genomic context-based models of TF occupancy. However, approaches thus far have assumed a uniform TF binding model to explain genome-wide cell-type-specific binding sites. Therefore, the cell type heterogeneity of TF occupancy models, as well as the extent to which binding rules underlying a TF's occupancy are shared across cell types, has not been investigated. Here, we develop an ensemble-based approach (TRISECT) to identify the heterogeneous binding rules for cell-type-specific TF occupancy and analyze the inter-cell-type sharing of such rules. Comprehensive analysis of 23 TFs, each with ChIP-seq data in four to 12 different cell types, shows that by explicitly capturing the heterogeneity of binding rules, TRISECT accurately identifies in vivo TF occupancy. Importantly, many of the binding rules derived from individual cell types are shared across cell types and reveal distinct yet functionally coherent putative target genes in different cell types. Closer inspection of the predicted cell-type-specific interaction partners provides insights into the context-specific functional landscape of a TF. Together, our novel ensemble-based approach reveals, for the first time, a widespread heterogeneity of binding rules, comprising the interaction partners within a cell type, many of which nevertheless transcend cell types. Notably, the putative targets of shared binding rules in different cell types, while distinct, exhibit significant functional coherence.

Subject(s)

DNA-Binding Proteins/genetics , Genetic Heterogeneity , Protein Binding/genetics , Transcription Factors/genetics , Binding Sites/genetics , Cell Lineage/genetics , Computational Biology , Gene Expression Regulation , Genomics , Humans , Sensitivity and Specificity

Distinct genomic and epigenomic features demarcate hypomethylated blocks in colon cancer.

Sharmin, Mahfuza; Bravo, Héctor Corrada; Hannenhalli, Sridhar.

BMC Cancer ; 16: 88, 2016 Feb 11.

Article in English | MEDLINE | ID: mdl-26868017

ABSTRACT

BACKGROUND: Large megabase-pair genomic regions show robust alterations in DNA methylation levels in multiple cancers. A vast majority of these regions are hypomethylated in cancers. These regions are generally enriched for CpG islands, Lamin Associated Domains and Large organized chromatin lysine modification domains, and are associated with stochastic variability in gene expression. Given the size and consistency of hypomethylated blocks (HMB) across cancer types, we hypothesized that the immediate causes of methylation instability are likely to be encoded in the genomic region near HMB boundaries, in terms of specific genomic or epigenomic signatures. However, a detailed characterization of the HMB boundaries has not been reported. METHOD: Here, we focused on ~13k HMBs, encompassing approximately half of the genome, identified in colon cancer. We modeled the genomic features of HMB boundaries by Random Forest to identify their salient features, in terms of transcription factor (TF) binding motifs. Additionally, we analyzed various epigenomic marks and chromatin structural features of HMB boundaries relative to the non-HMB genomic regions. RESULT: We found that the classical promoter epigenomic mark, H3K4me3, is highly enriched at HMB boundaries, as are CTCF bound sites. HMB boundaries harbor distinct combinations of TF motifs. Our Random Forest model based on TF motifs can accurately distinguish boundaries not only from regions inside and outside HMBs, but surprisingly, from active promoters as well. Interestingly, the distinguishing TFs and their interacting proteins are involved in chromatin modification. Finally, HMB boundaries significantly coincide with the boundaries of Topologically Associating Domains of the chromatin. CONCLUSION: Our analyses suggest that the overall architecture of HMBs is guided by pre-existing chromatin architecture and is associated with aberrant activity of promoter-like sequences at the boundary.

Subject(s)

Colonic Neoplasms/genetics , DNA Methylation/genetics , Epigenomics , Genome, Human , Cell Line, Tumor , Chromatin/genetics , Colonic Neoplasms/pathology , CpG Islands/genetics , Histones/genetics , Humans , Promoter Regions, Genetic
