Search | VHL Regional Portal

1.

Optimizing the design of spatial genomic studies.

Jones, Andrew; Cai, Diana; Li, Didong; Engelhardt, Barbara E.

Nat Commun ; 15(1): 4987, 2024 Jun 11.

Article in English | MEDLINE | ID: mdl-38862492

ABSTRACT

Spatial genomic technologies characterize the relationship between the structural organization of cells and their cellular state. Despite the availability of various spatial transcriptomic and proteomic profiling platforms, these experiments remain costly and labor-intensive. Traditionally, tissue slicing for spatial sequencing involves parallel axis-aligned sections, often yielding redundant or correlated information. We propose structured batch experimental design, a method that improves the cost efficiency of spatial genomics experiments by profiling tissue slices that are maximally informative, while recognizing the destructive nature of the process. Applied to two spatial genomics studies (one to construct a spatially-resolved genomic atlas of a tissue, and another to localize a region of interest in a tissue, such as a tumor), our approach collects more informative samples using fewer slices than traditional slicing strategies. This methodology offers a foundation for developing robust and cost-efficient design strategies, allowing spatial genomics studies to be deployed by smaller, resource-constrained labs.

Subject(s)

Genomics , Genomics/methods , Animals , Humans , Gene Expression Profiling/methods , Mice , Transcriptome , Proteomics/methods , Research Design
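The abstract does not spell out the design objective. As a toy illustration of why adjacent parallel sections are redundant, the sketch below greedily picks cutting positions along one axis that are least correlated with slices already taken, under an assumed squared-exponential correlation with a hypothetical lengthscale of 0.3. This max-min-correlation heuristic is a stand-in, not the structured batch design criterion from the paper.

```python
import math


def rbf(a: float, b: float, lengthscale: float) -> float:
    """Assumed squared-exponential correlation between two slice positions on one axis."""
    return math.exp(-((a - b) ** 2) / (2 * lengthscale ** 2))


def greedy_slices(candidates, n_slices, lengthscale=0.3):
    """Greedily choose slice positions that are least redundant with those already cut.

    Sectioning destroys tissue, so each candidate position can be used at most once.
    """
    chosen = []
    for _ in range(n_slices):
        remaining = [c for c in candidates if c not in chosen]
        # Score = 1 - (highest correlation with any slice already chosen).
        best = max(
            remaining,
            key=lambda c: 1.0 - max((rbf(c, s, lengthscale) for s in chosen), default=0.0),
        )
        chosen.append(best)
    return chosen


positions = [i / 10 for i in range(11)]  # hypothetical candidate cutting planes
print(greedy_slices(positions, 3))  # [0.0, 1.0, 0.5]
```

The third slice lands midway between the first two rather than next to either, which is the intuition behind avoiding closely spaced parallel sections.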

2.

Alignment of spatial genomics data using deep Gaussian processes.

Jones, Andrew; Townes, F William; Li, Didong; Engelhardt, Barbara E.

Nat Methods ; 20(9): 1379-1387, 2023 09.

Article in English | MEDLINE | ID: mdl-37592182

ABSTRACT

Spatially resolved genomic technologies have allowed us to study the physical organization of cells and tissues, and promise an understanding of local interactions between cells. However, it remains difficult to precisely align spatial observations across slices, samples, scales, individuals and technologies. Here, we propose a probabilistic model that aligns spatially-resolved samples onto a known or unknown common coordinate system (CCS) with respect to phenotypic readouts (for example, gene expression). Our method, Gaussian Process Spatial Alignment (GPSA), consists of a two-layer Gaussian process: the first layer maps observed samples' spatial locations onto a CCS, and the second layer maps from the CCS to the observed readouts. Our approach enables complex downstream spatially aware analyses that are impossible or inaccurate with unaligned data, including an analysis of variance, creation of a dense three-dimensional (3D) atlas from sparse two-dimensional (2D) slices or association tests across data modalities.

Subject(s)

Genomics , Models, Statistical , Humans , Normal Distribution
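A deterministic caricature of the two-layer structure, for intuition only: layer 1 maps a slice's observed locations into the common coordinate system (here just a translation), and layer 2 maps CCS positions to readouts (here a fixed, known function). GPSA places Gaussian processes on both layers and infers them probabilistically; the grid search, the sine readout, and the 1-D setting below are all assumptions.

```python
import math


def readout(z: float) -> float:
    """Layer 2 stand-in: shared map from CCS position to expression (assumed known)."""
    return math.sin(3.0 * z)


def fit_shift(xs, ys, grid):
    """Layer 1 stand-in: choose the translation d mapping observed locations x to
    CCS positions z = x + d so that observed readouts match readout(z)."""
    def loss(d):
        return sum((y - readout(x + d)) ** 2 for x, y in zip(xs, ys))
    return min(grid, key=loss)


true_shift = 0.3                             # hypothetical misalignment of one slice
xs = [i / 20 for i in range(21)]             # observed 1-D spatial locations
ys = [readout(x + true_shift) for x in xs]   # noiseless readouts on that slice
grid = [k / 100 for k in range(-50, 51)]     # candidate translations
print(fit_shift(xs, ys, grid))  # 0.3
```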

3.

Optimizing the design of spatial genomic studies.

Jones, Andrew; Cai, Diana; Li, Didong; Engelhardt, Barbara E.

bioRxiv ; 2023 Jan 31.

Article in English | MEDLINE | ID: mdl-36778332

ABSTRACT

Spatially-resolved genomic technologies have shown promise for studying the relationship between the structural arrangement of cells and their functional behavior. While numerous sequencing and imaging platforms exist for performing spatial transcriptomics and spatial proteomics profiling, these experiments remain expensive and labor-intensive. Thus, when performing spatial genomics experiments using multiple tissue slices, there is a need to select the tissue cross sections that will be maximally informative for the purposes of the experiment. In this work, we formalize the problem of experimental design for spatial genomics experiments, which we generalize into a problem class that we call structured batch experimental design. We propose approaches for optimizing these designs in two types of spatial genomics studies: one in which the goal is to construct a spatially-resolved genomic atlas of a tissue and another in which the goal is to localize a region of interest in a tissue, such as a tumor. We demonstrate the utility of these optimal designs, where each slice is a two-dimensional plane, on several spatial genomics datasets.

4.

Nonnegative spatial factorization applied to spatial genomics.

Townes, F William; Engelhardt, Barbara E.

Nat Methods ; 20(2): 229-238, 2023 02.

Article in English | MEDLINE | ID: mdl-36587187

ABSTRACT

Nonnegative matrix factorization (NMF) is widely used to analyze high-dimensional count data because, in contrast to real-valued alternatives such as factor analysis, it produces an interpretable parts-based representation. However, in applications such as spatial transcriptomics, NMF fails to incorporate known structure between observations. Here, we present nonnegative spatial factorization (NSF), a spatially-aware probabilistic dimension reduction model based on transformed Gaussian processes that naturally encourages sparsity and scales to tens of thousands of observations. NSF recovers ground truth factors more accurately than real-valued alternatives such as MEFISTO in simulations, and has lower out-of-sample prediction error than probabilistic NMF on three spatial transcriptomics datasets from mouse brain and liver. Since not all patterns of gene expression have spatial correlations, we also propose a hybrid extension of NSF that combines spatial and nonspatial components, enabling quantification of spatial importance for both observations and features. A TensorFlow implementation of NSF is available from https://github.com/willtownes/nsf-paper .

Subject(s)

Algorithms , Gene Expression Profiling , Animals , Mice , Gene Expression Profiling/methods , Genomics , Models, Statistical
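NSF places transformed Gaussian-process priors on spatial factors, which is beyond a short example. The sketch shows only a simple maximum-likelihood version of the nonspatial baseline the abstract compares against: Poisson NMF (equivalently, KL-divergence NMF) fitted with the classic Lee-Seung multiplicative updates, which keep both factors nonnegative. It is not NSF and not the code in the nsf-paper repository.

```python
import random


def nmf_kl(Y, k, n_iter=50, seed=0, eps=1e-12):
    """Factor a nonnegative count matrix Y (n x m) as W @ H with W, H >= 0 by
    minimizing the Poisson/KL objective with Lee-Seung multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(Y), len(Y[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]

    def rates():
        # Current Poisson rates W @ H; eps guards against division by zero.
        return [[sum(W[i][r] * H[r][j] for r in range(k)) + eps for j in range(m)]
                for i in range(n)]

    for _ in range(n_iter):
        R = rates()
        for r in range(k):  # update H with W held fixed
            w_sum = sum(W[i][r] for i in range(n))
            for j in range(m):
                H[r][j] *= sum(W[i][r] * Y[i][j] / R[i][j] for i in range(n)) / w_sum
        R = rates()
        for i in range(n):  # update W with the new H held fixed
            for r in range(k):
                h_sum = sum(H[r][j] for j in range(m))
                W[i][r] *= sum(H[r][j] * Y[i][j] / R[i][j] for j in range(m)) / h_sum
    return W, H


Y = [[1, 1, 2], [2, 2, 4], [3, 3, 6]]  # toy spots x genes counts, exactly rank 1
W, H = nmf_kl(Y, k=1)
```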

5.

A Poisson reduced-rank regression model for association mapping in sequencing data.

Fitzgerald, Tiana; Jones, Andrew; Engelhardt, Barbara E.

BMC Bioinformatics ; 23(1): 529, 2022 Dec 08.

Article in English | MEDLINE | ID: mdl-36482321

ABSTRACT

BACKGROUND: Single-cell RNA-sequencing (scRNA-seq) technologies allow for the study of gene expression in individual cells. Often, it is of interest to understand how transcriptional activity is associated with cell-specific covariates, such as cell type, genotype, or measures of cell health. Traditional approaches for this type of association mapping assume independence between the outcome variables (or genes), and perform a separate regression for each. However, these methods are computationally costly and ignore the substantial correlation structure of gene expression. Furthermore, count-based scRNA-seq data pose challenges for traditional models based on Gaussian assumptions. RESULTS: We aim to resolve these issues by developing a reduced-rank regression model that identifies low-dimensional linear associations between a large number of cell-specific covariates and high-dimensional gene expression readouts. Our probabilistic model uses a Poisson likelihood in order to account for the unique structure of scRNA-seq counts. We demonstrate the performance of our model using simulations, and we apply our model to a scRNA-seq dataset, a spatial gene expression dataset, and a bulk RNA-seq dataset to show its behavior in three distinct analyses. CONCLUSION: We show that our statistical modeling approach, which is based on reduced-rank regression, captures associations between gene expression and cell- and sample-specific covariates by leveraging low-dimensional representations of transcriptional states.
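A minimal sketch of the likelihood such a model maximizes, assuming a log link, no intercept or library-size offset, and a coefficient matrix factored as U (covariates to r latent scores) times V (scores to genes). The names are illustrative and the fitting procedure is not shown.

```python
import math


def poisson_rrr_loglik(Y, X, U, V):
    """Log-likelihood of counts Y (n x p) under a rank-r Poisson regression.

    X (n x q) covariates are projected to r scores by U (q x r); V (r x p) maps
    scores to log-rates, so the coefficient matrix U @ V has rank <= r.
    Assumes a log link and no intercept or size-factor offset.
    """
    q, r = len(U), len(V)
    ll = 0.0
    for x_i, y_i in zip(X, Y):
        z = [sum(x_i[k] * U[k][s] for k in range(q)) for s in range(r)]  # low-dim scores
        for j, y in enumerate(y_i):
            eta = sum(z[s] * V[s][j] for s in range(r))  # log-rate for gene j
            ll += y * eta - math.exp(eta) - math.lgamma(y + 1)
    return ll


Y = [[0, 1], [2, 0]]                      # hypothetical counts: 2 cells x 2 genes
X = [[1.0, 0.5, 0.0], [0.0, 1.0, 2.0]]    # hypothetical cell covariates (q = 3)
U = [[1.0], [0.0], [0.0]]                 # rank r = 1
V = [[1.0, 0.0]]
print(poisson_rrr_loglik(Y, X, U, V))
```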

6.

Telescoping bimodal latent Dirichlet allocation to identify expression QTLs across tissues.

Gewirtz, Ariel Dh; Townes, F William; Engelhardt, Barbara E.

Life Sci Alliance ; 5(12)2022 08 17.

Article in English | MEDLINE | ID: mdl-35977827

ABSTRACT

Expression quantitative trait loci (eQTLs), or single-nucleotide polymorphisms that affect average gene expression levels, provide important insights into context-specific gene regulation. Classic eQTL analyses use one-to-one association tests, which test gene-variant pairs individually and ignore correlations induced by gene regulatory networks and linkage disequilibrium. Probabilistic topic models, such as latent Dirichlet allocation, estimate latent topics for a collection of count observations. Prior multimodal frameworks that bridge genotype and expression data assume matched sample numbers between modalities. However, many data sets have a nested structure where one individual has several associated gene expression samples and a single germline genotype vector. Here, we build a telescoping bimodal latent Dirichlet allocation (TBLDA) framework to learn shared topics across gene expression and genotype data that allows multiple RNA sequencing samples to correspond to a single individual's genotype. By using raw count data, our model avoids possible adulteration via normalization procedures. Ancestral structure is captured in a genotype-specific latent space, effectively removing it from shared components. Using GTEx v8 expression data across 10 tissues and genotype data, we show that the estimated topics capture meaningful and robust biological signal in both modalities and identify associations within and across tissue types. We identify 4,645 cis-eQTLs and 995 trans-eQTLs by conducting eQTL mapping between the most informative features in each topic. Our TBLDA model is able to identify associations using raw sequencing count data when the samples in two separate data modalities are matched one-to-many, as is often the case in biological data. Our code is freely available at https://github.com/gewirtz/TBLDA.

Subject(s)

Polymorphism, Single Nucleotide , Quantitative Trait Loci , Gene Expression Regulation , Gene Regulatory Networks , Genotype , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics
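For contrast with the topic model, the sketch shows the one-to-many ("telescoping") sample structure and the classic one-to-one gene-variant test that TBLDA moves away from. Donors, dosages, and expression values are hypothetical. Note that samples from the same donor are not independent, which this naive regression ignores.

```python
def expand_genotypes(sample_donors, dosage):
    """One-to-many lookup: every RNA-seq sample reuses its donor's single genotype."""
    return [dosage[d] for d in sample_donors]


def ols_slope(x, y):
    """Slope of a classic one-to-one gene-variant test (simple linear regression)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


sample_donors = ["A", "A", "B", "C", "C"]  # hypothetical: 5 tissue samples, 3 donors
dosage = {"A": 0, "B": 1, "C": 2}         # hypothetical minor-allele counts at one SNP
expression = [1.0, 1.0, 3.0, 5.0, 5.0]
g = expand_genotypes(sample_donors, dosage)
print(g, ols_slope(g, expression))  # [0, 0, 1, 2, 2] 2.0
```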

7.

Towards 'end-to-end' analysis and understanding of biological timecourse data.

Jena, Siddhartha G; Goglia, Alexander G; Engelhardt, Barbara E.

Biochem J ; 479(11): 1257-1263, 2022 06 17.

Article in English | MEDLINE | ID: mdl-35713413

ABSTRACT

Petabytes of increasingly complex and multidimensional live cell and tissue imaging data are generated every year. These videos hold large promise for understanding biology at a deep and fundamental level, as they capture single-cell and multicellular events occurring over time and space. However, the current modalities for analysis and mining of these data are scattered and user-specific, preventing more unified analyses from being performed over different datasets and obscuring possible scientific insights. Here, we propose a unified pipeline for storage, segmentation, analysis, and statistical parametrization of live cell imaging datasets.

Subject(s)

Datasets as Topic

8.

Guiding Efficient, Effective, and Patient-Oriented Electrolyte Replacement in Critical Care: An Artificial Intelligence Reinforcement Learning Approach.

Prasad, Niranjani; Mandyam, Aishwarya; Chivers, Corey; Draugelis, Michael; Hanson, C William; Engelhardt, Barbara E; Laudanski, Krzysztof.

J Pers Med ; 12(5)2022 Apr 20.

Article in English | MEDLINE | ID: mdl-35629084

ABSTRACT

Both provider- and protocol-driven electrolyte replacement have been linked to the over-prescription of ubiquitous electrolytes. Here, we describe the development and retrospective validation of a data-driven clinical decision support tool that uses reinforcement learning (RL) algorithms to recommend patient-tailored electrolyte replacement policies for ICU patients. We used electronic health records (EHR) data that originated from two institutions (UPHS; MIMIC-IV). The tool uses a set of patient characteristics, such as their physiological and pharmacological state, a pre-defined set of possible repletion actions, and a set of clinical goals to present clinicians with a recommendation for the route and dose of an electrolyte. RL-driven electrolyte repletion substantially reduces the frequency of magnesium and potassium replacements (up to 60%), adjusts the timing of interventions in all three electrolytes considered (potassium, magnesium, and phosphate), and shifts them towards orally administered repletion over intravenous replacement. This shift in recommended treatment limits risk of the potentially harmful effects of over-repletion and implies monetary savings. Overall, the RL-driven electrolyte repletion recommendations reduce excess electrolyte replacements and improve the safety, precision, efficacy, and cost of each electrolyte repletion event, while showing robust performance across patient cohorts and hospital systems.
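The abstract does not give the tool's state, action, or reward definitions, and the paper's RL algorithm may differ. The toy below only illustrates the mechanics of learning a policy from logged (retrospective) transitions with tabular Q-learning; every state, action, and reward is hypothetical.

```python
def q_update(Q, s, a, reward, s_next, actions, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step on a logged transition (s, a, reward, s_next)."""
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (reward + gamma * best_next - old)
    return Q[(s, a)]


ACTIONS = ["none", "oral", "iv"]  # hypothetical repletion actions
# Hypothetical logged transitions: (state, action, reward, next state).
LOG = [
    ("low_K", "oral", 1.0, "normal_K"),
    ("low_K", "iv", 0.5, "normal_K"),        # effective, but penalized as riskier
    ("low_K", "none", -1.0, "low_K"),
    ("normal_K", "none", 1.0, "normal_K"),
    ("normal_K", "oral", -0.5, "normal_K"),  # unnecessary repletion
]

Q = {}
for _ in range(500):  # sweep the fixed log repeatedly until values settle
    for s, a, r, s2 in LOG:
        q_update(Q, s, a, r, s2, ACTIONS)


def policy(state):
    """Greedy recommendation from the learned Q-table."""
    return max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))


print(policy("low_K"), policy("normal_K"))  # oral none
```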

9.

Hierarchical Gaussian Processes and Mixtures of Experts to Model COVID-19 Patient Trajectories.

Cui, Sunny; Yoo, Elizabeth C; Li, Didong; Laudanski, Krzysztof; Engelhardt, Barbara E.

Pac Symp Biocomput ; 27: 266-277, 2022.

Article in English | MEDLINE | ID: mdl-34890155

ABSTRACT

Gaussian processes (GPs) are a versatile nonparametric model for nonlinear regression and have been widely used to study spatiotemporal phenomena. However, standard GPs offer limited interpretability and generalizability for datasets with naturally occurring hierarchies. With large-scale, rapidly-updating electronic health record (EHR) data, we want to study patient trajectories across diverse patient cohorts while preserving patient subgroup structure. In this work, we partition our cohort of over 2000 COVID-19 patients by sex and ethnicity. We develop and apply a hierarchical Gaussian process and a mixture of experts (MOE) hierarchical GP model to fit patient trajectories on clinical markers of disease progression. A case study for albumin, an effective predictor of COVID-19 patient outcomes, highlights the predictive performance of these models. These hierarchical spatiotemporal models of EHR data bring us a step closer toward our goal of building flexible approaches to capture patient data that can be used in real-time systems.

Subject(s)

COVID-19 , Cohort Studies , Computational Biology , Electronic Health Records , Humans , SARS-CoV-2
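One common way to encode a hierarchy in a GP covariance (assumed here, not taken from the paper) is to add a population-level kernel shared by all patients to a kernel that is switched on only when two measurements come from the same subgroup, such as the same sex-by-ethnicity cell. Inputs are (time, subgroup) pairs; the mixture-of-experts variant is not shown and all hyperparameters are hypothetical.

```python
import math


def rbf(t1, t2, variance=1.0, lengthscale=1.0):
    return variance * math.exp(-((t1 - t2) ** 2) / (2 * lengthscale ** 2))


def hierarchical_kernel(a, b, shared=(1.0, 5.0), group=(0.5, 2.0)):
    """Covariance between (time, subgroup) inputs a and b.

    A population-level curve is shared by everyone; a subgroup-specific deviation
    is added only when both observations come from the same subgroup.
    The (variance, lengthscale) pairs are hypothetical.
    """
    (t1, g1), (t2, g2) = a, b
    k = rbf(t1, t2, *shared)
    if g1 == g2:
        k += rbf(t1, t2, *group)
    return k


print(hierarchical_kernel((0.0, "F"), (0.0, "F")))  # 1.5: shared + subgroup variance
print(hierarchical_kernel((0.0, "F"), (0.0, "M")))  # 1.0: shared component only
```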

10.

Brain kernel: A new spatial covariance function for fMRI data.

Wu, Anqi; Nastase, Samuel A; Baldassano, Christopher A; Turk-Browne, Nicholas B; Norman, Kenneth A; Engelhardt, Barbara E; Pillow, Jonathan W.

Neuroimage ; 245: 118580, 2021 12 15.

Article in English | MEDLINE | ID: mdl-34740792

ABSTRACT

A key problem in functional magnetic resonance imaging (fMRI) is to estimate spatial activity patterns from noisy high-dimensional signals. Spatial smoothing provides one approach to regularizing such estimates. However, standard smoothing methods ignore the fact that correlations in neural activity may fall off at different rates in different brain areas, or exhibit discontinuities across anatomical or functional boundaries. Moreover, such methods do not exploit the fact that widely separated brain regions may exhibit strong correlations due to bilateral symmetry or the network organization of brain regions. To capture this non-stationary spatial correlation structure, we introduce the brain kernel, a continuous covariance function for whole-brain activity patterns. We define the brain kernel in terms of a continuous nonlinear mapping from 3D brain coordinates to a latent embedding space, parametrized with a Gaussian process (GP). The brain kernel specifies the prior covariance between voxels as a function of the distance between their locations in embedding space. The GP mapping warps the brain nonlinearly so that highly correlated voxels are close together in latent space, and uncorrelated voxels are far apart. We estimate the brain kernel using resting-state fMRI data, and we develop an exact, scalable inference method based on block coordinate descent to overcome the challenges of high dimensionality (10-100K voxels). Finally, we illustrate the brain kernel's usefulness with applications to brain decoding and factor analysis with multiple task-based fMRI datasets.

Subject(s)

Brain Mapping/methods , Image Processing, Computer-Assisted/methods , Magnetic Resonance Imaging/methods , Neuroimaging/methods , Humans , Imaging, Three-Dimensional
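The central idea is that covariance is a squared-exponential function of distance after a nonlinear warp into an embedding space. In the paper the warp is learned with a GP from resting-state data; this sketch substitutes a hypothetical hand-written warp that folds the hemispheres together (x to |x|), only to show how a warp lets distant mirror-image voxels be strongly correlated. Coordinates and lengthscale are made up.

```python
import math


def embed(voxel):
    """Hypothetical fixed warp folding the left hemisphere onto the right (x -> |x|).
    The brain kernel instead learns this mapping with a Gaussian process."""
    x, y, z = voxel
    return (abs(x), y, z)


def embedded_rbf(v1, v2, lengthscale=10.0):
    """Prior covariance as a squared-exponential function of embedding-space distance."""
    e1, e2 = embed(v1), embed(v2)
    d2 = sum((a - b) ** 2 for a, b in zip(e1, e2))
    return math.exp(-d2 / (2 * lengthscale ** 2))


left, right = (-30.0, 10.0, 5.0), (30.0, 10.0, 5.0)  # mirror-image voxels (mm)
print(embedded_rbf(left, right))  # 1.0: the warp places them at the same point
```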

11.

A self-exciting point process to study multicellular spatial signaling patterns.

Verma, Archit; Jena, Siddhartha G; Isakov, Danielle R; Aoki, Kazuhiro; Toettcher, Jared E; Engelhardt, Barbara E.

Proc Natl Acad Sci U S A ; 118(32)2021 08 10.

Article in English | MEDLINE | ID: mdl-34362843

ABSTRACT

Multicellular organisms rely on spatial signaling among cells to drive their organization, development, and response to stimuli. Several models have been proposed to capture the behavior of spatial signaling in multicellular systems, but existing approaches fail to capture both the autonomous behavior of single cells and the interactions of a cell with its neighbors simultaneously. We propose a spatiotemporal model of dynamic cell signaling based on Hawkes processes (self-exciting point processes) that model the signaling processes within a cell and spatial couplings between cells. With this cellular point process (CPP), we capture both the single-cell pathway activation rate and the magnitude and duration of signaling between cells relative to their spatial location. Furthermore, our model captures tissues composed of heterogeneous cell types with different bursting rates and signaling behaviors across multiple signaling proteins. We apply our model to epithelial cell systems that exhibit a range of autonomous and spatial signaling behaviors basally and under pharmacological exposure. Our model identifies known drug-induced signaling deficits, characterizes signaling changes across a wound front, and generalizes to multichannel observations.

Subject(s)

Keratinocytes/metabolism , Models, Biological , Signal Transduction , Animals , Dipeptides/pharmacology , Dogs , Epithelial Cells , Hydroxamic Acids/pharmacology , Keratinocytes/cytology , Keratinocytes/drug effects , MAP Kinase Signaling System/drug effects , Madin Darby Canine Kidney Cells , Mice, Inbred Strains , Mice, Transgenic , Models, Statistical , Protein Kinase Inhibitors/pharmacology , Signal Transduction/drug effects , Spatio-Temporal Analysis
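A minimal sketch of a spatially coupled Hawkes intensity of the kind described: each cell has an autonomous baseline rate, and every past burst adds a contribution that decays with elapsed time and with distance between cells. The exponential time kernel, Gaussian distance kernel, and all parameter values are assumptions; the CPP's actual parameterization may differ.

```python
import math


def intensity(i, t, mu, events, positions, alpha=0.5, beta=1.0, lengthscale=1.0):
    """Conditional intensity of signaling bursts in cell i at time t:

    lambda_i(t) = mu_i + sum over past events (j, t_j) of
                  alpha * exp(-d_ij^2 / (2 l^2)) * exp(-beta * (t - t_j)),
    so each burst transiently excites the same cell (d = 0) and its neighbours.
    """
    lam = mu[i]
    for j, t_j in events:
        if t_j < t:
            d2 = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
            lam += alpha * math.exp(-d2 / (2 * lengthscale ** 2)) * math.exp(-beta * (t - t_j))
    return lam


mu = [0.1, 0.2]                       # hypothetical autonomous baseline rates
positions = [(0.0, 0.0), (1.0, 0.0)]  # hypothetical cell centroids
events = [(0, 0.0)]                   # cell 0 bursts at t = 0
print(intensity(0, 1.0, mu, events, positions))  # self-excitation of cell 0
print(intensity(1, 1.0, mu, events, positions))  # excitation of its neighbour
```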

12.

Joint analysis of expression levels and histological images identifies genes associated with tissue morphology.

Ash, Jordan T; Darnell, Gregory; Munro, Daniel; Engelhardt, Barbara E.

Nat Commun ; 12(1): 1609, 2021 03 11.

Article in English | MEDLINE | ID: mdl-33707455

ABSTRACT

Histopathological images are used to characterize complex phenotypes such as tumor stage. Our goal is to associate features of stained tissue images with high-dimensional genomic markers. We use convolutional autoencoders and sparse canonical correlation analysis (CCA) on paired histological images and bulk gene expression to identify subsets of genes whose expression levels in a tissue sample correlate with subsets of morphological features from the corresponding sample image. We apply our approach, ImageCCA, to two TCGA data sets, and find gene sets associated with the structure of the extracellular matrix and cell wall infrastructure, implicating uncharacterized genes in extracellular processes. We find sets of genes associated with specific cell types, including neuronal cells and cells of the immune system. We apply ImageCCA to the GTEx v6 data, and find image features that capture population variation in thyroid and in colon tissues associated with genetic variants (image morphology QTLs, or imQTLs), suggesting that genetic variation regulates population variation in tissue morphological traits.

Subject(s)

Computational Biology/methods , Gene Expression Regulation, Neoplastic/genetics , Gene Expression/genetics , Neoplasms/pathology , Quantitative Trait Loci/genetics , BRCA1 Protein/genetics , Biomarkers, Tumor/genetics , Cell Membrane/genetics , Cell Membrane/physiology , Extracellular Matrix/genetics , Extracellular Matrix/physiology , Humans , Image Processing, Computer-Assisted , Neoplasms/genetics , Polymorphism, Single Nucleotide/genetics
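The abstract does not detail the sparse CCA solver. As a rough sketch, the rank-1 problem can be approached by alternating soft-thresholded power iterations on the cross-covariance C between (standardized) gene expression and image features, in the spirit of penalized matrix decomposition. Fixed thresholds replace the usual L1-bound search, and the matrix below is hypothetical.

```python
import math


def soft(v, lam):
    """Elementwise soft-thresholding, the source of sparsity."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in v]


def unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def sparse_cca(C, lam_u, lam_v, n_iter=100):
    """Rank-1 sparse CCA on C = X^T Y (genes x image features) with fixed thresholds."""
    p, q = len(C), len(C[0])
    v = unit([1.0] * q)
    u = [0.0] * p
    for _ in range(n_iter):
        u = unit(soft([sum(C[i][j] * v[j] for j in range(q)) for i in range(p)], lam_u))
        v = unit(soft([sum(C[i][j] * u[i] for i in range(p)) for j in range(q)], lam_v))
    return u, v


C = [[3.0, 0.0], [0.0, 1.0], [0.2, 0.0]]  # hypothetical: 3 genes x 2 image features
print(sparse_cca(C, 0.5, 0.5))  # weak gene-feature links are zeroed out
```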

13.

Optimal marker gene selection for cell type discrimination in single cell analyses.

Dumitrascu, Bianca; Villar, Soledad; Mixon, Dustin G; Engelhardt, Barbara E.

Nat Commun ; 12(1): 1186, 2021 02 19.

Article in English | MEDLINE | ID: mdl-33608535

ABSTRACT

Single-cell technologies characterize complex cell populations across multiple data modalities at unprecedented scale and resolution. Multi-omic data for single cell gene expression, in situ hybridization, or single cell chromatin states are increasingly available across diverse tissue types. When isolating specific cell types from a sample of dissociated cells or performing in situ sequencing in collections of heterogeneous cells, one challenging task is to select a small set of informative markers that robustly enable the identification and discrimination of specific cell types or cell states as precisely as possible. Given single cell RNA-seq data and a set of cellular labels to discriminate, scGeneFit selects gene markers that jointly optimize cell label recovery using label-aware compressive classification methods. This results in a substantially more robust and less redundant set of markers than existing methods, most of which identify markers that separate each cell label from the rest. When applied to a data set given a hierarchy of cell types as labels, the markers found by our method improve the recovery of the cell type hierarchy with fewer markers than existing methods using a computationally efficient and principled optimization.

Subject(s)

Genetic Markers , Single-Cell Analysis/methods , Algorithms , Cluster Analysis , Gene Expression , Gene Expression Profiling/methods , Humans , RNA-Seq , Sequence Analysis, RNA/methods , Transcriptome
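scGeneFit solves a label-aware optimization; the sketch below swaps in a much simpler greedy heuristic to illustrate the contrast with one-versus-rest selection. Each gene set is scored by summing, over every pair of labels, the pairwise centroid separation capped at a margin, so a new marker is rewarded for separating pairs that are not yet separated. Centroids and the margin are hypothetical.

```python
from itertools import combinations


def greedy_markers(centroids, n_markers, margin=1.0):
    """Greedy pairwise-separation marker selection (not scGeneFit's algorithm).

    centroids: dict label -> list of mean expression per gene.
    """
    labels = sorted(centroids)
    n_genes = len(next(iter(centroids.values())))
    pairs = list(combinations(labels, 2))

    def score(genes):
        total = 0.0
        for a, b in pairs:
            sep = sum((centroids[a][g] - centroids[b][g]) ** 2 for g in genes)
            total += min(sep, margin)  # extra separation beyond the margin earns nothing
        return total

    chosen = []
    for _ in range(n_markers):
        best = max((g for g in range(n_genes) if g not in chosen),
                   key=lambda g: score(chosen + [g]))
        chosen.append(best)
    return chosen


centroids = {  # hypothetical mean expression of 3 genes in 3 cell types
    "A": [0.0, 0.0, 0.1],
    "B": [5.0, 0.0, 0.0],
    "C": [5.0, 4.0, 0.2],
}
print(greedy_markers(centroids, 2))  # [0, 1]
```

Gene 0 alone cannot tell B from C, so the second pick is gene 1, which covers exactly that pair.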

14.

Causal network inference from gene transcriptional time-series response to glucocorticoids.

Lu, Jonathan; Dumitrascu, Bianca; McDowell, Ian C; Jo, Brian; Barrera, Alejandro; Hong, Linda K; Leichter, Sarah M; Reddy, Timothy E; Engelhardt, Barbara E.

PLoS Comput Biol ; 17(1): e1008223, 2021 01.

Article in English | MEDLINE | ID: mdl-33513136

ABSTRACT

Gene regulatory network inference is essential to uncover complex relationships among gene pathways and inform downstream experiments, ultimately enabling regulatory network re-engineering. Network inference from transcriptional time-series data requires accurate, interpretable, and efficient determination of causal relationships among thousands of genes. Here, we develop Bootstrap Elastic net regression from Time Series (BETS), a statistical framework based on Granger causality for the recovery of a directed gene network from transcriptional time-series data. BETS uses elastic net regression and stability selection from bootstrapped samples to infer causal relationships among genes. BETS is highly parallelized, enabling efficient analysis of large transcriptional data sets. We show competitive accuracy on a community benchmark, the DREAM4 100-gene network inference challenge, where BETS is one of the fastest among methods of similar performance and additionally infers whether causal effects are activating or inhibitory. We apply BETS to transcriptional time-series data of differentially-expressed genes from A549 cells exposed to glucocorticoids over a period of 12 hours. We identify a network of 2768 genes and 31,945 directed edges (FDR ≤ 0.2). We validate inferred causal network edges using two external data sources: Overexpression experiments on the same glucocorticoid system, and genetic variants associated with inferred edges in primary lung tissue in the Genotype-Tissue Expression (GTEx) v6 project. BETS is available as an open source software package at https://github.com/lujonathanh/BETS.

Subject(s)

Glucocorticoids/pharmacology , Models, Statistical , Transcriptome/drug effects , A549 Cells , Algorithms , Computational Biology , Humans , Lung/chemistry , Lung/metabolism , Machine Learning , Software , Transcriptome/genetics
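A minimal sketch of the Granger-style regression at the core of such a method: predict one gene at time t from all genes at t - 1 with an elastic net, and read nonzero coefficients as candidate directed edges. Bootstrapping, stability selection, and FDR control are omitted; the coordinate-descent solver, penalty settings, and toy time courses are assumptions, not the BETS implementation.

```python
import math


def lagged_design(series, target):
    """Predict gene `target` at time t from every gene at time t - 1."""
    T = len(series[0])
    X = [[gene[t - 1] for gene in series] for t in range(1, T)]
    y = [series[target][t] for t in range(1, T)]
    return X, y


def elastic_net_cd(X, y, alpha=0.1, l1_ratio=0.9, n_iter=200):
    """Coordinate descent for
    (1/2n)||y - Xb||^2 + alpha * (l1_ratio * ||b||_1 + (1 - l1_ratio)/2 * ||b||_2^2).
    No intercept is fitted."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        for k in range(p):
            # Partial residual that leaves out predictor k.
            r = [y[i] - sum(X[i][l] * b[l] for l in range(p) if l != k) for i in range(n)]
            z = sum(X[i][k] * r[i] for i in range(n)) / n
            denom = sum(X[i][k] ** 2 for i in range(n)) / n + alpha * (1 - l1_ratio)
            shrunk = max(abs(z) - alpha * l1_ratio, 0.0)  # soft-thresholding
            b[k] = math.copysign(shrunk, z) / denom if shrunk > 0 else 0.0
    return b


# Hypothetical time courses: gene 1 follows gene 0 with a one-step delay.
g0 = [1, -1, 2, -2, 1, 0, -1, 2, 0, -2]
g1 = [0] + [0.8 * v for v in g0[:-1]]
X, y = lagged_design([g0, g1], target=1)
print(elastic_net_cd(X, y))  # weight on gene 0 only: candidate edge 0 -> 1
```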

15.

The impact of sex on gene expression across human tissues.

Oliva, Meritxell; Muñoz-Aguirre, Manuel; Kim-Hellmuth, Sarah; Wucher, Valentin; Gewirtz, Ariel D H; Cotter, Daniel J; Parsana, Princy; Kasela, Silva; Balliu, Brunilda; Viñuela, Ana; Castel, Stephane E; Mohammadi, Pejman; Aguet, François; Zou, Yuxin; Khramtsova, Ekaterina A; Skol, Andrew D; Garrido-Martín, Diego; Reverter, Ferran; Brown, Andrew; Evans, Patrick; Gamazon, Eric R; Payne, Anthony; Bonazzola, Rodrigo; Barbeira, Alvaro N; Hamel, Andrew R; Martinez-Perez, Angel; Soria, José Manuel; Pierce, Brandon L; Stephens, Matthew; Eskin, Eleazar; Dermitzakis, Emmanouil T; Segrè, Ayellet V; Im, Hae Kyung; Engelhardt, Barbara E; Ardlie, Kristin G; Montgomery, Stephen B; Battle, Alexis J; Lappalainen, Tuuli; Guigó, Roderic; Stranger, Barbara E.

Science ; 369(6509)2020 09 11.

Article in English | MEDLINE | ID: mdl-32913072

ABSTRACT

Many complex human phenotypes exhibit sex-differentiated characteristics. However, the molecular mechanisms underlying these differences remain largely unknown. We generated a catalog of sex differences in gene expression and in the genetic regulation of gene expression across 44 human tissue sources surveyed by the Genotype-Tissue Expression project (GTEx, v8 release). We demonstrate that sex influences gene expression levels and cellular composition of tissue samples across the human body. A total of 37% of all genes exhibit sex-biased expression in at least one tissue. We identify cis expression quantitative trait loci (eQTLs) with sex-differentiated effects and characterize their cellular origin. By integrating sex-biased eQTLs with genome-wide association study data, we identify 58 gene-trait associations that are driven by genetic regulation of gene expression in a single sex. These findings provide an extensive characterization of sex differences in the human transcriptome and its genetic regulation.

Subject(s)

Gene Expression Regulation , Gene Expression , Sex Characteristics , Chromosomes, Human, X/genetics , Disease/genetics , Epigenesis, Genetic , Female , Genetic Variation , Genome-Wide Association Study , Humans , Male , Organ Specificity , Promoter Regions, Genetic , Quantitative Trait Loci , Sex Factors

16.

A robust nonlinear low-dimensional manifold for single cell RNA-seq data.

Verma, Archit; Engelhardt, Barbara E.

BMC Bioinformatics ; 21(1): 324, 2020 Jul 21.

Article in English | MEDLINE | ID: mdl-32693778

ABSTRACT

BACKGROUND: Modern developments in single-cell sequencing technologies enable broad insights into cellular state. Single-cell RNA sequencing (scRNA-seq) can be used to explore cell types, states, and developmental trajectories to broaden our understanding of cellular heterogeneity in tissues and organs. Analysis of these sparse, high-dimensional experimental results requires dimension reduction. Several methods have been developed to estimate low-dimensional embeddings for filtered and normalized single-cell data. However, methods have yet to be developed for unfiltered and unnormalized count data that estimate uncertainty in the low-dimensional space. We present a nonlinear latent variable model with robust, heavy-tailed error and adaptive kernel learning to estimate low-dimensional nonlinear structure in scRNA-seq data. RESULTS: Gene expression in a single cell is modeled as a noisy draw from a Gaussian process in high dimensions from low-dimensional latent positions. This model is called the Gaussian process latent variable model (GPLVM). We model residual errors with a heavy-tailed Student's t-distribution to estimate a manifold that is robust to technical and biological noise found in normalized scRNA-seq data. We compare our approach to common dimension reduction tools across a diverse set of scRNA-seq data sets to highlight our model's ability to enable important downstream tasks such as clustering, inferring cell developmental trajectories, and visualizing high throughput experiments on available experimental data. CONCLUSION: We show that our adaptive robust statistical approach to estimate a nonlinear manifold is well suited for raw, unfiltered gene counts from high-throughput sequencing technologies for visualization, exploration, and uncertainty estimation of cell states.

Subject(s)

Nonlinear Dynamics , RNA-Seq , Single-Cell Analysis/methods , Blood Cells/metabolism , Gene Expression Regulation , Humans , Models, Genetic , Neurons/metabolism , Normal Distribution , Principal Component Analysis , Time Factors
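The robustness argument rests on the heavy tails of the Student's t residual model. The sketch compares Gaussian and Student's t log densities for a typical residual and a large one; the GPLVM itself and the adaptive kernel learning are not shown, and nu = 4 is a hypothetical choice.

```python
import math


def gaussian_logpdf(x, scale=1.0):
    return -0.5 * math.log(2 * math.pi * scale ** 2) - x ** 2 / (2 * scale ** 2)


def student_t_logpdf(x, nu=4.0, scale=1.0):
    """Log density of a scaled Student's t residual with nu degrees of freedom."""
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi * scale ** 2)
            - (nu + 1) / 2 * math.log1p(x ** 2 / (nu * scale ** 2)))


# A residual 10 scale units out is penalized far less under the heavy-tailed t,
# so a single outlying count pulls the fitted manifold around much less.
for r in (0.0, 10.0):
    print(r, round(gaussian_logpdf(r), 2), round(student_t_logpdf(r), 2))
```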

17.

Sparse multi-output Gaussian processes for online medical time series prediction.

Cheng, Li-Fang; Dumitrascu, Bianca; Darnell, Gregory; Chivers, Corey; Draugelis, Michael; Li, Kai; Engelhardt, Barbara E.

BMC Med Inform Decis Mak ; 20(1): 152, 2020 07 08.

Article in English | MEDLINE | ID: mdl-32641134

ABSTRACT

BACKGROUND: For real-time monitoring of hospital patients, high-quality inference of patients' health status using all information available from clinical covariates and lab test results is essential to enable successful medical interventions and improve patient outcomes. Developing a computational framework that can learn from observational large-scale electronic health records (EHRs) and make accurate real-time predictions is a critical step. In this work, we develop and explore a Bayesian nonparametric model based on multi-output Gaussian process (GP) regression for hospital patient monitoring. METHODS: We propose MedGP, a statistical framework that incorporates 24 clinical covariates and supports a rich reference data set from which relationships between observed covariates may be inferred and exploited for high-quality inference of patient state over time. To do this, we develop a highly structured sparse GP kernel to enable tractable computation over tens of thousands of time points while estimating correlations among clinical covariates, patients, and periodicity in patient observations. MedGP has a number of benefits over current methods, including (i) not requiring an alignment of the time series data, (ii) quantifying confidence regions in the predictions, (iii) exploiting a vast and rich database of patients, and (iv) inferring interpretable relationships among clinical covariates. RESULTS: We evaluate and compare results from MedGP on the task of online prediction for three patient subgroups from two medical data sets across 8,043 patients. We find MedGP improves online prediction over baseline and state-of-the-art methods for nearly all covariates across different disease subgroups and hospitals. CONCLUSIONS: The MedGP framework is robust and efficient in estimating the temporal dependencies from sparse and irregularly sampled medical time series data for online prediction. The publicly available code is at https://github.com/bee-hive/MedGP .

Subject(s)

Algorithms , Models, Statistical , Bayes Theorem , Normal Distribution
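MedGP's highly structured sparse kernel is not reproduced here. As a sketch of the general idea only, an intrinsic coregionalization kernel multiplies a covariate-by-covariate matrix B (how clinical covariates co-vary) by a time kernel that mixes a smooth trend with a daily periodic component. The covariates, B, and all hyperparameters are hypothetical.

```python
import math


def time_kernel(t1, t2, lengthscale=6.0, period=24.0, per_lengthscale=1.0):
    """Smooth trend (RBF) plus a daily cycle (periodic kernel); times in hours."""
    d = t1 - t2
    rbf = math.exp(-d ** 2 / (2 * lengthscale ** 2))
    periodic = math.exp(-2 * math.sin(math.pi * abs(d) / period) ** 2 / per_lengthscale ** 2)
    return rbf + periodic


def icm_kernel(a, b, B):
    """Intrinsic coregionalization: cov[(t, i), (t', j)] = B[i][j] * k(t, t').
    B must be positive semi-definite."""
    (t1, i), (t2, j) = a, b
    return B[i][j] * time_kernel(t1, t2)


B = [[1.0, 0.6], [0.6, 1.0]]  # hypothetical: e.g. heart rate vs. respiratory rate
print(icm_kernel((0.0, 0), (0.0, 1), B))   # cross-covariate covariance, same time
print(icm_kernel((0.0, 0), (24.0, 0), B))  # same covariate, one day apart
```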

18.

netNMF-sc: leveraging gene-gene interactions for imputation and dimensionality reduction in single-cell expression analysis.

Elyanow, Rebecca; Dumitrascu, Bianca; Engelhardt, Barbara E; Raphael, Benjamin J.

Genome Res ; 30(2): 195-204, 2020 02.

Article in English | MEDLINE | ID: mdl-31992614

ABSTRACT

Single-cell RNA-sequencing (scRNA-seq) enables high-throughput measurement of RNA expression in single cells. However, because of technical limitations, scRNA-seq data often contain zero counts for many transcripts in individual cells. These zero counts, or dropout events, complicate the analysis of scRNA-seq data using standard methods developed for bulk RNA-seq data. Current scRNA-seq analysis methods typically overcome dropout by combining information across cells in a lower-dimensional space, leveraging the observation that cells generally occupy a small number of RNA expression states. We introduce netNMF-sc, an algorithm for scRNA-seq analysis that leverages information across both cells and genes. netNMF-sc learns a low-dimensional representation of scRNA-seq transcript counts using network-regularized non-negative matrix factorization. The network regularization takes advantage of prior knowledge of gene-gene interactions, encouraging pairs of genes with known interactions to be near each other in the low-dimensional representation. The resulting matrix factorization imputes gene abundance for both zero and nonzero counts and can be used to cluster cells into meaningful subpopulations. We show that netNMF-sc outperforms existing methods at clustering cells and estimating gene-gene covariance using both simulated and real scRNA-seq data, with increasing advantages at higher dropout rates (e.g., >60%). We also show that the results from netNMF-sc are robust to variation in the input network, with more representative networks leading to greater performance gains.

Subject(s)

Epistasis, Genetic/genetics , RNA-Seq , Single-Cell Analysis/methods , Software , Cluster Analysis , Gene Expression Profiling , Humans , Exome Sequencing

19.

ACE inhibition and cardiometabolic risk factors, lung ACE2 and TMPRSS2 gene expression, and plasma ACE2 levels: a Mendelian randomization study.

Gill, Dipender; Arvanitis, Marios; Carter, Paul; Hernández Cordero, Ana I; Jo, Brian; Karhunen, Ville; Larsson, Susanna C; Li, Xuan; Lockhart, Sam M; Mason, Amy; Pashos, Evanthia; Saha, Ashis; Tan, Vanessa Y; Zuber, Verena; Bossé, Yohan; Fahle, Sarah; Hao, Ke; Jiang, Tao; Joubert, Philippe; Lunt, Alan C; Ouwehand, Willem Hendrik; Roberts, David J; Timens, Wim; van den Berge, Maarten; Watkins, Nicholas A; Battle, Alexis; Butterworth, Adam S; Danesh, John; Di Angelantonio, Emanuele; Engelhardt, Barbara E; Peters, James E; Sin, Don D; Burgess, Stephen.

R Soc Open Sci ; 7(11): 200958, 2020 Nov.

Article in English | MEDLINE | ID: mdl-33391794

ABSTRACT

Angiotensin-converting enzyme 2 (ACE2) and serine protease TMPRSS2 have been implicated in cell entry for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus responsible for coronavirus disease 2019 (COVID-19). The expression of ACE2 and TMPRSS2 in the lung epithelium might have implications for the risk of SARS-CoV-2 infection and severity of COVID-19. We use human genetic variants that proxy angiotensin-converting enzyme (ACE) inhibitor drug effects and cardiovascular risk factors to investigate whether these exposures affect lung ACE2 and TMPRSS2 gene expression and circulating ACE2 levels. We observed no consistent evidence of an association of genetically predicted serum ACE levels with any of our outcomes. There was weak evidence for an association of genetically predicted serum ACE levels with ACE2 gene expression in the Lung eQTL Consortium (p = 0.014), but this finding did not replicate. There was evidence of a positive association of genetic liability to type 2 diabetes mellitus with lung ACE2 gene expression in the Genotype-Tissue Expression (GTEx) study (p = 4 × 10⁻⁴) and with circulating plasma ACE2 levels in the INTERVAL study (p = 0.03), but not with lung ACE2 expression in the Lung eQTL Consortium study (p = 0.68). There were no associations of genetically proxied liability to the other cardiometabolic traits with any outcome. This study does not provide consistent evidence to support an effect of serum ACE levels (as a proxy for ACE inhibitors) or cardiometabolic risk factors on lung ACE2 and TMPRSS2 expression or plasma ACE2 levels.
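The abstract describes using genetic variants as proxies (instruments) for exposures but does not name the estimator. As an illustration only, the sketch below computes the fixed-effect inverse-variance weighted (IVW) estimate, a widely used Mendelian randomization estimator, from per-variant summary statistics. Whether this study used IVW, and with which weights, is an assumption here. The function name and the numbers are made up.

```python
import math


def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect inverse-variance weighted MR estimate.

    beta_x: variant-exposure associations; beta_y, se_y: variant-outcome
    associations and their standard errors (one entry per genetic variant).
    Returns (causal estimate, standard error, two-sided normal p-value).
    """
    # Each variant's Wald ratio by/bx is weighted by bx^2 / se_y^2.
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_x, beta_y, se_y))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_x, se_y))
    beta = num / den
    se = math.sqrt(1.0 / den)
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return beta, se, p


# Made-up summary statistics for three variants.
print(ivw_estimate([0.10, 0.20, 0.15], [0.05, 0.10, 0.075], [0.05, 0.05, 0.05]))
```

With these invented inputs, every variant implies the same ratio of 0.5. The IVW estimate is therefore 0.5, and its standard error reflects how strongly the variants are associated with the exposure.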

20.

An Optimal Policy for Patient Laboratory Tests in Intensive Care Units.

Cheng, Li-Fang; Prasad, Niranjani; Engelhardt, Barbara E.

Pac Symp Biocomput ; 24: 320-331, 2019.

Article in English | MEDLINE | ID: mdl-30864333

ABSTRACT

Laboratory testing is an integral tool in the management of patient care in hospitals, particularly in intensive care units (ICUs). In selecting and timing lab tests, there is an inherent trade-off between the expected utility of a given test at a specific time for clinical decision-making and the cost or risk it poses to the patient. In this work, we introduce a framework that learns lab-test ordering policies that optimize this trade-off. Our approach uses batch off-policy reinforcement learning with a composite reward function based on clinical imperatives, applied to data that include examples of clinicians ordering labs for patients. To this end, we develop and extend principles of Pareto optimality to improve the selection of actions based on multiple reward function components while respecting typical procedural considerations and prioritization of clinical goals in the ICU. Our experiments show that we can estimate a policy that reduces the frequency of lab tests and optimizes timing to minimize information redundancy. We also find that the estimated policies typically suggest ordering lab tests well ahead of critical onsets, such as mechanical ventilation or dialysis, that depend on the lab results. We evaluate our approach by quantifying how these policies may initiate earlier onset of treatment.
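The abstract mentions extending Pareto optimality to select actions scored on several reward components. The sketch below shows only the basic idea, assuming hypothetical per-action estimates for each reward component where larger is better. It keeps the non-dominated actions and then applies a lexicographic priority rule. That rule is a stand-in for the paper's prioritization of clinical goals, not the authors' procedure. Action labels, values, and function names are made up.

```python
import numpy as np


def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return bool(np.all(a >= b) and np.any(a > b))


def pareto_set(q_values):
    """Indices of actions whose reward-component vectors are not dominated.

    q_values: (n_actions, n_components) estimates; larger is better in every
    component (costs and risks are entered as negative values).
    """
    q = np.asarray(q_values, dtype=float)
    return [
        i for i in range(len(q))
        if not any(dominates(q[j], q[i]) for j in range(len(q)) if j != i)
    ]


def choose_action(q_values, priority):
    """Hypothetical tie-break: among Pareto-optimal actions, compare components
    lexicographically in the given priority order."""
    q = np.asarray(q_values, dtype=float)
    return max(pareto_set(q), key=lambda i: tuple(q[i, p] for p in priority))


# Made-up estimates per action: [information value, -cost]
q = [[0.0, 0.0],    # 0: no test
     [0.6, -0.2],   # 1: single targeted test
     [0.7, -0.9],   # 2: full panel
     [0.5, -0.5]]   # 3: dominated by action 1
print(pareto_set(q))              # [0, 1, 2]
print(choose_action(q, [0, 1]))   # 2: information value first
print(choose_action(q, [1, 0]))   # 0: cost first
```

The Pareto filter discards action 3 because action 1 is better on both components. It does not choose among the remaining trade-offs. That choice is where a prioritization of clinical goals has to enter.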

Subject(s)

Clinical Laboratory Techniques , Intensive Care Units , Acute Kidney Injury/diagnosis , Clinical Laboratory Techniques/statistics & numerical data , Computational Biology , Critical Care/statistics & numerical data , Decision Support Techniques , Humans , Intensive Care Units/organization & administration , Intensive Care Units/statistics & numerical data , Patient Care Management/organization & administration , Patient Care Management/statistics & numerical data , Reinforcement, Psychology , Reward , Sepsis/diagnosis
