Search | VHL Regional Portal

1.

Optimizing the design of spatial genomic studies.

Jones, Andrew; Cai, Diana; Li, Didong; Engelhardt, Barbara E.

Nat Commun ; 15(1): 4987, 2024 Jun 11.

Article in English | MEDLINE | ID: mdl-38862492

ABSTRACT

Spatial genomic technologies characterize the relationship between the structural organization of cells and their cellular state. Despite the availability of various spatial transcriptomic and proteomic profiling platforms, these experiments remain costly and labor-intensive. Traditionally, tissue slicing for spatial sequencing involves parallel axis-aligned sections, often yielding redundant or correlated information. We propose structured batch experimental design, a method that improves the cost efficiency of spatial genomics experiments by profiling tissue slices that are maximally informative, while recognizing the destructive nature of the process. Applied to two spatial genomics studies (one to construct a spatially-resolved genomic atlas of a tissue and another to localize a region of interest in a tissue, such as a tumor), our approach collects more informative samples using fewer slices compared to traditional slicing strategies. This methodology offers a foundation for developing robust and cost-efficient design strategies, allowing spatial genomics studies to be deployed by smaller, resource-constrained labs.

Subject(s)

Genomics; Genomics/methods; Animals; Humans; Gene Expression Profiling/methods; Mice; Transcriptome; Proteomics/methods; Research Design

2.

Communication as a Key Performance Indicator in Employer Branding in the Context of the Social Economy: A Quantitative Study.

Heide, Michael P; Prodan, Silvana; Lazaroiu, George; Kreis-Engelhardt, Barbara; Ghigiu, Alexandru-Mihai.

Behav Sci (Basel) ; 14(4), 2024 Apr 07.

Article in English | MEDLINE | ID: mdl-38667099

ABSTRACT

Performance measurement refers to the systematic evaluation and analysis of the performance and results of business processes, initiatives, or strategies. This study discusses the crucial role of communication using signaling theory in employer branding in the context of the social economy organization (SEO). The aim is to measure employee satisfaction in concrete terms and to determine the status quo of the communication culture of the organization under investigation in order to develop an employer branding strategy based on the results. The authors use an employee survey as a quantitative research method and limit the data collection to the EU member state of Germany considering the research background. The results provide insights into the specific communication policy in relation to employer branding. The focus here is on (digital) communication. Organizations need to understand how communication strategies directly influence the perception of the employer brand in the social economy. Furthermore, practical implications are derived in order to increase employer attractiveness. Concrete recommendations of action for SEOs should help them be successful in the competition for qualified specialists and talent.

3.

Alignment of spatial genomics data using deep Gaussian processes.

Jones, Andrew; Townes, F William; Li, Didong; Engelhardt, Barbara E.

Nat Methods ; 20(9): 1379-1387, 2023 09.

Article in English | MEDLINE | ID: mdl-37592182

ABSTRACT

Spatially resolved genomic technologies have allowed us to study the physical organization of cells and tissues, and promise an understanding of local interactions between cells. However, it remains difficult to precisely align spatial observations across slices, samples, scales, individuals and technologies. Here, we propose a probabilistic model that aligns spatially-resolved samples onto a known or unknown common coordinate system (CCS) with respect to phenotypic readouts (for example, gene expression). Our method, Gaussian Process Spatial Alignment (GPSA), consists of a two-layer Gaussian process: the first layer maps observed samples' spatial locations onto a CCS, and the second layer maps from the CCS to the observed readouts. Our approach enables complex downstream spatially aware analyses that are impossible or inaccurate with unaligned data, including an analysis of variance, creation of a dense three-dimensional (3D) atlas from sparse two-dimensional (2D) slices or association tests across data modalities.

Subject(s)

Genomics; Models, Statistical; Humans; Normal Distribution
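
The two-layer structure described in the GPSA abstract above can be written compactly as follows. This is a generic sketch only: the symbols $g$, $f$, $m_g$, $k_g$, $k_f$, $\tilde{x}$ and the additive Gaussian noise term are assumed notation, not taken from the paper, and the mean functions, kernels, and likelihood actually used by GPSA may differ.

```latex
\begin{aligned}
\tilde{x} &= g(x), & g &\sim \mathcal{GP}\big(m_g(\cdot),\, k_g(\cdot,\cdot)\big)
  && \text{(first layer: observed location } x \mapsto \text{CCS location } \tilde{x}\text{)} \\
y &= f(\tilde{x}) + \varepsilon, & f &\sim \mathcal{GP}\big(0,\, k_f(\cdot,\cdot)\big),\;
  \varepsilon \sim \mathcal{N}(0, \sigma^2)
  && \text{(second layer: CCS location } \tilde{x} \mapsto \text{readout } y\text{)}
\end{aligned}
```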

4.

Optimizing the design of spatial genomic studies.

Jones, Andrew; Cai, Diana; Li, Didong; Engelhardt, Barbara E.

bioRxiv ; 2023 Jan 31.

Article in English | MEDLINE | ID: mdl-36778332

ABSTRACT

Spatially-resolved genomic technologies have shown promise for studying the relationship between the structural arrangement of cells and their functional behavior. While numerous sequencing and imaging platforms exist for performing spatial transcriptomics and spatial proteomics profiling, these experiments remain expensive and labor-intensive. Thus, when performing spatial genomics experiments using multiple tissue slices, there is a need to select the tissue cross sections that will be maximally informative for the purposes of the experiment. In this work, we formalize the problem of experimental design for spatial genomics experiments, which we generalize into a problem class that we call structured batch experimental design. We propose approaches for optimizing these designs in two types of spatial genomics studies: one in which the goal is to construct a spatially-resolved genomic atlas of a tissue and another in which the goal is to localize a region of interest in a tissue, such as a tumor. We demonstrate the utility of these optimal designs, where each slice is a two-dimensional plane, on several spatial genomics datasets.

5.

Nonnegative spatial factorization applied to spatial genomics.

Townes, F William; Engelhardt, Barbara E.

Nat Methods ; 20(2): 229-238, 2023 02.

Article in English | MEDLINE | ID: mdl-36587187

ABSTRACT

Nonnegative matrix factorization (NMF) is widely used to analyze high-dimensional count data because, in contrast to real-valued alternatives such as factor analysis, it produces an interpretable parts-based representation. However, in applications such as spatial transcriptomics, NMF fails to incorporate known structure between observations. Here, we present nonnegative spatial factorization (NSF), a spatially-aware probabilistic dimension reduction model based on transformed Gaussian processes that naturally encourages sparsity and scales to tens of thousands of observations. NSF recovers ground truth factors more accurately than real-valued alternatives such as MEFISTO in simulations, and has lower out-of-sample prediction error than probabilistic NMF on three spatial transcriptomics datasets from mouse brain and liver. Since not all patterns of gene expression have spatial correlations, we also propose a hybrid extension of NSF that combines spatial and nonspatial components, enabling quantification of spatial importance for both observations and features. A TensorFlow implementation of NSF is available from https://github.com/willtownes/nsf-paper .

Subject(s)

Algorithms; Gene Expression Profiling; Animals; Mice; Gene Expression Profiling/methods; Genomics; Models, Statistical
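
For readers unfamiliar with the baseline that NSF builds on, the sketch below shows ordinary NMF fitted with the classic multiplicative updates for squared error. It is not NSF: there is no Gaussian process prior and no use of spatial coordinates. The function names (`nmf`, `frobenius_error`) and the toy matrix are illustrative assumptions, not part of the authors' TensorFlow code.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(x, w, h):
    """Squared reconstruction error ||X - WH||^2."""
    wh = matmul(w, h)
    return sum((x[i][j] - wh[i][j]) ** 2
               for i in range(len(x)) for j in range(len(x[0])))


def nmf(x, rank, n_iter=200, seed=0, eps=1e-9):
    """Plain NMF: factor a nonnegative matrix X (n x m) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, x)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(x, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Toy nonnegative "count" matrix built from two parts.
counts = [[1.0, 0.0, 2.0],
          [0.0, 3.0, 0.0],
          [2.0, 0.0, 4.0]]
w0, h0 = nmf(counts, rank=2, n_iter=0)   # random starting point
w, h = nmf(counts, rank=2)               # after the updates
```

Because every update multiplies by a nonnegative ratio, the factors stay nonnegative throughout, which is what gives NMF its parts-based reading.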

6.

A Poisson reduced-rank regression model for association mapping in sequencing data.

Fitzgerald, Tiana; Jones, Andrew; Engelhardt, Barbara E.

BMC Bioinformatics ; 23(1): 529, 2022 Dec 08.

Article in English | MEDLINE | ID: mdl-36482321

ABSTRACT

BACKGROUND: Single-cell RNA-sequencing (scRNA-seq) technologies allow for the study of gene expression in individual cells. Often, it is of interest to understand how transcriptional activity is associated with cell-specific covariates, such as cell type, genotype, or measures of cell health. Traditional approaches for this type of association mapping assume independence between the outcome variables (or genes), and perform a separate regression for each. However, these methods are computationally costly and ignore the substantial correlation structure of gene expression. Furthermore, count-based scRNA-seq data pose challenges for traditional models based on Gaussian assumptions. RESULTS: We aim to resolve these issues by developing a reduced-rank regression model that identifies low-dimensional linear associations between a large number of cell-specific covariates and high-dimensional gene expression readouts. Our probabilistic model uses a Poisson likelihood in order to account for the unique structure of scRNA-seq counts. We demonstrate the performance of our model using simulations, and we apply our model to a scRNA-seq dataset, a spatial gene expression dataset, and a bulk RNA-seq dataset to show its behavior in three distinct analyses. CONCLUSION: We show that our statistical modeling approach, which is based on reduced-rank regression, captures associations between gene expression and cell- and sample-specific covariates by leveraging low-dimensional representations of transcriptional states.
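
To make the combination of a Poisson likelihood with a reduced-rank coefficient matrix concrete, the sketch below evaluates the log-likelihood of a count matrix when the coefficients are constrained to the product of two thin matrices. The log link, the absence of priors or size factors, and all names (`poisson_rrr_loglik`, `u`, `v`) are assumptions made for illustration; the abstract does not specify the paper's exact parameterization.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def poisson_rrr_loglik(counts, covariates, u, v):
    """Poisson log-likelihood with a rank-constrained coefficient matrix.

    counts:     n x q matrix of nonnegative integer counts (cells x genes)
    covariates: n x p matrix (cells x covariates)
    u:          p x r matrix, v: r x q matrix, so B = U V has rank at most r
    Assumed model: counts[i][j] ~ Poisson(exp((covariates @ B)[i][j]))
    """
    coef = matmul(u, v)                  # p x q, low rank by construction
    log_rate = matmul(covariates, coef)  # n x q linear predictor
    total = 0.0
    for y_row, eta_row in zip(counts, log_rate):
        for y, eta in zip(y_row, eta_row):
            total += y * eta - math.exp(eta) - math.lgamma(y + 1)
    return total


counts = [[0, 1], [2, 0]]
covariates = [[1.0, 0.0], [0.0, 1.0]]
u = [[0.0], [0.0]]   # p x r with r = 1; all-zero U gives rate 1 everywhere
v = [[1.0, 2.0]]     # r x q
loglik = poisson_rrr_loglik(counts, covariates, u, v)
```

With p covariates, q genes, and rank r, the factorized form has p·r + r·q free coefficients instead of p·q, which is where the computational saving over one regression per gene comes from.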

7.

Telescoping bimodal latent Dirichlet allocation to identify expression QTLs across tissues.

Gewirtz, Ariel Dh; Townes, F William; Engelhardt, Barbara E.

Life Sci Alliance ; 5(12), 2022 08 17.

Article in English | MEDLINE | ID: mdl-35977827

ABSTRACT

Expression quantitative trait loci (eQTLs), or single-nucleotide polymorphisms that affect average gene expression levels, provide important insights into context-specific gene regulation. Classic eQTL analyses use one-to-one association tests, which test gene-variant pairs individually and ignore correlations induced by gene regulatory networks and linkage disequilibrium. Probabilistic topic models, such as latent Dirichlet allocation, estimate latent topics for a collection of count observations. Prior multimodal frameworks that bridge genotype and expression data assume matched sample numbers between modalities. However, many data sets have a nested structure where one individual has several associated gene expression samples and a single germline genotype vector. Here, we build a telescoping bimodal latent Dirichlet allocation (TBLDA) framework to learn shared topics across gene expression and genotype data that allows multiple RNA sequencing samples to correspond to a single individual's genotype. By using raw count data, our model avoids possible adulteration via normalization procedures. Ancestral structure is captured in a genotype-specific latent space, effectively removing it from shared components. Using GTEx v8 expression data across 10 tissues and genotype data, we show that the estimated topics capture meaningful and robust biological signal in both modalities and identify associations within and across tissue types. We identify 4,645 cis-eQTLs and 995 trans-eQTLs by conducting eQTL mapping between the most informative features in each topic. Our TBLDA model is able to identify associations using raw sequencing count data when the samples in two separate data modalities are matched one-to-many, as is often the case in biological data. Our code is freely available at https://github.com/gewirtz/TBLDA.

Subject(s)

Polymorphism, Single Nucleotide; Quantitative Trait Loci; Gene Expression Regulation; Gene Regulatory Networks; Genotype; Polymorphism, Single Nucleotide/genetics; Quantitative Trait Loci/genetics

8.

Towards 'end-to-end' analysis and understanding of biological timecourse data.

Jena, Siddhartha G; Goglia, Alexander G; Engelhardt, Barbara E.

Biochem J ; 479(11): 1257-1263, 2022 06 17.

Article in English | MEDLINE | ID: mdl-35713413

ABSTRACT

Petabytes of increasingly complex and multidimensional live cell and tissue imaging data are generated every year. These videos hold large promise for understanding biology at a deep and fundamental level, as they capture single-cell and multicellular events occurring over time and space. However, the current modalities for analysis and mining of these data are scattered and user-specific, preventing more unified analyses from being performed over different datasets and obscuring possible scientific insights. Here, we propose a unified pipeline for storage, segmentation, analysis, and statistical parametrization of live cell imaging datasets.

Subject(s)

Datasets as Topic

9.

Guiding Efficient, Effective, and Patient-Oriented Electrolyte Replacement in Critical Care: An Artificial Intelligence Reinforcement Learning Approach.

Prasad, Niranjani; Mandyam, Aishwarya; Chivers, Corey; Draugelis, Michael; Hanson, C William; Engelhardt, Barbara E; Laudanski, Krzysztof.

J Pers Med ; 12(5), 2022 Apr 20.

Article in English | MEDLINE | ID: mdl-35629084

ABSTRACT

Both provider- and protocol-driven electrolyte replacement have been linked to the over-prescription of ubiquitous electrolytes. Here, we describe the development and retrospective validation of a data-driven clinical decision support tool that uses reinforcement learning (RL) algorithms to recommend patient-tailored electrolyte replacement policies for ICU patients. We used electronic health records (EHR) data that originated from two institutions (UPHS; MIMIC-IV). The tool uses a set of patient characteristics, such as their physiological and pharmacological state, a pre-defined set of possible repletion actions, and a set of clinical goals to present clinicians with a recommendation for the route and dose of an electrolyte. RL-driven electrolyte repletion substantially reduces the frequency of magnesium and potassium replacements (up to 60%), adjusts the timing of interventions in all three electrolytes considered (potassium, magnesium, and phosphate), and shifts them towards orally administered repletion over intravenous replacement. This shift in recommended treatment limits risk of the potentially harmful effects of over-repletion and implies monetary savings. Overall, the RL-driven electrolyte repletion recommendations reduce excess electrolyte replacements and improve the safety, precision, efficacy, and cost of each electrolyte repletion event, while showing robust performance across patient cohorts and hospital systems.

10.

Hierarchical Gaussian Processes and Mixtures of Experts to Model COVID-19 Patient Trajectories.

Cui, Sunny; Yoo, Elizabeth C; Li, Didong; Laudanski, Krzysztof; Engelhardt, Barbara E.

Pac Symp Biocomput ; 27: 266-277, 2022.

Article in English | MEDLINE | ID: mdl-34890155

ABSTRACT

Gaussian processes (GPs) are a versatile nonparametric model for nonlinear regression and have been widely used to study spatiotemporal phenomena. However, standard GPs offer limited interpretability and generalizability for datasets with naturally occurring hierarchies. With large-scale, rapidly-updating electronic health record (EHR) data, we want to study patient trajectories across diverse patient cohorts while preserving patient subgroup structure. In this work, we partition our cohort of over 2000 COVID-19 patients by sex and ethnicity. We develop and apply a hierarchical Gaussian process and a mixture of experts (MOE) hierarchical GP model to fit patient trajectories on clinical markers of disease progression. A case study for albumin, an effective predictor of COVID-19 patient outcomes, highlights the predictive performance of these models. These hierarchical spatiotemporal models of EHR data bring us a step closer toward our goal of building flexible approaches to capture patient data that can be used in real-time systems.

Subject(s)

COVID-19; Cohort Studies; Computational Biology; Electronic Health Records; Humans; SARS-CoV-2

11.

Brain kernel: A new spatial covariance function for fMRI data.

Wu, Anqi; Nastase, Samuel A; Baldassano, Christopher A; Turk-Browne, Nicholas B; Norman, Kenneth A; Engelhardt, Barbara E; Pillow, Jonathan W.

Neuroimage ; 245: 118580, 2021 12 15.

Article in English | MEDLINE | ID: mdl-34740792

ABSTRACT

A key problem in functional magnetic resonance imaging (fMRI) is to estimate spatial activity patterns from noisy high-dimensional signals. Spatial smoothing provides one approach to regularizing such estimates. However, standard smoothing methods ignore the fact that correlations in neural activity may fall off at different rates in different brain areas, or exhibit discontinuities across anatomical or functional boundaries. Moreover, such methods do not exploit the fact that widely separated brain regions may exhibit strong correlations due to bilateral symmetry or the network organization of brain regions. To capture this non-stationary spatial correlation structure, we introduce the brain kernel, a continuous covariance function for whole-brain activity patterns. We define the brain kernel in terms of a continuous nonlinear mapping from 3D brain coordinates to a latent embedding space, parametrized with a Gaussian process (GP). The brain kernel specifies the prior covariance between voxels as a function of the distance between their locations in embedding space. The GP mapping warps the brain nonlinearly so that highly correlated voxels are close together in latent space, and uncorrelated voxels are far apart. We estimate the brain kernel using resting-state fMRI data, and we develop an exact, scalable inference method based on block coordinate descent to overcome the challenges of high dimensionality (10-100K voxels). Finally, we illustrate the brain kernel's usefulness with applications to brain decoding and factor analysis with multiple task-based fMRI datasets.

Subject(s)

Brain Mapping/methods; Image Processing, Computer-Assisted/methods; Magnetic Resonance Imaging/methods; Neuroimaging/methods; Humans; Imaging, Three-Dimensional
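
The central idea in the brain kernel abstract above, covariance defined by distance in an embedding space rather than in anatomical space, can be illustrated with a toy example. Everything here is an assumption for illustration: the squared-exponential form, the hand-written `toy_embedding` that folds one hemisphere onto the other, and the coordinates. In the paper the mapping is learned and parametrized with a Gaussian process, not fixed by hand.

```python
import math


def toy_embedding(coord):
    """Hypothetical warp: fold the left hemisphere onto the right (x -> |x|)."""
    x, y, z = coord
    return (abs(x), y, z)


def identity_embedding(coord):
    """No warp: embedding space equals anatomical space."""
    return coord


def embedded_kernel(a, b, embed, variance=1.0, length_scale=1.0):
    """Squared-exponential covariance computed on distances in embedding space."""
    ea, eb = embed(a), embed(b)
    sq_dist = sum((p - q) ** 2 for p, q in zip(ea, eb))
    return variance * math.exp(-sq_dist / (2.0 * length_scale ** 2))


# Two voxels at mirror-image positions, 60 units apart in anatomical space.
left = (-30.0, 10.0, 5.0)
right = (30.0, 10.0, 5.0)

k_plain = embedded_kernel(left, right, identity_embedding, length_scale=10.0)
k_warped = embedded_kernel(left, right, toy_embedding, length_scale=10.0)
```

Under the identity embedding the two voxels are nearly uncorrelated, while the folding warp places them at the same latent location and gives them maximal prior covariance, mimicking the bilateral-symmetry effect the abstract mentions.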

12.

A self-exciting point process to study multicellular spatial signaling patterns.

Verma, Archit; Jena, Siddhartha G; Isakov, Danielle R; Aoki, Kazuhiro; Toettcher, Jared E; Engelhardt, Barbara E.

Proc Natl Acad Sci U S A ; 118(32), 2021 08 10.

Article in English | MEDLINE | ID: mdl-34362843

ABSTRACT

Multicellular organisms rely on spatial signaling among cells to drive their organization, development, and response to stimuli. Several models have been proposed to capture the behavior of spatial signaling in multicellular systems, but existing approaches fail to capture both the autonomous behavior of single cells and the interactions of a cell with its neighbors simultaneously. We propose a spatiotemporal model of dynamic cell signaling based on Hawkes processes (self-exciting point processes) that model the signaling processes within a cell and spatial couplings between cells. With this cellular point process (CPP), we capture both the single-cell pathway activation rate and the magnitude and duration of signaling between cells relative to their spatial location. Furthermore, our model captures tissues composed of heterogeneous cell types with different bursting rates and signaling behaviors across multiple signaling proteins. We apply our model to epithelial cell systems that exhibit a range of autonomous and spatial signaling behaviors basally and under pharmacological exposure. Our model identifies known drug-induced signaling deficits, characterizes signaling changes across a wound front, and generalizes to multichannel observations.

Subject(s)

Keratinocytes/metabolism; Models, Biological; Signal Transduction; Animals; Dipeptides/pharmacology; Dogs; Epithelial Cells; Hydroxamic Acids/pharmacology; Keratinocytes/cytology; Keratinocytes/drug effects; MAP Kinase Signaling System/drug effects; Madin Darby Canine Kidney Cells; Mice, Inbred Strains; Mice, Transgenic; Models, Statistical; Protein Kinase Inhibitors/pharmacology; Signal Transduction/drug effects; Spatio-Temporal Analysis
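
The self-exciting behavior that the abstract above attributes to Hawkes processes can be seen in the conditional intensity of the simplest case. The sketch below is a univariate, purely temporal Hawkes process with an exponential kernel; the spatial coupling between neighboring cells that defines the cellular point process is deliberately left out, and the parameter names (`mu`, `alpha`, `beta`) and event times are illustrative assumptions.

```python
import math


def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Conditional intensity of a univariate Hawkes process.

    lambda(t) = mu + sum over past events t_i < t of alpha * exp(-beta * (t - t_i))

    mu    : baseline (autonomous) event rate
    alpha : jump in intensity caused by each past event
    beta  : decay rate of that excitation
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - t_i))
        for t_i in event_times
        if t_i < t
    )
    return mu + excitation


events = [1.0, 1.5, 4.0]  # hypothetical signaling-pulse times for one cell

before_any = hawkes_intensity(0.5, events, mu=0.2, alpha=0.8, beta=1.0)
after_burst = hawkes_intensity(1.6, events, mu=0.2, alpha=0.8, beta=1.0)
long_after = hawkes_intensity(3.9, events, mu=0.2, alpha=0.8, beta=1.0)
```

Before any event the intensity equals the baseline rate; shortly after two closely spaced events it is elevated, and it then decays back toward the baseline. A spatial extension would add excitation terms from events in neighboring cells, weighted by distance.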

13.

Joint analysis of expression levels and histological images identifies genes associated with tissue morphology.

Ash, Jordan T; Darnell, Gregory; Munro, Daniel; Engelhardt, Barbara E.

Nat Commun ; 12(1): 1609, 2021 03 11.

Article in English | MEDLINE | ID: mdl-33707455

ABSTRACT

Histopathological images are used to characterize complex phenotypes such as tumor stage. Our goal is to associate features of stained tissue images with high-dimensional genomic markers. We use convolutional autoencoders and sparse canonical correlation analysis (CCA) on paired histological images and bulk gene expression to identify subsets of genes whose expression levels in a tissue sample correlate with subsets of morphological features from the corresponding sample image. We apply our approach, ImageCCA, to two TCGA data sets, and find gene sets associated with the structure of the extracellular matrix and cell wall infrastructure, implicating uncharacterized genes in extracellular processes. We find sets of genes associated with specific cell types, including neuronal cells and cells of the immune system. We apply ImageCCA to the GTEx v6 data, and find image features that capture population variation in thyroid and in colon tissues associated with genetic variants (image morphology QTLs, or imQTLs), suggesting that genetic variation regulates population variation in tissue morphological traits.

Subject(s)

Computational Biology/methods; Gene Expression Regulation, Neoplastic/genetics; Gene Expression/genetics; Neoplasms/pathology; Quantitative Trait Loci/genetics; BRCA1 Protein/genetics; Biomarkers, Tumor/genetics; Cell Membrane/genetics; Cell Membrane/physiology; Extracellular Matrix/genetics; Extracellular Matrix/physiology; Humans; Image Processing, Computer-Assisted; Neoplasms/genetics; Polymorphism, Single Nucleotide/genetics

14.

Optimal marker gene selection for cell type discrimination in single cell analyses.

Dumitrascu, Bianca; Villar, Soledad; Mixon, Dustin G; Engelhardt, Barbara E.

Nat Commun ; 12(1): 1186, 2021 02 19.

Article in English | MEDLINE | ID: mdl-33608535

ABSTRACT

Single-cell technologies characterize complex cell populations across multiple data modalities at unprecedented scale and resolution. Multi-omic data for single cell gene expression, in situ hybridization, or single cell chromatin states are increasingly available across diverse tissue types. When isolating specific cell types from a sample of dissociated cells or performing in situ sequencing in collections of heterogeneous cells, one challenging task is to select a small set of informative markers that robustly enable the identification and discrimination of specific cell types or cell states as precisely as possible. Given single cell RNA-seq data and a set of cellular labels to discriminate, scGeneFit selects gene markers that jointly optimize cell label recovery using label-aware compressive classification methods. This results in a substantially more robust and less redundant set of markers than existing methods, most of which identify markers that separate each cell label from the rest. When applied to a data set given a hierarchy of cell types as labels, the markers found by our method improve the recovery of the cell type hierarchy with fewer markers than existing methods using a computationally efficient and principled optimization.

Subject(s)

Genetic Markers; Single-Cell Analysis/methods; Algorithms; Cluster Analysis; Gene Expression; Gene Expression Profiling/methods; Humans; RNA-Seq; Sequence Analysis, RNA/methods; Transcriptome

15.

Causal network inference from gene transcriptional time-series response to glucocorticoids.

Lu, Jonathan; Dumitrascu, Bianca; McDowell, Ian C; Jo, Brian; Barrera, Alejandro; Hong, Linda K; Leichter, Sarah M; Reddy, Timothy E; Engelhardt, Barbara E.

PLoS Comput Biol ; 17(1): e1008223, 2021 01.

Article in English | MEDLINE | ID: mdl-33513136

ABSTRACT

Gene regulatory network inference is essential to uncover complex relationships among gene pathways and inform downstream experiments, ultimately enabling regulatory network re-engineering. Network inference from transcriptional time-series data requires accurate, interpretable, and efficient determination of causal relationships among thousands of genes. Here, we develop Bootstrap Elastic net regression from Time Series (BETS), a statistical framework based on Granger causality for the recovery of a directed gene network from transcriptional time-series data. BETS uses elastic net regression and stability selection from bootstrapped samples to infer causal relationships among genes. BETS is highly parallelized, enabling efficient analysis of large transcriptional data sets. We show competitive accuracy on a community benchmark, the DREAM4 100-gene network inference challenge, where BETS is one of the fastest among methods of similar performance and additionally infers whether causal effects are activating or inhibitory. We apply BETS to transcriptional time-series data of differentially-expressed genes from A549 cells exposed to glucocorticoids over a period of 12 hours. We identify a network of 2768 genes and 31,945 directed edges (FDR ≤ 0.2). We validate inferred causal network edges using two external data sources: Overexpression experiments on the same glucocorticoid system, and genetic variants associated with inferred edges in primary lung tissue in the Genotype-Tissue Expression (GTEx) v6 project. BETS is available as an open source software package at https://github.com/lujonathanh/BETS.

Subject(s)

Glucocorticoids/pharmacology; Models, Statistical; Transcriptome/drug effects; A549 Cells; Algorithms; Computational Biology; Humans; Lung/chemistry; Lung/metabolism; Machine Learning; Software; Transcriptome/genetics
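
The Granger-causality idea underlying BETS, as described in the abstract above, is that one gene "causes" another if its past values improve prediction of the other gene beyond that gene's own past. The sketch below shows only that core comparison, using lag-1 ordinary least squares on a synthetic pair of series. It is not the BETS procedure: elastic net regularization, bootstrapping, stability selection, and FDR control are all omitted, and the function names and data are hypothetical.

```python
def rss_one(y, a):
    """Residual sum of squares of the fit y ~ c * a (no intercept)."""
    coef = sum(ai * yi for ai, yi in zip(a, y)) / sum(ai * ai for ai in a)
    return sum((yi - coef * ai) ** 2 for ai, yi in zip(a, y))


def rss_two(y, a, b):
    """Residual sum of squares of y ~ c1 * a + c2 * b, via the 2x2 normal equations."""
    saa = sum(ai * ai for ai in a)
    sbb = sum(bi * bi for bi in b)
    sab = sum(ai * bi for ai, bi in zip(a, b))
    say = sum(ai * yi for ai, yi in zip(a, y))
    sby = sum(bi * yi for bi, yi in zip(b, y))
    det = saa * sbb - sab * sab
    c1 = (say * sbb - sby * sab) / det
    c2 = (sby * saa - say * sab) / det
    return sum((yi - c1 * ai - c2 * bi) ** 2 for ai, bi, yi in zip(a, b, y))


def granger_gain(effect, cause):
    """Fraction of lag-1 autoregressive error in `effect` removed by adding lagged `cause`."""
    y = effect[1:]
    own_lag = effect[:-1]
    cause_lag = cause[:-1]
    restricted = rss_one(y, own_lag)          # effect predicted from its own past only
    full = rss_two(y, own_lag, cause_lag)     # own past plus the candidate cause's past
    return 1.0 - full / restricted


# Synthetic example: gene B at time t depends on gene A at time t - 1.
gene_a = [1.0, 0.0, 2.0, 0.0, 3.0, 1.0, 0.0, 2.0, 1.0, 3.0]
gene_b = [0.0]
for a_prev in gene_a[:-1]:
    gene_b.append(0.5 * gene_b[-1] + a_prev)

gain_a_to_b = granger_gain(gene_b, gene_a)
gain_b_to_a = granger_gain(gene_a, gene_b)
```

In this toy setup the lagged values of gene A explain essentially all of the remaining error in gene B, while the reverse direction gains much less, which is the asymmetry a directed edge is meant to capture.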

16.

The impact of sex on gene expression across human tissues.

Oliva, Meritxell; Muñoz-Aguirre, Manuel; Kim-Hellmuth, Sarah; Wucher, Valentin; Gewirtz, Ariel D H; Cotter, Daniel J; Parsana, Princy; Kasela, Silva; Balliu, Brunilda; Viñuela, Ana; Castel, Stephane E; Mohammadi, Pejman; Aguet, François; Zou, Yuxin; Khramtsova, Ekaterina A; Skol, Andrew D; Garrido-Martín, Diego; Reverter, Ferran; Brown, Andrew; Evans, Patrick; Gamazon, Eric R; Payne, Anthony; Bonazzola, Rodrigo; Barbeira, Alvaro N; Hamel, Andrew R; Martinez-Perez, Angel; Soria, José Manuel; Pierce, Brandon L; Stephens, Matthew; Eskin, Eleazar; Dermitzakis, Emmanouil T; Segrè, Ayellet V; Im, Hae Kyung; Engelhardt, Barbara E; Ardlie, Kristin G; Montgomery, Stephen B; Battle, Alexis J; Lappalainen, Tuuli; Guigó, Roderic; Stranger, Barbara E.

Science ; 369(6509), 2020 09 11.

Article in English | MEDLINE | ID: mdl-32913072

ABSTRACT

Many complex human phenotypes exhibit sex-differentiated characteristics. However, the molecular mechanisms underlying these differences remain largely unknown. We generated a catalog of sex differences in gene expression and in the genetic regulation of gene expression across 44 human tissue sources surveyed by the Genotype-Tissue Expression project (GTEx, v8 release). We demonstrate that sex influences gene expression levels and cellular composition of tissue samples across the human body. A total of 37% of all genes exhibit sex-biased expression in at least one tissue. We identify cis expression quantitative trait loci (eQTLs) with sex-differentiated effects and characterize their cellular origin. By integrating sex-biased eQTLs with genome-wide association study data, we identify 58 gene-trait associations that are driven by genetic regulation of gene expression in a single sex. These findings provide an extensive characterization of sex differences in the human transcriptome and its genetic regulation.

Subject(s)

Gene Expression Regulation; Gene Expression; Sex Characteristics; Chromosomes, Human, X/genetics; Disease/genetics; Epigenesis, Genetic; Female; Genetic Variation; Genome-Wide Association Study; Humans; Male; Organ Specificity; Promoter Regions, Genetic; Quantitative Trait Loci; Sex Factors

17.

A robust nonlinear low-dimensional manifold for single cell RNA-seq data.

Verma, Archit; Engelhardt, Barbara E.

BMC Bioinformatics ; 21(1): 324, 2020 Jul 21.

Article in English | MEDLINE | ID: mdl-32693778

ABSTRACT

BACKGROUND: Modern developments in single-cell sequencing technologies enable broad insights into cellular state. Single-cell RNA sequencing (scRNA-seq) can be used to explore cell types, states, and developmental trajectories to broaden our understanding of cellular heterogeneity in tissues and organs. Analysis of these sparse, high-dimensional experimental results requires dimension reduction. Several methods have been developed to estimate low-dimensional embeddings for filtered and normalized single-cell data. However, methods have yet to be developed for unfiltered and unnormalized count data that estimate uncertainty in the low-dimensional space. We present a nonlinear latent variable model with robust, heavy-tailed error and adaptive kernel learning to estimate low-dimensional nonlinear structure in scRNA-seq data. RESULTS: Gene expression in a single cell is modeled as a noisy draw from a Gaussian process in high dimensions from low-dimensional latent positions. This model is called the Gaussian process latent variable model (GPLVM). We model residual errors with a heavy-tailed Student's t-distribution to estimate a manifold that is robust to technical and biological noise found in normalized scRNA-seq data. We compare our approach to common dimension reduction tools across a diverse set of scRNA-seq data sets to highlight our model's ability to enable important downstream tasks such as clustering, inferring cell developmental trajectories, and visualizing high throughput experiments on available experimental data. CONCLUSION: We show that our adaptive robust statistical approach to estimate a nonlinear manifold is well suited for raw, unfiltered gene counts from high-throughput sequencing technologies for visualization, exploration, and uncertainty estimation of cell states.

Subject(s)

Nonlinear Dynamics; RNA-Seq; Single-Cell Analysis/methods; Blood Cells/metabolism; Gene Expression Regulation; Humans; Models, Genetic; Neurons/metabolism; Normal Distribution; Principal Component Analysis; Time Factors
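
The reason a heavy-tailed Student's t error model is more robust than a Gaussian one, as the abstract above argues, shows up directly in the log-densities: a large residual costs far less log-likelihood under the t-distribution, so a single outlying count pulls the fit less. The sketch below compares the two; the degrees of freedom, scale, and residual values are arbitrary illustrative choices, not settings from the paper.

```python
import math


def normal_logpdf(residual, scale=1.0):
    """Log-density of a zero-mean Gaussian with the given standard deviation."""
    z = residual / scale
    return -0.5 * math.log(2.0 * math.pi) - math.log(scale) - 0.5 * z * z


def student_t_logpdf(residual, df=3.0, scale=1.0):
    """Log-density of a zero-mean, scaled Student's t-distribution."""
    z = residual / scale
    return (
        math.lgamma((df + 1.0) / 2.0)
        - math.lgamma(df / 2.0)
        - 0.5 * math.log(df * math.pi)
        - math.log(scale)
        - 0.5 * (df + 1.0) * math.log1p(z * z / df)
    )


small, outlier = 0.5, 10.0

# Log-likelihood lost when a residual moves from `small` to `outlier`.
normal_penalty = normal_logpdf(small) - normal_logpdf(outlier)
robust_penalty = student_t_logpdf(small) - student_t_logpdf(outlier)
```

The Gaussian penalty grows quadratically with the residual, while the t penalty grows only logarithmically, so the outlier is penalized far less under the heavy-tailed model.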

18.

Sparse multi-output Gaussian processes for online medical time series prediction.

Cheng, Li-Fang; Dumitrascu, Bianca; Darnell, Gregory; Chivers, Corey; Draugelis, Michael; Li, Kai; Engelhardt, Barbara E.

BMC Med Inform Decis Mak ; 20(1): 152, 2020 07 08.

Article in English | MEDLINE | ID: mdl-32641134

ABSTRACT

BACKGROUND: For real-time monitoring of hospital patients, high-quality inference of patients' health status using all information available from clinical covariates and lab test results is essential to enable successful medical interventions and improve patient outcomes. Developing a computational framework that can learn from observational large-scale electronic health records (EHRs) and make accurate real-time predictions is a critical step. In this work, we develop and explore a Bayesian nonparametric model based on multi-output Gaussian process (GP) regression for hospital patient monitoring. METHODS: We propose MedGP, a statistical framework that incorporates 24 clinical covariates and supports a rich reference data set from which relationships between observed covariates may be inferred and exploited for high-quality inference of patient state over time. To do this, we develop a highly structured sparse GP kernel to enable tractable computation over tens of thousands of time points while estimating correlations among clinical covariates, patients, and periodicity in patient observations. MedGP has a number of benefits over current methods, including (i) not requiring an alignment of the time series data, (ii) quantifying confidence regions in the predictions, (iii) exploiting a vast and rich database of patients, and (iv) inferring interpretable relationships among clinical covariates. RESULTS: We evaluate and compare results from MedGP on the task of online prediction for three patient subgroups from two medical data sets across 8,043 patients. We find MedGP improves online prediction over baseline and state-of-the-art methods for nearly all covariates across different disease subgroups and hospitals. CONCLUSIONS: The MedGP framework is robust and efficient in estimating the temporal dependencies from sparse and irregularly sampled medical time series data for online prediction. The publicly available code is at https://github.com/bee-hive/MedGP .

Subject(s)

Algorithms, Statistical Models, Bayes Theorem, Normal Distribution
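
As a rough illustration of the kind of computation the MedGP abstract above refers to (GP regression on irregularly sampled time series with confidence regions), the following is a minimal single-output sketch. It is not the MedGP implementation: the squared-exponential kernel, the function names `rbf_kernel` and `gp_posterior`, the hyperparameter values, and the toy data are all assumptions made for illustration. MedGP itself uses a structured sparse multi-output kernel, which this sketch does not attempt to reproduce.

```python
import numpy as np

def rbf_kernel(t1, t2, length_scale=2.0, variance=1.0):
    """Squared-exponential kernel between two 1-D arrays of time points."""
    d = t1[:, None] - t2[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(t_train, y_train, t_test, noise_var=0.1, **kernel_args):
    """Posterior mean and standard deviation of a zero-mean GP at t_test."""
    K = rbf_kernel(t_train, t_train, **kernel_args) + noise_var * np.eye(len(t_train))
    K_s = rbf_kernel(t_train, t_test, **kernel_args)
    K_ss = rbf_kernel(t_test, t_test, **kernel_args)
    # Cholesky factorization gives a numerically stable solve of K^{-1} y.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return mean, std

# Irregularly spaced observation times (hypothetical toy data); no alignment
# to a regular grid is needed because the kernel works on raw time stamps.
t_obs = np.array([0.0, 0.7, 1.1, 3.4, 3.9, 6.2, 8.5])
y_obs = np.sin(t_obs)
t_new = np.array([1.0, 5.0, 12.0])

mean, std = gp_posterior(t_obs, y_obs, t_new)
lower, upper = mean - 1.96 * std, mean + 1.96 * std  # approximate 95% band
```

In this sketch the band widens at query times far from any observation (here, the last query point), which is the behaviour the abstract describes as quantifying confidence regions in the predictions.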

19.

netNMF-sc: leveraging gene-gene interactions for imputation and dimensionality reduction in single-cell expression analysis.

Elyanow, Rebecca; Dumitrascu, Bianca; Engelhardt, Barbara E; Raphael, Benjamin J.

Genome Res ; 30(2): 195-204, 2020 02.

Article in English | MEDLINE | ID: mdl-31992614

ABSTRACT

Single-cell RNA-sequencing (scRNA-seq) enables high-throughput measurement of RNA expression in single cells. However, because of technical limitations, scRNA-seq data often contain zero counts for many transcripts in individual cells. These zero counts, or dropout events, complicate the analysis of scRNA-seq data using standard methods developed for bulk RNA-seq data. Current scRNA-seq analysis methods typically overcome dropout by combining information across cells in a lower-dimensional space, leveraging the observation that cells generally occupy a small number of RNA expression states. We introduce netNMF-sc, an algorithm for scRNA-seq analysis that leverages information across both cells and genes. netNMF-sc learns a low-dimensional representation of scRNA-seq transcript counts using network-regularized non-negative matrix factorization. The network regularization takes advantage of prior knowledge of gene-gene interactions, encouraging pairs of genes with known interactions to be near each other in the low-dimensional representation. The resulting matrix factorization imputes gene abundance for both zero and nonzero counts and can be used to cluster cells into meaningful subpopulations. We show that netNMF-sc outperforms existing methods at clustering cells and estimating gene-gene covariance using both simulated and real scRNA-seq data, with increasing advantages at higher dropout rates (e.g., >60%). We also show that the results from netNMF-sc are robust to variation in the input network, with more representative networks leading to greater performance gains.

Subject(s)

Genetic Epistasis/genetics, RNA-Seq, Single-Cell Analysis/methods, Software, Cluster Analysis, Gene Expression Profiling, Humans, Exome Sequencing
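
The network-regularized factorization described in the netNMF-sc abstract above can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: it assumes a squared-error objective with a graph-Laplacian penalty on the gene factors and uses the standard multiplicative updates for graph-regularized NMF. The function names, the penalty weight `lam`, and the toy data are hypothetical.

```python
import numpy as np

def network_regularized_nmf(X, A, k, lam=1.0, n_iter=200, eps=1e-10, seed=0):
    """Factor X (genes x cells) as W @ H with W, H >= 0.

    A is a symmetric, non-negative gene-gene adjacency matrix. The penalty
    lam * trace(W.T @ (D - A) @ W) pulls the rows of W for connected genes
    toward each other.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape
    W = rng.random((n_genes, k))
    H = rng.random((k, n_cells))
    D = np.diag(A.sum(axis=1))  # degree matrix of the gene network
    for _ in range(n_iter):
        # Multiplicative updates keep both factors non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T + lam * A @ W) / (W @ H @ H.T + lam * D @ W + eps)
    return W, H

def objective(X, A, W, H, lam=1.0):
    """Squared reconstruction error plus the graph-Laplacian penalty."""
    L = np.diag(A.sum(axis=1)) - A
    return np.linalg.norm(X - W @ H) ** 2 + lam * np.trace(W.T @ L @ W)

# Toy data: two gene modules of three genes each, with simulated dropout.
rng = np.random.default_rng(1)
true_W = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
true_H = rng.random((2, 8)) + 0.5
X = true_W @ true_H
X[rng.random(X.shape) < 0.3] = 0.0  # zero out about 30% of entries

# Hypothetical prior network: genes within a module are connected.
A = np.zeros((6, 6))
A[:3, :3] = 1.0
A[3:, 3:] = 1.0
np.fill_diagonal(A, 0.0)

W, H = network_regularized_nmf(X, A, k=2)
X_imputed = W @ H  # dense reconstruction, including the zeroed entries
```

The product `W @ H` gives a value for every gene-cell pair, which is how a low-rank factorization of this kind can serve as an imputation for zero counts; the columns of `H` can then be clustered to group cells.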

20.

Neurodevelopmental Outcomes of Neonates Randomized to Morphine or Methadone for Treatment of Neonatal Abstinence Syndrome.

Czynski, Adam J; Davis, Jonathan M; Dansereau, Lynne M; Engelhardt, Barbara; Marro, Peter; Bogen, Debra L; Hudak, Mark L; Shenberger, Jeffrey; Wachman, Elisha M; Oliveira, Erica L; Lester, Barry M.

J Pediatr ; 219: 146-151.e1, 2020 04.

Article in English | MEDLINE | ID: mdl-31987653

ABSTRACT

OBJECTIVE: To evaluate the effects of pharmacologic treatment of neonatal abstinence syndrome on neurodevelopmental outcome from a randomized, controlled trial. STUDY DESIGN: Eight sites enrolled 116 full-term newborn infants with neonatal abstinence syndrome born to mothers maintained on methadone or buprenorphine into a randomized trial of morphine vs methadone. Ninety-nine infants (85%) were evaluated at hospital discharge using the NICU Network Neurobehavioral Scale. At 18 months, 83 of 99 infants (83.8%) were evaluated with the Bayley Scales of Infant and Toddler Development-Third Edition and 77 of 99 (77.7%) with the Child Behavior Checklist (CBCL). RESULTS: Primary analyses showed no significant differences between treatment groups on the NICU Network Neurobehavioral Scale, Bayley Scales of Infant and Toddler Development-Third Edition, or CBCL. However, in post hoc analyses, we found differences by atypical NICU Network Neurobehavioral Scale profile on the CBCL. Infants receiving adjunctive phenobarbital had lower Bayley Scales of Infant and Toddler Development-Third Edition scores and more behavior problems on the CBCL. In adjusted analyses, internalizing and total behavior problems were associated with use of phenobarbital (P = .03; P = .04), maternal psychological distress (measured by the Brief Symptom Inventory) (both P < .01), and infant medical problems (both P = .02). Externalizing problems were associated with maternal psychological distress (P < .01) and continued maternal substance use (P < .01). CONCLUSIONS: Infants treated with either morphine or methadone had similar short-term and longer-term neurobehavioral outcomes. Neurodevelopmental outcome may be related to the need for phenobarbital, overall health of the infant, and postnatal caregiving environment. TRIAL REGISTRATION: ClinicalTrials.gov: NCT01958476.

Subject(s)

Methadone/pharmacology, Methadone/therapeutic use, Morphine/pharmacology, Morphine/therapeutic use, Narcotics/pharmacology, Narcotics/therapeutic use, Neonatal Abstinence Syndrome/drug therapy, Nervous System/drug effects, Nervous System/growth & development, Female, Humans, Infant, Newborn Infant, Male, Phenobarbital/therapeutic use
