1.
Nat Commun ; 15(1): 2765, 2024 Mar 29.
Article En | MEDLINE | ID: mdl-38553455

Single-cell technologies can measure the expression of thousands of molecular features in individual cells undergoing dynamic biological processes. While examining cells along a computationally-ordered pseudotime trajectory can reveal how changes in gene or protein expression impact cell fate, identifying such dynamic features is challenging due to the inherent noise in single-cell data. Here, we present DELVE, an unsupervised feature selection method for identifying a representative subset of molecular features which robustly recapitulate cellular trajectories. In contrast to previous work, DELVE uses a bottom-up approach to mitigate the effects of confounding sources of variation, and instead models cell states from dynamic gene or protein modules based on core regulatory complexes. Using simulations, single-cell RNA sequencing, and iterative immunofluorescence imaging data in the context of cell cycle and cellular differentiation, we demonstrate how DELVE selects features that better define cell-types and cell-type transitions. DELVE is available as an open-source python package: https://github.com/jranek/delve .


Gene Expression Profiling , Software , Gene Expression Profiling/methods , Single-Cell Analysis/methods , Cell Differentiation , Cell Cycle/genetics , Sequence Analysis, RNA/methods
2.
BMC Bioinformatics ; 25(1): 25, 2024 Jan 15.
Article En | MEDLINE | ID: mdl-38221640

With the growing number of single-cell datasets collected under more complex experimental conditions, there is an opportunity to leverage single-cell variability to reveal deeper insights into how cells respond to perturbations. Many existing approaches rely on discretizing the data into clusters for differential gene expression (DGE), effectively ironing out any information unveiled by the single-cell variability across cell-types. In addition, DGE often assumes a statistical distribution that, if erroneous, can lead to false positive differentially expressed genes. Here, we present Cellograph: a semi-supervised framework that uses graph neural networks to quantify the effects of perturbations at single-cell granularity. Cellograph not only measures how prototypical cells are of each condition but also learns a latent space that is amenable to interpretable data visualization and clustering. The learned gene weight matrix from training reveals pertinent genes driving the differences between conditions. We demonstrate the utility of our approach on publicly-available datasets including cancer drug therapy, stem cell reprogramming, and organoid differentiation. Cellograph outperforms existing methods for quantifying the effects of experimental perturbations and offers a novel framework to analyze single-cell data using deep learning.


Data Visualization , Neural Networks, Computer , Cell Differentiation , Cluster Analysis , RNA
3.
Genome Biol ; 25(1): 9, 2024 Jan 03.
Article En | MEDLINE | ID: mdl-38172966

BACKGROUND: To analyze the large volume of data generated by single-cell technologies and to identify cellular correlates of particular clinical or experimental outcomes, differential abundance analyses are often applied. These algorithms identify subgroups of cells whose abundances change significantly in response to disease progression, or to an experimental perturbation. Despite the effectiveness of differential abundance analyses in identifying critical cell-states, there is currently no systematic benchmarking study to compare their applicability, usefulness, and accuracy in practice across single-cell modalities. RESULTS: Here, we perform a comprehensive benchmarking study to objectively evaluate and compare the benefits and potential downsides of current state-of-the-art differential abundance testing methods. We benchmarked six single-cell testing methods on several practical tasks, using both synthetic and real single-cell datasets. The tasks evaluated include effectiveness in identifying true differentially abundant subpopulations, accuracy in the adequate handling of batch effects, runtime efficiency, and hyperparameter usability and robustness. Based on various evaluation results, this paper gives dataset-specific suggestions for the practical use of differential abundance testing approaches. CONCLUSIONS: Based on our benchmarking study, we provide a set of recommendations for the optimal usage of single-cell DA testing methods in practice, particularly with respect to factors such as the presence of technical noise (for example batch effects), dataset size, and hyperparameter sensitivity.


Algorithms , Benchmarking , Research Design , Single-Cell Analysis/methods
4.
bioRxiv ; 2024 Jan 10.
Article En | MEDLINE | ID: mdl-38260395

Amyotrophic lateral sclerosis is the most common fatal motor neuron disease. Approximately 90% of ALS patients exhibit pathology of the master RNA regulator, Transactive Response DNA Binding protein (TDP-43). Despite the prevalence of TDP-43 pathology in ALS motor neurons, recent findings suggest immune dysfunction is a determinant of disease progression in patients. Whether TDP-43 pathology elicits disease-modifying immune responses in ALS remains underexplored. In this study, we demonstrate that TDP-43 pathology is internalized by antigen presenting cells, causes vesicle rupture, and leads to innate and adaptive immune cell activation. Using a multiplex imaging platform, we observed interactions between innate and adaptive immune cells near TDP-43 pathological lesions in ALS brain. We used a mass cytometry-based whole-blood stimulation assay to provide evidence that ALS patient peripheral immune cells exhibit responses to TDP-43 aggregates. Taken together, this study provides a novel link between TDP-43 pathology and ALS immune dysfunction, and further highlights the translational and diagnostic implications of monitoring and manipulating the ALS immune response.

5.
Nat Comput Sci ; 3(4): 346-359, 2023 Apr.
Article En | MEDLINE | ID: mdl-38116462

Advanced measurement and data storage technologies have enabled high-dimensional profiling of complex biological systems. For this, modern multiomics studies regularly produce datasets with hundreds of thousands of measurements per sample, enabling a new era of precision medicine. Correlation analysis is an important first step to gain deeper insights into the coordination and underlying processes of such complex systems. However, the construction of large correlation networks in modern high-dimensional datasets remains a major computational challenge owing to rapidly growing runtime and memory requirements. Here we address this challenge by introducing CorALS (Correlation Analysis of Large-scale (biological) Systems), an open-source framework for the construction and analysis of large-scale parametric as well as non-parametric correlation networks for high-dimensional biological data. It features off-the-shelf algorithms suitable for both personal and high-performance computers, enabling workflows and downstream analysis approaches. We illustrate the broad scope and potential of CorALS by exploring perspectives on complex biological processes in large-scale multiomics and single-cell studies.
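
As a rough illustration of the parametric and non-parametric correlation networks mentioned in this abstract, the sketch below computes a full Pearson or Spearman correlation matrix by standardizing each feature and taking dot products. It is not the CorALS API: the function names `standardize`, `rank`, and `correlation_matrix` are hypothetical, and the code does not include any of the runtime or memory optimizations the framework is built around.

```python
import math

def standardize(column):
    """Center a feature and scale it to unit Euclidean norm."""
    mean = sum(column) / len(column)
    centered = [x - mean for x in column]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0.0:
        raise ValueError("a constant feature has an undefined correlation")
    return [c / norm for c in centered]

def rank(column):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(column)), key=lambda i: column[i])
    ranks = [0.0] * len(column)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and column[order[end + 1]] == column[order[start]]:
            end += 1
        average = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks

def correlation_matrix(features, method="pearson"):
    """features: list of feature vectors, each holding one value per sample.

    Pearson correlation is the dot product of standardized features;
    Spearman correlation is Pearson correlation applied to ranks.
    """
    if method == "spearman":
        features = [rank(f) for f in features]
    elif method != "pearson":
        raise ValueError("method must be 'pearson' or 'spearman'")
    z = [standardize(f) for f in features]
    return [[sum(a * b for a, b in zip(zi, zj)) for zj in z] for zi in z]
```

The all-pairs loop grows quadratically with the number of features, which is the scaling problem the abstract describes for datasets with hundreds of thousands of measurements per sample.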

6.
Cell Rep Med ; 4(11): 101268, 2023 11 21.
Article En | MEDLINE | ID: mdl-37949070

In people with HIV (PWH), the post-antiretroviral therapy (ART) window is critical for immune restoration and HIV reservoir stabilization. We employ deep immune profiling and T cell receptor (TCR) sequencing and examine proliferation to assess how ART impacts T cell homeostasis. In PWH on long-term ART, lymphocyte frequencies and phenotypes are mostly stable. By contrast, broad phenotypic changes in natural killer (NK) cells, γδ T cells, B cells, and CD4+ and CD8+ T cells are observed in the post-ART window. Whereas CD8+ T cells mostly restore, memory CD4+ T subsets and cytolytic NK cells show incomplete restoration 1.4 years post ART. Surprisingly, the hierarchies and frequencies of dominant CD4 TCR clonotypes (0.1%-11% of all CD4+ T cells) remain stable post ART, suggesting that clonal homeostasis can be independent of homeostatic processes regulating CD4+ T cell absolute number, phenotypes, and function. The slow restoration of host immunity post ART also has implications for the design of ART interruption studies.


HIV Infections , Immune Reconstitution , Humans , CD8-Positive T-Lymphocytes , HIV Infections/drug therapy , CD4-Positive T-Lymphocytes , Anti-Retroviral Agents/therapeutic use , Receptors, Antigen, T-Cell
7.
bioRxiv ; 2023 Nov 15.
Article En | MEDLINE | ID: mdl-38014189

Single-cell technologies enable high-dimensional profiling of individual cells, therefore offering profound insights into subtle variation between specialized cell-types. However, translating the multitude of nuanced cellular profiles into meaningful per-sample representations is challenging due to heterogeneous cellular composition across individual profiled samples. To compute informative per-sample representations, we developed scLKME, a novel approach that uses a landmark-based kernel mean embedding method to convert multi-sample single-cell data into compact per-sample embeddings. Treating each sample as a distribution over cells, scLKME identifies landmarks across samples and maps these distributions into a reproducing kernel Hilbert space. Overall, scLKME outperforms state-of-the-art techniques in robustness, efficiency, accuracy, and practical usefulness of sample embeddings. Its application on a CyTOF dataset profiling immune responses in preterm birth highlighted its capacity to accurately identify patient-specific variations correlating with gestational age, suggesting broad applicability to multi-sample single-cell datasets with complex experimental designs. scLKME is available as an open-sourced python package at https://github.com/CompCy-lab/scLKME.
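
The idea of a landmark-based kernel mean embedding can be sketched as follows: each sample (a set of cells) is summarized by the average kernel similarity between its cells and a shared set of landmark cells. This is a minimal illustration and not the scLKME package's interface. The abstract does not state which kernel is used or how landmarks are chosen, so the RBF kernel, the `gamma` parameter, the random landmark selection, and the names `rbf`, `pick_landmarks`, and `embed_sample` are all assumptions made for the example.

```python
import math
import random

def rbf(x, y, gamma):
    """RBF kernel between two cells (assumed kernel choice)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def pick_landmarks(samples, n_landmarks, seed=0):
    """Draw landmark cells at random from the pooled cells of all samples.

    Random sampling is only a placeholder for whatever landmark selection
    strategy the actual method uses.
    """
    pooled = [cell for cells in samples.values() for cell in cells]
    return random.Random(seed).sample(pooled, n_landmarks)

def embed_sample(cells, landmarks, gamma=1.0):
    """Empirical kernel mean of one sample, evaluated at each landmark."""
    return [
        sum(rbf(cell, landmark, gamma) for cell in cells) / len(cells)
        for landmark in landmarks
    ]
```

Every sample ends up as a fixed-length vector with one entry per landmark, regardless of how many cells it contains, so the embeddings can be passed to standard per-sample models.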

8.
bioRxiv ; 2023 May 12.
Article En | MEDLINE | ID: mdl-37214963

Single-cell technologies can readily measure the expression of thousands of molecular features from individual cells undergoing dynamic biological processes, such as cellular differentiation, immune response, and disease progression. While examining cells along a computationally ordered pseudotime offers the potential to study how subtle changes in gene or protein expression impact cell fate decision-making, identifying characteristic features that drive continuous biological processes remains difficult to detect from unenriched and noisy single-cell data. Given that all profiled sources of feature variation contribute to the cell-to-cell distances that define an inferred cellular trajectory, including confounding sources of biological variation (e.g. cell cycle or metabolic state) or noisy and irrelevant features (e.g. measurements with low signal-to-noise ratio) can mask the underlying trajectory of study and hinder inference. Here, we present DELVE (dynamic selection of locally covarying features), an unsupervised feature selection method for identifying a representative subset of dynamically-expressed molecular features that recapitulates cellular trajectories. In contrast to previous work, DELVE uses a bottom-up approach to mitigate the effect of unwanted sources of variation confounding inference, and instead models cell states from dynamic feature modules that constitute core regulatory complexes. Using simulations, single-cell RNA sequencing data, and iterative immunofluorescence imaging data in the context of the cell cycle and cellular differentiation, we demonstrate that DELVE selects features that more accurately characterize cell populations and improve the recovery of cell type transitions. This feature selection framework provides an alternative approach for improving trajectory inference and uncovering co-variation amongst features along a biological trajectory. 
DELVE is implemented as an open-source python package and is publicly available at: https://github.com/jranek/delve.

9.
bioRxiv ; 2023 Feb 27.
Article En | MEDLINE | ID: mdl-36909641

Modern single-cell data analysis relies on statistical testing (e.g. differential expression testing) to identify genes or proteins that are up- or down-regulated in relation to cell-types or clinical outcomes. However, existing algorithms for such statistical testing are often limited by technical noise and cellular heterogeneity, which lead to false-positive results. To constrain the analysis to a compact and phenotype-related cell population, differential abundance (DA) testing methods were employed to identify subgroups of cells whose abundance changed significantly in response to disease progression, or experimental perturbation. Despite the effectiveness of DA testing algorithms in identifying critical cell-states, there are no systematic benchmarking or comparative studies of their use in practice. Herein, we performed the first comprehensive benchmarking study to objectively evaluate and compare the benefits and potential downsides of current state-of-the-art DA testing methods. We benchmarked six DA testing methods on several practical tasks, using both synthetic and real single-cell datasets. The tasks evaluated include recognizing true DA subpopulations, appropriate handling of batch effects, runtime efficiency, and hyperparameter usability and robustness. Based on various evaluation results, this paper gives dataset-specific suggestions for the usage of DA testing methods.

10.
Pac Symp Biocomput ; 28: 85-96, 2023.
Article En | MEDLINE | ID: mdl-36540967

Graph-based algorithms have become essential in the analysis of single-cell data for numerous tasks, such as automated cell-phenotyping and identifying cellular correlates of experimental perturbations or disease states. In large multi-patient, multi-sample single-cell datasets, the analysis of cell-cell similarity graph representations of these data becomes computationally prohibitive. Here, we introduce cytocoarsening, a novel graph-coarsening algorithm that significantly reduces the size of single-cell graph representations, which can then be used as input to downstream bioinformatics algorithms for improved computational efficiency. Uniquely, cytocoarsening considers both phenotypical similarity of cells and similarity of cells' associated clinical or experimental attributes in order to more readily identify condition-specific cell populations. The resulting coarse graph representations were evaluated based on both their structural correctness and the capacity of downstream algorithms to uncover the same biological conclusions as if the full graph had been used. Cytocoarsening is provided as open source code at https://github.com/ChenCookie/cytocoarsening.


Algorithms , Computational Biology , Humans , Computational Biology/methods , Software
11.
Ann Surg ; 277(3): e503-e512, 2023 03 01.
Article En | MEDLINE | ID: mdl-35129529

OBJECTIVE: The longitudinal assessment of physical function with high temporal resolution at a scalable and objective level in patients recovering from surgery is highly desirable to understand the biological and clinical factors that drive the clinical outcome. However, physical recovery from surgery itself remains poorly defined and the utility of wearable technologies to study recovery after surgery has not been established. BACKGROUND: Prolonged postoperative recovery is often associated with long-lasting impairment of physical, mental, and social functions. Although phenotypical and clinical patient characteristics account for some variation of individual recovery trajectories, biological differences likely play a major role. Specifically, patient-specific immune states have been linked to prolonged physical impairment after surgery. However, current methods of quantifying physical recovery lack patient specificity and objectivity. METHODS: Here, a combined high-fidelity accelerometry and state-of-the-art deep immune profiling approach was studied in patients undergoing major joint replacement surgery. The aim was to determine whether objective physical parameters derived from accelerometry data can accurately track patient-specific physical recovery profiles (suggestive of a 'clock of postoperative recovery'), compare the performance of derived parameters with benchmark metrics including step count, and link individual recovery profiles with patients' preoperative immune state. RESULTS: The results of our models indicate that patient-specific temporal patterns of physical function can be derived with a precision superior to benchmark metrics. 
Notably, 6 distinct domains of physical function and sleep are identified to represent the objective temporal patterns: "activity capacity" and "moderate and overall activity" (declined immediately after surgery); "sleep disruption and sedentary activity" (increased after surgery); "overall sleep", "sleep onset", and "light activity" (no clear changes were observed after surgery). These patterns can be linked to individual patients' preoperative immune state using cross-validated canonical-correlation analysis. Importantly, the pSTAT3 signal activity in monocytic myeloid-derived suppressor cells predicted a slower recovery. CONCLUSIONS: Accelerometry-based recovery trajectories are scalable and objective outcomes to study patient-specific factors that drive physical recovery.


Benchmarking , Exercise , Humans , Monocytes , Physical Examination , Postoperative Period
12.
Patterns (N Y) ; 3(12): 100655, 2022 Dec 09.
Article En | MEDLINE | ID: mdl-36569558

Preeclampsia is a complex disease of pregnancy whose physiopathology remains unclear. We developed machine-learning models for early prediction of preeclampsia (first 16 weeks of pregnancy) and over gestation by analyzing six omics datasets from a longitudinal cohort of pregnant women. For early pregnancy, a prediction model using nine urine metabolites had the highest accuracy and was validated on an independent cohort (area under the receiver-operating characteristic curve [AUC] = 0.88, 95% confidence interval [CI] [0.76, 0.99] cross-validated; AUC = 0.83, 95% CI [0.62,1] validated). Univariate analysis demonstrated statistical significance of identified metabolites. An integrated multiomics model further improved accuracy (AUC = 0.94). Several biological pathways were identified including tryptophan, caffeine, and arachidonic acid metabolisms. Integration with immune cytometry data suggested novel associations between immune and proteomic dynamics. While further validation in a larger population is necessary, these encouraging results can serve as a basis for a simple, early diagnostic test for preeclampsia.

13.
Genome Biol ; 23(1): 186, 2022 09 05.
Article En | MEDLINE | ID: mdl-36064614

BACKGROUND: Current methods for analyzing single-cell datasets have relied primarily on static gene expression measurements to characterize the molecular state of individual cells. However, capturing temporal changes in cell state is crucial for the interpretation of dynamic phenotypes such as the cell cycle, development, or disease progression. RNA velocity infers the direction and speed of transcriptional changes in individual cells, yet it is unclear how these temporal gene expression modalities may be leveraged for predictive modeling of cellular dynamics. RESULTS: Here, we present the first task-oriented benchmarking study that investigates integration of temporal sequencing modalities for dynamic cell state prediction. We benchmark ten integration approaches on ten datasets spanning different biological contexts, sequencing technologies, and species. We find that integrated data more accurately infers biological trajectories and achieves increased performance on classifying cells according to perturbation and disease states. Furthermore, we show that simple concatenation of spliced and unspliced molecules performs consistently well on classification tasks and can be used over more memory intensive and computationally expensive methods. CONCLUSIONS: This work illustrates how integrated temporal gene expression modalities may be leveraged for predicting cellular trajectories and sample-associated perturbation and disease phenotypes. Additionally, this study provides users with practical recommendations for task-specific integration of single-cell gene expression modalities.
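
The "simple concatenation" baseline described in this abstract amounts to joining the spliced and unspliced count matrices along the feature axis, so that each cell is represented by both sets of measurements. The sketch below shows that operation on plain nested lists; the function name `concatenate_modalities` is hypothetical, and any normalization or preprocessing the study applied before integration is not shown.

```python
def concatenate_modalities(spliced, unspliced):
    """Join two cells-by-genes matrices feature-wise.

    Both inputs hold one row per cell, in the same cell order. The result
    has one row per cell containing the spliced counts followed by the
    unspliced counts.
    """
    if len(spliced) != len(unspliced):
        raise ValueError("both matrices must describe the same cells")
    return [list(s) + list(u) for s, u in zip(spliced, unspliced)]
```

For two cells and two genes, `concatenate_modalities([[1, 0], [3, 2]], [[0, 5], [1, 1]])` gives a two-by-four matrix that can be passed to a downstream classifier or trajectory inference method.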


Benchmarking , Single-Cell Analysis , Gene Expression
14.
J Matern Fetal Neonatal Med ; 35(25): 5621-5628, 2022 Dec.
Article En | MEDLINE | ID: mdl-33653202

BACKGROUND: Early identification of pregnant women at risk for preeclampsia (PE) is important, as it will enable targeted interventions ahead of clinical manifestations. The quantitative analyses of plasma proteins feature prominently among molecular approaches used for risk prediction. However, derivation of protein signatures of sufficient predictive power has been challenging. The recent availability of platforms simultaneously assessing over 1000 plasma proteins offers broad examinations of the plasma proteome, which may enable the extraction of proteomic signatures with improved prognostic performance in prenatal care. OBJECTIVE: The primary aim of this study was to examine the generalizability of proteomic signatures predictive of PE in two cohorts of pregnant women whose plasma proteome was interrogated with the same highly multiplexed platform. Establishing generalizability, or lack thereof, is critical to devise strategies facilitating the development of clinically useful predictive tests. A second aim was to examine the generalizability of protein signatures predictive of gestational age (GA) in uncomplicated pregnancies in the same cohorts to contrast physiological and pathological pregnancy outcomes. STUDY DESIGN: Serial blood samples were collected during the first, second, and third trimesters in 18 women who developed PE and 18 women with uncomplicated pregnancies (Stanford cohort). The second cohort (Detroit), used for comparative analysis, consisted of 76 women with PE and 90 women with uncomplicated pregnancies. Multivariate analyses were applied to infer predictive and cohort-specific proteomic models, which were then tested in the alternate cohort. Gene ontology (GO) analysis was performed to identify biological processes that were over-represented among top-ranked proteins associated with PE. 
RESULTS: The model derived in the Stanford cohort was highly significant (p = 3.9E-15) and predictive (AUC = 0.96), but failed validation in the Detroit cohort (p = 9.7E-01, AUC = 0.50). Similarly, the model derived in the Detroit cohort was highly significant (p = 1.0E-21, AUC = 0.73), but failed validation in the Stanford cohort (p = 7.3E-02, AUC = 0.60). By contrast, proteomic models predicting GA were readily validated across the Stanford (p = 1.1E-454, R = 0.92) and Detroit cohorts (p = 1.1E-92, R = 0.92) indicating that the proteomic assay performed well enough to infer a generalizable model across studied cohorts, which makes it less likely that technical aspects of the assay, including batch effects, accounted for observed differences. CONCLUSIONS: Results point to a broader issue relevant for proteomic and other omic discovery studies in patient cohorts suffering from a clinical syndrome, such as PE, driven by heterogeneous pathophysiologies. While novel technologies including highly multiplex proteomic arrays and adapted computational algorithms allow for novel discoveries for a particular study cohort, they may not readily generalize across cohorts. A likely reason is that the prevalence of pathophysiologic processes leading up to the "same" clinical syndrome can be distributed differently in different and smaller-sized cohorts. Signatures derived in individual cohorts may simply capture different facets of the spectrum of pathophysiologic processes driving a syndrome. Our findings have important implications for the design of omic studies of a syndrome like PE. They highlight the need for performing such studies in diverse and well-phenotyped patient populations that are large enough to characterize subsets of patients with shared pathophysiologies to then derive subset-specific signatures of sufficient predictive power.


Pre-Eclampsia , Proteomics , Female , Humans , Pregnancy , Proteomics/methods , Pre-Eclampsia/diagnosis , Proteome/metabolism , Biomarkers , Blood Proteins
15.
Ann Surg ; 275(3): 582-590, 2022 03 01.
Article En | MEDLINE | ID: mdl-34954754

OBJECTIVE: The aim of this study was to determine whether single-cell and plasma proteomic elements of the host's immune response to surgery accurately identify patients who develop a surgical site complication (SSC) after major abdominal surgery. SUMMARY BACKGROUND DATA: SSCs may occur in up to 25% of patients undergoing bowel resection, resulting in significant morbidity and economic burden. However, the accurate prediction of SSCs remains clinically challenging. Leveraging high-content proteomic technologies to comprehensively profile patients' immune response to surgery is a promising approach to identify predictive biological factors of SSCs. METHODS: Forty-one patients undergoing non-cancer bowel resection were prospectively enrolled. Blood samples collected before surgery and on postoperative day one (POD1) were analyzed using a combination of single-cell mass cytometry and plasma proteomics. The primary outcome was the occurrence of an SSC, including surgical site infection, anastomotic leak, or wound dehiscence within 30 days of surgery. RESULTS: A multiomic model integrating the single-cell and plasma proteomic data collected on POD1 accurately differentiated patients with (n = 11) and without (n = 30) an SSC [area under the curve (AUC) = 0.86]. Model features included coregulated proinflammatory (eg, IL-6- and MyD88- signaling responses in myeloid cells) and immunosuppressive (eg, JAK/STAT signaling responses in M-MDSCs and Tregs) events preceding an SSC. Importantly, analysis of the immunological data obtained before surgery also yielded a model accurately predicting SSCs (AUC = 0.82). CONCLUSIONS: The multiomic analysis of patients' immune response after surgery and immune state before surgery revealed systemic immune signatures preceding the development of SSCs. Our results suggest that integrating immunological data in perioperative risk assessment paradigms is a plausible strategy to guide individualized clinical care.


Anastomotic Leak/epidemiology , Blood Proteins/analysis , Dietary Proteins/blood , Surgical Wound Dehiscence/epidemiology , Surgical Wound Infection/epidemiology , Adult , Cohort Studies , Digestive System Surgical Procedures , Female , Humans , Male , Middle Aged , Models, Theoretical , Prognosis , Prospective Studies , Proteome , Single-Cell Analysis
17.
Sci Transl Med ; 13(592)2021 05 05.
Article En | MEDLINE | ID: mdl-33952678

Estimating the time of delivery is of high clinical importance because pre- and postterm deviations are associated with complications for the mother and her offspring. However, current estimations are inaccurate. As pregnancy progresses toward labor, major transitions occur in fetomaternal immune, metabolic, and endocrine systems that culminate in birth. The comprehensive characterization of maternal biology that precedes labor is key to understanding these physiological transitions and identifying predictive biomarkers of delivery. Here, a longitudinal study was conducted in 63 women who went into labor spontaneously. More than 7000 plasma analytes and peripheral immune cell responses were analyzed using untargeted mass spectrometry, aptamer-based proteomic technology, and single-cell mass cytometry in serial blood samples collected during the last 100 days of pregnancy. The high-dimensional dataset was integrated into a multiomic model that predicted the time to spontaneous labor [R = 0.85, 95% confidence interval (CI) [0.79 to 0.89], P = 1.2 × 10^-40, N = 53, training set; R = 0.81, 95% CI [0.61 to 0.91], P = 3.9 × 10^-7, N = 10, independent test set]. Coordinated alterations in maternal metabolome, proteome, and immunome marked a molecular shift from pregnancy maintenance to prelabor biology 2 to 4 weeks before delivery. A surge in steroid hormone metabolites and interleukin-1 receptor type 4 that preceded labor coincided with a switch from immune activation to regulation of inflammatory responses. Our study lays the groundwork for developing blood-based methods for predicting the day of labor, anchored in mechanisms shared in preterm and term pregnancies.


Labor Onset , Metabolome , Proteome , Biomarkers , Female , Humans , Labor Onset/immunology , Labor Onset/metabolism , Longitudinal Studies , Pregnancy
18.
JAMA Netw Open ; 3(12): e2029655, 2020 12 01.
Article En | MEDLINE | ID: mdl-33337494

Importance: Worldwide, preterm birth (PTB) is the single largest cause of deaths in the perinatal and neonatal period and is associated with increased morbidity in young children. The cause of PTB is multifactorial, and the development of generalizable biological models may enable early detection and guide therapeutic studies. Objective: To investigate the ability of transcriptomics and proteomics profiling of plasma and metabolomics analysis of urine to identify early biological measurements associated with PTB. Design, Setting, and Participants: This diagnostic/prognostic study analyzed plasma and urine samples collected from May 2014 to June 2017 from pregnant women in 5 biorepository cohorts in low- and middle-income countries (LMICs; ie, Matlab, Bangladesh; Lusaka, Zambia; Sylhet, Bangladesh; Karachi, Pakistan; and Pemba, Tanzania). These cohorts were established to study maternal and fetal outcomes and were supported by the Alliance for Maternal and Newborn Health Improvement and the Global Alliance to Prevent Prematurity and Stillbirth biorepositories. Data were analyzed from December 2018 to July 2019. Exposures: Blood and urine specimens that were collected early during pregnancy (median sampling time of 13.6 weeks of gestation, according to ultrasonography) were processed, stored, and shipped to the laboratories under uniform protocols. Plasma samples were assayed for targeted measurement of proteins and untargeted cell-free ribonucleic acid profiling; urine samples were assayed for metabolites. Main Outcomes and Measures: The PTB phenotype was defined as the delivery of a live infant before completing 37 weeks of gestation. Results: Of the 81 pregnant women included in this study, 39 had PTBs (48.1%) and 42 had term pregnancies (51.9%) (mean [SD] age of 24.8 [5.3] years). Univariate analysis demonstrated functional biological differences across the 5 cohorts. 
A cohort-adjusted machine learning algorithm was applied to each biological data set, and then a higher-level machine learning modeling combined the results into a final integrative model. The integrated model was more accurate, with an area under the receiver operating characteristic curve (AUROC) of 0.83 (95% CI, 0.72-0.91) compared with the models derived for each independent biological modality (transcriptomics AUROC, 0.73 [95% CI, 0.61-0.83]; metabolomics AUROC, 0.59 [95% CI, 0.47-0.72]; and proteomics AUROC, 0.75 [95% CI, 0.64-0.85]). Primary features associated with PTB included an inflammatory module as well as a metabolomic module measured in urine associated with the glutamine and glutamate metabolism and valine, leucine, and isoleucine biosynthesis pathways. Conclusions and Relevance: This study found that, in LMICs and high PTB settings, major biological adaptations during term pregnancy follow a generalizable model and the predictive accuracy for PTB was augmented by combining various omics data sets, suggesting that PTB is a condition that manifests within multiple biological systems. These data sets, with machine learning partnerships, may be a key step in developing valuable predictive tests and intervention candidates for preventing PTB.


Gene Expression Profiling/methods , Metabolomics/methods , Perinatal Care , Pregnancy , Premature Birth , Quality Improvement/organization & administration , Adult , Causality , Clinical Decision Rules , Developing Countries , Early Diagnosis , Female , Gestational Age , Humans , Infant, Newborn , Machine Learning , Perinatal Care/methods , Perinatal Care/standards , Perinatal Mortality , Pregnancy/blood , Pregnancy/urine , Pregnancy Outcome/epidemiology , Premature Birth/diagnosis , Premature Birth/epidemiology , Premature Birth/prevention & control
19.
Nat Mach Intell ; 2(10): 619-628, 2020 Oct.
Article En | MEDLINE | ID: mdl-33294774

The dense network of interconnected cellular signalling responses that are quantifiable in peripheral immune cells provides a wealth of actionable immunological insights. Although high-throughput single-cell profiling techniques, including polychromatic flow and mass cytometry, have matured to a point that enables detailed immune profiling of patients in numerous clinical settings, the limited cohort size and high dimensionality of data increase the possibility of false-positive discoveries and model overfitting. We introduce a generalizable machine learning platform, the immunological Elastic-Net (iEN), which incorporates immunological knowledge directly into the predictive models. Importantly, the algorithm maintains the exploratory nature of the high-dimensional dataset, allowing for the inclusion of immune features with strong predictive capabilities even if not consistent with prior knowledge. In three independent studies our method demonstrates improved predictions for clinically relevant outcomes from mass cytometry data generated from whole blood, as well as a large simulated dataset. The iEN is available under an open-source licence.

20.
Sci Adv ; 6(48)2020 11.
Article En | MEDLINE | ID: mdl-33239300

Peripheral blood mononuclear cells (PBMCs) may provide insight into the pathogenesis of Alzheimer's disease (AD) or Parkinson's disease (PD). We investigated PBMC samples from 132 well-characterized research participants using seven canonical immune stimulants, mass cytometric identification of 35 PBMC subsets, and single-cell quantification of 15 intracellular signaling markers, followed by machine learning model development to increase predictive power. From these, three main intracellular signaling pathways were identified specifically in PBMC subsets from people with AD versus controls: reduced activation of PLCγ2 across many cell types and stimulations and selectively variable activation of STAT1 and STAT5, depending on stimulant and cell type. Our findings functionally buttress the now multiply-validated observation that a rare coding variant in PLCG2 is associated with a decreased risk of AD. Together, these data suggest enhanced PLCγ2 activity as a potential new therapeutic target for AD with a readily accessible pharmacodynamic biomarker.


Alzheimer Disease , Parkinson Disease , Alzheimer Disease/drug therapy , Biomarkers , Humans , Leukocytes, Mononuclear , Phospholipase C gamma
...