ABSTRACT
We introduce a multilevel functional Beta model to quantify the blood glucose levels measured by continuous glucose monitors for multiple days in study participants with type 2 diabetes mellitus. The model estimates the subject-specific marginal quantiles, quantifies the within- and between-subject variability, and produces interpretable parameters of blood glucose dynamics as a function of time from the actigraphy-estimated sleep onset. Results are validated via simulations and by studying the association between the estimated model parameters and hemoglobin A1c, the gold standard for assessing glucose control in diabetes.
Subject(s)
Type 1 Diabetes Mellitus , Type 2 Diabetes Mellitus , Blood Glucose , Blood Glucose Self-Monitoring/methods , Glycated Hemoglobin/analysis , Humans , Sleep
ABSTRACT
The prevalence of data collected on the same set of samples from multiple sources (i.e., multi-view data) has prompted significant development of data integration methods based on low-rank matrix factorizations. These methods decompose signal matrices from each view into the sum of shared and individual structures, which are further used for dimension reduction, exploratory analyses, and quantifying associations across views. However, existing methods have limitations in modeling partially-shared structures due to either overly restrictive models or restrictive identifiability conditions. To address these challenges, we propose a new formulation for signal structures that include partially-shared signals, based on grouping the views into so-called hierarchical levels, with identifiability guarantees under suitable conditions. The proposed hierarchy leads us to introduce a new penalty, the hierarchical nuclear norm (HNN), for signal estimation. In contrast to existing methods, HNN penalization avoids scores and loadings factorization of the signals and leads to a convex optimization problem, which we solve using a dual forward-backward algorithm. We propose a simple refitting procedure to adjust the penalization bias and develop an adapted version of bi-cross-validation for selecting tuning parameters. Extensive simulation studies and analysis of the genotype-tissue expression data demonstrate the advantages of our method over existing alternatives.
Subject(s)
Algorithms , Computer Simulation
ABSTRACT
Continuous glucose monitors (CGMs) are increasingly used to measure blood glucose levels and provide information about the treatment and management of diabetes. Our motivating study contains CGM data during sleep for 174 study participants with type II diabetes mellitus measured at a 5-min frequency for an average of 10 nights. We aim to quantify the effects of diabetes medications and sleep apnea severity on glucose levels. Statistically, this is an inference question about the association between scalar covariates and functional responses observed at multiple visits (sleep periods). However, many characteristics of the data make analyses difficult, including (1) nonstationary within-period patterns; (2) substantial between-period heterogeneity, non-Gaussianity, and outliers; and (3) large dimensionality due to the number of study participants, sleep periods, and time points. For our analyses, we evaluate and compare two methods: fast univariate inference (FUI) and functional additive mixed models (FAMMs). We extend FUI and introduce a new approach for testing the hypotheses of no effect and time invariance of the covariates. We also highlight areas for further methodological development for FAMM. Our study reveals that (1) biguanide medication and sleep apnea severity significantly affect glucose trajectories during sleep and (2) the estimated effects are time invariant.
Subject(s)
Type 2 Diabetes Mellitus , Sleep Apnea Syndromes , Humans , Type 2 Diabetes Mellitus/drug therapy , Sleep , Blood Glucose/analysis , Glucose/therapeutic use
ABSTRACT
Multi-view data, that is, matched sets of measurements on the same subjects, have become increasingly common with advances in multi-omics technology. Often, it is of interest to find associations between the views that are related to the intrinsic class memberships. Existing association methods cannot directly incorporate class information, while existing classification methods do not take into account between-view associations. In this work, we propose a framework for Joint Association and Classification Analysis of multi-view data (JACA). Our goal is not merely to improve the misclassification rates, but to provide a latent representation of high-dimensional data that is both relevant for the subtype discrimination and coherent across the views. We motivate the methodology by establishing a connection between canonical correlation analysis and discriminant analysis. We also establish the estimation consistency of JACA in high-dimensional settings. A distinct advantage of JACA is that it can be applied to multi-view data with block-missing structure, that is, to cases where a subset of views or class labels is missing for some subjects. The application of JACA to quantify the associations between RNAseq and miRNA views with respect to consensus molecular subtypes in colorectal cancer data from The Cancer Genome Atlas project leads to improved misclassification rates and stronger associations compared to existing methods.
Subject(s)
MicroRNAs , Humans
ABSTRACT
The increased availability of multi-view data (data on the same samples from multiple sources) has led to strong interest in models based on low-rank matrix factorizations. These models represent each data view via shared and individual components, and have been successfully applied for exploratory dimension reduction, association analysis between the views, and consensus clustering. Despite these advances, there remain challenges in modeling partially-shared components and identifying the number of components of each type (shared/partially-shared/individual). We formulate a novel linked component model that directly incorporates partially-shared structures. We call this model SLIDE for Structural Learning and Integrative DEcomposition of multi-view data. The proposed model-fitting and selection techniques allow for joint identification of the number of components of each type, in contrast to existing sequential approaches. In our empirical studies, SLIDE demonstrates excellent performance in both signal estimation and component selection. We further illustrate the methodology on the breast cancer data from The Cancer Genome Atlas repository.
Subject(s)
Learning , Statistical Models , Principal Component Analysis/methods , Algorithms , Breast Neoplasms , Cluster Analysis , Humans
ABSTRACT
Introduction: This study assesses the person-specific impact of extreme heat on low-income households using wearable sensors. The focus is on the intensive and longitudinal assessment of physical activity and sleep with the rising person-specific ambient temperature. Methods: This study recruited 30 participants in a low-income and predominantly Black community in Houston, Texas in August and September of 2022. Each participant wore on his/her wrist an accelerometer that recorded person-specific ambient temperature, sedentary behavior, physical activity intensity (low and moderate to vigorous), and sleep efficiency 24 h over 14 days. Mixed effects models were used to analyze associations among physical activity, sleep, and person-specific ambient temperature. Results: The main findings include increased sedentary time, sleep impairment with the rise of person-level ambient temperature, and the mitigating role of AC. Conclusions: Extreme heat negatively affects physical activity and sleep. The negative consequences are especially critical for those with limited use of AC in lower-income neighborhoods of color. Staying home with a high indoor temperature during hot days can lead to various adverse health outcomes including accelerated cognitive decline, higher cancer risk, and social isolation.
ABSTRACT
Background: Continuous glucose monitors (CGMs) are increasingly used to provide detailed quantification of glycemic control and glucose variability. An open-source R package iglu has been developed to assist with automatic CGM metrics computation and data visualization, providing a comprehensive list of implemented CGM metrics. Motivated by the recent international consensus statement on CGM metrics and recommendations from recent reviews of available CGM software, we present an updated version of iglu with improved accessibility and expanded functionality. Methods: The functionality was expanded to include automated computation of hypo- and hyperglycemia episodes with corresponding visualizations, composite metrics of glycemic control (glycemia risk index and personal glycemic state), and glycemic metrics associated with postprandial excursions. The algorithm for mean amplitude of glycemic excursions has been updated for improved accuracy, and the corresponding visualization has been added. Automated hierarchical clustering capabilities have been added to facilitate statistical analysis. Accessibility was improved by providing support for the automatic processing of common data formats, expanding the graphical user interface, and providing mirrored functionality in Python. Results: The updated version of iglu has been released to the Comprehensive R Archive Network (CRAN) as version 4. The corresponding Python wrapper has been released to the Python Package Index (PyPI) as version 1. The new functionality has been demonstrated using CGM data from 19 subjects with prediabetes and type 2 diabetes. Conclusions: An updated version of iglu provides comprehensive and accessible software for analyses of CGM data that meets the needs of researchers with varying levels of programming experience. It is freely available on CRAN and on GitHub at https://github.com/irinagain/iglu.
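To make the notion of a CGM metric concrete, the following minimal Python sketch computes a few basic summaries (mean, standard deviation, coefficient of variation, and time in, below, and above range) from a list of readings. It is an illustration only and is not the iglu API: the function name `cgm_summary`, the sample readings, and the default 70 to 180 mg/dL target range are assumptions chosen for the example.

```python
from statistics import mean, stdev

def cgm_summary(glucose_mg_dl, low=70.0, high=180.0):
    """Basic summary metrics for one subject's CGM readings (mg/dL).

    Hypothetical helper for illustration; `low` and `high` are an assumed
    target range, not values taken from the abstract above.
    """
    readings = [g for g in glucose_mg_dl if g is not None]  # drop missing readings
    n = len(readings)
    avg = mean(readings)
    sd = stdev(readings)  # sample standard deviation
    return {
        "mean": avg,
        "sd": sd,
        "cv_percent": 100.0 * sd / avg,
        "time_in_range_percent": 100.0 * sum(low <= g <= high for g in readings) / n,
        "time_below_percent": 100.0 * sum(g < low for g in readings) / n,
        "time_above_percent": 100.0 * sum(g > high for g in readings) / n,
    }

# Made-up readings, including one missing value.
readings = [65, 90, 110, 150, 185, 200, None, 120, 100, 95, 140]
summary = cgm_summary(readings)
print(summary)
```

Because readings are assumed to be equally spaced in time, the share of readings in a range stands in for the share of time spent in it; irregularly sampled data would need interpolation first.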
ABSTRACT
Despite the worldwide prevalence of hypertension, there is a lack of open-source software for analyzing blood pressure data. The R package bp fills this gap by providing functionality for blood pressure data processing, visualization, and feature extraction. In addition to the comprehensive functionality, the package includes six sample data sets covering continuous arterial pressure data (AP), home blood pressure monitoring data (HBPM) and ambulatory blood pressure monitoring data (ABPM), making it easier for researchers to get started. The R package bp is publicly available on CRAN and at https://github.com/johnschwenck/bp.
Subject(s)
Ambulatory Blood Pressure Monitoring , Hypertension , Blood Pressure , Blood Pressure Determination , Humans , Hypertension/diagnosis , Hypertension/epidemiology , Prevalence
ABSTRACT
OBJECTIVES: Continuous glucose monitoring (CGM) provides temporal data on glycemic variability, a predictor of outcomes related to type 2 diabetes mellitus. The current study sought to determine whether CGM-derived metrics in patients with type 2 diabetes are different in moderate-to-severe versus mild obstructive sleep apnea (OSA). METHODS: In adults with type 2 diabetes, home testing was used to assess the presence of OSA. CGM data were collected for at least 7 days in those with an oxygen desaturation index (ODI) ≥ 5 events/hr. The study sample was divided into mild (ODI: 5.0-14.9 events/hr) and moderate-to-severe OSA (ODI ≥ 15 events/hr). Actigraphy was used to distinguish the wake and sleep periods. CGM-derived metrics were compared between the two groups using multivariable regression models. RESULTS: Compared to mild OSA, patients with moderate-to-severe OSA had higher mean glucose levels during sleep (adjusted difference 8.4 mg/dL; p-value: 0.03) and wakefulness (adjusted difference 7.1 mg/dL; p-value: 0.06). Moderate-to-severe OSA patients also had lower odds of having their glucose values within the acceptable range during wakefulness than those with mild OSA (adjusted odds ratio of 0.63; p-value: 0.02). The mean amplitude of glycemic excursion and standard deviation of the rate of change in glucose values (SD-ROC) were higher in moderate-to-severe than mild OSA, but only during wakefulness. Sex modified the association between OSA severity and SD-ROC, but not the other CGM-derived metrics. CONCLUSIONS: In patients with type 2 diabetes, moderate-to-severe OSA is associated with greater abnormalities in CGM-derived metrics than mild OSA with notable differences between sleep and wakefulness.
Subject(s)
Type 2 Diabetes Mellitus , Obstructive Sleep Apnea , Adult , Blood Glucose , Blood Glucose Self-Monitoring , Type 2 Diabetes Mellitus/complications , Glucose , Humans , Oxygen , Obstructive Sleep Apnea/complications
ABSTRACT
Latent Gaussian copula models provide a powerful means to perform multi-view data integration since these models can seamlessly express dependencies between mixed variable types (binary, continuous, zero-inflated) via latent Gaussian correlations. The estimation of these latent correlations, however, comes at considerable computational cost, having prevented the routine use of these models on high-dimensional data. Here, we propose a new computational approach for estimating latent correlations via a hybrid multilinear interpolation and optimization scheme. Our approach speeds up the current state of the art computation by several orders of magnitude, thus allowing fast computation of latent Gaussian copula models even when the number of variables p is large. We provide theoretical guarantees for the approximation error of our numerical scheme and support its excellent performance on simulated and real-world data. We illustrate the practical advantages of our method on high-dimensional sparse quantitative and relative abundance microbiome data as well as multi-view data from The Cancer Genome Atlas Project. Our method is implemented in the R package mixedCCA, available at https://github.com/irinagain/mixedCCA.
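The idea of replacing a repeated exact computation with a lookup on a precomputed grid can be illustrated in one dimension. The Python sketch below is a simplified stand-in, not the mixedCCA implementation: it uses the continuous-continuous case, where the map from Kendall's tau to the latent correlation has the closed form sin(pi * tau / 2), so that the interpolation error can be checked directly. The grid size, function names, and the one-dimensional setting are assumptions made for the example; the scheme described in the abstract is multilinear and covers mixed variable types.

```python
import math
from bisect import bisect_right

def exact_latent_corr(tau):
    """Closed-form latent correlation for two continuous variables under a Gaussian copula."""
    return math.sin(math.pi * tau / 2.0)

# Precompute the function once on an equally spaced grid over [-1, 1].
# 51 grid points is an arbitrary choice for this sketch.
GRID = [-1.0 + 2.0 * i / 50 for i in range(51)]
VALUES = [exact_latent_corr(t) for t in GRID]

def interp_latent_corr(tau):
    """Piecewise-linear lookup of the precomputed values."""
    tau = max(-1.0, min(1.0, tau))                   # clamp to the grid range
    j = min(bisect_right(GRID, tau), len(GRID) - 1)  # index of the right neighbor
    j = max(j, 1)
    t0, t1 = GRID[j - 1], GRID[j]
    w = (tau - t0) / (t1 - t0)                       # interpolation weight in [0, 1]
    return (1.0 - w) * VALUES[j - 1] + w * VALUES[j]

# Worst-case gap between the lookup and the exact value on a fine test grid.
max_err = max(
    abs(interp_latent_corr(t / 1000.0) - exact_latent_corr(t / 1000.0))
    for t in range(-1000, 1001)
)
print(f"maximum interpolation error: {max_err:.2e}")
```

The lookup costs a binary search and a few arithmetic operations per pair of variables, which is the kind of saving that matters when the exact value would otherwise require a numerical optimization for each pair.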
ABSTRACT
Continuous Glucose Monitoring (CGM) data play an increasing role in clinical practice as they provide detailed quantification of blood glucose levels during the entire 24-hour period. The R package iglu implements a wide range of CGM-derived metrics for measuring glucose control and glucose variability. The package also allows one to visualize CGM data using time-series and lasagna plots. A distinct advantage of iglu is that it comes with a point-and-click graphical user interface (GUI) which makes the package widely accessible to users regardless of their programming experience. Thus, the open-source and easy to use iglu package will help advance CGM research and CGM data analyses. R package iglu is publicly available on CRAN and at https://github.com/irinagain/iglu.
Subject(s)
Blood Glucose Self-Monitoring/instrumentation , Blood Glucose/analysis , Diabetes Mellitus/blood , Software , Blood Glucose Self-Monitoring/methods , Data Analysis , Diabetes Mellitus/diagnosis , Disease Management , Humans
ABSTRACT
Methadone, a widely prescribed medication for chronic pain and opioid addiction, is associated with respiratory depression and increased predisposition for torsades de pointes, a potentially fatal arrhythmia. Most methadone-related deaths occur during sleep. The objective of this study was to determine whether methadone's arrhythmogenic effects increase during sleep, with a focus on cardiac repolarization instability using QT variability index (QTVI), a measure shown to predict arrhythmias and mortality. Sleep study data of 24 patients on chronic methadone therapy referred to a tertiary clinic for overnight polysomnography were compared with two matched groups not on methadone: 24 patients referred for overnight polysomnography to the same clinic (clinic group), and 24 volunteers who had overnight polysomnography at home (community group). Despite similar values for heart rate, heart rate variability, corrected QT interval, QTVI, and oxygen saturation (SpO2) when awake, patients on methadone had larger QTVI (P = 0.015 vs. clinic, P < 0.001 vs. community) and lower SpO2 (P = 0.008 vs. clinic, P = 0.013 vs. community) during sleep, and the increase in their QTVI during sleep vs. wakefulness correlated with the decrease in SpO2 (r = -0.54, P = 0.013). QTVI positively correlated with methadone dose during sleep (r = 0.51, P = 0.012) and wakefulness (r = 0.73, P < 0.001). High-density ectopy (> 1,000 premature beats per median sleep period), a precursor for torsades de pointes, was uncommon but more frequent in patients on methadone (P = 0.039). This study demonstrates that chronic methadone use is associated with increased cardiac repolarization instability. Methadone's pro-arrhythmic impact may be mediated by sleep-related hypoxemia, which could explain the increased nocturnal mortality associated with this opioid.
Subject(s)
Opioid Analgesics/adverse effects , Cardiac Arrhythmias/chemically induced , Heart Conduction System/physiopathology , Methadone/adverse effects , Sleep , Adult , Aged , Aged 80 and over , Opioid Analgesics/therapeutic use , Electrocardiography , Female , Humans , Male , Methadone/therapeutic use , Middle Aged , Polysomnography
ABSTRACT
Canonical correlation analysis investigates linear relationships between two sets of variables, but often works poorly on modern datasets due to high dimensionality and mixed data types (continuous/binary/zero-inflated). We propose a new approach for sparse canonical correlation analysis of mixed data types that does not require explicit parametric assumptions. Our main contribution is the use of a truncated latent Gaussian copula to model data with excess zeros, which allows us to derive a rank-based estimator of the latent correlation matrix without estimating the marginal transformation functions. The resulting semiparametric sparse canonical correlation analysis method works well in high-dimensional settings, as demonstrated via numerical studies and an application to the analysis of association between gene expression and microRNA data of breast cancer patients.
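Why a rank-based estimator can skip the marginal transformations is easiest to see in the simplest case of two continuous variables. The Python sketch below is an illustration under that assumption, not the estimator for truncated or binary variables derived in the paper: it computes Kendall's tau, maps it to a latent correlation with the standard Gaussian copula identity r = sin(pi * tau / 2), and shows that a monotone transformation of one variable leaves the estimate unchanged. The function names and data are made up for the example.

```python
import math
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs."""
    n = len(x)
    balance = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            balance += 1   # concordant pair
        elif s < 0:
            balance -= 1   # discordant pair
    return balance / (n * (n - 1) / 2)

def latent_corr_continuous(x, y):
    """Latent correlation for two continuous variables under a Gaussian copula."""
    return math.sin(math.pi * kendall_tau_a(x, y) / 2.0)

# Made-up data without ties.
x = [0.1, 0.5, 0.9, 1.3, 2.0, 2.4]
y = [1.0, 0.8, 1.9, 2.5, 2.2, 3.1]

# A strictly increasing transformation changes the values but not the ranks.
x_transformed = [math.exp(v) for v in x]

r_original = latent_corr_continuous(x, y)
r_transformed = latent_corr_continuous(x_transformed, y)
print(r_original, r_transformed)
```

Since only the ordering of the observations enters the computation, the unknown monotone marginal transformations never need to be estimated.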
ABSTRACT
BACKGROUND: Melanoma causes the vast majority of deaths attributable to skin cancer, largely due to its propensity for metastasis. To date, few studies have examined molecular changes between primary cutaneous melanoma and adjacent putatively normal skin. To broaden temporal inferences related to initiation of disease, we performed a metabolomics investigation of primary melanoma and matched extratumoral microenvironment (EM) tissues; and, to make inferences about progressive disease, we also compared unmatched metastatic melanoma tissues to EM tissues. METHODS: Ultra-high performance liquid chromatography-mass spectrometry-based metabolic profiling was performed on frozen human tissues. RESULTS: We observed 824 metabolites as differentially abundant among 33 matched tissue samples, and 1,118 metabolites as differentially abundant between metastatic melanoma (n = 46) and EM (n = 34) after false discovery rate (FDR) adjustment (p<0.01). No significant differences in metabolite abundances were noted comparing primary and metastatic melanoma tissues. CONCLUSIONS: Overall, pathway-based results significantly distinguished melanoma tissues from EM in the metabolism of: ascorbate and aldarate, propanoate, tryptophan, histidine, and pyrimidine. Within pathways, the majority of individual metabolite abundances observed in comparisons of primary melanoma vs. EM and metastatic melanoma vs. EM were directionally consistent. This observed concordance suggests most identified compounds are implicated in the initiation or maintenance of melanoma.
Subject(s)
Melanoma , Metabolome , Skin Neoplasms , Tumor Microenvironment , Adult , Aged , Aged 80 and over , High Pressure Liquid Chromatography/methods , Female , Humans , Male , Mass Spectrometry/methods , Melanoma/metabolism , Melanoma/secondary , Metabolomics/methods , Middle Aged , Skin Neoplasms/metabolism , Skin Neoplasms/secondary , Young Adult , Cutaneous Malignant Melanoma
ABSTRACT
We consider the problem of high-dimensional classification between two groups with unequal covariance matrices. Rather than estimating the full quadratic discriminant rule, we propose to perform simultaneous variable selection and linear dimension reduction on the original data, with the subsequent application of quadratic discriminant analysis on the reduced space. In contrast to quadratic discriminant analysis, the proposed framework does not require the estimation of precision matrices; it scales linearly with the number of measurements, making it especially attractive for use on high-dimensional datasets. We support the methodology with theoretical guarantees on variable selection consistency, and empirical comparisons with competing approaches. We apply the method to gene expression data of breast cancer patients, and confirm the crucial importance of the ESR1 gene in differentiating estrogen receptor status.
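The two-stage idea, reducing the data linearly and then applying a quadratic rule on the reduced space, can be sketched in Python as follows. This is a toy illustration under strong assumptions and not the estimation procedure of the paper: the sparse projection direction is taken as given rather than estimated, the reduced space is one-dimensional, and all names and data are hypothetical.

```python
import math
from statistics import mean, variance

def project(rows, direction):
    """Project each observation onto one direction (linear dimension reduction)."""
    return [sum(a * b for a, b in zip(row, direction)) for row in rows]

def fit_qda_1d(scores_by_class):
    """Estimate mean, variance, and prior of the projected scores for each class."""
    total = sum(len(scores) for scores in scores_by_class.values())
    return {
        label: (mean(scores), variance(scores), len(scores) / total)
        for label, scores in scores_by_class.items()
    }

def predict_qda_1d(model, z):
    """Assign z to the class with the largest quadratic discriminant score."""
    def score(label):
        mu, var, prior = model[label]
        return -0.5 * math.log(var) - (z - mu) ** 2 / (2.0 * var) + math.log(prior)
    return max(model, key=score)

# Assumed sparse direction: the zero entry removes the second variable entirely.
direction = [1.0, 0.0, -1.0]

# Made-up training data for two groups with clearly different spread.
group_a = [[1.0, 5.0, 0.8], [1.2, 3.0, 1.1], [0.9, 4.0, 1.0], [1.1, 6.0, 0.9]]
group_b = [[3.0, 5.0, 0.5], [4.5, 4.0, 1.0], [2.0, 3.0, 0.2], [5.0, 6.0, 0.4]]

model = fit_qda_1d({"A": project(group_a, direction), "B": project(group_b, direction)})

label_near_a = predict_qda_1d(model, project([[1.0, 4.5, 1.0]], direction)[0])
label_near_b = predict_qda_1d(model, project([[4.0, 4.0, 0.5]], direction)[0])
print(label_near_a, label_near_b)
```

After projection only scalar means and variances are needed per class, so no precision matrix of the original variables is ever formed.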
ABSTRACT
High-throughput microbial sequencing techniques, such as targeted amplicon-based and metagenomic profiling, provide low-cost genomic survey data of microbial communities in their natural environment, ranging from marine ecosystems to host-associated habitats. While standard microbiome profiling data can provide sparse relative abundances of operational taxonomic units or genes, recent advances in experimental protocols give a more quantitative picture of microbial communities by pairing sequencing-based techniques with orthogonal measurements of microbial cell counts from the same sample. These tandem measurements provide absolute microbial count data albeit with a large excess of zeros due to limited sequencing depth. In this contribution we consider the fundamental statistical problem of estimating correlations and partial correlations from such quantitative microbiome data. To this end, we propose a semi-parametric rank-based approach to correlation estimation that can naturally deal with the excess zeros in the data. Combining this estimator with sparse graphical modeling techniques leads to the Semi-Parametric Rank-based approach for INference in Graphical model (SPRING). SPRING enables inference of statistical microbial association networks from quantitative microbiome data which can serve as high-level statistical summary of the underlying microbial ecosystem and can provide testable hypotheses for functional species-species interactions. Due to the absence of verified microbial associations we also introduce a novel quantitative microbiome data generation mechanism which mimics empirical marginal distributions of measured count data while simultaneously allowing user-specified dependencies among the variables. SPRING shows superior network recovery performance on a wide range of realistic benchmark problems with varying network topologies and is robust to misspecifications of the total cell count estimate. 
To highlight SPRING's broad applicability we infer taxon-taxon associations from the American Gut Project data and genus-genus associations from a recent quantitative gut microbiome dataset. We believe that, as quantitative microbiome profiling data become increasingly available, the semi-parametric estimators for correlation and partial correlation estimation introduced here provide an important tool for reliable statistical analysis of quantitative microbiome data.
ABSTRACT
BACKGROUND: Urine protein loss is common in dogs with chronic kidney disease (CKD). Currently available noninvasive means of evaluating CKD in dogs cannot accurately predict the severity of glomerular and tubulointerstitial damage. Electrophoretic analysis of urine proteins can indicate the compromised renal compartment (glomerular vs tubular), but extensive evaluation of protein banding pattern associations with histologic damage severity has not been performed in dogs. OBJECTIVES: We aimed to evaluate electrophoretic banding patterns as indicators of the presence and severity of glomerular and tubulointerstitial damage in dogs with naturally occurring, predominantly proteinuric CKD. METHODS: We performed a retrospective study using urine and renal tissue from 207 dogs with CKD. Urine protein banding patterns were correlated with histologic severity of renal damage. Sensitivity and specificity of banding patterns for the detection of glomerular and tubulointerstitial damage were determined. RESULTS: Banding patterns were 97% sensitive and 100% specific for the detection of glomerular damage and 90% sensitive and 100% specific for the detection of tubulointerstitial damage. Correlations between composite banding patterns and the severity of renal damage were strong, while glomerular banding patterns correlated moderately with glomerular damage severity, and tubular gel scores correlated weakly to moderately with the severity of tubulointerstitial damage. CONCLUSIONS AND CLINICAL IMPORTANCE: Urine protein banding patterns are useful for the detection of glomerular and tubulointerstitial damage in dogs with proteinuric CKD.
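For readers less familiar with the two accuracy measures reported above, the following Python sketch shows how sensitivity and specificity are computed from paired test results and reference (histologic) findings. The function name and the small data set are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(test_positive, truly_damaged):
    """Return (sensitivity, specificity) from paired boolean results.

    test_positive[i] is the test call for case i and truly_damaged[i] is the
    reference standard for the same case.
    """
    pairs = list(zip(test_positive, truly_damaged))
    tp = sum(1 for t, d in pairs if t and d)          # true positives
    fn = sum(1 for t, d in pairs if not t and d)      # false negatives
    tn = sum(1 for t, d in pairs if not t and not d)  # true negatives
    fp = sum(1 for t, d in pairs if t and not d)      # false positives
    return tp / (tp + fn), tn / (tn + fp)

# Made-up example: 10 cases with damage (9 detected), 5 without (none flagged).
truly_damaged = [True] * 10 + [False] * 5
test_positive = [True] * 9 + [False] + [False] * 5

sens, spec = sensitivity_specificity(test_positive, truly_damaged)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Sensitivity is the share of truly damaged cases that the test flags, and specificity is the share of undamaged cases that it correctly leaves unflagged.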