ABSTRACT
In this study, we employ a comprehensive approach to model the concurrent effects of the COVID-19 epidemic and heatwaves on all-cause excess mortality. Our investigation uncovers distinct peaks in excess mortality, notably among individuals aged 80 years and older, revealing a strong positive correlation with excess temperatures (ET) during the summer of 2022 in Italy. Furthermore, we identify a notable role played by COVID-19 hospitalizations, exhibiting regional disparities, particularly during the winter months. Leveraging functional data regression, we offer robust and coherent insights into the excess mortality trends observed in Italy throughout 2022.
ABSTRACT
Functional data analysis (FDA) is a statistical framework that allows for the analysis of curves, images, or functions on higher dimensional domains. The goals of FDA, such as descriptive analyses, classification, and regression, are generally the same as for statistical analyses of scalar-valued or multivariate data, but FDA brings additional challenges due to the high- and infinite dimensionality of observations and parameters, respectively. This paper provides an introduction to FDA, including a description of the most common statistical analysis techniques, their respective software implementations, and some recent developments in the field. The paper covers fundamental concepts such as descriptives and outliers, smoothing, amplitude and phase variation, and functional principal component analysis. It also discusses functional regression, statistical inference with functional data, functional classification and clustering, and machine learning approaches for functional data analysis. The methods discussed in this paper are widely applicable in fields such as medicine, biophysics, neuroscience, and chemistry and are increasingly relevant due to the widespread use of technologies that allow for the collection of functional data. Sparse functional data methods are also relevant for longitudinal data analysis. All presented methods are demonstrated using available software in R by analyzing a dataset on human motion and motor control. To facilitate the understanding of the methods, their implementation, and hands-on application, the code for these practical examples is made available through a code and data supplement and on GitHub.
Subjects
Biometry, Biometry/methods, Data Analysis, Machine Learning, Humans, Software, Principal Component Analysis
ABSTRACT
OBJECTIVES: The regional population mortality patterns in China exhibit substantial geographical distribution characteristics. This paper aims to explore the impact and mechanisms of geographical environmental factors on regional population mortality patterns. METHODS: This study first utilized the data from China's Seventh Population Census to obtain mortality patterns for the 31 provincial-level administrative regions. Subsequently, a functional regression method was employed to explore the geographical environmental driving factors of regional mortality patterns. RESULTS: The study provides a detailed explanation of the mechanisms and marginal contributions of key geographical environmental factors at different age groups. CONCLUSIONS: (1) The impact of geographical environmental factors on mortality patterns shows distinct phased characteristics. Mortality patterns before the age of 40 years are hardly influenced by geographical environmental factors, with a noticeable impact beginning at ages 40-69 years and reaching the maximum influence after the age of 70 years. (2) In mortality patterns at ages 40-69 years, average altitude has the most substantial impact, followed by extreme low-temperature days and PM2.5 concentration. In mortality patterns at ages 70-94 years, high-temperature days have the greatest influence, followed by SO2 concentration. (3) In comparisons based on gender, socioeconomic factors, and geographical environmental factors, gender and urban-rural differences have the most significant impact on regional population mortality patterns, followed by other socioeconomic factors, with geographical environmental factors having a relatively smaller impact.
ABSTRACT
Sensor devices, such as accelerometers, are widely used for measuring physical activity (PA). These devices provide outputs at fine granularity (e.g., 10-100 Hz or minute-level), which while providing rich data on activity patterns, also pose computational challenges with multilevel densely sampled data, resulting in PA records that are measured continuously across multiple days and visits. On the other hand, a scalar health outcome (e.g., BMI) is usually observed only at the individual or visit level. This leads to a discrepancy in numbers of nested levels between the predictors (PA) and outcomes, raising analytic challenges. To address this issue, we proposed a multilevel longitudinal functional principal component analysis (mLFPCA) model to directly model multilevel functional PA inputs in a longitudinal study, and then implemented a longitudinal functional principal component regression (FPCR) to explore the association between PA and obesity-related health outcomes. Additionally, we conducted a comprehensive simulation study to examine the impact of imbalanced multilevel data on both mLFPCA and FPCR performance and offer guidelines for selecting optimal methods.
Subjects
Computer Simulation, Obesity, Principal Component Analysis, Humans, Longitudinal Studies, Accelerometry, Body Mass Index, Statistical Models, Female, Exercise/physiology, Multilevel Analysis, Male
ABSTRACT
The present work intends to compare two statistical classification methods using images as covariates and under the comparison criterion of the ROC curve. The first implemented procedure is based on exploring a mathematical-statistical model using multidimensional arrangements, frequently known as tensors. It is based on the theoretical framework of the high-dimensional generalized linear model. The second methodology is situated in the field of functional data analysis, particularly in the space of functions that have a finite measure of the total variation. A simulation study is carried out to compare both classification methodologies using the area under the ROC curve (AUC). The model based on functional data had better performance than the tensor model. A real data application using medical images is presented.
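The comparison criterion named in this abstract, the area under the ROC curve, can be computed directly from classifier scores. The following is a minimal sketch, not the authors' code: the function name `auc_from_scores` and the toy labels and scores are hypothetical, and the AUC is obtained from its rank (Mann-Whitney) interpretation as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case.

```python
import numpy as np

def auc_from_scores(labels, scores):
    """AUC as P(score of a positive > score of a negative), ties counted as 1/2."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]    # scores of the positive class
    neg = scores[~labels]   # scores of the negative class
    if pos.size == 0 or neg.size == 0:
        raise ValueError("both classes must be present")
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))

# Hypothetical scores from two competing classifiers on the same six cases.
labels = [0, 0, 1, 1, 0, 1]
scores_a = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
scores_b = [0.3, 0.6, 0.5, 0.7, 0.65, 0.4]

auc_a = auc_from_scores(labels, scores_a)
auc_b = auc_from_scores(labels, scores_b)
print(auc_a, auc_b)
```

Under this criterion the classifier with the larger AUC would be preferred; in a simulation study such as the one described, the computation would be repeated over many simulated data sets.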
ABSTRACT
Increasingly, large, nationally representative health and behavioral surveys conducted under a multistage stratified sampling scheme collect high dimensional data with correlation structured along some domain (eg, wearable sensor data measured continuously and correlated over time, imaging data with spatiotemporal correlation) with the goal of associating these data with health outcomes. Analysis of this sort requires novel methodologic work at the intersection of survey statistics and functional data analysis. Here, we address this crucial gap in the literature by proposing an estimation and inferential framework for generalizable scalar-on-function regression models for data collected under a complex survey design. We propose to: (1) estimate functional regression coefficients using weighted score equations; and (2) perform inference using novel functional balanced repeated replication and survey-weighted bootstrap for multistage survey designs. This is the first frequentist study to discuss the estimation of scalar-on-function regression models in the context of complex survey studies and to assess the validity of various inferential techniques based on re-sampling methods via a comprehensive simulation study. We implement our methods to predict mortality using diurnal activity profiles measured via wearable accelerometers using the National Health and Nutrition Examination Survey 2003-2006 data. The proposed computationally efficient methods are implemented in R software package surveySoFR.
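To make the idea of a survey-weighted scalar-on-function regression concrete, here is a minimal sketch under stated assumptions. It is not the surveySoFR package or its API: the function names, the Legendre polynomial basis, the trapezoid quadrature, and the simulated data are all illustrative choices. For a Gaussian outcome, solving weighted score equations reduces to weighted least squares, which is what the sketch does; the replication-based inference described in the abstract is not shown.

```python
import numpy as np

def trapezoid_weights(t):
    """Quadrature weights so that sum(q * f(t)) approximates the integral of f over t."""
    dt = np.diff(t)
    q = np.zeros_like(t)
    q[:-1] += dt / 2.0
    q[1:] += dt / 2.0
    return q

def fit_weighted_sofr(X, y, w, t, n_basis=4):
    """Fit y_i = alpha + integral X_i(t) beta(t) dt + error by weighted least squares.

    X: (n, T) curves observed on the grid t; w: (n,) sampling weights.
    beta(t) is expanded in n_basis Legendre polynomials (an assumption of this sketch).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    u = 2.0 * (t - t[0]) / (t[-1] - t[0]) - 1.0           # rescale grid to [-1, 1]
    B = np.polynomial.legendre.legvander(u, n_basis - 1)  # (T, n_basis) basis matrix
    Z = (X * trapezoid_weights(t)) @ B                    # Z[i, k] ~ integral X_i(t) B_k(t) dt
    D = np.column_stack([np.ones(len(y)), Z])             # intercept plus basis scores
    sw = np.sqrt(np.asarray(w, dtype=float))
    # Weighted least squares via row scaling by the square root of the weights.
    coef, *_ = np.linalg.lstsq(D * sw[:, None], y * sw, rcond=None)
    return coef[0], B @ coef[1:]                          # alpha_hat, beta_hat on the grid

# Simulated, noise-free example with a coefficient function inside the basis span.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 30)
X = rng.normal(size=(50, 30))
w = rng.uniform(0.5, 2.0, size=50)   # hypothetical sampling weights
beta_true = 2.0 * t
y = 2.0 + (X * trapezoid_weights(t)) @ beta_true

alpha_hat, beta_hat = fit_weighted_sofr(X, y, w, t)
```

With noisy outcomes, the weights change the estimate and a roughness penalty on beta(t) would usually be added; both refinements are omitted here for brevity.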
ABSTRACT
In the brain, functional connections form a network whose topological organization can be described by graph-theoretic network diagnostics. These include characterizations of the community structure, such as modularity and participation coefficient, which have been shown to change over the course of childhood and adolescence. To investigate if such changes in the functional network are associated with changes in cognitive performance during development, network studies often rely on an arbitrary choice of preprocessing parameters, in particular the proportional threshold of network edges. Because the choice of parameter can impact the value of the network diagnostic, and therefore downstream conclusions, we propose to circumvent that choice by conceptualizing the network diagnostic as a function of the parameter. As opposed to a single value, a network diagnostic curve describes the connectome topology at multiple scales, from the sparsest group of the strongest edges to the entire edge set. To relate these curves to executive function and other covariates, we use scalar-on-function regression, which is more flexible than previous functional data-based models used in network neuroscience. We then consider how systematic differences between networks can manifest in misalignment of diagnostic curves, and consequently propose a supervised curve alignment method that incorporates auxiliary information from other variables. Our algorithm performs both functional regression and alignment via an iterative, penalized, and nonlinear likelihood optimization. The illustrated method has the potential to improve the interpretability and generalizability of neuroscience studies where the goal is to study heterogeneity among a mixture of function- and scalar-valued measures.
ABSTRACT
Task-evoked functional magnetic resonance imaging studies, such as the Human Connectome Project (HCP), are a powerful tool for exploring how brain activity is influenced by cognitive tasks like memory retention, decision-making, and language processing. A fast Bayesian function-on-scalar model is proposed for estimating population-level activation maps linked to the working memory task. The model is based on the canonical polyadic (CP) tensor decomposition of coefficient maps obtained for each subject. This decomposition effectively yields a tensor basis capable of extracting both common features and subject-specific features from the coefficient maps. These subject-specific features, in turn, are modeled as a function of covariates of interest using a Bayesian model that accounts for the correlation of the CP-extracted features. The dimensionality reduction achieved with the tensor basis allows for a fast MCMC estimation of population-level activation maps. This model is applied to one hundred unrelated subjects from the HCP dataset, yielding significant insights into brain signatures associated with working memory.
ABSTRACT
Regional population mortality correlates with regional socioeconomic development. This study aimed to identify the key socioeconomic factors influencing mortality patterns in Chinese provinces. Using data from the Seventh Population Census, we analyzed mortality patterns by gender and urban-rural division in 31 provinces. Using a functional regression model, we assessed the influence of fourteen indicators on mortality patterns. Main findings: (1) China shows notable gender and urban-rural mortality variations across age groups. Males generally have higher mortality than females, and rural areas experience elevated mortality rates compared to urban areas. Mortality in individuals younger than 40 years is influenced mainly by urban-rural factors, with gender becoming more noticeable in the 40-84 age group. (2) The substantial marginal impact of socioeconomic factors on mortality patterns generally becomes evident after the age of 45, with less pronounced differences in their impact on early-life mortality patterns. (3) Various factors have age-specific impacts on mortality. Education has a negative effect on mortality in individuals aged 0-29, extending to those aged 30-59 and diminishing in older age groups. Urbanization positively influences the probability of death in individuals aged 45-54 years, while the impact of traffic accidents increases with age. Among elderly people, the effect of socioeconomic variables is smaller, highlighting the intricate and heterogeneous nature of these influences and acknowledging certain limitations.
Subjects
Mortality, Rural Population, Socioeconomic Factors, Humans, China/epidemiology, Male, Female, Middle Aged, Aged, Adult, Rural Population/statistics & numerical data, Mortality/trends, Preschool Child, Aged 80 and over, Adolescent, Young Adult, Child, Infant, Urban Population, Newborn Infant, Economic Factors, Urbanization, Age Factors
ABSTRACT
BACKGROUND: Although type 2 diabetes mellitus (T2DM) is an established risk factor for cognitive impairment, the underlying mechanisms remain poorly explored. One potential mechanism may be through effects of T2DM on cerebral perfusion. The current study hypothesized that T2DM is associated with altered peripheral and central hemodynamic responses to orthostasis, which may in turn be associated with cognitive impairment in T2DM. METHODS: A novel use of function-on-scalar regression, which allows the entire hemodynamic response curve to be modeled, was employed to assess the association between T2DM and hemodynamic responses to orthostasis. Logistic regression was used to assess the relationship between tissue saturation index (TSI), T2DM, and cognitive impairment. All analyses used cross-sectional data from Wave 3 of The Irish Longitudinal Study on Ageing (TILDA). RESULTS: Of 2,984 older adults (aged 64.3 ± 8.0; 55% female), 189 (6.3%) had T2DM. T2DM was associated with many features that are indicative of autonomic dysfunction including a blunted peak heart rate and lower diastolic blood pressure. T2DM was associated with reduced TSI and also with greater odds of impaired performance on the Montreal Cognitive Assessment (odds ratio [OR]: 1.62; confidence interval [CI]: [1.07, 2.56]; p = .019). Greater TSI was associated with lower odds of impaired performance (OR: 0.90; CI: [0.81, 0.99]; p = .047). CONCLUSIONS: T2DM was associated with impaired peripheral and cerebral hemodynamic responses to active stand. Both T2DM and reduced cerebral perfusion were associated with impaired cognitive performance. Altered cerebral perfusion may represent an important mechanism linking T2DM and adverse brain health outcomes in older adults.
Subjects
Cognitive Dysfunction, Type 2 Diabetes Mellitus, Humans, Female, Aged, Male, Type 2 Diabetes Mellitus/complications, Longitudinal Studies, Dizziness, Cross-Sectional Studies, Cognitive Dysfunction/etiology, Hemodynamics
ABSTRACT
PURPOSE: Many chronic diseases have detrimental impact on the physical activity (PA) patterns of older adults. Often such diseases have different degrees of severity in males and females. Quantifying this gender difference would not only enhance our understanding of diseases but would also help design individual-specific PA interventions, thereby improving health outcomes for both genders. METHODS: PA data for 747 participants from round 11 (2021) of the National Health and Aging Trends Study were analyzed. Multilevel functional regression models were used to study gender difference in the effects of chronic diseases on daily PA patterns while adjusting for confounders. RESULTS: Females with dementia (or Alzheimer's disease), hypertension, heart and lung disease had lower PA at different times of day compared to females without these diseases, whereas males with and without these diseases had comparable daily PA. Males with diabetes had higher midnight PA and lower noon PA compared to males without diabetes, while females' PA with and without diabetes were similar. CONCLUSIONS: Our analysis demonstrates that although for most diseases, the daily PA patterns of individuals with the disease are negatively altered compared to healthy individuals, the extent of decline varies by gender and time of day. Designing personalized physical activity interventions considering gender and diurnal PA pattern can potentially improve quality of life across both genders.
Subjects
Exercise, Quality of Life, Humans, Male, Female, Aged, Sex Factors, Aging, Chronic Disease
ABSTRACT
Alzheimer's Disease (AD) is the leading cause of dementia and impairment in various domains. Recent AD studies (eg, the Alzheimer's Disease Neuroimaging Initiative [ADNI] study) collect multimodal data, including longitudinal neurological assessments and magnetic resonance imaging (MRI) data, to better study disease progression. Adopting early interventions is essential to slow AD progression for subjects with mild cognitive impairment (MCI). It is of particular interest to develop an AD predictive model that leverages multimodal data and provides accurate personalized predictions. In this article, we propose a multivariate functional mixed model with MRI data (MFMM-MRI) that simultaneously models longitudinal neurological assessments, baseline MRI data, and the survival outcome (ie, dementia onset) for subjects with MCI at baseline. Two functional forms (the random-effects model and the instantaneous model) linking the longitudinal and survival processes are investigated. We use a Markov chain Monte Carlo (MCMC) method based on the No-U-Turn Sampler (NUTS) algorithm to obtain posterior samples. We develop a dynamic prediction framework that provides accurate personalized predictions of longitudinal trajectories and survival probability. We apply MFMM-MRI to the ADNI study and identify significant associations among longitudinal outcomes, MRI data, and the risk of dementia onset. The instantaneous model with voxels from the whole brain has the best prediction performance among all candidate models. The simulation study supports the validity of the estimation and dynamic prediction method.
Subjects
Alzheimer Disease, Cognitive Dysfunction, Humans, Magnetic Resonance Imaging, Neuroimaging, Brain/diagnostic imaging, Brain/pathology, Disease Progression, Cognitive Dysfunction/diagnostic imaging
ABSTRACT
We consider general nonlinear function-on-scalar (FOS) regression models, where the functional response depends on multiple scalar predictors in a general unknown nonlinear form. Existing methods either assume specific model forms (e.g., additive models) or directly estimate the nonlinear function in a space with dimension equal to the number of scalar predictors, which can only be applied to models with a few scalar predictors. To overcome these shortcomings, motivated by the classic universal approximation theorem used in neural networks, we develop a functional universal approximation theorem which can be used to approximate general nonlinear FOS maps and can be easily adopted into the framework of functional data analysis. With this theorem and utilizing smoothness regularity, we develop a novel method to fit the general nonlinear FOS regression model and make predictions. Our new method does not make any specific assumption on the model forms, and it avoids the direct estimation of nonlinear functions in a space with dimension equal to the number of scalar predictors. By estimating a sequence of bivariate functions, our method can be applied to models with a relatively large number of scalar predictors. The good performance of the proposed method is demonstrated by empirical studies on various simulated and real datasets.
Subjects
Neural Networks (Computer), Nonlinear Dynamics
ABSTRACT
The increase in the use of mobile and wearable devices now allows dense assessment of mediating processes over time. For example, a pharmacological intervention may have an effect on smoking cessation via reductions in momentary withdrawal symptoms. We define and identify the causal direct and indirect effects in terms of potential outcomes on the mean difference and odds ratio scales, and present a method for estimating and testing the indirect effect of a randomized treatment on a distal binary variable as mediated by the nonparametric trajectory of an intensively measured longitudinal variable (e.g., from ecological momentary assessment). Coverage of a bootstrap test for the indirect effect is demonstrated via simulation. An empirical example is presented based on estimating later smoking abstinence from patterns of craving during smoking cessation treatment. We provide an R package, funmediation, available on CRAN at https://cran.r-project.org/web/packages/funmediation/index.html, to conveniently apply this technique. We conclude by discussing possible extensions to multiple mediators and directions for future research.
Subjects
Smoking Cessation, Substance Withdrawal Syndrome, Humans, Smoking Cessation/methods, Mediation Analysis, Smoking/therapy, Craving, Substance Withdrawal Syndrome/drug therapy
ABSTRACT
Motivated by the analysis of longitudinal neuroimaging studies, we study the longitudinal functional linear regression model under the asynchronous data setting for modeling the association between clinical outcomes and functional (or imaging) covariates. In the asynchronous data setting, both covariates and responses may be measured at irregular and mismatched time points, posing methodological challenges to existing statistical methods. We develop a kernel-weighted loss function with a roughness penalty to obtain the functional estimator and derive its representer theorem. The rate of convergence, a Bahadur representation, and the asymptotic pointwise distribution of the functional estimator are obtained under the reproducing kernel Hilbert space framework. We propose a penalized likelihood ratio test to test the nullity of the functional coefficient, derive its asymptotic distribution under the null hypothesis, and investigate the separation rate under the alternative hypotheses. Simulation studies are conducted to examine the finite-sample performance of the proposed procedure. We apply the proposed methods to the analysis of multitype data obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study, which reveals a significant association between 21 regional brain volume density curves and cognitive function. Data used in preparation of this paper were obtained from the ADNI database (adni.loni.usc.edu).
Subjects
Alzheimer Disease, Humans, Linear Models, Alzheimer Disease/diagnostic imaging, Computer Simulation, Algorithms, Likelihood Functions
ABSTRACT
Noninvasive, continuous monitoring of hemoglobin concentration (SpHb) has recently emerged as an alternative to invasive laboratory-based hematological analysis. Unlike delayed laboratory-based measures of hemoglobin (HgB), SpHb monitors can provide real-time information about HgB levels. Real-time SpHb measurements give healthcare providers warnings and early detection of abnormal health status (e.g., hemorrhagic shock, anemia), and thus support therapeutic decision-making and help save lives. However, the finger-worn CO-Oximeter sensors used in SpHb monitors often become detached or have to be removed, which causes missing data in the continuous SpHb measurements. Missing SpHb data reduce trust in the accuracy of the device and affect the effectiveness of hemorrhage interventions and future HgB predictions. A model with imputation and prediction methods is investigated to deal with missing values and improve prediction accuracy. Gaussian process and functional regression methods are proposed to impute missing SpHb data and make predictions of laboratory-based HgB measurements. Within the proposed method, multiple choices of sub-models are considered. In a real-data study, the proposed method shows a significant improvement in accuracy; the different choices of sub-models are discussed and usage recommendations are provided accordingly. The modeling framework can be extended to other application scenarios with missing values.
Subjects
Hemoglobins, Oximetry, Hemoglobins/analysis, Hemorrhage, Humans, Physiologic Monitoring/methods, Normal Distribution
ABSTRACT
In this chapter, we review imputation in the context of DNA methylation, focusing on a penalized functional regression (PFR) method we have previously developed. We start with a brief review of DNA methylation, the genomic and epigenomic contexts where imputation has proven beneficial in practice, and statistical or computational methods proposed for DNA methylation in the recent literature (Subheading 1). The rest of the chapter (Subheadings 2-4) provides a detailed review of our PFR method for across-platform imputation, which incorporates nonlocal information using a penalized functional regression framework. Subheading 2 introduces commonly employed technologies for DNA methylation measurement and describes the real dataset we used in the development of our method: the acute myeloid leukemia (AML) dataset from The Cancer Genome Atlas (TCGA) project. Subheading 3 comprehensively reviews our method, encompassing data harmonization prior to model building, the building of the penalized functional regression model itself, the post-imputation quality filter, and imputation quality assessment. Subheading 4 shows the performance of our method in both simulation and the TCGA AML dataset, demonstrating that our penalized functional regression model is a valuable across-platform imputation tool for DNA methylation data, particularly because of its ability to boost statistical power for subsequent epigenome-wide association studies. Finally, Subheading 5 provides future perspectives on imputation for DNA methylation data.
Subjects
DNA Methylation, Acute Myeloid Leukemia, Epigenomics, Genomics, Humans, Acute Myeloid Leukemia/genetics, Regression Analysis
ABSTRACT
Frontal power asymmetry (FA), a measure of brain function derived from electroencephalography, is a potential biomarker for major depressive disorder (MDD). Though FA is functional in nature, it is typically reduced to a scalar value prior to analysis, possibly obscuring its relationship with MDD and leading to a number of studies that have provided contradictory results. To overcome this issue, we sought to fit a functional regression model to characterize the association between FA and MDD status, adjusting for age, sex, cognitive ability, and handedness using data from a large clinical study that included both MDD and healthy control (HC) subjects. Since nearly 40% of the observations are missing data on either FA or cognitive ability, we propose an extension of multiple imputation (MI) by chained equations that allows for the imputation of both scalar and functional data. We also propose an extension of Rubin's Rules for conducting valid inference in this setting. The proposed methods are evaluated in a simulation and applied to our FA data. For our FA data, a pooled analysis from the imputed data sets yielded similar results to those of the complete case analysis. We found that, among young females, HCs tended to have higher FA over the θ, α, and β frequency bands, but that the difference between HC and MDD subjects diminishes and ultimately reverses with age. For males, HCs tended to have higher FA in the β frequency band, regardless of age. Young male HCs had higher FA in the θ and α bands, but this difference diminishes with increasing age in the α band and ultimately reverses with increasing age in the θ band.
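For orientation, the classical Rubin's Rules that this abstract extends can be sketched as follows. This is the standard scalar pooling formula applied elementwise (for example, pointwise along a coefficient function evaluated on a grid); it is an illustrative assumption, not the authors' functional extension, and the function name `rubin_pool` and the toy numbers are hypothetical.

```python
import numpy as np

def rubin_pool(estimates, variances):
    """Pool m completed-data analyses with Rubin's Rules, elementwise.

    estimates, variances: arrays of shape (m, ...) holding the point estimates
    and their squared standard errors from each of the m imputed data sets.
    Returns the pooled estimate and its total variance.
    """
    est = np.asarray(estimates, dtype=float)
    var = np.asarray(variances, dtype=float)
    m = est.shape[0]
    q_bar = est.mean(axis=0)                   # pooled point estimate
    within = var.mean(axis=0)                  # average within-imputation variance
    between = est.var(axis=0, ddof=1)          # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total

# Three imputed data sets, a coefficient function evaluated at two grid points.
estimates = [[1.0, 2.0], [1.2, 2.2], [0.8, 1.8]]
variances = [[0.04, 0.04], [0.04, 0.04], [0.04, 0.04]]
pooled, total_var = rubin_pool(estimates, variances)
```

Pointwise pooling of this kind ignores the correlation of a functional estimate across the grid, which is one reason a dedicated extension for functional data is needed.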
ABSTRACT
Despite the importance of maternal gestational weight gain, it is not yet conclusively understood how weight gain during different stages of pregnancy influences health outcomes for either mother or child. We partially attribute this to differences in and the validity of statistical methods for the analysis of longitudinal and scalar outcome data. In this paper, we propose a Bayesian joint regression model that estimates and uses trajectory parameters as predictors of a scalar response. Our model remedies notable issues with traditional linear regression approaches found in the clinical literature. In particular, our methodology accommodates nonprospective designs by correcting for bias in self-reported prestudy measures; truly accommodates sparse longitudinal observations and short-term variation without data aggregation or precomputation; and is more robust to the choice of model changepoints. We demonstrate these advantages through a real-world application to the Alberta Pregnancy Outcomes and Nutrition (APrON) dataset and a comparison to a linear regression approach from the clinical literature. Our methods extend naturally to other maternal and infant outcomes as well as to areas of research that employ similarly structured data.