Results 1 - 20 of 323
1.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39285512

ABSTRACT

With rapidly evolving high-throughput technologies and consistently decreasing costs, collecting multimodal omics data in large-scale studies has become feasible. Although studying multiomics provides a new comprehensive approach to understanding the complex biological mechanisms of human diseases, the high dimensionality of omics data and the complexity of the interactions among various omics levels in contributing to disease phenotypes present tremendous analytical challenges. There is a great need for novel analytical methods to address these challenges and to facilitate multiomics analyses. In this paper, we propose a multimodal functional deep learning (MFDL) method for the analysis of high-dimensional multiomics data. The MFDL method models the complex relationships between multiomics variants and disease phenotypes through the hierarchical structure of deep neural networks and handles high-dimensional omics data using the functional data analysis technique. Furthermore, MFDL leverages the structure of the multimodal model to capture interactions between different types of omics data. Through simulation studies and real-data applications, we demonstrate the advantages of MFDL in terms of prediction accuracy and its robustness to the high dimensionality and noise within the data.


Subject(s)
Deep Learning , Genomics , Humans , Genomics/methods , Computational Biology/methods , Neural Networks, Computer , Algorithms , Multiomics
2.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-39007595

ABSTRACT

Biomedical research now commonly integrates diverse data types or views from the same individuals to better understand the pathobiology of complex diseases, but the challenge lies in meaningfully integrating these diverse views. Existing methods often require the same type of data from all views (cross-sectional data only or longitudinal data only) or do not consider any class outcome in the integration method, which presents limitations. To overcome these limitations, we have developed a pipeline that harnesses the power of statistical and deep learning methods to integrate cross-sectional and longitudinal data from multiple sources. In addition, it identifies key variables that contribute to the association between views and the separation between classes, providing deeper biological insights. This pipeline includes variable selection/ranking using linear and nonlinear methods, feature extraction using functional principal component analysis and Euler characteristics, and joint integration and classification using dense feed-forward networks for cross-sectional data and recurrent neural networks for longitudinal data. We applied this pipeline to cross-sectional and longitudinal multiomics data (metagenomics, transcriptomics and metabolomics) from an inflammatory bowel disease (IBD) study and identified microbial pathways, metabolites and genes that discriminate by IBD status, providing information on the etiology of IBD. We conducted simulations to compare the two feature extraction methods.


Subject(s)
Deep Learning , Inflammatory Bowel Diseases , Humans , Cross-Sectional Studies , Inflammatory Bowel Diseases/classification , Inflammatory Bowel Diseases/genetics , Longitudinal Studies , Discriminant Analysis , Metabolomics/methods , Computational Biology/methods
3.
Biostatistics ; 2024 Aug 13.
Article in English | MEDLINE | ID: mdl-39140988

ABSTRACT

In the brain, functional connections form a network whose topological organization can be described by graph-theoretic network diagnostics. These include characterizations of the community structure, such as modularity and participation coefficient, which have been shown to change over the course of childhood and adolescence. To investigate if such changes in the functional network are associated with changes in cognitive performance during development, network studies often rely on an arbitrary choice of preprocessing parameters, in particular the proportional threshold of network edges. Because the choice of parameter can impact the value of the network diagnostic, and therefore downstream conclusions, we propose to circumvent that choice by conceptualizing the network diagnostic as a function of the parameter. As opposed to a single value, a network diagnostic curve describes the connectome topology at multiple scales, from the sparsest group of the strongest edges to the entire edge set. To relate these curves to executive function and other covariates, we use scalar-on-function regression, which is more flexible than previous functional data-based models used in network neuroscience. We then consider how systematic differences between networks can manifest in misalignment of diagnostic curves, and consequently propose a supervised curve alignment method that incorporates auxiliary information from other variables. Our algorithm performs both functional regression and alignment via an iterative, penalized, and nonlinear likelihood optimization. The illustrated method has the potential to improve the interpretability and generalizability of neuroscience studies where the goal is to study heterogeneity among a mixture of function- and scalar-valued measures.
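
As an illustration of the scalar-on-function regression mentioned in this abstract, the following minimal sketch fits y_i = a + ∫ X_i(t) β(t) dt + error for curves observed on a common grid. The function name `fit_scalar_on_function`, the polynomial basis for β(t), and the trapezoid quadrature are assumptions made for the example. The sketch is unpenalized and does not attempt the supervised curve alignment the authors describe.

```python
import numpy as np

def fit_scalar_on_function(X, y, t, n_basis=3):
    """Least-squares fit of y_i = a + integral of X_i(t) * beta(t) dt.

    X: (n, m) array of curves sampled on the common grid t (length m).
    beta(t) is expanded in a polynomial basis of size n_basis
    (an illustrative choice, not taken from the paper).
    """
    X = np.asarray(X, dtype=float)
    t = np.asarray(t, dtype=float)
    # Rescale the grid to [0, 1] and evaluate the basis 1, s, s^2, ...
    s = (t - t[0]) / (t[-1] - t[0])
    B = np.vander(s, n_basis, increasing=True)   # shape (m, n_basis)
    # Trapezoid-rule quadrature weights for the integral over t.
    w = np.zeros_like(t)
    w[1:] += np.diff(t) / 2.0
    w[:-1] += np.diff(t) / 2.0
    # Z[i, k] approximates the integral of X_i(t) * B_k(t) dt.
    Z = (X * w) @ B
    design = np.column_stack([np.ones(len(y)), Z])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    intercept = coef[0]
    beta = B @ coef[1:]                          # beta(t) on the grid
    return intercept, beta
```

In the setting of the abstract, each row of `X` would be one subject's network diagnostic curve evaluated over a grid of threshold values, and `y` the scalar outcome.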

4.
Biostatistics ; 2024 Mar 13.
Article in English | MEDLINE | ID: mdl-38476094

ABSTRACT

Linear and generalized linear scalar-on-function models have been commonly used to understand the relationship between a scalar response variable (e.g. continuous, binary outcomes) and functional predictors. Such techniques are sensitive to model misspecification when the relationship between the response variable and the functional predictors is complex. On the other hand, support vector machines (SVMs) are among the most robust prediction models but do not take account of the high correlations between repeated measurements and cannot be used for irregular data. In this work, we propose a novel method to integrate functional principal component analysis with SVM techniques for classification and regression to account for the continuous nature of functional data and the nonlinear relationship between the scalar response variable and the functional predictors. We demonstrate the performance of our method through extensive simulation experiments and two real data applications: the classification of alcoholics using electroencephalography signals and the prediction of glucobrassicin concentration using near-infrared reflectance spectroscopy. Our method is especially advantageous when the measurement errors in functional predictors are relatively large.
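
The first stage of such an approach, reducing each curve to a few functional principal component scores, can be sketched as follows. This is a simplified illustration that assumes densely observed curves on a common, equally spaced grid (the abstract's method also targets irregular data, which this sketch does not handle). The function name `fpca_scores` is hypothetical.

```python
import numpy as np

def fpca_scores(X, n_components=2):
    """Discretized FPCA via the singular value decomposition.

    X: (n, m) array, one curve per row, sampled on a common grid.
    Returns (scores, components, mean), where scores has shape
    (n, n_components) and components has orthonormal rows.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean                                 # center the curves
    # Right singular vectors are the discretized eigenfunctions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T                    # projections of each curve
    return scores, components, mean
```

The resulting `scores` matrix is an ordinary feature table, so it could then be passed to any SVM implementation for classification or regression.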

5.
Genet Epidemiol ; 47(6): 409-431, 2023 09.
Article in English | MEDLINE | ID: mdl-37101379

ABSTRACT

In genetic studies, many phenotypes have multiple naturally ordered discrete values. The phenotypes can be correlated with each other. If multiple correlated ordinal traits are analyzed simultaneously, the power of analysis may increase significantly while the false positives can be controlled well. In this study, we propose bivariate functional ordinal linear regression (BFOLR) models using latent regressions with cumulative logit link or probit link to perform a gene-based analysis for bivariate ordinal traits and sequencing data. In the proposed BFOLR models, genetic variant data are viewed as stochastic functions of physical positions, and the genetic effects are treated as a function of physical positions. The BFOLR models take the correlation of the two ordinal traits into account via latent variables. The BFOLR models are built upon functional data analysis which can be revised to analyze the bivariate ordinal traits and high-dimension genetic data. The methods are flexible and can analyze three types of genetic data: (1) rare variants only, (2) common variants only, and (3) a combination of rare and common variants. Extensive simulation studies show that the likelihood ratio tests of the BFOLR models control type I errors well and have good power performance. The BFOLR models are applied to analyze Age-Related Eye Disease Study data, in which two genes, CFH and ARMS2, are found to strongly associate with eye drusen size, drusen area, age-related macular degeneration (AMD) categories, and AMD severity scale.


Subject(s)
Macular Degeneration , Models, Genetic , Humans , Phenotype , Macular Degeneration/genetics , Computer Simulation , Linear Models
6.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-35998893

ABSTRACT

Cells and tissues respond to perturbations in multiple ways that can be sensitively reflected in the alterations of gene expression. Current approaches to finding and quantifying the effects of perturbations on cell-level responses over time disregard the temporal consistency of identifiable gene programs. To leverage the occurrence of these patterns for perturbation analyses, we developed CellDrift (https://github.com/KANG-BIOINFO/CellDrift), a generalized linear model-based functional data analysis method that is capable of identifying covarying temporal patterns of various cell types in response to perturbations. As compared to several other approaches, CellDrift demonstrated superior performance in the identification of temporally varied perturbation patterns and the ability to impute missing time points. We applied CellDrift to multiple longitudinal datasets, including COVID-19 disease progression and gastrointestinal tract development, and demonstrated its ability to identify specific gene programs associated with sequential biological processes, trajectories and outcomes.


Subject(s)
COVID-19 , COVID-19/genetics , Humans , Linear Models
7.
Hum Genomics ; 17(1): 8, 2023 02 11.
Article in English | MEDLINE | ID: mdl-36774528

ABSTRACT

BACKGROUND: Aging affects the incidence of diseases such as cancer and dementia, so the development of biomarkers for aging is an important research topic in medical science. While such biomarkers have been mainly identified based on the assumption of a linear relationship between phenotypic parameters, including molecular markers, and chronological age, numerous nonlinear changes between markers and aging have been identified. However, the overall landscape of the patterns in nonlinear changes that exist in aging is unknown. RESULT: We propose a novel computational method, Data-driven Identification and Classification of Nonlinear Aging Patterns (DICNAP), that is based on functional data analysis to identify biomarkers for aging and potential patterns of change during aging in a data-driven manner. We applied the proposed method to large-scale, public DNA methylation data to explore the potential patterns of age-related changes in methylation intensity. The results showed that not only linear, but also nonlinear changes in DNA methylation patterns exist. A monotonous demethylation pattern during aging, with its rate decreasing at around age 60, was identified as the candidate stable nonlinear pattern. We also analyzed the age-related changes in methylation variability. The results showed that the variability of methylation intensity tends to increase with age at age-associated sites. The representative variability pattern is a monotonically increasing pattern that accelerates after middle age. CONCLUSION: DICNAP was able to identify the potential patterns of the changes in the landscape of DNA methylation during aging. It contributes to an improvement in our theoretical understanding of the aging process.


Subject(s)
DNA Methylation , Neoplasms , Middle Aged , Humans , DNA Methylation/genetics , Aging/genetics , Biomarkers , Neoplasms/genetics , Epigenesis, Genetic , CpG Islands/genetics , Epigenomics/methods
8.
Biometrics ; 80(1)2024 Jan 29.
Article in English | MEDLINE | ID: mdl-38477485

ABSTRACT

Environmental epidemiologic studies routinely utilize aggregate health outcomes to estimate effects of short-term (e.g., daily) exposures that are available at increasingly fine spatial resolutions. However, areal averages are typically used to derive population-level exposure, which cannot capture the spatial variation and individual heterogeneity in exposures that may occur within the spatial and temporal unit of interest (e.g., within a day or ZIP code). We propose a general modeling approach to incorporate within-unit exposure heterogeneity in health analyses via exposure quantile functions. Furthermore, by viewing the exposure quantile function as a functional covariate, our approach provides additional flexibility in characterizing associations at different quantile levels. We apply the proposed approach to an analysis of air pollution and emergency department (ED) visits in Atlanta over 4 years. The analysis utilizes daily ZIP code-level distributions of personal exposures to 4 traffic-related ambient air pollutants simulated from the Stochastic Human Exposure and Dose Simulator. Our analyses find that effects of carbon monoxide on respiratory and cardiovascular disease ED visits are more pronounced with changes in lower quantiles of the population's exposure. Software for implementation is provided in the R package nbRegQF.


Subject(s)
Air Pollutants , Air Pollution , Humans , Air Pollutants/analysis , Particulate Matter/analysis , Environmental Exposure , Air Pollution/analysis , Carbon Monoxide/analysis
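
To make the idea of an exposure quantile function concrete, the sketch below evaluates the empirical quantile function of the individual exposures in each unit on a shared grid of quantile levels, giving one curve per unit that could serve as a functional covariate. It is written in Python for illustration only and is not the interface of the nbRegQF R package. The function name `exposure_quantile_matrix` and its arguments are hypothetical.

```python
import numpy as np

def exposure_quantile_matrix(unit_exposures, levels):
    """Evaluate each unit's empirical exposure quantile function.

    unit_exposures: list of 1-D sequences, one per spatial-temporal unit
        (for example, simulated personal exposures in a ZIP code on a day).
    levels: quantile levels in [0, 1] at which to evaluate the functions.
    Returns an (n_units, n_levels) array; row i holds Q_i(levels).
    """
    levels = np.asarray(levels, dtype=float)
    return np.vstack([
        np.quantile(np.asarray(x, dtype=float), levels)
        for x in unit_exposures
    ])
```

Replacing each row with its mean would recover the usual areal-average exposure, which is the summary the abstract argues loses within-unit heterogeneity.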
9.
Stat Med ; 2024 Oct 06.
Article in English | MEDLINE | ID: mdl-39370732

ABSTRACT

Mendelian randomization is an instrumental variable method that utilizes genetic information to investigate the causal effect of a modifiable exposure on an outcome. In most cases, the exposure changes over time. Understanding the time-varying causal effect of the exposure can yield detailed insights into mechanistic effects and the potential impact of public health interventions. Recently, a growing number of Mendelian randomization studies have attempted to explore time-varying causal effects. However, the proposed approaches oversimplify temporal information and rely on overly restrictive structural assumptions, limiting their reliability in addressing time-varying causal problems. This article considers a novel approach to estimate time-varying effects through continuous-time modelling by combining functional principal component analysis and weak-instrument-robust techniques. Our method effectively utilizes available data without making strong structural assumptions and can be applied in general settings where the exposure measurements occur at different timepoints for different individuals. We demonstrate through simulations that our proposed method performs well in estimating time-varying effects and provides reliable inference when the time-varying effect form is correctly specified. The method could theoretically be used to estimate arbitrarily complex time-varying effects. However, there is a trade-off between model complexity and instrument strength. Estimating complex time-varying effects requires instruments that are unrealistically strong. We illustrate the application of this method in a case study examining the time-varying effects of systolic blood pressure on urea levels.

10.
Stat Med ; 43(6): 1153-1169, 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38221776

ABSTRACT

Wastewater-based surveillance has become an important tool for research groups and public health agencies investigating and monitoring the COVID-19 pandemic and other public health emergencies including other pathogens and drug abuse. While there is an emerging body of evidence exploring the possibility of predicting COVID-19 infections from wastewater signals, there remain significant challenges for statistical modeling. Longitudinal observations of viral copies in municipal wastewater can be influenced by noisy datasets and missing values with irregular and sparse samplings. We propose an integrative Bayesian framework to predict daily positive cases from weekly wastewater observations with missing values via functional data analysis techniques. In a unified procedure, the proposed analysis models severe acute respiratory syndrome coronavirus-2 RNA wastewater signals as a realization of a smooth process with error and combines the smooth process with COVID-19 cases to evaluate the prediction of positive cases. We demonstrate that the proposed framework can achieve these objectives with high predictive accuracies through simulated and observed real data.


Subject(s)
COVID-19 , Humans , Bayes Theorem , COVID-19/epidemiology , Pandemics , RNA, Viral/genetics , SARS-CoV-2/genetics , Wastewater
11.
J R Stat Soc Series B Stat Methodol ; 86(3): 694-713, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39005888

ABSTRACT

Quantifying the association between components of multivariate random curves is of general interest and is a ubiquitous and basic problem that can be addressed with functional data analysis. An important application is the problem of assessing functional connectivity based on functional magnetic resonance imaging (fMRI), where one aims to determine the similarity of fMRI time courses that are recorded on anatomically separated brain regions. In the functional brain connectivity literature, the static temporal Pearson correlation has been the prevailing measure for functional connectivity. However, recent research has revealed temporally changing patterns of functional connectivity, leading to the study of dynamic functional connectivity. This motivates new similarity measures for pairs of random curves that reflect the dynamic features of functional similarity. Specifically, we introduce gradient synchronization measures in a general setting. These similarity measures are based on the concordance and discordance of the gradients between paired smooth random functions. Asymptotic normality of the proposed estimates is obtained under regularity conditions. We illustrate the proposed synchronization measures via simulations and an application to resting-state fMRI signals from the Alzheimer's Disease Neuroimaging Initiative and they are found to improve discrimination between subjects with different disease status.

12.
Sensors (Basel) ; 24(10)2024 May 07.
Article in English | MEDLINE | ID: mdl-38793825

ABSTRACT

The advancements of Internet of Things (IoT) technologies have enabled the implementation of smart and wearable sensors, which can be employed to provide older adults with affordable and accessible continuous biophysiological status monitoring. The quality of such monitoring data, however, is unsatisfactory due to excessive noise induced by various disturbances, such as motion artifacts. Existing methods take advantage of summary statistics, such as mean or median values, for denoising, without taking into account the biophysiological patterns embedded in data. In this research, a functional data analysis modeling method was proposed to enhance the data quality by learning individual subjects' diurnal heart rate (HR) patterns from historical data, which were further improved by fusing newly collected data. This proposed data-fusion approach was developed based on a Bayesian inference framework. Its effectiveness was demonstrated in an HR analysis from a prospective study involving older adults residing in assisted living or home settings. The results indicate that it is imperative to conduct personalized healthcare by estimating individualized HR patterns. Furthermore, the proposed calibration method provides a more accurate (smaller mean errors) and more precise (smaller error standard deviations) HR estimation than raw HR and conventional methods, such as the mean.


Subject(s)
Bayes Theorem , Heart Rate , Wearable Electronic Devices , Humans , Heart Rate/physiology , Male , Aged , Female , Monitoring, Physiologic/methods , Monitoring, Physiologic/instrumentation , Algorithms , Prospective Studies
13.
Ergonomics ; : 1-17, 2024 Jul 22.
Article in English | MEDLINE | ID: mdl-39037945

ABSTRACT

Recent studies have focused on accurately estimating mental workload using machine learning algorithms and extracting features from physiological measures. However, feature extraction leads to the loss of valuable information and often results in binary classifications that lack specificity in the identification of optimum mental workload. This study investigates the feasibility of using raw physiological data (EEG, facial EMG, ECG, EDA, pupillometry) combined with Functional Data Analysis (FDA) to estimate the mental workload of human drivers. A driving scenario with five tasks was employed, and subjective ratings were collected. Results demonstrate that FDA applied to nine different combinations of raw physiological signals achieved a maximum accuracy of 90%, outperforming the model with extracted features (73%). This study shows that the mental workload of human drivers can be accurately estimated without utilising burdensome feature extraction. The approach proposed in this study offers promise for mental workload assessment in real-world applications.


This study aimed to estimate the mental workload of human drivers using physiological signals and Functional Data Analysis (FDA). By comparing models using raw data and extracted features, the results show that the FDA with raw data achieved a high accuracy of 90%, outperforming the model with extracted features (73%).

14.
Clin Linguist Phon ; 38(1): 64-81, 2024 03.
Article in English | MEDLINE | ID: mdl-36636014

ABSTRACT

This study aims to reveal dynamic changes in prosodic prominence patterns associated with Parkinson's disease (PD). To fulfill this purpose, the study proposes an exploratory methodology involving measuring a novel syllable-based prosody index (SPI) and performing functional principal component analyses (fPCAs) in a semi-automatic manner. First, SPI trajectories were collected from 31 speakers with PD before and after speech therapy and from 36 healthy controls. Then, the SPI trajectories were converted to continuous functions using B-splines. Finally, the functional SPIs were examined using fPCAs. The results showed that PD was associated with an increase of overall prominence for male speakers. The findings regarding higher prominence patterns in PD were supported by traditional phonetic measurements. For female speakers, however, there were no significant differences in prosodic prominence between speakers with PD and healthy controls. The results encourage exploring the proposed methodology in analyses of other forms of atypical speech as well.


Subject(s)
Parkinson Disease , Humans , Male , Female , Pilot Projects , Parkinson Disease/complications , Speech Production Measurement , Speech , Speech Disorders
15.
Genet Epidemiol ; 46(5-6): 234-255, 2022 07.
Article in English | MEDLINE | ID: mdl-35438198

ABSTRACT

In this paper, we develop functional ordinal logistic regression (FOLR) models to perform gene-based analysis of ordinal traits. In the proposed FOLR models, genetic variant data are viewed as stochastic functions of physical positions and the genetic effects are treated as a function of physical positions. The FOLR models are built upon functional data analysis which can be revised to analyze the ordinal traits and high dimension genetic data. The proposed methods are capable of dealing with dense genotype data which is usually encountered in analyzing the next-generation sequencing data. The methods are flexible and can analyze three types of genetic data: (1) rare variants only, (2) common variants only, and (3) a combination of rare and common variants. Simulation studies show that the likelihood ratio test statistics of the FOLR models control type I errors well and have good power performance. The proposed methods achieve the goals of analyzing ordinal traits directly, reducing high dimensionality of dense genetic variants, being computationally manageable, facilitating model convergence, properly controlling type I errors, and maintaining high power levels. The FOLR models are applied to analyze Age-Related Eye Disease Study data, in which two genes are found to strongly associate with four ordinal traits.


Subject(s)
Genetic Testing , Models, Genetic , Computer Simulation , Genetic Variation , Genotype , Humans , Logistic Models , Phenotype
16.
Biostatistics ; 23(2): 574-590, 2022 04 13.
Article in English | MEDLINE | ID: mdl-33040145

ABSTRACT

In recent biomedical research, genome-wide association studies (GWAS) have demonstrated great success in investigating the genetic architecture of human diseases. For many complex diseases, multiple correlated traits have been collected. However, most of the existing GWAS are still limited because they analyze each trait separately without considering their correlations and suffer from a lack of sufficient information. Moreover, the high dimensionality of single nucleotide polymorphism (SNP) data still poses tremendous challenges to statistical methods, in both theoretical and practical aspects. In this article, we innovatively propose an integrative functional linear model for GWAS with multiple traits. This study is the first to approximate SNPs as functional objects in a joint model of multiple traits with penalization techniques. It effectively accommodates the high dimensionality of SNPs and correlations among multiple traits to facilitate information borrowing. Our extensive simulation studies demonstrate the satisfactory performance of the proposed method in the identification and estimation of disease-associated genetic variants, compared to four alternatives. The analysis of type 2 diabetes data leads to biologically meaningful findings with good prediction accuracy and selection stability.


Subject(s)
Diabetes Mellitus, Type 2 , Genome-Wide Association Study , Diabetes Mellitus, Type 2/genetics , Genome-Wide Association Study/methods , Humans , Linear Models , Phenotype , Polymorphism, Single Nucleotide
17.
Biostatistics ; 23(4): 1218-1241, 2022 10 14.
Article in English | MEDLINE | ID: mdl-35640937

ABSTRACT

Quantile regression is a semiparametric method for modeling associations between variables. It is most helpful when the covariates have complex relationships with the location, scale, and shape of the outcome distribution. Despite the method's robustness to distributional assumptions and outliers in the outcome, regression quantiles may be biased in the presence of measurement error in the covariates. The impact of function-valued covariates contaminated with heteroscedastic error has not been examined previously, although studies have investigated the case of scalar-valued covariates. We present a two-stage strategy to consistently fit linear quantile regression models with a function-valued covariate that may be measured with error. In the first stage, an instrumental variable is used to estimate the covariance matrix associated with the measurement error. In the second stage, simulation extrapolation (SIMEX) is used to correct for measurement error in the function-valued covariate. Point-wise standard errors are estimated by means of nonparametric bootstrap. We present simulation studies to assess the robustness of the measurement error corrected for functional quantile regression. Our methods are applied to National Health and Examination Survey data to assess the relationship between physical activity and body mass index among adults in the United States.


Subject(s)
Regression Analysis , Computer Simulation , Humans , Linear Models
18.
Biostatistics ; 2022 Sep 08.
Article in English | MEDLINE | ID: mdl-36073640

ABSTRACT

Distributed lag models are useful in environmental epidemiology as they allow the user to investigate critical windows of exposure, defined as the time periods during which exposure to a pollutant adversely affects health outcomes. Recent studies have focused on estimating the health effects of a large number of environmental exposures, or an environmental mixture, on health outcomes. In such settings, it is important to understand which environmental exposures affect a particular outcome, while acknowledging the possibility that different exposures have different critical windows. Further, in studies of environmental mixtures, it is important to identify interactions among exposures and to account for the fact that this interaction may occur between two exposures having different critical windows. Exposure to one exposure early in time could cause an individual to be more or less susceptible to another exposure later in time. We propose a Bayesian model to estimate the temporal effects of a large number of exposures on an outcome. We use spike-and-slab priors and semiparametric distributed lag curves to identify important exposures and exposure interactions and discuss extensions with improved power to detect harmful exposures. We then apply these methods to estimate the effects of exposure to multiple air pollutants during pregnancy on birthweight from vital records in Colorado.

19.
Biostatistics ; 23(2): 558-573, 2022 04 13.
Article in English | MEDLINE | ID: mdl-33017019

ABSTRACT

Multi-dimensional functional data arises in numerous modern scientific experimental and observational studies. In this article, we focus on longitudinal functional data, a structured form of multidimensional functional data. Operating within a longitudinal functional framework we aim to capture low dimensional interpretable features. We propose a computationally efficient nonparametric Bayesian method to simultaneously smooth observed data, estimate conditional functional means and functional covariance surfaces. Statistical inference is based on Monte Carlo samples from the posterior measure through adaptive blocked Gibbs sampling. Several operative characteristics associated with the proposed modeling framework are assessed comparatively in a simulated environment. We illustrate the application of our work in two case studies. The first case study involves age-specific fertility collected over time for various countries. The second case study is an implicit learning experiment in children with autism spectrum disorder.


Subject(s)
Autism Spectrum Disorder , Bayes Theorem , Child , Humans , Monte Carlo Method
20.
Biometrics ; 79(2): 1239-1253, 2023 06.
Article in English | MEDLINE | ID: mdl-35583919

ABSTRACT

Functional principal component analysis (FPCA) has been widely used to capture major modes of variation and reduce dimensions in functional data analysis. However, standard FPCA based on the sample covariance estimator does not work well if the data exhibits heavy-tailedness or outliers. To address this challenge, a new robust FPCA approach based on a functional pairwise spatial sign (PASS) operator, termed PASS FPCA, is introduced. We propose robust estimation procedures for eigenfunctions and eigenvalues. Theoretical properties of the PASS operator are established, showing that it adopts the same eigenfunctions as the standard covariance operator and also allows recovering ratios between eigenvalues. We also extend the proposed procedure to handle functional data measured with noise. Compared to existing robust FPCA approaches, the proposed PASS FPCA requires weaker distributional assumptions to conserve the eigenspace of the covariance function. Specifically, existing work is often built upon a class of functional elliptical distributions, which inherently requires symmetry. In contrast, we introduce a class of distributions called the weakly functional coordinate symmetry (weakly FCS), which allows for severe asymmetry and is much more flexible than the functional elliptical distribution family. The robustness of the PASS FPCA is demonstrated via extensive simulation studies, especially its advantages in scenarios with nonelliptical distributions. The proposed method was motivated by and applied to the analysis of accelerometry data from the Objective Physical Activity and Cardiovascular Health Study, a large-scale epidemiological study to investigate the relationship between objectively measured physical activity and cardiovascular health among older women.


Subject(s)
Principal Component Analysis , Aged , Female , Humans , Accelerometry , Exercise , Cardiovascular System
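
The general idea behind a pairwise spatial sign operator can be sketched on discretized curves: take all pairwise differences between curves, scale each difference to unit norm so that extreme curves cannot dominate, and eigendecompose the average outer product. The code below is a rough illustration under the assumption of fully observed, noise-free curves on a common grid. It is not the authors' PASS FPCA estimator, and the function name `pairwise_spatial_sign_eigen` is hypothetical.

```python
import numpy as np

def pairwise_spatial_sign_eigen(X, n_components=2):
    """Eigen-decomposition of a pairwise spatial sign matrix.

    X: (n, m) array, one discretized curve per row.
    Returns (eigenvalues, eigenvectors), with eigenvectors as rows,
    sorted by decreasing eigenvalue.
    """
    X = np.asarray(X, dtype=float)
    i, j = np.triu_indices(X.shape[0], k=1)   # all pairs with i < j
    D = X[i] - X[j]                           # pairwise differences
    norms = np.linalg.norm(D, axis=1)
    keep = norms > 0                          # skip identical curves
    S = D[keep] / norms[keep, None]           # unit-norm "spatial signs"
    K = S.T @ S / S.shape[0]                  # average outer product
    vals, vecs = np.linalg.eigh(K)            # ascending eigenvalues
    order = np.argsort(vals)[::-1][:n_components]
    return vals[order], vecs[:, order].T
```

Because every difference is rescaled to unit length, the eigenvalues of this matrix are not the covariance eigenvalues themselves, which is consistent with the abstract's statement that the operator recovers eigenfunctions and ratios between eigenvalues.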