RESUMO
This study aims to propose modified semiparametric estimators based on six different penalty and shrinkage strategies for the estimation of a right-censored semiparametric regression model. In this context, the methods used to obtain the estimators are ridge, lasso, adaptive lasso, SCAD, MCP, and elasticnet penalty functions. The most important contribution that distinguishes this article from its peers is that it uses the local polynomial method as a smoothing method. The theoretical estimation procedures for the obtained estimators are explained. In addition, a simulation study is performed to see the behavior of the estimators and make a detailed comparison, and hepatocellular carcinoma data are estimated as a real data example. As a result of the study, the estimators based on adaptive lasso and SCAD were more resistant to censorship and outperformed the other four estimators.
RESUMO
This paper focuses on the adaptive spline (A-spline) fitting of the semiparametric regression model to time series data with right-censored observations. Typically, there are two main problems that need to be solved in such a case: dealing with censored data and obtaining a proper A-spline estimator for the components of the semiparametric model. The first problem is traditionally solved by the synthetic data approach based on the Kaplan-Meier estimator. In practice, although the synthetic data technique is one of the most widely used solutions for right-censored observations, the transformed data's structure is distorted, especially for heavily censored datasets, due to the nature of the approach. In this paper, we introduced a modified semiparametric estimator based on the A-spline approach to overcome data irregularity with minimum information loss and to resolve the second problem described above. In addition, the semiparametric B-spline estimator was used as a benchmark method to gauge the success of the A-spline estimator. To this end, a detailed Monte Carlo simulation study and a real data sample were carried out to evaluate the performance of the proposed estimator and to make a practical comparison.
RESUMO
Electroencephalography/Magnetoencephalography (EEG/MEG) source localization involves the estimation of neural activity inside the brain volume that underlies the EEG/MEG measures observed at the sensor array. In this paper, we consider a Bayesian finite spatial mixture model for source reconstruction and implement Ant Colony System (ACS) optimization coupled with Iterated Conditional Modes (ICM) for computing estimates of the neural source activity. Our approach is evaluated using simulation studies and a real data application in which we implement a nonparametric bootstrap for interval estimation. We demonstrate improved performance of the ACS-ICM algorithm as compared to existing methodology for the same spatiotemporal model.
RESUMO
In a host of business applications, biomedical and epidemiological studies, the problem of multicollinearity among predictor variables is a frequent issue in longitudinal data analysis for linear mixed models (LMM). We consider an efficient estimation strategy for high-dimensional data application, where the dimensions of the parameters are larger than the number of observations. In this paper, we are interested in estimating the fixed effects parameters of the LMM when it is assumed that some prior information is available in the form of linear restrictions on the parameters. We propose the pretest and shrinkage estimation strategies using the ridge full model as the base estimator. We establish the asymptotic distributional bias and risks of the suggested estimators and investigate their relative performance with respect to the ridge full model estimator. Furthermore, we compare the numerical performance of the LASSO-type estimators with the pretest and shrinkage ridge estimators. The methodology is investigated using simulation studies and then demonstrated on an application exploring how effective brain connectivity in the default mode network (DMN) may be related to genetics within the context of Alzheimer's disease.
RESUMO
In biomedical and epidemiological studies, gene-environment (G-E) interactions have been shown to importantly contribute to the etiology and progression of many complex diseases. Most existing approaches for identifying G-E interactions are limited by the lack of robustness against outliers/contaminations in response and predictor spaces. In this study, we develop a novel robust G-E identification approach using the trimmed regression technique under joint modeling. A robust data-driven criterion and stability selection are adopted to determine the trimmed subset which is free from both vertical outliers and leverage points. An effective penalization approach is developed to identify important G-E interactions, respecting the "main effects, interactions" hierarchical structure. Extensive simulations demonstrate the better performance of the proposed approach compared to multiple alternatives. Interesting findings with superior prediction accuracy and stability are observed in the analysis of TCGA data on cutaneous melanoma and breast invasive carcinoma.
RESUMO
For many complex diseases, prognosis is of essential importance. It has been shown that, beyond the main effects of genetic (G) and environmental (E) risk factors, gene-environment (G × E) interactions also play a critical role. In practical data analysis, part of the prognosis outcome data can have a distribution different from that of the rest of the data because of contamination or a mixture of subtypes. Literature has shown that data contamination as well as a mixture of distributions, if not properly accounted for, can lead to severely biased model estimation. In this study, we describe prognosis using an accelerated failure time (AFT) model. An exponential squared loss is proposed to accommodate data contamination or a mixture of distributions. A penalization approach is adopted for regularized estimation and marker selection. The proposed method is realized using an effective coordinate descent (CD) and minorization maximization (MM) algorithm. The estimation and identification consistency properties are rigorously established. Simulation shows that without contamination or mixture, the proposed method has performance comparable to or better than the nonrobust alternative. However, with contamination or mixture, it outperforms the nonrobust alternative and, under certain scenarios, is superior to the robust method based on quantile regression. The proposed method is applied to the analysis of TCGA (The Cancer Genome Atlas) lung cancer data. It identifies interactions different from those using the alternatives. The identified markers have important implications and satisfactory stability.