Results 1 - 6 of 6
1.
BMC Med Inform Decis Mak ; 24(1): 120, 2024 May 07.
Article in English | MEDLINE | ID: mdl-38715002

ABSTRACT

Time-to-event data, such as time to failure or death, are now routinely collected alongside high-throughput covariates. These high-dimensional bioinformatics data often challenge classical survival models, which are either infeasible to fit or produce low prediction accuracy due to overfitting. To address this issue, the focus has shifted towards novel approaches for feature selection and survival prediction. In this article, we propose a new hybrid feature selection approach that handles high-dimensional bioinformatics datasets for improved survival prediction. This study explores the efficacy of four distinct variable selection techniques (LASSO, RSF-vs, SCAD, and CoxBoost) in the context of non-parametric biomedical survival prediction. Variables were selected with each of these methods, and survival analysis models (CoxPH, RSF, and DeepHit NN) were then built on the selected variables. Furthermore, we introduce a novel approach in which only variables consistently selected by a majority of the aforementioned feature selection techniques are considered. This strategy, referred to as the proposed method, aims to enhance the reliability and robustness of variable selection and thereby improve the predictive performance of the survival analysis models. To evaluate its effectiveness, we compare the proposed method with the existing LASSO, RSF-vs, SCAD, and CoxBoost techniques using several performance metrics, including the integrated Brier score (IBS), concordance index (C-Index), and integrated absolute error (IAE), on numerous high-dimensional survival datasets. The real data applications reveal that the proposed method outperforms the competing methods in terms of survival prediction accuracy.
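
The majority-vote step described in this abstract can be illustrated with a minimal sketch. It assumes that "majority" means strictly more than half of the selectors; the abstract does not say how ties are handled. The function name `majority_vote_features` and the gene names are hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_vote_features(selections):
    """Return features chosen by a strict majority of the selectors.

    `selections` maps a selector name to the set of features it picked.
    """
    n_selectors = len(selections)
    # Count, for each feature, how many selectors picked it.
    counts = Counter(f for chosen in selections.values() for f in set(chosen))
    # Assumption: strict majority, i.e. more than half of the selectors agree.
    return sorted(f for f, c in counts.items() if c > n_selectors / 2)


# Hypothetical selections; the gene names are made up for illustration.
selections = {
    "LASSO": {"g1", "g2", "g3"},
    "RSF-vs": {"g2", "g3", "g4"},
    "SCAD": {"g2", "g3", "g5"},
    "CoxBoost": {"g3", "g4", "g6"},
}
consensus = majority_vote_features(selections)
print(consensus)
```

The consensus set would then be passed to whichever survival model is being fitted (CoxPH, RSF, or DeepHit NN in the abstract).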


Subject(s)
Neural Networks, Computer , Humans , Survival Analysis , Statistics, Nonparametric , Computational Biology/methods
2.
J Pak Med Assoc ; 70(7): 1169-1172, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32799268

ABSTRACT

OBJECTIVE: To assess the risk factors associated with tonsillitis. METHODS: The cross-sectional study was conducted at Mardan Medical Complex and District Headquarter Hospital, Mardan, Pakistan, from January to June 2018, and comprised tonsillitis patients. Data were collected using a questionnaire covering risk factors such as age 1-10 years, gender, residential area, and dietary habits. Data were analysed using SPSS 20. RESULTS: Of the 325 subjects, 200 (61.54%) were clinically diagnosed with tonsillitis, 138 (69%) of them male. Age, unhygienic living conditions, balanced diet, stressful environment and the use of sour/spicy foods were identified as significantly associated factors (p<0.05). CONCLUSIONS: Age, unhygienic living conditions, balanced diet, stressful environment and the use of sour/spicy foods were found to have a strong association with tonsillitis.


Subject(s)
Tonsillitis , Child , Child, Preschool , Cross-Sectional Studies , Feeding Behavior , Humans , Infant , Male , Pakistan/epidemiology , Risk Factors , Tonsillitis/epidemiology
3.
J Pak Med Assoc ; 70(12(B)): 2356-2362, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33475543

ABSTRACT

OBJECTIVE: The aim of this study is to filter out the most informative genes that mainly regulate the target tissue class, increase classification accuracy, reduce the curse of dimensionality, and discard redundant and irrelevant genes. METHOD: This paper presents the idea of gene selection using bagging sub-forest (BSF). The proposed method derives gene importance from the idea specified in the standard random forest algorithm. The new method is compared with three state-of-the-art methods, i.e., Wilcoxon, masked painter and proportional overlapped score (POS). These methods were applied to 5 datasets, i.e., Colon, Lymph node breast cancer, Leukaemia, Serrated colorectal carcinomas, and Breast Cancer. For the comparison, the top 20 genes were selected with each gene selection method, and random forest (RF) and support vector machine (SVM) classifiers were applied to the datasets restricted to the selected genes to assess predictive performance. Classification accuracy, Brier score, and sensitivity were used as performance measures. RESULTS: The proposed method gave better results than the other feature selection methods with both the random forest and SVM classifiers on all the datasets. CONCLUSIONS: The proposed method showed improved performance in terms of classification accuracy, Brier score and sensitivity, and hence could be used as a novel method for gene selection to classify tissue samples into their correct classes.
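
The three performance measures named in this abstract have standard textbook definitions for a binary classifier, sketched below. The labels and predicted probabilities are invented for illustration, and the 0.5 decision threshold is an assumption; none of this comes from the paper.

```python
def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and the 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def sensitivity(y_true, y_pred):
    """True positive rate: TP / (TP + FN)."""
    tp = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 1)
    fn = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 0)
    return tp / (tp + fn)


def accuracy(y_true, y_pred):
    """Share of samples whose predicted class equals the true class."""
    return sum(1 for y, yh in zip(y_true, y_pred) if y == yh) / len(y_true)


# Made-up labels and predicted probabilities of class 1.
y_true = [1, 0, 1, 1, 0]
p_pred = [0.9, 0.2, 0.4, 0.7, 0.6]
# Assumption: a sample is assigned to class 1 when its probability is >= 0.5.
y_pred = [1 if p >= 0.5 else 0 for p in p_pred]

print(brier_score(y_true, p_pred), sensitivity(y_true, y_pred), accuracy(y_true, y_pred))
```

A lower Brier score is better, while higher accuracy and sensitivity are better.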


Subject(s)
Machine Learning , Support Vector Machine , Algorithms , Genes, Regulator , Genomics , Humans
4.
PLoS One ; 19(5): e0297544, 2024.
Article in English | MEDLINE | ID: mdl-38809823

ABSTRACT

Statistical quality control is concerned with the analysis of production and manufacturing processes. Control charts are process control techniques, commonly applied to observe and control deviations. Shewhart control charts are sensitive to large shifts and rely on the basic assumption of normality. Cumulative Sum (CUSUM) control charts are effective for identifying deviations that may have special causes, such as outliers or excessive variability in subgroup means. This study uses a CUSUM control chart structure to evaluate the performance of robust dispersion parameters. We investigated the design structure features of various control charts, based on currently defined estimators and some new robust scale estimators using trimming and winsorization in different scenarios. The Median Absolute Deviation based on trimming and winsorization is introduced. The effectiveness of CUSUM control charts based on these estimators is evaluated in terms of average run length (ARL) and Standard Deviation of the Run Length (SDRL) using a simulation study. The results show the robustness of the CUSUM chart in observing small changes in magnitude for both normal and contaminated data. In general, CUSUM charts based on the robust estimators MADTM and MADWM outperform the alternatives in all environments.
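
As a rough illustration of pairing a CUSUM chart with a robust scale estimate, the sketch below runs a standard two-sided tabular CUSUM on observations standardized by the plain median absolute deviation. This is not the paper's MADTM or MADWM estimator, whose trimmed and winsorized definitions are not given in the abstract. The 1.4826 scaling constant, the use of the median as the center, the parameters `k` and `h`, and all data values are assumptions.

```python
import statistics


def mad_scale(values):
    """Median absolute deviation, scaled by 1.4826 to estimate sigma under normality."""
    med = statistics.median(values)
    return 1.4826 * statistics.median([abs(v - med) for v in values])


def cusum_first_signal(reference, stream, k=0.5, h=5.0):
    """Two-sided tabular CUSUM; return the 0-based index of the first signal, or None."""
    center = statistics.median(reference)  # assumption: robust center from reference data
    sigma = mad_scale(reference)
    c_plus = c_minus = 0.0
    for i, x in enumerate(stream):
        z = (x - center) / sigma
        c_plus = max(0.0, c_plus + z - k)    # accumulates upward drift
        c_minus = max(0.0, c_minus - z - k)  # accumulates downward drift
        if c_plus > h or c_minus > h:
            return i
    return None


# Made-up reference sample containing one gross outlier (25.0).
reference = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 25.0]
# Made-up monitoring stream: three in-control points, then a small sustained upward shift.
stream = [10.0, 10.1, 9.9] + [10.3] * 20
first = cusum_first_signal(reference, stream)
print(first)
```

The run length until the first signal is the quantity whose mean and standard deviation over many simulated streams give the ARL and SDRL mentioned in the abstract.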


Subject(s)
Quality Control , Models, Statistical , Computer Simulation , Algorithms
5.
PLoS One ; 18(4): e0284619, 2023.
Article in English | MEDLINE | ID: mdl-37098036

ABSTRACT

Feature selection in high-dimensional gene expression datasets reduces not only the dimension of the data, but also the execution time and computational cost of the underlying classifier. The current study introduces a novel feature selection method called weighted signal to noise ratio (WSNR), which exploits the weights of features based on support vectors and the signal to noise ratio, with the objective of identifying the most informative genes in high-dimensional classification problems. The combination of two state-of-the-art procedures enables the extraction of the most informative genes. The corresponding weights of these procedures are then multiplied and arranged in decreasing order. A larger weight for a feature indicates greater discriminatory power in classifying the tissue samples into their true classes. The current method is validated on eight gene expression datasets. Moreover, results of the proposed method (WSNR) are compared with four well-known feature selection methods. We found that WSNR outperforms the other competing methods on 6 out of 8 datasets. Box plots and bar plots of the results of the proposed method and all the other methods are also constructed. The proposed method is further assessed on simulated data. Simulation analysis reveals that WSNR outperforms all the other methods included in the study.
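
The multiply-and-rank step described in this abstract can be sketched as follows. The sketch assumes the common two-class signal to noise ratio |mean_a - mean_b| / (sd_a + sd_b) and takes the support-vector-based weights as a given input rather than fitting an SVM; the paper's exact formulas may differ. The function names, gene names, expression values, and weights are all hypothetical.

```python
import statistics


def signal_to_noise(class_a, class_b):
    """|mean_a - mean_b| / (sd_a + sd_b) for one gene across two classes."""
    gap = abs(statistics.mean(class_a) - statistics.mean(class_b))
    return gap / (statistics.stdev(class_a) + statistics.stdev(class_b))


def rank_genes(expr_a, expr_b, svm_weights):
    """Multiply each gene's SNR by its |SVM weight| and sort in decreasing order."""
    scores = {}
    for gene in expr_a:
        snr = signal_to_noise(expr_a[gene], expr_b[gene])
        scores[gene] = snr * abs(svm_weights[gene])
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up expression values per gene for the two tissue classes.
expr_a = {"g1": [1.0, 1.2, 0.8], "g2": [5.0, 5.5, 4.5], "g3": [2.0, 2.1, 1.9]}
expr_b = {"g1": [3.0, 3.2, 2.8], "g2": [5.1, 5.6, 4.4], "g3": [2.5, 2.6, 2.4]}
# Hypothetical per-gene weights, standing in for those derived from a fitted SVM.
svm_weights = {"g1": 0.8, "g2": -0.1, "g3": 0.5}

ranking = rank_genes(expr_a, expr_b, svm_weights)
print([gene for gene, _ in ranking])
```

Genes at the top of the ranking are the ones a top-k selection would keep.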


Subject(s)
Algorithms , Gene Expression Profiling , Gene Expression Profiling/methods , Signal-To-Noise Ratio , Microarray Analysis , Gene Expression
6.
PeerJ Comput Sci ; 7: e562, 2021.
Article in English | MEDLINE | ID: mdl-34141889

ABSTRACT

In this paper, a novel feature selection method for microarray gene expression datasets, called Robust Proportional Overlapping Score (RPOS), is proposed; it uses a robust measure of dispersion, the Median Absolute Deviation (MAD). This method robustly identifies the most discriminative genes by considering the overlapping scores of the gene expression values for binary class problems. Genes with a high degree of overlap between classes are discarded and the ones that discriminate between the classes are selected. The results of the proposed method are compared with five state-of-the-art gene selection methods based on classification error, Brier score, and sensitivity, by considering eleven gene expression datasets. Classification of observations for different sets of selected genes by the proposed method is carried out by three different classifiers, i.e., random forest, k-nearest neighbors (k-NN), and support vector machine (SVM). Box plots and stability scores of the results are also shown in this paper. The results reveal that in most of the cases the proposed method outperforms the other methods.
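
The idea of scoring a gene by how much its two classes overlap, using MAD-based intervals, can be conveyed with a deliberately simplified sketch. It is not the published RPOS definition, which the abstract does not spell out. The interval form (median ± k·MAD), the overlap ratio, the parameter `k`, and the data are all assumptions made for illustration.

```python
import statistics


def mad(values):
    """Median absolute deviation around the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def core_interval(values, k=1.0):
    """Assumed robust core interval: median plus or minus k times the MAD."""
    med = statistics.median(values)
    spread = k * mad(values)
    return med - spread, med + spread


def overlap_score(class_a, class_b, k=1.0):
    """Fraction of the combined span covered by both core intervals (0 = no overlap)."""
    lo_a, hi_a = core_interval(class_a, k)
    lo_b, hi_b = core_interval(class_b, k)
    shared = max(0.0, min(hi_a, hi_b) - max(lo_a, lo_b))
    span = max(hi_a, hi_b) - min(lo_a, lo_b)
    # If both intervals collapse to the same point, treat them as fully overlapping.
    return shared / span if span > 0 else 1.0


# Made-up expression values: one gene that separates the classes despite an
# outlier (5.0), and one gene with identical values in both classes.
separating = overlap_score([1.0, 1.1, 0.9, 1.2, 5.0], [3.0, 3.1, 2.9, 3.2, 3.0])
uninformative = overlap_score([1.0, 1.1, 0.9, 1.2, 1.0], [1.0, 1.1, 0.9, 1.2, 1.0])
print(separating, uninformative)
```

Under this simplified score, genes with values near 0 would be kept and genes with values near 1 discarded. The outlier in the first gene does not widen its interval, which is the motivation for using the MAD instead of a range-based or standard-deviation-based interval.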
