Results 1 - 20 of 3,127
1.
Healthc Technol Lett ; 11(4): 210-212, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39100500

ABSTRACT

A priority for machine learning in healthcare and other high stakes applications is to enable end-users to easily interpret individual predictions. This opinion piece outlines recent developments in interpretable classifiers and methods to open black box models.

2.
BMC Bioinformatics ; 25(1): 256, 2024 Aug 04.
Article in English | MEDLINE | ID: mdl-39098908

ABSTRACT

BACKGROUND: Antioxidant proteins are involved in several biological processes and can protect DNA and cells from free-radical damage. These proteins regulate the body's oxidative stress and play a significant role in many antioxidant-based drugs. Current in vitro-based methods are costly, time-consuming, and unable to efficiently screen and identify the targeted motifs of antioxidant proteins. METHODS: In this study, we propose an accurate prediction method, StackedEnC-AOP, to discriminate antioxidant proteins. The training sequences are encoded by incorporating a discrete wavelet transform (DWT) into the evolutionary matrix: the PSSM-based images are decomposed with two levels of DWT to form a pseudo position-specific scoring matrix (PsePSSM-DWT) embedded vector. Additionally, the evolutionary difference formula and composite physicochemical properties methods are employed to collect structural and sequential descriptors. The sequential features, evolutionary descriptors, and physicochemical properties are then combined into a single vector to compensate for the flaws of the individual encoding schemes. To reduce the computational cost of this combined feature vector, the optimal features are chosen using minimum redundancy maximum relevance (mRMR). The optimal feature vector is then used to train a stacking-based ensemble meta-model. RESULTS: StackedEnC-AOP achieved a prediction accuracy of 98.40% and an AUC of 0.99 on the training sequences. On an independent validation set, the trained StackedEnC-AOP model achieved an accuracy of 96.92% and an AUC of 0.98. CONCLUSION: StackedEnC-AOP performed significantly better than current computational models, improving accuracy by ~5% on the training set and ~3% on the independent set. Its efficacy and consistency make it a valuable tool for data scientists, and it could play a key role in academic research and drug design.
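The mRMR step described above can be illustrated with a minimal greedy sketch. It assumes the features have already been discretized and uses mutual information both for relevance (feature vs. class) and for redundancy (feature vs. already-selected features), combined with the difference criterion. `mutual_information` and `mrmr` are hypothetical names; this is not the StackedEnC-AOP authors' implementation.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts
        mi += (c / n) * log(c * n / (px[a] * py[b]))
    return mi


def mrmr(features, target, k):
    """Greedy mRMR (difference criterion): return k feature indices, each chosen to
    maximize relevance to the target minus mean redundancy with those already chosen."""
    relevance = [mutual_information(f, target) for f in features]
    # start from the single most relevant feature
    selected = [max(range(len(features)), key=lambda i: relevance[i])]
    while len(selected) < min(k, len(features)):
        best, best_score = None, float("-inf")
        for i in range(len(features)):
            if i in selected:
                continue
            redundancy = sum(mutual_information(features[i], features[j])
                             for j in selected) / len(selected)
            score = relevance[i] - redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected
```

With an exact duplicate in the candidate pool, the greedy step prefers a less redundant feature over the copy, which is the behavior that motivates mRMR over plain relevance ranking.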


Subjects
Antioxidants , Proteins , Antioxidants/chemistry , Proteins/chemistry , Proteins/metabolism , Computational Biology/methods , Machine Learning , Algorithms , Wavelet Analysis , Support Vector Machine , Databases, Protein , Position-Specific Scoring Matrices
3.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39101500

ABSTRACT

Genomic selection (GS) has emerged as an effective technology to accelerate crop hybrid breeding by enabling early selection prior to phenotype collection. Genomic best linear unbiased prediction (GBLUP) is a robust method that has been routinely used in GS breeding programs. However, GBLUP assumes that all markers contribute equally to the total genetic variance, which may not be the case. In this study, we developed a novel GS method, GA-GBLUP, that uses a genetic algorithm (GA) to select markers related to the target trait. We defined four fitness functions for optimization (AIC, BIC, R2, and HAT) to improve predictability, and we binned adjacent markers based on the principle of linkage disequilibrium to reduce model dimensionality. The results demonstrate that the GA-GBLUP model with the R2 and HAT fitness functions achieves much higher predictability than GBLUP for most traits in rice and maize datasets, particularly for traits with low heritability. Moreover, we have developed a user-friendly R package, GAGBLUP, for GS, which is freely available on CRAN (https://CRAN.R-project.org/package=GAGBLUP).
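The marker-binning idea can be sketched as follows. The sketch assumes genotypes coded 0/1/2, markers already sorted by genomic position, squared Pearson correlation (r²) between neighboring markers as the linkage disequilibrium measure, an illustrative threshold of 0.8, and averaging to form one pseudo-marker per bin. All of these choices and the function names are assumptions for illustration, not the procedure of the GA-GBLUP paper or the GAGBLUP package.

```python
from statistics import mean


def r_squared(a, b):
    """Squared Pearson correlation between two genotype vectors (0/1/2 codes)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # monomorphic marker: treat as uninformative
    return cov * cov / (va * vb)


def bin_adjacent_markers(markers, threshold=0.8):
    """Group consecutive markers (ordered by position) into bins: a marker joins
    the current bin while its r^2 with the previous marker is >= threshold."""
    if not markers:
        return []
    bins = [[0]]
    for i in range(1, len(markers)):
        if r_squared(markers[i - 1], markers[i]) >= threshold:
            bins[-1].append(i)
        else:
            bins.append([i])
    return bins


def bin_genotypes(markers, bins):
    """Collapse each bin into one pseudo-marker by averaging its members per individual."""
    return [[mean(col) for col in zip(*(markers[i] for i in b))] for b in bins]
```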


Subjects
Algorithms , Genomics , Selection, Genetic , Zea mays , Genomics/methods , Zea mays/genetics , Oryza/genetics , Models, Genetic , Plant Breeding/methods , Linkage Disequilibrium , Phenotype , Quantitative Trait Loci , Genome, Plant , Polymorphism, Single Nucleotide , Software
4.
Front Vet Sci ; 11: 1406107, 2024.
Article in English | MEDLINE | ID: mdl-39104548

ABSTRACT

Introduction: Clinical reasoning in veterinary medicine is often based on clinicians' personal experience combined with information from publications describing cohorts of patients. Studies on the use of scientific methods for individual patient decision-making are largely lacking. This also applies to predicting the underlying pathology in individual dogs with seizures. The aim of this study was to apply machine learning to predict the risk of structural epilepsy in dogs with seizures. Materials and methods: Dogs with a history of seizures were included both retrospectively and prospectively. Data on clinical history, neurological examination, diagnostic tests performed, and final diagnosis were collected. For data analysis, the Bayesian Network and Random Forest algorithms were used. A total of 33 features for Random Forest and 17 for Bayesian Network were available for analysis. Four feature selection methods were applied to select features for further analysis: Permutation Importance, Forward Selection, Random Selection, and Expert Opinion. The Bayesian Network and Random Forest algorithms were then trained to predict structural epilepsy using the selected features. Results: A total of 328 dogs of 119 different breeds were identified retrospectively between January 2017 and June 2021, of which 33.2% were diagnosed with structural epilepsy. In total, 89,848 models were trained. The Bayesian Network combined with Random feature selection performed best. It predicted structural epilepsy with an accuracy of 0.969 (sensitivity: 0.857, specificity: 1.000) among all dogs with seizures using the following features: age at first seizure, cluster seizures, seizure in the last 24 h, seizure in the last 6 months, and seizure in the last year. Conclusion: Machine learning algorithms such as Bayesian Networks and Random Forests identify dogs with structural epilepsy with high sensitivity and specificity. This information could provide some guidance to clinicians and pet owners in their clinical decision-making process.

5.
J Hazard Mater ; 478: 135407, 2024 Aug 03.
Article in English | MEDLINE | ID: mdl-39116745

ABSTRACT

The accurate spatial mapping of heavy metal levels in agricultural soils is crucial for environmental management and food security. However, the inherent limitations of traditional interpolation methods and emerging machine-learning techniques restrict their spatial prediction accuracy. This study aimed to refine the spatial prediction of heavy metal distributions in Guangxi, China, by integrating machine learning models and spatial regionalization indices (SRIs). The results demonstrated that random forest (RF) models incorporating SRIs outperformed artificial neural network and support vector regression models, achieving R2 values exceeding 0.96 for eight heavy metals on the test data. Hierarchical clustering for feature selection further improved the model performance. The optimized RF models accurately predicted the heavy metal distributions in agricultural soils across the province, revealing higher levels in the central-western regions and lower levels in the north and south. Notably, the models identified that 25.78 % of agricultural soils constitute hotspots with multiple co-occurring heavy metals, and over 6.41 million people are exposed to excessive soil heavy metal levels. Our findings provide valuable insights for the development of targeted strategies for soil pollution control and agricultural soil management to safeguard food security and public health.

6.
Sci Rep ; 14(1): 18696, 2024 Aug 12.
Article in English | MEDLINE | ID: mdl-39134565

ABSTRACT

In this paper, an enhanced version of equilibrium optimization (EO), named Levy-opposition-equilibrium optimization (LOEO), is proposed to select effective features in network intrusion detection systems (IDSs). The algorithm applies opposition-based learning (OBL) to improve population diversity and uses Levy flights to escape local optima. A binary version of the algorithm, BLOEO, is then employed for feature selection in IDSs. One of the main challenges in IDSs is the high-dimensional feature space, which contains many irrelevant or redundant features. The BLOEO algorithm is designed to intelligently select the most informative subset of features. Empirical findings on the NSL-KDD, UNSW-NB15, and CIC-IDS2017 datasets demonstrate the effectiveness of BLOEO: it effectively reduces the number of features while maintaining an intrusion detection accuracy above 95%. Specifically, on the UNSW-NB15 dataset, BLOEO selected only 10.8 features on average, achieving an accuracy of 97.6% and a precision of 100%.
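The two LOEO ingredients named above, and the step that turns a continuous position into a feature mask for the binary variant, are commonly implemented as below. This is a generic sketch under stated assumptions: Mantegna's algorithm with beta = 1.5 for Levy steps, a sigmoid (S-shaped) transfer function for binarization, and hypothetical function names. The equilibrium-optimizer update itself is not shown, and the paper's exact formulation may differ.

```python
import math
import random


def opposite(x, lb, ub):
    """Opposition-based learning: the opposite of x_j in [lb_j, ub_j] is lb_j + ub_j - x_j."""
    return [l + u - xj for xj, l, u in zip(x, lb, ub)]


def levy_step(beta=1.5, rng=random):
    """One Levy-distributed step length drawn with Mantegna's algorithm (0 < beta <= 2)."""
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    # v == 0 has probability zero; guard anyway to avoid division by zero
    return u / max(abs(v), 1e-12) ** (1 / beta)


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def to_binary(position, rng=random):
    """S-shaped transfer: feature j is selected (1) with probability sigmoid(x_j)."""
    return [1 if rng.random() < _sigmoid(xj) else 0 for xj in position]
```

In a binary wrapper of this kind, each 1 in the mask returned by `to_binary` would mark a traffic feature passed on to the intrusion classifier.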

7.
Comput Biol Med ; 180: 108996, 2024 Aug 12.
Article in English | MEDLINE | ID: mdl-39137669

ABSTRACT

Accurately differentiating indeterminate pulmonary nodules remains a significant challenge in clinical practice. This challenge becomes increasingly formidable when dealing with the vast number of radiomic features obtained from low-dose computed tomography, a lung cancer screening technique being rolled out in many areas of the world. Consequently, this study proposed the Altruistic Seagull Optimization Algorithm (AltSOA) for selecting radiomic features to predict the malignancy risk of pulmonary nodules. This approach incorporates altruism into the traditional seagull optimization algorithm to seek a globally optimal solution. A multi-objective fitness function was designed for training the pulmonary nodule prediction model, aiming to use fewer radiomic features while ensuring prediction performance. Among the global radiomic features, AltSOA identified 11 features of interest, including gray level co-occurrence matrix features. This automatically selected panel of radiomic features enabled precise prediction (area under the curve = 0.8383, 95% confidence interval 0.7862-0.8863) of the malignancy risk of pulmonary nodules, surpassing the proficiency of radiologists. Furthermore, the interpretability, clinical utility, and generalizability of the pulmonary nodule prediction model were thoroughly discussed. All results consistently underscore the superiority of AltSOA in predicting the malignancy risk of pulmonary nodules, and the proposed malignancy risk prediction model holds promise for enhancing existing lung cancer screening methods. The supporting source code for this work can be found at: https://github.com/zzl2022/PBMPN.
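A fitness function that trades prediction performance against the number of radiomic features is often written as a weighted sum of the error rate and the fraction of selected features. The sketch below assumes that scalarized form with a hypothetical weight alpha = 0.99; the abstract does not give the exact AltSOA fitness function, and `wrapper_fitness` is an illustrative name.

```python
def wrapper_fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Weighted-sum fitness to minimize: classification error vs. subset size.

    alpha close to 1 prioritizes predictive performance; (1 - alpha) penalizes
    the fraction of features kept.
    """
    if not 0 <= error_rate <= 1:
        raise ValueError("error_rate must be in [0, 1]")
    if n_total <= 0 or not 0 <= n_selected <= n_total:
        raise ValueError("need n_total > 0 and 0 <= n_selected <= n_total")
    return alpha * error_rate + (1 - alpha) * (n_selected / n_total)
```

Because the size term is small, candidate subsets are ranked mainly by error, and subset size acts as a tie-breaker: for example, an 11-feature subset scores better than a 50-feature subset with the same error.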

8.
Comput Biol Med ; 180: 108984, 2024 Aug 10.
Article in English | MEDLINE | ID: mdl-39128177

ABSTRACT

The identification of tumors through gene analysis in microarray data is a pivotal area of research in artificial intelligence and bioinformatics. This task is challenging due to the large number of genes relative to the limited number of observations, making feature selection a critical step. This paper introduces a novel wrapper feature selection method that leverages a hybrid optimization algorithm combining a genetic operator with a Sinh Cosh Optimizer (SCHO), termed SCHO-GO. The SCHO-GO algorithm is designed to avoid local optima, streamline the search process, and select the most relevant features without compromising classifier performance. Traditional methods often falter with extensive search spaces, necessitating hybrid approaches. Our method aims to reduce the dimensionality and improve the classification accuracy, which is essential in pattern recognition and data analysis. The SCHO-GO algorithm, integrated with a support vector machine (SVM) classifier, significantly enhances cancer classification accuracy. We evaluated the performance of SCHO-GO using the CEC'2022 benchmark function and compared it with seven well-known metaheuristic algorithms. Statistical analyses indicate that SCHO-GO consistently outperforms these algorithms. Experimental tests on eight microarray gene expression datasets, particularly the Gene Expression Cancer RNA-Seq dataset, demonstrate an impressive accuracy of 99.01% with the SCHO-GO-SVM model, highlighting its robustness and precision in handling complex datasets. Furthermore, the SCHO-GO algorithm excels in feature selection and solving mathematical benchmark problems, presenting a promising approach for tumor identification and classification in microarray data analysis.

9.
Eur J Pharm Sci ; : 106876, 2024 Aug 09.
Article in English | MEDLINE | ID: mdl-39128815

ABSTRACT

BACKGROUND: Valproic acid (VPA) is a commonly used broad-spectrum antiepileptic drug. In elderly patients with epilepsy, VPA plasma concentrations vary considerably. We aimed to establish a prediction model for VPA plasma concentration by combining machine learning and population pharmacokinetics (PPK). METHODS: A retrospective study was performed incorporating 43 variables, including PPK parameters. Recursive Feature Elimination with Cross-Validation was used for feature selection. Multiple algorithms were combined into an ensemble model, and the model was interpreted using SHapley Additive exPlanations. RESULTS: The inclusion of PPK parameters significantly enhanced the performance of the individual algorithm models. The combination of categorical boosting, light gradient boosting machine, and random forest (7:2:1), which had the highest R2 (0.74), was selected as the ensemble model. After feature selection, the model included 11 variables, and its predictive performance was comparable to that of the model incorporating all variables. CONCLUSIONS: Our model was specifically tailored for elderly patients with epilepsy, providing an efficient and cost-effective approach to predicting VPA plasma concentration. The model combined classical PPK with machine learning and was optimized through feature selection and algorithm integration. It can serve as a fundamental tool for clinicians in determining VPA plasma concentrations and individualizing dosing regimens accordingly.
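One reading of the 7:2:1 composition is a fixed weighted average of the three models' predicted concentrations. The sketch below assumes that reading; the abstract does not specify how the weights are applied, the model names are placeholders, and the base learners themselves are not shown.

```python
def weighted_ensemble(predictions, weights):
    """Blend per-model predictions with fixed relative weights (normalized to sum to 1).

    predictions: dict model_name -> list of predicted plasma concentrations
    weights:     dict model_name -> relative weight (e.g. 7, 2, 1)
    """
    total = sum(weights.values())
    names = list(weights)
    n = len(predictions[names[0]])
    if any(len(predictions[m]) != n for m in names):
        raise ValueError("all models must predict the same number of samples")
    return [sum(weights[m] * predictions[m][i] for m in names) / total
            for i in range(n)]
```

For a sample where the three models predict 10, 20, and 40 (in any concentration unit), the 7:2:1 blend gives (70 + 40 + 40) / 10 = 15.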

10.
Neural Netw ; 179: 106569, 2024 Jul 28.
Article in English | MEDLINE | ID: mdl-39121787

ABSTRACT

Driver intention recognition is a critical component of advanced driver assistance systems, with significant implications for improving vehicle safety, intelligence, and fuel economy. However, previous research on driver intention recognition has not fully considered the influence of the driving environment on speed intentions and has not exploited the temporal dependency inherent in lateral intentions to prevent erroneous changes in recognition. Furthermore, the coupling of speed and lateral intentions was overlooked; they were generally considered separately. To address these limitations, this study presents a unified deep-learning-based recognition approach for speed and lateral intentions. First, extensive naturalistic driving data are collected, and information related to road slope and driving trajectories is extracted. A comprehensive classification of driver intentions is then performed. Toeplitz inverse covariance-based clustering and trajectory clustering methods are applied separately to label speed and lateral intentions, so that the influence of driving environments and the coupling of speed and lateral intentions are integrated into intention recognition. Finally, a deep-learning-based unified recognition model for driver intention is developed. This model uses a hierarchical recognition approach for speed intentions and a double-layer network architecture with long short-term memory (LSTM) for recognizing lateral intentions. The validation results show that the proposed model can accurately and stably recognize both speed and lateral intentions in complex driving environments.

11.
J Mol Biol ; : 168741, 2024 Aug 07.
Article in English | MEDLINE | ID: mdl-39122168

ABSTRACT

The purpose of feature selection in protein sequence recognition problems is to select the optimal feature set, use it as training input for classifiers, and discover key sequence features of specific proteins. In the feature selection process, relevant features associated with the target task are retained, and irrelevant and redundant features are removed. Ideally, therefore, the selected feature combination has a smaller dimension and better performance indicators. This paper proposes an algorithm called IIFS2.0, based on a cache elimination strategy, which takes the locally optimal combination of cached feature subsets as its starting point. It uses the cache elimination strategy to search for new feature combinations, avoiding the drawbacks of human factors and excessive reliance on feature ranking results. We validated and analyzed its effectiveness on a protein dataset, demonstrating that IIFS2.0 significantly reduces the dimensionality of feature combinations while also improving various evaluation indicators. In addition, IIFS2.0 is available to researchers at http://112.124.26.17:8006/.

12.
Spectrochim Acta A Mol Biomol Spectrosc ; 323: 124913, 2024 Jul 31.
Article in English | MEDLINE | ID: mdl-39126867

ABSTRACT

In this study, a simple and accurate approach is proposed for enhancing the origin identification of raspberry samples by combining Raman spectral preprocessing techniques, feature selection, and machine learning algorithms. A window function was introduced and combined with a baseline removal technique to preprocess the Raman spectral data, reducing the dimensionality of the raw data while preserving the quality of the processed data. An optimization process was conducted to determine the optimal parameter for the window function, and a binning window width of 5 yielded the highest accuracy. Among the three feature selection techniques applied, the information gain model performed best at extracting discriminative spectral features. Finally, ten different machine learning algorithms were used to construct predictive models, and the optimal models were selected. Linear Support Vector Classifier (LinearSVC), Multi-Layer Perceptron Classifier (MLPClassifier), and Linear Discriminant Analysis (LDA) achieved accuracy, precision, recall, and F1 values above 0.96, while the Random Vector Functional Link Network Classifier (RVFLClassifier) exceeded 0.93 on these metrics. These results demonstrate the effectiveness of the proposed approach in identifying the origin of raspberry samples with high accuracy and robustness, providing a valuable tool for agricultural product authentication and quality control.
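Binning with a window width of 5 can be pictured as replacing each block of 5 adjacent Raman intensities with their mean, which shrinks the spectrum roughly five-fold. The sketch below assumes non-overlapping mean binning applied after subtracting a precomputed baseline; the function names are hypothetical, and the paper's window function may be defined differently.

```python
def subtract_baseline(intensities, baseline):
    """Point-wise baseline removal (the baseline estimate itself is assumed given)."""
    return [y - b for y, b in zip(intensities, baseline)]


def bin_spectrum(intensities, width=5):
    """Reduce a 1-D spectrum by averaging non-overlapping windows of `width` points.

    A trailing partial window is averaged over however many points remain.
    """
    if width < 1:
        raise ValueError("width must be >= 1")
    binned = []
    for start in range(0, len(intensities), width):
        window = intensities[start:start + width]
        binned.append(sum(window) / len(window))
    return binned
```

A 12-point spectrum binned with width 5 yields three values: two full windows and one partial window of 2 points.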

13.
BMC Biol ; 22(1): 167, 2024 Aug 07.
Article in English | MEDLINE | ID: mdl-39113021

ABSTRACT

BACKGROUND: Single-cell RNA sequencing enables cells to be studied individually, yet the high dimensionality of genes and low cell numbers make analysis challenging. Moreover, only a subset of the detected genes is involved in the biological processes underlying cell-type-specific functions. RESULT: In this study, we present COMSE, an unsupervised feature selection framework that uses community detection to capture informative genes from scRNA-seq data. COMSE identified homogeneous cell substates with high resolution, as demonstrated by distinguishing different cell cycle stages. Evaluations based on real and simulated scRNA-seq datasets showed that COMSE outperformed other methods in cell clustering assignment, even with high dropout rates. We also demonstrate that, by identifying communities of genes associated with batch effects, COMSE separates signals reflecting biological differences from noise arising from differences in sequencing protocols, thereby enabling integrated analysis of scRNA-seq datasets from different sources. CONCLUSIONS: COMSE provides an efficient unsupervised framework that selects highly informative genes in scRNA-seq data, improving cell substate identification and cell clustering. It identifies gene subsets that reveal biological and technical heterogeneity, supporting applications such as batch effect correction and pathway analysis. It also provides robust results for bulk RNA-seq data analysis.


Subjects
RNA-Seq , Single-Cell Gene Expression Analysis , Animals , Humans , Mice , RNA-Seq/methods
14.
Curr Genomics ; 25(3): 185-201, 2024 May 31.
Article in English | MEDLINE | ID: mdl-39087000

ABSTRACT

Background: Analyzing genomic sequences plays a crucial role in understanding biological diversity and classifying Bamboo species. Existing methods for genomic sequence analysis suffer from limitations such as complexity, low accuracy, and the need for constant reconfiguration in response to evolving genomic datasets. Aim: This study addresses these limitations by introducing a novel Dual Heuristic Feature Selection-based Ensemble Classification Model (DHFS-ECM) for the precise identification of Bamboo species from genomic sequences. Methods: The proposed DHFS-ECM method employs a Genetic Algorithm to perform dual heuristic feature selection. This process maximizes inter-class variance, leading to the selection of informative N-gram feature sets. Subsequently, intra-class variance levels are used to create optimal training and validation sets, ensuring comprehensive coverage of class-specific features. The selected features are then processed through an ensemble classification layer, which combines multiple stratification models for species-specific categorization. Results: Comparative analysis with state-of-the-art methods demonstrates that DHFS-ECM achieves improvements in accuracy (9.5%), precision (5.9%), recall (8.5%), and AUC (4.5%). Importantly, the model maintains its performance even as the number of species classes increases, owing to the continuous learning facilitated by the Dual Heuristic Genetic Algorithm Model. Conclusion: DHFS-ECM offers several key advantages, including efficient feature extraction, reduced model complexity, enhanced interpretability, and increased robustness and accuracy through the ensemble classification layer. These attributes make DHFS-ECM a promising tool for real-time clinical applications and a valuable contribution to the field of genomic sequence analysis.
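N-gram features of the kind DHFS-ECM selects from are typically k-mer frequencies. The sketch below assumes DNA sequences over the alphabet ACGT, skips k-mers containing ambiguous characters such as N, and normalizes counts to frequencies; it only builds the candidate feature vector and does not implement the dual heuristic genetic-algorithm selection. `kmer_profile` is a hypothetical name.

```python
from collections import Counter
from itertools import product


def kmer_profile(sequence, k=3, alphabet="ACGT"):
    """Normalized k-mer (N-gram) frequencies over all |alphabet|^k possible k-mers.

    k-mers containing characters outside the alphabet (e.g. 'N') are ignored.
    """
    seq = sequence.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    vocab = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[w] for w in vocab)
    return {w: (counts[w] / total if total else 0.0) for w in vocab}
```

For "ACGTAC" with k = 2, the five overlapping 2-mers are AC, CG, GT, TA, AC, so AC receives frequency 2/5 and the vector has 16 entries.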

15.
Comput Biol Med ; 180: 108982, 2024 Aug 06.
Article in English | MEDLINE | ID: mdl-39111152

ABSTRACT

Kidney transplant recipients face a high cardiovascular risk, which is a leading cause of death in this patient group. This article proposes the application of clustering techniques and feature selection to predict the survival outcomes of kidney transplant recipients based on machine learning techniques and mainstream statistical methods. First, feature selection techniques (Boruta, Random Survival Forest and Elastic Net) are used to detect the most relevant variables. Subsequently, each set of variables obtained by each feature selection technique is used as input for the clustering algorithms used (Consensus Clustering, Self-Organizing Map and Agglomerative Clustering) to determine which combination of feature selection, clustering algorithm and number of clusters maximizes intercluster variability. Next, the mechanism called False Clustering Discovery Reduction is applied to obtain the minimum number of statistically differentiable populations after applying a control metric. This metric is based on a variance test to confirm that reducing the number of clusters does not generate significant losses in the heterogeneity obtained. This approach was applied to the Organ Procurement and Transplantation Network medical dataset (n = 11,332). The combination of Random Survival Forest and consensus clustering yielded the optimal result of 4 clusters starting from 8 initial ones. Finally, for each population, Kaplan-Meier survival curves are generated to predict the survival of new patients based on the predictions of the XGBoost classifier, with an overall multi-class AUC of 98.11%.
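The per-cluster survival curves rely on the Kaplan-Meier product-limit estimator, S(t) = product over event times t_i <= t of (1 - d_i / n_i), where d_i is the number of events and n_i the number at risk at t_i. A minimal sketch with toy data follows; it assumes right-censored follow-up times and, as is conventional, treats patients censored at an event time as still at risk at that time. Assigning new patients to clusters with XGBoost is not shown, and `kaplan_meier` is an illustrative name.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time for each patient
    events: 1 if the event was observed, 0 if the patient was censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # gather every patient sharing this time point
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # events and censorings at t both leave the risk set afterwards
        at_risk -= deaths + censored
    return curve
```

For times [1, 2, 2, 3, 4] with events [1, 1, 0, 1, 0], the estimate steps to 0.8, 0.6, and 0.3 at times 1, 2, and 3.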

16.
Sensors (Basel) ; 24(15)2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39124036

ABSTRACT

The accuracy of classifying motor imagery (MI) activities is a significant challenge when using brain-computer interfaces (BCIs). BCIs allow people with motor impairments to control external devices directly with their brains, using electroencephalogram (EEG) patterns that translate brain activity into control signals. Many researchers have been developing MI-based BCI recognition systems using various time-frequency feature extraction and classification approaches. However, existing systems still struggle to achieve satisfactory performance because of the large number of non-discriminative and ineffective features. To address these problems, we propose a multiband decomposition-based feature extraction and classification method, together with a robust feature selection method, for MI tasks. Our method first splits the preprocessed EEG signal into four sub-bands. In each sub-band, a common spatial pattern (CSP) technique is used to extract narrowband-oriented features, yielding a high-dimensional feature vector. Subsequently, an effective feature selection method, Relief-F, reduces the dimensionality of the final features. Finally, the reduced feature vector is classified using advanced classification techniques. To evaluate the proposed model, we used three different EEG-based MI benchmark datasets, on which it achieved higher accuracy than existing systems. The model's strengths include its ability to effectively reduce feature dimensionality and improve classification accuracy through advanced feature extraction and selection methods.


Subjects
Brain-Computer Interfaces , Electroencephalography , Electroencephalography/methods , Humans , Algorithms , Signal Processing, Computer-Assisted , Imagination/physiology , Brain/physiology
17.
JMIR Med Inform ; 12: e52896, 2024 Jul 26.
Article in English | MEDLINE | ID: mdl-39087585

ABSTRACT

Background: The application of machine learning in health care often necessitates the use of hierarchical codes such as the International Classification of Diseases (ICD) and Anatomical Therapeutic Chemical (ATC) systems. These codes classify diseases and medications, respectively, thereby forming extensive data dimensions. Unsupervised feature selection tackles the "curse of dimensionality" and helps to improve the accuracy and performance of supervised learning models by reducing the number of irrelevant or redundant features and avoiding overfitting. Techniques for unsupervised feature selection, such as filter, wrapper, and embedded methods, are implemented to select the most important features with the most intrinsic information. However, they face challenges due to the sheer volume of ICD and ATC codes and the hierarchical structures of these systems. Objective: The objective of this study was to compare several unsupervised feature selection methods for ICD and ATC code databases of patients with coronary artery disease in different aspects of performance and complexity and select the best set of features representing these patients. Methods: We compared several unsupervised feature selection methods for 2 ICD and 1 ATC code databases of 51,506 patients with coronary artery disease in Alberta, Canada. Specifically, we used the Laplacian score, unsupervised feature selection for multicluster data, autoencoder-inspired unsupervised feature selection, principal feature analysis, and concrete autoencoders with and without ICD or ATC tree weight adjustment to select the 100 best features from over 9000 ICD and 2000 ATC codes. We assessed the selected features based on their ability to reconstruct the initial feature space and predict 90-day mortality following discharge. 
We also compared the complexity of the selected features by mean code level in the ICD or ATC tree and the interpretability of the features in the mortality prediction task using Shapley analysis. Results: In feature space reconstruction and mortality prediction, the concrete autoencoder-based methods outperformed other techniques. Particularly, a weight-adjusted concrete autoencoder variant demonstrated improved reconstruction accuracy and significant predictive performance enhancement, confirmed by DeLong and McNemar tests (P<.05). Concrete autoencoders preferred more general codes, and they consistently reconstructed all features accurately. Additionally, features selected by weight-adjusted concrete autoencoders yielded higher Shapley values in mortality prediction than most alternatives. Conclusions: This study scrutinized 5 feature selection methods in ICD and ATC code data sets in an unsupervised context. Our findings underscore the superiority of the concrete autoencoder method in selecting salient features that represent the entire data set, offering a potential asset for subsequent machine learning research. We also present a novel weight adjustment approach for the concrete autoencoders specifically tailored for ICD and ATC code data sets to enhance the generalizability and interpretability of the selected features.

18.
Heliyon ; 10(12): e32570, 2024 Jun 30.
Article in English | MEDLINE | ID: mdl-38975140

ABSTRACT

Predicting student academic performance remains a problem because of the limitations of existing methods, specifically low generalizability and lack of interpretability. This study proposes a new approach that addresses these problems and provides more reliable predictions. The proposed approach combines information gain (IG) and the Laplacian score (LS) for feature selection: the combination of IG and LS is used to rank features, and a Sequential Forward Selection mechanism then determines the most relevant indicators. In addition, a random forest algorithm combined with a genetic algorithm is introduced for multi-class classification. This approach aims to achieve higher accuracy and reliability than current techniques. The case study shows that the proposed strategy predicts student performance with an average accuracy of 93.11%, an improvement of at least 2.25% over the baseline methods. The findings were further confirmed by analyzing different evaluation metrics (Accuracy, Precision, Recall, F-Measure), demonstrating the efficiency of the proposed mechanism.
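Information gain, one of the two ranking criteria, and the Sequential Forward Selection step can be sketched for discrete features as below. The `score` callable passed to the SFS routine is a hypothetical stand-in for whatever subset-evaluation criterion is used (for example, a classifier's validation accuracy); the Laplacian score and the random forest/genetic algorithm classifier are not shown, and all function names are illustrative.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - sum_v p(X = v) * H(Y | X = v) for a discrete feature X."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def sequential_forward_selection(candidates, score, max_features):
    """Greedy SFS: repeatedly add the candidate that most improves score(subset);
    stop when no candidate improves it or max_features is reached."""
    selected, best = [], float("-inf")
    while len(selected) < max_features:
        trials = [(score(selected + [c]), c) for c in candidates if c not in selected]
        if not trials:
            break
        s, c = max(trials)
        if s <= best:
            break
        selected.append(c)
        best = s
    return selected
```

In a ranking-then-SFS pipeline, `candidates` would be the features ordered by the combined IG/LS ranking, and SFS stops as soon as adding another indicator no longer improves the score.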

19.
BMC Public Health ; 24(1): 1777, 2024 Jul 03.
Article in English | MEDLINE | ID: mdl-38961394

ABSTRACT

BACKGROUND: Dyslipidemia, characterized by variations in plasma lipid profiles, poses a global health threat linked to millions of deaths annually. OBJECTIVES: This study focuses on predicting dyslipidemia incidence using machine learning methods, addressing the crucial need for early identification and intervention. METHODS: The dataset, derived from the Lifestyle Promotion Project (LPP) in East Azerbaijan Province, Iran, underwent comprehensive preprocessing, merging, and null-value handling. Five distinct dyslipidemia-related variables were selected as targets. Normalization techniques and three feature selection algorithms were applied to enhance predictive modeling. RESULT: The results underscore the potential of machine learning algorithms, particularly the multi-layer perceptron neural network (MLP), which achieved higher performance metrics, such as accuracy, F1 score, sensitivity, and specificity, than the other machine learning methods. Random Forest also showed remarkable accuracy and outperformed K-Nearest Neighbors (KNN) in metrics such as precision, recall, and F1 score. The feature selection analysis detected meaningful patterns across the five dyslipidemia-related target variables, indicating fundamental commonalities among dyslipidemia-related factors. Features such as waist circumference, serum vitamin D, blood pressure, sex, age, diabetes, and physical activity were related to dyslipidemia. CONCLUSION: Together, these results highlight the complex nature of dyslipidemia and its connections with numerous factors, reinforcing the importance of applying machine learning methods to understand and precisely predict its incidence.


Subjects
Dyslipidemias , Machine Learning , Humans , Dyslipidemias/epidemiology , Incidence , Iran/epidemiology , Male , Female , Life Style , Algorithms , Health Promotion/methods , Middle Aged , Adult
20.
World J Surg Oncol ; 22(1): 177, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38970097

ABSTRACT

This study investigates the genetic factors contributing to the disparity in prostate cancer incidence and progression among African American men (AAM) compared to European American men (EAM). The research focuses on employing Weighted Gene Co-expression Network Analysis (WGCNA) on public microarray data obtained from prostate cancer patients. The study employed WGCNA to identify clusters of genes with correlated expression patterns, which were then analyzed for their connection to population backgrounds. Additionally, pathway enrichment analysis was conducted to understand the significance of the identified gene modules in prostate cancer pathways. The Least Absolute Shrinkage and Selection Operator (LASSO) and Correlation-based Feature Selection (CFS) methods were utilized for selection of biomarker genes. The results revealed 353 differentially expressed genes (DEGs) between AAM and EAM. Six significant gene expression modules were identified through WGCNA, showing varying degrees of correlation with prostate cancer. LASSO and CFS methods pinpointed critical genes, as well as six common genes between both approaches, which are indicative of their vital role in the disease. The XGBoost classifier validated these findings, achieving satisfactory prediction accuracy. Genes such as APRT, CCL2, BEX2, MGC26963, and PLAU were identified as key genes significantly associated with cancer progression. In conclusion, the research underlines the importance of incorporating AAM and EAM population diversity in genomic studies, particularly in cancer research. In addition, the study highlights the effectiveness of integrating machine learning techniques with gene expression analysis as a robust methodology for identifying critical genes in cancer research.
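CFS, one of the two gene-selection methods above, scores a candidate subset with Hall's merit heuristic, which rewards feature-class correlation and penalizes feature-feature correlation. The sketch below assumes the correlations are already computed and uses their absolute values; the choice of correlation measure and the subset search strategy are left out, and `cfs_merit` is a hypothetical name.

```python
from math import sqrt


def cfs_merit(feature_class_corr, feature_feature_corr):
    """Hall's CFS merit for a subset of k features:
    merit = k * mean|r_cf| / sqrt(k + k * (k - 1) * mean|r_ff|).

    feature_class_corr:   the k feature-class correlations
    feature_feature_corr: the k * (k - 1) / 2 pairwise feature-feature correlations
    """
    k = len(feature_class_corr)
    if k == 0:
        return 0.0
    r_cf = sum(abs(r) for r in feature_class_corr) / k
    r_ff = (sum(abs(r) for r in feature_feature_corr) / len(feature_feature_corr)
            if feature_feature_corr else 0.0)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)
```

With two genes equally correlated with the class, the merit exceeds the single-gene value only if the genes are not perfectly correlated with each other, and it grows as their mutual correlation falls; this is why CFS tends to drop redundant biomarker candidates.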


Subjects
Biomarkers, Tumor , Gene Expression Profiling , Gene Regulatory Networks , Prostatic Neoplasms , White People , Humans , Male , Prostatic Neoplasms/genetics , Prostatic Neoplasms/pathology , Biomarkers, Tumor/genetics , Gene Expression Profiling/methods , White People/genetics , White People/statistics & numerical data , Black or African American/genetics , Black or African American/statistics & numerical data , Gene Expression Regulation, Neoplastic , Transcriptome , Prognosis , Disease Progression