Results 1 - 20 of 123
1.
BMC Med Imaging ; 24(1): 177, 2024 Jul 19.
Article in English | MEDLINE | ID: mdl-39030508

ABSTRACT

BACKGROUND: Cancer pathology shows disease development and associated molecular features. It provides extensive phenotypic information that is cancer-predictive and has potential implications for planning treatment. Based on the exceptional performance of computational approaches in the field of digital pathology, the use of rich phenotypic information in digital pathology images has enabled us to distinguish low-grade gliomas (LGG) from high-grade gliomas (HGG). Because the differences between the textures are so slight, utilizing just one feature or a small number of features produces poor classification results. METHODS: In this work, multiple feature extraction methods that can extract distinct features from the texture of histopathology image data are used to compare the classification outcomes. The successful feature extraction algorithms GLCM, LBP, multi-LBGLCM, GLRLM, color moment features, and RSHD have been chosen in this paper. LBP and GLCM algorithms are combined to create LBGLCM. The LBGLCM feature extraction approach is extended in this study to multiple scales using an image pyramid, which is defined by sampling the image both in space and scale. First, the preprocessing stage is used to enhance the contrast of the images and remove noise and illumination effects. Second, the feature extraction stage is carried out to extract several important features (texture and color) from histopathology images. Third, the feature fusion and reduction step is applied to decrease the number of features that are processed, reducing the computation time of the suggested system. Finally, the classification stage categorizes the various brain cancer grades. We performed our analysis on the 821 whole-slide pathology images from glioma patients in the Cancer Genome Atlas (TCGA) dataset. Two types of brain cancer are included in the dataset: GBM and LGG (grades II and III). 
506 GBM images and 315 LGG images are included in our analysis, guaranteeing representation of various tumor grades and histopathological features. RESULTS: The fusion of textural and color characteristics was validated in the glioma patients using the 10-fold cross-validation technique with an accuracy of 95.8%, sensitivity of 96.4%, DSC of 96.7%, and specificity of 97.1%. The combination of the color and texture characteristics produced significantly better accuracy, which supported their synergistic significance in the predictive model. The result indicates that the textural characteristics can provide an objective, accurate, and comprehensive glioma prediction when paired with conventional imagery. CONCLUSION: The results outperform current approaches for distinguishing LGG from HGG and provide competitive performance in classifying four categories of glioma in the literature. The proposed model can help stratify patients in clinical studies, choose patients for targeted therapy, and customize specific treatment schedules.


Subjects
Algorithms , Brain Neoplasms , Color , Glioma , Neoplasm Grading , Humans , Brain Neoplasms/diagnostic imaging , Brain Neoplasms/pathology , Brain Neoplasms/classification , Glioma/diagnostic imaging , Glioma/pathology , Glioma/classification , Diagnosis, Computer-Assisted/methods , Image Interpretation, Computer-Assisted/methods
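
As an illustration of one texture descriptor named in the abstract above, a gray-level co-occurrence matrix (GLCM) counts how often pairs of gray levels occur at a fixed pixel offset. The sketch below is a generic pure-Python version, not the authors' implementation; the function names, the horizontal offset, and the toy image are assumptions chosen for illustration.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count co-occurrences of gray levels at pixel offset (dx, dy)."""
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            # Only count pairs whose second pixel lies inside the image
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def glcm_contrast(matrix):
    """Contrast feature: sum of p(i, j) * (i - j)^2 over the normalized GLCM."""
    total = sum(sum(row) for row in matrix)
    return sum(
        (count / total) * (i - j) ** 2
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
    )


# Hypothetical 3x3 image quantized to three gray levels (0, 1, 2)
toy_image = [
    [0, 0, 1],
    [1, 2, 2],
    [0, 1, 2],
]
counts = glcm(toy_image, levels=3)
contrast = glcm_contrast(counts)
```

In practice several offsets and directions are usually combined, and further statistics (energy, homogeneity, correlation) are derived from the same matrix.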
2.
Sensors (Basel) ; 23(18)2023 Sep 13.
Article in English | MEDLINE | ID: mdl-37765916

ABSTRACT

Technological advancements in healthcare, production, automobile, and aviation industries have shifted working styles from manual to automatic. This automation requires smart, intellectual, and safe machinery to develop an accurate and efficient brain-computer interface (BCI) system. However, developing such BCI systems requires effective processing and analysis of human physiology. Electroencephalography (EEG) is one such technique that provides a low-cost, portable, non-invasive, and safe solution for BCI systems. However, the non-stationary and nonlinear nature of EEG signals makes it difficult for experts to perform accurate subjective analyses. Hence, there is an urgent need for the development of automatic mental state detection. This paper presents the classification of three mental states using an ensemble of the tunable Q wavelet transform, the multilevel discrete wavelet transform, and the flexible analytic wavelet transform. Various features are extracted from the subbands of EEG signals during focused, unfocused, and drowsy states. Separate and fused features from ensemble decomposition are classified using an optimized ensemble classifier. Our analysis shows that the fusion of features results in a dimensionality reduction. The proposed model obtained the highest accuracies of 92.45% and 97.8% with ten-fold cross-validation and the iterative majority voting technique. The proposed method is suitable for real-time mental state detection to improve BCI systems.


Subjects
Algorithms , Brain-Computer Interfaces , Humans , Electroencephalography/methods , Wavelet Analysis , Signal Processing, Computer-Assisted
3.
Sensors (Basel) ; 23(23)2023 Dec 03.
Article in English | MEDLINE | ID: mdl-38067959

ABSTRACT

The Internet of Things (IoT) is a powerful technology that connects its users worldwide with everyday objects without any human intervention. However, the use of IoT infrastructure in fields such as smart homes, healthcare and transportation also raises potential risks of attacks and anomalies caused by node security breaches. Therefore, an Intrusion Detection System (IDS) must be developed to substantially improve the security of IoT technologies. This paper proposes a Logistic Regression based Ensemble Classifier (LREC) for effective IDS implementation. The LREC combines AdaBoost and Random Forest (RF) to develop an effective classifier using the iterative ensemble approach. The issue of data imbalance is addressed using the adaptive synthetic sampling (ADASYN) approach. Further, irrelevant features are eliminated using recursive feature elimination (RFE). Two datasets, BoT-IoT and TON-IoT, are used to analyze the proposed RFE-LREC method. The RFE-LREC is analyzed on the basis of accuracy, recall, precision, F1-score, false alarm rate (FAR), receiver operating characteristic (ROC) curve, true negative rate (TNR) and Matthews correlation coefficient (MCC). Existing approaches, namely the NetFlow-based feature set, TL-IDS and LSTM, are compared with the RFE-LREC. The classification accuracy of RFE-LREC for the BoT-IoT dataset is 99.99%, which is higher than those of TL-IDS and LSTM.

4.
Sensors (Basel) ; 23(3)2023 Jan 25.
Article in English | MEDLINE | ID: mdl-36772390

ABSTRACT

Nowadays, machine learning (ML) is a cutting-edge technology widely used in the medical domain and health informatics, especially in the diagnosis and prognosis of cardiovascular diseases. Therefore, we propose an ML-based soft-voting ensemble classifier (SVEC) for the predictive modeling of acute coronary syndrome (ACS) outcomes such as STEMI and NSTEMI, discharge reasons for the patients admitted to hospitals, and death types for the affected patients during the hospital stay. We used the Korea Acute Myocardial Infarction Registry (KAMIR-NIH) dataset, which has 13,104 patients' data containing 551 features. After data extraction and preprocessing, we used 125 useful features and applied the SMOTETomek hybrid sampling technique to oversample the minority classes and address data imbalance. Our proposed SVEC applied three ML algorithms (random forest, extra tree, and the gradient-boosting machine) for predictive modeling of our target variables, and was compared with the performances of all base classifiers. The experiments showed that the SVEC outperformed other ML-based predictive models in accuracy (99.0733%), precision (99.0742%), recall (99.0734%), F1-score (99.9719%), and the area under the ROC curve (AUC) (99.9702%). Overall, the performance of the SVEC was better than other applied models, but the AUC was slightly lower than that of the extra tree classifier for the predictive modeling of ACS outcomes. The proposed predictive model outperformed other ML-based models; hence it can be used practically in hospitals for the diagnosis and prediction of heart problems so that proper treatments can be chosen in a timely manner and the occurrence of disease predicted more accurately.


Subjects
Acute Coronary Syndrome , Humans , Length of Stay , Acute Coronary Syndrome/diagnosis , Prognosis , Algorithms , Machine Learning
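
Soft voting, as named in the abstract above, is commonly implemented by averaging the class probabilities produced by each base model and predicting the class with the highest average. The sketch below shows that generic rule with equal weights; the function name and the probability values are hypothetical, and the paper's exact weighting is not stated here.

```python
def soft_vote(probabilities_per_model):
    """Average per-class probabilities across models and pick the top class."""
    n_models = len(probabilities_per_model)
    n_classes = len(probabilities_per_model[0])
    averaged = [
        sum(model[k] for model in probabilities_per_model) / n_models
        for k in range(n_classes)
    ]
    predicted = max(range(n_classes), key=lambda k: averaged[k])
    return predicted, averaged


# Hypothetical class probabilities from three base models for one patient
model_outputs = [
    [0.6, 0.4],  # e.g. random forest
    [0.3, 0.7],  # e.g. extra trees
    [0.2, 0.8],  # e.g. gradient boosting
]
label, averaged = soft_vote(model_outputs)
```

Unlike hard (majority) voting, this rule lets a confident model outweigh two uncertain ones, because probabilities rather than labels are combined.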
5.
J Digit Imaging ; 36(4): 1460-1479, 2023 08.
Article in English | MEDLINE | ID: mdl-37145248

ABSTRACT

An automated diagnosis system is crucial for helping radiologists identify brain abnormalities efficiently. The convolutional neural network (CNN) algorithm of deep learning has the advantage of automated feature extraction, which is beneficial for an automated diagnosis system. However, several challenges in the CNN-based classifiers of medical images, such as a lack of labeled data and class imbalance problems, can significantly hinder the performance. Meanwhile, the expertise of multiple clinicians may be required to achieve accurate diagnoses, which can be reflected in the use of multiple algorithms. In this paper, we present Deep-Stacked CNN, a deep heterogeneous model based on stacked generalization to harness the advantages of different CNN-based classifiers. The model aims to improve robustness in the task of multi-class brain disease classification when we have no opportunity to train single CNNs on sufficient data. We propose two levels of learning processes to obtain the desired model. At the first level, different pre-trained CNNs fine-tuned via transfer learning are selected as the base classifiers through several procedures. Each base classifier has a unique expert-like character, which provides diversity to the diagnosis outcomes. At the second level, the base classifiers are stacked together through a neural network, which acts as the meta-learner that best combines their outputs and generates the final prediction. The proposed Deep-Stacked CNN obtained an accuracy of 99.14% when evaluated on the untouched dataset. This model shows its superiority over existing methods in the same domain. It also requires fewer parameters and computations while maintaining outstanding performance.


Subjects
Brain Diseases , Neural Networks, Computer , Humans , Magnetic Resonance Imaging/methods , Algorithms , Brain Diseases/diagnostic imaging , Brain/diagnostic imaging
6.
BMC Bioinformatics ; 23(1): 90, 2022 Mar 14.
Article in English | MEDLINE | ID: mdl-35287576

ABSTRACT

BACKGROUND: Current protein family modeling methods like profile Hidden Markov Model (pHMM), k-mer based methods, and deep learning-based methods do not provide very accurate protein function prediction for proteins in the twilight zone, due to low sequence similarity to reference proteins with known functions. RESULTS: We present a novel method, EnsembleFam, aiming at better function prediction for proteins in the twilight zone. EnsembleFam extracts the core characteristics of a protein family using similarity and dissimilarity features calculated from sequence homology relations. EnsembleFam trains three separate Support Vector Machine (SVM) classifiers for each family using these features, and an ensemble prediction is made to classify novel proteins into these families. Extensive experiments are conducted using the Clusters of Orthologous Groups (COG) dataset and G Protein-Coupled Receptor (GPCR) dataset. EnsembleFam not only outperforms state-of-the-art methods on the overall dataset but also provides a much more accurate prediction for twilight zone proteins. CONCLUSIONS: EnsembleFam, a machine learning method to model protein families, can be used to better identify members with very low sequence homology. Using EnsembleFam, protein functions can be predicted using just sequence information with better accuracy than state-of-the-art methods.


Subjects
Proteins , Support Vector Machine , Humans , Proteins/metabolism
7.
J Biomed Inform ; 135: 104216, 2022 11.
Article in English | MEDLINE | ID: mdl-36208833

ABSTRACT

Robust and rapid mortality prediction is crucial in intensive care units because it is considered one of the critical steps for treating patients with serious conditions. Combining mortality prediction with the length of stay (LoS) prediction adds another level of importance to these models. No studies in the literature predict such tasks for neonates, especially using time-series data and dynamic ensemble techniques. Dynamic ensembles are novel techniques that dynamically select the base classifiers for each new case. Medically, implementing an accurate machine learning model is insufficient to gain the trust of physicians. The model must be able to justify its decisions. While explainable AI (XAI) techniques can be used to handle this challenge, no studies have been done in this regard for neonate monitoring in the neonatal intensive care unit (NICU). This study utilizes advanced machine learning approaches to predict mortality and LoS through data-driven learning. We propose a multilayer dynamic ensemble-based model to predict mortality as a classification task and LoS as a regression task for neonates admitted to the NICU. The model has been built based on the patient's time-series data of the first 24 h in the NICU. We utilized a cohort of 3,133 infants from the MIMIC-III real dataset to build and optimize the selected algorithms. It has been shown that the dynamic ensemble models achieved better results than other classifiers, and static ensemble regressors achieved better results than classical machine learning regressors. The proposed optimized model is supported by three well-known explainability techniques: SHAP, decision tree visualization, and a rule-based system. To provide online assistance to physicians in monitoring and managing neonates in the NICU, we implemented a web-based clinical decision support system based on the most accurate models and selected XAI techniques. 
The code of the proposed models is publicly available at https://github.com/InfoLab-SKKU/neonateMortalityPrediction.


Subjects
Algorithms , Machine Learning , Infant, Newborn , Humans , Intensive Care Units , Intensive Care Units, Neonatal , Length of Stay
8.
Sensors (Basel) ; 22(18)2022 Sep 14.
Article in English | MEDLINE | ID: mdl-36146299

ABSTRACT

Wind turbines are widely used worldwide to generate clean, renewable energy. The biggest issue with a wind turbine is reducing failures and downtime, which lowers costs associated with operations and maintenance. Wind turbines' consistency and timely maintenance can enhance their performance and dependability. Still, the traditional routine configuration makes detecting faults of wind turbines difficult. Supervisory control and data acquisition (SCADA) produces reliable and affordable quality data for the health condition of wind turbine operations. For wind power to be sufficiently reliable, it is crucial to retrieve useful information from SCADA successfully. This article proposes a new AdaBoost, K-nearest neighbors, and logistic regression-based stacking ensemble (AKL-SE) classifier to classify the faults of the wind turbine condition monitoring system. A stacking ensemble classifier integrates different classification models to enhance the model's accuracy. We have used three classifiers, AdaBoost, K-nearest neighbors, and logistic regression, as base models to make output. The output of these three classifiers is used as input in the logistic regression classifier's meta-model. To improve the data validity, SCADA data are first preprocessed by cleaning and removing any abnormal data. Next, the Pearson correlation coefficient was used to choose the input variables. The Stacking Ensemble classifier was trained using these parameters. The analysis demonstrates that the suggested method successfully identifies faults in wind turbines when applied to local 3 MW wind turbines. The proposed approach shows the potential for effective wind energy use, which could encourage the use of clean energy.
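
The abstract above states that the Pearson correlation coefficient was used to choose the input variables. One common way to do this, shown here as a sketch, is to keep variables whose absolute correlation with the target exceeds a threshold. The threshold rule, the variable names, and the values are assumptions for illustration and are not taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, threshold=0.5):
    """Keep names of columns whose |r| with the target meets the threshold."""
    return [
        name
        for name, values in columns.items()
        if abs(pearson(values, target)) >= threshold
    ]


# Hypothetical SCADA-like variables; names and values are illustrative only
columns = {
    "wind_speed": [3.0, 5.0, 7.0, 9.0, 11.0],
    "ambient_temp": [15.0, 14.0, 16.0, 15.0, 15.0],
}
power_output = [0.2, 0.6, 1.1, 1.9, 2.6]
selected = select_features(columns, power_output)
```

Pearson correlation only captures linear relationships, so a variable with a strong nonlinear effect could be discarded by this kind of filter.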

9.
Sensors (Basel) ; 22(22)2022 Nov 18.
Article in English | MEDLINE | ID: mdl-36433540

ABSTRACT

The goal of this research was to improve wetland classification by fully exploiting multi-source remotely sensed data. Three distinct classifiers were designed to distinguish individual or compound wetland categories using random forest (RF) classification. They were determined, in part, to best use the available remotely sensed features in order to maximize that information and to maximize classification accuracy. The results from these classifiers were integrated according to Dempster-Shafer theory (D-S theory). The developed method was tested on data collected from a study area in Northern Alberta, Canada. The data utilized were Landsat-8 and Sentinel-2 (multi-spectral), Sentinel-1 (synthetic aperture radar, SAR), and digital elevation model (DEM). Classification of fen, bog, marsh, swamps, and upland resulted in an overall accuracy of 0.93 using the proposed methodology, an improvement of 5% when compared to a traditional classification method based on the aggregated features from these data sources. It was noted that, with the traditional method, some pixels were misclassified with a high level of confidence (>85%). Such misclassification was significantly reduced (by ~10%) by the proposed method. Results also showed that some features important in separating compound wetland classes were not considered important using the traditional method based on the RF feature selection mechanism. When used in the proposed method, these features increased the classification accuracy, which demonstrated that the proposed method provided an effective means to fully employ available data to improve wetland classification.


Subjects
Radar , Wetlands , Information Storage and Retrieval , Canada
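
The fusion step described in the abstract above relies on Dempster-Shafer theory, whose core operation is Dempster's rule of combination: masses of intersecting hypotheses are multiplied and summed, and the result is renormalized by the non-conflicting mass. The sketch below implements that standard rule; the class names and mass values are hypothetical and only illustrate how a compound class ("fen or bog") interacts with single classes.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # mass that would fall on the empty set
    if conflict >= 1.0:
        raise ValueError("Sources are in total conflict; the rule is undefined.")
    # Renormalize by the mass that is not in conflict
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


FEN, BOG = frozenset({"fen"}), frozenset({"bog"})
EITHER = FEN | BOG  # compound class: "fen or bog"

# Hypothetical evidence from two classifiers for one pixel
classifier_a = {FEN: 0.6, EITHER: 0.4}
classifier_b = {BOG: 0.3, EITHER: 0.7}
fused = dempster_combine(classifier_a, classifier_b)
```

Here the conflicting mass is 0.6 × 0.3 = 0.18, so the remaining products are divided by 0.82, and the fused masses again sum to one.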
10.
Sensors (Basel) ; 22(23)2022 Dec 04.
Article in English | MEDLINE | ID: mdl-36502183

ABSTRACT

Emotion charting using multimodal signals has gained great demand for stroke-affected patients, for psychiatrists while examining patients, and for neuromarketing applications. Multimodal signals for emotion charting include electrocardiogram (ECG) signals, electroencephalogram (EEG) signals, and galvanic skin response (GSR) signals. EEG, ECG, and GSR are also known as physiological signals, which can be used for identification of human emotions. Due to the unbiased nature of physiological signals, this field has become a great motivation in recent research as physiological signals are generated autonomously from human central nervous system. Researchers have developed multiple methods for the classification of these signals for emotion detection. However, due to the non-linear nature of these signals and the inclusion of noise, while recording, accurate classification of physiological signals is a challenge for emotion charting. Valence and arousal are two important states for emotion detection; therefore, this paper presents a novel ensemble learning method based on deep learning for the classification of four different emotional states including high valence and high arousal (HVHA), low valence and low arousal (LVLA), high valence and low arousal (HVLA) and low valence high arousal (LVHA). In the proposed method, multimodal signals (EEG, ECG, and GSR) are preprocessed using bandpass filtering and independent components analysis (ICA) for noise removal in EEG signals followed by discrete wavelet transform for time domain to frequency domain conversion. Discrete wavelet transform results in spectrograms of the physiological signal and then features are extracted using stacked autoencoders from those spectrograms. A feature vector is obtained from the bottleneck layer of the autoencoder and is fed to three classifiers SVM (support vector machine), RF (random forest), and LSTM (long short-term memory) followed by majority voting as ensemble classification. 
The proposed system is trained and tested on the AMIGOS dataset with k-fold cross-validation. It obtained a highest accuracy of 94.5% and shows improved results compared with other state-of-the-art methods.


Subjects
Arousal , Emotions , Humans , Emotions/physiology , Arousal/physiology , Wavelet Analysis , Electroencephalography/methods , Support Vector Machine
11.
Sensors (Basel) ; 22(21)2022 Nov 06.
Article in English | MEDLINE | ID: mdl-36366249

ABSTRACT

Rapid advancements in the medical field have drawn much attention to automatic emotion classification from EEG data. People's emotional states are crucial factors in how they behave and interact physiologically. The diagnosis of patients' mental disorders is one potential medical use. When feeling well, people work and communicate more effectively. Negative emotions can be detrimental to both physical and mental health. Many earlier studies that investigated the use of the electroencephalogram (EEG) for emotion classification have focused on collecting data from the whole brain because of the rapidly developing science of machine learning. However, researchers cannot understand how various emotional states and EEG traits are related. This work seeks to classify EEG signals into positive, negative, and neutral emotional states by using a stacking-ensemble-based classification model that boosts accuracy to increase the efficacy of emotion classification using EEG. The selected features are used to train a model that was created using a random forest, light gradient boosting machine, and gradient-boosting-based stacking ensemble classifier (RLGB-SE), where the base classifiers random forest (RF), light gradient boosting machine (LightGBM), and gradient boosting classifier (GBC) were used at level 0. The meta classifier (RF) at level 1 is trained using the results from each base classifier to acquire the final predictions. The suggested ensemble model achieves a classification accuracy of 99.55%. Additionally, on the compared performance indices, the suggested technique outperforms the base classifiers. Compared with state-of-the-art techniques, the proposed stacking strategy shows promising performance for emotion categorization.


Subjects
Electroencephalography , Emotions , Humans , Electroencephalography/methods , Emotions/physiology , Machine Learning , Brain
12.
Sensors (Basel) ; 23(1)2022 Dec 30.
Article in English | MEDLINE | ID: mdl-36617019

ABSTRACT

Visual analysis of an electroencephalogram (EEG) by medical professionals is highly time-consuming and the information is difficult to process. To overcome these limitations, several automated seizure detection strategies have been introduced by combining signal processing and machine learning. This paper proposes a hybrid optimization-controlled ensemble classifier comprising the AdaBoost classifier, random forest (RF) classifier, and the decision tree (DT) classifier for the automatic analysis of an EEG signal dataset to predict an epileptic seizure. The EEG signal is pre-processed initially to make it suitable for feature selection. The feature selection process receives the alpha, beta, delta, theta, and gamma wave data from the EEG, where the significant features, such as statistical features, wavelet features, and entropy-based features, are extracted by the proposed hybrid seek optimization algorithm. These extracted features are fed forward to the proposed ensemble classifier that produces the predicted output. By the combination of corvid and gregarious search agent characteristics, the proposed hybrid seek optimization technique has been developed, and is used to evaluate the fusion parameters of the ensemble classifier. The suggested technique's accuracy, sensitivity, and specificity are determined to be 96.6120%, 94.6736%, and 91.3684%, respectively, for the CHB-MIT database. This demonstrates the effectiveness of the suggested technique for early seizure prediction. The accuracy, sensitivity, and specificity of the proposed technique are 95.3090%, 93.1766%, and 90.0654%, respectively, for the Siena Scalp database, again demonstrating its efficacy in the early seizure prediction process.


Subjects
Epilepsy , Seizures , Humans , Seizures/diagnosis , Epilepsy/diagnosis , Electroencephalography/methods , Signal Processing, Computer-Assisted , Algorithms , Support Vector Machine
13.
Sensors (Basel) ; 22(19)2022 Sep 25.
Article in English | MEDLINE | ID: mdl-36236367

ABSTRACT

Diabetes is a chronic disease that continues to be a primary and worldwide health concern since the health of the entire population has been affected by it. Over the years, many academics have attempted to develop a reliable diabetes prediction model using machine learning (ML) algorithms. However, these research investigations have had a minimal impact on clinical practice as the current studies focus mainly on improving the performance of complicated ML models while ignoring their explainability to clinical situations. Therefore, the physicians find it difficult to understand these models and rarely trust them for clinical use. In this study, a carefully constructed, efficient, and interpretable diabetes detection method using an explainable AI has been proposed. The Pima Indian diabetes dataset was used, containing a total of 768 instances where 268 are diabetic, and 500 cases are non-diabetic with several diabetic attributes. Here, six machine learning algorithms (artificial neural network (ANN), random forest (RF), support vector machine (SVM), logistic regression (LR), AdaBoost, XGBoost) have been used along with an ensemble classifier to diagnose the diabetes disease. For each machine learning model, global and local explanations have been produced using the Shapley additive explanations (SHAP), which are represented in different types of graphs to help physicians in understanding the model predictions. The balanced accuracy of the developed weighted ensemble model was 90% with a F1 score of 89% using a five-fold cross-validation (CV). The median values were used for the imputation of the missing values and the synthetic minority oversampling technique (SMOTETomek) was used to balance the classes of the dataset. The proposed approach can improve the clinical understanding of a diabetes diagnosis and help in taking necessary action at the very early stages of the disease.


Subjects
Diabetes Mellitus , Potassium Iodide , Diabetes Mellitus/diagnosis , Humans , Logistic Models , Machine Learning , Neural Networks, Computer
14.
Molecules ; 27(22)2022 Nov 15.
Article in English | MEDLINE | ID: mdl-36431973

ABSTRACT

In recent years, single-cell RNA sequencing technology (scRNA-seq) has developed rapidly and has been widely used in biological and medical research, such as in expression heterogeneity and transcriptome dynamics of single cells. The investigation of RNA velocity is a new topic in the study of cellular dynamics using single-cell RNA sequencing data. It can recover directional dynamic information from single-cell transcriptomics by linking measurements to the underlying dynamics of gene expression. Predicting the RNA velocity vector of each cell based on its gene expression data and formulating RNA velocity prediction as a classification problem is a new research direction. In this paper, we develop a cascade forest model to predict RNA velocity. Compared with other popular ensemble classifiers, such as XGBoost, RandomForest, LightGBM, NGBoost, and TabNet, it performs better in predicting RNA velocity. This paper provides guidance for researchers in selecting and applying appropriate classification tools in their analytical work and suggests some possible directions for future improvement of classification tools.


Subjects
Biomedical Research , RNA , Humans , RNA/genetics , Sequence Analysis, RNA , Transcriptome , Research Personnel
15.
J Biomed Inform ; 118: 103803, 2021 06.
Article in English | MEDLINE | ID: mdl-33965639

ABSTRACT

The importance of automating the diagnosis of Alzheimer disease (AD) towards facilitating its early prediction has long been emphasized, hampered in part by lack of empirical support. Given the evident association of AD with age and the increasing aging population owing to the general well-being of individuals, there have been unprecedented estimated economic complications. Consequently, many recent studies have attempted to employ the language deficiency caused by cognitive decline in automating the diagnostic task via training machine learning (ML) algorithms with linguistic patterns and deficits. In this study, we aim to develop multiple heterogeneous stacked fusion models that harness the advantages of several base learning algorithms to improve the overall generalizability and robustness of AD diagnostic ML models, where we parallelly utilized two different written and spoken-based datasets to train our stacked fusion models. Further, we examined the effect of linking these two datasets to develop a hybrid stacked fusion model that can predict AD from written and spoken languages. Our feature spaces involved two widely used linguistic patterns: lexicosyntactics and character n-gram spaces. We firstly investigated lexicosyntactics of AD alongside healthy controls (HC), where we explored a few new lexicosyntactic features, then optimized the lexicosyntactic feature space by proposing a correlation feature selection technique that eliminates features based on their feature-feature inter-correlations and feature-target correlations according to a certain threshold. Our stacked fusion models establish benchmarks on both datasets with AUC of 98.1% and 99.47% for the spoken and written-based datasets, respectively, and corresponding accuracy and F1 score values around 95% on spoken-based dataset and around 97% on the written-based dataset. 
Likewise, the hybrid stacked fusion model on linked data presents an optimal performance with 99.2% AUC as well as accuracy and F1 score falling around 97%. In view of the achieved performance and enhanced generalizability of such fusion models over single classifiers, this study suggests replacing the initial traditional screening test with such models that can be embedded into an online format for a fully automated remote diagnosis.


Subjects
Alzheimer Disease , Cognitive Dysfunction , Aged , Algorithms , Alzheimer Disease/diagnosis , Cognitive Dysfunction/diagnosis , Humans , Language , Machine Learning , Magnetic Resonance Imaging
16.
BMC Med Inform Decis Mak ; 21(1): 1, 2021 01 02.
Article in English | MEDLINE | ID: mdl-33388057

ABSTRACT

BACKGROUND: Intrauterine Insemination (IUI) outcome prediction is a challenging issue for assisted reproductive technology (ART) practitioners. Predicting the success or failure of IUI based on the couples' features can help physicians decide whether to suggest IUI to a couple and whether to continue the treatment. Many previous studies have focused on predicting the in vitro fertilization (IVF) and intracytoplasmic sperm injection (ICSI) outcome using machine learning algorithms. However, to the best of our knowledge, few studies have focused on predicting the outcome of IUI. The main aim of this study is to propose an automatic classification and feature scoring method to predict the IUI outcome and rank the most significant features. METHODS: For this purpose, a novel approach combining complex network-based feature engineering and stacked ensemble (CNFE-SE) is proposed. Three complex networks are extracted considering the patients' data similarities. The feature engineering step is performed on the complex networks. The original feature set and/or the engineered features are fed to the proposed stacked ensemble to classify and predict the IUI outcome for couples per IUI treatment cycle. Our study is a retrospective study of 5 years of data from couples undergoing IUI. Data were collected from the Reproductive Biomedicine Research Center, Royan Institute, describing 11,255 IUI treatment cycles for 8,360 couples. Our dataset includes the couples' demographic characteristics, historical data about the patients' diseases, the clinical diagnosis, the treatment plans and the prescribed drugs during the cycles, semen quality, laboratory tests and the clinical pregnancy outcome. 
RESULTS: Experimental results show that the proposed method outperforms the compared methods with Area under receiver operating characteristics curve (AUC) of 0.84 ± 0.01, sensitivity of 0.79 ± 0.01, specificity of 0.91 ± 0.01, and accuracy of 0.85 ± 0.01 for the prediction of IUI outcome. CONCLUSIONS: The most important predictors for predicting IUI outcome are semen parameters (sperm motility and concentration) as well as female body mass index (BMI).
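
The network-based feature engineering step described in this abstract can be illustrated with a minimal sketch. The abstract does not state which similarity measure or which network features were used, so the Euclidean distance threshold and the node-degree feature below are assumptions chosen purely for illustration; the function names are hypothetical as well.

```python
from math import dist


def similarity_network(rows, threshold):
    """Link two records when their Euclidean distance is <= threshold.

    The distance measure and threshold are illustrative assumptions,
    not the similarity definition used in the study.
    """
    adjacency = {i: set() for i in range(len(rows))}
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if dist(rows[i], rows[j]) <= threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


def add_degree_feature(rows, threshold):
    """Append each record's node degree in the network as an extra feature."""
    adjacency = similarity_network(rows, threshold)
    return [list(row) + [len(adjacency[i])] for i, row in enumerate(rows)]


# Three hypothetical records with two numeric features each.
records = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
augmented = add_degree_feature(records, threshold=1.5)
```

The augmented rows could then be passed to any classifier alongside the original features, which is the role the abstract assigns to the stacked ensemble.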


Subjects
Semen Analysis , Sperm Motility , Female , In Vitro Fertilization , Humans , Insemination , Male , Pregnancy , Pregnancy Rate , Retrospective Studies
17.
Sensors (Basel) ; 20(11)2020 Jun 03.
Article in English | MEDLINE | ID: mdl-32503198

ABSTRACT

For the agricultural food production sector, the control and assessment of food quality is an essential issue, with a direct impact on both human health and the economic value of the product. One of the fundamental properties from which the quality of food can be derived is the smell of the product. A significant trend in this context is machine olfaction, the automated simulation of the sense of smell using a so-called electronic nose or e-nose. In an e-nose, many sensors are used to detect the compounds that define the odors and thereby the quality of the product. Proper assessment of food quality depends on the correct functioning of these sensors. Unfortunately, sensors may fail to provide correct measurements due to, for example, physical aging or environmental factors. To cope with this problem, various approaches have been applied, often focusing on correcting the input data from the failed sensor. In this study, we adopt an alternative approach and propose machine learning-based failure tolerance that ignores failed sensors. To tolerate a failed sensor and keep the overall prediction accuracy acceptable, a Single Plurality Voting System (SPVS) classification approach is used. Here, a single classifier is trained on each feature, and a composed classifier is built from the outcomes of these classifiers. To build our SPVS-based technique, K-Nearest Neighbor (kNN), Decision Tree, and Linear Discriminant Analysis (LDA) classifiers are applied as the base classifiers. Our proposed approach has a clear advantage over traditional machine learning models, since it can tolerate sensor failure or other types of failure by ignoring the affected inputs, and can thus enhance the assessment of food quality. To illustrate our approach, we use the case study of beef cut quality assessment. The experiments showed promising results for beef cut quality prediction in particular, and food quality assessment in general.
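
The voting idea in this abstract (one classifier per sensor feature, with failed sensors left out of the vote) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' SPVS implementation: a 1-nearest-neighbour rule stands in for the base classifiers, a failed sensor is assumed to report `None`, and the data, labels and function names are hypothetical.

```python
from collections import Counter


def fit_single_feature_1nn(train_X, train_y, feature):
    """Return a 1-nearest-neighbour classifier that sees one feature only."""
    points = [(row[feature], label) for row, label in zip(train_X, train_y)]

    def predict(value):
        # Label of the training point whose value is closest on this feature.
        return min(points, key=lambda p: abs(p[0] - value))[1]

    return predict


def plurality_vote(classifiers, sample):
    """Combine per-feature votes, skipping sensors that report None.

    Ties go to the label that was voted for first.
    """
    votes = Counter(
        clf(value) for clf, value in zip(classifiers, sample) if value is not None
    )
    if not votes:
        return None  # every sensor failed, so no prediction is possible
    return votes.most_common(1)[0][0]


# Hypothetical readings from three sensors, with two quality labels.
train_X = [(1.0, 10.0, 0.1), (1.2, 11.0, 0.2), (5.0, 50.0, 0.9), (5.5, 48.0, 0.8)]
train_y = ["fresh", "fresh", "spoiled", "spoiled"]
classifiers = [fit_single_feature_1nn(train_X, train_y, f) for f in range(3)]

# The second sensor has failed; the remaining two still vote.
prediction = plurality_vote(classifiers, (5.1, None, 0.85))
```

Because each classifier depends on a single feature, dropping a failed sensor removes one vote without requiring the remaining classifiers to be retrained.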


Subjects
Algorithms , Food Analysis/methods , Food Quality , Machine Learning , Cluster Analysis , Electronic Nose
18.
Molecules ; 25(19)2020 Sep 23.
Article in English | MEDLINE | ID: mdl-32977371

ABSTRACT

The study of interface residue pairs is important for understanding the interactions between monomers inside a trimer protein-protein complex. We developed a two-layer support vector machine (SVM) ensemble classifier that considers the physicochemical and geometric properties of amino acids and the influence of surrounding amino acids. Different descriptors and different combinations of descriptors may give different prediction results. We propose feature combination engineering based on correlation coefficients and F-values. The accuracy of our method is 65.38% on an independent test set, indicating biological significance. Our predictions are consistent with the experimental results. This shows the effectiveness and reliability of our method for predicting interface residue pairs of protein trimers.


Subjects
Computational Biology/methods , Protein Multimerization , Proteins/chemistry , Support Vector Machine , Quaternary Protein Structure
19.
J Theor Biol ; 464: 1-8, 2019 Mar 07.
Article in English | MEDLINE | ID: mdl-30578798

ABSTRACT

Experimental prediction of drug-target interactions is a very labor-intensive and expensive process, which has motivated researchers to focus on in silico prediction to provide information on potential interactions. In recent years, researchers have proposed several computational approaches for predicting new drug-target interactions. In this paper, we present CFSBoost, a simple and computationally cheap ensemble boosting classification model for the identification and prediction of drug-target interactions using evolutionary and structural features. CFSBoost uses a simple yet novel feature group selection procedure, which allows the model to be computationally very cheap while still achieving state-of-the-art performance. The ensemble model uses extra trees as weak learners inside a boosting scheme while holding on to the best model per iteration. We tested our method on four benchmark datasets, which are also referred to as gold standard datasets. Our method achieved a better score in terms of area under the receiver operating characteristic (auROC) curve on 2 of the 4 datasets. It also achieved a higher area under the precision-recall (auPR) curve on 3 of the 4 datasets. Researchers have argued that the auPR metric is more suitable than auROC for comparing performance on imbalanced datasets such as our benchmark datasets. Our reported results show that, despite its simplicity in design, CFSBoost performs very satisfactorily compared with other methods in the literature. We also provide 5 new possible interactions for each dataset based on CFSBoost's prediction score.
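
The remark about auPR and auROC on imbalanced data can be made concrete with a small worked example. The sketch below uses made-up labels and scores (not data from the paper), computes auROC as the fraction of positive-negative pairs ranked correctly, and uses average precision as a common estimate of auPR; it assumes there are no tied scores in the average-precision calculation.

```python
def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Hypothetical imbalanced set: 2 interacting pairs among 10 candidates.
labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05, 0.04, 0.03]

roc = auroc(labels, scores)             # 14 of 16 pairs ordered correctly
ap = average_precision(labels, scores)  # (1/1 + 2/4) / 2
```

In this toy case the second positive is outranked by only two negatives, which barely lowers auROC (0.875) but halves the precision at that rank, so average precision drops to 0.75. This is the kind of sensitivity that motivates reporting auPR when positives are rare.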


Subjects
Algorithms , Computational Biology , Computer Simulation , Drug Discovery , Chemical Models , Humans
20.
Arch Gynecol Obstet ; 300(6): 1565-1582, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31650230

ABSTRACT

PURPOSE: The high rate of preterm birth (birth before 37 weeks of gestation) worldwide and its negative outcomes for pregnant women and newborns make it necessary to predict preterm birth and identify its main risk factors. Previous studies have divided premature deliveries into provider-initiated preterm birth (with medical intervention to terminate the pregnancy early) and spontaneous preterm birth (without any intervention). The main aim of this study is to propose methods for predicting provider-initiated preterm birth and spontaneous premature delivery and for ranking the predictive features. METHODS: Data from the national databank of maternal and neonatal records (IMAN registry) are used in the study. The collected data contain information on more than 1,400,000 deliveries with 112 features. Among them, 116,080 preterm births occurred (of which 11,799 and 104,281 cases are provider-initiated preterm births and spontaneous premature deliveries, respectively). The data can be considered big data because of the large number of records, the large number of features and the unbalanced distribution of the data among the three classes of term, provider-initiated and spontaneous preterm birth. Therefore, the data need to be analyzed with big data algorithms. In this paper, MapReduce-based machine learning algorithms named MR-PB-PFS are proposed for this purpose. The Map phase uses parallel feature selection and classification methods to score the features. The Reduce phase aggregates the feature scores obtained in the Map phase and assigns final scores to the features. Moreover, the classifiers trained in the Map phase are aggregated in the Reduce phase based on two different ensemble rules. RESULTS: Experimental results show that the best performance of the proposed models for preterm birth prediction is an accuracy of 81% and an area under the receiver operating characteristic curve (AUC) of 68%.
The top features identified in this study for predicting term, provider-initiated preterm and spontaneous premature birth are having pregnancy risk factors, having gestational diabetes, having cardiovascular disease, maternal underlying diseases, and maternal age. Chronic blood pressure is a high-ranking feature for preterm birth prediction, and the father's nationality is highly important for discriminating provider-initiated from spontaneous premature delivery. CONCLUSIONS: Identifying pregnant women at high risk of spontaneous premature or therapeutic preterm delivery with our proposed model can help to: (1) reduce the probability of premature birth through monitoring and management of the main risk factors and/or (2) educate them to care for the premature newborn. Managing and monitoring the top features that discriminate term, provider-initiated preterm and spontaneous premature birth, or their associated factors, can reduce preterm labor or its negative outcomes.
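
The Map and Reduce phases described in the METHODS of this abstract can be sketched in a few lines. This is a single-process illustration under stated assumptions rather than the MR-PB-PFS algorithm itself: the abstract does not name the scoring function, so the absolute difference between class means is used as a stand-in, the problem is reduced to two classes, each partition is assumed to contain both classes, and all names and data are hypothetical.

```python
from statistics import mean


def map_score_features(partition):
    """Map step: score every feature on one data partition.

    The score is the absolute difference between the class means,
    an illustrative choice rather than the study's scoring method.
    """
    rows, labels = partition
    scores = {}
    for f in range(len(rows[0])):
        pos = [row[f] for row, y in zip(rows, labels) if y == 1]
        neg = [row[f] for row, y in zip(rows, labels) if y == 0]
        scores[f] = abs(mean(pos) - mean(neg))
    return scores


def reduce_scores(partial_scores):
    """Reduce step: average each feature's score over all partitions."""
    features = partial_scores[0].keys()
    return {f: mean(scores[f] for scores in partial_scores) for f in features}


# Two hypothetical partitions, each holding (rows, labels).
partitions = [
    ([(1.0, 5.0), (3.0, 5.0)], [0, 1]),
    ([(2.0, 4.0), (6.0, 5.0)], [0, 1]),
]
final_scores = reduce_scores([map_score_features(p) for p in partitions])
ranking = sorted(final_scores, key=final_scores.get, reverse=True)
```

In a real MapReduce deployment the map calls would run in parallel on separate data blocks; the list comprehension here only mimics that flow on one machine.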


Subjects
Big Data , Premature Obstetric Labor/epidemiology , Premature Birth/epidemiology , Adult , Decision Trees , Female , Gestational Age , Humans , Newborn Infant , Pregnancy , Pregnancy Complications , Retrospective Studies , Risk Factors