1.
BMC Med Inform Decis Mak ; 24(1): 269, 2024 Sep 27.
Article in English | MEDLINE | ID: mdl-39334295

ABSTRACT

Parkinson's disease (PD) is a progressive neurological illness caused by cell death in the posterior midbrain. Early PD detection helps doctors reduce the disease's consequences. Artificial intelligence (AI) refers to a collection of trained models that can be applied to regression as well as classification. PD can be detected using a variety of dataset formats, including text, speech, and image datasets. To classify Parkinson's disease, this study proposes combining deep learning with machine learning recognition approaches. The proposed approach has three primary components designed to improve the accuracy of early PD diagnosis: feature extraction, feature fusion, and classification. Convolutional Neural Networks (CNN) with attention mechanisms are used to create feature extractors. The related motion signals are fed to a combined convolutional neural network and long short-term memory (LSTM) model for feature extraction. To separate PD patients from non-sufferers, Random Forest, Logistic Regression, Support Vector Machine, Extreme Boot Classifier, and a voting classifier were used. Our results show that, for the PD handwriting and related motion datasets, the proposed CNN with attention and a voting classifier yields 99.95% accuracy, 99.99% precision, 99.98% sensitivity, and 99.95% F1-score. Based on these results, we conclude that the proposed methodology of extracting features from handwriting images and related motor symptoms, fusing those features, and then applying a voting classifier yields excellent results for PD classification.
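
As a rough illustration of the final voting step described in this abstract, the sketch below shows plain majority (hard) voting over the label predictions of several base classifiers. The function name `hard_vote` and the prediction lists are hypothetical; the abstract does not state whether the authors used hard or soft voting, so this is only one plausible reading.

```python
from collections import Counter


def hard_vote(predictions):
    """Return the majority label for each sample.

    predictions: list of per-classifier label lists, all the same length.
    On a tie, the label that was seen first (in classifier order) wins.
    """
    n_samples = len(predictions[0])
    voted = []
    for i in range(n_samples):
        votes = Counter(model_preds[i] for model_preds in predictions)
        voted.append(votes.most_common(1)[0][0])
    return voted


# Hypothetical outputs of three base classifiers (1 = PD, 0 = control).
rf_preds = [1, 0, 1, 1]
lr_preds = [1, 1, 0, 1]
svm_preds = [0, 0, 1, 1]
print(hard_vote([rf_preds, lr_preds, svm_preds]))  # [1, 0, 1, 1]
```

With an odd number of binary classifiers a tie cannot occur, which is one reason ensembles of three or five models are common.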


Subjects
Neural Networks, Computer , Parkinson Disease , Parkinson Disease/classification , Parkinson Disease/diagnosis , Humans , Machine Learning , Early Diagnosis , Handwriting , Voting
2.
Curr Genomics ; 23(2): 83-93, 2022 Jun 10.
Article in English | MEDLINE | ID: mdl-36778978

ABSTRACT

Background: DNA replication plays an indispensable role in the transmission of genetic information. It is considered to be the basis of biological inheritance and the most fundamental process in all biological life. Considering that DNA replication initiates with a special location, namely the origin of replication, a better and accurate prediction of the origins of replication sites (ORIs) is essential to gain insight into the relationship with gene expression. Objective: In this study, we have developed an efficient predictor called iORI-LAVT for ORIs identification. Methods: This work focuses on extracting feature information from three aspects, including mono-nucleotide encoding, k-mer and ring-function-hydrogen-chemical properties. Subsequently, least absolute shrinkage and selection operator (LASSO) as a feature selection is applied to select the optimal features. Comparing the different combined soft voting classifiers results, the soft voting classifier based on GaussianNB and Logistic Regression is employed as the final classifier. Results: Based on 10-fold cross-validation test, the prediction accuracies of two benchmark datasets are 90.39% and 95.96%, respectively. As for the independent dataset, our method achieves high accuracy of 91.3%. Conclusion: Compared with previous predictors, iORI-LAVT outperforms the existing methods. It is believed that iORI-LAVT predictor is a promising alternative for further research on identifying ORIs.
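
One of the three feature types named in this abstract is k-mer encoding. A minimal sketch of that idea follows: a DNA sequence is mapped to a fixed-length vector of k-mer frequencies. The function name `kmer_frequencies` and the choice to skip windows with ambiguous bases are assumptions for illustration, not details taken from iORI-LAVT.

```python
from itertools import product


def kmer_frequencies(sequence, k=2):
    """Map a DNA sequence to a vector of k-mer frequencies (length 4**k)."""
    alphabet = "ACGT"
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    # Normalise counts so sequences of different length are comparable.
    return [counts[m] / total if total else 0.0 for m in kmers]


vector = kmer_frequencies("ACGTACGT", k=2)
print(len(vector))  # 16 features for k = 2
```

Vectors like this can be concatenated with other encodings before feature selection and classification.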

3.
Medicina (Kaunas) ; 58(12)2022 Nov 28.
Article in English | MEDLINE | ID: mdl-36556946

ABSTRACT

Background and Objectives: Recently, many studies have focused on the early diagnosis of coronary artery disease (CAD), which is one of the leading causes of cardiac-associated death worldwide. The effectiveness of the most important features influencing disease diagnosis determines the performance of machine learning systems that can allow for timely and accurate treatment. We developed a hybrid ML framework based on hard ensemble voting optimization (HEVO) to classify patients with CAD using the Z-Alizadeh Sani dataset. All categorical features were converted to numerical form, the synthetic minority oversampling technique (SMOTE) was employed to overcome the imbalanced distribution between the two classes in the dataset, and then recursive feature elimination (RFE) with random forest (RF) was used to obtain the best subset of features. Materials and Methods: After correcting the biased class distribution in the CAD dataset with SMOTE and finding the highly correlated features that affected the classification of CAD patients, the performance of the proposed model was evaluated using grid search optimization, and the best hyperparameters were identified for developing four models, namely RF, AdaBoost, gradient boosting, and extra trees, based on an HEV classifier. Results: Five-fold cross-validation experiments with the HEV classifier showed excellent prediction performance with the 10 best balanced features obtained using SMOTE and feature selection. All evaluation metrics reached > 98% with the HEV classifier, and the gradient-boosting model was the second-best classification model, with accuracy = 97% and F1-score = 98%. Conclusions: Compared with modern methods, the proposed method performs well in diagnosing coronary artery disease and can therefore be used by medical personnel as a supplementary tool for timely, accurate, and efficient identification of CAD cases in suspected patients.
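
The oversampling step mentioned in this abstract can be pictured with the simplified sketch below. It follows the core SMOTE idea (a synthetic point is placed on the segment between a minority sample and one of its nearest minority neighbours), but it is not the reference implementation; the function name `smote_like`, its parameters, and the toy data are hypothetical.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points by neighbour interpolation.

    minority: list of feature vectors (at least two samples).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (squared Euclidean).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


minority_class = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
for point in smote_like(minority_class, n_new=3):
    print(point)
```

Oversampling of this kind is normally applied to the training folds only, so that synthetic points do not leak into the data used for evaluation.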


Subjects
Algorithms , Coronary Artery Disease , Humans , Coronary Artery Disease/diagnosis , Machine Learning
4.
Cluster Comput ; : 1-26, 2022 Nov 30.
Article in English | MEDLINE | ID: mdl-36471703

ABSTRACT

It is difficult to manage massive amounts of data in an overlying environment with a single server. Therefore, it is necessary to comprehend the security provisions for erratic data in a dynamic environment. The authors are concerned about the security risk of vulnerable data in a Mobile Edge based distributive environment. As a result, edge computing appears to be an excellent perspective in which training can be done in an Edge-based environment. The combination of Edge computing and consensus approach of Blockchain in conjunction with machine learning techniques can further improve data security, mitigate the possibility of exposed data, and it reduces the risk of a data breach. As a result, the concept of federated learning provides a path for training the shared data. A dataset was collected that contained several vulnerable, exposed, recovered, and secured data and data security was precepted under the surveillance of two-factor authentication. This paper discusses the evolution of data and security flaws and their corresponding solutions in smart edge computing devices. The proposed model incorporates data security using consensus approach of Blockchain and machine learning techniques that include several classifiers and optimization techniques. Further, the authors applied the proposed algorithms in an edge computing environment by distributing several batches of data to different clients. As a result, the client privacy was maintained by using Blockchain servers. Furthermore, the authors segregated the client data into batches that were trained using the federated learning technique. The results obtained in this paper demonstrate the implementation of a Blockchain-based training model in an edge-based computing environment.

5.
Sensors (Basel) ; 21(14)2021 Jul 19.
Article in English | MEDLINE | ID: mdl-34300655

ABSTRACT

American foulbrood is a dangerous disease of bee broods found worldwide, caused by the Paenibacillus larvae larvae L. bacterium. In an experiment, the possibility of detecting colonies of this bacterium on MYPGP substrates (which contain yeast extract, Mueller-Hinton broth, glucose, K2HPO4, sodium pyruvate, and agar) was tested using a prototype of a multi-sensor recorder of the MCA-8 sensor signal with a matrix of six semiconductors: TGS 823, TGS 826, TGS 832, TGS 2600, TGS 2602, and TGS 2603 from Figaro. Two twin prototypes of the MCA-8 measurement device, M1 and M2, were used in the study. Each prototype was attached to two laboratory test chambers: a wooden one and a polystyrene one. For the experiment, the strain used was P. l. larvae ATCC 9545, ERIC I. On MYPGP medium, often used for laboratory diagnosis of American foulbrood, this bacterium produces small, transparent, smooth, and shiny colonies. Gas samples from over culture media of one- and two-day-old foulbrood P. l. larvae (with no colonies visible to the naked eye) and from over culture media older than 2 days (with visible bacterial colonies) were examined. In addition, the air from empty chambers was tested. The measurement time was 20 min, including a 10-min testing exposure phase and a 10-min sensor regeneration phase. The results were analyzed in two variants: without baseline correction and with baseline correction. We tested 14 classifiers and found that a prototype of a multi-sensor recorder of the MCA-8 sensor signal was capable of detecting colonies of P. l. larvae on MYPGP substrate with 97% efficiency and could distinguish between MYPGP substrates with 1-2 days of culture and substrates with older cultures. The efficacy of the prototype copies M1 and M2 was shown to differ slightly. The weighted method with Canberra metrics (Canberra.811) and kNN with Canberra and Manhattan metrics (Canberra.1nn and manhattan.1nn) proved to be the most effective classifiers.
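
The best classifiers in this abstract are nearest-neighbour methods with Canberra and Manhattan metrics. The sketch below shows a 1-nearest-neighbour rule with both distances. The sensor readings, class labels, and function names are invented for illustration and do not come from the MCA-8 data.

```python
def canberra(u, v):
    """Canberra distance: sum of |a - b| / (|a| + |b|), skipping 0/0 terms."""
    total = 0.0
    for a, b in zip(u, v):
        denom = abs(a) + abs(b)
        if denom:
            total += abs(a - b) / denom
    return total


def manhattan(u, v):
    """Manhattan (city-block) distance."""
    return sum(abs(a - b) for a, b in zip(u, v))


def predict_1nn(train_x, train_y, query, metric=canberra):
    """Label of the single training sample closest to the query."""
    best = min(range(len(train_x)), key=lambda i: metric(train_x[i], query))
    return train_y[best]


# Hypothetical two-sensor readings for two classes of gas sample.
train_x = [[1.0, 10.0], [5.0, 50.0]]
train_y = ["empty chamber", "culture"]
print(predict_1nn(train_x, train_y, [4.0, 45.0]))             # culture
print(predict_1nn(train_x, train_y, [4.0, 45.0], manhattan))  # culture
```

Because each Canberra term is scaled by the magnitude of the two readings, sensors with small absolute signals still contribute to the distance, which may suit multi-sensor arrays whose channels have different ranges.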


Subjects
Semiconductors , Animals , Bees , Culture Media , Larva , United States
6.
Sensors (Basel) ; 20(1)2020 Jan 06.
Article in English | MEDLINE | ID: mdl-31935953

ABSTRACT

Machine/Deep Learning (ML/DL) techniques have been applied to large data sets in order to extract relevant information and for making predictions. The performance and the outcomes of different ML/DL algorithms may vary depending upon the data sets being used, as well as on the suitability of algorithms to the data and the application domain under consideration. Hence, determining which ML/DL algorithm is most suitable for a specific application domain and its related data sets would be a key advantage. To respond to this need, a comparative analysis of well-known ML/DL techniques, including Multilayer Perceptron, K-Nearest Neighbors, Decision Tree, Random Forest, and Voting Classifier (or the Ensemble Learning Approach) for the prediction of parking space availability has been conducted. This comparison utilized Santander's parking data set, initiated while working on the H2020 WISE-IoT project. The data set was used in order to evaluate the considered algorithms and to determine the one offering the best prediction. The results of this analysis show that, regardless of the data set size, the less complex algorithms like Decision Tree, Random Forest, and KNN outperform complex algorithms such as Multilayer Perceptron, in terms of higher prediction accuracy, while providing comparable information for the prediction of parking space availability. In addition, in this paper, we are providing Top-K parking space recommendations on the basis of distance between current position of vehicles and free parking spots.
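
The Top-K recommendation mentioned at the end of this abstract ranks free parking spots by distance from the vehicle. A minimal sketch is given below; the use of great-circle (haversine) distance, the spot records, and the function names are assumptions, since the abstract does not say how distance is measured.

```python
import heapq
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def top_k_free_spots(vehicle, spots, k=3):
    """Return the k free spots closest to the vehicle position (lat, lon)."""
    free = [s for s in spots if s["free"]]
    return heapq.nsmallest(
        k, free,
        key=lambda s: haversine_km(vehicle[0], vehicle[1], s["lat"], s["lon"]),
    )


# Hypothetical spots; coordinates are illustrative only.
spots = [
    {"id": "A", "lat": 43.4610, "lon": -3.80, "free": True},
    {"id": "B", "lat": 43.4700, "lon": -3.80, "free": True},
    {"id": "C", "lat": 43.4601, "lon": -3.80, "free": False},
    {"id": "D", "lat": 43.4650, "lon": -3.80, "free": True},
]
print([s["id"] for s in top_k_free_spots((43.46, -3.80), spots, k=2)])  # ['A', 'D']
```

In a full system the `free` flag would come from the availability predictor rather than from a fixed record.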

7.
Comput Biol Med ; 168: 107724, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37989075

ABSTRACT

BACKGROUND: The most commonly used therapy currently for inflammatory and autoimmune diseases is nonspecific anti-inflammatory drugs, which have various hazardous side effects. Recently, some anti-inflammatory peptides (AIPs) have been found to be a substitute therapy for inflammatory diseases like rheumatoid arthritis and Alzheimer's. Therefore, the identification of these AIPs is an emerging topic that is equally important. METHODS: In this work, we have proposed an identification model for AIPs using a voting classifier. We used eight different feature descriptors and five conventional machine-learning classifiers. The eight feature encodings were concatenated to get a hybrid feature set. The five baseline models trained on the hybrid feature set were integrated via a voting classifier. Finally, a feature selection algorithm was used to select the optimal feature set for the construction of our final model, named IF-AIP. RESULTS: We tested the proposed model on two independent datasets. On independent data 1, the IF-AIP model shows an improvement of 3%-5.6% in terms of accuracies and 6.7%-10.8% in terms of MCC compared to the existing methods. On the independent dataset 2, our model IF-AIP shows an overall improvement of 2.9%-5.7% in terms of accuracy and 8.3%-8.6% in terms of MCC score compared to the existing methods. A comparative performance analysis was conducted between the proposed model and existing methods using a set of 24 novel peptide sequences. Notably, the IF-AIP method exhibited exceptional accuracy, correctly identifying all 24 peptides as AIPs. The source code, pre-trained models, and all datasets are made available at https://github.com/Mir-Saima/IF-AIP.


Subjects
Machine Learning , Peptides , Algorithms , Anti-Inflammatory Agents/analysis , Software
8.
Front Endocrinol (Lausanne) ; 15: 1345573, 2024.
Article in English | MEDLINE | ID: mdl-38919479

ABSTRACT

Introduction: Preeclampsia is a disease with an unknown pathogenesis and is one of the leading causes of maternal and perinatal morbidity. At present, early identification of high-risk groups for preeclampsia and timely intervention with aspirin is an effective preventive method against preeclampsia. This study aims to develop a robust and effective preeclampsia prediction model with good performance by machine learning algorithms based on maternal characteristics, biophysical and biochemical markers at 11-13 + 6 weeks' gestation, providing an effective tool for early screening and prediction of preeclampsia. Methods: This study included 5116 singleton pregnant women who underwent PE screening and fetal aneuploidy from a prospective cohort longitudinal study in China. Maternal characteristics (such as maternal age, height, pre-pregnancy weight), past medical history, mean arterial pressure, uterine artery pulsatility index, pregnancy-associated plasma protein A, and placental growth factor were collected as the covariates for the preeclampsia prediction model. Five classification algorithms including Logistic Regression, Extra Trees Classifier, Voting Classifier, Gaussian Process Classifier and Stacking Classifier were applied for the prediction model development. Five-fold cross-validation with an 8:2 train-test split was applied for model validation. Results: We ultimately included 49 cases of preterm preeclampsia and 161 cases of term preeclampsia from the 4644 pregnant women data in the final analysis. Compared with other prediction algorithms, the AUC and detection rate at 10% FPR of the Voting Classifier algorithm showed better performance in the prediction of preterm preeclampsia (AUC=0.884, DR at 10%FPR=0.625) under all covariates included. However, its performance was similar to that of other model algorithms in all PE and term PE prediction. 
In the prediction of all preeclampsia, the contribution of PLGF was higher than PAPP-A (11.9% VS 8.7%), while the situation was opposite in the prediction of preterm preeclampsia (7.2% VS 16.5%). The performance for preeclampsia or preterm preeclampsia using machine learning algorithms was similar to that achieved by the fetal medicine foundation competing risk model under the same predictive factors (AUCs of 0.797 and 0.856 for PE and preterm PE, respectively). Conclusions: Our models provide an accessible tool for large-scale population screening and prediction of preeclampsia, which helps reduce the disease burden and improve maternal and fetal outcomes.
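
The screening metric reported in this abstract, detection rate (DR) at a 10% false-positive rate (FPR), can be computed from risk scores as sketched below. The function name and the toy scores are hypothetical, and the tie-handling rule (only scores strictly above the threshold count as screen-positive) is an assumption.

```python
import math


def detection_rate_at_fpr(scores, labels, max_fpr=0.10):
    """Share of true cases flagged when at most max_fpr of non-cases are flagged.

    scores: predicted risk per subject; labels: 1 = case, 0 = non-case.
    """
    negatives = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    positives = [s for s, y in zip(scores, labels) if y == 1]
    # Number of non-cases allowed above the threshold.
    allowed_fp = math.floor(max_fpr * len(negatives) + 1e-9)
    # Scores strictly above this threshold are screen-positive.
    threshold = negatives[allowed_fp] if allowed_fp < len(negatives) else float("-inf")
    return sum(s > threshold for s in positives) / len(positives)


scores = [0.9, 0.45, 0.4, 0.35, 0.3, 0.25, 0.2, 0.15, 0.1, 0.05, 0.6, 0.5, 0.42, 0.1]
labels = [0] * 10 + [1] * 4
print(detection_rate_at_fpr(scores, labels))  # 0.5
```

Fixing the FPR first makes detection rates comparable between models, which is how the abstract contrasts the Voting Classifier with the other algorithms.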


Subjects
Machine Learning , Pre-Eclampsia , Humans , Female , Pregnancy , Pre-Eclampsia/diagnosis , Adult , China/epidemiology , Prospective Studies , Cohort Studies , Longitudinal Studies , Biomarkers/blood , Algorithms , Risk Factors , Prognosis , Placenta Growth Factor/blood
9.
Stud Health Technol Inform ; 310: 1462-1463, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269697

ABSTRACT

Cardiac arrest prediction algorithms for multivariate time series data have been developed and have achieved high precision. However, these algorithms have not yet achieved high sensitivity and suffer from a high false-alarm rate. Therefore, we propose an ensemble approach that yields a more satisfactory precision-recall result than other machine learning methods. Our proposed method obtained an overall area under the precision-recall curve of 46.7%. This may make it possible to respond to cardiac arrest events more rapidly and accurately.


Subjects
Algorithms , Heart Arrest , Humans , Heart Arrest/diagnosis , Machine Learning , Time Factors , Hospitals
10.
PeerJ Comput Sci ; 9: e1684, 2023.
Article in English | MEDLINE | ID: mdl-38077612

ABSTRACT

The main cause of stroke is the unexpected blockage of blood flow to the brain. Brain cells die if blood is not supplied to them, resulting in disability. The timely identification of medical conditions ensures patients receive the necessary treatments and assistance. Early diagnosis plays a crucial role in managing symptoms effectively and enhancing the overall quality of life for individuals affected by stroke. This research proposes an ensemble machine learning (ML) model that predicts brain stroke while reducing parameters and computational complexity. The dataset was obtained from the open-source website Kaggle, and the total number of participants is 3,254. However, this dataset has a significant class imbalance problem. To address this issue, we utilized the Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic Sampling (ADASYN), two oversampling techniques. The primary focus of this study is developing a stacking and voting approach that exhibits exceptional performance. We propose a stacking ensemble classifier that is more accurate and effective in predicting stroke, in order to improve classifier performance and minimize overfitting. To create a stronger final classifier, the study used three tree-based ML classifiers. The random forest (RF), decision tree (DT), and extra tree classifier (ETC) were trained and fine-tuned through hyperparameter tuning, after which they were combined using a stacking classifier and a k-fold cross-validation technique. The effectiveness of this method is verified using metrics such as accuracy, precision, recall, and F1-score. In addition, we utilized nine ML classifiers with hyperparameter tuning to predict stroke and compared the effectiveness of the proposed approach with these classifiers. The experimental outcomes demonstrated the superior performance of the stacking classification method compared to other approaches. 
The stacking method achieved a remarkable accuracy of 100% as well as exceptional F1-score, precision, and recall score. The proposed approach demonstrates a higher rate of accurate predictions compared to previous techniques.

11.
Front Bioeng Biotechnol ; 11: 1336255, 2023.
Article in English | MEDLINE | ID: mdl-38260734

ABSTRACT

Introduction: Dementia is a condition (a collection of related signs and symptoms) that causes a continuing deterioration in cognitive function, and millions of people are impacted by dementia every year as the world population continues to rise. Conventional approaches for determining dementia rely primarily on clinical examinations, analysis of medical records, and cognitive and neuropsychological testing. However, these methods are time-consuming and costly. Therefore, this study aims to present a noninvasive method for the early prediction of dementia so that preventive steps can be taken. Methods: We developed a hybrid diagnostic system based on statistical and machine learning (ML) methods that uses patient electronic health records to predict dementia. The dataset used for this study was obtained from the Swedish National Study on Aging and Care (SNAC), with a sample size of 43040 and 75 features. The newly constructed diagnostic system extracts a subset of useful features from the dataset through a statistical method (F-score). For the classification, we developed an ensemble voting classifier based on five different ML models: decision tree (DT), naive Bayes (NB), logistic regression (LR), support vector machines (SVM), and random forest (RF). To address the problem of ML model overfitting, we used a cross-validation approach to evaluate the performance of the proposed diagnostic system. Various assessment measures, such as accuracy, sensitivity, specificity, receiver operating characteristic (ROC) curve, and Matthews correlation coefficient (MCC), were used to thoroughly validate the devised diagnostic system's efficiency. Results: According to the experimental results, the proposed diagnostic method achieved the best accuracy of 98.25%, as well as sensitivity of 97.44%, specificity of 95.744%, and MCC of 0.7535. 
Discussion: The effectiveness of the proposed diagnostic approach is compared to various cutting-edge feature selection techniques and baseline ML models. From experimental results, it is evident that the proposed diagnostic system outperformed the prior feature selection strategies and baseline ML models regarding accuracy.
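
The assessment measures listed in this abstract (accuracy, sensitivity, specificity, MCC) all follow from the four cells of a binary confusion matrix. The sketch below computes them with the standard formulas; the function name and the counts are illustrative and are not the study's data.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts.

    Assumes both classes are present, i.e. tp + fn > 0 and tn + fp > 0.
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        # MCC is conventionally set to 0 when the denominator vanishes.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


for name, value in binary_metrics(tp=40, fp=10, tn=45, fn=5).items():
    print(f"{name}: {value:.4f}")
```

On imbalanced data MCC can be far lower than accuracy, because it uses all four cells of the confusion matrix; this may explain how a model can report accuracy above 98% together with a more modest MCC.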

12.
Genes (Basel) ; 14(9)2023 Sep 14.
Article in English | MEDLINE | ID: mdl-37761941

ABSTRACT

Biomarker-based cancer identification and classification tools are widely used in bioinformatics and machine learning fields. However, the high dimensionality of microarray gene expression data poses a challenge for identifying important genes in cancer diagnosis. Many feature selection algorithms optimize cancer diagnosis by selecting optimal features. This article proposes an ensemble rank-based feature selection method (EFSM) and an ensemble weighted average voting classifier (VT) to overcome this challenge. The EFSM uses a ranking method that aggregates features from individual selection methods to efficiently discover the most relevant and useful features. The VT combines support vector machine, k-nearest neighbor, and decision tree algorithms to create an ensemble model. The proposed method was tested on three benchmark datasets and compared to existing built-in ensemble models. The results show that our model achieved higher accuracy, with 100% for leukaemia, 94.74% for colon cancer, and 94.34% for the 11-tumor dataset. This study concludes by identifying a subset of the most important cancer-causing genes and demonstrating their significance compared to the original data. The proposed approach surpasses existing strategies in accuracy and stability, significantly impacting the development of ML-based gene analysis. It detects vital genes with higher precision and stability than other existing methods.
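
The ensemble rank-based feature selection idea in this abstract can be pictured as rank aggregation: each selection method orders the genes, and the orders are combined into one. The sketch below uses the mean rank as the aggregation rule, which is an assumption; the abstract does not specify the rule used by EFSM, and the gene names and function name are hypothetical.

```python
def aggregate_rankings(rankings, top_n):
    """Combine several best-first feature rankings by mean rank.

    rankings: list of lists, each an ordering of the same feature names.
    Ties in mean rank are broken alphabetically for reproducibility.
    """
    features = rankings[0]
    mean_rank = {
        f: sum(r.index(f) for r in rankings) / len(rankings) for f in features
    }
    return sorted(features, key=lambda f: (mean_rank[f], f))[:top_n]


# Hypothetical orderings produced by three different selection methods.
rankings = [
    ["g1", "g2", "g3", "g4"],
    ["g2", "g1", "g4", "g3"],
    ["g1", "g3", "g2", "g4"],
]
print(aggregate_rankings(rankings, top_n=2))  # ['g1', 'g2']
```

Averaging ranks rather than raw scores avoids having to put the outputs of different selection methods on a common scale.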


Subjects
Neoplasms , Transcriptome , Transcriptome/genetics , Gene Expression Profiling , Algorithms , Benchmarking , Cluster Analysis , Neoplasms/diagnosis , Neoplasms/genetics
13.
Complex Intell Systems ; 9(3): 2879-2891, 2023.
Article in English | MEDLINE | ID: mdl-35194546

ABSTRACT

COVID-19 has caused havoc globally due to its transmission pace among the inhabitants and prolific rise in the number of people contracting the disease worldwide. As a result, the number of people seeking information about the epidemic via Internet media has increased. The impact of the hysteria that has prevailed makes people believe and share everything related to illness without questioning its truthfulness. As a result, it has amplified the misinformation spread on social media networks about the disease. Today, there is an immediate need to restrict disseminating false news, even more than ever before. This paper presents an early fusion-based method for combining key features extracted from context-based embeddings such as BERT, XLNet, and ELMo to enhance context and semantic information collection from social media posts and achieve higher accuracy for false news identification. From the observation, we found that the proposed early fusion-based method outperforms models that work on single embeddings. We also conducted detailed studies using several machine learning and deep learning models to classify misinformation on social media platforms relevant to COVID-19. To facilitate our work, we have utilized the dataset of "CONSTRAINT shared task 2021". Our research has shown that language and ensemble models are well adapted to this role, with a 97% accuracy.

14.
PeerJ Comput Sci ; 9: e1631, 2023.
Article in English | MEDLINE | ID: mdl-38077602

ABSTRACT

Background: Tooth decay, also known as dental caries, is a common oral health problem that requires early diagnosis and treatment to prevent further complications. It is a chronic disease that causes the gradual breakdown of the tooth's hard tissues, primarily due to the interaction of bacteria and dietary sugars. Results: While numerous investigations have addressed this issue using image-based datasets, their outcomes have revealed limited effectiveness. In a novel approach, this study focuses on feature-based datasets, coupled with the strategic integration of Principal Component Analysis (PCA) and Chi-square (chi2) for robust feature engineering. In the proposed model, features are generated using PCA and passed to a voting classifier ensemble consisting of Extreme Gradient Boosting (XGB), Random Forest (RF), and Extra Trees Classifier (ETC) algorithms. Discussion: Extensive experiments were conducted to compare the proposed approach with the chi2 features and machine learning models to evaluate its efficacy for tooth caries detection. The results showed that the proposed voting classifier using PCA features outperformed the other approaches, achieving an accuracy, precision, recall, and F1 score of 97.36%, 96.14%, 96.84%, and 96.65%, respectively. Conclusion: The study demonstrates that the utilization of feature-based datasets and PCA-based feature engineering, along with a voting classifier ensemble, significantly improves tooth caries detection accuracy compared to image-based approaches. The achieved high accuracy, precision, recall, and F1 score emphasize the potential of the proposed model for effective dental caries detection. This study provides new insights into the potential of innovative methodologies to improve dental healthcare by evaluating their effectiveness in addressing prevalent oral health issues.

15.
Comput Biol Chem ; 107: 107973, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37926049

ABSTRACT

Cardiotocography (CTG) captures the fetal heart rate and the timing of uterine contractions. Throughout pregnancy, intelligent CTG categorization is crucial for monitoring fetal health and preserving proper fetal growth and development. Since CTG provides information on the fetal heartbeat and uterine contractions, which helps determine whether the fetus is pathologic, obstetricians frequently use it to evaluate a child's physical health during pregnancy. In the past, obstetricians have analyzed CTG data manually, which is time-consuming and inaccurate. Developing a fetal health categorization model is therefore crucial, as it may help speed up diagnosis and treatment and conserve medical resources. The CTG dataset is used in this study. To diagnose the illness, 7 machine learning models are employed, as well as ensemble strategies including voting and stacking classifiers. To choose and extract the most significant attributes from the dataset, Feature Selection (FS) techniques such as ANOVA and Chi-square, as well as Feature Extraction (FE) strategies such as Principal Component Analysis (PCA) and Independent Component Analysis (ICA), are used. Because the dataset is unbalanced, we used the Synthetic Minority Oversampling Technique (SMOTE) to balance it. To forecast the illness, the top 5 models are selected and used in ensemble methods such as voting and stacking classifiers. The Stacking Classifiers (SC) use AdaBoost and Random Forest (RF) as meta-classifiers for disease detection. The proposed SC with RF as meta-classifier, which incorporates Chi-square with PCA, outperformed all other state-of-the-art models, achieving scores of 98.79%, 98.88%, 98.69%, 96.32%, and 98.77% for accuracy, precision, recall, specificity, and F1-score, respectively.


Subjects
Cardiotocography , Fetus , Pregnancy , Female , Child , Humans , Cardiotocography/methods , Heart Rate, Fetal/physiology , Random Forest , Machine Learning
16.
Comput Biol Med ; 163: 107134, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37379617

ABSTRACT

Impaired relaxation of cardiomyocytes leads to diastolic dysfunction in the left ventricle. Relaxation velocity is regulated in part by intracellular calcium (Ca2+) cycling, and slower outflux of Ca2+ during diastole translates to reduced relaxation velocity of sarcomeres. Sarcomere length transient and intracellular calcium kinetics are integral parts of characterizing the relaxation behavior of the myocardium. However, a classifier tool that can separate normal cells from cells with impaired relaxation using sarcomere length transient and/or calcium kinetics remains to be developed. In this work, we employed nine different classifiers to classify normal and impaired cells, using ex-vivo measurements of sarcomere kinematics and intracellular calcium kinetics data. The cells were isolated from wild-type mice (referred to as normal) and transgenic mice expressing impaired left ventricular relaxation (referred to as impaired). We utilized sarcomere length transient data with a total of n = 126 cells (n = 60 normal cells and n = 66 impaired cells) and intracellular calcium cycling measurements with a total of n = 116 cells (n = 57 normal cells and n = 59 impaired cells) from normal and impaired cardiomyocytes as inputs to machine learning (ML) models for classification. We trained all ML classifiers with cross-validation method separately using both sets of input features, and compared their performance metrics. The performance of classifiers on test data showed that our soft voting classifier outperformed all other individual classifiers on both sets of input features, with 0.94 and 0.95 area under the receiver operating characteristic curves for sarcomere length transient and calcium transient, respectively, while multilayer perceptron achieved comparable scores of 0.93 and 0.95, respectively. However, the performance of decision tree, and extreme gradient boosting was found to be dependent on the set of input features used for training. 
Our findings highlight the importance of selecting appropriate input features and classifiers for the accurate classification of normal and impaired cells. Layer-wise relevance propagation (LRP) analysis demonstrated that the time to 50% contraction of the sarcomere had the highest relevance score for sarcomere length transient, whereas time to 50% decay of calcium had the highest relevance score for calcium transient input features. Despite the limited dataset, our study demonstrated satisfactory accuracy, suggesting that the algorithm can be used to classify relaxation behavior in cardiomyocytes when the potential relaxation impairment of the cells is unknown.
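
The soft voting classifier that performs best in this abstract combines class probabilities rather than labels. A minimal sketch follows: each model's predicted probability of the "impaired" class is averaged (optionally with weights) and the average is thresholded. The function name, the 0.5 threshold, and the probabilities are assumptions for illustration.

```python
def soft_vote(prob_lists, weights=None, threshold=0.5):
    """Average per-model probabilities and threshold the result.

    prob_lists: one list per model with P(class = 1) for every sample.
    Returns (averaged probabilities, predicted labels).
    """
    weights = weights or [1.0] * len(prob_lists)
    total = sum(weights)
    n_samples = len(prob_lists[0])
    averaged = [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_samples)
    ]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels


# Hypothetical outputs of three models for three cells (1 = impaired).
model_probs = [
    [0.9, 0.4, 0.2],
    [0.6, 0.7, 0.1],
    [0.8, 0.3, 0.6],
]
averaged, labels = soft_vote(model_probs)
print(labels)  # [1, 0, 0]
```

Because the averaged probability is a continuous score, it can be passed directly to a ROC analysis, which fits the area-under-curve figures reported in the abstract.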


Subjects
Calcium , Sarcomeres , Mice , Animals , Myocardial Contraction , Myocytes, Cardiac , Machine Learning
17.
Cancer Inform ; 10: 133-47, 2011 Apr 27.
Article in English | MEDLINE | ID: mdl-21584263

ABSTRACT

With technological advances now allowing measurement of thousands of genes, proteins and metabolites, researchers are using this information to develop diagnostic and prognostic tests and discern the biological pathways underlying diseases. Often, an investigator's objective is to develop a classification rule that predicts group membership of unknown samples based on a small set of features and that could ultimately be used in a clinical setting. While common classification methods such as random forest and support vector machines are effective at separating groups, they do not directly translate into a clinically applicable classification rule based on a small number of features. We present a simple feature selection and classification method for biomarker detection that is intuitively understandable and can be directly extended for application to a clinical setting. We first use a jackknife procedure to identify important features and then, for classification, we use voting classifiers, which are simple and easy to implement. We compared our method to random forest and support vector machines using three benchmark cancer 'omics datasets with different characteristics. We found our jackknife procedure and voting classifier to perform comparably to these two methods in terms of accuracy. Further, the jackknife procedure yielded stable feature sets. Voting classifiers in combination with a robust feature selection method such as our jackknife procedure offer an effective, simple and intuitive approach to feature selection and classification with a clear extension to clinical applications.
