ABSTRACT
The continuous improvement of the steelmaking process is a critical issue for steelmakers. In the production of Ca-treated Al-killed steel, the Ca and S contents are controlled for successful inclusion modification treatment. In this study, a machine learning technique was used to build a decision tree classifier and thus identify the process variables that most influence the desired Ca and S contents at the end of ladle furnace refining. The attribute of the root node of the decision tree was correlated with process variables via the Pearson formalism. Thus, the attribute of the root node corresponded to the sulfur distribution coefficient at the end of the refining process, and its value allowed for the discrimination of satisfactory heats from unsatisfactory heats. The variables with higher correlation with the sulfur distribution coefficient were the content of sulfur in both steel and slag at the end of the refining process, as well as the Si content at that stage of the process. As secondary variables, the Si content and the basicity of the slag at the end of the refining process were correlated with the S content in the steel and slag, respectively, at that stage. The analysis showed that the conditions of steel and slag at the beginning of the refining process and the efficient S removal during the refining process are crucial for reaching desired Ca and S contents.
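As an illustration of the Pearson step described above, the following is a minimal sketch in plain Python. The variable names and the numbers are hypothetical and are not taken from the study; the sulfur distribution coefficient is assumed here to follow the usual definition, the ratio of S in slag to S in steel.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of co-deviations and sums of squared deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical end-of-refining values for five heats (illustration only)
sulfur_partition = [120.0, 95.0, 150.0, 80.0, 60.0]          # assumed (%S)slag / [%S]steel
sulfur_in_steel = [0.0020, 0.0030, 0.0015, 0.0035, 0.0050]   # wt% S in steel

r = pearson_r(sulfur_partition, sulfur_in_steel)
print(f"Pearson r = {r:.3f}")
```

In this made-up sample the coefficient is negative, since heats with a higher distribution coefficient were given a lower sulfur content in the steel.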
ABSTRACT
In this paper, the performance of machine learning methods for detecting broken rotor bar (BRB) faults in squirrel cage induction motors is evaluated. Decision tree classification (DTC), artificial neural network (ANN), and deep learning (DL) methods are developed, applied, and compared. The training data were collected through experimental measurements. The BRB fault features were extracted from measured line-current signatures by transforming the current signal from the time domain to the frequency domain using the discrete Fourier transform (DFT). Eighty percent of the data were used for training the models, and twenty percent were used for testing. A confusion matrix was used to validate the models' performance in terms of accuracy, precision, recall, and F1-score. The results show that the DTC is less load-dependent and has better accuracy and precision for both unloaded and loaded squirrel cage induction motors than the DL and ANN methods. The DTC method also achieved higher accuracy in detecting the magnitudes of the twice-frequency sideband components induced in stator currents by BRB faults. Although detection accuracy and precision are higher for the loaded motor than for the unloaded motor, the DTC method still exhibited high accuracy for the unloaded motor when compared with the DL and ANN methods. The DTC is, therefore, a suitable candidate for detecting broken rotor bar faults on trained data for lightly or fully loaded squirrel cage induction motors using the characteristics of the measured line-current signature.
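The four scores named above can all be derived from the cells of a binary confusion matrix. The sketch below shows one way to do this in plain Python; the counts are hypothetical and are not results from the paper, and "faulty" is assumed to be the positive class.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts.

    tp/fp/fn/tn = true positives, false positives, false negatives, true negatives.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical test-set counts (illustration only): 100 samples,
# "positive" meaning a motor with a broken rotor bar
scores = binary_metrics(tp=45, fp=5, fn=3, tn=47)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```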
ABSTRACT
This paper describes the latest development in the classification stage of our Speech Sound Disorder (SSD) Screening algorithm and presents the results achieved with two classifier models: the Classification and Regression Tree (CART)-based model and the Single Decision Hyperplane-based Linear Support Vector Machine (SVM) model. For every speech sound in medial position, 10 features extracted from the audio samples, along with an 11th feature representing the validation of the (mis)pronunciation by the Speech Language Pathologist (SLP), were fed into the two classifiers to compare and discuss their performance. On a test set comprising 30% of the analyzed samples, the accuracy was 98.2% for the Linear SVM classifier and 100% for the Decision Tree classifier, which are excellent results that encourage our quest for a sound rationale.
Subjects
Phonetics, Support Vector Machine, Algorithms, Sound, Decision Trees
ABSTRACT
Non-small cell lung cancer is the predominant form of lung cancer and is associated with a poor prognosis. MiRNAs implicated in cancer initiation and progression can be easily detected in liquid biopsy samples and have the potential to serve as non-invasive biomarkers. In this study, we employed next-generation sequencing to globally profile miRNAs in serum samples from 71 early-stage NSCLC patients and 47 non-cancerous pulmonary condition patients. Preliminary analysis of differentially expressed miRNAs revealed 28 upregulated miRNAs in NSCLC compared to the control group. Functional enrichment analyses unveiled their involvement in NSCLC signaling pathways. Subsequently, we developed a gradient-boosting decision tree classifier based on 2588 miRNAs, which demonstrated high accuracy (0.837), sensitivity (0.806), and specificity (0.859) in effectively distinguishing NSCLC from non-cancerous individuals. Shapley Additive exPlanations analysis improved the model metrics by identifying the top 15 miRNAs with the strongest discriminatory value, yielding an AUC of 0.96 ± 0.04, accuracy of 0.896, sensitivity of 0.884, and specificity of 0.903. Our study establishes the potential utility of a non-invasive serum miRNA signature as a supportive tool for early detection of NSCLC while also shedding light on dysregulated miRNAs in NSCLC biology. For enhanced credibility and understanding, further validation in an independent cohort of patients is warranted.
ABSTRACT
The Internet of Things (IoT) is a transformative technology that is reshaping industries and daily life, leading us towards a connected future that is full of possibilities and innovations. In this paper, we present a robust framework for the application of IoT technology in the agricultural sector in Bangladesh. The framework integrates IoT, data mining techniques, and cloud monitoring systems to enhance productivity, improve water management, and provide real-time crop forecasting. We conducted rigorous experimentation on the framework, and the proposed model achieved an accuracy of 87.38% in predicting data harvest. Our findings highlight the effectiveness and transparency of the framework, underscoring the significant potential of the IoT in transforming agriculture and empowering farmers with data-driven decision-making capabilities. The proposed framework might be very impactful in real-life agriculture, especially for monsoon agriculture-based countries like Bangladesh.
Subjects
Agriculture, Technology, Bangladesh, Agriculture/methods
ABSTRACT
The growing accessibility of large-scale protein interaction data demands extensive research to understand cell organization and its functioning at the network level. Bioinformatics and data mining researchers have extensively studied network clustering to examine the structural and operational features of protein-protein interaction (PPI) networks. Over the past two decades, clustering PPI networks has proven useful in numerous studies for identifying functional modules, understanding the roles of previously unknown proteins, and other purposes. Protein complexes are among the essential cellular components that carry out biological activities. Inferring protein complexes has been made more accessible by experimental approaches. We offer a novel method that integrates a classification model with local topological data, making it more reliable and efficient. This article describes a decision tree classifier based on topological characteristics of the subgraph for mining protein complexes. The proposed graph-based algorithm is an effective and efficient way to identify protein complexes from large-scale PPI networks. The performance of the proposed algorithm is evaluated on protein-protein interaction networks of yeast and human from the Database of Interacting Proteins (DIP) and the Biological General Repository for Interaction Datasets (BioGRID), using widely accepted benchmark protein complexes from the comprehensive resource of mammalian protein complexes (CORUM) and the comprehensive catalogue of yeast protein complexes (CYC2008). The outcomes demonstrate that our method can outperform the best-performing supervised, semi-supervised, and unsupervised approaches to detecting protein complexes.
Subjects
Protein Interaction Mapping, Protein Interaction Maps, Humans, Protein Interaction Mapping/methods, Fungal Proteins/metabolism, Saccharomyces cerevisiae/metabolism, Algorithms, Computational Biology/methods, Cluster Analysis, Decision Trees
ABSTRACT
Background: Between its emergence in December 2019 and June 2022, coronavirus disease 2019 (COVID-19) affected populations all around the globe, infecting ~535 million people and leaving ~6.31 million dead. This makes identifying and predicting COVID-19 an important healthcare priority. Method and Material: The dataset used in this study was obtained from Shahid Beheshti University of Medical Sciences in Tehran and includes the information of 29,817 COVID-19 patients who were hospitalized between October 8, 2019 and March 8, 2021. As diabetes has been shown to be a significant factor for poor outcome, we focused on COVID-19 patients with diabetes, leaving us with 2824 records. Results: The data were analyzed using a decision tree algorithm, and several association rules were mined. The decision tree was also used to predict the release status of patients. We used accuracy (87.07%), sensitivity (88%), and specificity (80%) as assessment metrics for our model. Conclusion: Initially, this study provided information about the percentages of admitted COVID-19 patients with various underlying diseases. It was observed that diabetic patients were the largest population at risk. Based on the rules derived from our dataset, we found that age category (51-80), CPR, and ICU residency play a pivotal role in the discharge status of diabetic inpatients.
ABSTRACT
AIMS: This study aimed to clarify the different topographical distribution of tau pathology between progressive supranuclear palsy (PSP) and corticobasal degeneration (CBD) and establish a machine learning-based decision tree classifier. METHODS: Paraffin-embedded sections of the temporal cortex, motor cortex, caudate nucleus, globus pallidus, subthalamic nucleus, substantia nigra, red nucleus, and midbrain tectum from 1020 PSP and 199 CBD cases were assessed by phospho-tau immunohistochemistry. The severity of tau lesions (i.e., neurofibrillary tangle, coiled body, tufted astrocyte or astrocytic plaque, and tau threads) was semi-quantitatively scored in each region. Hierarchical cluster analysis was performed using tau pathology scores. A decision tree classifier was made with tau pathology scores using 914 cases. Cross-validation was done using 305 cases. An additional ten cases were used for a validation study. RESULTS: Cluster analysis displayed two distinct clusters; the first cluster included only CBD, and the other cluster included all PSP and six CBD cases. We built a decision tree, which used only seven decision nodes. The scores of tau threads in the caudate nucleus were the most decisive factor for predicting CBD. In a cross-validation, 302 out of 305 cases were correctly diagnosed. In the pilot validation study, three investigators made a correct diagnosis in all cases using the decision tree. CONCLUSION: Regardless of the morphology of astrocytic tau lesions, semi-quantitative tau pathology scores in select brain regions are sufficient to distinguish PSP and CBD. The decision tree simplifies neuropathologic differential diagnosis of PSP and CBD.
Subjects
Corticobasal Degeneration/pathology, Decision Trees, Machine Learning, Neurofibrillary Tangles/pathology, Progressive Supranuclear Palsy/pathology, Aged, Aged 80 and Over, Brain/pathology, Corticobasal Degeneration/diagnosis, Differential Diagnosis, Female, Humans, Inclusion Bodies/pathology, Male, Middle Aged, Nerve Degeneration/pathology, Neurofibrillary Tangles/metabolism, Progressive Supranuclear Palsy/diagnosis, tau Proteins/metabolism
ABSTRACT
Over the last decade, the field of bioinformatics has been growing rapidly. Robust bioinformatics tools are going to play a vital role in future progress. Scientists working in the field conduct a large number of studies to extract knowledge from the available biological data. Several bioinformatics issues have arisen as a result of the creation of massive amounts of imbalanced data. The classification of precursor microRNA (pre-miRNA) from imbalanced RNA genome data is one such problem. Studies have shown that pre-miRNAs could serve as oncogenes or tumor suppressors in various cancer types. This paper introduces a Hybrid Deep Neural Network framework (H-DNN) for the classification of pre-miRNA in imbalanced data. The proposed H-DNN framework is an integration of Deep Artificial Neural Networks (Deep ANN) and Deep Decision Tree Classifiers. The Deep ANN in the proposed H-DNN helps to extract meaningful features, and the Deep Decision Tree Classifier helps to classify the pre-miRNA accurately. H-DNN was tested on genomes of animals, plants, humans, and Arabidopsis with an imbalance ratio of up to 1:5000 and on viruses with a ratio of 1:400. Experimental results showed an accuracy of more than 99% in all cases, and the time complexity of the proposed H-DNN is also much lower than that of the other existing approaches.
Subjects
MicroRNAs, Neural Networks (Computer), Animals, Computational Biology/methods, MicroRNAs/genetics
ABSTRACT
The rapid worldwide spread of the novel coronavirus disease (Covid-19) has caused a pandemic since its outbreak in the city of Wuhan, China, in December 2019. While the world is still working out how to contain the rapid spread of the novel coronavirus, the pandemic has already claimed several thousand lives. Yet diagnosing the infection in humans has proven complex. A combination of computed tomography imaging, whole-genome sequencing, and electron microscopy was initially used to screen for and identify SARS-CoV-2, the viral etiology of Covid-19. Because the number of cases grows every day, too few Covid-19 test kits are available in hospitals. Accordingly, an automatic detection framework is needed as a fast alternative diagnostic to contain the spread of Covid-19 among individuals worldwide. In the present work, we develop a methodology that uses Artificial Intelligence (AI) to help identify Covid-19-infected people among normal individuals from CT scan and chest X-ray images. The method works with a dataset of Covid-19 and normal chest X-ray images. The image diagnosis tool uses a decision tree classifier to identify persons infected with the novel coronavirus. Classification performance is analyzed in terms of precision, recall, and F1 score. The results are based on the validated chest X-ray and CT scan images available in the Kaggle and Open-I repositories. The experiments demonstrate that the proposed algorithm is robust, accurate, and precise. The technique achieves its accuracy with an AI approach that also provides fast results during both training and inference.
ABSTRACT
Objective: Factors related to the driver-vehicle-environment system have a significant influence on a driver's decision to perform evasive maneuvers, especially the decision of steering direction (DSD) in critical situations. However, few studies have systematically investigated the relationships between these factors and DSD. The objective of this study is to analyze and model drivers' DSD in critical situations. Methods: Data from the NASS-CDS from 1995 to 2015 were utilized in this study. A decision tree (DT) classifier was used to model a driver's DSD for both intersection-related and non-intersection-related subsets, combined with a 10-fold cross-validation technique and a grid search approach to evaluate and optimize the model. An analysis of variable importance was also conducted. Results: Two separate DT models of drivers' DSD were obtained based on the optimized hyperparameters, with test accuracies of 84.6% (intersection-related) and 79.2% (non-intersection-related). The variable DIFFANGLE (angle difference between 2 vehicles) ranked as the most important factor influencing drivers' DSD in both models. The next most important variables, in order, were SPEED (travel speed of the subject vehicle) and AGE (driver's age) for the intersection-related model and SPEED, PREMOVE (pre-event movement), TRAFFLOW (trafficway flow), and AGE for the non-intersection-related model. Moreover, an interesting same-direction pattern was observed in both DT models. Conclusions: This study employed NASS-CDS data and DT classifiers to analyze and model drivers' DSD behavior. The test accuracies for both classifiers were acceptable. Potential variables influencing drivers' DSD were explored, which advances research on evasive behavior in lateral movement and promotes further applications of the constructed models in intelligent vehicles.
Subjects
Traffic Accidents/prevention & control, Automobile Driving/psychology, Adolescent, Adult, Aged, Aged 80 and Over, Automobile Driving/statistics & numerical data, Built Environment/statistics & numerical data, Factual Databases, Decision Trees, Female, Humans, Male, Middle Aged, Theoretical Models, Young Adult
ABSTRACT
This paper presents a simple and efficient computer-aided diagnosis method to classify Chronic Myeloid Leukemia (CML) cells based on microscopic image processing. In the proposed method, a novel combination of both typical and new features is introduced for classification of CML cells. Next, an effective decision tree classifier is proposed to classify CML cells into eight groups. The proposed method was evaluated on 1730 CML cell images containing 714 cells from non-cancerous bone marrow aspirates and 1016 cells from cancerous peripheral blood smears. The performance of the proposed classification method was compared with manual labels assigned by two experts. The average values of accuracy, specificity, and sensitivity were 99.0%, 99.4%, and 98.3%, respectively, across all groups of CML. In addition, Cohen's kappa coefficient showed high agreement (0.99) between the joint diagnostic results of the two experts and the results of the proposed approach. According to the obtained results, the suggested method has a high capability to classify effective cells of CML and can be applied as a simple, affordable, and reliable computer-aided diagnosis tool to help pathologists diagnose CML.
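Cohen's kappa, used above to compare the experts' labels with the classifier's output, corrects the raw agreement rate for the agreement expected by chance. A minimal sketch in plain Python follows; the class names and label sequences are hypothetical and do not come from the paper.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both raters
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)

# Hypothetical labels for six cells (illustration only)
expert = ["blast", "blast", "myelocyte", "myelocyte", "band", "band"]
model = ["blast", "blast", "myelocyte", "band", "band", "band"]

kappa = cohens_kappa(expert, model)
print(f"kappa = {kappa:.2f}")
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance, so the reported 0.99 corresponds to near-perfect agreement.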
ABSTRACT
Membrane proteins are a vital type of protein that serve as channels, receptors, and energy transducers in a cell. They perform various important functions, which are mainly associated with their types. They are also attractive drug discovery targets for various diseases. Predicting membrane protein types is therefore a crucial and challenging research area in bioinformatics and proteomics. Because of the vast number of uncharacterized protein sequences in databases, conventional biophysical techniques are extremely tedious, costly, and error-prone. Consequently, it is highly desirable to build a robust, reliable, and efficient method to predict membrane protein types. In this work, a novel feature set, Exchange Group Based Protein Sequence Representation (EGBPSR), is proposed for classification of membrane proteins, with two new feature extraction strategies known as Exchange Group Local Pattern (EGLP) and Amino acid Interval Pattern (AIP). Imbalanced and large datasets are often handled well by decision tree classifiers. Since imbalanced datasets are used, the performance of various decision tree classifiers, such as Decision Tree (DT) and Classification and Regression Tree (CART), and of ensemble methods, such as AdaBoost, Random Under Sampling (RUS) boost, Rotation forest, and Random forest, is analyzed. The overall accuracy achieved in predicting membrane protein types is 96.45%.
Subjects
Protein Databases, Membrane Proteins/genetics, Protein Sequence Analysis, Software, Support Vector Machine
ABSTRACT
BACKGROUND: Empirical mode decomposition (EMD) is a technique for analyzing the steady-state visual evoked potential (SSVEP) that decomposes the signal into its intrinsic mode functions (IMFs). Although choosing the effective IMF leads to good results for a limited stimulation frequency range, extending this range seriously challenges the method, so that even a combination of IMFs is associated with error. METHODS: Stimulation frequencies ranging from 6 to 16 Hz at intervals of 0.5 Hz were generated using the Psychophysics Toolbox for MATLAB. SSVEP signals were recorded from six subjects. EMD was used to extract the effective IMFs. Two features, the frequency at the peak of the spectrum and the normalized local energy at this frequency, were extracted for each of six conditions (each IMF, each combination of two consecutive IMFs, and the combination of all three IMFs). RESULTS: The instantaneous frequency histogram and the recognition accuracy diagram indicate that, for a wide stimulation frequency range, neither a single IMF nor a combination of IMFs achieves the desired efficiency. The total recognition accuracy of the proposed method was 79.75%, while the highest results obtained from EMD-fast Fourier transform (EMD-FFT) and canonical correlation analysis (CCA) were 72.05% and 77.31%, respectively. CONCLUSION: The proposed method improved the recognition rate by about 2.4 and 7.7 percentage points compared to CCA and EMD-FFT, respectively, by providing a solution for situations with a wide stimulation frequency range.
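The two spectral features mentioned above can be illustrated with a plain-Python DFT. This is only a sketch under stated assumptions: the abstract does not define "normalized local energy", so it is taken here as the power within one bin on either side of the peak divided by the total non-DC power. The sampling rate, signal, and function names are hypothetical, and the EMD step itself is not shown.

```python
import cmath
import math

def power_spectrum(signal, fs):
    """One-sided DFT power spectrum; returns (frequencies in Hz, power)."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2)
    return freqs, power

def peak_features(signal, fs, half_width=1):
    """Peak frequency and assumed normalized local energy around the peak."""
    freqs, power = power_spectrum(signal, fs)
    # Skip the DC bin (index 0) when searching for the spectral peak
    peak = max(range(1, len(power)), key=power.__getitem__)
    lo = max(peak - half_width, 1)
    hi = min(peak + half_width, len(power) - 1)
    local_energy = sum(power[lo:hi + 1]) / sum(power[1:])
    return freqs[peak], local_energy

# Hypothetical 2 s test signal sampled at 64 Hz: a 7.5 Hz component
# plus a weaker 12 Hz component (frequency resolution is 0.5 Hz)
fs = 64
signal = [math.sin(2 * math.pi * 7.5 * t / fs)
          + 0.2 * math.sin(2 * math.pi * 12.0 * t / fs)
          for t in range(2 * fs)]

peak_freq, energy = peak_features(signal, fs)
print(f"peak frequency = {peak_freq} Hz, normalized local energy = {energy:.3f}")
```

With a 2 s window the frequency resolution is 0.5 Hz, which matches the 0.5 Hz spacing of the stimulation frequencies; shorter windows would not separate adjacent targets in this simple scheme.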
ABSTRACT
In this Letter, the authors present a unified framework for fall event detection and classification using the cumulants extracted from the acceleration (ACC) signals acquired using a single waist-mounted triaxial accelerometer. The main objective of this Letter is to find suitable representative cumulants and classifiers in effectively detecting and classifying different types of fall and non-fall events. It was discovered that the first level of the proposed hierarchical decision tree algorithm implements fall detection using fifth-order cumulants and support vector machine (SVM) classifier. In the second level, the fall event classification algorithm uses the fifth-order cumulants and SVM. Finally, human activity classification is performed using the second-order cumulants and SVM. The detection and classification results are compared with those of the decision tree, naive Bayes, multilayer perceptron and SVM classifiers with different types of time-domain features including the second-, third-, fourth- and fifth-order cumulants and the signal magnitude vector and signal magnitude area. The experimental results demonstrate that the second- and fifth-order cumulant features and SVM classifier can achieve optimal detection and classification rates of above 95%, as well as the lowest false alarm rate of 1.03%.
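For reference, the second- to fifth-order cumulants of a signal window can be obtained from its central moments. The sketch below uses simple population (biased) moment estimates; the Letter does not state which estimator or windowing it uses, so both are assumptions here, and the sample window is made up.

```python
def cumulants(samples):
    """Second- to fifth-order cumulants from population central moments."""
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    mean = sum(samples) / n

    def central_moment(order):
        return sum((x - mean) ** order for x in samples) / n

    m2, m3, m4, m5 = (central_moment(k) for k in (2, 3, 4, 5))
    return {
        2: m2,                   # variance
        3: m3,                   # third cumulant equals third central moment
        4: m4 - 3 * m2 ** 2,     # excess over the Gaussian fourth moment
        5: m5 - 10 * m3 * m2,
    }

# Hypothetical window of acceleration samples (illustration only)
window = [-2.0, -1.0, 0.0, 1.0, 2.0]
k = cumulants(window)
print(k)
```

For a Gaussian signal all cumulants above the second order are zero, which is one common motivation for using higher-order cumulants as features for impulsive events.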
ABSTRACT
Transportation continues to be an integral part of modern life, and the importance of road traffic safety cannot be overstated. Consequently, recent road traffic safety studies have focused on analysis of risk factors that impact fatality and injury level (severity) of traffic accidents. While some of the risk factors, such as drug use and drinking, are widely known to affect severity, an accurate modeling of their influences is still an open research topic. Furthermore, there are innumerable risk factors that are waiting to be discovered or analyzed. A promising approach is to investigate historical traffic accident data that have been collected in the past decades. This study inspects traffic accident reports that have been accumulated by the California Highway Patrol (CHP) since 1973 for which each accident report contains around 100 data fields. Among them, we investigate 25 fields between 2004 and 2010 that are most relevant to car accidents. Using two classification methods, the Naive Bayes classifier and the decision tree classifier, the relative importance of the data fields, i.e., risk factors, is revealed with respect to the resulting severity level. Performances of the classifiers are compared to each other and a binary logistic regression model is used as the basis for the comparisons. Some of the high-ranking risk factors are found to be strongly dependent on each other, and their incremental gains on estimating or modeling severity level are evaluated quantitatively. The analysis shows that only a handful of the risk factors in the data dominate the severity level and that dependency among the top risk factors is an imperative trait to consider for an accurate analysis.