Results 1 - 20 of 67
1.
J Med Internet Res ; 26: e62890, 2024 Sep 17.
Article in English | MEDLINE | ID: mdl-39288404

ABSTRACT

BACKGROUND: Cardiac arrest (CA) is one of the leading causes of death among patients in the intensive care unit (ICU). Although many CA prediction models with high sensitivity have been developed to anticipate CA, their practical application has been challenging due to a lack of generalization and validation. Additionally, the heterogeneity among patients in different ICU subtypes has not been adequately addressed. OBJECTIVE: This study aims to propose a clinically interpretable ensemble approach for the timely and accurate prediction of CA within 24 hours, regardless of patient heterogeneity, including variations across different populations and ICU subtypes. Additionally, we conducted patient-independent evaluations to emphasize the model's generalization performance and analyzed interpretable results that can be readily adopted by clinicians in real-time. METHODS: Patients were retrospectively analyzed using data from the Medical Information Mart for Intensive Care-IV (MIMIC-IV) and the eICU-Collaborative Research Database (eICU-CRD). To address the problem of underperformance, we constructed our framework using feature sets based on vital signs, multiresolution statistical analysis, and the Gini index, with a 12-hour window to capture the unique characteristics of CA. We extracted 3 types of features from each database to compare the performance of CA prediction between high-risk patient groups from MIMIC-IV and patients without CA from eICU-CRD. After feature extraction, we developed a tabular network (TabNet) model using feature screening with cost-sensitive learning. To assess real-time CA prediction performance, we used 10-fold leave-one-patient-out cross-validation and a cross-data set method. We evaluated MIMIC-IV and eICU-CRD across different cohort populations and subtypes of ICU within each database. Finally, external validation using the eICU-CRD and MIMIC-IV databases was conducted to assess the model's generalization ability. 
The decision mask of the proposed method was used to capture the interpretability of the model. RESULTS: The proposed method outperformed conventional approaches across different cohort populations in both MIMIC-IV and eICU-CRD. Additionally, it achieved higher accuracy than baseline models for various ICU subtypes within both databases. The interpretable prediction results can enhance clinicians' understanding of CA prediction by serving as a statistical comparison between non-CA and CA groups. Next, we tested the eICU-CRD and MIMIC-IV data sets using models trained on MIMIC-IV and eICU-CRD, respectively, to evaluate generalization ability. The results demonstrated superior performance compared with baseline models. CONCLUSIONS: Our novel framework for learning unique features provides stable predictive power across different ICU environments. Most of the interpretable global information reveals statistical differences between CA and non-CA groups, demonstrating its utility as an indicator for clinical decisions. Consequently, the proposed CA prediction system is a clinically validated algorithm that enables clinicians to intervene early based on CA prediction information and can be applied to clinical trials in digital health.


Subjects
Heart Arrest , Intensive Care Units , Machine Learning , Humans , Retrospective Studies , Heart Arrest/mortality , Male , Female , Middle Aged , Aged
2.
Sci Rep ; 14(1): 18625, 2024 08 11.
Article in English | MEDLINE | ID: mdl-39128903

ABSTRACT

The COVID-19 pandemic has imposed significant challenges on global health and underscores the persistent threat of large-scale infectious diseases in the future. This study addresses the need to enhance pooled testing efficiency for large populations. The common approach in pooled testing involves consolidating multiple test samples into a single tube to efficiently detect positivity at a lower cost. However, what is the optimal number of samples to group together in order to minimize costs? For example, allocating ten individuals per group may not be the most cost-effective strategy. In response, this paper introduces the hierarchical quotient space, an extension of fuzzy equivalence relations, as a method to optimize group allocations. We propose a cost-sensitive multi-granularity intelligent decision model to further minimize testing costs. This model considers both testing and collection costs, aiming to achieve the lowest total cost through optimal grouping at a single layer. Building upon this foundation, two multi-granularity models are proposed that explore hierarchical group optimization. The experimental simulations were conducted using MATLAB R2022a on a desktop with an Intel i5-10500 CPU and 8 GB of RAM, considering scenarios with a fixed number of individuals and a fixed positive probability. The main findings from our simulations demonstrate that the proposed models significantly enhance the efficiency and reduce the overall costs associated with pooled testing. For example, testing costs were reduced by nearly half when the optimal grouping strategy was applied, compared to the traditional method of grouping ten individuals. Additionally, the multi-granularity approach further optimized the hierarchical groupings, leading to substantial cost savings and improved testing efficiency.


Subjects
COVID-19 , Cost-Benefit Analysis , Humans , COVID-19/epidemiology , COVID-19/diagnosis , COVID-19/economics , COVID-19/virology , SARS-CoV-2/isolation & purification , COVID-19 Testing/methods , COVID-19 Testing/economics , Pandemics/economics , Decision Support Techniques
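The group-size question raised in this abstract can be illustrated with a much simpler model than the paper's hierarchical quotient space. The sketch below assumes classic two-stage pooling (test the pool once, then retest every member of a positive pool), independent infections with a fixed prevalence `p`, and a per-person collection cost; the function names and default costs are hypothetical and do not come from the paper.

```python
def expected_cost_per_person(k, p, test_cost=1.0, collection_cost=0.0):
    """Expected cost per person for two-stage pooling with group size k.

    Assumes independent infections with probability p (an illustrative
    simplification, not the paper's multi-granularity model).
    """
    if k < 1:
        raise ValueError("group size must be at least 1")
    if k == 1:
        # Individual testing: exactly one test per person, no pooling stage.
        expected_tests = 1.0
    else:
        # One pooled test shared by k people, plus one retest per person
        # whenever the pool is positive (probability 1 - (1 - p)^k).
        expected_tests = 1.0 / k + 1.0 - (1.0 - p) ** k
    return collection_cost + test_cost * expected_tests


def best_group_size(p, max_k=50, test_cost=1.0, collection_cost=0.0):
    """Return the group size in 1..max_k with the lowest expected cost."""
    return min(
        range(1, max_k + 1),
        key=lambda k: expected_cost_per_person(k, p, test_cost, collection_cost),
    )
```

Under these assumptions, a prevalence of 1% already favors a group size slightly different from ten, which is the kind of gap the abstract's question points at. A constant per-person collection cost does not move the optimum in this single-layer sketch; it only shifts the total.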
3.
Diagnostics (Basel) ; 14(8)2024 Apr 18.
Article in English | MEDLINE | ID: mdl-38667487

ABSTRACT

This study used artificial intelligence techniques to identify clinical cancer biomarkers for recurrent gastric cancer survivors. From a hospital-based cancer registry database in Taiwan, data on the incidence of recurrence and clinical risk features were collected for 2476 gastric cancer survivors. We benchmarked Random Forest against MLP, C4.5, AdaBoost, and Bagging algorithms, and leveraged the synthetic minority oversampling technique (SMOTE) for imbalanced dataset issues, cost-sensitive learning for risk assessment, and SHapley Additive exPlanations (SHAP) for feature importance analysis. Our proposed Random Forest outperformed the other models with an accuracy of 87.9%, a recall of 90.5%, a precision of 86%, and an F1 of 88.2% on the recurrent category under 10-fold cross-validation on a balanced dataset. The top five clinical features of recurrent gastric cancer were stage, number of involved regional lymph nodes, Helicobacter pylori, BMI (body mass index), and gender; these features significantly affect the prediction model's output and merit attention in subsequent causal effect analysis. Using an artificial intelligence model, the risk factors for recurrent gastric cancer could be identified and cost-effectively ranked according to their feature importance. These features should also give physicians the knowledge needed to screen for high-risk patients among gastric cancer survivors.

4.
Front Cardiovasc Med ; 11: 1276608, 2024.
Article in English | MEDLINE | ID: mdl-38566962

ABSTRACT

Background and objectives: Hypertension is one of the most serious risk factors and the leading cause of mortality in patients with cardiovascular diseases (CVDs). It is necessary to accurately predict the mortality of patients suffering from CVDs with hypertension. Therefore, this paper proposes a novel cost-sensitive deep neural network (CSDNN)-based mortality prediction model for out-of-hospital acute myocardial infarction (AMI) patients with hypertension on imbalanced data. Methods: First, the experimental data are extracted from the Korea Acute Myocardial Infarction Registry-National Institutes of Health (KAMIR-NIH) and preprocessed with several approaches. Then the imbalanced experimental dataset is divided into training data (80%) and test data (20%). After that, we design the proposed CSDNN-based mortality prediction model, which can handle the skewed class distribution between the majority and minority classes in the training data. The threshold moving technique is also employed to enhance the performance of the proposed model. Finally, we evaluate the performance of the proposed model using the test data and compare it with other commonly used machine learning (ML) and data sampling-based ensemble models. Moreover, the hyperparameters of all models are optimized through random search with 5-fold cross-validation. Results and discussion: The proposed CSDNN model with the threshold moving technique yielded the best results on imbalanced data. Additionally, our proposed model outperformed the best ML model and the classic data sampling-based ensemble model, improving the AUC by 2.58% and 2.55%, respectively. It aids in decision-making and offers a precise mortality prediction for AMI patients with hypertension.
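The threshold moving step mentioned in this abstract can be sketched in a few lines. The example below is only an illustration under stated assumptions: it picks, on held-out data, the decision threshold that minimizes a total cost in which a missed positive (`fn_cost`) is more expensive than a false alarm (`fp_cost`). The cost values and function names are hypothetical, and the abstract does not say which criterion the authors optimized.

```python
def total_cost(y_true, y_prob, threshold, fn_cost=5.0, fp_cost=1.0):
    """Sum of misclassification costs when predicting 1 for y_prob >= threshold."""
    cost = 0.0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if y == 1 and predicted == 0:
            cost += fn_cost  # missed positive (e.g. an unpredicted death)
        elif y == 0 and predicted == 1:
            cost += fp_cost  # false alarm
    return cost


def move_threshold(y_true, y_prob, fn_cost=5.0, fp_cost=1.0):
    """Return the candidate threshold with the lowest total cost.

    Candidates are the observed scores plus one value above 1.0, which
    corresponds to predicting the negative class for everyone.
    """
    candidates = sorted(set(y_prob)) + [1.0 + 1e-9]
    return min(
        candidates,
        key=lambda t: total_cost(y_true, y_prob, t, fn_cost, fp_cost),
    )
```

With an expensive false negative, the selected threshold typically drops below the default of 0.5, trading extra false alarms for fewer missed minority-class cases.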

5.
Chin J Integr Med ; 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38532153

ABSTRACT

OBJECTIVE: To establish a dynamic treatment strategy of Chinese medicine (CM) for metastatic colorectal cancer (mCRC) using a machine learning algorithm, in order to provide a reference for the selection of CM treatment strategies for mCRC. METHODS: From the outpatient cases of mCRC in the Department of Oncology at Xiyuan Hospital, China Academy of Chinese Medical Sciences, 197 cases that met the inclusion criteria were screened. According to different CM intervention strategies, the patients were divided into 3 groups: CM treatment alone, equal emphasis on Chinese and Western medicine treatment (CM combined with local treatment of tumors, oral chemotherapy, or targeted drugs), and CM-assisted Western medicine treatment (CM combined with an intravenous regimen of Western medicine). The survival time of patients undergoing CM intervention was taken as the final evaluation index. Factors affecting the choice of CM intervention scheme were screened as decision variables. The dynamic CM intervention and treatment strategy for mCRC was explored based on the cost-sensitive classification learning algorithm for survival (CSCLSurv). Patients' survival was estimated using the Kaplan-Meier method, and the survival time of patients who received the model-recommended treatment plan was compared with that of patients who received the actual treatment plan. RESULTS: Using the survival time of patients undergoing CM intervention as the evaluation index, a dynamic CM intervention therapy strategy for mCRC was established based on CSCLSurv. Different CM intervention strategies for mCRC can be selected according to dynamic decision variables, such as gender, age, Eastern Cooperative Oncology Group score, tumor site, metastatic site, genotyping, and the stage of Western medicine treatment at the patient's first visit. The median survival time of patients who received the model-recommended treatment plan was 35 months, while that of patients who received the actual treatment plan was 26.0 months (P=0.06).
CONCLUSIONS: The dynamic treatment strategy of CM for mCRC, based on CSCLSurv, can provide some clinical guidance for CM treatment. It can be further improved in future prospective studies with larger sample sizes.

6.
Technol Health Care ; 32(4): 2733-2753, 2024.
Article in English | MEDLINE | ID: mdl-38393866

ABSTRACT

BACKGROUND: Artificial Intelligence (AI) plays a pivotal role in the diagnosis of health conditions ranging from general well-being to critical health issues. In health diagnostics, an often overlooked but critical aspect is cost-sensitive learning, which this study prioritizes over the non-invasive nature of the diagnostic process, whereas standard metrics such as accuracy and sensitivity capture the error profile only weakly. OBJECTIVE: This research aims to investigate the total cost of misclassification (Total Cost) incurred by decision rule Machine Learning (ML) algorithms implemented on Java platforms, such as DecisionTable, JRip, OneR, and PART. An augmented dataset with conjunctiva images along with candidates' demographic and anthropometric features is considered under supervised learning, with a specific emphasis on cost-sensitive classification. METHODS: The selected decision rule classifiers use the text features and, additionally, the image feature 'a* value of CIELAB color space' extracted from the conjunctiva digital images as input attributes. The pre-processing consists of combining text and image features on a uniform scale and normalizing them. Then 10-fold cross-validation is used to classify samples into two categories: the presence or absence of anemia. This study uses the Cost Ratio (ρ) extracted from the cost matrix to monitor the Total Cost under four different cost ratio methodologies, namely Uniform (U), Uniform Inverted (UI), Non-Uniform (NU), and Non-Uniform Inverted (NUI). RESULTS: The PART classifier is the top performer in this binary classification task, yielding the lowest mean total cost of 629.9 among the selected classifiers. Moreover, it shows a comparatively lower standard deviation of 335.9 and a lower total cost across all four cost ratio methodologies.
The ranking of algorithm performance is as follows: PART, JRip, DecisionTable, and OneR. CONCLUSION: The significance of adopting a cost-sensitive learning approach is emphasized, showing the PART classifier's consistent performance within the proposed framework for learning the anemia dataset. This emphasis on cost-sensitive learning not only enhances the recommendations in diagnosis but also holds the potential for substantial cost savings, making it a noteworthy focal point in the advancement of AI-driven health care.


Subjects
Algorithms , Machine Learning , Humans , Anemia/diagnosis , Anemia/economics , Conjunctiva , Artificial Intelligence/economics
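The Total Cost criterion used in this study can be sketched as an element-wise product of a confusion matrix and a cost matrix. The example below is a minimal illustration, not the authors' procedure: the definition of the cost ratio as false-negative cost divided by false-positive cost, the helper names, and the matrix layout are all assumptions made here.

```python
def cost_matrix_from_ratio(rho, fp_cost=1.0):
    """Build a 2x2 cost matrix [[TN, FP], [FN, TP]] with FN cost = rho * FP cost.

    Correct decisions are assumed to cost nothing (an assumption of this sketch).
    """
    return [[0.0, fp_cost], [rho * fp_cost, 0.0]]


def total_misclassification_cost(confusion, cost):
    """Sum over cells of (number of cases in the cell) * (cost of the cell).

    Both matrices use rows for the true class and columns for the predicted class.
    """
    return sum(
        confusion[i][j] * cost[i][j]
        for i in range(len(confusion))
        for j in range(len(confusion[i]))
    )


def rank_by_total_cost(confusions, cost):
    """Return classifier names ordered from lowest to highest total cost."""
    return sorted(
        confusions,
        key=lambda name: total_misclassification_cost(confusions[name], cost),
    )
```

A ranking produced this way can change with the cost ratio, which is why evaluating several cost settings, as the study does, says more than a single accuracy figure.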
7.
Sensors (Basel) ; 23(23)2023 Nov 28.
Article in English | MEDLINE | ID: mdl-38067837

ABSTRACT

In this work, cost-sensitive decision support was developed. Using the Batch Data Analytics (BDA) methods of batch data structure and feature accommodation, the batch process property and sensor data can be accommodated. The batch data structure organises the batch processes' data, and the feature accommodation approach derives statistics from the time series, consequently aligning the time series with the other features. Three machine learning classifiers were implemented for comparison: Logistic Regression (LR), Random Forest Classifier (RFC), and Support Vector Machine (SVM). It is possible to filter out the low-probability predictions by leveraging the classifiers' probability estimations. Consequently, the decision support has a trade-off between accuracy and coverage. Cost-sensitive learning was used to implement a cost matrix, which further aggregates the accuracy-coverage trade-off into cost metrics. Also, two scenarios were implemented for accommodating out-of-coverage batches: in one scenario the batch is discarded, and in the other it is processed. The Random Forest classifier was shown to outperform the other classifiers and, compared to the baseline scenario, had a relative cost of 26%. This synergy of methods provides cost-aware decision support for analysing the intricate workings of a multiprocess batch data system.
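The accuracy-coverage trade-off described here can be made concrete with a small sketch. It assumes a binary classifier that outputs a probability for the positive class and treats `max(p, 1 - p)` as the confidence of the prediction; the function name and the confidence cut-off are illustrative choices, not details taken from the paper.

```python
def accuracy_and_coverage(y_true, y_prob, min_confidence):
    """Return (accuracy on covered cases, fraction of cases covered).

    A case is covered when the classifier's confidence, max(p, 1 - p),
    reaches min_confidence. Accuracy is None when nothing is covered.
    """
    kept = [
        (y, p)
        for y, p in zip(y_true, y_prob)
        if max(p, 1.0 - p) >= min_confidence
    ]
    coverage = len(kept) / len(y_true)
    if not kept:
        return None, coverage
    # Predict the positive class when p >= 0.5 and compare with the label.
    correct = sum(1 for y, p in kept if (p >= 0.5) == (y == 1))
    return correct / len(kept), coverage
```

Raising `min_confidence` usually increases accuracy on the covered batches while shrinking coverage; a cost matrix can then price both the errors and the uncovered batches, as the abstract describes.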

8.
J Med Internet Res ; 25: e48244, 2023 12 22.
Article in English | MEDLINE | ID: mdl-38133922

ABSTRACT

BACKGROUND: Cardiac arrest (CA) is the leading cause of death in critically ill patients. Clinical research has shown that early identification of CA reduces mortality. Algorithms capable of predicting CA with high sensitivity have been developed using multivariate time series data. However, these algorithms suffer from a high rate of false alarms, and their results are not clinically interpretable. OBJECTIVE: We propose an ensemble approach using multiresolution statistical features and cosine similarity-based features for the timely prediction of CA. Furthermore, this approach provides clinically interpretable results that can be adopted by clinicians. METHODS: Patients were retrospectively analyzed using data from the Medical Information Mart for Intensive Care-IV database and the eICU Collaborative Research Database. Based on the multivariate vital signs of a 24-hour time window for adults diagnosed with heart failure, we extracted multiresolution statistical and cosine similarity-based features. These features were used to construct gradient boosting decision trees, for which we adopted cost-sensitive learning. Then, 10-fold cross-validation was performed to check the consistency of the model performance, and the Shapley additive explanation algorithm was used to capture the overall interpretability of the proposed model. Next, external validation using the eICU Collaborative Research Database was performed to check the generalization ability. RESULTS: The proposed method yielded an overall area under the receiver operating characteristic curve (AUROC) of 0.86 and area under the precision-recall curve (AUPRC) of 0.58. In terms of the timely prediction of CA, the proposed model achieved an AUROC above 0.80 for predicting CA events up to 6 hours in advance. The proposed method simultaneously improved precision and sensitivity to increase the AUPRC, which reduced the number of false alarms while maintaining high sensitivity.
This result indicates that the predictive performance of the proposed model is superior to the performances of the models reported in previous studies. Next, we demonstrated the effect of feature importance on the clinical interpretability of the proposed method and inferred the effect between the non-CA and CA groups. Finally, external validation was performed using the eICU Collaborative Research Database, and an AUROC of 0.74 and AUPRC of 0.44 were obtained in a general intensive care unit population. CONCLUSIONS: The proposed framework can provide clinicians with more accurate CA prediction results and reduce false alarm rates through internal and external validation. In addition, clinically interpretable prediction results can facilitate clinician understanding. Furthermore, the similarity of vital sign changes can provide insights into temporal pattern changes in CA prediction in patients with heart failure-related diagnoses. Therefore, our system is sufficiently feasible for routine clinical use. In addition, regarding the proposed CA prediction system, a clinically mature application has been developed and verified in the future digital health field.


Subjects
Heart Arrest , Heart Failure , Adult , Humans , Artificial Intelligence , Retrospective Studies , Heart Arrest/diagnosis , Heart Arrest/therapy , Heart Failure/diagnosis , Hospitals
9.
Med Image Anal ; 88: 102867, 2023 08.
Article in English | MEDLINE | ID: mdl-37348167

ABSTRACT

High throughput nuclear segmentation and classification of whole slide images (WSIs) is crucial to biological analysis, clinical diagnosis and precision medicine. With the advances of CNN algorithms and the continuously growing datasets, considerable progress has been made in nuclear segmentation and classification. However, few works consider how to reasonably deal with nuclear heterogeneity in the following two aspects: imbalanced data distribution and diversified morphology characteristics. The minority classes might be dominated by the majority classes due to the imbalanced data distribution, and the diversified morphology characteristics may lead to fragile segmentation results. In this study, a cost-Sensitive MultI-task LEarning (SMILE) framework is proposed to tackle the data heterogeneity problem. Based on the most popular multi-task learning backbone in nuclei segmentation and classification, we propose a multi-task correlation attention (MTCA) to perform feature interaction among multiple highly relevant tasks and learn a better feature representation. A cost-sensitive learning strategy is proposed to handle the imbalanced data distribution by increasing the penalty for misclassifying the minority classes. Furthermore, we propose a novel post-processing step based on the coarse-to-fine marker-controlled watershed scheme to alleviate fragile segmentation when nuclei are large and have unclear contours. Extensive experiments show that the proposed method achieves state-of-the-art performance on the CoNSeP and MoNuSAC 2020 datasets. The code is available at: https://github.com/panxipeng/nuclear_segandcls.


Subjects
Algorithms , Learning , Humans , Cell Nucleus , Image Processing, Computer-Assisted , Precision Medicine
10.
Math Biosci Eng ; 20(5): 7957-7980, 2023 02 23.
Article in English | MEDLINE | ID: mdl-37161181

ABSTRACT

Circular RNAs (circRNAs) constitute a category of circular non-coding RNA molecules whose abnormal expression is closely associated with the development of diseases. As biological data become abundant, a lot of computational prediction models have been used for circRNA-disease association prediction. However, existing prediction models ignore the non-linear information of circRNAs and diseases when fusing multi-source similarities. In addition, these models fail to take full advantage of the vital feature information of high-similarity neighbor nodes when extracting features of circRNAs or diseases. In this paper, we propose a deep learning model, CDA-SKAG, which introduces a similarity kernel fusion algorithm to integrate multi-source similarity matrices to capture the non-linear information of circRNAs or diseases, and construct a circRNA information space and a disease information space. The model embeds an attention-enhancing layer in the graph autoencoder to enhance the associations between nodes with higher similarity. A cost-sensitive neural network is introduced to address the problem of positive and negative sample imbalance, consequently improving our model's generalization capability. The experimental results show that the prediction performance of our model CDA-SKAG outperformed existing circRNA-disease association prediction models. The results of the case studies on lung and cervical cancer suggest that CDA-SKAG can be utilized as an effective tool to assist in predicting circRNA-disease associations.


Subjects
Algorithms , RNA, Circular , Neural Networks, Computer
11.
Comput Biol Med ; 159: 106890, 2023 06.
Article in English | MEDLINE | ID: mdl-37116240

ABSTRACT

BACKGROUND AND OBJECTIVES: The progression of pulmonary diseases is a complex process. Predicting at an early stage whether a patient will progress to the severe stage is critical for choosing appropriate hospital treatment. However, this task suffers from the "insufficient and incomplete" data issue, since it is clinically impossible to have adequate training samples for one patient on each day. Besides, the training samples are extremely imbalanced, since the patients who will progress to the severe stage are far fewer than those who will not. METHOD: We consider the severity prediction of pulmonary diseases as a time estimation problem based on CT scans. To handle the issue of "insufficient and incomplete" training samples, we introduce label distribution learning (LDL). Specifically, we generate a label distribution for each patient, making a CT image contribute not only to the learning of its chronological day but also to the learning of its neighboring days. In addition, a cost-sensitive mechanism is introduced to address the data imbalance issue. To identify the importance of pulmonary segments in pulmonary disease severity prediction, multi-kernel learning in composite kernel space is further incorporated, and particle swarm optimization (PSO) is used to find the optimal kernel weights. RESULTS: We compare the performance of the proposed CS-LD-MKSVR algorithm with several classical machine learning algorithms and deep learning (DL) algorithms. The proposed method obtained the best classification results on the in-house data, indicating its effectiveness in pulmonary disease severity prediction. CONTRIBUTIONS: The severity prediction of pulmonary diseases is treated as a time estimation problem, and label distribution is introduced to describe the conversion time from the non-severe stage to the severe stage.
The cost-sensitive mechanism is also introduced to handle the data imbalance issue to further improve the classification performance.


Subjects
Algorithms , Lung Diseases , Humans , Lung Diseases/diagnostic imaging , Machine Learning , Tomography, X-Ray Computed
12.
Bioengineering (Basel) ; 10(4)2023 Mar 27.
Article in English | MEDLINE | ID: mdl-37106606

ABSTRACT

Large hospitals can be complex, with numerous discipline and subspecialty settings. Patients may have limited medical knowledge, making it difficult for them to determine which department to visit. As a result, visits to the wrong departments and unnecessary appointments are common. To address this issue, modern hospitals require a remote system capable of performing intelligent triage, enabling patients to perform self-service triage. To address the challenges outlined above, this study presents an intelligent triage system based on transfer learning, capable of processing multilabel neurological medical texts. The system predicts a diagnosis and corresponding department based on the patient's input. It utilizes the triage priority (TP) method to label diagnostic combinations found in medical records, converting a multilabel problem into a single-label one. The system considers disease severity and reduces the "class overlapping" of the dataset. The BERT model classifies the chief complaint text, predicting a primary diagnosis corresponding to the complaint. To address data imbalance, a composite loss function based on cost-sensitive learning is added to the BERT architecture. The study results indicate that the TP method achieves a classification accuracy of 87.47% on medical record text, outperforming other problem transformation methods. By incorporating the composite loss function, the system's accuracy rate improves to 88.38% surpassing other loss functions. Compared to traditional methods, this system does not introduce significant complexity, yet substantially improves triage accuracy, reduces patient input confusion, and enhances hospital triage capabilities, ultimately improving the patient's medical experience. The findings could provide a reference for intelligent triage development.

13.
Sensors (Basel) ; 23(5)2023 Feb 27.
Article in English | MEDLINE | ID: mdl-36904815

ABSTRACT

Owing to the remarkable development of deep learning algorithms, defect detection techniques based on deep neural networks have been extensively applied in industrial production. Most existing surface defect detection models assign equal costs to classification errors among different defect categories and do not strictly distinguish them. However, different errors can carry very different decision risks or classification costs, which creates a cost-sensitive issue that is crucial to the manufacturing process. To address this engineering challenge, we propose a novel supervised classification cost-sensitive learning method (SCCS) and apply it to improve YOLOv5 as CS-YOLOv5, where the classification loss function of object detection is reconstructed according to a new cost-sensitive learning criterion explained by a label-cost vector selection method. In this way, the classification risk information from a cost matrix is directly introduced into the detection model and fully exploited in training. As a result, the developed approach can make low-risk classification decisions for defect detection. It is applicable to direct cost-sensitive learning based on a cost matrix for detection tasks. On two datasets, a painting surface and a hot-rolled steel strip surface, our CS-YOLOv5 model not only outperforms the original version with respect to cost under different positive classes, coefficients, and weight ratios, but also maintains effective detection performance as measured by mAP and F1 scores.

14.
Comput Biol Med ; 154: 106571, 2023 03.
Article in English | MEDLINE | ID: mdl-36709518

ABSTRACT

Melanoma is a deadly malignant skin cancer that generally grows and spreads rapidly. Early detection of melanoma can improve the prognosis of a patient. However, large-scale screening for melanoma is arduous due to human error and the unavailability of trained experts. Accurate automatic melanoma classification from dermoscopy images can help mitigate such issues. However, the classification task is challenging due to class-imbalance, high inter-class, and low intra-class similarity problems. It results in poor sensitivity scores when it comes to the disease classification task. The work proposes a novel knowledge-distilled lightweight Deep-CNN-based framework for melanoma classification to tackle the high inter-class and low intra-class similarity problems. To handle the high class-imbalance problem, the work proposes using Cost-Sensitive Learning with Focal Loss, to achieve better sensitivity scores. As a pre-processing step, an in-painting algorithm is used to remove artifacts from dermoscopy images. New CutOut variants, namely, Sprinkled and microscopic Cutout augmentations, have been employed as regularizers to avoid over-fitting. The robustness of the model has been studied through stratified K-fold cross-validation. Ablation studies with test time augmentation (TTA) and the addition of various noises like salt & pepper, pepper-only, and Gaussian noises have been studied. All the models trained in the work have been evaluated on the SIIM-ISIC Melanoma Classification Challenge - ISIC-2020 dataset. With our EfficientNet-B5 (FL) teacher model, the EfficientNet-B2 student model achieved an Area under the Curve (AUC) of 0.9295, and a sensitivity of 0.8087 on the ISIC-2020 test data. The sensitivity value of 0.8087 for melanoma classification is the current state-of-the-art result in the literature for the ISIC-2020 dataset which is a significant 49.48% increase from the best non-distilled standalone model, EfficientNet B5 (FL) teacher with 0.5410.


Subjects
Melanoma , Skin Neoplasms , Humans , Dermoscopy/methods , Neural Networks, Computer , Melanoma/diagnostic imaging , Melanoma/pathology , Skin Neoplasms/diagnostic imaging , Skin Neoplasms/pathology , Algorithms
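The focal loss named in this abstract can be written down for a single binary prediction. The sketch below uses the standard formulation, FL = -α_t (1 - p_t)^γ log(p_t), where p_t is the probability assigned to the true class; the default `gamma`, the optional `alpha` weighting, and the function name are illustrative and say nothing about the settings used by the authors.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=None, eps=1e-12):
    """Binary focal loss for one example.

    p is the predicted probability of the positive class, y is 0 or 1.
    With gamma = 0 and alpha = None this reduces to plain cross-entropy.
    """
    # Probability the model assigns to the true class, clipped away from 0.
    p_t = p if y == 1 else 1.0 - p
    p_t = min(max(p_t, eps), 1.0)
    # Optional class weight: alpha for positives, 1 - alpha for negatives.
    if alpha is None:
        weight = 1.0
    else:
        weight = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma shrinks the loss of examples that are already easy.
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)
```

The modulating factor is what makes the loss useful under class imbalance: confidently classified majority-class examples contribute very little, so hard and minority-class examples dominate the gradient.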
15.
Article in English | MEDLINE | ID: mdl-36613150

ABSTRACT

Hospital-Acquired Pressure Injury (HAPI), known as bedsore or decubitus ulcer, is one of the most common health conditions in the United States. Machine learning has been used to predict HAPI, but this alone is insufficient for the clinical team, because knowing who would develop HAPI in the future does not help differentiate the severity of those predicted cases. This research develops an integrated system of multifaceted machine learning models to predict if and when HAPI occurs. Phase 1 integrates a Genetic Algorithm with a Cost-Sensitive Support Vector Machine (GA-CS-SVM) to handle the highly imbalanced HAPI dataset and predict whether patients will develop HAPI. Phase 2 adopts Grid Search with SVM (GS-SVM) to predict when HAPI will occur for at-risk patients. This helps to prioritize who is at the highest risk and when that risk will be highest. The performance of the developed models is compared with state-of-the-art models in the literature. GA-CS-SVM achieved the best Area Under the Curve (AUC) (75.79 ± 0.58) and G-mean (75.73 ± 0.59), while GS-SVM achieved the best AUC (75.06) and G-mean (75.06). The research outcomes will help prioritize at-risk patients, allocate targeted resources, and aid medical staff planning to provide intervention to those patients.


Subjects
Pressure Ulcer; Humans; Pressure Ulcer/epidemiology; Pressure Ulcer/etiology; Machine Learning; Support Vector Machine; Area Under Curve; Hospitals
16.
ISA Trans ; 136: 245-253, 2023 May.
Article in English | MEDLINE | ID: mdl-36379759

ABSTRACT

Because power systems must be safe and reliable, unstable samples rarely appear in real systems. A model trained on such imbalanced samples produces evaluation results that are biased toward one class. Generally, the imbalance in quantity is taken into account, while the imbalance in quality is ignored. To address this problem, an imbalance correction method based on the support vector machine (SVM) is proposed. First, the classification hyperplane is trained by SVM, and the normalized Euclidean distance between each sample and the hyperplane is calculated to obtain the sample's fault severity. Based on this severity, the training samples are grouped into multilevel sets. Then, the original stacked sparse auto-encoder (SSAE) is pretrained to quantify the imbalance between the two classes of samples in the multilevel sets. Subsequently, to reduce the imbalance of the training samples, a cost-sensitive correction matrix is generated from the imbalance information of the multilevel sets. Finally, the loss function of the SSAE is modified by the cost-sensitive correction matrix to establish the final classifier. Simulation results on the IEEE 39-bus system and a realistic regional power system in Eastern China show the high performance of the proposed imbalance correction method.
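The first step described in this abstract, measuring how far each sample lies from the SVM hyperplane and grouping samples by that distance, can be sketched as follows. The hyperplane parameters `w` and `b`, the `thresholds`, and the function names are all hypothetical; the abstract does not say how distances are mapped to severity levels.

```python
import math

def distance_to_hyperplane(x, w, b):
    """Euclidean distance from point x to the hyperplane w.x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(dot + b) / norm

def severity_level(distance, thresholds):
    """Index of the first threshold the distance falls below (assumed binning rule)."""
    for level, limit in enumerate(thresholds):
        if distance < limit:
            return level
    return len(thresholds)

# Hypothetical hyperplane and samples, for illustration only.
w, b = (3.0, 4.0), -5.0
samples = [(1.0, 0.5), (2.0, 1.0), (3.0, 4.0)]
thresholds = [0.5, 1.5]  # assumed cut points between the multilevel sets

for x in samples:
    d = distance_to_hyperplane(x, w, b)
    print(x, d, severity_level(d, thresholds))
```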

17.
Comput Methods Programs Biomed ; 228: 107238, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36423485

ABSTRACT

BACKGROUND AND OBJECTIVE: Assessing image quality is crucial before the computer-aided diagnosis of fundus images, and the task is very challenging. First, the subjective judgments of graders on image quality lead to ambiguous labels. Second, although existing works treat grading as classification, it has regression properties that cannot be ignored. Solving the ambiguity and regression problems in the label space, and extracting discriminative features, have become the keys to quality assessment. METHODS: In this paper, we propose a framework based on deep convolutional neural networks that can assess the quality of fundus images accurately and reasonably. Drawing on the experience of human graders, a dual-path convolutional neural network with attention blocks is designed to better extract discriminative features and present the basis for each decision. Label smoothing and cost-sensitive regularization are designed to solve the label ambiguity problem and the potential regression problem, respectively. In addition, we annotated a large number of images to further improve the results. RESULTS: We conducted our experiments on the largest retinal image quality assessment dataset, with 28,792 retinal images. Our approach achieves 0.8868 precision, 0.8786 recall, 0.8820 F1, and a 0.9138 Kappa score. Results show that our approach outperforms state-of-the-art methods. CONCLUSIONS: The promising performance shows that our methods are beneficial to retinal image quality assessment and have potential in other grading tasks.
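Label smoothing, which this abstract uses against ambiguous grader labels, is usually implemented by moving a small share of probability mass from the annotated class to all classes. The sketch below shows the uniform variant; the function name, the three-class setup, and `epsilon=0.1` are assumptions, and the paper's own smoothing scheme may differ.

```python
def smooth_label(true_class, num_classes, epsilon=0.1):
    """Uniform label smoothing: (1 - epsilon) * one_hot + epsilon / num_classes."""
    off_value = epsilon / num_classes
    return [
        (1.0 - epsilon) + off_value if k == true_class else off_value
        for k in range(num_classes)
    ]

# Hypothetical three-grade quality scale with the middle grade annotated.
target = smooth_label(true_class=1, num_classes=3)
print(target)
print(sum(target))
```

The smoothed vector still sums to one and still peaks at the annotated grade, but the network is no longer pushed toward full certainty on a label that graders may disagree about.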

18.
Cancers (Basel) ; 14(23)2022 Nov 29.
Article in English | MEDLINE | ID: mdl-36497355

ABSTRACT

Deep learning-based models have been employed for the detection and classification of skin diseases through medical imaging. However, these models are not effective for detecting and classifying rare skin diseases, mainly because rare skin diseases have very few data samples. The resulting datasets are highly imbalanced, and because of the bias in learning, most models give better performance on the well-represented classes. Deep learning models are also not effective in detecting the small affected portions of skin within the overall image. This paper presents an attention-cost-sensitive deep learning-based feature fusion ensemble meta-classifier approach for skin cancer detection and classification. Cost weights are included in the deep learning models to handle the data imbalance during training. To learn the optimal features from the small affected portions of the skin image samples, attention is integrated into the deep learning models. Features are extracted from the fine-tuned models, and their dimensionality is further reduced using kernel principal component analysis (KPCA). The reduced features of the fine-tuned models are fused and passed into ensemble meta-classifiers for skin disease detection and classification. The ensemble meta-classifier is a two-stage model: the first stage predicts the skin disease, and the second stage performs the classification using the first-stage predictions as features. A detailed analysis of the proposed approach is presented for both skin disease detection and skin disease classification. The proposed approach achieved an accuracy of 99% on skin disease detection and 99% on skin disease classification. In all experimental settings, it outperformed the existing methods, with an accuracy improvement of 4% for skin disease detection and 9% for skin disease classification. The proposed approach can be used as a computer-aided diagnosis (CAD) tool for the early detection and classification of skin cancer in healthcare and medical environments. The tool can accurately detect skin diseases and classify each disease into its skin disease family.
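The abstract does not state how its cost weights are chosen. One common heuristic, shown here purely as an assumption, is to weight each class inversely to its frequency so that rare diseases count more in the training loss. The function name and the label counts are hypothetical.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Class weights n / (k * count), where n is the sample count and k the class count."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical label distribution: 90 common-disease images, 10 rare-disease images.
labels = ["common"] * 90 + ["rare"] * 10
weights = inverse_frequency_weights(labels)
print(weights)
```

With these counts the rare class receives nine times the weight of the common class, so a misclassified rare-disease image costs as much as nine misclassified common ones.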

19.
J Appl Stat ; 49(13): 3257-3277, 2022.
Article in English | MEDLINE | ID: mdl-36213775

ABSTRACT

Logistic regression is estimated by maximizing a log-likelihood objective function that is formulated under the assumption of maximizing overall accuracy. That assumption does not hold for imbalanced data, and the resulting models tend to be biased towards the majority class (i.e., non-event), which can cause great loss in practice. One strategy for mitigating such bias is to penalize the misclassification costs of observations differently in the log-likelihood function. Existing solutions require either difficult hyperparameter estimation or high computational complexity. We propose a novel penalized log-likelihood function that includes penalty weights as decision variables for observations in the minority class (i.e., event) and learns them from data along with the model coefficients. In the experiments, the proposed logistic regression model is compared with existing ones on the area under the receiver operating characteristic (ROC) curve across 10 public datasets and 16 simulated datasets, as well as on training time. A detailed analysis is conducted on an imbalanced credit dataset to examine the estimated probability distributions, additional performance measurements (i.e., type I error and type II error), and model coefficients. The results demonstrate that both the discrimination ability and the computational efficiency of logistic regression models are improved when the proposed log-likelihood function is used as the learning objective.
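To make the idea of penalizing observations differently in the log-likelihood concrete, the sketch below evaluates a weighted logistic log-likelihood with a single fixed weight on minority-class observations. This is a simplification and not the proposed method: the paper learns per-observation weights jointly with the coefficients, whereas `minority_weight` here is a fixed, hypothetical input.

```python
import math

def weighted_log_likelihood(beta, X, y, minority_weight):
    """Weighted logistic log-likelihood.

    beta: [intercept, coef_1, ..., coef_d]
    X: list of feature lists; y: list of 0/1 labels (1 = minority "event" class)
    minority_weight: fixed weight applied to event observations (assumed, not learned)
    """
    total = 0.0
    for features, label in zip(X, y):
        z = beta[0] + sum(b * v for b, v in zip(beta[1:], features))
        p = 1.0 / (1.0 + math.exp(-z))           # predicted event probability
        w = minority_weight if label == 1 else 1.0
        total += w * (label * math.log(p) + (1 - label) * math.log(1.0 - p))
    return total

# Hypothetical toy data: one event and one non-event, all coefficients zero.
X = [[1.0], [2.0]]
y = [1, 0]
print(weighted_log_likelihood([0.0, 0.0], X, y, minority_weight=1.0))
print(weighted_log_likelihood([0.0, 0.0], X, y, minority_weight=3.0))
```

Raising the weight makes errors on event observations more costly, which pulls the fitted decision boundary toward the majority class.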

20.
Sensors (Basel) ; 22(18)2022 Sep 07.
Article in English | MEDLINE | ID: mdl-36146110

ABSTRACT

To address class imbalance in the operation-monitoring dataset for wind turbine blade bolts, a fault detection method is proposed that combines Gaussian Mixture Model-Synthetic Minority Oversampling Technique-Gaussian Mixture Model (GSG) oversampling with Cost-Sensitive LightGBM (CS-LightGBM). Because fault samples of blade bolts are difficult to obtain, the GSG oversampling method was constructed to increase the number of fault samples in the blade bolt dataset. The method obtains the optimal number of clusters through the Bayesian information criterion (BIC) and uses a GMM with that number of clusters to cluster the fault samples in the dataset. According to the density distribution of fault samples across clusters, new fault samples are synthesized within each cluster using SMOTE, which retains the distribution characteristics of the original fault class samples. A GMM with the same initial cluster centers is then used to cluster the augmented fault class samples, and synthetic fault samples that are not assigned to their corresponding clusters are removed. Finally, the synthetic training set is used to train the CS-LightGBM fault detection model, whose hyperparameters are tuned with Bayesian optimization to obtain the optimal model. The experimental results show that, compared with six models including SMOTE-LightGBM, CS-LightGBM, and K-means-SMOTE-LightGBM, the proposed fault detection model is superior in false alarm rate, missing alarm rate, and F1-score. The method is well suited to fault detection for large wind turbine blade bolts.
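The SMOTE step inside GSG creates each synthetic fault sample by interpolating between a real fault sample and one of its neighbors. A minimal sketch of that interpolation follows; the function name and the two example points are hypothetical, and the GMM clustering, BIC selection, and filtering stages described in the abstract are not reproduced here.

```python
import random

def smote_sample(x, neighbor, rng):
    """Synthesize one point on the line segment between x and a same-cluster neighbor."""
    lam = rng.random()  # interpolation factor in [0, 1)
    return [a + lam * (b - a) for a, b in zip(x, neighbor)]

# Hypothetical fault samples assumed to belong to the same cluster.
rng = random.Random(0)  # seeded only to make the sketch repeatable
fault_a = [0.0, 10.0]
fault_b = [2.0, 6.0]
synthetic = smote_sample(fault_a, fault_b, rng)
print(synthetic)
```

Restricting the neighbor to the same cluster, as the abstract describes, keeps synthetic points inside regions where real fault samples already occur.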
