Results 1 - 20 of 30
1.
Comput Med Imaging Graph ; 115: 102379, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38608333

ABSTRACT

Deep learning (DL) has demonstrated its innate capacity to independently learn hierarchical features from complex and multi-dimensional data. A common understanding is that its performance scales up with the amount of training data. However, the data must also exhibit variety to enable improved learning. In medical imaging data, semantic redundancy, which is the presence of similar or repetitive information, can occur due to the presence of multiple images that have highly similar presentations for the disease of interest. Also, the common use of augmentation methods to generate variety in DL training could limit performance when indiscriminately applied to such data. We hypothesize that semantic redundancy would therefore tend to lower performance and limit generalizability to unseen data and question its impact on classifier performance even with large data. We propose an entropy-based sample scoring approach to identify and remove semantically redundant training data and demonstrate using the publicly available NIH chest X-ray dataset that the model trained on the resulting informative subset of training data significantly outperforms the model trained on the full training set, during both internal (recall: 0.7164 vs 0.6597, p<0.05) and external testing (recall: 0.3185 vs 0.2589, p<0.05). Our findings emphasize the importance of information-oriented training sample selection as opposed to the conventional practice of using all available training data.


Subject(s)
Deep Learning , Thoracic Radiography , Semantics , Humans
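
The abstract above does not say how the entropy score is computed or which samples count as redundant. As a minimal sketch only, assuming the score is the Shannon entropy of a model's predicted class probabilities and that the highest-scoring samples are kept, the selection step could look like this. The function names and the `keep_fraction` parameter are hypothetical.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (in bits) of one predicted class distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def select_informative(predictions, keep_fraction=0.5):
    """Return indices of the highest-entropy samples, highest score first.

    Assumption: high predictive entropy marks an informative sample.
    The paper's actual scoring rule may differ.
    """
    scores = [shannon_entropy(p) for p in predictions]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:n_keep]

# Toy predicted distributions for four training images (illustrative values).
preds = [[0.99, 0.01], [0.5, 0.5], [0.9, 0.1], [0.6, 0.4]]
print(select_informative(preds, keep_fraction=0.5))
```

Confidently predicted samples score near zero and are dropped first under this assumption.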
2.
PLOS Digit Health ; 3(1): e0000286, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38232121

ABSTRACT

Model initialization techniques are vital for improving the performance and reliability of deep learning models in medical computer vision applications. While much literature exists on non-medical images, the impacts on medical images, particularly chest X-rays (CXRs) are less understood. Addressing this gap, our study explores three deep model initialization techniques: Cold-start, Warm-start, and Shrink and Perturb start, focusing on adult and pediatric populations. We specifically focus on scenarios with periodically arriving data for training, thereby embracing the real-world scenarios of ongoing data influx and the need for model updates. We evaluate these models for generalizability against external adult and pediatric CXR datasets. We also propose novel ensemble methods: F-score-weighted Sequential Least-Squares Quadratic Programming (F-SLSQP) and Attention-Guided Ensembles with Learnable Fuzzy Softmax to aggregate weight parameters from multiple models to capitalize on their collective knowledge and complementary representations. We perform statistical significance tests with 95% confidence intervals and p-values to analyze model performance. Our evaluations indicate models initialized with ImageNet-pretrained weights demonstrate superior generalizability over randomly initialized counterparts, contradicting some findings for non-medical images. Notably, ImageNet-pretrained models exhibit consistent performance during internal and external testing across different training scenarios. Weight-level ensembles of these models show significantly higher recall (p<0.05) during testing compared to individual models. Thus, our study accentuates the benefits of ImageNet-pretrained weight initialization, especially when used with weight-level ensembles, for creating robust and generalizable deep learning solutions.
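
For readers unfamiliar with the Shrink and Perturb start named above: in its commonly described form, the previous model's weights are scaled toward zero and small Gaussian noise is added before training resumes on the new data. The sketch below illustrates that idea on a flat list of weights. The `shrink` and `noise_std` values are illustrative assumptions, not the settings used in the study.

```python
import random

def shrink_and_perturb(weights, shrink=0.5, noise_std=0.01, seed=0):
    """Scale each weight by `shrink` and add zero-mean Gaussian noise.

    shrink=1, noise_std=0 reduces to a warm start (weights unchanged);
    shrink=0 discards the old weights, similar in spirit to a cold start.
    """
    rng = random.Random(seed)
    return [shrink * w + rng.gauss(0.0, noise_std) for w in weights]

old_weights = [2.0, -4.0, 0.5]
print(shrink_and_perturb(old_weights))
```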

3.
Article in English | MEDLINE | ID: mdl-38083689

ABSTRACT

Chronic lower back (CLB) pain limits patients' day-to-day activities, increases their missed days of work, and causes emotional distress. Developing adequate and individual-tailored treatment for CLB patients requires a better understanding of pain and protective behaviors, and how these behaviors are modulated or altered by context and subjectivity. In this work, we conducted experiments to investigate 1) the relationship between pain and protective behaviors in patients with CLB pain, 2) whether individual differences and context are relevant factors in the relationship, and 3) the impact of this relationship and its factors on the performance of current automated models for pain and protective behavior perception. Our results show 1) a significant association (p-value < 0.05) between pain and protective behaviors in patients with CLB pain and 2) that subjectivity and context are influential factors in this association. Further, our results show that considering this association along with its factors significantly (p-value < 0.05) improves the performance of automated pain and protective behavior perception. These findings highlight the role of this association in pain and protective behavior perception and raise several questions about the robustness of existing automated models that do not take this association into account.


Subject(s)
Back Pain , Low Back Pain , Humans , Back Pain/prevention & control , Emotions
4.
ArXiv ; 2023 Sep 18.
Article in English | MEDLINE | ID: mdl-37986725

ABSTRACT

Deep learning (DL) has demonstrated its innate capacity to independently learn hierarchical features from complex and multi-dimensional data. A common understanding is that its performance scales up with the amount of training data. Another data attribute is the inherent variety. It follows, therefore, that semantic redundancy, which is the presence of similar or repetitive information, would tend to lower performance and limit generalizability to unseen data. In medical imaging data, semantic redundancy can occur due to the presence of multiple images that have highly similar presentations for the disease of interest. Further, the common use of augmentation methods to generate variety in DL training may be limiting performance when applied to semantically redundant data. We propose an entropy-based sample scoring approach to identify and remove semantically redundant training data. We demonstrate using the publicly available NIH chest X-ray dataset that the model trained on the resulting informative subset of training data significantly outperforms the model trained on the full training set, during both internal (recall: 0.7164 vs 0.6597, p<0.05) and external testing (recall: 0.3185 vs 0.2589, p<0.05). Our findings emphasize the importance of information-oriented training sample selection as opposed to the conventional practice of using all available training data.

5.
Int J Cardiovasc Imaging ; 39(12): 2437-2450, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37682418

ABSTRACT

Current noninvasive estimation of right atrial pressure (RAP) by inferior vena cava (IVC) measurement during echocardiography may have significant inter-rater variability due to different levels of observers' experience. Therefore, there is a need to develop new approaches to decrease the variability of IVC analysis and RAP estimation. This study aims to develop a fully automated artificial intelligence (AI)-based system for IVC analysis and RAP estimation. We presented a multi-stage AI system to identify the IVC view, select good quality images, delineate the IVC region and quantify its thickness, enabling temporal tracking of its diameter and collapsibility changes. The automated system was trained and tested on expert manual IVC and RAP reference measurements obtained from 255 patients during routine clinical workflow. The performance was evaluated using Pearson correlation and Bland-Altman analysis for IVC values, as well as macro accuracy and chi-square test for RAP values. Our results show an excellent agreement (r=0.96) between automatically computed and manually measured IVC values, and Bland-Altman analysis showed a small bias of [Formula: see text]0.33 mm. Further, there is an excellent agreement ([Formula: see text]) between automatically estimated and manually derived RAP values with a macro accuracy of 0.85. The proposed AI-based system accurately quantified IVC diameter and collapsibility index, both of which are used for RAP estimation. This automated system could serve as a paradigm to perform IVC analysis in routine echocardiography and support various cardiac diagnostic applications.


Subject(s)
Artificial Intelligence , Atrial Pressure , Humans , Predictive Value of Tests , Echocardiography , Heart , Inferior Vena Cava/diagnostic imaging
6.
Expert Syst Appl ; 229(Pt A)2023 Nov 01.
Article in English | MEDLINE | ID: mdl-37397242

ABSTRACT

Lung segmentation in chest X-rays (CXRs) is an important prerequisite for improving the specificity of diagnoses of cardiopulmonary diseases in a clinical decision support system. Current deep learning models for lung segmentation are trained and evaluated on CXR datasets in which the radiographic projections are captured predominantly from the adult population. However, the shape of the lungs is reported to be significantly different across the developmental stages from infancy to adulthood. This might result in age-related data domain shifts that would adversely impact lung segmentation performance when the models trained on the adult population are deployed for pediatric lung segmentation. In this work, our goal is to (i) analyze the generalizability of deep adult lung segmentation models to the pediatric population and (ii) improve performance through a stage-wise, systematic approach consisting of CXR modality-specific weight initializations, stacked ensembles, and an ensemble of stacked ensembles. To evaluate segmentation performance and generalizability, novel evaluation metrics consisting of mean lung contour distance (MLCD) and average hash score (AHS) are proposed in addition to the multi-scale structural similarity index measure (MS-SSIM), the intersection over union (IoU), Dice score, 95% Hausdorff distance (HD95), and average symmetric surface distance (ASSD). Our results showed a significant improvement (p < 0.05) in cross-domain generalization through our approach. This study could serve as a paradigm to analyze the cross-domain generalizability of deep segmentation models for other medical imaging modalities and applications.
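
Two of the standard overlap metrics listed above, the Dice score and IoU, have simple closed forms for binary masks. The sketch below computes both on flattened 0/1 masks. It is a generic illustration, not the authors' evaluation code, and returning 1.0 when both masks are empty is a convention chosen here.

```python
def dice_and_iou(pred, truth):
    """Dice score and intersection over union for two flat binary masks."""
    inter = sum(p & t for p, t in zip(pred, truth))
    p_sum, t_sum = sum(pred), sum(truth)
    union = p_sum + t_sum - inter
    # Convention (an assumption): two empty masks count as perfect agreement.
    dice = 2 * inter / (p_sum + t_sum) if (p_sum + t_sum) else 1.0
    iou = inter / union if union else 1.0
    return dice, iou

predicted = [1, 1, 0, 0]
reference = [1, 0, 1, 0]
print(dice_and_iou(predicted, reference))
```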

7.
IEEE J Biomed Health Inform ; 27(11): 5260-5271, 2023 11.
Article in English | MEDLINE | ID: mdl-37440405

ABSTRACT

Despite the promising performance of automated pain assessment methods, current methods generalize poorly due to the lack of relatively large, diverse, and annotated pain datasets. Further, the majority of current methods do not allow responsible interaction between the model and user, and do not take different internal and external factors into consideration during the model's design and development. This article aims to provide an efficient cooperative learning framework that addresses the lack of annotated data while facilitating responsible user communication and taking individual differences into consideration during the development of pain assessment models. Our results using body and muscle movement data, collected from wearable devices, demonstrate that the proposed framework is effective in leveraging both the human and the machine to efficiently learn and predict pain.


Subject(s)
Machine Learning , Wearable Electronic Devices , Humans , Pain Measurement , Pain
8.
IEEE Access ; 11: 21300-21312, 2023.
Article in English | MEDLINE | ID: mdl-37008654

ABSTRACT

Artificial Intelligence (AI)-based medical computer vision algorithm training and evaluations depend on annotations and labeling. However, variability between expert annotators introduces noise in training data that can adversely impact the performance of AI algorithms. This study aims to assess, illustrate and interpret the inter-annotator agreement among multiple expert annotators when segmenting the same lesion(s)/abnormalities on medical images. We propose the use of three metrics for the qualitative and quantitative assessment of inter-annotator agreement: 1) use of a common agreement heatmap and a ranking agreement heatmap; 2) use of the extended Cohen's kappa and Fleiss' kappa coefficients for a quantitative evaluation and interpretation of inter-annotator reliability; and 3) use of the Simultaneous Truth and Performance Level Estimation (STAPLE) algorithm, as a parallel step, to generate ground truth for training AI models and compute Intersection over Union (IoU), sensitivity, and specificity to assess the inter-annotator reliability and variability. Experiments are performed on two datasets, namely cervical colposcopy images from 30 patients and chest X-ray images from 336 tuberculosis (TB) patients, to demonstrate the consistency of inter-annotator reliability assessment and the importance of combining different metrics to avoid biased assessment.
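
As background for the kappa coefficients mentioned above, the sketch below computes the plain two-rater Cohen's kappa: observed agreement corrected for the agreement expected by chance. It is not the extended Cohen's kappa or Fleiss' kappa used in the study, and the per-pixel label lists are made-up illustrative data.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Two-rater Cohen's kappa for equal-length label sequences."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)

annotator_1 = [1, 1, 0, 0]  # 1 = lesion pixel, 0 = background (toy data)
annotator_2 = [1, 0, 0, 0]
print(cohens_kappa(annotator_1, annotator_2))
```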

9.
Diagnostics (Basel) ; 13(6)2023 Mar 11.
Article in English | MEDLINE | ID: mdl-36980375

ABSTRACT

Domain shift is one of the key challenges affecting reliability in medical imaging-based machine learning predictions. It is of significant importance to investigate this issue to gain insights into its characteristics toward determining controllable parameters to minimize its impact. In this paper, we report our efforts on studying and analyzing domain shift in lung region detection in chest radiographs. We used five chest X-ray datasets, collected from different sources, which have manual markings of lung boundaries in order to conduct extensive experiments toward this goal. We compared the characteristics of these datasets from three aspects: information obtained from metadata or an image header, image appearance, and features extracted from a pretrained model. We carried out experiments to evaluate and compare model performances within each dataset and across datasets in four scenarios using different combinations of datasets. We proposed a new feature visualization method to provide explanations for the applied object detection network on the obtained quantitative results. We also examined chest X-ray modality-specific initialization, catastrophic forgetting, and model repeatability. We believe the observations and discussions presented in this work could help to shed some light on the importance of the analysis of training data for medical imaging machine learning research, and could provide valuable guidance for domain shift analysis.

10.
Diagnostics (Basel) ; 13(4)2023 Feb 16.
Article in English | MEDLINE | ID: mdl-36832235

ABSTRACT

Deep learning (DL) models are state-of-the-art in segmenting anatomical and disease regions of interest (ROIs) in medical images. Particularly, a large number of DL-based techniques have been reported using chest X-rays (CXRs). However, these models are reportedly trained on reduced image resolutions for reasons related to the lack of computational resources. Literature is sparse in discussing the optimal image resolution to train these models for segmenting the tuberculosis (TB)-consistent lesions in CXRs. In this study, we investigated the performance variations with an Inception-V3 UNet model using various image resolutions with/without lung ROI cropping and aspect ratio adjustments and identified the optimal image resolution through extensive empirical evaluations to improve TB-consistent lesion segmentation performance. We used the Shenzhen CXR dataset for the study, which includes 326 normal patients and 336 TB patients. We proposed a combinatorial approach consisting of storing model snapshots, optimizing segmentation threshold and test-time augmentation (TTA), and averaging the snapshot predictions, to further improve performance with the optimal resolution. Our experimental results demonstrate that higher image resolutions are not always necessary; however, identifying the optimal image resolution is critical to achieving superior performance.

11.
ArXiv ; 2023 Jan 27.
Article in English | MEDLINE | ID: mdl-36789135

ABSTRACT

Deep learning (DL) models are state-of-the-art in segmenting anatomical and disease regions of interest (ROIs) in medical images. Particularly, a large number of DL-based techniques have been reported using chest X-rays (CXRs). However, these models are reportedly trained on reduced image resolutions for reasons related to the lack of computational resources. Literature is sparse in discussing the optimal image resolution to train these models for segmenting the Tuberculosis (TB)-consistent lesions in CXRs. In this study, we (i) investigated the performance variations of an Inception-V3 UNet model using various image resolutions with/without lung ROI cropping and aspect ratio adjustments, and (ii) identified the optimal image resolution through extensive empirical evaluations to improve TB-consistent lesion segmentation performance. We used the Shenzhen CXR dataset for the study, which includes 326 normal patients and 336 TB patients. We proposed a combinatorial approach consisting of storing model snapshots, optimizing segmentation threshold and test-time augmentation (TTA), and averaging the snapshot predictions, to further improve performance with the optimal resolution. Our experimental results demonstrate that higher image resolutions are not always necessary; however, identifying the optimal image resolution is critical to achieving superior performance.

12.
Article in English | MEDLINE | ID: mdl-36780238

ABSTRACT

Research in Artificial Intelligence (AI)-based medical computer vision algorithms holds promise to improve disease screening, diagnosis, and subsequently patient care. However, these algorithms are highly impacted by the characteristics of the underlying data. In this work, we discuss various data characteristics, namely Volume, Veracity, Validity, Variety, and Velocity, that impact the design, reliability, and evolution of machine learning in medical computer vision. Further, we discuss each characteristic and the recent works conducted in our research lab that informed our understanding of the impact of these characteristics on the design of medical decision-making algorithms and outcome reliability.

13.
Bioengineering (Basel) ; 9(9)2022 Aug 24.
Article in English | MEDLINE | ID: mdl-36134959

ABSTRACT

Automated segmentation of tuberculosis (TB)-consistent lesions in chest X-rays (CXRs) using deep learning (DL) methods can help reduce radiologist effort, supplement clinical decision-making, and potentially result in improved patient treatment. The majority of works in the literature discuss training automatic segmentation models using coarse bounding box annotations. However, the granularity of the bounding box annotation could result in the inclusion of a considerable fraction of false positives and negatives at the pixel level that may adversely impact overall semantic segmentation performance. This study evaluates the benefits of using fine-grained annotations of TB-consistent lesions toward training the variants of U-Net models and constructing their ensembles for semantically segmenting TB-consistent lesions in both original and bone-suppressed frontal CXRs. The segmentation performance is evaluated using several ensemble methods such as bitwise-AND, bitwise-OR, bitwise-MAX, and stacking. Extensive empirical evaluations showed that the stacking ensemble demonstrated superior segmentation performance (Dice score: 0.5743, 95% confidence interval: (0.4055, 0.7431)) compared to the individual constituent models and other ensemble methods. To the best of our knowledge, this is the first study to apply ensemble learning to improve fine-grained TB-consistent lesion segmentation performance.
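
The bitwise ensembles named above combine the binary masks of several models pixel by pixel. A minimal sketch, assuming each model outputs a flattened 0/1 mask of the same length: bitwise-AND keeps a pixel only if every model marks it, and bitwise-OR keeps it if any model does. The helper names are hypothetical, and the stacking and bitwise-MAX variants are not shown.

```python
def bitwise_and(masks):
    """Pixel is foreground only if all models agree it is foreground."""
    return [int(all(px)) for px in zip(*masks)]

def bitwise_or(masks):
    """Pixel is foreground if at least one model marks it foreground."""
    return [int(any(px)) for px in zip(*masks)]

model_masks = [
    [1, 1, 0, 0],  # model A (toy prediction)
    [1, 0, 0, 1],  # model B
    [1, 1, 0, 0],  # model C
]
print(bitwise_and(model_masks))
print(bitwise_or(model_masks))
```

The AND ensemble trades recall for precision, while the OR ensemble does the opposite.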

14.
Med Image Anal ; 80: 102438, 2022 08.
Article in English | MEDLINE | ID: mdl-35868819

ABSTRACT

Deep learning has a huge potential to transform echocardiography in clinical practice and point of care ultrasound testing by providing real-time analysis of cardiac structure and function. Automated echocardiography analysis is benefited through use of machine learning for tasks such as image quality assessment, view classification, cardiac region segmentation, and quantification of diagnostic indices. By taking advantage of high-performing deep neural networks, we propose a novel and efficient real-time system for echocardiography analysis and quantification. Our system uses a self-supervised modality-specific representation trained using a publicly available large-scale dataset. The trained representation is used to enhance the learning of target echo tasks with relatively small datasets. We also present a novel Trilateral Attention Network (TaNet) for real-time cardiac region segmentation. The proposed network uses a module for region localization and three lightweight pathways for encoding rich low-level, textural, and high-level features. Feature embeddings from these individual pathways are then aggregated for cardiac region segmentation. This network is fine-tuned using a joint loss function and training strategy. We extensively evaluate the proposed system and its components, which are echo view retrieval, cardiac segmentation, and quantification, using four echocardiography datasets. Our experimental results show a consistent improvement in the performance of echocardiography analysis tasks with enhanced computational efficiency that charts a path toward its adoption in clinical practice. Specifically, our results show superior real-time performance in retrieving good quality echo from individual cardiac view, segmenting cardiac chambers with complex overlaps, and extracting cardiac indices that highly agree with the experts' values. The source code of our implementation can be found in the project's GitHub page.


Subject(s)
Echocardiography , Computer-Assisted Image Processing , Echocardiography/methods , Heart/diagnostic imaging , Humans , Computer-Assisted Image Processing/methods , Machine Learning , Computer Neural Networks
15.
Biomedicines ; 10(6)2022 Jun 04.
Article in English | MEDLINE | ID: mdl-35740345

ABSTRACT

Deep learning (DL) methods have demonstrated superior performance in medical image segmentation tasks. However, selecting a loss function that conforms to the data characteristics is critical for optimal performance. Further, the direct use of traditional DL models does not provide a measure of uncertainty in predictions. Even high-quality automated predictions for medical diagnostic applications demand uncertainty quantification to gain user trust. In this study, we aim to investigate the benefits of (i) selecting an appropriate loss function and (ii) quantifying uncertainty in predictions using a VGG16-based U-Net model with the Monte-Carlo Dropout (MCD) method for segmenting Tuberculosis (TB)-consistent findings in frontal chest X-rays (CXRs). We determine an optimal uncertainty threshold based on several uncertainty-related metrics. This threshold is used to select and refer highly uncertain cases to an expert. Experimental results demonstrate that (i) the model trained with a modified Focal Tversky loss function delivered superior segmentation performance (mean average precision (mAP): 0.5710, 95% confidence interval (CI): (0.4021, 0.7399)), (ii) the model with 30 MC forward passes during inference further improved and stabilized performance (mAP: 0.5721, 95% CI: (0.4032, 0.7410)), and (iii) an uncertainty threshold of 0.7 is observed to be optimal to refer highly uncertain cases.
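
The abstract above refers to a modified Focal Tversky loss without giving its form. For orientation only, the sketch below implements one common unmodified formulation: the Tversky index weights false negatives and false positives by `alpha` and `beta`, and the loss raises one minus that index to a power `gamma`. The exponent convention varies across papers, and the default values here are illustrative assumptions rather than the study's settings.

```python
def focal_tversky_loss(pred, truth, alpha=0.7, beta=0.3, gamma=0.75, eps=1e-7):
    """Focal Tversky loss for flat lists of predicted probabilities and 0/1 labels."""
    tp = sum(p * t for p, t in zip(pred, truth))
    fn = sum((1 - p) * t for p, t in zip(pred, truth))
    fp = sum(p * (1 - t) for p, t in zip(pred, truth))
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    # Clamp at zero so rounding error cannot produce a negative base.
    return max(0.0, 1.0 - tversky) ** gamma

print(focal_tversky_loss([0.9, 0.2, 0.1], [1, 0, 0]))
```

Setting `alpha` above `beta` penalizes missed lesion pixels more than false alarms, which is a typical motivation for this family of losses on small lesions.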

16.
Front Genet ; 13: 864724, 2022.
Article in English | MEDLINE | ID: mdl-35281798

ABSTRACT

Research on detecting Tuberculosis (TB) findings on chest radiographs (or Chest X-rays: CXR) using convolutional neural networks (CNNs) has demonstrated superior performance due to the emergence of publicly available, large-scale datasets with expert annotations and availability of scalable computational resources. However, these studies use only the frontal CXR projections, i.e., the posterior-anterior (PA) and the anterior-posterior (AP) views, for analysis and decision-making. Lateral CXRs, which have heretofore not been studied, help detect clinically suspected pulmonary TB, particularly in children. Further, Vision Transformers (ViTs) with built-in self-attention mechanisms have recently emerged as a viable alternative to the traditional CNNs. Although ViTs demonstrated notable performance in several medical image analysis tasks, potential limitations exist in terms of performance and computational efficiency, between the CNN and ViT models, necessitating a comprehensive analysis to select appropriate models for the problem under study. This study aims to detect TB-consistent findings in lateral CXRs by constructing an ensemble of the CNN and ViT models. Several models are trained on lateral CXR data extracted from two large public collections to transfer modality-specific knowledge and fine-tune them for detecting findings consistent with TB. We observed that the weighted averaging ensemble of the predictions of CNN and ViT models using the optimal weights computed with the Sequential Least-Squares Quadratic Programming method delivered significantly superior performance (MCC: 0.8136, 95% confidence intervals (CI): 0.7394, 0.8878, p < 0.05) compared to the individual models and other ensembles. We also interpreted the decisions of CNN and ViT models using class-selective relevance maps and attention maps, respectively, and combined them to highlight the discriminative image regions contributing to the final output. We observed that (i) the model accuracy is not related to disease region of interest (ROI) localization and (ii) the bitwise-AND of the heatmaps of the top-2-performing models delivered significantly superior ROI localization performance in terms of mean average precision [mAP@(0.1 0.6) = 0.1820, 95% CI: 0.0771, 0.2869, p < 0.05], compared to other individual models and ensembles. The code is available at https://github.com/sivaramakrishnan-rajaraman/Ensemble-of-CNN-and-ViT-for-TB-detection-in-lateral-CXR.
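
The weighted averaging ensemble described above blends model predictions with weights chosen by an optimizer. The sketch below shows the blending step for two models and, to stay dependency-free, swaps in a coarse grid search over the weight in place of the Sequential Least-Squares Quadratic Programming method the abstract names. The log-loss objective, function names, and probabilities are illustrative assumptions.

```python
import math

def weighted_average(prob_sets, weights):
    """Blend per-sample positive-class probabilities from several models."""
    n = len(prob_sets[0])
    return [sum(w * probs[i] for w, probs in zip(weights, prob_sets)) for i in range(n)]

def log_loss(probs, labels, eps=1e-12):
    """Mean binary cross-entropy, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)

def grid_search_weight(probs_a, probs_b, labels, steps=20):
    """Find w in [0, 1] minimizing the loss of w * A + (1 - w) * B (stand-in for SLSQP)."""
    best_w, best_loss = 0.0, float("inf")
    for k in range(steps + 1):
        w = k / steps
        loss = log_loss(weighted_average([probs_a, probs_b], [w, 1 - w]), labels)
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w, best_loss

cnn_probs = [0.9, 0.1]  # toy outputs of a CNN
vit_probs = [0.4, 0.6]  # toy outputs of a ViT
labels = [1, 0]
print(grid_search_weight(cnn_probs, vit_probs, labels))
```

A constrained optimizer such as SLSQP serves the same purpose when there are more than two models and the weights must stay non-negative and sum to one.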

17.
Med Image Comput Comput Assist Interv ; 13433: 749-759, 2022 Sep.
Article in English | MEDLINE | ID: mdl-36939418

ABSTRACT

Artificial Intelligence (AI)-based methods allow for automatic assessment of pain intensity based on continuous monitoring and processing of subtle changes in sensory signals, including facial expression, body movements, and crying frequency. Currently, there is a large and growing need for expanding current AI-based approaches to the assessment of postoperative pain in the neonatal intensive care unit (NICU). In contrast to acute procedural pain in the clinic, the NICU has neonates emerging from postoperative sedation, usually intubated, and with variable energy reserves for manifesting forceful pain responses. Here, we present a novel multi-modal approach designed, developed, and validated for assessment of neonatal postoperative pain in the challenging NICU setting. Our approach includes a robust network capable of efficient reconstruction of missing modalities (e.g., obscured facial expression due to intubation) using unsupervised spatio-temporal feature learning with a generative model for learning the joint features. Our approach generates the final pain score along with the intensity using an attentional cross-modal feature fusion. Using an experimental dataset from postoperative neonates in the NICU, our pain assessment approach achieves superior performance (AUC 0.906, accuracy 0.820) as compared to the state-of-the-art approaches.

18.
Article in English | MEDLINE | ID: mdl-36860349

ABSTRACT

Existing works for automated echocardiography view classification are designed under the assumption that the views in the testing set must belong to a limited number of views that have appeared in the training set. Such a design is called closed world classification. This assumption may be too strict for real-world environments that are open and often have unseen examples, drastically weakening the robustness of the classical view classification approaches. In this work, we developed an open world active learning approach for echocardiography view classification, where the network classifies images of known views into their respective classes and identifies images of unknown views. Then, a clustering approach is used to cluster the unknown views into various groups to be labeled by echocardiologists. Finally, the new labeled samples are added to the initial set of known views and used to update the classification network. This process of actively labeling unknown clusters and integrating them into the classification model significantly increases the efficiency of data labeling and the robustness of the classifier. Our results using an echocardiography dataset containing known and unknown views showed the superiority of the proposed approach as compared to the closed world view classification approaches.

19.
Annu Int Conf IEEE Eng Med Biol Soc ; 2021: 4115-4119, 2021 11.
Article in English | MEDLINE | ID: mdl-34892132

ABSTRACT

Topological Data Analysis (TDA) has emerged recently as a robust tool to extract and compare the structure of datasets. TDA identifies features in data (e.g., connected components and holes) and assigns a quantitative measure to these features. Several studies reported that topological features extracted by TDA tools provide unique information about the data, discover new insights, and determine which feature is more related to the outcome. On the other hand, the overwhelming success of deep neural networks in learning patterns and relationships has been proven on various data applications including images. To capture the characteristics of both worlds, we propose TDA-Net, a novel ensemble network that fuses topological and deep features for the purpose of enhancing model generalizability and accuracy. We apply the proposed TDA-Net to a critical application, which is the automated detection of COVID-19 from CXR images. Experimental results showed that the proposed network achieved excellent performance and suggested the applicability of our method in practice.


Subject(s)
COVID-19 , Deep Learning , Data Analysis , Humans , SARS-CoV-2 , X-Rays
20.
PLoS One ; 16(12): e0261307, 2021.
Article in English | MEDLINE | ID: mdl-34968393

ABSTRACT

Medical images commonly exhibit multiple abnormalities. Predicting them requires multi-class classifiers whose training and desired reliable performance can be affected by a combination of factors, such as dataset size, data source, distribution, and the loss function used to train deep neural networks. Currently, the cross-entropy loss remains the de-facto loss function for training deep learning classifiers. This loss function, however, asserts equal learning from all classes, leading to a bias toward the majority class. Although the choice of the loss function impacts model performance, to the best of our knowledge, no literature exists that performs a comprehensive analysis and selection of an appropriate loss function toward the classification task under study. In this work, we benchmark various state-of-the-art loss functions, critically analyze model performance, and propose improved loss functions for a multi-class classification task. We select a pediatric chest X-ray (CXR) dataset that includes images with no abnormality (normal), and those exhibiting manifestations consistent with bacterial and viral pneumonia. We construct prediction-level and model-level ensembles to improve classification performance. Our results show that compared to the individual models and the state-of-the-art literature, the weighted averaging of the predictions for top-3 and top-5 model-level ensembles delivered significantly superior classification performance (p < 0.05) in terms of the MCC metric (0.9068, 95% confidence interval (0.8839, 0.9297)). Finally, we performed localization studies to interpret model behavior and confirm that the individual models and ensembles learned task-specific features and highlighted disease-specific regions of interest. The code is available at https://github.com/sivaramakrishnan-rajaraman/multiloss_ensemble_models.


Subject(s)
Algorithms , Diagnostic Imaging , Computer-Assisted Image Processing/classification , Area Under Curve , Entropy , Humans , Lung/diagnostic imaging , ROC Curve , Thorax/diagnostic imaging , X-Rays