Results 1 - 20 of 133
1.
Expert Syst Appl ; 229(Pt A)2023 Nov 01.
Article in English | MEDLINE | ID: mdl-37397242

ABSTRACT

Lung segmentation in chest X-rays (CXRs) is an important prerequisite for improving the specificity of diagnoses of cardiopulmonary diseases in a clinical decision support system. Current deep learning models for lung segmentation are trained and evaluated on CXR datasets in which the radiographic projections are captured predominantly from the adult population. However, the shape of the lungs is reported to be significantly different across the developmental stages from infancy to adulthood. This might result in age-related data domain shifts that would adversely impact lung segmentation performance when models trained on the adult population are deployed for pediatric lung segmentation. In this work, our goal is to (i) analyze the generalizability of deep adult lung segmentation models to the pediatric population and (ii) improve performance through a stage-wise, systematic approach consisting of CXR modality-specific weight initializations, stacked ensembles, and an ensemble of stacked ensembles. To evaluate segmentation performance and generalizability, novel evaluation metrics consisting of mean lung contour distance (MLCD) and average hash score (AHS) are proposed in addition to the multi-scale structural similarity index measure (MS-SSIM), intersection over union (IoU), Dice score, 95% Hausdorff distance (HD95), and average symmetric surface distance (ASSD). Our results showed a significant improvement (p < 0.05) in cross-domain generalization through our approach. This study could serve as a paradigm for analyzing the cross-domain generalizability of deep segmentation models for other medical imaging modalities and applications.
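
As a concrete illustration of two of the region-overlap metrics listed above, here is a minimal sketch (not the paper's code) of the Dice score and IoU for binary lung masks:

```python
# Illustrative sketch: Dice and IoU for boolean segmentation masks.
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice = 2|A∩B| / (|A| + |B|); returns 1.0 for two empty masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    return 2.0 * intersection / denom if denom else 1.0

def iou_score(pred: np.ndarray, target: np.ndarray) -> float:
    """IoU = |A∩B| / |A∪B|; returns 1.0 for two empty masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    union = np.logical_or(pred, target).sum()
    return np.logical_and(pred, target).sum() / union if union else 1.0

# Toy masks: intersection = 2 pixels, |A| = |B| = 3, union = 4.
pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
print(dice_score(pred, target))  # 2*2/6 ≈ 0.667
print(iou_score(pred, target))   # 2/4 = 0.5
```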

2.
Int J Cancer ; 150(5): 741-752, 2022 03 01.
Article in English | MEDLINE | ID: mdl-34800038

ABSTRACT

There is limited access to effective cervical cancer screening programs in many resource-limited settings, resulting in continued high cervical cancer burden. Human papillomavirus (HPV) testing is increasingly recognized to be the preferable primary screening approach if affordable due to superior long-term reassurance when negative and adaptability to self-sampling. Visual inspection with acetic acid (VIA) is an inexpensive but subjective and inaccurate method widely used in resource-limited settings, either for primary screening or for triage of HPV-positive individuals. A deep learning (DL)-based automated visual evaluation (AVE) of cervical images has been developed to help improve the accuracy and reproducibility of VIA as assistive technology. However, like any new clinical technology, rigorous evaluation and proof of clinical effectiveness are required before AVE is implemented widely. In the current article, we outline essential clinical and technical considerations involved in building a validated DL-based AVE tool for broad use as a clinical test.


Subject(s)
Deep Learning, Early Detection of Cancer/methods, Uterine Cervical Neoplasms/diagnosis, Algorithms, Female, Humans, Papillomaviridae/isolation & purification, Reproducibility of Results, Uterine Cervical Neoplasms/virology
3.
Gynecol Oncol ; 167(1): 89-95, 2022 10.
Article in English | MEDLINE | ID: mdl-36008184

ABSTRACT

OBJECTIVE: Colposcopy is an important part of cervical screening/management programs. Colposcopic appearance is often classified, for teaching and telemedicine, based on static images that do not reveal the dynamics of acetowhitening. We compared the accuracy and reproducibility of colposcopic impression based on a single image at one minute after application of acetic acid versus a time series of 17 sequential images over two minutes. METHODS: Approximately 5000 colposcopic examinations conducted with the DYSIS colposcopic system were divided into 10 random sets, each assigned to a separate expert colposcopist. Colposcopists first classified single two-dimensional images at one minute and then a time series of 17 sequential images as 'normal,' 'indeterminate,' 'high grade,' or 'cancer'. Ratings were compared to histologic diagnoses. Additionally, 5 colposcopists reviewed a subset of 200 single images and 200 time series to estimate intra- and inter-rater reliability. RESULTS: Of 4640 patients with adequate images, only 24.4% were correctly categorized by single-image visual assessment (11% of 64 cancers; 31% of 605 CIN3; 22.4% of 558 CIN2; 23.9% of 3412 < CIN2). Individual colposcopist accuracy was low; Youden indices (sensitivity plus specificity minus one) ranged from 0.07 to 0.24. Use of the time series increased the proportion of images classified as normal, regardless of histology. Intra-rater reliability was substantial (weighted kappa = 0.64); inter-rater reliability was fair (weighted kappa = 0.26). CONCLUSION: Substantial variation exists in the visual assessment of colposcopic images, even when a 17-image time series showing the two-minute process of acetowhitening is presented. We are currently evaluating whether deep-learning image evaluation can assist classification.
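
The Youden index reported above is simply sensitivity plus specificity minus one; a minimal sketch with invented counts (not the study's data):

```python
# Illustrative Youden index from raw confusion-matrix counts.
def youden_index(tp: int, fn: int, tn: int, fp: int) -> float:
    sensitivity = tp / (tp + fn)  # true-positive rate among cases
    specificity = tn / (tn + fp)  # true-negative rate among controls
    return sensitivity + specificity - 1.0

# Hypothetical rater: 70 of 100 cases flagged, 60 of 100 controls cleared.
print(youden_index(tp=70, fn=30, tn=60, fp=40))  # ~0.3
```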


Subject(s)
Uterine Cervical Dysplasia, Uterine Cervical Neoplasms, Colposcopy/methods, Early Detection of Cancer, Female, Humans, Pregnancy, Reproducibility of Results, Time Factors, Uterine Cervical Neoplasms/diagnostic imaging, Uterine Cervical Neoplasms/pathology, Uterine Cervical Dysplasia/diagnostic imaging, Uterine Cervical Dysplasia/pathology
4.
Sensors (Basel) ; 22(24)2022 Dec 08.
Article in English | MEDLINE | ID: mdl-36559994

ABSTRACT

We propose a new generative model named adaptive cycle-consistent generative adversarial network, or Ad CycleGAN, to perform image translation between normal and COVID-19 positive chest X-ray images. An independent pre-trained criterion is added to the conventional CycleGAN architecture to exert adaptive control on image translation. The performance of Ad CycleGAN is compared with the CycleGAN without the external criterion. The quality of the synthetic images is evaluated by quantitative metrics including Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Peak Signal-to-Noise Ratio (PSNR), Universal Image Quality Index (UIQI), Visual Information Fidelity (VIF), Fréchet Inception Distance (FID), and translation accuracy. The experimental results indicate that the synthetic images generated by either the CycleGAN or the Ad CycleGAN have lower MSE and RMSE, and higher scores in PSNR, UIQI, and VIF, in homogeneous image translation (i.e., Y → Y) compared to the heterogeneous image translation process (i.e., X → Y). The synthetic images produced by Ad CycleGAN through heterogeneous image translation have a significantly higher FID score compared to CycleGAN (p < 0.01). The image translation accuracy of Ad CycleGAN is higher than that of CycleGAN when normal images are converted to COVID-19 positive images (p < 0.01). Therefore, we conclude that the Ad CycleGAN with the independent criterion can improve the accuracy of GAN image translation. The new architecture offers more control over image synthesis and can help address the common class imbalance issue in machine learning methods and artificial intelligence applications with medical images.
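
A hedged sketch of three of the image-quality metrics named above (MSE, RMSE, PSNR) for 8-bit images; this is illustrative code, not the authors' implementation:

```python
# Illustrative MSE, RMSE, and PSNR for 8-bit images as NumPy arrays.
import numpy as np

def mse(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2))

def rmse(a: np.ndarray, b: np.ndarray) -> float:
    return mse(a, b) ** 0.5

def psnr(a: np.ndarray, b: np.ndarray, max_val: float = 255.0) -> float:
    m = mse(a, b)
    return float('inf') if m == 0 else 10.0 * np.log10(max_val ** 2 / m)

# Two flat toy images differing by 10 gray levels everywhere.
a = np.full((4, 4), 100, dtype=np.uint8)
b = np.full((4, 4), 110, dtype=np.uint8)
print(mse(a, b))             # 100.0
print(rmse(a, b))            # 10.0
print(round(psnr(a, b), 2))  # 28.13
```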


Subject(s)
Artificial Intelligence, COVID-19, Humans, X-Rays, Image Processing, Computer-Assisted/methods, COVID-19/diagnostic imaging, Machine Learning
5.
J Med Syst ; 46(11): 82, 2022 Oct 15.
Article in English | MEDLINE | ID: mdl-36241922

ABSTRACT

There has been explosive growth in research over the last decade exploring machine learning techniques for analyzing chest X-ray (CXR) images to screen for cardiopulmonary abnormalities. In particular, we have observed strong interest in screening for tuberculosis (TB). This interest has coincided with spectacular advances in deep learning (DL), primarily based on convolutional neural networks (CNNs). These advances have resulted in significant research contributions in DL techniques for TB screening using CXR images. We review research studies published over the last five years (2016-2021). We identify data collections and methodological contributions, and highlight promising methods and challenges. Further, we discuss and compare studies and identify those that extend beyond binary decisions for TB, such as region-of-interest localization. In total, we systematically review 54 peer-reviewed research articles and perform a meta-analysis.


Subject(s)
Deep Learning, Tuberculosis, Humans, Neural Networks, Computer, Radiography, Tuberculosis/diagnostic imaging, X-Rays
6.
Int J Cancer ; 147(9): 2416-2423, 2020 11 01.
Article in English | MEDLINE | ID: mdl-32356305

ABSTRACT

We examined whether automated visual evaluation (AVE), a deep learning computer application for cervical cancer screening, can be used on cervix images taken by a contemporary smartphone camera. A large number of cervix images acquired by the commercial MobileODT EVA system were filtered for acceptable visual quality and then 7587 filtered images from 3221 women were annotated by a group of gynecologic oncologists (so the gold standard is an expert impression, not histopathology). We tested and analyzed on multiple random splits of the images using two deep learning, object detection networks. For all the receiver operating characteristics curves, the area under the curve values for the discrimination of the most likely precancer cases from least likely cases (most likely controls) were above 0.90. These results showed that AVE can classify cervix images with confidence scores that are strongly associated with expert evaluations of severity for the same images. The results on a small subset of images that have histopathologic diagnoses further supported the capability of AVE for predicting cervical precancer. We examined the associations of AVE severity score with gynecologic oncologist impression at all regions where we had a sufficient number of cases and controls, and the influence of a woman's age. The method was found generally resilient to regional variation in the appearance of the cervix. This work suggests that using AVE on smartphones could be a useful adjunct to health-worker visual assessment with acetic acid, a cervical cancer screening method commonly used in low- and middle-resource settings.


Subject(s)
Cervix Uteri/diagnostic imaging, Image Processing, Computer-Assisted/methods, Mass Screening/instrumentation, Smartphone/economics, Uterine Cervical Dysplasia/diagnosis, Uterine Cervical Neoplasms/prevention & control, Biopsy, Cervix Uteri/pathology, Datasets as Topic, Deep Learning, Diagnosis, Differential, Female, Humans, Mass Screening/economics, Mass Screening/methods, ROC Curve, Uterine Cervical Dysplasia/pathology, Uterine Cervical Neoplasms/diagnosis, Uterine Cervical Neoplasms/pathology
7.
BMC Infect Dis ; 20(1): 825, 2020 Nov 11.
Article in English | MEDLINE | ID: mdl-33176716

ABSTRACT

BACKGROUND: Light microscopy is often used for malaria diagnosis in the field. However, it is time-consuming and quality of the results depends heavily on the skill of microscopists. Automating malaria light microscopy is a promising solution, but it still remains a challenge and an active area of research. Current tools are often expensive and involve sophisticated hardware components, which makes it hard to deploy them in resource-limited areas. RESULTS: We designed an Android mobile application called Malaria Screener, which makes smartphones an affordable yet effective solution for automated malaria light microscopy. The mobile app utilizes high-resolution cameras and computing power of modern smartphones to screen both thin and thick blood smear images for P. falciparum parasites. Malaria Screener combines image acquisition, smear image analysis, and result visualization in its slide screening process, and is equipped with a database to provide easy access to the acquired data. CONCLUSION: Malaria Screener makes the screening process faster, more consistent, and less dependent on human expertise. The app is modular, allowing other research groups to integrate their methods and models for image processing and machine learning, while acquiring and analyzing their data.


Subject(s)
Image Processing, Computer-Assisted/methods, Malaria, Falciparum/diagnostic imaging, Mass Screening/methods, Microscopy/methods, Plasmodium falciparum/isolation & purification, Smartphone, Data Accuracy, Humans, Machine Learning, Malaria, Falciparum/parasitology, Sensitivity and Specificity, Software
8.
J Med Syst ; 42(8): 146, 2018 Jun 29.
Article in English | MEDLINE | ID: mdl-29959539

ABSTRACT

To detect pulmonary abnormalities such as tuberculosis (TB), automatic analysis and classification of chest radiographs can serve as a reliable alternative to more sophisticated and technologically demanding methods (e.g., culture or sputum smear analysis). In target areas like Kenya, TB is highly prevalent and often co-occurs with HIV, while resources and medical assistance are limited. In these regions, an automatic screening system can provide a cost-effective solution for a large rural population. Our fully automatic TB screening system processes incoming chest X-rays (CXRs) by applying image preprocessing techniques to enhance image quality, followed by adaptive segmentation based on model selection. The delineated lung regions are described by a multitude of image features. These characteristics are then optimized by a feature selection strategy to provide the best description for the classifier, which ultimately decides whether the analyzed image is normal or abnormal. Our goal is to find the optimal feature set from a larger pool of generic image features originally used for problems such as object detection and image retrieval. For performance evaluation, measures such as area under the curve (AUC) and accuracy (ACC) were considered. Using a neural network classifier on two publicly available data collections, namely the Montgomery and Shenzhen datasets, we achieved a maximum area under the curve and accuracy of 0.99 and 97.03%, respectively. Further, we compared our results with existing state-of-the-art systems and with radiologists' decisions.
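
The AUC used for evaluation above can be computed directly from classifier scores via its Mann-Whitney interpretation; a small illustrative sketch (the scores are invented, not the study's outputs):

```python
# AUC as the probability that a randomly chosen abnormal image scores
# higher than a randomly chosen normal one, counting ties as 0.5.
def auc(scores_pos, scores_neg):
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# 9 pairs: 7 wins and 1 tie -> 7.5 / 9.
print(auc([0.9, 0.8, 0.7], [0.6, 0.8, 0.1]))  # ≈ 0.833
```

The quadratic loop is fine for illustration; production code would use a rank-based formulation or a library routine.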


Subject(s)
Algorithms, Radiography, Tuberculosis/diagnostic imaging, Automation, Humans, Mass Screening, Sputum
9.
Pattern Recognit ; 63: 468-475, 2017 Mar.
Article in English | MEDLINE | ID: mdl-28603299

ABSTRACT

Cervical cancer is one of the most common types of cancer in women worldwide. Most deaths due to the disease occur in less developed areas of the world. In this work, we introduce a new image dataset, along with expert-annotated diagnoses, for evaluating image-based cervical disease classification algorithms. A large number of Cervigram® images are selected from a database provided by the US National Cancer Institute. For each image, we extract three complementary pyramid features: a pyramid histogram in L*A*B* color space (PLAB), a Pyramid Histogram of Oriented Gradients (PHOG), and a pyramid histogram of Local Binary Patterns (PLBP). In addition to these hand-crafted pyramid features, we investigate the performance of convolutional neural network (CNN) features for cervical disease classification. Our experimental results demonstrate the effectiveness of both our hand-crafted and our deep features. We intend to release this multi-feature dataset, and our extensive evaluations using seven classic classifiers can serve as a baseline.

10.
PLOS Digit Health ; 3(1): e0000286, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38232121

ABSTRACT

Model initialization techniques are vital for improving the performance and reliability of deep learning models in medical computer vision applications. While much literature exists on non-medical images, the impacts on medical images, particularly chest X-rays (CXRs) are less understood. Addressing this gap, our study explores three deep model initialization techniques: Cold-start, Warm-start, and Shrink and Perturb start, focusing on adult and pediatric populations. We specifically focus on scenarios with periodically arriving data for training, thereby embracing the real-world scenarios of ongoing data influx and the need for model updates. We evaluate these models for generalizability against external adult and pediatric CXR datasets. We also propose novel ensemble methods: F-score-weighted Sequential Least-Squares Quadratic Programming (F-SLSQP) and Attention-Guided Ensembles with Learnable Fuzzy Softmax to aggregate weight parameters from multiple models to capitalize on their collective knowledge and complementary representations. We perform statistical significance tests with 95% confidence intervals and p-values to analyze model performance. Our evaluations indicate models initialized with ImageNet-pretrained weights demonstrate superior generalizability over randomly initialized counterparts, contradicting some findings for non-medical images. Notably, ImageNet-pretrained models exhibit consistent performance during internal and external testing across different training scenarios. Weight-level ensembles of these models show significantly higher recall (p<0.05) during testing compared to individual models. Thus, our study accentuates the benefits of ImageNet-pretrained weight initialization, especially when used with weight-level ensembles, for creating robust and generalizable deep learning solutions.
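
The weight-level ensembles described above aggregate parameter tensors rather than predictions. A minimal sketch of the underlying idea, assuming same-architecture models whose parameters are exposed as NumPy arrays (the F-SLSQP coefficient fitting itself is not reproduced here):

```python
# Illustrative weight-level ensembling: combine same-shaped parameter
# arrays from several trained models into one averaged model.
import numpy as np

def average_weights(state_dicts, coeffs=None):
    """Weighted average of parameter dicts. `coeffs` (e.g. derived from
    per-model F-scores, as in the abstract) should sum to 1; uniform
    weighting is used when omitted."""
    n = len(state_dicts)
    coeffs = coeffs or [1.0 / n] * n
    return {k: sum(c * sd[k] for c, sd in zip(coeffs, state_dicts))
            for k in state_dicts[0]}

# Two toy "models" with one weight vector and one bias each.
m1 = {"w": np.array([1.0, 2.0]), "b": np.array([0.0])}
m2 = {"w": np.array([3.0, 4.0]), "b": np.array([1.0])}
avg = average_weights([m1, m2])
print(avg["w"])  # [2. 3.]
print(avg["b"])  # [0.5]
```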

11.
Comput Med Imaging Graph ; 115: 102379, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38608333

ABSTRACT

Deep learning (DL) has demonstrated its innate capacity to independently learn hierarchical features from complex and multi-dimensional data. A common understanding is that its performance scales up with the amount of training data. However, the data must also exhibit variety to enable improved learning. In medical imaging data, semantic redundancy, which is the presence of similar or repetitive information, can occur due to the presence of multiple images that have highly similar presentations for the disease of interest. Also, the common use of augmentation methods to generate variety in DL training could limit performance when indiscriminately applied to such data. We hypothesize that semantic redundancy would therefore tend to lower performance and limit generalizability to unseen data and question its impact on classifier performance even with large data. We propose an entropy-based sample scoring approach to identify and remove semantically redundant training data and demonstrate using the publicly available NIH chest X-ray dataset that the model trained on the resulting informative subset of training data significantly outperforms the model trained on the full training set, during both internal (recall: 0.7164 vs 0.6597, p<0.05) and external testing (recall: 0.3185 vs 0.2589, p<0.05). Our findings emphasize the importance of information-oriented training sample selection as opposed to the conventional practice of using all available training data.
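
As a hedged illustration of entropy-based scoring (the paper's exact scoring function is not reproduced here), one simple variant scores each image by the Shannon entropy of its intensity histogram, so low-information samples that closely resemble one another can be flagged for removal:

```python
# Illustrative entropy-based sample scoring for 8-bit images.
import numpy as np

def intensity_entropy(image: np.ndarray, bins: int = 256) -> float:
    """Shannon entropy (bits) of the pixel-intensity histogram."""
    hist, _ = np.histogram(image, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]  # drop empty bins; 0*log(0) is defined as 0
    return float(-(p * np.log2(p)).sum())

flat = np.full((8, 8), 128, dtype=np.uint8)           # one gray level
varied = np.arange(64, dtype=np.uint8).reshape(8, 8)  # 64 distinct levels
print(intensity_entropy(flat))    # 0.0
print(intensity_entropy(varied))  # 6.0 (= log2(64))
```

Samples scoring near the bottom of the ranking would be candidates for pruning before training.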


Subject(s)
Deep Learning, Radiography, Thoracic, Semantics, Humans
12.
J Natl Cancer Inst ; 116(1): 26-33, 2024 01 10.
Article in English | MEDLINE | ID: mdl-37758250

ABSTRACT

Novel screening and diagnostic tests based on artificial intelligence (AI) image recognition algorithms are proliferating. Some initial reports claim outstanding accuracy followed by disappointing lack of confirmation, including our own early work on cervical screening. This is a presentation of lessons learned, organized as a conceptual step-by-step approach to bridge the gap between the creation of an AI algorithm and clinical efficacy. The first fundamental principle is specifying rigorously what the algorithm is designed to identify and what the test is intended to measure (eg, screening, diagnostic, or prognostic). Second, designing the AI algorithm to minimize the most clinically important errors. For example, many equivocal cervical images cannot yet be labeled because the borderline between cases and controls is blurred. To avoid a misclassified case-control dichotomy, we have isolated the equivocal cases and formally included an intermediate, indeterminate class (severity order of classes: case>indeterminate>control). The third principle is evaluating AI algorithms like any other test, using clinical epidemiologic criteria. Repeatability of the algorithm at the borderline, for indeterminate images, has proven extremely informative. Distinguishing between internal and external validation is also essential. Linking the AI algorithm results to clinical risk estimation is the fourth principle. Absolute risk (not relative) is the critical metric for translating a test result into clinical use. Finally, generating risk-based guidelines for clinical use that match local resources and priorities is the last principle in our approach. We are particularly interested in applications to lower-resource settings to address health disparities. We note that similar principles apply to other domains of AI-based image analysis for medical diagnostic testing.


Subject(s)
Artificial Intelligence, Uterine Cervical Neoplasms, Female, Humans, Early Detection of Cancer, Uterine Cervical Neoplasms/diagnosis, Algorithms, Image Processing, Computer-Assisted
13.
Article in English | MEDLINE | ID: mdl-38774479

ABSTRACT

For deep learning-based machine learning, not only are large and sufficiently diverse data crucial, but their quality is equally important. However, in real-world applications, it is very common that raw source data contain incorrect, noisy, inconsistent, improperly formatted, and sometimes missing elements, particularly when the datasets are large and sourced from many sites. In this paper, we present our work toward preparing and making image data ready for the development of AI-driven approaches for studying various aspects of the natural history of oral cancer. Specifically, we focus on two aspects: 1) cleaning the image data; and 2) extracting the annotation information. Data cleaning includes removing duplicates, identifying missing data, correcting errors, standardizing data sets, and removing personal sensitive information, toward combining data sourced from different study sites. These steps are often collectively referred to as data harmonization. Annotation information extraction includes identifying crucial or valuable text manually entered by clinical providers, related to the image paths/names, and standardizing label text. Both are important for successful deep learning algorithm development and data analysis. We provide details on the data under consideration, describe the challenges and issues that motivated our work, and present the specific approaches and methods we used to clean and standardize the image data and extract labeling information. Further, we discuss ways to increase the efficiency of the process and the lessons learned. Research ideas on automating the process with ML-driven techniques are also presented and discussed. Our intent in reporting and discussing this work in detail is to help provide insights into automating or, at minimum, increasing the efficiency of these critical yet often under-reported processes.

14.
Front ICT Healthc (2002) ; 519: 679-688, 2023.
Article in English | MEDLINE | ID: mdl-37396668

ABSTRACT

Cervical cancer is a significant disease affecting women worldwide. Regular cervical examination by gynecologists is important for early detection and treatment planning for women with precancer, the direct precursor to cervical cancer. However, experts are scarce, and their assessments are subject to variation in interpretation. In this scenario, developing a robust automated cervical image classification system is important to augment the experts' limitations. Ideally, such a system's class label predictions will vary according to the cervical inspection objectives, so the labeling criteria may not be the same across cervical image datasets. Moreover, due to the lack of confirmatory test results and inter-rater labeling variation, many images are left unlabeled. Motivated by these challenges, we propose to develop a pretrained cervix model from heterogeneous and partially labeled cervical image datasets. Self-supervised learning (SSL) is employed to build the cervical model. Further, considering data-sharing restrictions, we show how federated self-supervised learning (FSSL) can be employed to develop a cervix model without sharing the cervical images. The task-specific classification models are developed by fine-tuning the cervix model. Two partially labeled cervical image datasets, labeled with different classification criteria, are used in this study. According to our experimental study, the cervix model prepared with dataset-specific SSL boosts classification accuracy by 2.5% over the ImageNet-pretrained model. The classification accuracy is further boosted by 1.5% when images from both datasets are combined for SSL. Finally, the FSSL model outperforms the dataset-specific cervix model developed with SSL.

15.
Mil Med ; 2023 Oct 20.
Article in English | MEDLINE | ID: mdl-37864817

ABSTRACT

The success of deep-learning algorithms in analyzing complex structured and unstructured multidimensional data has caused an exponential increase in the amount of research devoted to the applications of artificial intelligence (AI) in medicine in the past decade. The public release of large language models like ChatGPT in the past year has generated an unprecedented storm of excitement, along with rumors of machine intelligence finally reaching or even surpassing human capability in detecting meaningful signals in complex multivariate data. Such enthusiasm, however, is met with an equal degree of skepticism and fear over the social, legal, and moral implications of such powerful technology with relatively few safeguards or regulations on its development. The question remains in medicine of how to harness the power of AI to improve patient outcomes by increasing the diagnostic accuracy and treatment precision provided by medical professionals. Military medicine, given its unique mission and resource constraints, can benefit immensely from such technology. However, reaping such benefits hinges on the ability of rising generations of military medical professionals to understand AI algorithms and their applications. Additionally, they should strongly consider working with these algorithms as adjunct decision-makers and view them as colleagues for accessing and harnessing relevant information, rather than as something to be feared. Ideas expressed in this commentary were formulated by a military medical student during a two-month research elective working on a multidisciplinary team of computer scientists and clinicians at the National Library of Medicine advancing the state of the art of AI in medicine. A motivation to incorporate AI in the Military Health System is provided, including examples of applications in military medicine. Rationale is then given for including AI in education starting in medical school, as well as for prudent implementation of these algorithms in a clinical workflow during graduate medical education. Finally, barriers to implementation are addressed along with potential solutions. The end state is not that rising military physicians become technical experts in AI, but rather that they understand how to leverage its rapidly evolving capabilities to prepare for a future where AI will have a significant role in clinical care. The overall goal is to develop trained clinicians who can leverage these technologies to improve the Military Health System.

16.
Diagnostics (Basel) ; 13(4)2023 Feb 16.
Article in English | MEDLINE | ID: mdl-36832235

ABSTRACT

Deep learning (DL) models are state-of-the-art in segmenting anatomical and disease regions of interest (ROIs) in medical images. Particularly, a large number of DL-based techniques have been reported using chest X-rays (CXRs). However, these models are reportedly trained on reduced image resolutions for reasons related to the lack of computational resources. Literature is sparse in discussing the optimal image resolution to train these models for segmenting the tuberculosis (TB)-consistent lesions in CXRs. In this study, we investigated the performance variations with an Inception-V3 UNet model using various image resolutions with/without lung ROI cropping and aspect ratio adjustments and identified the optimal image resolution through extensive empirical evaluations to improve TB-consistent lesion segmentation performance. We used the Shenzhen CXR dataset for the study, which includes 326 normal patients and 336 TB patients. We proposed a combinatorial approach consisting of storing model snapshots, optimizing segmentation threshold and test-time augmentation (TTA), and averaging the snapshot predictions, to further improve performance with the optimal resolution. Our experimental results demonstrate that higher image resolutions are not always necessary; however, identifying the optimal image resolution is critical to achieving superior performance.
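
A minimal sketch of the test-time augmentation (TTA) step mentioned above: average the prediction on an image with the un-flipped prediction on its mirrored copy. `model` here is a stand-in callable returning a per-pixel probability map, not the paper's Inception-V3 UNet:

```python
# Illustrative horizontal-flip TTA for a segmentation model.
import numpy as np

def tta_predict(model, image: np.ndarray) -> np.ndarray:
    """Average the direct prediction with the un-flipped prediction
    on a left-right mirrored copy of the image."""
    pred = model(image)
    flipped_pred = model(np.fliplr(image))
    return (pred + np.fliplr(flipped_pred)) / 2.0

# Toy "model": predicted probability depends only on column position.
toy = lambda img: np.tile(np.linspace(0, 1, img.shape[1]), (img.shape[0], 1))
out = tta_predict(toy, np.zeros((2, 3)))
print(out)  # every pixel 0.5: averaging with the mirrored map is symmetric
```

The same pattern extends to other invertible augmentations (vertical flips, rotations), averaging over all un-transformed predictions.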

17.
ArXiv ; 2023 Jan 27.
Article in English | MEDLINE | ID: mdl-36789135

ABSTRACT

Deep learning (DL) models are state-of-the-art in segmenting anatomical and disease regions of interest (ROIs) in medical images. In particular, a large number of DL-based techniques have been reported using chest X-rays (CXRs). However, these models are reportedly trained on reduced image resolutions for reasons related to the lack of computational resources. The literature is sparse in discussing the optimal image resolution to train these models for segmenting tuberculosis (TB)-consistent lesions in CXRs. In this study, we (i) investigated performance variations of an Inception-V3 UNet model across various image resolutions, with/without lung ROI cropping and aspect ratio adjustments, and (ii) identified the optimal image resolution through extensive empirical evaluations to improve TB-consistent lesion segmentation performance. We used the Shenzhen CXR dataset for the study, which includes 326 normal patients and 336 TB patients. We proposed a combinatorial approach consisting of storing model snapshots, optimizing the segmentation threshold and test-time augmentation (TTA), and averaging the snapshot predictions to further improve performance at the optimal resolution. Our experimental results demonstrate that higher image resolutions are not always necessary; however, identifying the optimal image resolution is critical to achieving superior performance.

18.
Article in English | MEDLINE | ID: mdl-36780238

ABSTRACT

Research in Artificial Intelligence (AI)-based medical computer vision algorithms bears promise to improve disease screening, diagnosis, and, subsequently, patient care. However, these algorithms are highly impacted by the characteristics of the underlying data. In this work, we discuss various data characteristics, namely Volume, Veracity, Validity, Variety, and Velocity, that impact the design, reliability, and evolution of machine learning in medical computer vision. Further, we discuss each characteristic and the recent works conducted in our research lab that informed our understanding of the impact of these characteristics on the design of medical decision-making algorithms and outcome reliability.

19.
Diagnostics (Basel) ; 13(6)2023 Mar 11.
Article in English | MEDLINE | ID: mdl-36980375

ABSTRACT

Domain shift is one of the key challenges affecting reliability in medical imaging-based machine learning predictions. It is of significant importance to investigate this issue to gain insights into its characteristics toward determining controllable parameters to minimize its impact. In this paper, we report our efforts on studying and analyzing domain shift in lung region detection in chest radiographs. We used five chest X-ray datasets, collected from different sources, which have manual markings of lung boundaries in order to conduct extensive experiments toward this goal. We compared the characteristics of these datasets from three aspects: information obtained from metadata or an image header, image appearance, and features extracted from a pretrained model. We carried out experiments to evaluate and compare model performances within each dataset and across datasets in four scenarios using different combinations of datasets. We proposed a new feature visualization method to provide explanations for the applied object detection network on the obtained quantitative results. We also examined chest X-ray modality-specific initialization, catastrophic forgetting, and model repeatability. We believe the observations and discussions presented in this work could help to shed some light on the importance of the analysis of training data for medical imaging machine learning research, and could provide valuable guidance for domain shift analysis.

20.
ArXiv ; 2023 Sep 18.
Article in English | MEDLINE | ID: mdl-37986725

ABSTRACT

Deep learning (DL) has demonstrated its innate capacity to independently learn hierarchical features from complex and multi-dimensional data. A common understanding is that its performance scales up with the amount of training data. Another data attribute is the inherent variety. It follows, therefore, that semantic redundancy, which is the presence of similar or repetitive information, would tend to lower performance and limit generalizability to unseen data. In medical imaging data, semantic redundancy can occur due to the presence of multiple images that have highly similar presentations for the disease of interest. Further, the common use of augmentation methods to generate variety in DL training may be limiting performance when applied to semantically redundant data. We propose an entropy-based sample scoring approach to identify and remove semantically redundant training data. We demonstrate using the publicly available NIH chest X-ray dataset that the model trained on the resulting informative subset of training data significantly outperforms the model trained on the full training set, during both internal (recall: 0.7164 vs 0.6597, p<0.05) and external testing (recall: 0.3185 vs 0.2589, p<0.05). Our findings emphasize the importance of information-oriented training sample selection as opposed to the conventional practice of using all available training data.
