Results 1 - 20 of 54
1.
EBioMedicine ; 104: 105174, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38821021

ABSTRACT

BACKGROUND: Chest X-rays (CXR) are essential for diagnosing a variety of conditions, but when models are applied to new populations, generalizability issues limit their efficacy. Generative AI, particularly denoising diffusion probabilistic models (DDPMs), offers a promising approach to generating synthetic images, enhancing dataset diversity. This study investigates the impact of synthetic data supplementation on the performance and generalizability of pathology classifiers in medical imaging. METHODS: The study employed DDPMs to create synthetic CXRs conditioned on demographic and pathological characteristics from the CheXpert dataset. These synthetic images were used to supplement training datasets for pathology classifiers, with the aim of improving their performance. The evaluation involved three datasets (CheXpert, MIMIC-CXR, and Emory Chest X-ray) and various experiments, including supplementing real data with synthetic data, training with purely synthetic data, and mixing synthetic data with external datasets. Performance was assessed using the area under the receiver operating characteristic curve (AUROC). FINDINGS: Adding synthetic data to real datasets resulted in a notable increase in AUROC values (up to 0.02 in internal and external test sets with 1000% supplementation, p-value <0.01 in all instances). When classifiers were trained exclusively on synthetic data, they achieved performance levels comparable to those trained on real data with 200%-300% data supplementation. The combination of real and synthetic data from different sources demonstrated enhanced model generalizability, increasing model AUROC from 0.76 to 0.80 on the internal test set (p-value <0.01). INTERPRETATION: Synthetic data supplementation significantly improves the performance and generalizability of pathology classifiers in medical imaging. FUNDING: Dr. 
Gichoya is a 2022 Robert Wood Johnson Foundation Harold Amos Medical Faculty Development Program awardee and declares support from the RSNA Health Disparities grant (#EIHD2204), the Lacuna Fund (#67), the Gordon and Betty Moore Foundation, NIH (NIBIB) MIDRC grants under contracts 75N92020C00008 and 75N92020C00021, and NHLBI award number R01HL167811.
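The core experiment in this abstract, comparing classifier AUROC with and without synthetic training data, can be sketched in a few lines. This is a toy illustration only: Gaussian features stand in for image embeddings and ideal DDPM samples, and none of the names or numbers below come from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def make_data(n):
    # toy stand-in for CXR-derived features: class-conditional Gaussians
    y = rng.integers(0, 2, n)
    X = rng.normal(loc=y[:, None] * 0.8, size=(n, 16))
    return X, y

X_real, y_real = make_data(500)    # "real" training set
X_syn, y_syn = make_data(5000)     # "synthetic" supplement (1000% of real)
X_test, y_test = make_data(2000)   # held-out test set

clf_real = LogisticRegression(max_iter=1000).fit(X_real, y_real)
clf_aug = LogisticRegression(max_iter=1000).fit(
    np.vstack([X_real, X_syn]), np.concatenate([y_real, y_syn]))

auc_real = roc_auc_score(y_test, clf_real.predict_proba(X_test)[:, 1])
auc_aug = roc_auc_score(y_test, clf_aug.predict_proba(X_test)[:, 1])
```

In the real study the supplement comes from a conditional DDPM rather than the training distribution itself, which is exactly why the diversity gain is non-trivial.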


Subject(s)
Diagnostic Imaging , ROC Curve , Humans , Diagnostic Imaging/methods , Algorithms , Radiography, Thoracic/methods , Image Processing, Computer-Assisted/methods , Databases, Factual , Area Under Curve , Models, Statistical
2.
Radiology ; 310(3): e232780, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38501952

ABSTRACT

Background Mirai, a state-of-the-art deep learning-based algorithm for predicting short-term breast cancer risk, outperforms standard clinical risk models. However, Mirai is a black box, risking overreliance on the algorithm and incorrect diagnoses. Purpose To identify whether bilateral dissimilarity underpins Mirai's reasoning process; create a simplified, intelligible model, AsymMirai, using bilateral dissimilarity; and determine if AsymMirai may approximate Mirai's performance in 1-5-year breast cancer risk prediction. Materials and Methods This retrospective study involved mammograms obtained from patients in the EMory BrEast imaging Dataset, known as EMBED, from January 2013 to December 2020. To approximate 1-5-year breast cancer risk predictions from Mirai, another deep learning-based model, AsymMirai, was built with an interpretable module: local bilateral dissimilarity (localized differences between left and right breast tissue). Pearson correlation coefficients were computed between the risk scores of Mirai and those of AsymMirai. Subgroup analysis was performed in patients for whom AsymMirai's year-over-year reasoning was consistent. AsymMirai and Mirai risk scores were compared using the area under the receiver operating characteristic curve (AUC), and 95% CIs were calculated using the DeLong method. Results Screening mammograms (n = 210 067) from 81 824 patients (mean age, 59.4 years ± 11.4 [SD]) were included in the study. Deep learning-extracted bilateral dissimilarity produced risk scores similar to those of Mirai (1-year risk prediction, r = 0.6832; 4-5-year prediction, r = 0.6988) and achieved performance similar to that of Mirai. For AsymMirai, the 1-year breast cancer risk AUC was 0.79 (95% CI: 0.73, 0.85) (Mirai, 0.84; 95% CI: 0.79, 0.89; P = .002), and the 5-year risk AUC was 0.66 (95% CI: 0.63, 0.69) (Mirai, 0.71; 95% CI: 0.68, 0.74; P < .001). 
In a subgroup of 183 patients for whom AsymMirai repeatedly highlighted the same tissue over time, AsymMirai achieved a 3-year AUC of 0.92 (95% CI: 0.86, 0.97). Conclusion Localized bilateral dissimilarity, an imaging marker for breast cancer risk, approximated the predictive power of Mirai and was key to Mirai's reasoning. © RSNA, 2024. Supplemental material is available for this article. See also the editorial by Freitas in this issue.
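The two quantitative comparisons in this abstract, Pearson correlation between model risk scores and AUC against outcomes, reduce to standard computations. A sketch with synthetic surrogate scores (none of these values come from the study, and the DeLong confidence intervals are omitted):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)

# hypothetical risk scores from two models on the same 1,000 mammograms
risk_mirai = rng.random(1000)
risk_asym = 0.7 * risk_mirai + 0.3 * rng.random(1000)  # correlated surrogate

r = np.corrcoef(risk_mirai, risk_asym)[0, 1]  # Pearson correlation

# outcomes loosely tied to the first score, for illustration only
y = (risk_mirai + rng.normal(0, 0.35, 1000) > 0.8).astype(int)
auc_mirai = roc_auc_score(y, risk_mirai)
auc_asym = roc_auc_score(y, risk_asym)
```

A high `r` with a small AUC gap is the pattern the paper reports: the interpretable model tracks the black box closely without matching it exactly.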


Subject(s)
Breast Neoplasms , Deep Learning , Humans , Middle Aged , Female , Breast Neoplasms/diagnostic imaging , Breast Neoplasms/epidemiology , Retrospective Studies , Mammography , Breast
3.
JMIR Med Educ ; 10: e46500, 2024 Feb 20.
Article in English | MEDLINE | ID: mdl-38376896

ABSTRACT

BACKGROUND: Artificial intelligence (AI) and machine learning (ML) are poised to have a substantial impact in the health care space. While a plethora of web-based resources exist to teach programming skills and ML model development, there are few introductory curricula specifically tailored to medical students without a background in data science or programming. Programs that do exist are often restricted to a specific specialty. OBJECTIVE: We hypothesized that a 1-month elective for fourth-year medical students, composed of high-quality existing web-based resources and a project-based structure, would empower students to learn about the impact of AI and ML in their chosen specialty and begin contributing to innovation in their field of interest. This study aims to evaluate the success of this elective in improving self-reported confidence scores in AI and ML. The authors also share their curriculum with other educators who may be interested in its adoption. METHODS: This elective was offered in 2 tracks: technical (for students who were already competent programmers) and nontechnical (with no technical prerequisites, focusing on building a conceptual understanding of AI and ML). Students established a conceptual foundation of knowledge using curated web-based resources and relevant research papers, and were then tasked with completing 3 projects in their chosen specialty: a data set analysis, a literature review, and an AI project proposal. The project-based nature of the elective was designed to be self-guided and flexible to each student's interest area and career goals. Students' success was measured by self-reported confidence in AI and ML skills in pre- and postsurveys. Qualitative feedback on students' experiences was also collected. RESULTS: This web-based, self-directed elective was offered on a pass-or-fail basis each month to fourth-year students at Emory University School of Medicine beginning in May 2021. 
As of June 2022, a total of 19 students had successfully completed the elective, representing a wide range of chosen specialties: diagnostic radiology (n=3), general surgery (n=1), internal medicine (n=5), neurology (n=2), obstetrics and gynecology (n=1), ophthalmology (n=1), orthopedic surgery (n=1), otolaryngology (n=2), pathology (n=2), and pediatrics (n=1). Students' self-reported confidence scores for AI and ML rose by 66% after this 1-month elective. In qualitative surveys, students overwhelmingly reported enthusiasm and satisfaction with the course and commented that the self-direction, flexibility, and project-based design of the course were essential. CONCLUSIONS: Course participants were successful in diving deep into applications of AI in their wide-ranging specialties, produced substantial project deliverables, and generally reported satisfaction with their elective experience. The authors are hopeful that a brief, 1-month investment in AI and ML education during medical school will empower this next generation of physicians to pave the way for AI and ML innovation in health care.


Subject(s)
Artificial Intelligence , Education, Medical , Humans , Curriculum , Internet , Students, Medical
4.
Commun Med (Lond) ; 4(1): 21, 2024 Feb 19.
Article in English | MEDLINE | ID: mdl-38374436

ABSTRACT

BACKGROUND: Breast density is an important risk factor for breast cancer, compounded by a higher risk of cancers being missed during screening of dense breasts due to the reduced sensitivity of mammography. Automated, deep learning-based prediction of breast density could provide subject-specific risk assessment and flag difficult cases during screening. However, there is a lack of evidence for generalisability across imaging techniques and, importantly, across race. METHODS: This study used a large, racially diverse dataset with 69,697 mammographic studies comprising 451,642 individual images from 23,057 female participants. A deep learning model was developed for four-class BI-RADS density prediction. A comprehensive performance evaluation assessed the generalisability across two imaging techniques, full-field digital mammography (FFDM) and two-dimensional synthetic (2DS) mammography. A detailed subgroup performance and bias analysis assessed the generalisability across participants' race. RESULTS: Here we show that a model trained on FFDM only achieves a 4-class BI-RADS classification accuracy of 80.5% (79.7-81.4) on FFDM and 79.4% (78.5-80.2) on unseen 2DS data. When trained on both FFDM and 2DS images, performance increases to 82.3% (81.4-83.0) on FFDM and 82.3% (81.3-83.1) on 2DS. Racial subgroup analysis shows unbiased performance across Black, White, and Asian participants, despite a separate analysis confirming that race can be predicted from the images with a high accuracy of 86.7% (86.0-87.4). CONCLUSIONS: Deep learning-based breast density prediction generalises across imaging techniques and race. No substantial disparities are found for any subgroup, including races that were never seen during model development, suggesting that density predictions are unbiased.


Women with dense breasts have a higher risk of breast cancer. For dense breasts, it is also more difficult to spot cancer in mammograms, which are the X-ray images commonly used for breast cancer screening. Thus, knowing about an individual's breast density provides important information to doctors and screening participants. This study investigated whether an artificial intelligence (AI) algorithm can be used to accurately determine breast density by analysing mammograms. The study tested whether such an algorithm performs equally well across different imaging devices, and importantly, across individuals from different self-reported race groups. A large, racially diverse dataset was used to evaluate the algorithm's performance. The results show that there were no substantial differences in the accuracy for any of the groups, providing important assurances that AI can be used safely and ethically for automated prediction of breast density.
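A subgroup audit like the one described, overall four-class accuracy plus per-race accuracy, can be sketched as follows. The labels and predictions are simulated; the 80% correctness rate and the group list are illustrative stand-ins, not the study's data.

```python
import numpy as np

rng = np.random.default_rng(2)
races = np.array(["Black", "White", "Asian"])

y_true = rng.integers(0, 4, 3000)   # four BI-RADS density classes (A-D)
race = rng.choice(races, 3000)

# simulated predictions: correct ~80% of the time, independent of race
y_pred = np.where(rng.random(3000) < 0.8, y_true, rng.integers(0, 4, 3000))

overall = (y_true == y_pred).mean()
by_race = {g: (y_true == y_pred)[race == g].mean() for g in races}
gap = max(by_race.values()) - min(by_race.values())  # disparity across groups
```

A small `gap` is the "no substantial disparities" claim in miniature; the paper additionally checks groups never seen during training.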

5.
Curr Probl Diagn Radiol ; 53(3): 346-352, 2024.
Article in English | MEDLINE | ID: mdl-38302303

ABSTRACT

Breast cancer is the most common type of cancer in women, and early abnormality detection using mammography can significantly improve breast cancer survival rates. Diverse datasets are required to improve the training and validation of deep learning (DL) systems for autonomous breast cancer diagnosis. However, only a small number of mammography datasets are publicly available. This constraint has created challenges when comparing different DL models using the same dataset. The primary contribution of this study is the comprehensive description of a selection of currently available public mammography datasets. The information available on publicly accessible datasets is summarized and their usability reviewed to enable more effective models to be developed for breast cancer detection and to improve understanding of existing models trained using these datasets. This study aims to bridge the existing knowledge gap by offering researchers and practitioners a valuable resource to develop and assess DL models in breast cancer diagnosis.


Subject(s)
Breast Neoplasms , Deep Learning , Female , Humans , Mammography , Breast Neoplasms/diagnostic imaging , Early Detection of Cancer
6.
Curr Atheroscler Rep ; 26(4): 91-102, 2024 04.
Article in English | MEDLINE | ID: mdl-38363525

ABSTRACT

PURPOSE OF REVIEW: Bias in artificial intelligence (AI) models can result in unintended consequences. In cardiovascular imaging, biased AI models used in clinical practice can negatively affect patient outcomes. Biased AI models result from decisions made when training and evaluating a model. This paper is a comprehensive guide for AI development teams to understand assumptions in datasets and chosen metrics for outcome/ground truth, and how these translate to real-world performance for cardiovascular disease (CVD). RECENT FINDINGS: CVDs are the number one cause of mortality worldwide; however, the prevalence, burden, and outcomes of CVD vary across gender and race. Several biomarkers have also been shown to vary among different populations and ethnic/racial groups. Inequalities in clinical trial inclusion, clinical presentation, diagnosis, and treatment are preserved in the health data ultimately used to train AI algorithms, leading to potential biases in model performance. Although AI models themselves can be biased, AI can also help to mitigate bias (e.g., through bias-auditing tools). In this review paper, we describe in detail implicit and explicit biases in the care of cardiovascular disease that may be present in existing datasets but are not obvious to model developers. We review disparities in CVD outcomes across genders and racial groups, differences in treatment of historically marginalized groups, and disparities in clinical trials for various cardiovascular diseases and outcomes. Thereafter, we summarize CVD AI literature that demonstrates bias in CVD AI, as well as approaches in which AI is being used to mitigate such bias.


Subject(s)
Artificial Intelligence , Cardiovascular Diseases , Female , Male , Humans , Cardiovascular Diseases/diagnostic imaging , Algorithms , Bias
7.
medRxiv ; 2024 Jan 11.
Article in English | MEDLINE | ID: mdl-38260571

ABSTRACT

Background: To create an opportunistic screening strategy using multitask deep learning methods to stratify prediction of coronary artery calcium (CAC) and associated cardiovascular risk from frontal chest X-rays (CXR) and minimal data from electronic health records (EHR). Methods: In this retrospective study, 2,121 patients with available computed tomography (CT) scans and corresponding CXR images were collected internally (Mayo Enterprise), with calculated CAC scores binned into 3 categories (0, 1-99, and 100+) as ground truths for model training. Results from the internal training were tested on multiple external datasets (domestic (EUH) and foreign (VGHTPE)) with significant racial and ethnic differences, and classification performance was compared. Findings: Classification between the 0, 1-99, and 100+ CAC score categories performed moderately on both the internal test set and the external datasets, reaching an average F1-score of 0.66 for Mayo, 0.62 for EUH, and 0.61 for VGHTPE. For the clinically relevant binary task of 0 vs 400+ CAC classification, the performance of our model on the internal test and external datasets reached an average AUROC of 0.84. Interpretation: The fusion model trained on CXR outperformed existing state-of-the-art models at predicting CAC scores (average AUROC of 0.84 across internal and external datasets, vs 0.73 AUROC for prior models on internal data), with robust performance on external datasets. Thus, our proposed model may be used as a robust, first-pass opportunistic screening method for cardiovascular risk from regular chest radiographs. For community use, the trained model and the inference code can be downloaded with an academic open-source license from https://github.com/jeong-jasonji/MTL_CAC_classification . Funding: The study was partially supported by National Institutes of Health award 1R01HL155410-01A1.
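The score binning and the multi-class evaluation described here can be sketched directly. The CAC values and correctness rate below are invented, and macro-averaged F1 is one reasonable reading of the paper's "average f1-score":

```python
import numpy as np
from sklearn.metrics import f1_score

def bin_cac(score):
    # three training categories used in the study: 0, 1-99, 100+
    if score == 0:
        return 0
    return 1 if score < 100 else 2

rng = np.random.default_rng(3)
cac = rng.choice([0, 5, 50, 150, 600], 1000)   # toy Agatston-style scores
y_true = np.array([bin_cac(s) for s in cac])

# simulated classifier output: right ~70% of the time
y_pred = np.where(rng.random(1000) < 0.7, y_true, rng.integers(0, 3, 1000))
macro_f1 = f1_score(y_true, y_pred, average="macro")
```

Macro averaging weights each CAC category equally, which matters here because zero-CAC patients typically dominate a screening cohort.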

9.
Crit Care Med ; 52(2): 345-348, 2024 02 01.
Article in English | MEDLINE | ID: mdl-38240516
11.
J Am Coll Radiol ; 21(7): 988, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38122882
12.
Int J Med Inform ; 178: 105211, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37690225

ABSTRACT

PURPOSE: Chronic obstructive pulmonary disease (COPD) is one of the most common chronic illnesses in the world. Unfortunately, COPD is often difficult to diagnose early, when interventions can alter the disease course, and it is underdiagnosed or diagnosed too late for effective treatment. Currently, spirometry is the gold standard for diagnosing COPD, but it can be challenging to obtain, especially in resource-poor countries. Chest X-rays (CXRs), however, are readily available and may have potential as a screening tool to identify patients with COPD who should undergo further testing or intervention. In this study, we used three CXR datasets alongside their respective electronic health records (EHR) to develop and externally validate our models. METHOD: To leverage the performance of convolutional neural network models, we proposed two fusion schemes: (1) model-level fusion, using bootstrap aggregating (bagging) to aggregate predictions from two models, and (2) data-level fusion, combining CXR image data from different institutions, or multi-modal data (CXR images and EHR data), for model training. Fairness analysis was then performed to evaluate the models across different demographic groups. RESULTS: Our results demonstrate that DL models can detect COPD from CXRs with an area under the curve of over 0.75, which could facilitate patient screening for COPD, especially in low-resource regions where CXRs are more accessible than spirometry. CONCLUSIONS: By using a ubiquitous test, future research could build on this work to detect COPD early in patients who would not otherwise have been diagnosed or treated, altering the course of this highly morbid disease.
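The model-level fusion scheme (bootstrap aggregating) can be illustrated with scikit-learn's `BaggingClassifier` on synthetic tabular features standing in for CXR and EHR inputs. This is a generic bagging sketch, not the authors' CNN pipeline:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# toy stand-in for fused CXR-derived features and EHR variables
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# bagging trains base models on bootstrap resamples of the training
# data and averages their predicted probabilities (model-level fusion)
bag = BaggingClassifier(n_estimators=25, random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, bag.predict_proba(X_te)[:, 1])
```

The same averaging idea applies when the base learners are two CNNs rather than the default decision trees used here.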

13.
Br J Radiol ; 96(1150): 20230023, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37698583

ABSTRACT

Various forms of artificial intelligence (AI) applications are being deployed and used in many healthcare systems. As the use of these applications increases, we are learning how these models fail and how they can perpetuate bias. With these new lessons, we need to prioritize bias evaluation and mitigation for radiology applications, while not ignoring the impact of changes in the larger enterprise AI deployment, which may have downstream effects on the performance of AI models. In this paper, we provide an updated review of known pitfalls causing AI bias and discuss strategies for mitigating these biases within the context of AI deployment in the larger healthcare enterprise. We describe these pitfalls by framing them in the larger AI lifecycle, from problem definition and dataset selection and curation to model training and deployment, emphasizing that bias exists across a spectrum and is a sequela of a combination of human and machine factors.


Subject(s)
Artificial Intelligence , Radiology , Humans , Bias , Disease Progression , Learning
14.
Front Radiol ; 3: 1181190, 2023.
Article in English | MEDLINE | ID: mdl-37588666

ABSTRACT

Introduction: To date, most mammography-related AI models have been trained using either film or digital mammogram datasets, with little overlap. We investigated whether combining film and digital mammography during training helps or hinders modern models designed for use on digital mammograms. Methods: To this end, a total of six binary classifiers were trained for comparison. The first three classifiers were trained using images only from the Emory Breast Imaging Dataset (EMBED) using ResNet50, ResNet101, and ResNet152 architectures. The next three classifiers were trained using images from the EMBED, Curated Breast Imaging Subset of Digital Database for Screening Mammography (CBIS-DDSM), and Digital Database for Screening Mammography (DDSM) datasets. All six models were tested only on digital mammograms from EMBED. Results: The results showed that the performance degradation of the customized ResNet models was statistically significant overall when the EMBED dataset was augmented with CBIS-DDSM/DDSM. While performance degradation was observed in all racial subgroups, some subgroups suffered a more severe performance drop than others. Discussion: The degradation may potentially be due to (1) a mismatch in features between film-based and digital mammograms or (2) a mismatch in pathologic and radiological information. In conclusion, use of both film and digital mammography during training may hinder modern models designed for breast cancer screening. Caution is required when combining film-based and digital mammograms or when utilizing pathologic and radiological information simultaneously.

15.
J Am Coll Radiol ; 20(9): 842-851, 2023 09.
Article in English | MEDLINE | ID: mdl-37506964

ABSTRACT

Despite the expert-level performance of artificial intelligence (AI) models for various medical imaging tasks, real-world performance failures with disparate outputs for various subgroups limit the usefulness of AI in improving patients' lives. Many definitions of fairness have been proposed, with discussions of various tensions that arise in the choice of an appropriate metric to use to evaluate bias; for example, should one aim for individual or group fairness? One central observation is that AI models apply "shortcut learning," whereby spurious features (such as chest tubes and portable radiographic markers on intensive care unit chest radiography) on medical images are used for prediction instead of identifying true pathology. Moreover, AI has been shown to have a remarkable ability to detect protected attributes of age, sex, and race, while the same models demonstrate bias against historically underserved subgroups of age, sex, and race in disease diagnosis. Therefore, an AI model may take shortcut predictions from these correlations and subsequently generate an outcome that is biased toward certain subgroups even when protected attributes are not explicitly used as inputs into the model, leaving those subgroups nonprivileged. In this review, the authors discuss the various types of bias from shortcut learning that may occur at different phases of AI model development, including data bias, modeling bias, and inference bias. The authors thereafter summarize various tool kits that can be used to evaluate and mitigate bias and note that these have largely been applied to nonmedical domains and require more evaluation for medical AI. The authors then summarize current techniques for mitigating bias from preprocessing (data-centric solutions) and during model development (computational solutions) and postprocessing (recalibration of learning). 
Ongoing legal changes under which the use of a biased model will be penalized highlight the necessity of understanding, detecting, and mitigating biases from shortcut learning, and will require diverse research teams looking at the whole AI pipeline.


Subject(s)
Artificial Intelligence , Radiology , Humans , Radiography , Causality , Bias
16.
J Med Imaging (Bellingham) ; 10(3): 034004, 2023 May.
Article in English | MEDLINE | ID: mdl-37388280

ABSTRACT

Purpose: Our study investigates whether graph-based fusion of imaging data with non-imaging electronic health records (EHR) data can improve the prediction of disease trajectories for patients with coronavirus disease 2019 (COVID-19) beyond the prediction performance of imaging or non-imaging EHR data alone. Approach: We present a fusion framework for fine-grained clinical outcome prediction [discharge, intensive care unit (ICU) admission, or death] that fuses imaging and non-imaging information using a similarity-based graph structure. Node features are represented by image embeddings, and edges are encoded with clinical or demographic similarity. Results: Experiments on data collected from the Emory Healthcare Network indicate that our fusion modeling scheme performs consistently better than predictive models developed using only imaging or non-imaging features, with areas under the receiver operating characteristic curve of 0.76, 0.90, and 0.75 for discharge from hospital, mortality, and ICU admission, respectively. External validation was performed on data collected from the Mayo Clinic. Our scheme also highlights known biases in the model prediction, such as bias against patients with a history of alcohol abuse and bias based on insurance status. Conclusions: Our study signifies the importance of fusing multiple data modalities for accurate prediction of clinical trajectories. The proposed graph structure can model relationships between patients based on non-imaging EHR data, and graph convolutional networks can fuse this relationship information with imaging data to predict future disease trajectories more effectively than models employing only imaging or non-imaging data. Our graph-based fusion modeling framework can be easily extended to other prediction tasks to efficiently combine imaging data with non-imaging clinical data.
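The fusion idea, nodes carrying image embeddings with edges derived from EHR similarity, can be sketched with one normalized graph-propagation step in NumPy. The data below are random toy values; a real implementation would use a GCN library with learned weights:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6                                # toy cohort of six patients
img_emb = rng.normal(size=(n, 8))    # node features: image embeddings
ehr = rng.normal(size=(n, 3))        # non-imaging EHR variables

# edges encode clinical/demographic similarity (thresholded distance)
dist = np.linalg.norm(ehr[:, None, :] - ehr[None, :, :], axis=-1)
A = (dist < np.median(dist)).astype(float)
np.fill_diagonal(A, 1.0)             # self-loops

# one normalized propagation step: each node averages the image
# features of its clinically similar neighbors, fusing both modalities
deg = A.sum(axis=1, keepdims=True)
fused = (A @ img_emb) / deg
```

Stacking such steps with trainable projection matrices and a nonlinearity gives the graph convolutional network the paper describes.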

17.
Clin Imaging ; 101: 137-141, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37336169

ABSTRACT

PURPOSE: To evaluate the complexity of diagnostic radiology reports across major imaging modalities and the ability of ChatGPT (Early March 2023 Version, OpenAI, California, USA) to simplify these reports to the 8th grade reading level of the average U.S. adult. METHODS: We randomly sampled 100 radiographs (XR), 100 ultrasound (US), 100 CT, and 100 MRI radiology reports from our institution's database dated between 2022 and 2023 (N = 400). These were processed by ChatGPT using the prompt "Explain this radiology report to a patient in layman's terms in second person: ". Mean report length, Flesch reading ease score (FRES), and Flesch-Kincaid reading level (FKRL) were calculated for each report and ChatGPT output. T-tests were used to determine significance. RESULTS: Mean report length was 164 ± 117 words, FRES was 38.0 ± 11.8, and FKRL was 10.4 ± 1.9. FKRL was significantly higher for CT and MRI than for US and XR. Only 60/400 (15%) had a FKRL <8.5. The mean simplified ChatGPT output length was 103 ± 36 words, FRES was 83.5 ± 5.6, and FKRL was 5.8 ± 1.1. This reflects a mean decrease of 61 words (p < 0.01), increase in FRES of 45.5 (p < 0.01), and decrease in FKRL of 4.6 (p < 0.01). All simplified outputs had FKRL <8.5. DISCUSSION: Our study demonstrates the effective use of ChatGPT when tasked with simplifying radiology reports to below the 8th grade reading level. We report significant improvements in FRES, FKRL, and word count, the last of which requires modality-specific context.
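The two readability metrics used in this study are closed-form formulas over word, sentence, and syllable counts. A sketch with a crude vowel-group syllable heuristic (dedicated tools such as `textstat` use more careful syllable counting):

```python
import re

def count_syllables(word):
    # crude heuristic: count groups of consecutive vowels
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / sentences              # words per sentence
    spw = syllables / len(words)              # syllables per word
    fres = 206.835 - 1.015 * wps - 84.6 * spw   # Flesch reading ease
    fkrl = 0.39 * wps + 11.8 * spw - 15.59      # Flesch-Kincaid grade level
    return fres, fkrl
```

Higher FRES means easier text; FKRL approximates a U.S. grade level, so the study's target corresponds to an FKRL below roughly 8.5.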


Subject(s)
Comprehension , Radiology , Adult , Humans , Radiography , Magnetic Resonance Imaging , Databases, Factual
18.
IEEE J Biomed Health Inform ; 27(8): 3936-3947, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37167055

ABSTRACT

Automated curation of noisy external data in the medical domain has long been in high demand, as AI technologies need to be validated using various sources with clean, annotated data. Identifying the variance between internal and external sources is a fundamental step in curating a high-quality dataset, as the data distributions from different sources can vary significantly and subsequently affect the performance of AI models. The primary challenges for detecting data shifts are (1) accessing private data across healthcare institutions for manual detection and (2) the lack of automated approaches to learn efficient shift-data representations without training samples. To overcome these problems, we propose an automated pipeline called MedShift to detect top-level shift samples and evaluate the significance of shift data without sharing data between internal and external organizations. MedShift employs unsupervised anomaly detectors to learn the internal distribution and identify samples showing significant shift in external datasets, and then compares their performance. To quantify the effects of detected shift data, we train a multi-class classifier that learns internal domain knowledge and evaluates the classification performance for each class in external domains after dropping the shift data. We also propose a data quality metric to quantify the dissimilarity between internal and external datasets. We verify the efficacy of MedShift using musculoskeletal radiographs (MURA) and chest X-ray datasets from multiple external sources. Our experiments show that our proposed shift data detection pipeline can help medical centers curate high-quality datasets more efficiently.
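The unsupervised shift-detection step can be approximated with any off-the-shelf anomaly detector fitted on internal data only; here an Isolation Forest on synthetic features. The feature vectors and shift magnitude are invented for illustration and are unrelated to MedShift's actual detectors:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(5)
internal = rng.normal(0.0, 1.0, size=(500, 16))   # internal distribution
external = np.vstack([
    rng.normal(0.0, 1.0, size=(80, 16)),          # in-distribution cases
    rng.normal(4.0, 1.0, size=(20, 16)),          # strongly shifted cases
])

det = IsolationForest(random_state=0).fit(internal)
scores = det.score_samples(external)       # lower score = more anomalous
shift_flags = det.predict(external) == -1  # -1 marks shift candidates
```

Ranking external samples by `scores` gives the "top-level shift samples" that would then be dropped before the downstream classification check.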

19.
Front Big Data ; 6: 1173038, 2023.
Article in English | MEDLINE | ID: mdl-37139170

ABSTRACT

Data integration is a well-motivated problem in the clinical data science domain. The availability of patient data, reference clinical cases, and research datasets has the potential to advance the healthcare industry. However, the unstructured (text, audio, or video) and heterogeneous nature of the data, the variety of data standards and formats, and patient privacy constraints make data interoperability and integration a challenge. Clinical text is further categorized into different semantic groups and may be stored in different files and formats. Even the same organization may store cases in different data structures, making data integration more challenging. With such inherent complexity, domain experts and domain knowledge are often necessary to perform data integration. However, expert human labor is time- and cost-prohibitive. To overcome the variability in the structure, format, and content of the different data sources, we map the text into common categories and compute similarity within those. In this paper, we present a method to categorize and merge clinical data by considering the underlying semantics of the cases and using reference information about the cases to perform data integration. Evaluation shows that we were able to merge 88% of clinical data from five different sources.
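The categorize-and-merge step can be sketched with TF-IDF vectors and cosine similarity: map text from each source into a shared vector space and merge pairs whose similarity clears a threshold. The snippets and the 0.3 threshold below are invented for illustration, not the paper's method in detail:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# hypothetical case descriptions from two sources
source_a = ["chest pain radiating to left arm", "chronic cough with wheezing"]
source_b = ["left arm and chest pain on exertion", "fractured wrist after fall"]

vec = TfidfVectorizer().fit(source_a + source_b)
sim = cosine_similarity(vec.transform(source_a), vec.transform(source_b))

# merge each case with its best match when similarity clears the threshold
merged = [(i, int(sim[i].argmax())) for i in range(len(source_a))
          if sim[i].max() > 0.3]
```

Here only the two chest-pain cases are merged; the unrelated cough and wrist cases stay separate, which mirrors the within-category matching the paper describes.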
