Results 1 - 9 of 9
1.
AJR Am J Roentgenol; 221(3): 377-385, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37073901

ABSTRACT

BACKGROUND. Reported rates of recommendations for additional imaging (RAIs) in radiology reports are low. Bidirectional encoder representations from transformers (BERT), a deep learning model pretrained to understand language context and ambiguity, has potential for identifying RAIs and thereby assisting large-scale quality improvement efforts. OBJECTIVE. The purpose of this study was to develop and externally validate an artificial intelligence (AI)-based model for identifying radiology reports containing RAIs. METHODS. This retrospective study was performed at a multisite health center. A total of 6300 radiology reports generated at one site from January 1, 2015, to June 30, 2021, were randomly selected and split in a 4:1 ratio to create training (n = 5040) and test (n = 1260) sets. A total of 1260 reports generated at the center's other sites (including academic and community hospitals) from April 1 to April 30, 2022, were randomly selected as an external validation group. Referring practitioners and radiologists of varying subspecialties manually reviewed report impressions for the presence of RAIs. A BERT-based technique for identifying RAIs was developed using the training set. Performance of the BERT-based model and of a previously developed traditional machine learning (TML) model was assessed in the test set. Finally, performance was assessed in the external validation set. The code for the BERT-based RAI model is publicly available. RESULTS. Among a total of 7419 unique patients (4133 women, 3286 men; mean age, 58.8 years), 10.0% of 7560 reports contained an RAI. In the test set, the BERT-based model had 94.4% precision, 98.5% recall, and an F1 score of 96.4%; the TML model had 69.0% precision, 65.4% recall, and an F1 score of 67.2%. Accuracy in the test set was greater for the BERT-based model than for the TML model (99.2% vs 93.1%, p < .001).
In the external validation set, the BERT-based model had 99.2% precision, 91.6% recall, an F1 score of 95.2%, and 99.0% accuracy. CONCLUSION. The BERT-based AI model accurately identified reports with RAIs, outperforming the TML model. High performance in the external validation set suggests that other health systems could adapt the model without institution-specific training. CLINICAL IMPACT. The model could potentially be used for real-time EHR monitoring for RAIs and for other improvement initiatives, helping ensure timely performance of clinically necessary recommended follow-up.
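The precision, recall, F1, and accuracy figures above follow from standard confusion-matrix definitions. A minimal sketch of how such metrics could be computed for a report-level RAI classifier (the counts below are hypothetical, not the study's data):

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

# Hypothetical test set of 1260 reports, 126 of which contain an RAI
precision, recall, f1, accuracy = classification_metrics(tp=124, fp=7, fn=2, tn=1127)
```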


Subjects
Artificial Intelligence, Radiology, Male, Humans, Female, Middle Aged, Retrospective Studies, Radiography, Diagnostic Imaging, Natural Language Processing
2.
J Breast Imaging; 6(3): 246-253, 2024 May 27.
Article in English | MEDLINE | ID: mdl-38655858

ABSTRACT

OBJECTIVE: To evaluate the association of mammographic, radiologist, and patient factors with BI-RADS 3 assessment at diagnostic mammography in patients recalled from screening mammography. METHODS: This Institutional Review Board-approved retrospective study of consecutive unique diagnostic mammography examinations in asymptomatic patients recalled from screening mammography from March 5, 2014, to December 31, 2019, was conducted at a single large United States health care institution. Mammographic features (mass, calcification, distortion, asymmetry), breast density, prior examination, and BI-RADS assessment were extracted from reports by natural language processing. Patient age, race, and ethnicity were extracted from the electronic health record. Radiologist years in practice, recall rate, and number of interpreted diagnostic mammograms were calculated. A mixed-effects logistic regression model evaluated factors associated with likelihood of BI-RADS 3 compared with other BI-RADS assessments. RESULTS: A total of 12,080 diagnostic mammography examinations were performed during the study period, yielding 2,010 (16.6%) BI-RADS 3 and 10,070 (83.4%) other BI-RADS assessments. Asymmetry (odds ratio [OR] = 6.49, P < .001) and calcification (OR = 5.59, P < .001) were associated with increased likelihood of BI-RADS 3 assessment; distortion (OR = 0.20, P < .001), dense breast parenchyma (OR = 0.82, P < .001), prior examination (OR = 0.63, P = .01), and increasing patient age (OR = 0.99, P < .001) were associated with decreased likelihood. Mass, patient race or ethnicity, and radiologist factors were not significantly associated with BI-RADS 3 assessment. The malignancy rate for BI-RADS 3 lesions was 1.6%. CONCLUSION: Asymmetry and calcifications were associated with increased likelihood of BI-RADS 3 assessment at diagnostic evaluation, with a low likelihood of malignancy, while radiologist factors showed no association.
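In logistic regression models like the one above, each odds ratio is the exponentiated model coefficient. A simplified sketch using an ordinary (non-mixed) logistic regression on synthetic data — the feature names, effect sizes, and data are illustrative assumptions, not the study's:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
asymmetry = rng.integers(0, 2, n)        # hypothetical binary finding flags
calcification = rng.integers(0, 2, n)

# Synthetic outcome: BI-RADS 3 made more likely by both findings
log_odds = -2.0 + 1.8 * asymmetry + 1.6 * calcification
birads3 = rng.binomial(1, 1 / (1 + np.exp(-log_odds)))

X = np.column_stack([asymmetry, calcification])
# A large C approximates an unpenalized fit, as in classical logistic regression
model = LogisticRegression(C=1e6, max_iter=1000).fit(X, birads3)
odds_ratios = np.exp(model.coef_[0])     # OR = exp(beta) for each feature
```

With both true coefficients positive, the estimated odds ratios come out well above 1, mirroring how findings positively associated with BI-RADS 3 are reported.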


Subjects
Breast Neoplasms, Mammography, Humans, Mammography/methods, Female, Retrospective Studies, Middle Aged, Breast Neoplasms/diagnostic imaging, Breast Neoplasms/diagnosis, Aged, Adult, Radiologists/statistics & numerical data, Breast Density, Breast/diagnostic imaging, Breast/pathology
3.
Acad Radiol; 30(5): 798-806, 2023 May.
Article in English | MEDLINE | ID: mdl-35803888

ABSTRACT

RATIONALE AND OBJECTIVES: To determine whether there are patterns of lesion recall among breast imaging subspecialists interpreting screening mammography and, if so, whether recall patterns correlate with morphologies of screen-detected cancers. MATERIALS AND METHODS: This Institutional Review Board-approved retrospective review included all screening examinations from January 3, 2012, to October 1, 2018, interpreted by fifteen breast imaging subspecialists at a large academic medical center and two outpatient imaging centers. Natural language processing identified radiologist recalls by lesion type (mass, calcifications, asymmetry, architectural distortion); proportions of callbacks by lesion type were calculated per radiologist. Hierarchical cluster analysis grouped radiologists based on recall patterns. Groups were compared with the overall practice and with each other by proportions of lesion types recalled, and by overall and lesion-specific positive predictive value 1 (PPV1). RESULTS: Among 161,859 screening mammograms with 13,086 (8.1%) recalls, hierarchical cluster analysis grouped the 15 radiologists into five groups. There was substantial variation in proportions of lesions recalled: calcifications 13%-18% (Chi-square 45.69, p < 0.00001); mass 16%-44% (Chi-square 498.42, p < 0.00001); asymmetry 13%-47% (Chi-square 660.93, p < 0.00001); architectural distortion 6%-20% (Chi-square 283.81, p < 0.00001). Radiologist groups differed significantly in overall PPV1 (range 5.6%-8.8%; Chi-square 17.065, p = 0.0019). PPV1 by lesion type did not vary significantly among groups: calcifications 9.2%-15.4% (Chi-square 2.56, p = 0.6339); mass 5.6%-8.5% (Chi-square 1.31, p = 0.8597); asymmetry 3.4%-5.9% (Chi-square 2.225, p = 0.6945); architectural distortion 5.6%-10.8% (Chi-square 5.810, p = 0.2138). Proportions of recalled lesions did not consistently correlate with proportions of screen-detected cancers.
CONCLUSION: Breast imaging subspecialists exhibit distinct recall patterns in screening mammography, suggesting differential weighting of imaging findings for perceived malignant potential. Radiologist recall patterns do not consistently predict screen-detected cancers or lesion-specific PPV1s.
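Hierarchical cluster analysis of radiologists, as used above, amounts to clustering each radiologist's vector of recall proportions by lesion type. A hypothetical sketch with SciPy — the five profiles below are invented for illustration, not the study's data:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical per-radiologist recall proportions:
# columns = [calcifications, mass, asymmetry, architectural distortion]
recall_profiles = np.array([
    [0.15, 0.40, 0.15, 0.10],   # mass-dominant recallers
    [0.14, 0.44, 0.13, 0.08],
    [0.16, 0.20, 0.45, 0.06],   # asymmetry-dominant recallers
    [0.13, 0.18, 0.47, 0.07],
    [0.18, 0.25, 0.20, 0.20],   # distortion-heavy recaller
])

Z = linkage(recall_profiles, method="ward")       # agglomerative clustering
groups = fcluster(Z, t=3, criterion="maxclust")   # cut the tree into 3 groups
```

Radiologists with similar recall profiles land in the same group, which is the structure the study then compared by PPV1.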


Subjects
Breast Neoplasms, Calcinosis, Humans, Female, Mammography/methods, Breast Neoplasms/diagnostic imaging, Early Detection of Cancer/methods, Breast/diagnostic imaging, Mass Screening/methods, Retrospective Studies, Radiologists
4.
J Am Coll Radiol; 20(2): 207-214, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36496088

ABSTRACT

OBJECTIVES: The aim of this study was to compare screening mammography performance metrics for immediate (live) interpretation versus offline interpretation at a cancer center. METHODS: An institutional review board-approved, retrospective comparison of screening mammography metrics at a cancer center for January 1, 2018, to December 31, 2019 (live period), and September 1, 2020, to March 31, 2022 (offline period), was performed. Before July 2020, screening examinations were interpreted while patients waited (live period), and diagnostic workup was performed concurrently. After the coronavirus disease 2019 shutdown from March to mid-June 2020, offline same-day interpretation was instituted. Patients with abnormal screening results returned for separate diagnostic evaluation. Screening metrics of positive predictive value 1 (PPV1), cancer detection rate (CDR), and abnormal interpretation rate (AIR) were compared for 17 radiologists who interpreted during both periods. Statistical significance was assessed using χ2 analysis. RESULTS: In the live period, there were 7,105 screenings, 635 recalls, and 51 screen-detected cancers. In the offline period, there were 7,512 screenings, 586 recalls, and 47 screen-detected cancers. Comparison of live screening metrics versus offline metrics produced the following results: AIR, 8.9% (635 of 7,105) versus 7.8% (586 of 7,512) (P = .01); PPV1, 8.0% (51 of 635) versus 8.0% (47 of 586); and CDR, 7.2/1,000 versus 6.3/1,000 (P = .50). When grouped by >10% AIR or <10% AIR for the live period, the >10% AIR group showed a significant decrease in AIR for offline interpretation (from 12.7% to 9.7%, P < .001), whereas the <10% AIR group showed no significant change (from 7.4% to 6.7%, P = .17). CONCLUSIONS: Conversion to offline screening interpretation from immediate interpretation at a cancer center was associated with lower AIR and similar CDR and PPV1. This effect was seen largely in radiologists with AIR > 10% in the live setting.
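The AIR difference reported above can be checked with a χ2 test on the published counts (SciPy applies Yates continuity correction to 2 × 2 tables by default, so the statistic may differ slightly from the study's):

```python
from scipy.stats import chi2_contingency

# Recalls vs non-recalls in each period, from the counts reported above
live = [635, 7105 - 635]        # AIR 8.9%
offline = [586, 7512 - 586]     # AIR 7.8%

chi2, p, dof, expected = chi2_contingency([live, offline])
```

The resulting p-value is consistent with the reported P = .01 for the AIR comparison.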


Subjects
Breast Neoplasms, COVID-19, Humans, Female, Retrospective Studies, Breast Neoplasms/diagnostic imaging, Early Detection of Cancer/methods, Mammography/methods, Mass Screening
5.
J Am Coll Radiol; 20(4): 431-437, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36841320

ABSTRACT

OBJECTIVE: To determine the rate of documented notification, via an alert, for intra-institutional discrepant radiologist opinions and addended reports, and the resulting clinical management changes. METHODS: This institutional review board-exempt, retrospective study was performed at a large academic medical center. We defined an intra-institutional discrepant opinion as a consultant radiologist providing an interpretation different from that formally rendered by a colleague at our institution. We implemented a discrepant opinion policy requiring closed-loop notification of the consulting radiologist's second opinion to the original radiologist, who must acknowledge this alert within 30 days. This study included all discrepant opinion alerts created from December 1, 2019, to December 31, 2021; two radiologists and an internal medicine physician performed consensus review. Primary outcomes were degree of discrepancy and the percentage of discrepant opinions leading to a change in clinical management. The secondary outcome was the report addendum rate compared with an existing peer learning program, using Fisher's exact test. RESULTS: Of 114 discrepant opinion alerts among 1,888,147 reports generated during the study period (0.006%), 58 alerts were categorized as major (50.9%), 41 as moderate (36.0%), and 15 as minor discrepancies (13.1%). Clinical management change occurred in 64 of 114 cases (56.1%). The report addendum rate for discrepant opinion alerts was 4-fold higher than for peer learning alerts at our institution (66 of 315 = 21% versus 432 of 8,273 = 5.2%; P < .0001). DISCUSSION: Although discrepant intra-institutional radiologist second opinions were rare, they frequently led to changes in clinical management. Capturing these discrepancies by encouraging alert use may help optimize patient care and document what consulting radiologists communicated to the referring or consulting care team.
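The 4-fold addendum-rate difference above can be reproduced with Fisher's exact test on the reported counts:

```python
from scipy.stats import fisher_exact

# Addended vs non-addended reports, per the counts reported above
discrepant_alerts = [66, 315 - 66]       # 21% addendum rate
peer_learning = [432, 8273 - 432]        # 5.2% addendum rate

odds_ratio, p = fisher_exact([discrepant_alerts, peer_learning])
```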


Subjects
Radiologists, Referral and Consultation, Humans, Retrospective Studies, Academic Medical Centers
6.
J Am Coll Radiol; 19(10): 1162-1169, 2022 Oct.
Article in English | MEDLINE | ID: mdl-35981636

ABSTRACT

OBJECTIVE: To address model drift in a machine learning (ML) model for predicting diagnostic imaging follow-up, comparing data augmentation with more recent data versus retraining new predictive models. METHODS: This institutional review board-approved retrospective study was conducted from January 1, 2016, to December 31, 2020, at a large academic institution. A previously developed ML model had been trained on 1,000 radiology reports from 2016 (old data). An additional 1,385 randomly selected reports from 2019 to 2020 (new data) were annotated for follow-up recommendations and randomly divided into two sets: training (n = 900) and testing (n = 485). Support vector machine and random forest (RF) algorithms were constructed and trained using the 900 new-data reports plus the old data (augmented data, new models) and using only the new data (new data, new models). The 2016 baseline model served as comparator, both as is and retrained with the augmented data. Recall was compared with baseline using McNemar's test. RESULTS: Follow-up recommendations were contained in 11.3% of reports (157 of 1,385). The baseline model retrained with new data had precision = 0.83 and recall = 0.54, neither significantly different from baseline. A new RF model trained with augmented data had significantly better recall than the baseline model (0.80 versus 0.66, P = .04) and comparable precision (0.90 versus 0.86). DISCUSSION: ML methods for monitoring follow-up recommendations in radiology reports suffer model drift over time. A newly developed RF model trained on augmented data achieved better recall with comparable precision than simply retraining the previously trained original model. Thus, these models must be regularly assessed and updated using more recent historical data.
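Retraining with augmented data means concatenating the old training corpus with newly annotated reports before refitting. A toy sketch of that workflow with an RF classifier over TF-IDF features — the report snippets and labels below are invented for illustration, not the study's data:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical report impressions; 1 = contains a follow-up recommendation
old_reports = [
    "recommend follow-up ct in 6 months",
    "no acute findings",
    "follow-up mri advised to assess interval change",
    "normal chest radiograph",
]
old_labels = [1, 0, 1, 0]

new_reports = [
    "suggest repeat ultrasound in 3 months",
    "unremarkable examination",
]
new_labels = [1, 0]

# Augmented training set: old data plus newly annotated recent reports
texts = old_reports + new_reports
labels = old_labels + new_labels

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)

pred = model.predict(vectorizer.transform(["recommend follow-up ct"]))
```

In practice, the refreshed model would then be evaluated on a held-out recent test set, as the study did with its 485-report test split.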


Subjects
Algorithms, Machine Learning, Follow-Up Studies, Radiography, Retrospective Studies
7.
JAMIA Open; 5(2): ooac024, 2022 Jul.
Article in English | MEDLINE | ID: mdl-35474718

ABSTRACT

Objective: Clinical evidence logic statements (CELS) are shareable knowledge artifacts in a semistructured "If-Then" format that can be used in clinical decision support systems. This project aimed to assess factors facilitating CELS representation. Materials and Methods: We described CELS representation of clinical evidence and assessed factors that facilitate representation, including authoring instruction, evidence structure, and educational level of CELS authors. Five researchers were tasked with representing CELS from published evidence. Represented CELS were compared with the formal representation. After an authoring-instruction intervention, the same researchers were asked to represent the same CELS, and accuracy was compared with that preintervention using McNemar's test. Moreover, CELS representation accuracy was compared between structured and semistructured evidence, and between CELS authored by specialty-trained and nonspecialty-trained researchers, using χ2 analysis. Results: 261 CELS were represented from 10 different pieces of published evidence by the researchers pre- and postintervention. CELS representation accuracy increased significantly postintervention, from 20/261 (8%) to 63/261 (24%; P < .00001). Postintervention, more CELS were assigned for representation, with 379 total CELS included in the analysis (278 structured and 101 semistructured). Representation from structured evidence was associated with significantly higher accuracy (P = .002), as was representation by specialty-trained authors (P = .0004). Discussion: CELS represented from structured evidence had higher representation accuracy than those from semistructured evidence. Similarly, specialty-trained authors achieved higher accuracy when representing structured evidence. Conclusion: Authoring instructions significantly improved CELS representation, with a 3-fold increase in accuracy. However, CELS representation remains a challenging task.
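A CELS is essentially a machine-readable "If-Then" rule. A hypothetical minimal representation in Python — the fields and the example rule are illustrative assumptions, not the project's actual schema:

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class CELS:
    """Minimal 'If-Then' clinical evidence logic statement (illustrative)."""
    name: str
    condition: Callable[[Dict], bool]  # the "If" part, evaluated on patient data
    action: str                        # the "Then" part: the recommendation text

    def evaluate(self, patient: Dict) -> Optional[str]:
        """Return the recommendation if the condition holds, else None."""
        return self.action if self.condition(patient) else None


# Hypothetical statement: follow-up for a larger incidental pulmonary nodule
nodule_rule = CELS(
    name="pulmonary-nodule-followup",
    condition=lambda p: p.get("nodule_size_mm", 0) >= 8,
    action="Recommend chest CT follow-up",
)

recommendation = nodule_rule.evaluate({"nodule_size_mm": 9})
```

Structured evidence maps more directly onto such condition/action fields, which is consistent with the higher representation accuracy reported for structured sources.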
