Results 1 - 14 of 14
1.
J Med Syst ; 48(1): 41, 2024 Apr 18.
Article in English | MEDLINE | ID: mdl-38632172

ABSTRACT

Polypharmacy remains an important challenge for patients with extensive medical complexity. Given the primary care shortage and the increasing aging population, effective polypharmacy management is crucial to manage the increasing burden of care. The capacity of large language model (LLM)-based artificial intelligence to aid in polypharmacy management has yet to be evaluated. Here, we evaluate ChatGPT's performance in polypharmacy management via its deprescribing decisions in standardized clinical vignettes. We inputted several clinical vignettes originally from a study of general practitioners' deprescribing decisions into ChatGPT 3.5, a publicly available LLM, and evaluated its capacity for yes/no binary deprescribing decisions as well as list-based prompts in which the model was prompted to choose which of several medications to deprescribe. We recorded ChatGPT responses to yes/no binary deprescribing prompts and the number and types of medications deprescribed. In yes/no binary deprescribing decisions, ChatGPT universally recommended deprescribing medications regardless of ADL status in patients with no overlying CVD history; in patients with CVD history, ChatGPT's answers varied by technical replicate. Total number of medications deprescribed ranged from 2.67 to 3.67 (out of 7) and did not vary with CVD status, but increased linearly with severity of ADL impairment. Among medication types, ChatGPT preferentially deprescribed pain medications. ChatGPT's deprescribing decisions vary along the axes of ADL status, CVD history, and medication type, indicating some concordance of internal logic between general practitioners and the model. These results indicate that specifically trained LLMs may provide useful clinical support in polypharmacy management for primary care physicians.


Subjects
Cardiovascular Diseases , Deprescriptions , General Practitioners , Humans , Aged , Polypharmacy , Artificial Intelligence
2.
J Am Coll Radiol ; 20(10): 990-997, 2023 10.
Article in English | MEDLINE | ID: mdl-37356806

ABSTRACT

OBJECTIVE: Despite rising popularity and performance, studies evaluating the use of large language models for clinical decision support are lacking. Here, we evaluate ChatGPT (Generative Pre-trained Transformer)-3.5 and GPT-4's (OpenAI, San Francisco, California) capacity for clinical decision support in radiology via the identification of appropriate imaging services for two important clinical presentations: breast cancer screening and breast pain. METHODS: We compared ChatGPT's responses to the ACR Appropriateness Criteria for breast pain and breast cancer screening. Our prompt formats included an open-ended (OE) and a select all that apply (SATA) format. Scoring criteria evaluated whether proposed imaging modalities were in accordance with ACR guidelines. Three replicate entries were conducted for each prompt, and the average of these was used to determine final scores. RESULTS: Both ChatGPT-3.5 and ChatGPT-4 achieved an average OE score of 1.830 (out of 2) for breast cancer screening prompts. ChatGPT-3.5 achieved a SATA average percentage correct of 88.9%, compared with ChatGPT-4's average percentage correct of 98.4% for breast cancer screening prompts. For breast pain, ChatGPT-3.5 achieved an average OE score of 1.125 (out of 2) and a SATA average percentage correct of 58.3%, as compared with ChatGPT-4's average OE score of 1.666 (out of 2) and SATA average percentage correct of 77.7%. DISCUSSION: Our results demonstrate the eventual feasibility of using large language models like ChatGPT for radiologic decision making, with the potential to improve clinical workflow and responsible use of radiology services. More use cases and greater accuracy are necessary to evaluate and implement such tools.


Subjects
Breast Neoplasms , Mastodynia , Radiology , Humans , Female , Breast Neoplasms/diagnostic imaging , Decision Making
3.
Radiology ; 307(5): e222044, 2023 06.
Article in English | MEDLINE | ID: mdl-37219444

ABSTRACT

Radiologic tests often contain rich imaging data not relevant to the clinical indication. Opportunistic screening refers to the practice of systematically leveraging these incidental imaging findings. Although opportunistic screening can apply to imaging modalities such as conventional radiography, US, and MRI, most attention to date has focused on body CT by using artificial intelligence (AI)-assisted methods. Body CT represents an ideal high-volume modality whereby a quantitative assessment of tissue composition (eg, bone, muscle, fat, and vascular calcium) can provide valuable risk stratification and help detect unsuspected presymptomatic disease. The emergence of "explainable" AI algorithms that fully automate these measurements could eventually lead to their routine clinical use. Potential barriers to widespread implementation of opportunistic CT screening include the need for buy-in from radiologists, referring providers, and patients. Standardization of acquiring and reporting measures is needed, in addition to expanded normative data according to age, sex, and race and ethnicity. Regulatory and reimbursement hurdles are not insurmountable but pose substantial challenges to commercialization and clinical use. Through demonstration of improved population health outcomes and cost-effectiveness, these opportunistic CT-based measures should be attractive to both payers and health care systems as value-based reimbursement models mature. If highly successful, opportunistic screening could eventually justify a practice of standalone "intended" CT screening.


Subjects
Artificial Intelligence , Radiology , Humans , Algorithms , Radiologists , Mass Screening/methods , Radiology/methods
4.
Clin Imaging ; 95: 47-51, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36610270

ABSTRACT

PURPOSE: To assess the feasibility of automated segmentation and measurement of tracheal collapsibility for detecting tracheomalacia on inspiratory and expiratory chest CT images. METHODS: Our study included 123 patients (age 67 ± 11 years; female:male 69:54) who underwent clinically indicated chest CT examinations in both inspiration and expiration phases. A thoracic radiologist measured the anteroposterior length of the trachea on inspiration and expiration phase images at the level of maximum collapsibility or at the aortic arch (in the absence of luminal change). Separately, another investigator processed the inspiratory and expiratory DICOM CT images with the Airway Segmentation component of a commercial COPD software (IntelliSpace Portal, Philips Healthcare). Upon segmentation, the software automatically estimated average lumen diameter (in mm) and lumen area (sq. mm) both along the entire length of the trachea and at the level of the aortic arch. Data were analyzed with independent t-tests and area under the receiver operating characteristic curve (AUC). RESULTS: Of the 123 patients, 48 patients had tracheomalacia and 75 patients did not. Ratios of inspiration to expiration phase average lumen area and lumen diameter over the length of the trachea had the highest AUC of 0.93 (95% CI = 0.88-0.97) for differentiating presence and absence of tracheomalacia. A decrease of ≥25% in average lumen diameter had sensitivity of 82% and specificity of 87% for detecting tracheomalacia. A decrease of ≥40% in the average lumen area had sensitivity and specificity of 86% for detecting tracheomalacia. CONCLUSION: Automatic segmentation and measurement of tracheal dimension over the entire tracheal length is more accurate than a single-level measurement for detecting tracheomalacia.


Subjects
Tracheomalacia , Humans , Male , Female , Middle Aged , Aged , Tracheomalacia/diagnostic imaging , Trachea/diagnostic imaging , Tomography, X-Ray Computed/methods , Sensitivity and Specificity , ROC Curve
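
The collapsibility cutoffs reported in the abstract above (a decrease of at least 25% in average lumen diameter, or at least 40% in average lumen area, from inspiration to expiration) can be illustrated with a minimal sketch. The function names and input values below are hypothetical, and the abstract does not say how the commercial software computes the decrease; this sketch simply assumes percent decrease relative to the inspiratory measurement.

```python
def percent_decrease(inspiration: float, expiration: float) -> float:
    """Percent decrease of a lumen measurement from inspiration to expiration.

    Assumption: the decrease is expressed relative to the inspiratory value.
    """
    if inspiration <= 0:
        raise ValueError("inspiration measurement must be positive")
    return 100.0 * (inspiration - expiration) / inspiration


def collapsibility_flags(diam_insp_mm, diam_exp_mm, area_insp_mm2, area_exp_mm2,
                         diam_cutoff_pct=25.0, area_cutoff_pct=40.0):
    """Apply the two cutoffs quoted in the abstract (hypothetical helper)."""
    diam_drop = percent_decrease(diam_insp_mm, diam_exp_mm)
    area_drop = percent_decrease(area_insp_mm2, area_exp_mm2)
    return {
        "diameter_drop_pct": diam_drop,
        "area_drop_pct": area_drop,
        "diameter_flag": diam_drop >= diam_cutoff_pct,
        "area_flag": area_drop >= area_cutoff_pct,
    }


# Illustrative (made-up) measurements, not data from the study.
result = collapsibility_flags(20.0, 14.0, 300.0, 200.0)
print(result)
```

With these made-up inputs the diameter criterion is met and the area criterion is not, which shows that the two cutoffs can disagree for the same patient.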
5.
Sci Rep ; 13(1): 189, 2023 01 05.
Article in English | MEDLINE | ID: mdl-36604467

ABSTRACT

Non-contrast head CT (NCCT) is extremely insensitive for early (< 3-6 h) acute infarct identification. We developed a deep learning model that detects and delineates suspected early acute infarcts on NCCT, using diffusion MRI as ground truth (3566 NCCT/MRI training patient pairs). The model substantially outperformed 3 expert neuroradiologists on a test set of 150 CT scans of patients who were potential candidates for thrombectomy (60 stroke-negative, 90 stroke-positive middle cerebral artery territory only infarcts), with sensitivity 96% (specificity 72%) for the model versus 61-66% (specificity 90-92%) for the experts; model infarct volume estimates also strongly correlated with those of diffusion MRI (r² > 0.98). When this 150 CT test set was expanded to include a total of 364 CT scans with a more heterogeneous distribution of infarct locations (94 stroke-negative, 270 stroke-positive mixed territory infarcts), model sensitivity was 97%, specificity 99%, for detection of infarcts larger than the 70 mL volume threshold used for patient selection in several major randomized controlled trials of thrombectomy treatment.


Subjects
Deep Learning , Stroke , Humans , Tomography, X-Ray Computed , Stroke/diagnostic imaging , Magnetic Resonance Imaging , Infarction, Middle Cerebral Artery
6.
PLoS One ; 17(4): e0267213, 2022.
Article in English | MEDLINE | ID: mdl-35486572

ABSTRACT

A standardized objective evaluation method is needed to compare machine learning (ML) algorithms as these tools become available for clinical use. Therefore, we designed, built, and tested an evaluation pipeline with the goal of normalizing performance measurement of independently developed algorithms, using a common test dataset of our clinical imaging. Three vendor applications for detecting solid, part-solid, and groundglass lung nodules in chest CT examinations were assessed in this retrospective study using our data-preprocessing and algorithm assessment chain. The pipeline included tools for image cohort creation and de-identification; report and image annotation for ground-truth labeling; server partitioning to receive vendor "black box" algorithms and to enable model testing on our internal clinical data (100 chest CTs with 243 nodules) from within our security firewall; model validation and result visualization; and performance assessment calculating algorithm recall, precision, and receiver operating characteristic (ROC) curves. Algorithm true positives, false positives, false negatives, recall, and precision for detecting lung nodules were as follows: Vendor-1 (194, 23, 49, 0.80, 0.89); Vendor-2 (182, 270, 61, 0.75, 0.40); Vendor-3 (75, 120, 168, 0.32, 0.39). The AUCs for detection of solid (0.61-0.74), groundglass (0.66-0.86) and part-solid (0.52-0.86) nodules varied between the three vendors. Our ML model validation pipeline enabled testing of multi-vendor algorithms within the institutional firewall. Wide variations in algorithm performance for detection as well as classification of lung nodules justify the premise for a standardized objective ML algorithm evaluation process.


Subjects
Lung Neoplasms , Algorithms , Humans , Lung Neoplasms/diagnosis , Machine Learning , Retrospective Studies , Tomography, X-Ray Computed/methods
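
The recall and precision figures in the abstract above follow from the reported true-positive, false-positive, and false-negative counts. The short sketch below recomputes them for Vendor-1 and Vendor-2 using the standard definitions; the function and variable names are illustrative and are not taken from the authors' pipeline.

```python
def recall(tp: int, fn: int) -> float:
    """Fraction of ground-truth nodules that the algorithm detected."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Fraction of algorithm detections that were true nodules."""
    return tp / (tp + fp)


# (true positives, false positives, false negatives) as listed in the abstract.
vendors = {
    "Vendor-1": (194, 23, 49),
    "Vendor-2": (182, 270, 61),
}

for name, (tp, fp, fn) in vendors.items():
    # tp + fn equals 243, the number of nodules in the test set.
    print(name, tp + fn, round(recall(tp, fn), 2), round(precision(tp, fp), 2))
```

For both vendors, true positives plus false negatives sum to the 243 nodules in the test set, which is a quick consistency check on the counts.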
7.
Acad Radiol ; 2021 Nov 23.
Article in English | MEDLINE | ID: mdl-34836775

ABSTRACT

Concerns over the need for CT radiation dose optimization and reduction led to improved scanner efficiency and the introduction of several reconstruction techniques and image processing-based software. The latest technologies use artificial intelligence (AI) for CT dose optimization and image quality improvement. While CT dose optimization has benefited and can continue to benefit from AI, variations in scanner technologies, reconstruction methods, and scan protocols can lead to substantial variations in radiation doses and image quality across and within different scanners. These variations in turn can influence the performance of AI algorithms being deployed for tasks such as detection, segmentation, characterization, and quantification. We review the complex relationship between AI and CT radiation dose.

8.
J Am Coll Radiol ; 17(12): 1653-1662, 2020 Dec.
Article in English | MEDLINE | ID: mdl-32592660

ABSTRACT

OBJECTIVE: We developed deep learning algorithms to automatically assess BI-RADS breast density. METHODS: Using a large multi-institution patient cohort of 108,230 digital screening mammograms from the Digital Mammographic Imaging Screening Trial, we investigated the effect of data, model, and training parameters on overall model performance and provided crowdsourcing evaluation from the attendees of the ACR 2019 Annual Meeting. RESULTS: Our best-performing algorithm achieved good agreement with radiologists who were qualified interpreters of mammograms, with a four-class κ of 0.667. When training was performed with randomly sampled images from the data set versus sampling equal number of images from each density category, the model predictions were biased away from the low-prevalence categories such as extremely dense breasts. The net result was an increase in sensitivity and a decrease in specificity for predicting dense breasts for equal class compared with random sampling. We also found that the performance of the model degrades when we evaluate on digital mammography data formats that differ from the one that we trained on, emphasizing the importance of multi-institutional training sets. Lastly, we showed that crowdsourced annotations, including those from attendees who routinely read mammograms, had higher agreement with our algorithm than with the original interpreting radiologists. CONCLUSION: We demonstrated the possible parameters that can influence the performance of the model and how crowdsourcing can be used for evaluation. This study was performed in tandem with the development of the ACR AI-LAB, a platform for democratizing artificial intelligence.


Subjects
Breast Neoplasms , Crowdsourcing , Deep Learning , Artificial Intelligence , Breast Density , Breast Neoplasms/diagnostic imaging , Female , Humans , Mammography
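
The abstract above summarizes agreement between the algorithm and radiologists with a four-class κ. As a rough illustration of how such a statistic is formed, the sketch below computes unweighted Cohen's kappa from a square confusion matrix. The abstract does not state whether a weighted or unweighted kappa was used, so the unweighted form is an assumption, and the small matrix is made up for demonstration.

```python
def cohen_kappa(confusion):
    """Unweighted Cohen's kappa for a square confusion matrix.

    confusion[i][j] counts cases rated class i by rater A and class j by rater B.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    if n == 0:
        raise ValueError("confusion matrix is empty")
    observed = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical two-class example, not data from the study.
print(cohen_kappa([[20, 5], [10, 15]]))
```

The same function accepts a 4 x 4 matrix for the four BI-RADS density categories.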
9.
J Digit Imaging ; 33(2): 334-340, 2020 04.
Article in English | MEDLINE | ID: mdl-31515753

ABSTRACT

The purpose of this study was to assess if clinical indications, patient location, and imaging sites predict the viewing pattern of referring physicians for CT and MR of the head, chest, and abdomen. Our study included 166,953 CT/MR images of head/chest/abdomen in 2016-2017 in the outpatient (OP, n = 83,981 CT/MR), inpatient (IP, n = 51,052), and emergency (ED, n = 31,920) settings. There were 125,329 CT/MR performed in the hospital setting and 41,624 in one of the nine off-campus locations. We extracted information regarding body region (head/chest/abdomen), patient location, and imaging site from the electronic medical records (EPIC). We recorded clinical indications and the number of times referring physicians viewed CT/MR (defined as the number of separate views of imaging in the EPIC). Data were analyzed with the Microsoft SQL and SPSS statistical software. About 33% of IP CT and MR studies are viewed > 6 times compared to 7% for OP and 19% of ED studies (p < 0.001). Conversely, most OP studies (55%) were viewed 1-2 times only, compared to 21% for IP and 38% for ED studies (p < 0.001). In-hospital exams are viewed (≥ 6 views; 39% studies) more frequently than off-campus imaging (≥ 6 views; 17% studies) (p < 0.001). For head CT/MR, certain clinical indications (i.e., stroke) had higher viewing rates compared to other clinical indications such as malignancy, headache, and dizziness. Conversely, for chest CT, dyspnea-hypoxia had much higher viewing rates (> 6 times) in IP (55%) and ED (46%) than in OP settings (22%). Patient location and imaging site regardless of clinical indications have a profound effect on viewing patterns of referring physicians. Understanding viewing patterns of the referring physicians can help guide interpretation priorities and finding communication for imaging exams based on patient location, imaging site, and clinical indications. The information can help in the efficient delivery of patient care.


Subjects
Physicians , Tomography, X-Ray Computed , Abdomen , Communication , Electronic Health Records , Humans
10.
Radiology ; 262(2): 544-9, 2012 Feb.
Article in English | MEDLINE | ID: mdl-22084210

ABSTRACT

PURPOSE: To measure the proportion of high-cost imaging generated by a radiologist's recommendation and to identify the imaging findings resulting in follow-up. MATERIALS AND METHODS: This retrospective HIPAA-compliant study had institutional review board approval, with waiver of informed consent. A recommended examination was defined as one performed within a single episode of care (defined as fewer than 60 days after the initial imaging) following a radiologist's recommendation in a prior examination report. Chest and abdominal computed tomography (CT), brain and lumbar spine magnetic resonance (MR) imaging, and body positron emission tomography were included for analysis. From a database of all radiology examinations (approximately 200,000) at one institution over a 6-month period, a computerized search identified all high-cost examinations that were preceded by an examination containing a radiologist recommendation. Medical records were reviewed to verify accuracy of the recommending-recommended examination pairs and to determine the reason for the radiologist's recommendation. For proportions, 95% confidence intervals were calculated. RESULTS: Overall, 1558 of 29,232 (5.3%) high-cost examinations followed a radiologist's recommendation. Chest CT was the high-cost examination most often resulting from a radiologist's recommendation (878 of 9331, 9.4%), followed by abdominal CT (390 of 10,258, 3.8%) and brain MR imaging (222 of 6436, 3.4%). The examination types with the highest numbers of follow-up examinations were chest radiography (n=431), chest CT (n=410), abdominal CT (n=214), and abdominal ultrasonography (n=120). The most common findings resulting in follow-up were pulmonary nodules or masses (559 of 1558, 35.9%), other pulmonary abnormalities (150 of 1558, 9.6%), adenopathy (103 of 1558, 6.6%), renal lesions (101 of 1558, 6.5%), and negative examination findings (101 of 1558, 6.5%). 
CONCLUSION: Radiologists' recommendations account for only a small proportion of outpatient high-cost imaging examinations. Pulmonary nodule follow-up is the most common cause for radiologist-generated examinations.


Subjects
Diagnostic Imaging/economics , Health Care Costs/statistics & numerical data , Practice Patterns, Physicians'/economics , Radiology Department, Hospital/economics , Referral and Consultation/economics , Boston , Diagnostic Imaging/statistics & numerical data , Follow-Up Studies , Humans , Practice Patterns, Physicians'/statistics & numerical data , Referral and Consultation/statistics & numerical data
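
The abstract above reports proportions such as 1558 of 29,232 (5.3%) and states that 95% confidence intervals were calculated, without naming the method. The sketch below uses the Wilson score interval as one common choice; that choice, and the function name, are assumptions rather than the authors' stated procedure.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96):
    """Return (proportion, lower, upper) using the Wilson score interval.

    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need 0 <= successes <= total and total > 0")
    p = successes / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return p, center - half, center + half


# Counts quoted in the abstract: high-cost examinations following a recommendation.
p, lo, hi = wilson_interval(1558, 29232)
print(f"{100 * p:.1f}% (95% CI {100 * lo:.1f}%, {100 * hi:.1f}%)")
```

With samples this large, the Wilson interval and the simpler normal-approximation interval give nearly identical bounds.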
11.
J Digit Imaging ; 22(6): 629-40, 2009 Dec.
Article in English | MEDLINE | ID: mdl-18543033

ABSTRACT

The purpose of our study was to demonstrate the use of Natural Language Processing (Leximer) along with Online Analytic Processing (NLP-OLAP) for extraction of finding trends in a large radiology practice. Prior studies have validated the Natural Language Processing (NLP) program, Leximer, for classifying unstructured radiology reports based on the presence of positive radiology findings (F(POS)) and negative radiology findings (F(NEG)). The F(POS) included new relevant radiology findings and any change in status from prior imaging. Electronic radiology reports from 1995-2002 and data from analysis of these reports with NLP-Leximer were saved in a data warehouse and exported to a multidimensional structure called the Radcube. Various relational queries on the data in the Radcube were performed using the OLAP technique. Thus, NLP-OLAP was applied to determine trends of F(POS) in different radiology exams for different patient and examination attributes. Pivot tables were exported from the NLP-OLAP interface to Microsoft Excel for statistical analysis. Radcube allowed rapid and comprehensive analysis of F(POS) and F(NEG) trends in a large radiology report database. Trends of F(POS) were extracted for different patient attributes such as age groups, gender, clinical indications, diseases with ICD codes, patient types (inpatient, ambulatory), imaging characteristics such as imaging modalities, referring physicians, radiology subspecialties, and body regions. Data analysis showed substantial differences between F(POS) rates for different imaging modalities ranging from 23.1% (mammography, 49,163/212,906) to 85.8% (nuclear medicine, 93,852/109,374; p < 0.0001). In conclusion, NLP-OLAP can help in analysis of yield of different radiology exams from a large radiology report database.


Subjects
Diagnostic Imaging/methods , Information Storage and Retrieval , Medical Records Systems, Computerized , Natural Language Processing , Radiology Information Systems , Databases, Factual , Electronic Data Processing , Female , Humans , Logistic Models , Male , Practice Management, Medical/organization & administration , Probability , Radiographic Image Enhancement/methods , Sensitivity and Specificity
12.
J Am Coll Radiol ; 5(3): 197-204, 2008 Mar.
Article in English | MEDLINE | ID: mdl-18312968

ABSTRACT

PURPOSE: The study purpose was to describe the use of natural language processing (NLP) and online analytic processing (OLAP) for assessing patterns in recommendations in unstructured radiology reports on the basis of patient and imaging characteristics, such as age, gender, referring physicians, radiology subspecialty, modality, indications, diseases, and patient status (inpatient vs outpatient). MATERIALS AND METHODS: A database of 4,279,179 radiology reports from a single tertiary health care center during a 10-year period (1995-2004) was created. The database includes reports of computed tomography, magnetic resonance imaging, fluoroscopy, nuclear medicine, ultrasound, radiography, mammography, angiography, special procedures, and unclassified imaging tests with patient demographics. A clinical data mining and analysis NLP program (Leximer, Nuance Inc, Burlington, Massachusetts) in conjunction with OLAP was used for classifying reports into those with recommendations (I(REC)) and without recommendations (N(REC)) for imaging and determining I(REC) rates for different patient age groups, gender, imaging modalities, indications, diseases, subspecialties, and referring physicians. In addition, temporal trends for I(REC) were also determined. RESULTS: There was a significant difference in the I(REC) rates in different age groups, varying between 4.8% (10-19 years) and 9.5% (>70 years) (P < .0001). Significant variations in I(REC) rates were observed for different imaging modalities, with the highest rates for computed tomography (17.3%, 100,493/581,032). The I(REC) rates varied significantly for different subspecialties and among radiologists within a subspecialty (P < .0001). For most modalities, outpatients had a higher rate of recommendations when compared with inpatients. CONCLUSION: The radiology reports database analyzed with NLP in conjunction with OLAP revealed considerable differences between recommendation trends for different imaging modalities and other patient and imaging characteristics.


Subjects
Decision Making, Computer-Assisted , Diagnostic Imaging/methods , Health Planning Guidelines , Natural Language Processing , Adolescent , Adult , Age Factors , Aged , Angiography/methods , Child , Child, Preschool , Cross-Sectional Studies , Diagnostic Imaging/standards , Female , Humans , Infant , Magnetic Resonance Imaging/methods , Male , Middle Aged , Quality Control , Radiology/standards , Radiology Department, Hospital , Registries , Retrospective Studies , Risk Factors , Sensitivity and Specificity , Sex Factors , Tomography, X-Ray Computed/methods , Ultrasonography, Doppler/methods , United States
13.
Radiology ; 242(3): 857-64, 2007 Mar.
Article in English | MEDLINE | ID: mdl-17325070

ABSTRACT

PURPOSE: To retrospectively measure repeat rates for high-cost imaging studies, determining their causes and trends, and the impact of radiologist recommendations for a repeat examination on imaging volume. MATERIALS AND METHODS: This HIPAA-compliant study had institutional review board approval, with waiver of informed consent. Repeat examination was defined as a same-modality examination performed in the same patient within 0 days to 7 months of a first examination. From a database of all radiology examinations (>2.9 million) at one institution from May 1996 to June 2003, a computerized search identified head, spine, chest, and abdominal computed tomographic (CT), brain and spine magnetic resonance (MR) imaging, pelvic ultrasonography (US), and nuclear cardiology examinations with a prior examination of the same type within 7 months. Examination pairs were subdivided into studies repeated at less than 2 weeks, between 2 weeks and 2 months, or between 2 and 7 months. Automated classification of radiology reports revealed whether a repeat examination from June 2002 to June 2003 had been preceded by a radiologist recommendation on the prior report. Trends over time were analyzed with linear regression, and 95% confidence intervals were calculated. RESULTS: Between July 2002 and June 2003, 31 111 of 100 335 examinations (31%) were repeat examinations. Body CT (9057 of 20 177 [45%] chest and 8319 of 22 438 [37%] abdomen) and brain imaging (6823 of 18 378 [37%] CT and 3427 of 11 455 [30%] MR imaging) represented the highest repeat categories. Among five high-cost, high-volume imaging examinations, 6426 of 85 014 (8%) followed a report with a radiologist recommendation. Most common indications for examination repetition were neurologic surveillance within 2 weeks and cancer follow-up at 2-7 months. From 1997 to mid-2003, MR imaging and CT repeat rates increased (0.71% per year [P < .01] and 1.87% per year [P < .01], respectively). 
CONCLUSION: Repeat examinations account for nearly one-third of high-cost radiology examinations and represent an increasing proportion of such examinations. Most repeat examinations are initiated clinically without a recommendation by a radiologist.


Subjects
Diagnostic Imaging/economics , Diagnostic Imaging/statistics & numerical data , Health Care Costs/statistics & numerical data , Radiology Department, Hospital/economics , Radiology Department, Hospital/statistics & numerical data , Referral and Consultation/economics , Referral and Consultation/statistics & numerical data , United States
14.
Radiology ; 234(2): 323-9, 2005 Feb.
Article in English | MEDLINE | ID: mdl-15591435

ABSTRACT

PURPOSE: To validate the accuracy of Lexicon Mediated Entropy Reduction (LEXIMER), a new information theory-based computer algorithm developed by the authors for independent analysis and classification of unstructured radiology reports based on the presence of clinically important findings (F(T), where (T) represents "true") and recommendations for subsequent action (R(T)). MATERIALS AND METHODS: The study was approved by the Human Research Committee of the institutional review board. Consecutive de-identified radiology reports (n = 1059) comprising results of barium studies (n = 99), computed tomography (n = 107), mammography (n = 90), magnetic resonance imaging (n = 108), nuclear medicine (n = 99), positron emission tomography (n = 106), radiography (n = 212), ultrasonography (n = 131), and vascular procedures (n = 107) were independently analyzed by two radiologists and then with LEXIMER to categorize the reports into F(T) and F(T)0 (containing or not containing clinically important findings) categories and R(T) and R(T)0 (containing or not containing recommendations for subsequent action) categories. Accuracy, sensitivity, specificity, and positive and negative predictive values of LEXIMER for placing reports into F(T) and F(T)0 and R(T) and R(T)0 categories were assessed by using appropriate statistical tests. RESULTS: There was strong interobserver concordance between the two radiologists for placing radiology reports into F(T) and R(T) categories (kappa = 0.9, P < .01). For the LEXIMER program, accuracy, sensitivity, specificity, and positive and negative predictive values, respectively, were 97.5% (95% confidence interval [CI]: 96.6%, 98.5%), 98.9% (95% CI: 97.9%, 99.6%), 94.9% (95% CI: 93.1%, 96.0%), 97.5% (95% CI: 96.6%, 98.0%), and 97.7% (95% CI: 95.8%, 98.8%) for placing radiology reports into F(T) and F(T)0 categories and 99.6% (95% CI: 99.2%, 99.9%), 98.2% (95% CI: 95.0%, 99.6%), 99.9% (95% CI: 99.4%, 99.99%), 99.4% (95% CI: 96.3%, 99.9%), and 99.7% (95% CI: 98.9%, 99.9%) for placing reports into R(T) and R(T)0 categories. CONCLUSION: LEXIMER is an accurate automated engine for evaluating the percentage positivity of clinically important findings and rates of recommendation for subsequent action in unstructured radiology reports.


Subjects
Algorithms , Decision Making, Computer-Assisted , Radiology Information Systems , False Negative Reactions , False Positive Reactions , Humans , Magnetic Resonance Imaging , Mammography , Observer Variation , Radionuclide Imaging , Sensitivity and Specificity , Tomography, Emission-Computed, Single-Photon , Tomography, X-Ray Computed , Ultrasonography
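
The abstract above reports accuracy, sensitivity, specificity, and positive and negative predictive values for a binary report classifier. The abstract gives no confusion-matrix counts, so the sketch below uses hypothetical counts only to show how the five measures relate to one another under their standard definitions.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary classification measures from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true-positive rate
        "specificity": tn / (tn + fp),   # true-negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Hypothetical counts for illustration; these are not the study's data.
metrics = diagnostic_metrics(tp=90, fp=5, fn=10, tn=95)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
```

Sensitivity and specificity depend only on the classifier's behavior within each true class, whereas the predictive values also shift with the prevalence of positive reports in the sample.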