1.
Headache ; 64(4): 400-409, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38525734

ABSTRACT

OBJECTIVE: To develop a natural language processing (NLP) algorithm that can accurately extract headache frequency from free-text clinical notes. BACKGROUND: Headache frequency, defined as the number of days with any headache in a month (or 4 weeks), remains a key parameter in the evaluation of treatment response to migraine preventive medications. However, due to the variations and inconsistencies in documentation by clinicians, significant challenges exist to accurately extract headache frequency from the electronic health record (EHR) by traditional NLP algorithms. METHODS: This was a retrospective cross-sectional study with patients identified from two tertiary headache referral centers, Mayo Clinic Arizona and Mayo Clinic Rochester. All neurology consultation notes written by 15 specialized clinicians (11 headache specialists and 4 nurse practitioners) between 2012 and 2022 were extracted and 1915 notes were used for model fine-tuning (90%) and testing (10%). We employed four different NLP frameworks: (1) ClinicalBERT (Bidirectional Encoder Representations from Transformers) regression model, (2) Generative Pre-Trained Transformer-2 (GPT-2) Question Answering (QA) model zero-shot, (3) GPT-2 QA model few-shot training fine-tuned on clinical notes, and (4) GPT-2 generative model few-shot training fine-tuned on clinical notes to generate the answer by considering the context of included text. RESULTS: The mean (standard deviation) headache frequency of our training and testing datasets were 13.4 (10.9) and 14.4 (11.2), respectively. The GPT-2 generative model was the best-performing model with an accuracy of 0.92 (0.91, 0.93, 95% confidence interval [CI]) and R2 score of 0.89 (0.87, 0.90, 95% CI), and all GPT-2-based models outperformed the ClinicalBERT model in terms of exact matching accuracy. 
Although the ClinicalBERT regression model had the lowest accuracy of 0.27 (0.26, 0.28), it demonstrated a high R2 score of 0.88 (0.85, 0.89), suggesting the ClinicalBERT model can reasonably predict the headache frequency within a range of ≤ ± 3 days, and its R2 score was higher than that of the GPT-2 QA zero-shot model or the GPT-2 QA few-shot fine-tuned model. CONCLUSION: We developed a robust information extraction model based on a state-of-the-art large language model, a GPT-2 generative model that can extract headache frequency from EHR free-text clinical notes with high accuracy and R2 score. It overcame several challenges related to the different ways clinicians document headache frequency that were not easily handled by traditional NLP models. We also showed that GPT-2-based frameworks outperformed ClinicalBERT in terms of accuracy in extracting headache frequency from clinical notes. To facilitate research in the field, we released the GPT-2 generative model and inference code under an open-source license for community use on GitHub. Additional fine-tuning of the algorithm might be required when it is applied to different health-care systems for various clinical use cases.
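The contrast reported above, a low exact-match accuracy alongside a high R2 score, can be illustrated with a small sketch. The functions and the five toy values below are hypothetical and are not taken from the study; they only show how a regression output that is always close, but never exactly right, scores on the three measures.

```python
def exact_match_accuracy(y_true, y_pred):
    """Share of predictions that equal the reference after rounding to whole days."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if round(p) == t)
    return hits / len(y_true)


def within_tolerance(y_true, y_pred, tol=3):
    """Share of predictions within +/- tol days of the reference."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(p - t) <= tol)
    return hits / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical headache days per month (reference) and regression outputs.
reference = [2, 8, 15, 30, 20]
predicted = [3.4, 6.9, 16.2, 28.1, 21.4]

print(exact_match_accuracy(reference, predicted))  # every prediction is off by a day or two
print(within_tolerance(reference, predicted))      # yet all fall within 3 days
print(round(r2_score(reference, predicted), 3))    # so R2 stays high
```

In this toy case no prediction matches exactly, all are within 3 days, and R2 is close to 1, which mirrors the pattern the abstract describes for the regression model.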


Subjects
Electronic Health Records , Natural Language Processing , Humans , Retrospective Studies , Cross-Sectional Studies , Male , Female , Headache , Adult , Middle Aged , Algorithms
2.
J Biomed Inform ; 149: 104548, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38043883

ABSTRACT

BACKGROUND: A major hurdle for the real-time deployment of AI models is ensuring the trustworthiness of these models for unseen populations. More often than not, these complex models are black boxes in which promising results are generated. However, when scrutinized, these models begin to reveal implicit biases during decision making, particularly for minority subgroups. METHOD: We develop an efficient adversarial de-biasing approach with partial learning by incorporating the existing concept activation vectors (CAV) methodology, to reduce racial disparities while preserving the performance of the targeted task. CAV is originally a model interpretability technique, which we adopted to identify the convolution layers responsible for learning race and to fine-tune only up to that layer instead of fine-tuning the complete network, limiting the drop in performance. RESULTS: The methodology has been evaluated on two independent medical image case studies, chest X-ray and mammograms, and we also performed external validation on a different racial population. On the external datasets for the chest X-ray use case, debiased models (averaged AUC 0.87) outperformed the baseline convolution models (averaged AUC 0.57) as well as the models trained with the popular fine-tuning strategy (averaged AUC 0.81). Moreover, the mammogram models were debiased using a single dataset (white, black and Asian) and improved the performance on an external dataset (averaged AUC 0.8 to 0.86) with a completely different population (primarily Hispanic patients). CONCLUSION: In this study, we demonstrated that the adversarial models trained only with internal data performed equally to or often outperformed the standard fine-tuning strategy with data from an external setting. The adversarial training approach described can be applied regardless of the predictor's model architecture, as long as the convolution model is trained using a gradient-based method.
We release the training code with academic open-source license - https://github.com/ramon349/JBI2023_TCAV_debiasing.


Subjects
Artificial Intelligence , Clinical Decision-Making , Diagnostic Imaging , Racial Groups , Humans , Mammography , Minority Groups , Bias , Healthcare Disparities
3.
BMC Womens Health ; 24(1): 359, 2024 Jun 21.
Article in English | MEDLINE | ID: mdl-38907193

ABSTRACT

BACKGROUND: Breast imaging clinics in the United States (U.S.) are increasingly implementing breast cancer risk assessment (BCRA) to align with evolving guideline recommendations but with limited uptake of risk-reduction care. Effectively communicating risk information to women is central to implementation efforts, but remains understudied in the U.S. This study aims to characterize, and identify factors associated with, women's interest in and preferences for breast cancer risk communication. METHODS: This is a cross-sectional survey study of U.S. women presenting for a mammogram between January and March of 2021 at a large, tertiary breast imaging clinic. Survey items assessed women's interest in knowing their risk and preferences for risk communication if considered to be at high risk in hypothetical situations. Multivariable logistic regression modeling assessed factors associated with women's interest in knowing their personal risk and preferences for details around exact risk estimates. RESULTS: Among 1119 women, 72.7% were interested in knowing their breast cancer risk. If at high risk, 77% preferred to receive their exact risk estimate and preferred verbal (52.9% phone/47% in-person) vs. written (26.5% online/19.5% letter) communications. Adjusted regression analyses found that those with a primary family history of breast cancer were significantly more interested in knowing their risk (OR 1.5, 95% CI 1.0, 2.1, p = 0.04), while those categorized as "more than one race or other" were significantly less interested in knowing their risk (OR 0.4, 95% CI 0.2, 0.9, p = 0.02). Women 60 + years of age were significantly less likely to prefer exact estimates of their risk (OR 0.6, 95% CI 0.5, 0.98, p < 0.01), while women with greater than a high school education were significantly more likely to prefer exact risk estimates (OR 2.5, 95% CI 1.5, 4.2, p < 0.001). CONCLUSION: U.S. women in this study expressed strong interest in knowing their risk and preferred to receive exact risk estimates verbally if found to be at high risk. Sociodemographic factors and family history influenced women's interest in and preferences for risk communication. Breast imaging centers implementing risk assessment should consider strategies tailored to women's preferences to increase interest in risk estimates and improve risk communication.


Subjects
Breast Neoplasms , Mammography , Patient Preference , Humans , Female , Breast Neoplasms/diagnosis , Breast Neoplasms/psychology , Breast Neoplasms/diagnostic imaging , Cross-Sectional Studies , Middle Aged , Patient Preference/statistics & numerical data , Patient Preference/psychology , United States , Adult , Mammography/statistics & numerical data , Mammography/psychology , Risk Assessment/methods , Aged , Communication , Surveys and Questionnaires , Tertiary Care Centers , Health Knowledge, Attitudes, Practice
4.
Gastroenterology ; 163(6): 1531-1546.e8, 2022 12.
Article in English | MEDLINE | ID: mdl-35985511

ABSTRACT

BACKGROUND & AIMS: To examine whether quantitative pathologic analysis of digitized hematoxylin and eosin slides of colorectal carcinoma (CRC) correlates with clinicopathologic features, molecular alterations, and prognosis. METHODS: A quantitative segmentation algorithm (QuantCRC) was applied to 6468 digitized hematoxylin and eosin slides of CRCs. Fifteen parameters were recorded from each image and tested for associations with clinicopathologic features and molecular alterations. A prognostic model was developed to predict recurrence-free survival using data from the internal cohort (n = 1928) and validated on an internal test (n = 483) and external cohort (n = 938). RESULTS: There were significant differences in QuantCRC according to stage, histologic subtype, grade, venous/lymphatic/perineural invasion, tumor budding, CD8 immunohistochemistry, mismatch repair status, KRAS mutation, BRAF mutation, and CpG methylation. A prognostic model incorporating stage, mismatch repair, and QuantCRC resulted in a Harrell's concordance (c)-index of 0.714 (95% confidence interval [CI], 0.702-0.724) in the internal test and 0.744 (95% CI, 0.741-0.754) in the external cohort. Removing QuantCRC from the model reduced the c-index to 0.679 (95% CI, 0.673-0.694) in the external cohort. Prognostic risk groups were identified, which provided a hazard ratio of 2.24 (95% CI, 1.33-3.87, P = .004) for low vs high-risk stage III CRCs and 2.36 (95% CI, 1.07-5.20, P = .03) for low vs high-risk stage II CRCs, in the external cohort after adjusting for established risk factors. The predicted median 36-month recurrence rate for high-risk stage III CRCs was 32.7% vs 13.4% for low-risk stage III and 15.8% for high-risk stage II vs 5.4% for low-risk stage II CRCs. CONCLUSIONS: QuantCRC provides a powerful adjunct to routine pathologic reporting of CRC. A prognostic model using QuantCRC improves prediction of recurrence-free survival.
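For readers unfamiliar with the Harrell's concordance index quoted above, a minimal sketch follows. It is a plain pairwise implementation under simplifying assumptions (no tied event times, risk ties counted as half), with hypothetical toy data; it is not the authors' evaluation code.

```python
def harrell_c_index(times, events, risks):
    """Fraction of comparable patient pairs in which the patient who
    recurred earlier was assigned the higher predicted risk.

    times  -- follow-up time per patient
    events -- 1 if recurrence was observed, 0 if censored
    risks  -- model risk score (higher means earlier expected recurrence)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical cohort of four patients; the third is censored at month 15.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.4, 0.6, 0.1]
print(harrell_c_index(toy_times, toy_events, toy_risks))  # 4 of 5 pairs concordant
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives context for the 0.679 to 0.744 range reported in the abstract.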


Subjects
Colorectal Neoplasms , Testicular Neoplasms , Humans , Male , Colorectal Neoplasms/genetics , DNA Mismatch Repair , Eosine Yellowish-(YS) , Hematoxylin
5.
J Digit Imaging ; 36(1): 105-113, 2023 02.
Article in English | MEDLINE | ID: mdl-36344632

ABSTRACT

Improving detection and follow-up of recommendations made in radiology reports is a critical unmet need. The long and unstructured nature of radiology reports limits the ability of clinicians to assimilate the full report and identify all the pertinent information for prioritizing the critical cases. We developed an automated NLP pipeline using a transformer-based ClinicalBERT++ model, which was fine-tuned on 3 M radiology reports and compared against the traditional BERT model. We validated the models on both internal hold-out ED cases from EUH and external cases from Mayo Clinic. We also evaluated the model by combining different sections of the radiology reports. On the internal test set of 3819 reports, the ClinicalBERT++ model achieved a 0.96 f1-score, and BERT achieved the same performance using the reason for exam and impression sections. However, ClinicalBERT++ outperformed BERT on the external test dataset of 2039 reports and achieved the highest performance for classifying critical finding reports (0.81 precision and 0.54 recall). The ClinicalBERT++ model has been successfully applied to large-scale radiology reports from 5 different sites. An automated NLP system that can analyze free-text radiology reports, along with the reason for the exam, to identify critical radiology findings and recommendations could enable automated alert notifications to clinicians about the need for clinical follow-up. The clinical significance of our proposed model is that it could be used as an additional layer of safeguard in clinical practice, reducing the chance that important findings reported in a radiology report are overlooked by clinicians, and could provide a way to retrospectively search large hospital databases to evaluate the documentation of critical findings.


Subjects
Natural Language Processing , Radiology , Humans , Retrospective Studies , Radiography , Research Report
6.
J Biomed Inform ; 126: 103969, 2022 02.
Article in English | MEDLINE | ID: mdl-34864210

ABSTRACT

With clinical trials unable to detect all potential adverse reactions to drugs and medical devices prior to their release into the market, accurate post-market surveillance is critical to ensure their safety and efficacy. Electronic health records (EHR) contain rich observational patient data, making them a valuable source to actively monitor the safety of drugs and devices. While structured EHR data and spontaneous reporting systems often underreport the complexities of patient encounters and outcomes, free-text clinical notes offer greater detail about a patient's status. Previous studies have proposed machine learning methods to detect adverse events from clinical notes, but suffer from manually extracted features, reliance on costly hand-labeled data, and lack of validation on external datasets. To address these challenges, we develop a weakly-supervised machine learning framework for adverse event detection from unstructured clinical notes and evaluate it on insulin pump failure as a test case. Our model accurately detected cases of pump failure with 0.842 PR AUC on the holdout test set and 0.815 PR AUC when validated on an external dataset. Our approach allowed us to leverage a large dataset with far less hand-labeled data and can be easily transferred to additional adverse events for scalable post-market surveillance.


Subjects
Electronic Health Records , Machine Learning , Humans
7.
Am J Emerg Med ; 51: 388-392, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34839182

ABSTRACT

BACKGROUND: The Mortality Probability Model (MPM) is used in research and quality improvement to adjust for severity of illness and can also inform triage decisions. However, a limitation for its automated use or application is that it includes the variable "intracranial mass effect" (IME), which requires human engagement with the electronic health record (EHR). We developed and tested a natural language processing (NLP) algorithm to identify IME from CT head reports. METHODS: We obtained initial CT head reports from adult patients who were admitted to the ICU from our ED between 10/2013 and 9/2016. Each CT head report was labeled yes/no IME by at least two of five independent labelers. The reports were then randomly divided 80/20 into training and test sets. All reports were preprocessed to remove linguistic and style variability, and a dictionary was created to map similar common terms. We tested three vectorization strategies to convert the report text to a numerical vector: Term Frequency-Inverse Document Frequency (TF-IDF), Word2Vec, and Universal Sentence Encoder. This vector served as the input to a classification-tree-based ensemble machine learning algorithm (XGBoost). After training, model performance was assessed in the test set using the area under the receiver operating characteristic curve (AUROC). We also divided the continuous range of scores into positive/inconclusive/negative categories for IME. RESULTS: Of the 1202 CT reports in the training set, 308 (25.6%) reports were manually labeled as "yes" for IME. Of the 355 reports in the test set, 108 (30.4%) were labeled as "yes" for IME. The TF-IDF vectorization strategy as an input for the XGBoost model had the best AUROC: 0.9625 (95% CI 0.9443-0.9807). TF-IDF score categories were defined and had the following likelihood ratios: "positive" (TF-IDF score > 0.5) LR = 24.59; "inconclusive" (TF-IDF 0.05-0.5) LR = 0.99; and "negative" (TF-IDF < 0.05) LR = 0.05.
82% of reports were classified as either "positive" or "negative". In the test set, only 4 of 199 (2.0%) reports with a "negative" classification were false negatives and only 8 of 93 (8.6%) reports classified as "positive" were false positives. CONCLUSION: NLP can accurately identify IME from free-text reports of head CTs in approximately 80% of records, adequate to allow automatic calculation of MPM based on EHR data for many applications.
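The category likelihood ratios in this abstract can be approximately reproduced from the counts it reports. The sketch below is a worked check, not the authors' code: the per-category counts are derived here by subtraction from the published totals (355 test reports, 108 with IME, 93 "positive" with 8 false positives, 199 "negative" with 4 false negatives), and the function name is hypothetical.

```python
def likelihood_ratio(cat_with, total_with, cat_without, total_without):
    """LR for a score category: P(category | IME) / P(category | no IME)."""
    return (cat_with / total_with) / (cat_without / total_without)


# Totals reported in the abstract (test set).
TOTAL, WITH_IME = 355, 108
WITHOUT_IME = TOTAL - WITH_IME            # 247 reports without IME
POS_TOTAL, POS_FALSE = 93, 8              # "positive" category, false positives
NEG_TOTAL, NEG_FALSE = 199, 4             # "negative" category, false negatives

# Counts derived by subtraction (an assumption of this sketch).
pos_with, pos_without = POS_TOTAL - POS_FALSE, POS_FALSE       # 85, 8
neg_with, neg_without = NEG_FALSE, NEG_TOTAL - NEG_FALSE       # 4, 195
inc_with = WITH_IME - pos_with - neg_with                      # 19
inc_without = WITHOUT_IME - pos_without - neg_without          # 44

lr_positive = likelihood_ratio(pos_with, WITH_IME, pos_without, WITHOUT_IME)
lr_inconclusive = likelihood_ratio(inc_with, WITH_IME, inc_without, WITHOUT_IME)
lr_negative = likelihood_ratio(neg_with, WITH_IME, neg_without, WITHOUT_IME)
share_classified = (POS_TOTAL + NEG_TOTAL) / TOTAL

print(round(lr_positive, 2), round(lr_inconclusive, 2), round(lr_negative, 2))
print(round(share_classified, 3))
```

With these derived counts the inconclusive and negative ratios round to the reported 0.99 and 0.05, and the classified share to 82%. The positive ratio comes out near 24.3 rather than the reported 24.59, so the authors' underlying counts may differ slightly from this reconstruction.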


Subjects
Brain Neoplasms/diagnostic imaging , Electronic Health Records , Natural Language Processing , Tomography, X-Ray Computed , Area Under Curve , Humans , Logistic Models , Machine Learning , ROC Curve
8.
J Digit Imaging ; 35(2): 137-152, 2022 04.
Article in English | MEDLINE | ID: mdl-35022924

ABSTRACT

In recent years, generative adversarial networks (GANs) have gained tremendous popularity for various imaging-related tasks such as artificial image generation to support AI training. GANs are especially useful for medical imaging-related tasks where training datasets are usually limited in size and heavily imbalanced against the diseased class. We present a systematic review, following the PRISMA guidelines, of recent GAN architectures used for medical image analysis to help readers make an informed decision before employing GANs in developing medical image classification and segmentation models. We extracted 54 papers, published from January 2015 to August 2020, that highlight the capabilities and application of GANs in medical imaging and met the inclusion criteria for meta-analysis. Our results show four main architectures of GAN that are used for segmentation or classification in medical imaging. We provide a comprehensive overview of recent trends in the application of GANs in clinical diagnosis through medical image segmentation and classification and ultimately share experiences for task-based GAN implementations.


Subjects
Image Processing, Computer-Assisted , Neural Networks, Computer , Humans , Image Processing, Computer-Assisted/methods
9.
J Digit Imaging ; 35(3): 524-533, 2022 06.
Article in English | MEDLINE | ID: mdl-35149938

ABSTRACT

Scoliosis is a condition of abnormal lateral spinal curvature affecting an estimated 2 to 3% of the US population, or seven million people. The Cobb angle is the standard measurement of spinal curvature in scoliosis but is known to have high interobserver and intraobserver variability. Thus, the objective of this study was to build and validate a system for automatic quantitative evaluation of the Cobb angle and to compare AI-generated and human reports in the clinical setting. After IRB approval was obtained, we retrospectively collected 2150 frontal view scoliosis radiographs at a tertiary referral center (January 1, 2019, to January 1, 2021, ≥ 16 years old, no hardware). The dataset was partitioned into 1505 train (70%), 215 validation (10%), and 430 test images (20%). All thoracic and lumbar vertebral bodies were segmented with bounding boxes, generating approximately 36,550 object annotations that were used to train a Faster R-CNN Resnet-101 object detection model. A controller algorithm was written to localize vertebral centroid coordinates and derive the Cobb properties (angle and endplate) of dominant and secondary curves. AI-derived Cobb angle measurements were compared to the clinical report measurements, and the Spearman rank-order correlation was significant (0.89, p < 0.001). The mean difference between AI and clinical report angle measurements was 7.34° (95% CI: 5.90-8.78°), which is similar to published literature (up to 10°). We demonstrate the feasibility of an AI system to automate measurement of level-by-level spinal angulation with performance comparable to radiologists.
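As a rough illustration of how an angle can be derived from vertebral centroid coordinates, the sketch below approximates a Cobb-like angle as the spread between the most oppositely tilted inter-vertebral segments. This is a simplified stand-in chosen for illustration: the abstract does not describe the authors' controller algorithm in this detail, and the function name and coordinates are hypothetical.

```python
import math


def cobb_angle_from_centroids(centroids):
    """Approximate a Cobb-like angle in degrees.

    centroids -- list of (x, y) vertebral centroids ordered from top to
                 bottom, in image coordinates with y increasing downward.
    The tilt of each segment joining neighboring centroids is measured
    against the vertical axis; the angle returned is the difference
    between the largest and smallest tilt along the spine.
    """
    if len(centroids) < 3:
        raise ValueError("need at least three centroids")
    tilts = []
    for (x0, y0), (x1, y1) in zip(centroids, centroids[1:]):
        tilts.append(math.degrees(math.atan2(x1 - x0, y1 - y0)))
    return max(tilts) - min(tilts)


# Hypothetical coordinates: a straight column and a single C-shaped curve.
straight = [(0, 0), (0, 10), (0, 20), (0, 30)]
curved = [(0, 0), (10, 10), (10, 20), (0, 30)]
print(cobb_angle_from_centroids(straight))
print(cobb_angle_from_centroids(curved))
```

A clinical measurement uses the endplates of the end vertebrae of each curve, so a centroid-only approximation like this one would be expected to differ from a radiologist's reading.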


Subjects
Scoliosis , Adolescent , Artificial Intelligence , Humans , Lumbar Vertebrae/diagnostic imaging , Machine Learning , Reproducibility of Results , Retrospective Studies , Scoliosis/diagnostic imaging
10.
J Biomed Inform ; 123: 103918, 2021 11.
Article in English | MEDLINE | ID: mdl-34560275

ABSTRACT

OBJECTIVE: With increasingly complex patients whose data are stored in fragmented health information systems, automated and time-efficient ways of gathering important information from the patients' medical history are needed for effective clinical decision making. Using COVID-19 as a case study, we developed a query-bot information retrieval system with user feedback to allow clinicians to ask natural questions to retrieve data from patient notes. MATERIALS AND METHODS: We applied clinicalBERT, a pre-trained contextual language model, to our dataset of patient notes to obtain sentence embeddings, using K-Means to reduce computation time for real-time interaction. The Rocchio algorithm was then employed to incorporate user feedback and improve retrieval performance. RESULTS: In an iterative feedback loop experiment, MAP for the final iteration was 0.93/0.94 as compared to an initial MAP of 0.66/0.52 for generic queries, and 1.0/1.0 compared to 0.79/0.83 for COVID-19 specific queries, confirming that the contextual model handles the ambiguity in natural language queries and that feedback helps to improve retrieval performance. The user-in-loop experiment also outperformed the automated pseudo relevance feedback method. Moreover, the null hypothesis, which assumes identical precision between initial retrieval and relevance feedback, was rejected with high statistical significance (p ≪ 0.05). Compared to Word2Vec, TF-IDF and bioBERT models, clinicalBERT works optimally considering the balance between response precision and user feedback. DISCUSSION: Our model works well for generic as well as COVID-19 specific queries. However, some generic queries are not answered as well as others because clustering reduces query performance and vague relations between queries and sentences are considered non-relevant. We also tested our model for queries with the same meaning but different expressions and demonstrated that these query variations yielded similar performance after incorporation of user feedback.
CONCLUSION: In conclusion, we develop an NLP-based query-bot that handles synonyms and natural language ambiguity in order to retrieve relevant information from the patient chart. User-feedback is critical to improve model performance.
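The Rocchio relevance-feedback step named in this abstract can be sketched as follows. This is the textbook update rule on plain Python lists; the weights alpha, beta and gamma are conventional textbook defaults and are assumptions here, since the abstract does not report the values used, and the two-dimensional vectors stand in for real sentence embeddings.

```python
def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move a query vector toward the centroid of results the user marked
    relevant and away from the centroid of those marked non-relevant."""
    dim = len(query)

    def centroid(vectors):
        if not vectors:
            return [0.0] * dim  # no feedback of this kind: contributes nothing
        return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]

    rel_c = centroid(relevant)
    non_c = centroid(non_relevant)
    return [alpha * q + beta * r - gamma * n
            for q, r, n in zip(query, rel_c, non_c)]


# Hypothetical 2-D embeddings: the user marks two sentences as relevant
# and one as non-relevant after the first retrieval round.
initial_query = [1.0, 0.0]
updated_query = rocchio(initial_query,
                        relevant=[[0.0, 1.0], [0.0, 1.0]],
                        non_relevant=[[1.0, 0.0]])
print(updated_query)
```

Repeating this update after each round of user marks gives the kind of iterative feedback loop the abstract evaluates with MAP.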


Subjects
COVID-19 , Algorithms , Feedback , Humans , Information Storage and Retrieval , SARS-CoV-2
11.
Clin Diabetes ; 39(3): 284-292, 2021 Jul.
Article in English | MEDLINE | ID: mdl-34421204

ABSTRACT

This retrospective cohort study evaluated diabetes device utilization and the effectiveness of these devices for newly diagnosed type 1 diabetes. Investigators examined the use of continuous glucose monitoring (CGM) systems, self-monitoring of blood glucose (SMBG), continuous subcutaneous insulin infusion (CSII), and multiple daily injection (MDI) insulin regimens and their effects on A1C. The researchers identified 6,250 patients with type 1 diabetes, of whom 32% used CGM and 37.1% used CSII. A higher adoption rate of either CGM or CSII in newly diagnosed type 1 diabetes was noted among White patients and those with private health insurance. CGM users had lower A1C levels than nonusers (P = 0.039), whereas no difference was noted between CSII users and nonusers (P = 0.057). Furthermore, CGM use combined with CSII yielded lower A1C than MDI regimens plus SMBG (P <0.001).

12.
J Digit Imaging ; 34(4): 1005-1013, 2021 08.
Article in English | MEDLINE | ID: mdl-34405297

ABSTRACT

Real-time execution of machine learning (ML) pipelines on radiology images is difficult due to limited computing resources in clinical environments, whereas running them in research clusters requires efficient data transfer capabilities. We developed Niffler, an open-source Digital Imaging and Communications in Medicine (DICOM) framework that enables ML and processing pipelines in research clusters by efficiently retrieving images from the hospitals' PACS and extracting the metadata from the images. We deployed Niffler at our institution (Emory Healthcare, the largest healthcare network in the state of Georgia) and retrieved data from 715 scanners spanning 12 sites, up to 350 GB/day continuously in real-time as a DICOM data stream over the past 2 years. We also used Niffler to retrieve images in bulk on demand, based on user-provided filters, to facilitate several research projects. This paper presents the architecture and three such use cases of Niffler. First, we executed an IVC filter detection and segmentation pipeline on abdominal radiographs in real-time, which was able to classify 989 test images with an accuracy of 96.0%. Second, we applied the Niffler Metadata Extractor to understand the operational efficiency of individual MRI systems based on calculated metrics. We benchmarked the accuracy of the calculated exam time windows by comparing Niffler against the Clinical Data Warehouse (CDW). Niffler accurately identified the scanners' examination timeframes and idling times, whereas CDW falsely depicted several exam overlaps due to human errors. Third, with metadata extracted from the images by Niffler, we identified scanners with misconfigured time and reconfigured five scanners. Our evaluations highlight how Niffler enables real-time ML and processing pipelines in a research cluster.


Subjects
Radiology Information Systems , Radiology , Data Warehousing , Humans , Machine Learning , Radiography
13.
Entropy (Basel) ; 23(3)2021 Mar 06.
Article in English | MEDLINE | ID: mdl-33800820

ABSTRACT

Datasets displaying temporal dependencies abound in science and engineering applications, with Markov models representing a simplified and popular view of the temporal dependence structure. In this paper, we consider Bayesian settings that place prior distributions over the parameters of the transition kernel of a Markov model, and seek to characterize the resulting, typically intractable, posterior distributions. We present a Probably Approximately Correct (PAC)-Bayesian analysis of variational Bayes (VB) approximations to tempered Bayesian posterior distributions, bounding the model risk of the VB approximations. Tempered posteriors are known to be robust to model misspecification, and their variational approximations do not suffer the usual problems of over confident approximations. Our results tie the risk bounds to the mixing and ergodic properties of the Markov data generating model. We illustrate the PAC-Bayes bounds through a number of example Markov models, and also consider the situation where the Markov model is misspecified.
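For orientation, the two objects named in this abstract can be written in standard form. The notation below (data X_1, ..., X_n from a Markov chain with transition kernel p_theta, prior pi, temper parameter alpha, variational family Q) is assumed for illustration and is not taken from the paper itself.

```latex
% Tempered posterior: the likelihood is raised to a power alpha, typically 0 < alpha < 1.
\pi_{n,\alpha}(\theta \mid X_{1:n}) \;\propto\;
  \Bigl[\, p_\theta(X_1) \prod_{i=2}^{n} p_\theta(X_i \mid X_{i-1}) \Bigr]^{\alpha} \, \pi(\theta)

% Variational Bayes approximation: the closest member of a tractable family Q in KL divergence.
\tilde{\pi}_{n,\alpha} \;=\;
  \operatorname*{arg\,min}_{q \in \mathcal{Q}} \;
  \mathrm{KL}\bigl( q \,\|\, \pi_{n,\alpha}(\cdot \mid X_{1:n}) \bigr)
```

Setting alpha to 1 recovers the ordinary posterior; the risk bounds discussed in the abstract concern the approximation defined by the second line.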

14.
J Digit Imaging ; 33(6): 1393-1400, 2020 12.
Article in English | MEDLINE | ID: mdl-32495125

ABSTRACT

The aim of this study is to develop an automated classification method for Brain Tumor Reporting and Data System (BT-RADS) categories from unstructured and structured brain magnetic resonance imaging (MR) reports. This retrospective study included 1410 BT-RADS structured reports dated from January 2014 to December 2017 and a test set of 109 unstructured brain MR reports dated from January 2010 to December 2014. Text vector representations and semantic word embeddings were generated from individual report sections (i.e., "History," "Findings," etc.) using Tf-idf statistics and a fine-tuned word2vec model, respectively. Section-wise ensemble models were trained using gradient boosting (XGBoost), elastic net regularization, and random forests, and classification accuracy was evaluated on an independent test set of unstructured brain MR reports and a validation set of BT-RADS structured reports. Section-wise ensemble models using XGBoost and word2vec semantic word embeddings were more accurate than those using Tf-idf statistics when classifying unstructured reports, with an f1 score of 0.72. In contrast, models using traditional Tf-idf statistics outperformed the word2vec semantic approach for categorization from structured reports, with an f1 score of 0.98. The proposed natural language processing pipeline is capable of inferring BT-RADS report scores from unstructured reports after training on structured report data. Our study provides a detailed experimentation process and may provide guidance for the development of RADS-focused information extraction (IE) applications from structured and unstructured radiology reports.


Subjects
Natural Language Processing , Semantics , Brain/diagnostic imaging , Humans , Magnetic Resonance Spectroscopy , Retrospective Studies
16.
J Biomed Inform ; 92: 103137, 2019 04.
Article in English | MEDLINE | ID: mdl-30807833

ABSTRACT

We propose an efficient natural language processing approach for inferring the BI-RADS final assessment categories by analyzing only the mammogram findings reported by the mammographer in narrative form. The proposed hybrid method integrates semantic term embedding with distributional semantics, producing a context-aware vector representation of unstructured mammography reports. A large corpus of unannotated mammography reports (300,000) was used to learn the context of the key terms using a distributional semantics approach, and the trained model was applied to generate context-aware vector representations of the reports annotated with BI-RADS category (22,091). The vectorized reports were used to train a supervised classifier to derive the BI-RADS assessment class. Even though the majority of the proposed embedding pipeline is unsupervised, the classifier was able to recognize substantial semantic information for deriving the BI-RADS categorization not only on a holdout internal test set but also on an external validation set (1900 reports). Our proposed method outperforms a recently published domain-specific rule-based system and could be relevant for evaluating concordance between radiologists. With minimal requirement for task-specific customization, the proposed method can be easily transferred to a different domain to support large-scale text mining or derivation of patient phenotype.


Subjects
Breast/diagnostic imaging , Data Mining/methods , Deep Learning , Mammography , Natural Language Processing , Female , Humans , Radiographic Image Interpretation, Computer-Assisted , Semantics
17.
J Digit Imaging ; 32(4): 544-553, 2019 08.
Article in English | MEDLINE | ID: mdl-31222557

ABSTRACT

Radiological measurements are reported in free text reports, and it is challenging to extract such measures for treatment planning such as lesion summarization and cancer response assessment. The purpose of this work is to develop and evaluate a natural language processing (NLP) pipeline that can extract measurements and their core descriptors, such as temporality, anatomical entity, imaging observation, RadLex descriptors, series number, image number, and segment from a wide variety of radiology reports (MR, CT, and mammogram). We created a hybrid NLP pipeline that integrates rule-based feature extraction modules and a conditional random field (CRF) model for extraction of the measurements from the radiology reports and links them with clinically relevant features such as anatomical entities or imaging observations. The pipeline was trained on 1117 CT/MR reports, and performance of the system was evaluated on an independent set of 100 expert-annotated CT/MR reports and also tested on 25 mammography reports. The system detected 813 out of 806 measurements in the CT/MR reports; 784 were true positives, 29 were false positives, and 0 were false negatives. Similarly, from the mammography reports, 96% of the measurements with their modifiers were extracted correctly. Our approach could enable the development of computerized applications that can utilize summarized lesion measurements from radiology reports of varying modalities and improve practice by tracking the same lesions along multiple radiologic encounters.
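To give a sense of what a rule-based measurement extractor can look like, here is a minimal regular-expression sketch. It is a hypothetical illustration only: the pattern, function name and sample sentence are assumptions, and it covers neither the CRF model nor the descriptor linking that the abstract describes.

```python
import re

# One to three numbers separated by "x", followed by a unit (mm or cm).
MEASUREMENT = re.compile(
    r"(\d+(?:\.\d+)?(?:\s*x\s*\d+(?:\.\d+)?){0,2})\s*(mm|cm)\b",
    re.IGNORECASE,
)


def extract_measurements(text):
    """Return a list of (dimensions, unit) pairs found in a report sentence."""
    results = []
    for match in MEASUREMENT.finditer(text):
        parts = re.split(r"\s*x\s*", match.group(1), flags=re.IGNORECASE)
        dims = [float(part) for part in parts]
        results.append((dims, match.group(2).lower()))
    return results


sample = "Nodule measures 2.3 x 1.5 cm, previously 18 mm."
print(extract_measurements(sample))
```

A pattern like this finds the numbers but says nothing about which lesion they belong to or whether they are current or prior, which is the gap the descriptor-linking stage of the described pipeline is meant to fill.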


Subjects
Electronic Health Records , Image Interpretation, Computer-Assisted/methods , Natural Language Processing , Radiology Information Systems , Algorithms , Humans , Magnetic Resonance Imaging/methods , Mammography/methods , Tomography, X-Ray Computed/methods
18.
J Biomed Inform ; 78: 78-86, 2018 Feb.
Article in English | MEDLINE | ID: mdl-29329701

ABSTRACT

To date, the methods developed for automated extraction of information from radiology reports are mainly rule-based or dictionary-based and therefore require substantial manual effort to build. Recent efforts to develop automated systems for entity detection have been undertaken, but little work has been done to automatically extract relations and their associated named entities in narrative radiology reports with accuracy comparable to rule-based methods. Our goal is to extract relations in an unsupervised way from radiology reports without specifying prior domain knowledge. We propose a hybrid approach for information extraction (IE) that combines a dependency-based parse tree with distributed semantics for generating structured information frames about particular findings/abnormalities from free-text mammography reports. The proposed IE system obtains an F1-score of 0.94 in terms of completeness of the content in the information frames, which outperforms a state-of-the-art rule-based system in this domain by a significant margin. The proposed system can be leveraged in a variety of applications, such as decision support and information retrieval, and may also easily scale to other radiology domains, since there is no need to tune the system with hand-crafted information extraction rules.


Subjects
Information Storage and Retrieval/methods , Mammography/methods , Radiology Information Systems , Semantics , Algorithms , Data Curation , Humans , Natural Language Processing
19.
J Biomed Inform ; 77: 11-20, 2018 Jan.
Article in English | MEDLINE | ID: mdl-29175548

ABSTRACT

We proposed an unsupervised hybrid method, Intelligent Word Embedding (IWE), that combines a neural embedding method with a semantic dictionary mapping technique for creating a dense vector representation of unstructured radiology reports. We applied IWE to generate embeddings of chest CT radiology reports from two healthcare organizations and utilized the vector representations to semi-automate report categorization based on clinically relevant categories related to the diagnosis of pulmonary embolism (PE). We benchmarked the performance against a state-of-the-art rule-based tool, PeFinder, and out-of-the-box word2vec. On the Stanford test set, the IWE model achieved an average F1 score of 0.97, whereas PeFinder scored 0.9 and the original word2vec scored 0.94. On the UPMC dataset, the IWE model's average F1 score was 0.94, whereas PeFinder scored 0.92 and word2vec scored 0.85. The IWE model had the lowest generalization error with the highest F1 scores. Of particular interest, the IWE model (trained on the Stanford dataset) outperformed PeFinder on the UPMC dataset, which was used originally to tailor the PeFinder model.
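The general idea of pairing a dictionary mapping step with word embeddings can be sketched in Python as below. This is only a toy illustration under loud assumptions: the dictionary `SEMANTIC_MAP`, the two-dimensional `TOY_VECTORS`, the function names, and the choice to average term vectors into a report vector are all hypothetical and are not taken from the IWE paper.

```python
from typing import Dict, List

# Hypothetical dictionary: surface forms mapped to canonical concept tokens.
SEMANTIC_MAP: Dict[str, str] = {
    "pe": "pulmonary_embolism",
    "emboli": "pulmonary_embolism",
    "embolus": "pulmonary_embolism",
    "no": "NEG",
    "without": "NEG",
}

# Hypothetical pre-trained embeddings (a real model would have many dimensions).
TOY_VECTORS: Dict[str, List[float]] = {
    "NEG": [1.0, 0.0],
    "pulmonary_embolism": [0.0, 1.0],
}


def normalize(report: str) -> List[str]:
    """Lowercase, tokenize on whitespace, and map terms to canonical concepts."""
    tokens = report.lower().replace(".", " ").split()
    return [SEMANTIC_MAP.get(tok, tok) for tok in tokens]


def embed_report(report: str, vectors: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of known tokens (assumed pooling strategy)."""
    dim = len(next(iter(vectors.values())))
    known = [vectors[tok] for tok in normalize(report) if tok in vectors]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Two differently worded reports collapse to the same vector after mapping.
print(embed_report("No PE.", TOY_VECTORS))
print(embed_report("Without embolus", TOY_VECTORS))
```

The point of the mapping step in this sketch is that lexical variants of one concept share a single token before embedding, so they cannot drift apart in vector space.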


Subjects
Machine Learning , Radiographic Image Interpretation, Computer-Assisted , Radiography, Thoracic/methods , Humans , Natural Language Processing , Neural Networks, Computer , Predictive Value of Tests , Pulmonary Embolism , Radiography, Thoracic/trends , Semantics , Tomography, X-Ray Computed
20.
J Biomed Inform ; 84: 123-135, 2018 Aug.
Article in English | MEDLINE | ID: mdl-29981490

ABSTRACT

BACKGROUND: The majority of current medical content-based image retrieval (CBIR) systems perform retrieval based only on "imaging signatures" generated by extracting pixel-level quantitative features, and only rarely has a feedback mechanism been incorporated to improve retrieval performance. In addition, current medical CBIR approaches do not routinely incorporate semantic terms that model the user's high-level expectations, and this can limit CBIR performance. METHOD: We propose a retrieval framework that exploits a hybrid feature space (HFS), built by integrating low-level image features and high-level semantic terms through rounds of relevance feedback (RF), and performs similarity-based retrieval to support semi-automatic image interpretation. The novelty of the proposed system is that it can impute the semantic features of the query image by reformulating the query vector representation in the HFS via user feedback. We implemented our framework as a prototype that performs retrieval over a database of 811 radiographic images that contains 69 unique types of bone tumors. RESULTS: We evaluated the system performance by conducting independent reading sessions with two subspecialist musculoskeletal radiologists. For the test set, the proposed retrieval system at the fourth RF iteration of the sessions conducted with both radiologists achieved a mean average precision (MAP) value of ∼0.90, whereas the initial MAP with baseline CBIR was 0.20. We also achieved high prediction accuracy (>0.8) for the majority of the semantic features automatically predicted by the system. CONCLUSION: Our proposed framework addresses some limitations of existing CBIR systems by incorporating user feedback and simultaneously predicting the semantic features of the query image. This obviates the need for the user to provide those terms and makes CBIR search more efficient for inexperienced users/trainees. Encouraging results achieved in the current study highlight possible new directions in radiological image interpretation employing semantic CBIR combined with relevance feedback of visual similarity.
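For readers unfamiliar with the mean average precision (MAP) metric quoted above, a minimal Python sketch follows. It assumes each query yields a ranked list of relevant/not-relevant judgments and normalizes by the number of relevant items actually retrieved; the function names are hypothetical, and the study's exact MAP protocol may differ.

```python
from typing import Sequence


def average_precision(relevance: Sequence[bool]) -> float:
    """Average of precision@k taken at each rank k where a relevant item appears."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    # Assumption: normalize by relevant items retrieved, not by all relevant items.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(runs: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-query average precision values."""
    if not runs:
        return 0.0
    return sum(average_precision(run) for run in runs) / len(runs)


# Two hypothetical ranked result lists (True marks a relevant retrieved image).
example_runs = [[True, False, True], [False, True]]
print(mean_average_precision(example_runs))
```

Because precision is sampled only at ranks where a relevant item appears, moving relevant images toward the top of the list raises the score, which is the effect relevance feedback aims for.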


Subjects
Bone Neoplasms/diagnostic imaging , Image Processing, Computer-Assisted/methods , Pattern Recognition, Automated/methods , Semantics , Adolescent , Adult , Aged , Aged, 80 and over , Algorithms , Child , Child, Preschool , Female , Humans , Infant , Infant, Newborn , Information Storage and Retrieval , Machine Learning , Male , Middle Aged , Models, Statistical , Normal Distribution , Radiology/methods , Reproducibility of Results , Software , Young Adult