Results 1 - 20 of 73
1.
J Med Internet Res ; 26: e53367, 2024 Apr 04.
Article in English | MEDLINE | ID: mdl-38573752

ABSTRACT

BACKGROUND: Real-time surveillance of emerging infectious diseases necessitates a dynamically evolving, computable case definition, which frequently incorporates symptom-related criteria. For symptom detection, both population health monitoring platforms and research initiatives primarily depend on structured data extracted from electronic health records. OBJECTIVE: This study sought to validate and test an artificial intelligence (AI)-based natural language processing (NLP) pipeline for detecting COVID-19 symptoms from physician notes in pediatric patients. We specifically study patients presenting to the emergency department (ED) who can be sentinel cases in an outbreak. METHODS: Subjects in this retrospective cohort study are patients who are 21 years of age and younger, who presented to a pediatric ED at a large academic children's hospital between March 1, 2020, and May 31, 2022. The ED notes for all patients were processed with an NLP pipeline tuned to detect the mention of 11 COVID-19 symptoms based on Centers for Disease Control and Prevention (CDC) criteria. For a gold standard, 3 subject matter experts labeled 226 ED notes and had strong agreement (F1-score=0.986; positive predictive value [PPV]=0.972; and sensitivity=1.0). F1-score, PPV, and sensitivity were used to compare the performance of both NLP and the International Classification of Diseases, 10th Revision (ICD-10) coding to the gold standard chart review. As a formative use case, variations in symptom patterns were measured across SARS-CoV-2 variant eras. RESULTS: There were 85,678 ED encounters during the study period, including 4% (n=3420) with patients with COVID-19. NLP was more accurate at identifying encounters with patients who had any of the COVID-19 symptoms (F1-score=0.796) than ICD-10 codes (F1-score=0.451). NLP accuracy was higher for positive symptoms (sensitivity=0.930) than ICD-10 (sensitivity=0.300). However, ICD-10 accuracy was higher for negative symptoms (specificity=0.994) than NLP (specificity=0.917). Congestion or runny nose showed the highest accuracy difference (NLP: F1-score=0.828 and ICD-10: F1-score=0.042). For encounters with patients with COVID-19, prevalence estimates of each NLP symptom differed across variant eras. Patients with COVID-19 were more likely to have each NLP symptom detected than patients without this disease. Effect sizes (odds ratios) varied across pandemic eras. CONCLUSIONS: This study establishes the value of AI-based NLP as a highly effective tool for real-time COVID-19 symptom detection in pediatric patients, outperforming traditional ICD-10 methods. It also reveals the evolving nature of symptom prevalence across different virus variants, underscoring the need for dynamic, technology-driven approaches in infectious disease surveillance.


Subjects
Biosurveillance , COVID-19 , Physicians , SARS-CoV-2 , United States , Humans , Child , Artificial Intelligence , Retrospective Studies , COVID-19/diagnosis , COVID-19/epidemiology
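The comparison of NLP and ICD-10 against chart review rests on standard confusion-matrix metrics. The minimal Python sketch below shows how PPV, sensitivity, specificity, and F1-score follow from per-method counts; the `Confusion` class and every count in it are invented for illustration and are not the study's data or code.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from comparing one method (e.g., NLP or ICD-10) to chart review."""
    tp: int  # symptom present in the gold standard and detected
    fp: int  # detected but absent in the gold standard
    fn: int  # present in the gold standard but missed
    tn: int  # absent in the gold standard and not detected

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def f1(self) -> float:
        # Harmonic mean of PPV (precision) and sensitivity (recall).
        p, r = self.ppv(), self.sensitivity()
        return 2 * p * r / (p + r)


# Invented counts for illustration only; not taken from the study.
nlp = Confusion(tp=93, fp=40, fn=7, tn=450)
icd = Confusion(tp=30, fp=3, fn=70, tn=487)

for name, c in [("NLP", nlp), ("ICD-10", icd)]:
    print(f"{name}: PPV={c.ppv():.3f} sens={c.sensitivity():.3f} "
          f"spec={c.specificity():.3f} F1={c.f1():.3f}")
```

With counts like these, a method can win on sensitivity and F1 while losing on specificity, which is the pattern the abstract describes for NLP versus ICD-10.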
2.
J Biomed Inform ; 95: 103219, 2019 07.
Article in English | MEDLINE | ID: mdl-31150777

ABSTRACT

Clinical narratives are a valuable source of information for both patient care and biomedical research. Given the unstructured nature of medical reports, specific automatic techniques are required to extract relevant entities from such texts. In the natural language processing (NLP) community, this task is often addressed by using supervised methods. To develop such methods, both reliably annotated corpora and elaborately designed features are needed. Despite the recent advances in corpus collection and annotation, research on multiple domains and languages is still limited. In addition, to compute the features required for supervised classification, suitable language- and domain-specific tools are needed. In this work, we propose a novel application of recurrent neural networks (RNNs) for event extraction from medical reports written in Italian. To train and evaluate the proposed approach, we annotated a corpus of 75 cardiology reports for a total of 4365 mentions of relevant events and their attributes (e.g., the polarity). For the annotation task, we developed specific annotation guidelines, which are provided together with this paper. The RNN-based classifier was trained on a training set including 3335 events (60 documents). The resulting model was integrated into an NLP pipeline that uses a dictionary lookup approach to search for relevant concepts inside the text. A test set of 1030 events (15 documents) was used to evaluate and compare different pipeline configurations. As a main result, using the RNN-based classifier instead of the dictionary lookup approach increased recall from 52.4% to 88.9% and precision from 81.1% to 88.2%. Further, using the two methods in combination, we obtained final recall, precision, and F1 score of 91.7%, 88.6%, and 90.1%, respectively. These experiments indicate that integrating a well-performing RNN-based classifier with a standard knowledge-based approach can be a good strategy to extract information from clinical text in non-English languages.


Subjects
Data Mining/methods , Electronic Health Records , Natural Language Processing , Heart Diseases , Humans , Italy , Neural Networks, Computer , Semantics
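As a consistency check, F1 is the harmonic mean of precision and recall. The short Python sketch below, assuming only that standard definition, recomputes the combined configuration's F1 from the reported precision and recall. It also derives F1 values for the two single-method configurations, which the abstract does not report directly.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Reported (precision, recall) for each pipeline configuration on the test set.
configs = {
    "dictionary lookup": (0.811, 0.524),
    "RNN classifier": (0.882, 0.889),
    "combined": (0.886, 0.917),
}

for name, (p, r) in configs.items():
    print(f"{name}: F1 = {f1_score(p, r):.3f}")

combined = f1_score(*configs["combined"])  # about 0.901, matching the reported 90.1%
```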
3.
Nano Lett ; 18(6): 3449-3453, 2018 06 13.
Article in English | MEDLINE | ID: mdl-29767985

ABSTRACT

We use resonant soft X-ray holography to image the insulator-metal phase transition in vanadium dioxide with element and polarization specificity and nanometer spatial resolution. We observe that nanoscale inhomogeneity in the film results in spatially dependent transition pathways between the insulating and metallic states. Additional nanoscale phases, which are not apparent in the initial or final states of the system, form in the vicinity of defects; these phases would be missed in area-integrated X-ray absorption measurements. These intermediate phases are vital to understanding the phase transition in VO2, and our results demonstrate how resonant imaging can be used to understand the electronic properties, as probed by X-ray absorption, of phase-separated correlated materials.

6.
Nat Mater ; 14(10): 991-5, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26213898

ABSTRACT

The extreme electro-optical contrast between crystalline and amorphous states in phase-change materials is routinely exploited in optical data storage and future applications include universal memories, flexible displays, reconfigurable optical circuits, and logic devices. Optical contrast is believed to arise owing to a change in crystallinity. Here we show that the connection between optical properties and structure can be broken. Using a combination of single-shot femtosecond electron diffraction and optical spectroscopy, we simultaneously follow the lattice dynamics and dielectric function in the phase-change material Ge2Sb2Te5 during an irreversible state transformation. The dielectric function changes by 30% within 100 fs owing to a rapid depletion of electrons from resonantly bonded states. This occurs without perturbing the crystallinity of the lattice, which heats with a 2-ps time constant. The optical changes are an order of magnitude larger than those achievable with silicon and present new routes to manipulate light on an ultrafast timescale without structural changes.
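To put the two timescales side by side: if one assumes, purely for illustration, that the lattice temperature rise follows a single exponential with the reported 2-ps time constant, then only about 5% of the final lattice heating has occurred by the time the dielectric function has changed (100 fs). The single-exponential form is an assumption of this Python sketch, not a model stated in the abstract.

```python
import math

TAU_LATTICE_PS = 2.0   # lattice heating time constant reported in the abstract
T_DIELECTRIC_PS = 0.1  # 100 fs, time within which the dielectric function changes by 30%


def heating_fraction(t_ps: float, tau_ps: float = TAU_LATTICE_PS) -> float:
    """Fraction of the final lattice temperature rise reached at time t_ps,
    under the illustrative assumption of single-exponential heating."""
    return 1.0 - math.exp(-t_ps / tau_ps)


print(f"at 100 fs: {heating_fraction(T_DIELECTRIC_PS):.3f}")  # about 0.049
print(f"at 2 ps:   {heating_fraction(2.0):.3f}")              # about 0.632
```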

7.
J Craniofac Surg ; 26(6): 1992-6, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26147021

ABSTRACT

INTRODUCTION: Osseous defects of the craniofacial skeleton occur frequently in congenital, posttraumatic, and postoncologic deformities. The field of scaffold-based bone engineering emerged to address the limitations of using autologous bone for reconstruction in such circumstances. In this work, the authors evaluate 2 modifications of three-dimensional collagen-glycosaminoglycan scaffolds in an effort to optimize structural integrity and osteogenic induction. METHODS: Human mesenchymal stem cells (hMSCs) were cultured in osteogenic media on nonmineralized collagen-glycosaminoglycan (C-GAG) and nanoparticulate mineralized collagen-glycosaminoglycan (MC-GAG) type I scaffolds, in the absence and presence of cross-linking. At 1, 7, and 14 days, mRNA expression of osteocalcin (OCN) and bone sialoprotein (BSP) was analyzed using quantitative real-time reverse-transcriptase polymerase chain reaction. Structural contraction was measured by the ability of the scaffolds to maintain their original dimensions. Mineralization was detected by microcomputed tomographic (micro-CT) imaging at 8 weeks. Statistical analyses were performed with Student's t-test. RESULTS: Nanoparticulate mineralization of collagen-glycosaminoglycan scaffolds increased expression of both OCN and BSP. Cross-linking of both C-GAG and MC-GAG resulted in decreased osteogenic gene expression; however, structural contraction was significantly decreased after cross-linking. Mineralization directed by hMSCs, detected by micro-CT, was increased in nanoparticulate mineralized scaffolds, although the density of mineralization was decreased in the presence of cross-linking. CONCLUSIONS: Optimization of scaffold material is an essential component of moving toward clinically translatable engineered bone. Our current study demonstrates that the combination of nanoparticulate mineralization and chemical cross-linking of C-GAG scaffolds generates a highly osteogenic and structurally stable scaffold.


Subjects
Bone Regeneration/physiology , Chondroitin Sulfates/chemistry , Collagen Type I/chemistry , Minerals/chemistry , Osteogenesis/physiology , Tissue Engineering/methods , Tissue Scaffolds/chemistry , Calcification, Physiologic/physiology , Calcium Compounds/chemistry , Calcium Hydroxide/chemistry , Calcium Phosphates/chemistry , Cell Culture Techniques , Cells, Cultured , Cross-Linking Reagents/chemistry , Humans , Integrin-Binding Sialoprotein/analysis , Mesenchymal Stem Cells/physiology , Nanoparticles/chemistry , Nitrates/chemistry , Osteocalcin/analysis , Phosphoric Acids/chemistry , X-Ray Microtomography/methods
8.
Nano Lett ; 14(4): 1995-9, 2014.
Article in English | MEDLINE | ID: mdl-24588125

ABSTRACT

Measurement and understanding of the microscopic pathways that materials follow as they transform are crucial for the design and synthesis of new metastable phases of matter. Here we employ femtosecond single-shot X-ray diffraction techniques to measure the pathways underlying solid-solid phase transitions in cadmium sulfide nanorods, a model system for a general class of martensitic transformations. Using picosecond rise-time laser-generated shocks to trigger the transformation, we directly observe the transition state dynamics associated with the wurtzite-to-rocksalt structural phase transformation in cadmium sulfide with atomic-scale resolution. A stress-dependent transition path is observed. At high peak stresses, the majority of the sample is converted directly into the rocksalt phase with no evidence of an intermediate prior to rocksalt formation. At lower peak stresses, a transient five-coordinated intermediate structure is observed, consistent with previous first-principles modeling.

9.
J Am Med Inform Assoc ; 31(8): 1638-1647, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-38860521

ABSTRACT

OBJECTIVE: To address challenges in large-scale electronic health record (EHR) data exchange, we sought to develop, deploy, and test an open source, cloud-hosted app "listener" that accesses standardized data across the SMART/HL7 Bulk FHIR Access application programming interface (API). METHODS: We advance a model for scalable, federated data sharing and learning. Cumulus software is designed to address key technology and policy desiderata, including local utility, control, and administrative simplicity; privacy preservation during robust data sharing; and artificial intelligence (AI) for processing unstructured text. RESULTS: Cumulus relies on containerized, cloud-hosted software installed within a healthcare organization's security envelope. Cumulus accesses EHR data via the Bulk FHIR interface and streamlines automated processing and sharing. The modular design enables use of the latest AI and natural language processing tools and supports provider autonomy and administrative simplicity. In an initial test, Cumulus was deployed across 5 healthcare systems, each partnered with public health. Cumulus outputs patient counts, which were aggregated into a table stratifying variables of interest to enable population health studies. All code is available open source. A policy stipulating that only aggregate data leave the institution greatly facilitated data sharing agreements. DISCUSSION AND CONCLUSION: Cumulus addresses barriers to data sharing based on (1) federally required support for standard APIs, (2) increasing use of cloud computing, and (3) advances in AI. There is potential for scalability to support learning across myriad network configurations and use cases.


Subjects
Artificial Intelligence , Electronic Health Records , Humans , Software , Cloud Computing , Health Information Interoperability , Information Dissemination
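The policy that only aggregate data leave the institution can be illustrated with a small counting step. The Python sketch below is hypothetical: the field names `patient_id`, `age_group`, and `symptom` and the function `aggregate_counts` are not taken from the Cumulus code base. It collapses patient-level rows into distinct-patient counts per stratum, which is the kind of table that would be shared.

```python
from collections import Counter
from typing import Iterable, Mapping, Tuple


def aggregate_counts(
    rows: Iterable[Mapping[str, str]], strata: Tuple[str, ...]
) -> Counter:
    """Count distinct patients for each combination of stratification variables."""
    seen = set()
    counts: Counter = Counter()
    for row in rows:
        stratum = tuple(row[name] for name in strata)
        key = (row["patient_id"], stratum)
        if key in seen:
            continue  # a patient with repeat encounters is counted once per stratum
        seen.add(key)
        counts[stratum] += 1
    return counts


# Hypothetical patient-level rows; these stay inside the institution.
rows = [
    {"patient_id": "p1", "age_group": "0-4", "symptom": "fever"},
    {"patient_id": "p1", "age_group": "0-4", "symptom": "fever"},  # repeat visit
    {"patient_id": "p2", "age_group": "0-4", "symptom": "cough"},
    {"patient_id": "p3", "age_group": "5-11", "symptom": "fever"},
]

# Only this aggregate table would be shared.
table = aggregate_counts(rows, strata=("age_group", "symptom"))
for stratum, n in sorted(table.items()):
    print(stratum, n)
```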
10.
medRxiv ; 2024 Feb 06.
Article in English | MEDLINE | ID: mdl-38370642

ABSTRACT

Objective: To address challenges in large-scale electronic health record (EHR) data exchange, we sought to develop, deploy, and test an open source, cloud-hosted app 'listener' that accesses standardized data across the SMART/HL7 Bulk FHIR Access application programming interface (API). Methods: We advance a model for scalable, federated data sharing and learning. Cumulus software is designed to address key technology and policy desiderata, including local utility, control, and administrative simplicity; privacy preservation during robust data sharing; and artificial intelligence (AI) for processing unstructured text. Results: Cumulus relies on containerized, cloud-hosted software installed within a healthcare organization's security envelope. Cumulus accesses EHR data via the Bulk FHIR interface and streamlines automated processing and sharing. The modular design enables use of the latest AI and natural language processing tools and supports provider autonomy and administrative simplicity. In an initial test, Cumulus was deployed across five healthcare systems, each partnered with public health. Cumulus outputs patient counts, which were aggregated into a table stratifying variables of interest to enable population health studies. All code is available open source. A policy stipulating that only aggregate data leave the institution greatly facilitated data sharing agreements. Discussion and Conclusion: Cumulus addresses barriers to data sharing based on (1) federally required support for standard APIs, (2) increasing use of cloud computing, and (3) advances in AI. There is potential for scalability to support learning across myriad network configurations and use cases.

11.
Ann Plast Surg ; 71(1): 84-7, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23123614

ABSTRACT

BACKGROUND: Sternal dehiscence is a grave complication after open heart surgery. Sternal debridement and flap coverage are the mainstays of therapy, but no consensus exists regarding the appropriate level of debridement. More recently, the use of vacuum-assisted closure devices has been advocated as a bridge to definitive closure, but indications for use remain incompletely defined. MATERIALS AND METHODS: A retrospective review of all chest wall reconstructions performed from January 2000 to December 2010 was conducted. The type of operative management was evaluated to assess morbidity, mortality, and length of hospital stay. RESULTS: Fifty-four patients underwent chest wall reconstruction for poststernotomy mediastinitis. Of these patients, 24 underwent conservative sternal debridement with flap closure, 24 underwent radical sternectomy including resection of the costal cartilages followed by flap closure, and 6 underwent radical sternectomy with vacuum-assisted closure therapy followed by delayed flap closure. Fifteen patients in the conservative group and 8 patients in the radical sternectomy group developed postoperative complications (62.5% vs 33.3%, P < 0.05). The conservative sternectomy group had more serious complications requiring reoperation than the radical sternectomy group (86.7% vs 25.0%, P < 0.05). The most common complication in the former group was flap dehiscence (8/15, 53.3%), whereas that in the latter group was superficial wound infection (6/8, 75.0%). There was no significant difference in mortality (25.0% vs 25.0%, P > 0.05) or length of hospital stay. CONCLUSIONS: Radical sternectomy including the costal cartilages is associated with lower rates of surgical morbidity and reoperation, but not mortality.


Subjects
Cardiac Surgical Procedures , Mediastinitis/surgery , Plastic Surgery Procedures/methods , Postoperative Complications/surgery , Sternum/surgery , Surgical Wound Dehiscence/surgery , Thoracic Wall/surgery , Cardiac Surgical Procedures/adverse effects , Humans , Length of Stay , Negative-Pressure Wound Therapy , Retrospective Studies , Surgical Wound Infection/surgery
12.
Ann Plast Surg ; 70(4): 432-4, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23486132

ABSTRACT

INTRODUCTION: Every year, nearly 1.2 million people are affected by nonmelanoma skin cancers (NMSCs) in the United States. Most published data focus on comparing the efficacy of Mohs micrographic surgery (MMS) versus traditional surgical excision (TSE) for NMSCs in H-zone lesions of the face. There is a paucity of data regarding the 2 treatments in other areas, such as the non-H-zone areas of the face, the trunk, and the extremities. Our study focused on the efficacy of the 2 treatments in areas of the body where skin is not at a premium. METHOD: A retrospective chart review was performed of patients with NMSCs treated with TSE at the West Los Angeles Veterans Affairs Hospital between 2000 and 2008. Patients with at least a 3-year follow-up were selected for the study. Institutional review board approval was obtained before commencement of the study. Age-, sex-, and race-matched patients were selected in the MMS group. Data collected included demographic data, tumor characteristics, surgical treatment, reconstructions, recurrence rates, complications, and follow-up course. Data were analyzed using SigmaStat 3.5. RESULTS: A total of 588 patients were treated for NMSCs at our institute between 2000 and 2008, of whom 289 had non-H-zone, extremity, and trunk lesions. The follow-up period for these patients was at least 3 years. The average age of this group was 67.1 (11.4) years, and 89.9% were male. An age-, sex-, and race-matched group of 200 patients treated with MMS for NMSCs was randomly chosen from the same time range. The average lesion size was 17.4 (16.9) mm in the TSE group and 1.1 (0.4) mm in the MMS group (P < 0.05). Primary reconstruction was performed in non-premium areas (ie, non-H-zone areas of the face, the trunk, and extremities) in 98.7% of patients in the TSE group and 61.5% of patients in the MMS group (P < 0.05). The secondary reconstruction rate was 1.3% with TSE compared with 37.5% with MMS. The overall recurrence rate was 4.8% with TSE, compared with 3% with MMS. Of the 29 patients who had recurrences within the TSE group, 27 had H-zone lesions and 2 had non-H-zone lesions. DISCUSSION: One of the primary goals of NMSC management is to treat the lesion with adequate oncologic margins while preserving maximal function and cosmesis. Our data examine the non-premium areas to quantify the clinical efficacy of TSE versus MMS. Lesions treated by TSE were significantly larger than those treated by MMS in all areas of the body. In non-premium areas, primary closure rates were significantly higher and secondary procedure rates significantly lower in the TSE group than in the MMS group. Our data suggest that patients with NMSCs may be more effectively treated with TSE than MMS in non-premium areas of the body. Additional studies are ongoing, including economic modeling and cost analysis.


Subjects
Mohs Surgery , Skin Neoplasms/pathology , Skin Neoplasms/surgery , Aged , Dermatologic Surgical Procedures , Female , Humans , Male , Retrospective Studies
13.
Proc Conf Assoc Comput Linguist Meet ; 2023: 125-130, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37786810

ABSTRACT

Text in electronic health records is organized into sections, and classifying those sections into section categories is useful for downstream tasks. In this work, we attempt to improve the transferability of section classification models by combining the dataset-specific knowledge in supervised learning models with the world knowledge inside large language models (LLMs). Surprisingly, we find that zero-shot LLMs outperform supervised BERT-based models applied to out-of-domain data. We also find that their strengths are synergistic, so that a simple ensemble technique leads to additional performance gains.
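The abstract does not specify the ensemble technique. One simple option, assumed here purely for illustration, is to average the per-label probabilities from the zero-shot LLM and the supervised BERT-based model and take the highest-scoring section label. The function name, labels, and scores in this Python sketch are hypothetical.

```python
from typing import Mapping


def ensemble(
    llm_probs: Mapping[str, float],
    bert_probs: Mapping[str, float],
    weight: float = 0.5,
) -> str:
    """Return the section label with the highest weighted-average probability.

    `weight` is the share given to the LLM; 1 - weight goes to the BERT model.
    """
    labels = set(llm_probs) | set(bert_probs)
    scores = {
        label: weight * llm_probs.get(label, 0.0)
        + (1 - weight) * bert_probs.get(label, 0.0)
        for label in labels
    }
    return max(scores, key=scores.get)


# Hypothetical per-label probabilities for one out-of-domain section.
llm = {"Subjective": 0.40, "Objective": 0.35, "Assessment": 0.15, "Plan": 0.10}
bert = {"Subjective": 0.20, "Objective": 0.50, "Assessment": 0.20, "Plan": 0.10}
print(ensemble(llm, bert))
```

Here the two models disagree on their top label, and the averaged scores settle the tie-break; a weighted variant lets one model's knowledge dominate when it is known to transfer better.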

14.
medRxiv ; 2023 Jan 19.
Article in English | MEDLINE | ID: mdl-36711461

ABSTRACT

Objective: To identify a cohort of COVID-19 cases, including when evidence of virus positivity was only mentioned in the clinical text, not in structured laboratory data in the electronic health record (EHR). Materials and Methods: Statistical classifiers were trained on feature representations derived from unstructured text in patient EHRs. We used a proxy dataset of patients with COVID-19 polymerase chain reaction (PCR) tests for training. We selected a model based on performance on our proxy dataset and applied it to instances without COVID-19 PCR tests. A physician reviewed a sample of these instances to validate the classifier. Results: On the test split of the proxy dataset, our best classifier obtained an F1 score of 0.56, precision of 0.6, and recall of 0.52 for SARS-CoV2-positive cases. In an expert validation, the classifier correctly identified 90.8% (79/87) as COVID-19 positive and 97.8% (91/93) as not SARS-CoV2 positive. The classifier identified an additional 960 positive cases that did not have SARS-CoV2 lab tests in hospital, and only 177 of those cases had the ICD-10 code for COVID-19. Discussion: Proxy dataset performance may be worse because these instances sometimes include discussion of pending lab tests. The most predictive features are meaningful and interpretable. The type of external test that was performed is rarely mentioned. Conclusion: COVID-19 cases that had testing done outside of the hospital can be reliably detected from the text in EHRs. Training on a proxy dataset was a suitable method for developing a highly performant classifier without labor-intensive labeling efforts.
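The reported proxy-dataset scores are internally consistent: with precision 0.6 and recall 0.52, the harmonic mean is about 0.557, which rounds to the stated 0.56. The short Python check below uses only numbers from the abstract and also reproduces the expert-validation percentages from their fractions.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Proxy-dataset test split, SARS-CoV2-positive class (values from the abstract).
proxy_f1 = f1_score(precision=0.6, recall=0.52)

# Expert validation: correctly labeled / reviewed, as fractions in the abstract.
positive_agreement = 79 / 87
negative_agreement = 91 / 93

print(f"proxy F1 = {proxy_f1:.3f}")                   # 0.557, reported as 0.56
print(f"positive = {100 * positive_agreement:.1f}%")  # 90.8%
print(f"negative = {100 * negative_agreement:.1f}%")  # 97.8%
```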

15.
JAMIA Open ; 6(3): ooad047, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37425487

ABSTRACT

Objective: To identify a cohort of COVID-19 cases, including when evidence of virus positivity was only mentioned in the clinical text, not in structured laboratory data in the electronic health record (EHR). Materials and Methods: Statistical classifiers were trained on feature representations derived from unstructured text in patient EHRs. We used a proxy dataset of patients with COVID-19 polymerase chain reaction (PCR) tests for training. We selected a model based on performance on our proxy dataset and applied it to instances without COVID-19 PCR tests. A physician reviewed a sample of these instances to validate the classifier. Results: On the test split of the proxy dataset, our best classifier obtained an F1 score of 0.56, precision of 0.6, and recall of 0.52 for SARS-CoV2-positive cases. In an expert validation, the classifier correctly identified 97.6% (81/84) as COVID-19 positive and 97.8% (91/93) as not SARS-CoV2 positive. The classifier labeled as positive an additional 960 cases that did not have SARS-CoV2 lab tests in hospital, and only 177 of those cases had the ICD-10 code for COVID-19. Discussion: Proxy dataset performance may be worse because these instances sometimes include discussion of pending lab tests. The most predictive features are meaningful and interpretable. The type of external test that was performed is rarely mentioned. Conclusion: COVID-19 cases that had testing done outside of the hospital can be reliably detected from the text in EHRs. Training on a proxy dataset was a suitable method for developing a highly performant classifier without labor-intensive labeling efforts.

16.
medRxiv ; 2023 Apr 24.
Article in English | MEDLINE | ID: mdl-37162963

ABSTRACT

Objective: The classification of clinical note sections is a critical step before more fine-grained natural language processing tasks such as social determinants of health extraction and temporal information extraction. Clinical note section classification models that achieve high accuracy for one institution often experience a large drop in accuracy when transferred to another institution. The objective of this study is to develop methods that classify clinical note sections under the SOAP ("Subjective", "Objective", "Assessment" and "Plan") framework with improved transferability. Materials and methods: We trained the baseline models by fine-tuning BERT-based models, and enhanced their transferability with continued pretraining, including domain adaptive pretraining (DAPT) and task adaptive pretraining (TAPT). We added in-domain annotated samples during fine-tuning and observed model performance as the number of annotated samples varied. Finally, we quantified the impact of continued pretraining as the equivalent number of in-domain annotated samples added. Results: We found continued pretraining improved models only when combined with in-domain annotated samples, improving the F1 score from 0.756 to 0.808, averaged across three datasets. This improvement was equivalent to adding 50.2 in-domain annotated samples. Discussion: Although considered a straightforward task when performed in-domain, section classification is still a considerably difficult task when performed cross-domain, even using highly sophisticated neural network-based methods. Conclusion: Continued pretraining improved model transferability for cross-domain clinical note section classification in the presence of a small number of in-domain labeled samples.

17.
medRxiv ; 2023 Mar 27.
Article in English | MEDLINE | ID: mdl-37034815

ABSTRACT

Objective: To implement an open source, free, and easily deployable high-throughput natural language processing (NLP) module to extract concepts from clinician notes and map them to Fast Healthcare Interoperability Resources (FHIR). Materials and Methods: Using a popular open-source NLP tool (Apache cTAKES), we create FHIR resources that use modifier extensions to represent negation and NLP sourcing, and another extension to represent provenance of extracted concepts. Results: The SMART Text2FHIR Pipeline is an open-source tool, released through standard package managers and as publicly available container images, that implements the mappings, enabling ready conversion of clinical text to FHIR. Discussion: With the increased data liquidity resulting from new interoperability regulations, NLP processes that can output FHIR can provide a common language for transporting structured and unstructured data. This framework can be valuable for critical public health or clinical research use cases. Conclusion: Future work should include mapping more categories of NLP-extracted information into FHIR resources and mappings from additional open-source NLP tools.

18.
J Am Med Inform Assoc ; 31(1): 89-97, 2023 12 22.
Article in English | MEDLINE | ID: mdl-37725927

ABSTRACT

OBJECTIVE: The classification of clinical note sections is a critical step before more fine-grained natural language processing tasks such as social determinants of health extraction and temporal information extraction. Clinical note section classification models that achieve high accuracy for 1 institution often experience a large drop in accuracy when transferred to another institution. The objective of this study is to develop methods that classify clinical note sections under the SOAP ("Subjective," "Objective," "Assessment," and "Plan") framework with improved transferability. MATERIALS AND METHODS: We trained the baseline models by fine-tuning BERT-based models, and enhanced their transferability with continued pretraining, including domain-adaptive pretraining and task-adaptive pretraining. We added in-domain annotated samples during fine-tuning and observed model performance as the number of annotated samples varied. Finally, we quantified the impact of continued pretraining as the equivalent number of in-domain annotated samples added. RESULTS: We found continued pretraining improved models only when combined with in-domain annotated samples, improving the F1 score from 0.756 to 0.808, averaged across 3 datasets. This improvement was equivalent to adding 35 in-domain annotated samples. DISCUSSION: Although considered a straightforward task when performed in-domain, section classification is still a considerably difficult task when performed cross-domain, even using highly sophisticated neural network-based methods. CONCLUSION: Continued pretraining improved model transferability for cross-domain clinical note section classification in the presence of a small number of in-domain labeled samples.


Subjects
Health Facilities , Information Storage and Retrieval , Natural Language Processing , Neural Networks, Computer , Sample Size
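The abstract expresses the benefit of continued pretraining as an equivalent number of in-domain annotated samples but does not state how that equivalence was computed. One way to do it, assumed here rather than taken from the paper, is to fit a log-linear learning curve to the baseline model and invert it at the improved F1 score. The function names and all data points in this Python sketch are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_log_curve(ns: Sequence[int], f1s: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of f1 = a + b * ln(n)."""
    xs = [math.log(n) for n in ns]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(f1s) / len(f1s)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, f1s))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b


def samples_needed(target_f1: float, a: float, b: float) -> float:
    """Invert the fitted curve: the n at which a + b * ln(n) equals target_f1."""
    return math.exp((target_f1 - a) / b)


# Hypothetical baseline learning curve (fine-tuning only, no continued pretraining).
ns = [25, 50, 100, 200]
baseline_f1 = [0.70, 0.74, 0.78, 0.82]
a, b = fit_log_curve(ns, baseline_f1)

# Hypothetical: with continued pretraining, 50 in-domain samples reach F1 = 0.76.
n_used = 50
equivalent_extra = samples_needed(0.76, a, b) - n_used
print(f"continued pretraining ~ {equivalent_extra:.1f} extra in-domain samples")
```

The log-linear form is a common but not universal choice for learning curves; a different curve family would give a different equivalent count from the same data.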
19.
AMIA Annu Symp Proc ; 2023: 514-520, 2023.
Article in English | MEDLINE | ID: mdl-38222416

ABSTRACT

Objective: To implement an open source, free, and easily deployable high-throughput natural language processing (NLP) module to extract concepts from clinician notes and map them to Fast Healthcare Interoperability Resources (FHIR). Materials and Methods: Using a popular open-source NLP tool (Apache cTAKES), we create FHIR resources that use modifier extensions to represent negation and NLP sourcing, and another extension to represent provenance of extracted concepts. Results: The SMART Text2FHIR Pipeline is an open-source tool, released through standard package managers and as publicly available container images, that implements the mappings, enabling ready conversion of clinical text to FHIR. Discussion: With the increased data liquidity resulting from new interoperability regulations, NLP processes that can output FHIR can provide a common language for transporting structured and unstructured data. This framework can be valuable for critical public health or clinical research use cases. Conclusion: Future work should include mapping more categories of NLP-extracted information into FHIR resources and mappings from additional open-source NLP tools.


Subjects
Delivery of Health Care , Electronic Health Records , Humans , Natural Language Processing , APACHE
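The mapping described in this abstract can be pictured as wrapping each extracted concept in a FHIR resource whose `modifierExtension` carries negation and NLP sourcing, with an ordinary `extension` carrying provenance. In the Python sketch below, the choice of an Observation resource, the extension URLs under example.org, the function name, and the provenance string format are all hypothetical; they are not the SMART Text2FHIR Pipeline's actual profiles. The SNOMED CT code 386661006 (fever) serves only as a sample concept.

```python
import json

# Hypothetical extension URLs; the pipeline's real StructureDefinition URLs differ.
NEGATION_URL = "http://example.org/fhir/StructureDefinition/nlp-negated"
NLP_SOURCE_URL = "http://example.org/fhir/StructureDefinition/nlp-sourced"
PROVENANCE_URL = "http://example.org/fhir/StructureDefinition/nlp-provenance"


def concept_to_observation(code: str, display: str, negated: bool,
                           note_ref: str, begin: int, end: int) -> dict:
    """Wrap one NLP-extracted concept in a FHIR R4 Observation-shaped dict."""
    return {
        "resourceType": "Observation",
        "status": "preliminary",
        # modifierExtension: flags that change how the resource must be interpreted.
        "modifierExtension": [
            {"url": NEGATION_URL, "valueBoolean": negated},
            {"url": NLP_SOURCE_URL, "valueBoolean": True},
        ],
        # Ordinary extension: where in which note the concept was found.
        "extension": [
            {"url": PROVENANCE_URL, "valueString": f"{note_ref}#{begin}-{end}"},
        ],
        "code": {"coding": [{"system": "http://snomed.info/sct",
                             "code": code, "display": display}]},
    }


# A negated mention such as "denies fever" in a hypothetical note.
obs = concept_to_observation("386661006", "Fever", negated=True,
                             note_ref="DocumentReference/123", begin=10, end=15)
print(json.dumps(obs, indent=2))
```

Putting negation in `modifierExtension` rather than `extension` matters: FHIR requires systems that do not understand a modifier extension to not silently ignore it, so a negated "fever" is not misread as a positive finding.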
20.
JCO Clin Cancer Inform ; 7: e2300048, 2023 07.
Article in English | MEDLINE | ID: mdl-37506330

ABSTRACT

PURPOSE: Radiotherapy (RT) toxicities can impair survival and quality of life, yet remain understudied. Real-world evidence holds potential to improve our understanding of toxicities, but toxicity information is often only in clinical notes. We developed natural language processing (NLP) models to identify the presence and severity of esophagitis from notes of patients treated with thoracic RT. METHODS: Our corpus consisted of a gold-labeled data set of 1,524 clinical notes from 124 patients with lung cancer treated with RT, manually annotated for Common Terminology Criteria for Adverse Events (CTCAE) v5.0 esophagitis grade, and a silver-labeled data set of 2,420 notes from 1,832 patients from whom toxicity grades had been collected as structured data during clinical care. We fine-tuned statistical and pretrained Bidirectional Encoder Representations from Transformers-based models for three esophagitis classification tasks: task 1, no esophagitis versus grade 1-3; task 2, grade ≤1 versus >1; and task 3, no esophagitis versus grade 1 versus grade 2-3. Transferability was tested on 345 notes from patients with esophageal cancer undergoing RT. RESULTS: Fine-tuning of PubMedBERT yielded the best performance. The best macro-F1 was 0.92, 0.82, and 0.74 for tasks 1, 2, and 3, respectively. Selecting the most informative note sections during fine-tuning improved macro-F1 by ≥2% for all tasks. Silver-labeled data improved the macro-F1 by ≥3% across all tasks. For the esophageal cancer notes, the best macro-F1 was 0.73, 0.74, and 0.65 for tasks 1, 2, and 3, respectively, without additional fine-tuning. CONCLUSION: To our knowledge, this is the first effort to automatically extract esophagitis toxicity severity according to CTCAE guidelines from clinical notes. This provides proof of concept for NLP-based automated detailed toxicity monitoring in expanded domains.


Subjects
Esophageal Neoplasms , Esophagitis , Humans , Natural Language Processing , Quality of Life , Silver , Esophagitis/diagnosis , Esophagitis/etiology
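Macro-F1, the metric reported for the three esophagitis tasks, is the unweighted mean of per-class F1 scores, so a less frequent class such as grade 2-3 counts as much as a common one. The minimal Python sketch below computes it for the three-class task (task 3); the function name and the labels are invented for illustration.

```python
from typing import Sequence


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over classes seen in gold or pred."""
    classes = sorted(set(gold) | set(pred))
    scores = []
    for c in classes:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # Per-class F1 = 2TP / (2TP + FP + FN); 0 if the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical note-level labels for task 3 (none vs grade 1 vs grade 2-3).
gold = ["none", "none", "none", "grade1", "grade1", "grade2-3"]
pred = ["none", "none", "grade1", "grade1", "none", "grade2-3"]
print(f"{macro_f1(gold, pred):.3f}")
```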