1.
Methods ; 222: 19-27, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38141869

ABSTRACT

The International Classification of Diseases (ICD) serves as a global healthcare administration standard. One of its editions, ICD-10-CM, is an enhanced diagnostic classification system featuring numerous new codes for specific anatomic sites, co-morbidities, and causes. These additions facilitate conveying the complexities of various diseases. Currently, ICD-10 coding is widely adopted worldwide. However, public hospitals in Pakistan have yet to implement it and automate the coding process. In this research, we implemented ICD-10-CM coding for a private database and named it Clinical Pool of Liver Transplant (CPLT). Additionally, we proposed a novel deep learning model called Deep Recurrent-Convolution Neural Network with a lambda-scaled Attention module (DRCNN-ATT) using the CPLT database to achieve automatic ICD-10-CM coding. DRCNN-ATT combines a bi-directional long short-term memory network (bi-LSTM), a multi-scale convolutional neural network (MS-CNN), and a lambda-scaled attention module. Experimental results demonstrate that the deep recurrent convolutional neural network (DRCNN) faces an attention-score vanishing problem when paired with a standard attention module for automatic ICD coding. However, adding a lambda-scaled attention module resolves this issue. We evaluated the DRCNN-ATT model using two distinct datasets: the private CPLT dataset and the public MIMIC III top 50 dataset. The results indicate that the DRCNN-ATT model outperformed various baselines, achieving 0.862 micro F1 and 0.25 macro F1 scores on the CPLT dataset and 0.705 micro F1 and 0.655 macro F1 scores on the MIMIC III top 50 dataset. Furthermore, we deployed our model for automatic ICD-10-CM coding using ngrok and Flask APIs, which receive input, process it, and then return the results.
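The gap between the micro and macro F1 scores reported in this abstract (0.862 versus 0.25 on CPLT) is what one would expect when a few frequent codes dominate the data. The following is a minimal Python sketch of the two averages for multi-label coding. The function names and the example ICD-10-CM codes are illustrative assumptions and are not taken from the paper.

```python
from typing import Dict, List, Set, Tuple


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when there is nothing to count."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> Tuple[float, float]:
    """Return (micro F1, macro F1) for multi-label predictions.

    Micro F1 pools the counts over all labels; macro F1 averages per-label F1,
    so rare labels weigh as much as frequent ones.
    """
    labels = sorted(set().union(*gold, *pred))
    counts: Dict[str, List[int]] = {label: [0, 0, 0] for label in labels}  # tp, fp, fn
    for g, p in zip(gold, pred):
        for label in p & g:
            counts[label][0] += 1  # predicted and correct
        for label in p - g:
            counts[label][1] += 1  # predicted but not in gold
        for label in g - p:
            counts[label][2] += 1  # in gold but missed
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1_from_counts(tp, fp, fn)
    macro = sum(f1_from_counts(*c) for c in counts.values()) / len(labels) if labels else 0.0
    return micro, macro


# Hypothetical example: one frequent code predicted well, one rare code always missed.
gold = [{"K74.60"}, {"K74.60"}, {"K74.60"}, {"B18.2"}]
pred = [{"K74.60"}, {"K74.60"}, {"K74.60"}, {"K74.60"}]
micro, macro = micro_macro_f1(gold, pred)
print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
```

In this toy case the missed rare code pulls the macro average down much more than the micro average, which mirrors the pattern in the reported CPLT scores.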


Subjects
Deep Learning , International Classification of Diseases , Neural Networks, Computer
2.
BMC Psychiatry ; 24(1): 430, 2024 Jun 10.
Article in English | MEDLINE | ID: mdl-38858711

ABSTRACT

OBJECTIVE: In a growing list of countries, patients are granted access to their clinical notes ("open notes") as part of their online record access. Especially in the field of mental health, open notes remain controversial, with some clinicians perceiving open notes as a tool for improving therapeutic outcomes by increasing patient involvement, while others fear that patients might experience psychological distress and perceived stigmatization, particularly when reading clinicians' notes. More research is needed to optimize the benefits and mitigate the risks. METHODS: Using a qualitative research design, we conducted semi-structured interviews with psychiatrists practicing in Germany to explore what conditions they believe need to be in place to ensure successful implementation of open notes in psychiatric practice, as well as expected subsequent changes to their workload and treatment outcomes. Data were analyzed using thematic analysis. RESULTS: We interviewed 18 psychiatrists; interviewees believed four key conditions needed to be in place prior to implementation of open notes, including careful consideration of (1) diagnoses and symptom severity, (2) the availability of additional time for writing clinical notes and discussing them with patients, (3) available resources and system compatibility, and (4) legal and data protection aspects. As a result of introducing open notes, interviewees expected changes in documentation, treatment processes, and doctor-patient interaction. While open notes were expected to improve transparency and trust, participants anticipated negative unintended consequences, including the risk of deteriorating therapeutic relationships due to note access-related misunderstandings and conflicts. CONCLUSION: Psychiatrists practiced in Germany, where open notes have not yet been established as part of the healthcare data infrastructure. Interviewees were supportive of open notes but had some reservations.
They found open notes to be generally beneficial but anticipated effects to vary depending on patient characteristics. Clear guidelines for managing access, time constraints, usability, and privacy are crucial. Open notes were perceived to increase transparency and patient involvement but were also believed to raise issues of stigmatization and conflicts.


Subjects
Attitude of Health Personnel , Psychiatry , Qualitative Research , Humans , Male , Female , Germany , Adult , Middle Aged , Physician-Patient Relations , Electronic Health Records , Mental Disorders/psychology , Mental Disorders/therapy , Psychiatrists
3.
J Med Internet Res ; 26: e49704, 2024 Oct 15.
Article in English | MEDLINE | ID: mdl-39405109

ABSTRACT

BACKGROUND: Studies have shown that patients have difficulty understanding medical jargon in electronic health record (EHR) notes, particularly patients with low health literacy. In creating the NoteAid dictionary of medical jargon for patients, a panel of medical experts selected terms they perceived as needing definitions for patients. OBJECTIVE: This study aims to determine whether experts and laypeople agree on what constitutes medical jargon. METHODS: Using an observational study design, we compared the ability of medical experts and laypeople to identify medical jargon in EHR notes. The laypeople were recruited from Amazon Mechanical Turk. Participants were shown 20 sentences from EHR notes, which contained 325 potential jargon terms as identified by the medical experts. We collected demographic information about the laypeople's age, sex, race or ethnicity, education, native language, and health literacy. Health literacy was measured with the Single Item Literacy Screener. Our evaluation metrics were the proportion of terms rated as jargon, sensitivity, specificity, Fleiss κ for agreement among medical experts and among laypeople, and the Kendall rank correlation statistic between the medical experts and laypeople. We performed subgroup analyses by layperson characteristics. We fit a beta regression model with a logit link to examine the association between layperson characteristics and whether a term was classified as jargon. RESULTS: The average proportion of terms identified as jargon by the medical experts was 59% (1150/1950, 95% CI 56.1%-61.8%), and the average proportion of terms identified as jargon by the laypeople overall was 25.6% (22,480/87,750, 95% CI 25%-26.2%). There was good agreement among medical experts (Fleiss κ=0.781, 95% CI 0.753-0.809) and fair agreement among laypeople (Fleiss κ=0.590, 95% CI 0.589-0.591). 
The beta regression model had a pseudo-R2 of 0.071, indicating that demographic characteristics explained very little of the variability in the proportion of terms identified as jargon by laypeople. Using laypeople's identification of jargon as the gold standard, the medical experts had high sensitivity (91.7%, 95% CI 90.1%-93.3%) and specificity (88.2%, 95% CI 86%-90.5%) in identifying jargon terms. CONCLUSIONS: To ensure coverage of possible jargon terms, the medical experts were loose in selecting terms for inclusion. Fair agreement among laypersons shows that this is needed, as there is a variety of opinions among laypersons about what is considered jargon. We showed that medical experts could accurately identify jargon terms for annotation that would be useful for laypeople.
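Agreement among multiple raters, as reported here with Fleiss κ, can be computed from a table of rating counts. The sketch below is a minimal pure-Python version of the standard Fleiss κ formula. It assumes every item was rated by the same number of raters, with rows as terms and columns as categories (for example, "jargon" and "not jargon"). The function name and the toy data are illustrative, not the study's code or data.

```python
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where ratings[i][j] is the number of raters
    who assigned item i to category j. Every row must sum to the same
    number of raters (at least 2)."""
    if not ratings:
        raise ValueError("ratings must contain at least one item")
    n_raters = sum(ratings[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in ratings):
        raise ValueError("each item needs the same number of raters (>= 2)")
    n_items = len(ratings)
    n_categories = len(ratings[0])

    # Per-item agreement: share of rater pairs that agree on the item.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    observed = sum(per_item) / n_items

    # Chance agreement from the overall category proportions.
    proportions = [
        sum(row[j] for row in ratings) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    expected = sum(p * p for p in proportions)
    if expected == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (observed - expected) / (1.0 - expected)


# Toy example: 3 terms, 3 raters, categories = (jargon, not jargon).
print(fleiss_kappa([[3, 0], [0, 3], [3, 0]]))
```

A value near 1 indicates agreement well beyond chance, a value near 0 indicates agreement at chance level, and negative values indicate agreement below chance.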


Subjects
Electronic Health Records , Health Literacy , Humans , Electronic Health Records/statistics & numerical data , Female , Male , Adult , Health Literacy/statistics & numerical data , Middle Aged , Terminology as Topic
4.
J Med Internet Res ; 26: e53367, 2024 Apr 04.
Article in English | MEDLINE | ID: mdl-38573752

ABSTRACT

BACKGROUND: Real-time surveillance of emerging infectious diseases necessitates a dynamically evolving, computable case definition, which frequently incorporates symptom-related criteria. For symptom detection, both population health monitoring platforms and research initiatives primarily depend on structured data extracted from electronic health records. OBJECTIVE: This study sought to validate and test an artificial intelligence (AI)-based natural language processing (NLP) pipeline for detecting COVID-19 symptoms from physician notes in pediatric patients. We specifically study patients presenting to the emergency department (ED) who can be sentinel cases in an outbreak. METHODS: Subjects in this retrospective cohort study are patients who are 21 years of age and younger, who presented to a pediatric ED at a large academic children's hospital between March 1, 2020, and May 31, 2022. The ED notes for all patients were processed with an NLP pipeline tuned to detect the mention of 11 COVID-19 symptoms based on Centers for Disease Control and Prevention (CDC) criteria. For a gold standard, 3 subject matter experts labeled 226 ED notes and had strong agreement (F1-score=0.986; positive predictive value [PPV]=0.972; and sensitivity=1.0). F1-score, PPV, and sensitivity were used to compare the performance of both NLP and the International Classification of Diseases, 10th Revision (ICD-10) coding to the gold standard chart review. As a formative use case, variations in symptom patterns were measured across SARS-CoV-2 variant eras. RESULTS: There were 85,678 ED encounters during the study period, including 4% (n=3420) with patients with COVID-19. NLP was more accurate at identifying encounters with patients that had any of the COVID-19 symptoms (F1-score=0.796) than ICD-10 codes (F1-score =0.451). NLP accuracy was higher for positive symptoms (sensitivity=0.930) than ICD-10 (sensitivity=0.300). 
However, ICD-10 accuracy was higher for negative symptoms (specificity=0.994) than NLP (specificity=0.917). Congestion or runny nose showed the highest accuracy difference (NLP: F1-score=0.828 and ICD-10: F1-score=0.042). For encounters with patients with COVID-19, prevalence estimates of each NLP symptom differed across variant eras. Patients with COVID-19 were more likely to have each NLP symptom detected than patients without this disease. Effect sizes (odds ratios) varied across pandemic eras. CONCLUSIONS: This study establishes the value of AI-based NLP as a highly effective tool for real-time COVID-19 symptom detection in pediatric patients, outperforming traditional ICD-10 methods. It also reveals the evolving nature of symptom prevalence across different virus variants, underscoring the need for dynamic, technology-driven approaches in infectious disease surveillance.
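The metrics used in this abstract (sensitivity, specificity, PPV, and F1-score) all derive from a 2x2 confusion matrix, and F1 is the harmonic mean of PPV and sensitivity. The Python sketch below shows these relationships. The function names and the example counts are illustrative assumptions; only the PPV and sensitivity values in the final line come from the abstract.

```python
from typing import Dict


def _ratio(numerator: float, denominator: float) -> float:
    """Safe division that returns 0.0 for an empty denominator."""
    return numerator / denominator if denominator else 0.0


def f1_from_ppv_sensitivity(ppv: float, sensitivity: float) -> float:
    """F1 is the harmonic mean of PPV (precision) and sensitivity (recall)."""
    return _ratio(2 * ppv * sensitivity, ppv + sensitivity)


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute standard metrics from confusion-matrix counts."""
    sensitivity = _ratio(tp, tp + fn)   # share of true cases that were detected
    specificity = _ratio(tn, tn + fp)   # share of non-cases correctly ruled out
    ppv = _ratio(tp, tp + fp)           # share of detections that were true cases
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "f1": f1_from_ppv_sensitivity(ppv, sensitivity),
    }


# Hypothetical counts, for illustration only.
print(diagnostic_metrics(tp=90, fp=10, fn=10, tn=890))

# Consistency check against the inter-annotator values quoted in the abstract:
# PPV = 0.972 and sensitivity = 1.0 give an F1 that rounds to 0.986.
print(round(f1_from_ppv_sensitivity(0.972, 1.0), 3))
```

Because F1 ignores true negatives, a method can have high specificity and still a low F1 when it misses most positive cases, which is the pattern reported for ICD-10 codes here.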


Subjects
Biosurveillance , COVID-19 , Physicians , SARS-CoV-2 , United States , Humans , Child , Artificial Intelligence , Retrospective Studies , COVID-19/diagnosis , COVID-19/epidemiology
5.
BMC Med Inform Decis Mak ; 24(1): 296, 2024 Oct 10.
Article in English | MEDLINE | ID: mdl-39390479

ABSTRACT

BACKGROUND: Social and behavioral determinants of health (SBDH) are associated with a variety of health and utilization outcomes, yet these factors are not routinely documented in the structured fields of electronic health records (EHR). The objective of this study was to evaluate different machine learning approaches for detection of SBDH from the unstructured clinical notes in the EHR. METHODS: Latent Semantic Indexing (LSI) was applied to 2,083,180 clinical notes corresponding to 46,146 patients in the MIMIC-III dataset. Using LSI, patients were ranked based on conceptual relevance to a set of keywords (lexicons) pertaining to 15 different SBDH categories. For Generative Pretrained Transformer (GPT) models, API requests were made with a Python script to connect to the OpenAI services in Azure, using gpt-3.5-turbo-1106 and gpt-4-1106-preview models. Prediction of SBDH categories was performed using a logistic regression model that included age, gender, race and SBDH ICD-9 codes. RESULTS: LSI retrieved patients according to 15 SBDH domains, with an overall average PPV ≥ 83%. Using manually curated gold standard (GS) sets for nine SBDH categories, the macro-F1 score of LSI (0.74) was better than ICD-9 (0.71) and GPT-3.5 (0.54), but lower than GPT-4 (0.80). Due to document size limitations, only a subset of the GS cases could be processed by GPT-3.5 (55.8%) and GPT-4 (94.2%), compared to LSI (100%). Using common GS subsets for nine different SBDH categories, the macro-F1 of ICD-9 combined with either LSI (mean 0.88, 95% CI 0.82-0.93), GPT-3.5 (0.86, 0.82-0.91) or GPT-4 (0.88, 0.83-0.94) was not significantly different. After including age, gender, race and ICD-9 in a logistic regression model, the AUC for prediction of six out of the nine SBDH categories was higher for LSI compared to GPT-4.0.
CONCLUSIONS: These results demonstrate that the LSI approach performs comparably to more recent large language models, such as GPT-3.5 and GPT-4.0, when using the same set of documents. Importantly, LSI is robust, deterministic, and does not have document-size limitations or cost implications, which make it more amenable to real-world applications in health systems.


Subjects
Electronic Health Records , Semantics , Social Determinants of Health , Humans , Machine Learning , Male , Female , Adult , Middle Aged
6.
BMC Med Inform Decis Mak ; 24(1): 154, 2024 Jun 04.
Article in English | MEDLINE | ID: mdl-38835009

ABSTRACT

BACKGROUND: Extracting Research Domain Criteria (RDoC) from high-risk populations like those with post-traumatic stress disorder (PTSD) is crucial for positive mental health improvements and policy enhancements. The intricacies of collecting, integrating, and effectively leveraging clinical notes for this purpose introduce complexities. METHODS: In our study, we created a natural language processing (NLP) workflow to analyze electronic medical record (EMR) data and to identify and extract RDoC using a pre-trained transformer-based natural language model, all-mpnet-base-v2. We subsequently built dictionaries from 100,000 clinical notes and analyzed 5.67 million clinical notes from 38,807 PTSD patients from the University of Pittsburgh Medical Center. Subsequently, we showcased the significance of our approach by extracting and visualizing RDoC information in two use cases: (i) across multiple patient populations and (ii) throughout various disease trajectories. RESULTS: The sentence transformer model demonstrated high F1 macro scores across all RDoC domains, achieving the highest performance with a cosine similarity threshold value of 0.3. This ensured an F1 score of at least 80% across all RDoC domains. The study revealed consistent reductions in all six RDoC domains among PTSD patients after psychotherapy. We found that 60.6% of women with PTSD have at least one abnormal instance of the six RDoC domains, compared to 51.3% of men with PTSD, and that 45.1% of women with PTSD have higher levels of sensorimotor disturbances, compared to 41.3% of men. We also found that 57.3% of PTSD patients have at least one abnormal instance of the six RDoC domains based on our records. In addition, veterans had higher rates of abnormalities in the negative and positive valence systems (60% and 51.9% of veterans, respectively) than non-veterans (59.1% and 49.2%, respectively).
The domains following first diagnoses of PTSD were associated with heightened cue reactivity to trauma, suicide, alcohol, and substance consumption. CONCLUSIONS: The findings provide initial insights into RDoC functioning in different populations and disease trajectories. Natural language processing proves valuable for capturing real-time, context dependent RDoC instances from extensive clinical notes.
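The thresholding step described in this abstract (assigning a sentence to an RDoC domain when its cosine similarity reaches 0.3) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the three-dimensional vectors are hypothetical stand-ins for all-mpnet-base-v2 embeddings, and the domain names and function names are assumptions made for the example.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_domains(
    sentence_vec: Sequence[float],
    domain_vecs: Dict[str, Sequence[float]],
    threshold: float = 0.3,  # threshold value reported in the abstract
) -> List[str]:
    """Return the domains whose similarity to the sentence meets the threshold."""
    return sorted(
        name
        for name, vec in domain_vecs.items()
        if cosine_similarity(sentence_vec, vec) >= threshold
    )


# Hypothetical toy embeddings; real sentence embeddings have hundreds of dimensions.
domains = {
    "negative_valence": [1.0, 0.0, 0.0],
    "arousal": [0.0, 1.0, 0.0],
}
sentence = [0.9, 0.2, 0.0]
print(match_domains(sentence, domains))
```

A lower threshold admits more matches per sentence at the cost of precision, which is presumably why the authors tuned it against F1.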


Subjects
Electronic Health Records , Natural Language Processing , Stress Disorders, Post-Traumatic , Humans , Stress Disorders, Post-Traumatic/therapy , Male , Female , Adult , Middle Aged
7.
J Nurs Scholarsh ; 2024 May 13.
Article in English | MEDLINE | ID: mdl-38739091

ABSTRACT

INTRODUCTION: Home healthcare (HHC) enables patients to receive healthcare services within their homes to manage chronic conditions and recover from illnesses. Recent research has identified disparities in HHC based on race or ethnicity. Social determinants of health (SDOH) describe the external factors influencing a patient's health, such as access to care and social support. Individuals from racially or ethnically minoritized communities are known to be disproportionately affected by SDOH. Existing evidence suggests that SDOH are documented in clinical notes. However, no prior study has investigated the documentation of SDOH across individuals from different racial or ethnic backgrounds in the HHC setting. This study aimed to (1) describe frequencies of SDOH documented in clinical notes by race or ethnicity and (2) determine associations between race or ethnicity and SDOH documentation. DESIGN: Retrospective data analysis. METHODS: We conducted a cross-sectional secondary data analysis of 86,866 HHC episodes representing 65,693 unique patients from one large HHC agency in New York collected between January 1, 2015, and December 31, 2017. We reported the frequency of six SDOH (physical environment, social environment, housing and economic circumstances, food insecurity, access to care, and education and literacy) documented in clinical notes across individuals reported as Asian/Pacific Islander, Black, Hispanic, multi-racial, Native American, or White. We analyzed differences in SDOH documentation by race or ethnicity using logistic regression models. RESULTS: Compared to patients reported as White, patients across other racial or ethnic groups had higher frequencies of SDOH documented in their clinical notes. Our results suggest that race or ethnicity is associated with SDOH documentation in HHC. 
CONCLUSION: As the study of SDOH in HHC continues to evolve, our results provide a foundation to evaluate social information in the HHC setting and understand how it influences the quality of care provided. CLINICAL RELEVANCE: The results of this exploratory study can help clinicians understand the differences in SDOH across individuals from different racial and ethnic groups and serve as a foundation for future research aimed at fostering more inclusive HHC documentation practices.

8.
J Gen Intern Med ; 38(9): 2123-2129, 2023 07.
Article in English | MEDLINE | ID: mdl-36854867

ABSTRACT

BACKGROUND: Ambulatory diagnostic errors are increasingly being recognized as an important quality and safety issue, and while measures of diagnostic quality have been sought, tools to evaluate diagnostic assessments in the medical record are lacking. OBJECTIVE: To develop and test a tool to measure diagnostic assessment note quality in primary care urgent encounters and identify common elements and areas for improvement in diagnostic assessment. DESIGN: Retrospective chart review of urgent care encounters at an urban academic setting. PARTICIPANTS: Primary care physicians. MAIN MEASURES: The Assessing the Assessment (ATA) instrument was evaluated for inter-rater reliability, internal consistency, and findings from its application to EHR notes. KEY RESULTS: ATA had reasonable performance characteristics (kappa 0.63, overall Cronbach's alpha 0.76). Variability in diagnostic assessment was seen in several domains. Two components of situational awareness tended to be well-documented ("Don't miss diagnoses" present in 84% of charts, red flag symptoms in 87%), while Psychosocial context was present only 18% of the time. CONCLUSIONS: The ATA tool is a promising framework for assessing and identifying areas for improvement in diagnostic assessments documented in clinical encounters.


Subjects
Ambulatory Care , Medical Records , Humans , Reproducibility of Results , Retrospective Studies , Diagnostic Errors/prevention & control
9.
Methods ; 205: 97-105, 2022 09.
Article in English | MEDLINE | ID: mdl-35781051

ABSTRACT

The International Classification of Diseases (ICD), which is endorsed by the World Health Organization, is a diagnostic classification standard. ICD codes store, retrieve, and analyze health information to make clinical decisions. Currently, ICD coding has been adopted by more than 137 countries. However, in Pakistan, very few hospitals have implemented ICD coding and conducted different epidemiological studies. Moreover, none of them have reported the spectrum of liver disease burden based on ICD coding, nor implemented automated ICD coding. In this study, we annotated ICD codes for the database of the liver transplant unit of the Pir Abdul Qadir Shah Jeelani Institute of Medical Sciences. We named this database Medical Information Mart for Liver Transplantation (MIMLT). The results revealed that the database contains 34 ICD codes, of which V70.8 is the most frequent code. Furthermore, we determined the spectrum of liver disease burden in liver recipients based on ICD coding. We found that chronic hepatitis C (070.54) is the most frequent indication for liver transplantation. Additionally, we implemented automated ICD coding utilizing the MIMLT database and proposed a novel Deep Recurrent Convolutional Neural Network with Transfer Learning through pre-trained Embeddings (DRCNNTLe) model, which is an extended version of our DRCNN-HP model. DRCNNTLe extracts robust text representations from its pre-trained embedding layer, which is trained on a large domain-specific MIMIC III database corpus. The results indicate that utilizing pre-trained word embeddings, which are trained on large domain-specific corpora can significantly improve the performance of the DRCNNTLe model and provide state-of-the-art results when the target database is small.


Subjects
Clinical Coding , Electronic Health Records , International Classification of Diseases , Machine Learning , Neural Networks, Computer
10.
J Biomed Inform ; 146: 104482, 2023 10.
Article in English | MEDLINE | ID: mdl-37652343

ABSTRACT

OBJECTIVE: Computer-assisted diagnostic and prognostic systems of the future should be capable of simultaneously processing multimodal data. Multimodal deep learning (MDL), which involves the integration of multiple sources of data, such as images and text, has the potential to revolutionize the analysis and interpretation of biomedical data. However, it only caught researchers' attention recently. To this end, there is a critical need to conduct a systematic review on this topic, identify the limitations of current work, and explore future directions. METHODS: In this scoping review, we aim to provide a comprehensive overview of the current state of the field and identify key concepts, types of studies, and research gaps with a focus on biomedical images and texts joint learning, mainly because these two were the most commonly available data types in MDL research. RESULT: This study reviewed the current uses of multimodal deep learning on five tasks: (1) Report generation, (2) Visual question answering, (3) Cross-modal retrieval, (4) Computer-aided diagnosis, and (5) Semantic segmentation. CONCLUSION: Our results highlight the diverse applications and potential of MDL and suggest directions for future research in the field. We hope our review will facilitate the collaboration of natural language processing (NLP) and medical imaging communities and support the next generation of decision-making and computer-assisted diagnostic system development.


Subjects
Deep Learning , Diagnostic Imaging , Semantics , Natural Language Processing , Diagnosis, Computer-Assisted
11.
J Biomed Inform ; 140: 104336, 2023 04.
Article in English | MEDLINE | ID: mdl-36958461

ABSTRACT

A clinical sentiment is a judgment, thought or attitude prompted by an observation with respect to the health of an individual. Sentiment analysis has drawn attention in the healthcare domain for secondary use of data from clinical narratives, with a variety of applications including predicting the likelihood of emerging mental illnesses or clinical outcomes. The current state of research has not yet been summarized. This study presents results from a scoping review that provides an overview of sentiment analysis of clinical narratives in order to summarize existing research and identify open research gaps. The scoping review was carried out in line with the PRISMA-ScR (Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews) guideline. Studies were identified by searching 4 electronic databases (e.g., PubMed, IEEE Xplore) in addition to conducting backward and forward reference list checking of the included studies. We extracted information on use cases, methods and tools applied, datasets used, and performance of the sentiment analysis approach. Of 1,200 citations retrieved, 29 unique studies were included in the review, covering a period of 8 years. Most studies apply general domain tools (e.g., TextBlob) and sentiment lexicons (e.g., SentiWordNet) for realizing use cases such as prediction of clinical outcomes; others proposed new domain-specific sentiment analysis approaches based on machine learning. Accuracy values between 71.5% and 88.2% are reported. Data used for evaluation and testing are often retrieved from MIMIC databases or i2b2 challenges. The latest developments related to artificial neural networks are not yet fully considered in this domain. We conclude that future research should focus on developing a gold standard sentiment lexicon, adapted to the specific characteristics of clinical narratives. Efforts have to be made to either augment existing or create new high-quality labeled data sets of clinical narratives.
Last, the suitability of state-of-the-art machine learning methods for natural language processing and in particular transformer-based models should be investigated for their application for sentiment analysis of clinical narratives.


Subjects
Mental Disorders , Sentiment Analysis , Humans , Algorithms , Attitude , Narration
12.
J Intensive Care Med ; 38(7): 630-634, 2023 Jul.
Article in English | MEDLINE | ID: mdl-36740933

ABSTRACT

BACKGROUND: Using History and Physical Examination (H&P) notes, we investigated potential racial differences in documented chief complaints and problems among sepsis patients admitted to the intensive care unit. METHODS: Patient records from Medical Information Mart for Intensive Care (MIMIC-III) dataset indicating a diagnosis of sepsis were included. First recorded clinical notes for each hospital admission were assessed; free text information was specifically extracted on (1) chief complaints, and (2) problems recorded in the Assessment & Plan (A&P) section. The top 10 for each were compared between Black and White patients. RESULTS: In initial H&P notes of 17 434 sepsis patients (n = 1229 Black and n = 9806 White), the top 10 most common chief complaints were somewhat similar between Black and White patients. However, relative differences existed in terms of ranking, specifically for altered mental status which was more commonly reported in Black versus White patients (11.7% vs 7.8% P < .001). Among text in the A&P, sepsis was documented significantly less frequently among Black versus White patients: 11.8% versus 14.3%, P = .001. Racial differences were not detected in vital signs and laboratory values. CONCLUSIONS: This analysis supports the hypothesis that there may be racial differences in early sepsis presentation and possible provider interpretation of these complaints.


Subjects
Healthcare Disparities , Sepsis , Humans , Hospitalization , Racial Groups , Retrospective Studies , Sepsis/diagnosis , White , Black or African American
13.
BMC Med Inform Decis Mak ; 23(1): 86, 2023 05 05.
Article in English | MEDLINE | ID: mdl-37147628

ABSTRACT

BACKGROUND: Computational text phenotyping is the practice of identifying patients with certain disorders and traits from clinical notes. Rare diseases are challenging to identify because few cases are available for machine learning and data annotation requires domain experts. METHODS: We propose a method using ontologies and weak supervision, with recent pre-trained contextual representations from Bi-directional Transformers (e.g., BERT). The ontology-driven framework includes two steps: (i) Text-to-UMLS, extracting phenotypes by contextually linking mentions to concepts in Unified Medical Language System (UMLS), with a Named Entity Recognition and Linking (NER+L) tool, SemEHR, and weak supervision with customised rules and contextual mention representation; (ii) UMLS-to-ORDO, matching UMLS concepts to rare diseases in Orphanet Rare Disease Ontology (ORDO). The weakly supervised approach is proposed to learn a phenotype confirmation model to improve Text-to-UMLS linking, without annotated data from domain experts. We evaluated the approach on three annotated clinical datasets from two institutions in the US and the UK: MIMIC-III discharge summaries, MIMIC-III radiology reports, and NHS Tayside brain imaging reports. RESULTS: The improvements in precision were pronounced (by over 30% to 50% absolute score for Text-to-UMLS linking), with almost no loss of recall compared to the existing NER+L tool, SemEHR. Results on radiology reports from MIMIC-III and NHS Tayside were consistent with the discharge summaries. The overall pipeline processing clinical notes can extract rare disease cases, mostly uncaptured in structured data (manually assigned ICD codes). CONCLUSION: The study provides empirical evidence for the task by applying a weakly supervised NLP pipeline on clinical notes.
The proposed weakly supervised deep learning approach requires no human annotation except for validation and testing, by leveraging ontologies, NER+L tools, and contextual representations. The study also demonstrates that Natural Language Processing (NLP) can complement traditional ICD-based approaches to better estimate rare diseases in clinical notes. We discuss the usefulness and limitations of the weak supervision approach and propose directions for future studies.


Subjects
Natural Language Processing , Rare Diseases , Humans , Rare Diseases/diagnosis , Machine Learning , Unified Medical Language System , International Classification of Diseases
14.
J Biomed Inform ; 126: 103969, 2022 02.
Article in English | MEDLINE | ID: mdl-34864210

ABSTRACT

With clinical trials unable to detect all potential adverse reactions to drugs and medical devices prior to their release into the market, accurate post-market surveillance is critical to ensure their safety and efficacy. Electronic health records (EHR) contain rich observational patient data, making them a valuable source to actively monitor the safety of drugs and devices. While structured EHR data and spontaneous reporting systems often underreport the complexities of patient encounters and outcomes, free-text clinical notes offer greater detail about a patient's status. Previous studies have proposed machine learning methods to detect adverse events from clinical notes, but suffer from manually extracted features, reliance on costly hand-labeled data, and lack of validation on external datasets. To address these challenges, we develop a weakly-supervised machine learning framework for adverse event detection from unstructured clinical notes and evaluate it on insulin pump failure as a test case. Our model accurately detected cases of pump failure with 0.842 PR AUC on the holdout test set and 0.815 PR AUC when validated on an external dataset. Our approach allowed us to leverage a large dataset with far less hand-labeled data and can be easily transferred to additional adverse events for scalable post-market surveillance.
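PR AUC, the metric reported in this abstract, summarizes the precision-recall curve and is often preferred when positive cases (here, pump failures) are rare. One common estimator is average precision, sketched below in pure Python. This is an assumption about how such a score can be computed: the abstract does not say which PR AUC estimator the authors used, and the function name and toy data are illustrative.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: the mean of the precision values measured at the
    rank of each positive example, with examples sorted by descending score.

    Assumes binary labels (1 = event, 0 = no event). Tied scores are ordered
    arbitrarily by the sort, which can change the result slightly.
    """
    n_positive = sum(labels)
    if n_positive == 0:
        raise ValueError("average precision is undefined without positive labels")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            total += true_positives / rank  # precision at this recall step
    return total / n_positive


# Toy example: positives ranked 1st and 3rd among four notes.
print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6]))
```

Unlike ROC AUC, this score is sensitive to the share of positives in the data, so values from datasets with different event rates are not directly comparable.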


Subjects
Electronic Health Records , Machine Learning , Humans
15.
J Biomed Inform ; 134: 104195, 2022 10.
Artigo em Inglês | MEDLINE | ID: mdl-36150641

RESUMO

BACKGROUND: Electronic Health Records (EHRs) aggregate diverse information at the patient level, holding a trajectory representative of the evolution of the patient's health status over time. Although this information provides context and can be leveraged by physicians to monitor patient health and make more accurate prognoses/diagnoses, patient records can contain information from very long time spans, which, combined with the rapid generation rate of medical data, makes clinical decision making more complex. Patient trajectory modelling can assist by exploring existing information in a scalable manner, and can contribute to augmenting health care quality by fostering preventive medicine practices (e.g. earlier disease diagnosis). METHODS: We propose a solution to model patient trajectories that combines different types of information (e.g. clinical text, standard codes) and considers the temporal aspect of clinical data. This solution leverages two different architectures: one supporting flexible sets of input features, to convert patient admissions into dense representations; and a second exploring extracted admission representations in a recurrent-based architecture, where patient trajectories are processed in sub-sequences using a sliding window mechanism. RESULTS: The developed solution was evaluated on two different clinical outcomes, unexpected patient readmission and disease progression, using the publicly available Medical Information Mart for Intensive Care (MIMIC)-III clinical database. The results obtained demonstrate the potential of the first architecture to model readmission and diagnosis prediction using single patient admissions. While information from clinical text did not show the discriminative power observed in other existing works, this may be explained by the need to fine-tune the clinicalBERT model. Finally, we demonstrate the potential of the sequence-based architecture using a sliding window mechanism to represent the input data, attaining performance comparable to that of other existing solutions. CONCLUSION: Herein, we explored DL-based techniques to model patient trajectories and propose two flexible architectures that explore patient admissions on an individual and sequence basis. The combination of clinical text with other types of information led to positive results, which can be further improved by including a fine-tuned version of clinicalBERT in the architectures. The proposed solution can be publicly accessed at https://github.com/bioinformatics-ua/PatientTM.
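
The sliding window mechanism mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the window size, stride, or handling of short trajectories, so the function name `sliding_windows`, its parameters, and the rule of keeping short trajectories whole are assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sliding_windows(trajectory: Sequence[T], window: int, stride: int = 1) -> List[List[T]]:
    """Split one patient's ordered admissions into overlapping sub-sequences.

    Assumption: trajectories no longer than `window` are kept as one sub-sequence.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if not trajectory:
        return []
    if len(trajectory) <= window:
        return [list(trajectory)]
    last_start = len(trajectory) - window
    return [list(trajectory[i:i + window]) for i in range(0, last_start + 1, stride)]


# Hypothetical admission identifiers for a single patient, oldest first.
admissions = ["adm1", "adm2", "adm3", "adm4", "adm5"]
windows = sliding_windows(admissions, window=3)
for sub_sequence in windows:
    print(sub_sequence)
```

Each sub-sequence could then be passed to a recurrent model as one input sample; how the published system batches or pads these windows is not stated in the abstract.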


Subjects
Patient Readmission , Physicians , Disease Progression , Electronic Health Records , Humans , Prognosis
16.
Soc Psychiatry Psychiatr Epidemiol ; 57(9): 1897-1906, 2022 Sep.
Article in English | MEDLINE | ID: mdl-35445841

ABSTRACT

PURPOSE: Estimates of parenthood in individuals with psychosis range from 27 to 63%. This number has likely increased due to the introduction of newer anti-psychotics and shorter hospital stays. The problems of psychosis can affect patients' capacity to offer the consistent, responsive care required for healthy child development. The following research questions were assessed: (1) what proportion of these patients have their children correctly recorded in their clinical notes, (2) what proportion of patients in secondary care with a psychotic diagnosis have children, and (3) what sociodemographic characteristics are associated with parenthood in this population. METHODS: This study used CRIS (Clinical Record Interactive Search) to search for patients with a diagnosis of non-affective or affective psychosis (F20-29, F31.2 or F31.5) within a UK NHS Trust. A binomial regression model was fitted to identify the variables associated with parenthood. RESULTS: Fewer than half of the parents in the sample had their children recorded in the correct field in their clinical notes. Of 5173 patients with psychosis, 2006 (38.8%) were parents. Characteristics associated with parenthood included being female, older age, higher socioeconomic status, renting or owning, having ever been married, being unemployed, not being White (British) and not having a diagnosis of schizophrenia. CONCLUSION: Over one-third of patients with psychosis were parents, and the study indicates that not all NHS Trusts are recording dependants accurately. Many variables were strongly associated with parenthood and these findings may help target interventions for this population.


Subjects
Psychotic Disorders , Schizophrenia , Child , Cross-Sectional Studies , Female , Humans , Male , Psychotic Disorders/diagnosis , Psychotic Disorders/epidemiology , Social Class , Unemployment
17.
J Med Syst ; 46(12): 96, 2022 Nov 16.
Article in English | MEDLINE | ID: mdl-36380246

ABSTRACT

Petabytes of health data are collected annually across the globe in electronic health records (EHR), including significant information stored as unstructured free text. However, the lack of effective mechanisms to securely share clinical text has inhibited its full utilization. We propose a new method, DataSifterText, to generate partially synthetic clinical free-text that can be safely shared between stakeholders (e.g., clinicians, STEM researchers, engineers, analysts, and healthcare providers), limiting the re-identification risk while providing significantly better utility preservation than suppressing or generalizing sensitive tokens. The method creates partially synthetic free-text data, which inherits the joint population distribution of the original data, and disguises the location of true and obfuscated words. Under certain obfuscation levels, the resulting synthetic text was sufficiently altered with different choices, orders, and frequencies of words compared to the original records. The differences were comparable to machine-generated (fully synthetic) text reported in previous studies. We applied DataSifterText to two medical case studies. In the CDC work injury application, using privacy protection, 60.9-86.5% of the synthetic descriptions belong to the same cluster as the original descriptions, demonstrating better utility preservation than the naïve content suppressing method (45.8-85.7%). In the MIMIC III application, the generated synthetic data maintained over 80% of the original information regarding patients' overall health conditions. The reported DataSifterText statistical obfuscation results indicate that the technique provides sufficient privacy protection (low identification risk) while preserving population-level information (high utility).


Subjects
Electronic Health Records , Privacy , Humans
18.
Methods ; 173: 75-82, 2020 02 15.
Article in English | MEDLINE | ID: mdl-31301375

ABSTRACT

The wide application of automatic disease inference in many medical fields improves the efficiency of medical treatment. Many efforts have been made to predict patients' future health conditions from their full clinical texts, clinical measurements, or medical codes. Symptoms reflect the onset of diseases and can provide credible information for disease diagnosis. In this study, we propose a new disease inference method that extracts symptoms and integrates two symptom representation approaches. To reduce the uncertainty and irregularity of symptom descriptions in Electronic Medical Records (EMR), a comprehensive clinical knowledge database consisting of a massive amount of data about diseases, symptoms, and their relationships, we extract symptoms with the existing natural language processing tool MetaMap, which is designed for biomedical texts. To take advantage of the complex relationships between symptoms and diseases and thereby enhance the accuracy of disease inference, we present two symptom representation models: a term frequency-inverse document frequency (TF-IDF) model to represent the relationship between symptoms and diseases, and Word2Vec to express the semantic relationships between symptoms. Based on these two symptom representations, we employ bidirectional Long Short-Term Memory networks (BiLSTMs) to model symptom sequences in EMR. Our proposed model shows a significant improvement in terms of AUC (0.895) and F1 (0.572) for 50 diseases in the MIMIC-III dataset. The results illustrate that the model combining the two symptom representations performs better than a model using only one of them.
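
As a rough illustration of how TF-IDF can weight symptoms against diseases, the sketch below uses the textbook formulation (relative term frequency times the logarithm of inverse document frequency). The abstract does not state the exact weighting variant, so this formula, the grouping of symptoms by disease, and the toy symptom lists are all assumptions; in the study the symptoms are extracted with MetaMap, which is not reproduced here.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Weight each symptom per disease: tf = count / length, idf = ln(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many diseases does each symptom occur at all?
    df = Counter()
    for symptoms in docs.values():
        df.update(set(symptoms))
    weights: Dict[str, Dict[str, float]] = {}
    for disease, symptoms in docs.items():
        counts = Counter(symptoms)
        total = len(symptoms)
        weights[disease] = {
            symptom: (count / total) * math.log(n_docs / df[symptom])
            for symptom, count in counts.items()
        }
    return weights


# Toy, hand-written symptom lists (hypothetical data, not taken from MIMIC-III).
records = {
    "pneumonia": ["fever", "cough", "cough", "dyspnea"],
    "asthma": ["wheezing", "cough", "dyspnea"],
    "influenza": ["fever", "cough", "myalgia"],
}
weights = tfidf(records)
for disease, symptom_weights in weights.items():
    print(disease, symptom_weights)
```

In this toy data "cough" occurs under every disease, so its inverse document frequency is ln(1) = 0 and it carries no weight, whereas "wheezing" occurs under only one disease and receives the largest weight for asthma.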


Subjects
Electronic Health Records , Memory, Short-Term/physiology , Neural Networks, Computer , Algorithms , Humans , Natural Language Processing , Semantics
19.
J Biomed Inform ; 117: 103754, 2021 05.
Article in English | MEDLINE | ID: mdl-33831537

ABSTRACT

Respiratory diseases, including asthma, bronchitis, pneumonia, and upper respiratory tract infection (RTI), are among the most common diseases in clinics. The similarities among the symptoms of these diseases preclude prompt diagnosis upon the patients' arrival. In pediatrics, the patients' limited ability to express their situation makes precise diagnosis even harder. This becomes worse in primary hospitals, where the lack of medical imaging devices and the doctors' limited experience further increase the difficulty of distinguishing among similar diseases. In this paper, a pediatric fine-grained diagnosis-assistant system is proposed to provide prompt and precise diagnosis using solely clinical notes upon admission, which would assist clinicians without changing the diagnostic process. The proposed system consists of two stages: a test result structuralization stage and a disease identification stage. The first stage structuralizes test results by extracting relevant numerical values from clinical notes, and the disease identification stage provides a diagnosis based on text-form clinical notes and the structured data obtained from the first stage. A novel deep learning algorithm was developed for the disease identification stage, where techniques including adaptive feature infusion and multi-modal attentive fusion were introduced to fuse structured and text data together. Clinical notes from over 12000 patients with respiratory diseases were used to train a deep learning model, and clinical notes from a non-overlapping set of about 1800 patients were used to evaluate the performance of the trained model. The average precisions (AP) for pneumonia, RTI, bronchitis and asthma are 0.878, 0.857, 0.714, and 0.825, respectively, achieving a mean AP (mAP) of 0.819. These results demonstrate that our proposed fine-grained diagnosis-assistant system provides precise identification of the diseases.
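
The reported mAP can be checked directly from the four per-disease AP values given in the abstract. The short sketch below assumes an unweighted mean and half-up rounding to three decimals; it uses `decimal.Decimal` so that the rounding is not affected by binary floating-point representation.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-disease average precision values as reported in the abstract.
ap = {
    "pneumonia": Decimal("0.878"),
    "RTI": Decimal("0.857"),
    "bronchitis": Decimal("0.714"),
    "asthma": Decimal("0.825"),
}

# Unweighted mean over the four classes (an assumption about how mAP was formed).
mean_ap = sum(ap.values(), Decimal("0")) / len(ap)
rounded = mean_ap.quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)
print(mean_ap, rounded)
```

The exact mean is 3.274 / 4 = 0.8185, which rounds half-up to the reported 0.819.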


Subjects
Deep Learning , Algorithms , Child , Hospitalization , Humans
20.
J Biomed Inform ; 120: 103849, 2021 08.
Article in English | MEDLINE | ID: mdl-34214696

ABSTRACT

BACKGROUND: The content of the clinical notes that have been continuously collected throughout patients' health history has the potential to provide relevant information about treatments and diseases, and to increase the value of structured data available in Electronic Health Records (EHR) databases. EHR databases are currently being used in observational studies which lead to important findings in medical and biomedical sciences. However, the information present in clinical notes is not being used in those studies, since the computational analysis of this unstructured data is much more complex than that of structured data. METHODS: We propose a two-stage workflow for solving an existing gap in Extraction, Transformation and Loading (ETL) procedures regarding observational databases. The first stage of the workflow extracts prescriptions present in patients' clinical notes, while the second stage harmonises the extracted information into their standard definition and stores the resulting information in a common database schema used in observational studies. RESULTS: We validated this methodology using two distinct data sets, in which the goal was to extract and store drug-related information in a new Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) database. We analysed the performance of the annotator used as well as its limitations. Finally, we described some practical examples of how users can explore these datasets once migrated to OMOP CDM databases. CONCLUSION: With this methodology, we were able to show a strategy for using the information extracted from the clinical notes in business intelligence tools, or for other applications such as data exploration through the use of SQL queries. In addition, the extracted information complements the data present in OMOP CDM databases with information that was not directly available in the EHR database.
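
The two-stage workflow (extract prescriptions from notes, then harmonise and store them for SQL exploration) can be sketched as follows. This is a toy illustration under loud assumptions: the regular expression stands in for the annotator used in the study, the concept identifiers in `DRUG_CONCEPTS` are invented rather than real OMOP concept ids, and the table `drug_exposure_demo` is a simplified stand-in, not the actual OMOP CDM schema.

```python
import re
import sqlite3
from typing import List, Tuple

# Hypothetical vocabulary: drug name -> illustrative concept id (NOT real OMOP ids).
DRUG_CONCEPTS = {"metformin": 1001, "insulin": 1002}

# Stage 1 stand-in: a naive pattern for "<drug> <dose> <unit>" mentions.
PRESCRIPTION = re.compile(
    r"\b(?P<drug>[A-Za-z]+)\s+(?P<dose>\d+(?:\.\d+)?)\s*(?P<unit>mg|g|units)\b",
    re.IGNORECASE,
)


def extract_prescriptions(note: str) -> List[Tuple[int, float, str]]:
    """Return (concept_id, dose, unit) for every known drug mentioned in a note."""
    found = []
    for match in PRESCRIPTION.finditer(note):
        drug = match.group("drug").lower()
        if drug in DRUG_CONCEPTS:  # Stage 2a: map the mention to a standard concept.
            found.append((DRUG_CONCEPTS[drug], float(match.group("dose")), match.group("unit").lower()))
    return found


# Stage 2b: store harmonised rows in a simplified, hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE drug_exposure_demo (person_id INTEGER, drug_concept_id INTEGER, dose REAL, unit TEXT)"
)
notes = {
    1: "Started metformin 500 mg daily and insulin 10 units at night.",
    2: "Continue metformin 850 mg.",
}
for person_id, note in notes.items():
    rows = [(person_id, *item) for item in extract_prescriptions(note)]
    conn.executemany("INSERT INTO drug_exposure_demo VALUES (?, ?, ?, ?)", rows)

# Example exploration query: number of distinct patients per drug concept.
per_drug = conn.execute(
    "SELECT drug_concept_id, COUNT(DISTINCT person_id) FROM drug_exposure_demo "
    "GROUP BY drug_concept_id ORDER BY drug_concept_id"
).fetchall()
print(per_drug)
```

A real pipeline would replace the regular expression with a clinical annotator and the dictionary with lookups against a standard vocabulary; the final query only hints at the kind of SQL-based exploration the abstract describes.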


Subjects
Electronic Health Records , Pharmaceutical Preparations , Databases, Factual , Delivery of Health Care , Humans , Workflow