1.
J Biomed Inform ; 147: 104507, 2023 11.
Article in English | MEDLINE | ID: mdl-37778672

ABSTRACT

BACKGROUND: Although accurate identification of gender identity in the electronic health record (EHR) is crucial for providing equitable health care, particularly for transgender and gender diverse (TGD) populations, it remains a challenging task due to incomplete gender information in structured EHR fields. OBJECTIVE: Using TGD identification as a case study, this research uses NLP and deep learning to build an accurate patient gender identity predictive model, aiming to tackle the challenges of identifying relevant patient-level information from EHR data and reducing annotation work. METHODS: This study included adult patients in a large healthcare system in Boston, MA, between 4/1/2017 and 4/1/2022. To identify relevant information from a massive volume of clinical notes, we compiled a list of gender-related keywords through expert curation, literature review, and expansion via a fine-tuned BioWordVec model. This keyword list was used to pre-screen potential TGD individuals and create two datasets for model training, testing, and validation. Dataset I was a balanced dataset that contained clinician-confirmed TGD patients and cases without keywords. Dataset II contained cases with keywords. The performance of the deep learning model was compared to traditional machine learning and rule-based algorithms. RESULTS: The final keyword list consists of 109 keywords, of which 58 (53.2%) were expanded by the BioWordVec model. Dataset I contained 3,150 patients (50% TGD) while Dataset II contained 200 patients (90% TGD). On Dataset I the deep learning model achieved an F1 score of 0.917, sensitivity of 0.854, and precision of 0.980; and on Dataset II an F1 score of 0.969, sensitivity of 0.967, and precision of 0.972. The deep learning model significantly outperformed rule-based algorithms. CONCLUSION: This is the first study to show that deep learning-integrated NLP algorithms can accurately identify gender identity using EHR data. 
Future work should leverage and evaluate additional diverse data sources to generate more generalizable algorithms.
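
As a reading aid for the metrics reported above, the following minimal sketch computes the F1 score as the harmonic mean of precision and sensitivity. The function name `f1_score` is hypothetical and is not taken from the study; the input values are the Dataset II figures quoted in the abstract.

```python
def f1_score(precision: float, sensitivity: float) -> float:
    """Return the harmonic mean of precision and sensitivity (recall)."""
    if precision + sensitivity == 0:
        # Both metrics are zero, so the F1 score is defined as zero here.
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


# Dataset II values as quoted in the abstract.
dataset_ii_f1 = f1_score(precision=0.972, sensitivity=0.967)
print(round(dataset_ii_f1, 3))  # prints 0.969
```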


Subjects
Deep Learning , Transgender Persons , Adult , Humans , Male , Female , Gender Identity , Electronic Health Records , Algorithms
2.
J Med Internet Res ; 25: e45419, 2023 03 14.
Article in English | MEDLINE | ID: mdl-36812402

ABSTRACT

BACKGROUND: For an emergent pandemic, such as COVID-19, the statistics of symptoms based on hospital data may be biased or delayed due to the high proportion of asymptomatic or mild-symptom infections that are not recorded in hospitals. Meanwhile, the difficulty in accessing large-scale clinical data also limits many researchers from conducting timely research. OBJECTIVE: Given the wide coverage and promptness of social media, this study aimed to present an efficient workflow to track and visualize the dynamic characteristics and co-occurrence of symptoms for the COVID-19 pandemic from large-scale and long-term social media data. METHODS: This retrospective study included 471,553,966 COVID-19-related tweets from February 1, 2020, to April 30, 2022. We curated a hierarchical symptom lexicon for social media containing 10 affected organs/systems, 257 symptoms, and 1808 synonyms. The dynamic characteristics of COVID-19 symptoms over time were analyzed from the perspectives of weekly new cases, overall distribution, and temporal prevalence of reported symptoms. The symptom evolutions between virus strains (Delta and Omicron) were investigated by comparing the symptom prevalence during their dominant periods. A co-occurrence symptom network was developed and visualized to investigate inner relationships among symptoms and affected body systems. RESULTS: This study identified 201 COVID-19 symptoms and grouped them into 10 affected body systems. There was a significant correlation between the weekly quantity of self-reported symptoms and new COVID-19 infections (Pearson correlation coefficient=0.8528; P<.001). We also observed a 1-week leading trend (Pearson correlation coefficient=0.8802; P<.001) between them. The frequency of symptoms showed dynamic changes as the pandemic progressed, from typical respiratory symptoms in the early stage to more musculoskeletal and nervous symptoms in the later stages. 
We identified the difference in symptoms between the Delta and Omicron periods. There were fewer severe symptoms (coma and dyspnea), more flu-like symptoms (throat pain and nasal congestion), and fewer typical COVID symptoms (anosmia and taste altered) in the Omicron period than in the Delta period (all P<.001). Network analysis revealed co-occurrences among symptoms and systems corresponding to specific disease progressions, including palpitations (cardiovascular) and dyspnea (respiratory), and alopecia (musculoskeletal) and impotence (reproductive). CONCLUSIONS: This study identified more and milder COVID-19 symptoms than clinical research and characterized the dynamic symptom evolution based on 400 million tweets over 27 months. The symptom network revealed potential comorbidity risk and prognostic disease progression. These findings demonstrate that the cooperation of social media and a well-designed workflow can depict a holistic picture of pandemic symptoms to complement clinical studies.
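
The lagged correlation described in the results can be illustrated with a small sketch. It assumes that the self-reported symptom series is the one that leads the case series, which the abstract does not state explicitly; the function names and the toy weekly counts are hypothetical and are not the study's data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_pearson(leading, following, lag):
    """Correlate leading[t] with following[t + lag] for a lag in weeks."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(leading, following)
    return pearson(leading[:-lag], following[lag:])


# Toy weekly counts (hypothetical): cases repeat the symptom series one week later.
weekly_symptoms = [1, 3, 2, 5, 4, 6]
weekly_cases = [0, 1, 3, 2, 5, 4]

same_week = lagged_pearson(weekly_symptoms, weekly_cases, lag=0)
one_week_lead = lagged_pearson(weekly_symptoms, weekly_cases, lag=1)
print(same_week, one_week_lead)
```

In this toy series the one-week lag yields the higher coefficient, mirroring the direction of the comparison reported in the abstract.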


Subjects
COVID-19 , Social Media , Humans , COVID-19/epidemiology , SARS-CoV-2 , Pandemics , Retrospective Studies , Infodemiology
3.
J Med Internet Res ; 24(10): e39676, 2022 10 13.
Article in English | MEDLINE | ID: mdl-36191167

ABSTRACT

BACKGROUND: The COVID-19 pandemic and its corresponding preventive and control measures have increased the mental burden on the public. Understanding and tracking changes in public mental status can facilitate optimizing public mental health intervention and control strategies. OBJECTIVE: This study aimed to build a social media-based pipeline that tracks public mental changes and use it to understand public mental health status regarding the pandemic. METHODS: This study used COVID-19-related tweets posted from February 2020 to April 2022. The tweets were downloaded using unique identifiers through the Twitter application programming interface. We created a lexicon of 4 mental health problems (depression, anxiety, insomnia, and addiction) to identify mental health-related tweets and developed a dictionary for identifying health care workers. We analyzed temporal and geographic distributions of public mental health status during the pandemic and further compared distributions among health care workers versus the general public, supplemented by topic modeling on their underlying foci. Finally, we used interrupted time series analysis to examine the statewide impact of a lockdown policy on public mental health in 12 states. RESULTS: We extracted 4,213,005 tweets related to mental health and COVID-19 from 2,316,817 users. Of these tweets, 2,161,357 (51.3%) were related to "depression," whereas 1,923,635 (45.66%), 225,205 (5.35%), and 150,006 (3.56%) were related to "anxiety," "insomnia," and "addiction," respectively. Compared to the general public, health care workers had higher risks of all 4 types of problems (all P<.001), and they were more concerned about clinical topics than about everyday issues (eg, "students' pressure," "panic buying," and "fuel problems") relative to the general public. 
Finally, the lockdown policy had significant associations with public mental health in 4 out of the 12 states we studied, among which Pennsylvania showed a positive association, whereas Michigan, North Carolina, and Ohio showed the opposite (all P<.05). CONCLUSIONS: The impact of COVID-19 and the corresponding control measures on the public's mental status is dynamic and shows variability among different cohorts regarding disease types, occupations, and regional groups. Health agencies and policy makers should primarily focus on depression (reported by 51.3% of the tweets) and insomnia (which has had an ever-increasing trend since the beginning of the pandemic), especially among health care workers. Our pipeline tracks and analyzes public mental health changes in a timely manner, especially when primary studies and large-scale surveys are difficult to conduct.


Subjects
COVID-19 , Sleep Initiation and Maintenance Disorders , Social Media , COVID-19/epidemiology , COVID-19/prevention & control , Communicable Disease Control , Humans , Infodemiology , Mental Health , Pandemics/prevention & control , Policy
4.
Res Sq ; 2024 Mar 21.
Article in English | MEDLINE | ID: mdl-38562731

ABSTRACT

Early and accurate diagnosis is crucial for effective treatment and improved outcomes, yet identifying psychotic episodes presents significant challenges due to the complex nature of psychosis and the varied presentation of symptoms among individuals. One of the primary difficulties lies in the underreporting and underdiagnosis of psychosis, compounded by the stigma surrounding mental health and the individuals' often diminished insight into their condition. Existing efforts leveraging Electronic Health Records (EHRs) to retrospectively identify psychosis typically rely on structured data, such as medical codes and patient demographics, which frequently lack essential information. Addressing these challenges, our study leverages Natural Language Processing (NLP) algorithms to analyze psychiatric admission notes for the diagnosis of psychosis, providing a detailed evaluation of rule-based algorithms, machine learning models, and pre-trained language models. Additionally, the study investigates the effectiveness of employing keywords to streamline extensive note data before training and evaluating the models. Analyzing 4,617 initial psychiatric admission notes (1,196 cases of psychosis versus 3,433 controls) from 2005 to 2019, we discovered that the XGBoost classifier employing Term Frequency-Inverse Document Frequency (TF-IDF) features derived from notes pre-selected by expert-curated keywords attained the highest performance, with an F1 score of 0.8881 (AUROC [95% CI]: 0.9725 [0.9717, 0.9733]). BlueBERT demonstrated comparable efficacy, with an F1 score of 0.8841 (AUROC [95% CI]: 0.97 [0.9580, 0.9820]) on the same set of notes. Both models markedly outperformed traditional International Classification of Diseases (ICD) code-based detection methods from discharge summaries, which had an F1 score of 0.7608, an improvement of roughly 0.12. 
Furthermore, our findings indicate that keyword pre-selection markedly enhances the performance of both machine learning and pre-trained language models. This study illustrates the potential of NLP techniques to improve psychosis detection within admission notes and aims to serve as a foundational reference for future research on applying NLP for psychosis identification in EHR notes.
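
The keyword pre-selection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: it assumes filtering happens at the sentence level, the keyword list is hypothetical, and the function names `build_keyword_pattern` and `preselect_sentences` are invented for this example.

```python
import re


def build_keyword_pattern(keywords):
    """Compile a case-insensitive, whole-word pattern from a keyword list."""
    if not keywords:
        raise ValueError("at least one keyword is required")
    # Longer keywords first so that multi-word phrases are tried before their parts.
    escaped = sorted((re.escape(k) for k in keywords), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def preselect_sentences(note, pattern):
    """Keep only the sentences of a note that mention at least one keyword."""
    sentences = re.split(r"(?<=[.!?])\s+", note.strip())
    return [s for s in sentences if pattern.search(s)]


# Hypothetical keywords and note text, for illustration only.
pattern = build_keyword_pattern(["hallucinations", "delusions", "paranoia"])
note = "Patient reports auditory hallucinations. Sleep is normal. Denies paranoia."
selected = preselect_sentences(note, pattern)
print(selected)
```

The reduced text could then be passed to a feature extractor such as TF-IDF before model training, which is the order of steps the abstract describes.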

5.
medRxiv ; 2024 Mar 19.
Article in English | MEDLINE | ID: mdl-38562701

ABSTRACT

Early and accurate diagnosis is crucial for effective treatment and improved outcomes, yet identifying psychotic episodes presents significant challenges due to the complex nature of psychosis and the varied presentation of symptoms among individuals. One of the primary difficulties lies in the underreporting and underdiagnosis of psychosis, compounded by the stigma surrounding mental health and the individuals' often diminished insight into their condition. Existing efforts leveraging Electronic Health Records (EHRs) to retrospectively identify psychosis typically rely on structured data, such as medical codes and patient demographics, which frequently lack essential information. Addressing these challenges, our study leverages Natural Language Processing (NLP) algorithms to analyze psychiatric admission notes for the diagnosis of psychosis, providing a detailed evaluation of rule-based algorithms, machine learning models, and pre-trained language models. Additionally, the study investigates the effectiveness of employing keywords to streamline extensive note data before training and evaluating the models. Analyzing 4,617 initial psychiatric admission notes (1,196 cases of psychosis versus 3,433 controls) from 2005 to 2019, we discovered that the XGBoost classifier employing Term Frequency-Inverse Document Frequency (TF-IDF) features derived from notes pre-selected by expert-curated keywords attained the highest performance, with an F1 score of 0.8881 (AUROC [95% CI]: 0.9725 [0.9717, 0.9733]). BlueBERT demonstrated comparable efficacy, with an F1 score of 0.8841 (AUROC [95% CI]: 0.97 [0.9580, 0.9820]) on the same set of notes. Both models markedly outperformed traditional International Classification of Diseases (ICD) code-based detection methods from discharge summaries, which had an F1 score of 0.7608, an improvement of roughly 0.12. 
Furthermore, our findings indicate that keyword pre-selection markedly enhances the performance of both machine learning and pre-trained language models. This study illustrates the potential of NLP techniques to improve psychosis detection within admission notes and aims to serve as a foundational reference for future research on applying NLP for psychosis identification in EHR notes.

6.
J Am Med Inform Assoc ; 31(7): 1569-1577, 2024 Jun 20.
Article in English | MEDLINE | ID: mdl-38718216

ABSTRACT

OBJECTIVE: Social media-based public health research is crucial for epidemic surveillance, but most studies identify relevant corpora with keyword-matching. This study develops a system to streamline the process of curating colloquial medical dictionaries. We demonstrate the pipeline by curating a Unified Medical Language System (UMLS)-colloquial symptom dictionary from COVID-19-related tweets as proof of concept. METHODS: COVID-19-related tweets from February 1, 2020, to April 30, 2022, were used. The pipeline includes three modules: a named entity recognition module to detect symptoms in tweets; an entity normalization module to aggregate detected entities; and a mapping module that iteratively maps entities to Unified Medical Language System concepts. A random sample of 500 entities was drawn from the final dictionary for accuracy validation. Additionally, we conducted a symptom frequency distribution analysis to compare our dictionary to a pre-defined lexicon from previous research. RESULTS: We identified 498 480 unique symptom entity expressions from the tweets. Pre-processing reduced the number to 18 226. The final dictionary contains 38 175 unique expressions of symptoms that can be mapped to 966 UMLS concepts (accuracy = 95%). Symptom distribution analysis found that our dictionary detects more symptoms and is effective at identifying psychiatric disorders like anxiety and depression, often missed by pre-defined lexicons. CONCLUSIONS: This study advances public health research by implementing a novel, systematic pipeline for curating symptom lexicons from social media data. The final lexicon's high accuracy, validated by medical professionals, underscores the potential of this methodology to reliably interpret and categorize vast amounts of unstructured social media data into actionable medical insights across diverse linguistic and regional landscapes.


Subjects
COVID-19 , Deep Learning , Social Media , Unified Medical Language System , Humans , Public Health , Information Storage and Retrieval/methods
7.
medRxiv ; 2024 Jun 10.
Article in English | MEDLINE | ID: mdl-38946986

ABSTRACT

Background: ANCA-associated vasculitis (AAV) is a rare but serious disease. Traditional case-identification methods using claims data can be time-intensive and may miss important subgroups. We hypothesized that a deep learning model analyzing electronic health records (EHR) can more accurately identify AAV cases. Methods: We examined the Mass General Brigham (MGB) repository of clinical documentation from 12/1/1979 to 5/11/2021, using expert-curated keywords and ICD codes to identify a large cohort of potential AAV cases. Three labeled datasets (I, II, III) were created, each containing note sections. We trained and evaluated a range of machine learning and deep learning algorithms for note-level classification, using metrics like positive predictive value (PPV), sensitivity, F-score, area under the receiver operating characteristic curve (AUROC), and area under the precision and recall curve (AUPRC). The deep learning model was further evaluated for its ability to classify AAV cases at the patient-level, compared with rule-based algorithms in 2,000 randomly chosen samples. Results: Datasets I, II, and III comprised 6,000, 3,008, and 7,500 note sections, respectively. Deep learning achieved the highest AUROC in all three datasets, with scores of 0.983, 0.991, and 0.991. The deep learning approach also had among the highest PPVs across the three datasets (0.941, 0.954, and 0.800, respectively). In a test cohort of 2,000 cases, the deep learning model achieved a PPV of 0.262 and an estimated sensitivity of 0.975. Compared to the best rule-based algorithm, the deep learning model identified six additional AAV cases, representing 13% of the total. Conclusion: The deep learning model effectively classifies clinical note sections for AAV diagnosis. Its application to EHR notes can potentially uncover additional cases missed by traditional rule-based methods. 
SIGNIFICANCE AND INNOVATIONS: Traditional approaches to identifying AAV cases for research have relied on registries assembled through clinical care and/or on billing codes, which may miss important subgroups. Unstructured data entered as free text by clinicians document a patient's diagnosis, symptoms, manifestations, and other features of their condition, which may be useful for identifying AAV cases. We found that a deep learning approach can classify notes as being indicative of AAV and, when applied at the case level, identifies more cases with AAV than rule-based algorithms.

8.
Article in English | MEDLINE | ID: mdl-38060354

ABSTRACT

With the rapid development of the Internet-of-Medical-Things (IoMT) in recent years, it has emerged as a promising solution to alleviate the workload of medical staff, particularly in the field of Medical Image Quality Assessment (MIQA). By deploying MIQA based on IoMT, it proves to be highly valuable in assisting the diagnosis and treatment of various types of medical images, such as fundus images, ultrasound images, and dermoscopic images. However, traditional MIQA models necessitate a substantial number of labeled medical images to be effective, which poses a challenge in acquiring a sufficient training dataset. To address this issue, we present a label-free MIQA model developed through a zero-shot learning approach. This paper introduces a Semantics-Aware Contrastive Learning (SCL) model that can effectively generalise quality assessment to diverse medical image types. The proposed method integrates features extracted from zero-shot learning, the spatial domain, and the frequency domain. Zero-shot learning is achieved through a tailored Contrastive Language-Image Pre-training (CLIP) model. Natural Scene Statistics (NSS) and patch-based features are extracted in the spatial domain, while frequency features are hierarchically extracted from both local and global levels. All of this information is utilised to derive a final quality score for a medical image. To ensure a comprehensive evaluation, we not only utilise two existing datasets, EyeQ and LiverQ, but also create a dataset specifically for skin image quality assessment. As a result, our SCL method undergoes extensive evaluation using all three medical image quality datasets, demonstrating its superiority over advanced models.

9.
Article in English | MEDLINE | ID: mdl-37695962

ABSTRACT

Biomedical image segmentation plays an important role in Diabetic Retinopathy (DR)-related biomarker detection. DR is an ocular disease that affects the retina in people with diabetes and could lead to visual impairment if management measures are not taken in a timely manner. In DR screening programs, the presence and severity of DR are identified and classified based on various microvascular lesions detected by qualified ophthalmic screeners. Such a detection process is time-consuming and error-prone, given the small size of the microvascular lesions and the volume of images, especially with the increasing prevalence of diabetes. Automated image processing using deep learning methods is recognized as a promising approach to support diabetic retinopathy screening. In this paper, we propose a novel compound scaling encoder-decoder network architecture to improve the accuracy and running efficiency of microvascular lesion segmentation. In the encoder phase, we develop a lightweight encoder to speed up the training process, where the encoder network is scaled up in depth, width, and resolution dimensions. In the decoder phase, an attention mechanism is introduced to yield higher accuracy. Specifically, we employ Concurrent Spatial and Channel Squeeze and Channel Excitation (scSE) blocks to fully utilise both spatial and channel-wise information. Additionally, a compound loss function is incorporated with transfer learning to handle the problem of imbalanced data and further improve performance. To assess performance, our method is evaluated on two large-scale lesion segmentation datasets: DDR and FGADR datasets. Experimental results demonstrate the superiority of our method compared to other competent methods. Our codes are available at https://github.com/DeweiYi/CoSED-Net.

10.
J Am Med Inform Assoc ; 29(10): 1668-1678, 2022 09 12.
Article in English | MEDLINE | ID: mdl-35775946

ABSTRACT

OBJECTIVE: Understanding public discourse on emergency use of unproven therapeutics is essential to monitor safe use and combat misinformation. We developed a natural language processing-based pipeline to understand public perceptions of and stances on coronavirus disease 2019 (COVID-19)-related drugs on Twitter across time. METHODS: This retrospective study included 609 189 US-based tweets between January 29, 2020 and November 30, 2021 on 4 drugs that gained wide public attention during the COVID-19 pandemic: (1) Hydroxychloroquine and Ivermectin, drug therapies with anecdotal evidence; and (2) Molnupiravir and Remdesivir, FDA-approved treatment options for eligible patients. Time-trend analysis was used to understand the popularity and related events. Content and demographic analyses were conducted to explore potential rationales of people's stances on each drug. RESULTS: Time-trend analysis revealed that Hydroxychloroquine and Ivermectin received much more discussion than Molnupiravir and Remdesivir, particularly during COVID-19 surges. Hydroxychloroquine and Ivermectin were highly politicized, related to conspiracy theories, hearsay, celebrity effects, etc. The distribution of stance between the 2 major US political parties was significantly different (P < .001); Republicans were much more likely to support Hydroxychloroquine (+55%) and Ivermectin (+30%) than Democrats. People with healthcare backgrounds tended to oppose Hydroxychloroquine (+7%) more than the general population; in contrast, the general population was more likely to support Ivermectin (+14%). CONCLUSION: Our study found that social media users have different perceptions and stances on off-label versus FDA-authorized drug use across different stages of COVID-19, indicating that health systems, regulatory agencies, and policymakers should design tailored strategies to monitor and reduce misinformation for promoting safe drug use. 
Our analysis pipeline and stance detection models are made public at https://github.com/ningkko/COVID-drug.


Subjects
COVID-19 Drug Treatment , Social Media , Cytidine/analogs & derivatives , Delivery of Health Care , Humans , Hydroxychloroquine/therapeutic use , Hydroxylamines , Ivermectin , Off-Label Use , Pandemics , Public Opinion , Retrospective Studies
11.
NPJ Precis Oncol ; 6(1): 79, 2022 Oct 31.
Article in English | MEDLINE | ID: mdl-36316482

ABSTRACT

Prognostic analysis for early-stage (stage I/II) melanomas is of paramount importance for customized surveillance and treatment plans. Since immune checkpoint inhibitors have recently been approved for stage IIB and IIC melanomas, prognostic tools to identify patients at high risk of recurrence have become even more critical. This study aims to assess the effectiveness of machine-learning algorithms in predicting melanoma recurrence using clinical and histopathologic features from Electronic Health Records (EHRs). We collected 1720 early-stage melanomas: 1172 from the Mass General Brigham healthcare system (MGB) and 548 from the Dana-Farber Cancer Institute (DFCI). We extracted 36 clinicopathologic features and used them to predict the recurrence risk with supervised machine-learning algorithms. Models were evaluated internally and externally: (1) five-fold cross-validation of the MGB cohort; (2) the MGB cohort for training and the DFCI cohort for testing independently. In the internal and external validations, respectively, we achieved a recurrence classification performance of AUC: 0.845 and 0.812, and a time-to-event prediction performance of time-dependent AUC: 0.853 and 0.820. Breslow tumor thickness and mitotic rate were identified as the most predictive features. Our results suggest that machine-learning algorithms can extract predictive signals from clinicopathologic features for early-stage melanoma recurrence prediction, which will enable the identification of patients that may benefit from adjuvant immunotherapy.
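
The internal validation scheme mentioned above, five-fold cross-validation, can be sketched as an index splitter. This is a generic illustration, not the study's code: the function name `k_fold_indices`, the shuffling seed, and the unstratified split are assumptions, and only the cohort size of 1172 comes from the abstract.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists so each sample is tested exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Split a cohort the size of the MGB cohort (1172 samples) into five folds.
splits = list(k_fold_indices(1172, k=5, seed=0))
for fold_number, (train, test) in enumerate(splits, start=1):
    print(fold_number, len(train), len(test))
```

A model would be fitted on each `train` list and scored on the matching `test` list, with the five scores then averaged. External validation, by contrast, trains on one cohort and tests on the other without any such rotation.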
