Results 1 - 20 of 29
1.
J Biomed Inform ; 152: 104621, 2024 04.
Article in English | MEDLINE | ID: mdl-38447600

ABSTRACT

OBJECTIVE: The primary objective of this review is to investigate the effectiveness of machine learning and deep learning methodologies in the context of extracting adverse drug events (ADEs) from clinical benchmark datasets. We conduct an in-depth analysis, aiming to compare the merits and drawbacks of both machine learning and deep learning techniques, particularly within the framework of named-entity recognition (NER) and relation classification (RC) tasks related to ADE extraction. Additionally, our focus extends to the examination of specific features and their impact on the overall performance of these methodologies. In a broader perspective, our research extends to ADE extraction from various sources, including biomedical literature, social media data, and drug labels, without restricting the scope to machine learning or deep learning methods. METHODS: We conducted an extensive literature review on PubMed using the query "(((machine learning [Medical Subject Headings (MeSH) Terms]) OR (deep learning [MeSH Terms])) AND (adverse drug event [MeSH Terms])) AND (extraction)", and supplemented this with a snowballing approach to review 275 references sourced from retrieved articles. RESULTS: In our analysis, we included twelve articles for review. For the NER task, deep learning models outperformed machine learning models. In the RC task, gradient boosting, multilayer perceptron, and random forest models excelled. The Bidirectional Encoder Representations from Transformers (BERT) model consistently achieved the best performance in the end-to-end task. Future efforts in the end-to-end task should prioritize improving NER accuracy, especially for 'ADE' and 'Reason'. CONCLUSION: These findings hold significant implications for advancing the field of ADE extraction and pharmacovigilance, ultimately contributing to improved drug safety monitoring and healthcare outcomes.


Subjects
Deep Learning , Drug-Related Side Effects and Adverse Reactions , Humans , Artificial Intelligence , Pharmacovigilance , Benchmarking , Natural Language Processing
2.
J Biomed Inform ; 152: 104623, 2024 04.
Article in English | MEDLINE | ID: mdl-38458578

ABSTRACT

INTRODUCTION: Patients' functional status assesses their independence in performing activities of daily living, including basic ADLs (bADL), and more complex instrumental activities (iADL). Existing studies have discovered that patients' functional status is a strong predictor of health outcomes, particularly in older adults. Despite their usefulness, much of the functional status information is stored in electronic health records (EHRs) in either semi-structured or free text formats. This indicates the pressing need to leverage computational approaches such as natural language processing (NLP) to accelerate the curation of functional status information. In this study, we introduced FedFSA, a hybrid and federated NLP framework designed to extract functional status information from EHRs across multiple healthcare institutions. METHODS: FedFSA consists of four major components: 1) individual sites (clients) with their private local data, 2) a rule-based information extraction (IE) framework for ADL extraction, 3) a BERT model for functional status impairment classification, and 4) a concept normalizer. The framework was implemented using the OHNLP Backbone for rule-based IE and the open-source Flower and PyTorch libraries for federated BERT components. For gold standard data generation, we carried out corpus annotation to identify functional status-related expressions based on ICF definitions. Four healthcare institutions were included in the study. To assess FedFSA, we evaluated the performance of category- and institution-specific ADL extraction across different experimental designs. RESULTS: ADL extraction performance ranges from an F1-score of 0.907 to 0.986 for bADL and 0.825 to 0.951 for iADL across the four healthcare sites. The performance for ADL extraction with impairment ranges from an F1-score of 0.722 to 0.954 for bADL and 0.674 to 0.813 for iADL across four healthcare sites.
For category-specific ADL extraction, laundry and transferring yielded relatively high performance, while dressing, medication, bathing, and continence achieved moderate-high performance. Conversely, food preparation and toileting showed low performance. CONCLUSION: NLP performance varied across ADL categories and healthcare sites. Federated learning with the FedFSA framework outperformed non-federated learning for impaired ADL extraction at all healthcare sites. Our study demonstrated the potential of the federated learning framework in functional status extraction and impairment classification in EHRs, exemplifying the importance of a large-scale, multi-institutional collaborative development effort.
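
As a rough illustration of the federated setup described in this abstract, the sketch below shows federated averaging (FedAvg), where a server combines client model weights in proportion to each site's sample count. The abstract does not say which aggregation strategy FedFSA uses, so FedAvg, the helper name `federated_average`, and the toy weights are all assumptions made for illustration.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Average client weight vectors, weighted by local sample count.

    Hypothetical helper: plain FedAvg, not necessarily what FedFSA does.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    averaged = [0.0] * len(client_weights[0])
    for weights, size in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            averaged[i] += w * size / total
    return averaged


# Toy example (made-up numbers): four sites, each with a 2-parameter model.
site_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
site_sizes = [100, 100, 200, 0]
global_weights = federated_average(site_weights, site_sizes)
```

Only the weight vectors and sample counts leave each site in this scheme; the local records stay private, which is the property the abstract highlights.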


Subjects
Activities of Daily Living , Functional Status , Humans , Aged , Learning , Information Storage and Retrieval , Natural Language Processing
3.
BMC Med Inform Decis Mak ; 23(Suppl 4): 298, 2024 01 05.
Article in English | MEDLINE | ID: mdl-38183034

ABSTRACT

BACKGROUND: The Vaccine Adverse Event Reporting System (VAERS) is a promising resource for tracking adverse events following immunization. The Medical Dictionary for Regulatory Activities (MedDRA) terminology used for coding adverse events in VAERS reports has several limitations. We focus on developing an automated system for semantic extraction of adverse events following vaccination and their temporal relationships for a better understanding of VAERS data and its integration into other applications. The aim of the present study is to summarize the lessons learned during the initial phase of this project in annotating adverse events following influenza vaccination and related to Guillain-Barré syndrome (GBS). We emphasize identifying the limitations of VAERS and MedDRA. RESULTS: We collected 282 VAERS reports documented between 1990 and 2016 and shortlisted those with at least 1,100 characters in the report. We used a subset of 50 reports for the preliminary investigation and annotated all adverse events following influenza vaccination by mapping to representative MedDRA terms. Associated time expressions were annotated when available. We used 16 System Organ Class (SOC) level MedDRA terms to map GBS-related adverse events and expanded some SOC terms to Lowest Level Terms (LLT) for granular representation. We annotated three broad categories of events: problems, clinical investigations, and treatments/procedures. The inter-annotator agreement of events achieved was 86%. Incomplete reports, typographical errors, lack of clarity and coherence, repeated texts, unavailability of associated temporal information, difficulty of interpretation due to incorrect grammar, use of generalized terms to describe adverse events / symptoms, uncommon abbreviations, difficulty annotating multiple events with a conjunction / common phrase, irrelevant historical events and coexisting events were some of the challenges encountered.
Some of the limitations we noted are in agreement with previous reports. CONCLUSIONS: We reported the challenges encountered and lessons learned during annotation of adverse events in VAERS reports following influenza vaccination and related to GBS. Though the challenges may be due to the inevitable limitations of public reporting systems and widely reported limitations of MedDRA, we emphasize the need to understand these limitations and extraction of other supportive information for a better understanding of adverse events following vaccination.


Subjects
Guillain-Barre Syndrome , Influenza, Human , Humans , Guillain-Barre Syndrome/etiology , Adverse Drug Reaction Reporting Systems , Influenza, Human/prevention & control , Vaccination/adverse effects , Linguistics
4.
BMC Med Inform Decis Mak ; 23(Suppl 4): 299, 2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38326827

ABSTRACT

BACKGROUND: In this era of big data, data harmonization is an important step to ensure reproducible, scalable, and collaborative research. Thus, terminology mapping is a necessary step to harmonize heterogeneous data. For example, mapping between the Medical Dictionary for Regulatory Activities (MedDRA) and the International Classification of Diseases (ICD) is essential for drug safety and pharmacovigilance research. Our main objective is to provide a quantitative and qualitative analysis of the mapping status between MedDRA and ICD. We focus on evaluating the current mapping status between MedDRA and ICD through the Unified Medical Language System (UMLS) and Observational Medical Outcomes Partnership Common Data Model (OMOP CDM). We summarized the current mapping statistics and evaluated the quality of the current MedDRA-ICD mapping; for unmapped terms, we used our self-developed algorithm to rank the best possible mapping candidates for additional mapping coverage. RESULTS: The identified MedDRA-ICD mapped pairs cover 27.23% of the overall MedDRA preferred terms (PT). The systematic quality analysis demonstrated that, among the mapped pairs provided by UMLS, only 51.44% are considered an exact match. For the 2400 sampled unmapped terms, 56 of the 2400 MedDRA Preferred Terms (PT) could have exact match terms from ICD. CONCLUSION: Some of the mapped pairs between MedDRA and ICD are not exact matches due to differences in granularity and focus. For 72% of the unmapped PT terms, the identified exact match pairs illustrate the possibility of identifying additional mapped pairs. Referring to its own mapping standard, some of the unmapped terms should qualify for the expansion of MedDRA to ICD mapping in UMLS.
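
The abstract mentions ranking mapping candidates for unmapped terms but does not describe the authors' algorithm. The sketch below swaps in a generic technique, string similarity from Python's standard `difflib` module, purely to show what candidate ranking looks like. The function name `rank_candidates` and the example strings are illustrative assumptions, not real MedDRA or ICD entries.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def rank_candidates(term: str, candidates: List[str],
                    top_n: int = 3) -> List[Tuple[str, float]]:
    """Rank candidate target terms by character-level similarity to `term`.

    Stand-in for the paper's unpublished ranking method (assumption).
    """
    def similarity(candidate: str) -> float:
        return SequenceMatcher(None, term.lower(), candidate.lower()).ratio()

    scored = [(candidate, similarity(candidate)) for candidate in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Illustrative strings only, not actual terminology entries.
target_terms = ["Migraine", "Headache", "Fever, unspecified"]
ranked = rank_candidates("headache", target_terms)
```

A similarity of 1.0 corresponds to an exact (case-insensitive) string match; lower-ranked candidates would still need expert review before being accepted as mappings.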


Subjects
Adverse Drug Reaction Reporting Systems , International Classification of Diseases , Humans , Unified Medical Language System , Pharmacovigilance , Algorithms
5.
BMC Bioinformatics ; 24(Suppl 3): 477, 2023 Dec 15.
Article in English | MEDLINE | ID: mdl-38102593

ABSTRACT

BACKGROUND: As more clinical trials offer optional participation in the collection of bio-specimens for biobanking, the requirements of informed consent forms are becoming increasingly complex. The aim of this study is to develop an automatic natural language processing (NLP) tool to annotate informed consent documents to promote biorepository data regulation, sharing, and decision support. We collected informed consent documents from several publicly available sources, then manually annotated them, covering sentences containing permission information about the sharing of either bio-specimens or donor data, or conducting genetic research or future research using bio-specimens or donor data. RESULTS: We evaluated a variety of machine learning algorithms including random forest (RF) and support vector machine (SVM) for the automatic identification of these sentences. 120 informed consent documents containing 29,204 sentences were annotated, of which 1250 sentences (4.28%) provide answers to a permission question. A support vector machine (SVM) model achieved an F1 score of 0.95 on classifying the sentences when using a gold standard, which is a prefiltered corpus containing all relevant sentences. CONCLUSIONS: This study demonstrates the feasibility of using machine learning tools to classify permission-related sentences in informed consent documents.


Subjects
Biological Specimen Banks , Consent Forms , Machine Learning , Algorithms , Natural Language Processing
6.
PLoS One ; 19(3): e0300919, 2024.
Article in English | MEDLINE | ID: mdl-38512919

ABSTRACT

Though vaccines are instrumental in global health, mitigating infectious diseases and pandemic outbreaks, they can occasionally lead to adverse events (AEs). Recently, Large Language Models (LLMs) have shown promise in effectively identifying and cataloging AEs within clinical reports. Utilizing data from the Vaccine Adverse Event Reporting System (VAERS) from 1990 to 2016, this study particularly focuses on AEs to evaluate LLMs' capability for AE extraction. A variety of prevalent LLMs, including GPT-2, GPT-3 variants, GPT-4, and Llama2, were evaluated using influenza vaccine as a use case. The fine-tuned GPT-3.5 model (AE-GPT) stood out with a 0.704 averaged micro F1 score for strict match and 0.816 for relaxed match. The encouraging performance of AE-GPT underscores LLMs' potential in processing medical data, indicating a significant stride towards advanced AE detection that is presumably generalizable to other AE extraction tasks.
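
To make the distinction between strict and relaxed matching concrete, here is a minimal sketch of span-level F1 pooled over all spans (micro-averaged). The abstract does not define its matching rules, so the definitions used here (strict = identical boundaries and label, relaxed = any character overlap with the same label), the `span_f1` helper, and the toy spans are assumptions.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def _matches(a: Span, b: Span, relaxed: bool) -> bool:
    """Return True if two spans count as the same entity."""
    if a[2] != b[2]:
        return False
    if relaxed:
        return a[0] < b[1] and b[0] < a[1]  # any character overlap
    return a[0] == b[0] and a[1] == b[1]    # exact boundaries


def span_f1(gold: List[Span], pred: List[Span], relaxed: bool = False) -> float:
    """F1 over pooled spans under strict or relaxed matching."""
    if not gold or not pred:
        return 0.0
    pred_hits = sum(any(_matches(p, g, relaxed) for g in gold) for p in pred)
    gold_hits = sum(any(_matches(g, p, relaxed) for p in pred) for g in gold)
    precision = pred_hits / len(pred)
    recall = gold_hits / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: the second prediction overlaps its gold span but starts late.
gold_spans = [(0, 8, "AE"), (20, 30, "AE")]
pred_spans = [(0, 8, "AE"), (22, 30, "AE")]
strict_f1 = span_f1(gold_spans, pred_spans)
relaxed_f1 = span_f1(gold_spans, pred_spans, relaxed=True)
```

Relaxed matching credits the partially overlapping prediction, which is why relaxed scores are never lower than strict scores on the same output.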


Subjects
Influenza Vaccines , Influenza, Human , Humans , Influenza Vaccines/adverse effects , Adverse Drug Reaction Reporting Systems , Influenza, Human/prevention & control , Alanine Transaminase , Disease Outbreaks
7.
medRxiv ; 2024 Jun 30.
Article in English | MEDLINE | ID: mdl-38978646

ABSTRACT

Patient recruitment is a key desideratum for the success of a clinical trial that entails identifying eligible patients that match the selection criteria for the trial. However, the complexity of criteria information and heterogeneity of patient data render manual analysis a burdensome and time-consuming task. In an attempt to automate patient recruitment, this work proposes a Siamese Neural Network-based model, namely Siamese-PTM. Siamese-PTM employs the pretrained LLaMA 2 model to derive contextual representations of the EHR and criteria inputs and jointly encodes them using two weight-sharing identical subnetworks. We evaluate Siamese-PTM on structured and unstructured EHR to analyze their predictive informativeness as standalone and collective feature sets. We explore a variety of deep models for Siamese-PTM's encoders and compare their performance against the Single-encoder counterparts. We develop a baseline rule-based classifier, compared to which Siamese-PTM improved performance by 40%. Furthermore, visualization of Siamese-PTM's learned embedding space reinforces its predictive robustness.
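
The weight-sharing idea behind a Siamese network can be sketched in a few lines: one encoder object processes both inputs, so both are mapped into the same embedding space before comparison. This is not the Siamese-PTM architecture; the toy linear `SharedEncoder`, the cosine-similarity scoring, and the input vectors (stand-ins for LLaMA 2 representations) are assumptions for illustration.

```python
import math
from typing import List


class SharedEncoder:
    """Toy linear encoder; a single instance is reused for both inputs."""

    def __init__(self, weights: List[List[float]]):
        self.weights = weights  # shape: output_dim x input_dim

    def encode(self, x: List[float]) -> List[float]:
        return [sum(w * v for w, v in zip(row, x)) for row in self.weights]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_score(encoder: SharedEncoder,
                patient_vec: List[float],
                criteria_vec: List[float]) -> float:
    """Encode both inputs with the SAME weights, then compare embeddings."""
    return cosine(encoder.encode(patient_vec), encoder.encode(criteria_vec))


# Made-up vectors standing in for patient and criteria representations.
encoder = SharedEncoder([[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]])
aligned = match_score(encoder, [1.0, 2.0, 0.0], [2.0, 4.0, 0.0])
orthogonal = match_score(encoder, [1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

In a trained model the shared weights would be learned so that eligible patient-criteria pairs score high and ineligible pairs score low.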

8.
JMIR Public Health Surveill ; 10: e51007, 2024 Jul 15.
Article in English | MEDLINE | ID: mdl-39008362

ABSTRACT

BACKGROUND: The COVID-19 pandemic, caused by SARS-CoV-2, has had a profound impact worldwide, leading to widespread morbidity and mortality. Vaccination against COVID-19 is a critical tool in controlling the spread of the virus and reducing the severity of the disease. However, the rapid development and deployment of COVID-19 vaccines have raised concerns about potential adverse events following immunization (AEFIs). Understanding the temporal and spatial patterns of these AEFIs is crucial for an effective public health response and vaccine safety monitoring. OBJECTIVE: This study aimed to analyze the temporal and spatial characteristics of AEFIs associated with COVID-19 vaccines in the United States reported to the Vaccine Adverse Event Reporting System (VAERS), thereby providing insights into the patterns and distributions of the AEFIs, the safety profile of COVID-19 vaccines, and potential risk factors associated with the AEFIs. METHODS: We conducted a retrospective analysis of administration data from the Centers for Disease Control and Prevention (n=663,822,575) and reports from the surveillance system VAERS (n=900,522) between 2020 and 2022. To gain a broader understanding of postvaccination AEFIs reported, we categorized them into system organ classes (SOCs) according to the Medical Dictionary for Regulatory Activities. Additionally, we performed temporal analysis to examine the trends of AEFIs in all VAERS reports, those related to Pfizer-BioNTech and Moderna, and the top 10 AEFI trends in serious reports. We also compared the similarity of symptoms across various regions within the United States. RESULTS: Our findings revealed that the most frequently reported symptoms following COVID-19 vaccination were headache (n=141,186, 15.68%), pyrexia (n=122,120, 13.56%), and fatigue (n=121,910, 13.54%). The most common symptom combination was chills and pyrexia (n=56,954, 6.32%). 
Initially, general disorders and administration site conditions (SOC 22) were the most prevalent class reported. Moderna exhibited a higher reporting rate of AEFIs compared to Pfizer-BioNTech. Over time, we observed a decreasing reporting rate of AEFIs associated with COVID-19 vaccines. In addition, the overall rates of AEFIs between the Pfizer-BioNTech and Moderna vaccines were comparable. In terms of spatial analysis, the middle and north regions of the United States displayed a higher reporting rate of AEFIs associated with COVID-19 vaccines, while the southeast and south-central regions showed notable similarity in symptoms reported. CONCLUSIONS: This study provides valuable insights into the temporal and spatial patterns of AEFIs associated with COVID-19 vaccines in the United States. The findings underscore the critical need for increasing vaccination coverage, as well as ongoing surveillance and monitoring of AEFIs. Implementing targeted monitoring programs can facilitate the effective and efficient management of AEFIs, enhancing public confidence in future COVID-19 vaccine campaigns.
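
The percentages quoted in the results can be reproduced directly from the counts given in the abstract, as the short check below shows. The counts and the report total come from the abstract; the helper name `percent_of_reports` is an assumption.

```python
TOTAL_REPORTS = 900_522  # VAERS reports analysed, as stated in the abstract

# Symptom counts as stated in the abstract.
symptom_counts = {
    "headache": 141_186,
    "pyrexia": 122_120,
    "fatigue": 121_910,
    "chills and pyrexia": 56_954,
}


def percent_of_reports(count: int, total: int = TOTAL_REPORTS) -> float:
    """Share of all reports mentioning a symptom, in percent (2 decimals)."""
    return round(100 * count / total, 2)


percentages = {name: percent_of_reports(n) for name, n in symptom_counts.items()}
```

These are shares of submitted reports, not incidence rates among vaccinated people; an incidence estimate would need the administration data as the denominator.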


Subjects
COVID-19 Vaccines , Humans , United States/epidemiology , COVID-19 Vaccines/adverse effects , COVID-19 Vaccines/administration & dosage , Retrospective Studies , Male , Female , Middle Aged , Adult , Adverse Drug Reaction Reporting Systems/statistics & numerical data , Drug-Related Side Effects and Adverse Reactions/epidemiology , Aged , COVID-19/prevention & control , COVID-19/epidemiology , Spatial Analysis , Spatio-Temporal Analysis , Young Adult , Adolescent
9.
Online J Public Health Inform ; 16: e52845, 2024 Mar 13.
Article in English | MEDLINE | ID: mdl-38477963

ABSTRACT

BACKGROUND: Social determinants of health (SDoH) have been described by the World Health Organization as the conditions in which individuals are born, live, work, and age. These conditions can be grouped into 3 interrelated levels known as macrolevel (societal), mesolevel (community), and microlevel (individual) determinants. The scope of SDoH expands beyond the biomedical level, and there remains a need to connect other areas such as economics, public policy, and social factors. OBJECTIVE: Providing a computable artifact that can link health data to concepts involving the different levels of determinants may improve our understanding of the impact SDoH have on human populations. Modeling SDoH may help to reduce existing gaps in the literature through explicit links between the determinants and biological factors. This in turn can allow researchers and clinicians to make better sense of data and discover new knowledge through the use of semantic links. METHODS: An experimental ontology was developed to represent knowledge of the social and economic characteristics of SDoH. Information from 27 literature sources was analyzed to gather concepts and encoded using Web Ontology Language, version 2 (OWL2) and Protégé. Four evaluators independently reviewed the ontology axioms using natural language translation. The analyses from the evaluations and selected terminologies from the Basic Formal Ontology were used to create a revised ontology with a broad spectrum of knowledge concepts ranging from the macrolevel to the microlevel determinants. RESULTS: The literature search identified several topics of discussion for each determinant level. Publications for the macrolevel determinants centered around health policy, income inequality, welfare, and the environment. Articles relating to the mesolevel determinants discussed work, work conditions, psychosocial factors, socioeconomic position, outcomes, food, poverty, housing, and crime. 
Finally, sources found for the microlevel determinants examined gender, ethnicity, race, and behavior. Concepts were gathered from the literature and used to produce an ontology consisting of 383 classes, 109 object properties, and 748 logical axioms. A reasoning test revealed no inconsistent axioms. CONCLUSIONS: This ontology models heterogeneous social and economic concepts to represent aspects of SDoH. The scope of SDoH is expansive, and although the ontology is broad, it is still in its early stages. To our current understanding, this ontology represents the first attempt to concentrate on knowledge concepts that are currently not covered by existing ontologies. Future direction will include further expanding the ontology to link with other biomedical ontologies, including alignment for granular semantics.

10.
Article in English | MEDLINE | ID: mdl-38898884

ABSTRACT

Human papillomavirus (HPV) vaccination rates are lower than expected. To protect against the onset of head and neck cancers, innovative strategies to improve the rates are needed. Artificial intelligence may offer some solutions, specifically conversational agents to perform counseling methods. We present our efforts in developing a dialogue model for automating motivational interviewing (MI) to encourage HPV vaccination. We developed a formalized dialogue model for MI using an existing ontology-based framework to manifest a computable representation using OWL2. New utterance classifications were identified along with the ontology that encodes the dialogue model. Our work is available on GitHub under the GPL v.3. We discuss how an ontology-based model of MI can help standardize/formalize MI counseling for HPV vaccine uptake. Our future steps will involve assessing MI fidelity of the ontology model, operationalization, and testing the dialogue model in a simulation with live participants.

11.
JMIR Med Inform ; 12: e49613, 2024 Jun 21.
Article in English | MEDLINE | ID: mdl-38904996

ABSTRACT

BACKGROUND: Dermoscopy is a growing field that uses microscopy to allow dermatologists and primary care physicians to identify skin lesions. For a given skin lesion, a wide variety of differential diagnoses exist, which may be challenging for inexperienced users to name and understand. OBJECTIVE: In this study, we describe the creation of the dermoscopy differential diagnosis explorer (D3X), an ontology linking dermoscopic patterns to differential diagnoses. METHODS: Existing ontologies that were incorporated into D3X include the elements of visuals ontology and dermoscopy elements of visuals ontology, which connect visual features to dermoscopic patterns. A list of differential diagnoses for each pattern was generated from the literature and in consultation with domain experts. Open-source images were incorporated from DermNet, Dermoscopedia, and open-access research papers. RESULTS: D3X was encoded in the OWL 2 web ontology language and includes 3041 logical axioms, 1519 classes, 103 object properties, and 20 data properties. We compared D3X with publicly available ontologies in the dermatology domain using a semiotic theory-driven metric to measure the innate qualities of D3X with others. The results indicate that D3X is adequately comparable with other ontologies of the dermatology domain. CONCLUSIONS: The D3X ontology is a resource that can link and integrate dermoscopic differential diagnoses and supplementary information with existing ontology-based resources. Future directions include developing a web application based on D3X for dermoscopy education and clinical practice.

12.
Neural Netw ; 176: 106338, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38692190

ABSTRACT

Electroencephalography (EEG) based Brain Computer Interface (BCI) systems play a significant role in facilitating how individuals with neurological impairments effectively interact with their environment. In real-world applications of BCI systems for clinical assistance and rehabilitation training, the EEG classifier often needs to learn on sequentially arriving subjects in an online manner. As patterns of EEG signals can be significantly different for different subjects, the EEG classifier can easily erase knowledge of learnt subjects after learning on later ones as it performs decoding in an online streaming scenario, a problem known as catastrophic forgetting. In this work, we tackle this problem with a memory-based approach, which considers the following conditions: (1) subjects arrive sequentially in an online manner, with no large-scale dataset available for joint training beforehand, (2) data volume from the different subjects could be imbalanced, (3) the decoding difficulty of the sequential streaming signal varies, (4) continual classification for a long time is required. This online sequential EEG decoding problem is more challenging than classic cross-subject EEG decoding as there is no large-scale training data from the different subjects available beforehand. The proposed model keeps a small balanced memory buffer during sequential learning, with memory data dynamically selected based on joint consideration of data volume and informativeness. Furthermore, for the more general scenarios where subject identity is unknown to the EEG decoder, i.e., the subject-agnostic scenario, we propose a kernel-based subject shift detection method that identifies underlying subject changes on the fly in a computationally efficient manner. We develop challenging benchmarks of streaming EEG data from sequentially arriving subjects with both balanced and imbalanced data volumes, and perform extensive experiments with a detailed ablation study on the proposed model.
The results show the effectiveness of our proposed approach, enabling the decoder to maintain performance on all previously seen subjects over a long period of sequential decoding. The model demonstrates the potential for real-world applications.
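
One way to picture a small balanced memory buffer is sketched below: when the buffer overflows, it evicts the least informative sample from whichever subject currently holds the most samples. The abstract does not give the actual selection rule, so this eviction policy, the `BalancedMemory` class, and the numeric informativeness scores are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[str, float]  # (sample_id, informativeness score)


class BalancedMemory:
    """Fixed-capacity replay buffer that keeps subjects roughly balanced."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.by_subject: Dict[str, List[Sample]] = defaultdict(list)

    def __len__(self) -> int:
        return sum(len(samples) for samples in self.by_subject.values())

    def add(self, subject: str, sample_id: str, score: float) -> None:
        self.by_subject[subject].append((sample_id, score))
        if len(self) > self.capacity:
            # Evict from the subject that currently holds the most samples...
            largest = max(self.by_subject,
                          key=lambda s: len(self.by_subject[s]))
            bucket = self.by_subject[largest]
            # ...dropping that subject's least informative sample.
            bucket.remove(min(bucket, key=lambda item: item[1]))


# Toy stream: subject A arrives first with more data than subject B.
memory = BalancedMemory(capacity=4)
for sample_id, score in [("a0", 0.1), ("a1", 0.9), ("a2", 0.5), ("a3", 0.7)]:
    memory.add("A", sample_id, score)
for sample_id, score in [("b0", 0.2), ("b1", 0.8)]:
    memory.add("B", sample_id, score)
kept_a = sorted(sample_id for sample_id, _ in memory.by_subject["A"])
```

Replaying samples from such a buffer while training on a new subject is a common way to limit catastrophic forgetting of earlier subjects.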


Subjects
Brain-Computer Interfaces , Electroencephalography , Memory , Electroencephalography/methods , Humans , Memory/physiology , Signal Processing, Computer-Assisted , Brain/physiology , Algorithms
13.
Mayo Clin Proc Digit Health ; 2(2): 221-230, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38993485

ABSTRACT

Objective: To validate deep learning models' ability to predict post-transplantation major adverse cardiovascular events (MACE) in patients undergoing liver transplantation (LT). Patients and Methods: We used data from Optum's de-identified Clinformatics Data Mart Database to identify liver transplant recipients between January 2007 and March 2020. To predict post-transplantation MACE risk, we considered patients' demographic characteristics, diagnoses, medications, and procedural data recorded up to 3 years before the LT procedure date (index date). MACE is predicted using the bidirectional gated recurrent units (BiGRU) deep learning model in different prediction interval lengths up to 5 years after the index date. In total, 18,304 liver transplant recipients (mean age, 57.4 years [SD, 12.76]; 7158 [39.1%] women) were used to develop and test the deep learning model's performance against other baseline machine learning models. Models were optimized using 5-fold cross-validation on 80% of the cohort, and model performance was evaluated on the remaining 20% using the area under the receiver operating characteristic curve (AUC-ROC) and the area under the precision-recall curve (AUC-PR). Results: Using different prediction intervals after the index date, the top-performing model was the deep learning model, BiGRU, which achieved an AUC-ROC of 0.841 (95% CI, 0.822-0.862) and AUC-PR of 0.578 (95% CI, 0.537-0.621) for a 30-day prediction interval after LT. Conclusion: Using longitudinal claims data, deep learning models can efficiently predict MACE after LT, assisting clinicians in identifying high-risk candidates for further risk stratification or other management strategies to improve transplant outcomes based on important features identified by the model.
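
For readers unfamiliar with AUC-ROC, the sketch below computes it from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as half. The function name `auc_roc` and the toy labels and scores are assumptions; nothing here reproduces the study's data or model.

```python
from typing import List


def auc_roc(labels: List[int], scores: List[float]) -> float:
    """AUC-ROC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = event occurred, 0 = no event; scores are predicted risks.
toy_labels = [1, 0, 1, 0, 1]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.2]
toy_auc = auc_roc(toy_labels, toy_scores)
perfect_auc = auc_roc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
```

The pairwise loop is quadratic and meant only for clarity; production code would typically use a sorting-based implementation from an established library.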

14.
J Biomed Semantics ; 15(1): 14, 2024 Aug 10.
Article in English | MEDLINE | ID: mdl-39123237

ABSTRACT

BACKGROUND: Vaccines have revolutionized public health by providing protection against infectious diseases. They stimulate the immune system and generate memory cells to defend against targeted diseases. Clinical trials evaluate vaccine performance, including dosage, administration routes, and potential side effects. ClinicalTrials.gov is a valuable repository of clinical trial information, but the vaccine data in it lacks standardization, leading to challenges in automatic concept mapping, vaccine-related knowledge development, evidence-based decision-making, and vaccine surveillance. RESULTS: In this study, we developed a cascaded framework that capitalized on multiple domain knowledge sources, including clinical trials, the Unified Medical Language System (UMLS), and the Vaccine Ontology (VO), to enhance the performance of domain-specific language models for automated mapping of VO from clinical trials. The Vaccine Ontology (VO) is a community-based ontology that was developed to promote vaccine data standardization, integration, and computer-assisted reasoning. Our methodology involved extracting and annotating data from various sources. We then performed pre-training on the PubMedBERT model, leading to the development of CTPubMedBERT. Subsequently, we enhanced CTPubMedBERT by incorporating SAPBERT, which was pretrained using the UMLS, resulting in CTPubMedBERT + SAPBERT. Further refinement was accomplished through fine-tuning using the Vaccine Ontology corpus and vaccine data from clinical trials, yielding the CTPubMedBERT + SAPBERT + VO model. Finally, we utilized a collection of pre-trained models, along with the weighted rule-based ensemble approach, to normalize the vaccine corpus and improve the accuracy of the process. The ranking process in concept normalization involves prioritizing and ordering potential concepts to identify the most suitable match for a given context.
We conducted a ranking of the Top 10 concepts, and our experimental results demonstrate that our proposed cascaded framework consistently outperformed existing effective baselines on vaccine mapping, achieving 71.8% on top 1 candidate's accuracy and 90.0% on top 10 candidate's accuracy. CONCLUSION: This study provides a detailed insight into a cascaded framework of fine-tuned domain-specific language models improving mapping of VO from clinical trials. By effectively leveraging domain-specific information and applying weighted rule-based ensembles of different pre-trained BERT models, our framework can significantly enhance the mapping of VO from clinical trials.
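
The top-1 and top-10 figures above are instances of top-k accuracy: a mention counts as correct if its gold concept appears among the first k ranked candidates. A minimal sketch follows; the function name `top_k_accuracy` and the placeholder concept identifiers are assumptions and do not correspond to real Vaccine Ontology IDs.

```python
from typing import List


def top_k_accuracy(ranked_candidates: List[List[str]],
                   gold_concepts: List[str], k: int) -> float:
    """Fraction of mentions whose gold concept is within the top k candidates."""
    if len(ranked_candidates) != len(gold_concepts) or not gold_concepts:
        raise ValueError("need one non-empty ranking per gold concept")
    hits = sum(1 for candidates, gold in zip(ranked_candidates, gold_concepts)
               if gold in candidates[:k])
    return hits / len(gold_concepts)


# Placeholder identifiers; each inner list is one mention's ranked candidates.
rankings = [
    ["concept_a", "concept_b", "concept_c"],
    ["concept_b", "concept_a", "concept_c"],
    ["concept_c", "concept_b", "concept_a"],
]
gold = ["concept_a", "concept_a", "concept_a"]
top1 = top_k_accuracy(rankings, gold, k=1)
top3 = top_k_accuracy(rankings, gold, k=3)
```

Top-k accuracy can only stay the same or rise as k grows, which is consistent with the gap between the reported top-1 and top-10 results.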


Subjects
Biological Ontologies , Clinical Trials as Topic , Vaccines , Vaccines/immunology , Humans , Natural Language Processing , Unified Medical Language System
15.
Expert Rev Vaccines ; 23(1): 53-59, 2024.
Article in English | MEDLINE | ID: mdl-38063069

ABSTRACT

INTRODUCTION: The rapid development of COVID-19 vaccines has provided crucial tools for pandemic control, but the occurrence of vaccine-related adverse events (AEs) underscores the need for comprehensive monitoring. METHODS: This study analyzed the Vaccine Adverse Event Reporting System (VAERS) data from 2020-2022 using statistical methods such as zero-truncated Poisson regression and logistic regression to assess associations with age, gender groups, and vaccine manufacturers. RESULTS: Logistic regression identified 26 System Organ Classes (SOCs) significantly associated with age and gender. Females displayed especially higher odds in SOC 19 (Pregnancy, puerperium and perinatal conditions), while males had higher odds in SOC 25 (Surgical and medical procedures). Older adults (>65) were more prone to symptoms like Cardiac disorders, whereas those aged 18-65 showed susceptibility to AEs like Skin and subcutaneous tissue disorders. Moderna and Pfizer vaccines induced fewer SOC symptoms compared to Janssen and Novavax. The zero-truncated Poisson regression model estimated an average of 4.243 symptoms per individual. CONCLUSION: These findings offer vital insights into vaccine safety, guiding evidence-based vaccination strategies and monitoring programs for precise and effective outcomes.
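
A zero-truncated Poisson distribution models counts that can never be zero, which presumably suits symptom counts here because a report with no symptoms would not be filed. The sketch below gives its probability mass function and mean for a rate parameter `lam`. It does not reproduce the study's regression; the function names and the example rate are assumptions.

```python
import math


def zt_poisson_pmf(k: int, lam: float) -> float:
    """P(Y = k) for a zero-truncated Poisson with rate lam (k >= 1)."""
    if k < 1:
        return 0.0
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    return poisson / (1.0 - math.exp(-lam))  # renormalise after removing k = 0


def zt_poisson_mean(lam: float) -> float:
    """E[Y] = lam / (1 - exp(-lam)); always larger than lam itself."""
    return lam / (1.0 - math.exp(-lam))


example_rate = 4.0  # arbitrary illustrative value, not taken from the study
example_mean = zt_poisson_mean(example_rate)
```

In a regression setting, `lam` would be modelled as a function of covariates such as age, gender, and manufacturer, typically through a log link.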


Subjects
COVID-19 Vaccines , COVID-19 , Vaccines , Aged , Female , Humans , Male , Pregnancy , Adverse Drug Reaction Reporting Systems , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19 Vaccines/adverse effects , United States , Vaccination/adverse effects , Vaccines/adverse effects
16.
J Healthc Inform Res ; 8(2): 206-224, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38681754

ABSTRACT

Biomedical relation extraction (RE) is critical in constructing high-quality knowledge graphs and databases as well as supporting many downstream text mining applications. This paper explores prompt tuning on biomedical RE and its few-shot scenarios, aiming to propose a simple yet effective model for this specific task. Prompt tuning reformulates natural language processing (NLP) downstream tasks into masked language problems by embedding specific text prompts into the original input, facilitating the adaptation of pre-trained language models (PLMs) to better address these tasks. This study presents a customized prompt tuning model designed explicitly for biomedical RE, including its applicability in few-shot learning contexts. The model's performance was rigorously assessed using the chemical-protein relation (CHEMPROT) dataset from BioCreative VI and the drug-drug interaction (DDI) dataset from SemEval-2013, showcasing its superior performance over conventional fine-tuned PLMs across both datasets, including in few-shot scenarios. This observation underscores the effectiveness of prompt tuning in enhancing the capabilities of conventional PLMs, though the extent of enhancement may vary by specific model. Additionally, the model demonstrated a harmonious balance between simplicity and efficiency, matching state-of-the-art performance without needing external knowledge or extra computational resources. The pivotal contribution of our study is the development of a suitably designed prompt tuning model, highlighting prompt tuning's effectiveness in biomedical RE. It offers a robust, efficient approach to the field's challenges and represents a significant advancement in extracting complex relations from biomedical texts. Supplementary Information: The online version contains supplementary material available at 10.1007/s41666-024-00162-9.
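
The reformulation described above, turning relation classification into a masked-language problem, can be illustrated with a small sketch. This is not the paper's model: the template wording, the label words, and the relation labels are hypothetical, and the mask scores are supplied by hand where a real system would take them from a pre-trained language model.

```python
from typing import Dict

MASK = "[MASK]"

# Hypothetical verbalizer: maps a word predicted at the mask position to a relation label.
VERBALIZER: Dict[str, str] = {
    "inhibits": "INHIBITOR",
    "activates": "ACTIVATOR",
    "unrelated": "NONE",
}


def build_prompt(sentence: str, head: str, tail: str) -> str:
    """Append a cloze-style template so the relation becomes a single masked token."""
    return f"{sentence} In this sentence, {head} {MASK} {tail}."


def decode(mask_scores: Dict[str, float]) -> str:
    """Choose the relation whose label word scores highest at the mask position.

    Only words in the verbalizer are considered; other vocabulary items are ignored.
    """
    best_word = max(VERBALIZER, key=lambda word: mask_scores.get(word, float("-inf")))
    return VERBALIZER[best_word]


prompt = build_prompt("Aspirin blocks COX-1.", "aspirin", "COX-1")
# Hand-written scores standing in for a language model's output at the mask position.
label = decode({"inhibits": 0.8, "activates": 0.1, "the": 0.9})
```

Restricting the decision to the verbalizer's label words is what lets a masked language model act as a classifier without a new classification head.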

17.
JMIR Med Inform ; 12: e50428, 2024 May 23.
Article in English | MEDLINE | ID: mdl-38787295

ABSTRACT

Background: Individuals from minoritized racial and ethnic backgrounds experience pernicious and pervasive health disparities that have emerged, in part, from clinician bias. Objective: We used a natural language processing approach to examine whether linguistic markers in electronic health record (EHR) notes differ based on the race and ethnicity of the patient. To validate this methodological approach, we also assessed the extent to which clinicians perceive linguistic markers to be indicative of bias. Methods: In this cross-sectional study, we extracted EHR notes for patients who were aged 18 years or older; had more than 5 years of diabetes diagnosis codes; and received care between 2006 and 2014 from family physicians, general internists, or endocrinologists practicing in an urban, academic network of clinics. The race and ethnicity of patients were defined as White non-Hispanic, Black non-Hispanic, or Hispanic or Latino. We hypothesized that Sentiment Analysis and Social Cognition Engine (SEANCE) components (ie, negative adjectives, positive adjectives, joy words, fear and disgust words, politics words, respect words, trust verbs, and well-being words) and mean word count would be indicators of bias if racial differences emerged. We performed linear mixed effects analyses to examine the relationship between the outcomes of interest (the SEANCE components and word count) and patient race and ethnicity, controlling for patient age. To validate this approach, we asked clinicians to indicate the extent to which they thought variation in the use of SEANCE language domains for different racial and ethnic groups was reflective of bias in EHR notes. Results: We examined EHR notes (n=12,905) of Black non-Hispanic, White non-Hispanic, and Hispanic or Latino patients (n=1562), who were seen by 281 physicians. A total of 27 clinicians participated in the validation study. 
In terms of bias, participants rated negative adjectives as 8.63 (SD 2.06), fear and disgust words as 8.11 (SD 2.15), and positive adjectives as 7.93 (SD 2.46) on a scale of 1 to 10, with 10 being extremely indicative of bias. Notes for Black non-Hispanic patients contained significantly more negative adjectives (coefficient 0.07, SE 0.02) and significantly more fear and disgust words (coefficient 0.007, SE 0.002) than those for White non-Hispanic patients. The notes for Hispanic or Latino patients included significantly fewer positive adjectives (coefficient -0.02, SE 0.007), trust verbs (coefficient -0.009, SE 0.004), and joy words (coefficient -0.03, SE 0.01) than those for White non-Hispanic patients. Conclusions: This approach may enable physicians and researchers to identify and mitigate bias in medical interactions, with the goal of reducing health disparities stemming from bias.

18.
JMIR Aging ; 7: e54748, 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38976869

ABSTRACT

BACKGROUND: Alzheimer disease and related dementias (ADRD) rank as the sixth leading cause of death in the United States, underlining the importance of accurate ADRD risk prediction. While recent advancements in ADRD risk prediction have primarily relied on imaging analysis, not all patients undergo medical imaging before an ADRD diagnosis. Merging machine learning with claims data can reveal additional risk factors and uncover interconnections among diverse medical codes. OBJECTIVE: The study aims to use graph neural networks (GNNs) with claims data for ADRD risk prediction. Addressing the lack of human-interpretable reasons behind these predictions, we introduce an innovative, self-explainable method to evaluate relationship importance and its influence on ADRD risk prediction. METHODS: We used a variationally regularized encoder-decoder GNN (variational GNN [VGNN]) integrated with our proposed relation importance method for estimating ADRD likelihood. This self-explainable method can provide a feature-importance explanation in the context of ADRD risk prediction, leveraging relational information within a graph. Three scenarios with 1-year, 2-year, and 3-year prediction windows were created to assess the model's efficiency. Random forest (RF) and light gradient boost machine (LGBM) were used as baselines. By using this method, we further clarify the key relationships for ADRD risk prediction. RESULTS: In scenario 1, the VGNN model showed area under the receiver operating characteristic (AUROC) scores of 0.7272 and 0.7480 for the small subset and the matched cohort data set, respectively. It outperformed RF and LGBM by 10.6% and 9.1%, respectively, on average. In scenario 2, it achieved AUROC scores of 0.7125 and 0.7281, surpassing the other models by 10.5% and 8.9%, respectively. Similarly, in scenario 3, AUROC scores of 0.7001 and 0.7187 were obtained, exceeding the baseline models by 10.1% and 8.5%, respectively. 
These results clearly demonstrate the significant superiority of the graph-based approach over the tree-based models (RF and LGBM) in predicting ADRD. Furthermore, the integration of the VGNN model and our relation importance interpretation could provide valuable insight into paired factors that may contribute to or delay ADRD progression. CONCLUSIONS: Using our innovative self-explainable method with claims data enhances ADRD risk prediction and provides insights into the impact of interconnected medical code relationships. This methodology not only enables ADRD risk modeling but also shows potential for other image analysis predictions using claims data.


Subjects
Alzheimer Disease , Neural Networks, Computer , Humans , Alzheimer Disease/diagnosis , Risk Assessment/methods , Algorithms , Female , Aged , Male , Dementia/epidemiology , Dementia/diagnosis , Machine Learning , Risk Factors
19.
JMIR Aging ; 7: e49415, 2024 Jan 23.
Article in English | MEDLINE | ID: mdl-38261365

ABSTRACT

BACKGROUND: Reminiscence, a therapy that uses stimulating materials such as old photos and videos to stimulate long-term memory, can improve the emotional well-being and life satisfaction of older adults, including those who are cognitively intact. However, providing personalized reminiscence therapy can be challenging for caregivers and family members. OBJECTIVE: This study aimed to achieve three objectives: (1) design and develop the GoodTimes app, an interactive multimodal photo album that uses artificial intelligence (AI) to engage users in personalized conversations and storytelling about their pictures, encompassing family, friends, and special moments; (2) examine the app's functionalities in various scenarios using use-case studies and assess the app's usability and user experience through a user study; and (3) investigate the app's potential as a supplementary tool for reminiscence therapy among cognitively intact older adults, aiming to enhance their psychological well-being by facilitating the recollection of past experiences. METHODS: We used state-of-the-art AI technologies, including image recognition, natural language processing, knowledge graph, logic, and machine learning, to develop GoodTimes. First, we constructed a comprehensive knowledge graph that models the information required for effective communication, including photos, people, locations, time, and stories related to the photos. Next, we developed a voice assistant that interacts with users by leveraging the knowledge graph and machine learning techniques. Then, we created various use cases to examine the functions of the system in different scenarios. Finally, to evaluate GoodTimes' usability, we conducted a study with older adults (N=13; age range 58-84, mean 65.8 years). The study ran from January to March 2023. RESULTS: The use-case tests demonstrated the performance of GoodTimes in handling a variety of scenarios, highlighting its versatility and adaptability. 
For the user study, the feedback from our participants was highly positive, with 92% (12/13) reporting a positive experience conversing with GoodTimes. All participants mentioned that the app invoked pleasant memories and aided in recollecting loved ones, resulting in a sense of happiness for the majority (11/13, 85%). Additionally, a significant majority found GoodTimes to be helpful (11/13, 85%) and user-friendly (12/13, 92%). Most participants (9/13, 69%) expressed a desire to use the app frequently, although some (4/13, 31%) indicated a need for technical support to navigate the system effectively. CONCLUSIONS: Our AI-based interactive photo album, GoodTimes, was able to engage users in browsing their photos and conversing about them. Preliminary evidence supports GoodTimes' usability and benefits cognitively intact older adults. Future work is needed to explore its potential positive effects among older adults with cognitive impairment.


Subjects
Artificial Intelligence , Mobile Applications , Humans , Aged , Middle Aged , Aged, 80 and over , Memory , Memory, Long-Term , Machine Learning
20.
medRxiv ; 2024 May 22.
Article in English | MEDLINE | ID: mdl-38826441

ABSTRACT

The consistent and persuasive evidence illustrating the influence of social determinants on health has prompted a growing realization throughout the health care sector that enhancing health and health equity will likely depend, at least to some extent, on addressing detrimental social determinants. However, detailed social determinants of health (SDoH) information is often buried within clinical narrative text in electronic health records (EHRs), necessitating natural language processing (NLP) methods to automatically extract these details. Most current NLP efforts for SDoH extraction have been limited, investigating limited types of SDoH elements, deriving data from a single institution, or focusing on specific patient cohorts or note types, with reduced focus on generalizability. This study aims to address these issues by creating cross-institutional corpora spanning different note types and healthcare systems, and developing and evaluating the generalizability of classification models, including novel large language models (LLMs), for detecting SDoH factors from diverse types of notes from four institutions: Harris County Psychiatric Center, University of Texas Physician Practice, Beth Israel Deaconess Medical Center, and Mayo Clinic. Four corpora of deidentified clinical notes were annotated with 21 SDoH factors at two levels: level 1 with SDoH factor types only and level 2 with SDoH factors along with associated values. Three traditional classification algorithms (XGBoost, TextCNN, Sentence BERT) and an instruction tuned LLM-based approach (LLaMA) were developed to identify multiple SDoH factors. Substantial variation was noted in SDoH documentation practices and label distributions based on patient cohorts, note types, and hospitals. The LLM achieved top performance with micro-averaged F1 scores over 0.9 on level 1 annotated corpora and an F1 over 0.84 on level 2 annotated corpora. 
While models performed well when trained and tested on individual datasets, cross-dataset generalization highlighted remaining obstacles. To foster collaboration, access to partial annotated corpora and models trained by merging all annotated datasets will be made available on the PhysioNet repository.
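
The micro-averaged F1 reported above pools true positives, false positives, and false negatives across all labels before computing a single score, so frequent SDoH factors weigh more than rare ones. Below is a minimal sketch for multi-label predictions; it is not the study's evaluation script, and the label names and toy annotations are hypothetical.

```python
from typing import Sequence, Set


def micro_f1(gold: Sequence[Set[str]], pred: Sequence[Set[str]]) -> float:
    """Micro-averaged F1 over multi-label annotations (one label set per note)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = fp = fn = 0
    for gold_labels, pred_labels in zip(gold, pred):
        tp += len(gold_labels & pred_labels)  # labels correctly predicted
        fp += len(pred_labels - gold_labels)  # predicted but not annotated
        fn += len(gold_labels - pred_labels)  # annotated but missed
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical toy annotations for three notes.
gold = [{"housing", "employment"}, {"tobacco"}, set()]
pred = [{"housing"}, {"tobacco", "alcohol"}, set()]

score = micro_f1(gold, pred)
```

In this toy case there are 2 true positives, 1 false positive, and 1 false negative, giving 4 / 6, or about 0.667.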
