Results 1 - 20 of 334
1.
Article in English | MEDLINE | ID: mdl-38829731

ABSTRACT

OBJECTIVE: We aim to develop a novel method for rare disease concept normalization by fine-tuning Llama 2, an open-source large language model (LLM), using a domain-specific corpus sourced from the Human Phenotype Ontology (HPO). METHODS: We developed an in-house template-based script to generate two corpora for fine-tuning. The first (NAME) contains standardized HPO names, sourced from the HPO vocabularies, along with their corresponding identifiers. The second (NAME+SYN) includes HPO names and half of each concept's synonyms as well as identifiers. Subsequently, we fine-tuned Llama 2 (Llama2-7B) for each sentence set and conducted an evaluation using a range of sentence prompts and various phenotype terms. RESULTS: When the phenotype terms for normalization were included in the fine-tuning corpora, both models demonstrated nearly perfect performance, averaging over 99% accuracy. In comparison, ChatGPT-3.5 had only ∼20% accuracy in identifying HPO IDs for phenotype terms. When single-character typos were introduced in the phenotype terms, the accuracy of NAME and NAME+SYN was 10.2% and 36.1%, respectively, but increased to 61.8% (NAME+SYN) with additional typo-specific fine-tuning. For terms sourced from HPO vocabularies as unseen synonyms, the NAME model achieved 11.2% accuracy, while the NAME+SYN model achieved 92.7% accuracy. CONCLUSION: Our fine-tuned models demonstrate the ability to normalize phenotype terms unseen in the fine-tuning corpus, including misspellings, synonyms, terms from other ontologies, and laymen's terms. Our approach provides a solution for the use of LLMs to identify named medical entities from clinical narratives, while successfully normalizing them to standard concepts in a controlled vocabulary.
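The template-based corpus generation described in the METHODS can be sketched roughly as follows. This is only an illustration: the abstract does not give the in-house templates, so `TEMPLATES`, `build_corpus`, and the rule for choosing which half of the synonyms to keep are all assumptions, and the concept entries are illustrative examples that have not been checked against a current HPO release.

```python
# Hypothetical sentence templates; the paper's in-house templates are not published here.
TEMPLATES = [
    "The Human Phenotype Ontology identifier of {term} is {hpo_id}.",
    "{term} is recorded in the HPO as {hpo_id}.",
]

# Illustrative concepts: (identifier, standardized name, synonyms).
CONCEPTS = [
    ("HP:0001250", "Seizure", ["Epileptic seizure", "Seizures"]),
    ("HP:0000252", "Microcephaly", ["Small head", "Reduced head circumference"]),
]


def build_corpus(concepts, templates, include_synonyms=False):
    """Return one sentence per (term, template) pair for every concept."""
    sentences = []
    for hpo_id, name, synonyms in concepts:
        terms = [name]
        if include_synonyms:
            # NAME+SYN keeps half of each concept's synonyms. Which half is
            # kept is an assumption here: the first half, rounded up.
            keep = (len(synonyms) + 1) // 2
            terms.extend(synonyms[:keep])
        for term in terms:
            for template in templates:
                sentences.append(template.format(term=term, hpo_id=hpo_id))
    return sentences


name_corpus = build_corpus(CONCEPTS, TEMPLATES)                             # NAME
name_syn_corpus = build_corpus(CONCEPTS, TEMPLATES, include_synonyms=True)  # NAME+SYN
```

Holding back the remaining synonyms, as the abstract describes, is what makes an "unseen synonym" evaluation possible: the withheld terms never appear in the fine-tuning sentences.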

2.
J Biomed Inform ; 156: 104663, 2024 Jun 04.
Article in English | MEDLINE | ID: mdl-38838949

ABSTRACT

OBJECTIVE: This study aims to investigate the association between social determinants of health (SDoH) and clinical research recruitment outcomes and recommends evidence-based strategies to enhance equity. MATERIALS AND METHODS: Data were collected from the internal clinical study manager database, clinical data warehouse, and clinical research registry. Study characteristics (e.g., study phase) and sociodemographic information were extracted. Median neighborhood income, distance from the study location, and Area Deprivation Index (ADI) were calculated. Mixed effect generalized regression was used for clustering effects and false discovery rate adjustment for multiple testing. A stratified analysis was performed to examine the impact in distinct medical departments. RESULTS: The study sample consisted of 3,962 individuals, with a mean age of 61.5 years, 53.6 % male, 54.2 % White, and 49.1 % non-Hispanic or Latino. Study characteristics revealed a variety of protocols across different departments, with cardiology having the highest percentage of participants (46.4 %). Industry funding was the most common (74.5 %), and digital advertising and personal outreach were the main recruitment methods (58.9 % and 90.8 %). DISCUSSION: The analysis demonstrated significant associations between participant characteristics and research participation, including biological sex, age, ethnicity, and language. The stratified analysis revealed other significant associations for recruitment strategies. SDoH is crucial to clinical research recruitment, and this study presents evidence-based solutions for equity and inclusivity. Researchers can tailor recruitment strategies to overcome barriers and increase participant diversity by identifying participant characteristics and research involvement status. CONCLUSION: The findings highlight the relevance of clinical research inequities and equitable representation of historically underrepresented populations. We need to improve recruitment strategies to promote diversity and inclusivity in research.

3.
Clin Dermatol ; 2024 Jun 24.
Article in English | MEDLINE | ID: mdl-38925444

ABSTRACT

Non-melanoma skin cancers (NMSC) are among the top five most common cancers globally. NMSC is an area with great potential for novel application of diagnostic tools including artificial intelligence (AI). In this scoping review, we aimed to describe the applications of AI in diagnosis and treatment of NMSC. Twenty-nine publications described AI applications to dermatopathology including lesion classification and margin assessment. Twenty-five publications discussed AI use in clinical image analysis, showing that algorithms are not superior to dermatologists and may rely on unbalanced, nonrepresentative, and nontransparent training datasets. Sixteen publications described use of AI in cutaneous surgery for NMSC including use in margin assessment during excisions and Mohs surgery, as well as predicting procedural complexity. Eleven publications discussed spectroscopy, confocal microscopy, and thermography and the AI algorithms that analyze and interpret their data. Ten publications pertained to AI application for discovery and utilization of NMSC biomarkers. Eight publications discussed the use of smart phones and AI, specifically how they enable clinicians and patients to have increased access to instant dermatological assessments but with varying accuracies. Five publications discussed large language models and NMSC, including how they may facilitate or hinder patient education and medical decision-making. Three publications pertaining to skin of color and AI for NMSC discussed concerns regarding limited diverse datasets for training of CNNs. AI demonstrates tremendous potential to improve diagnosis, patient and clinician education, and management of NMSC. Despite excitement regarding AI, datasets are often not transparently reported, may include low quality images, and may not include diverse skin types, limiting generalizability. AI may serve as a tool to increase access to dermatology services for patients in rural areas and save healthcare dollars. These benefits can only be achieved, however, with consideration of potential ethical costs.

4.
AMIA Jt Summits Transl Sci Proc ; 2024: 515-524, 2024.
Article in English | MEDLINE | ID: mdl-38827062

ABSTRACT

Clinical notes are full of ambiguous medical abbreviations. Contextual knowledge has been leveraged by recent learning-based approaches for sense disambiguation. Previous findings indicated that structural elements of clinical notes entail useful characteristics for informing different interpretations of abbreviations, yet they have remained underutilized and have not been fully investigated. To our best knowledge, the only study exploring note structures simply enumerated the headers in the notes, where such representations are not semantically meaningful. This paper describes a learning-based approach using the note structure represented by the semantic types predefined in Unified Medical Language System (UMLS). We evaluated the representation in addition to the widely used N-gram with three learning models on two different datasets. Experiments indicate that our feature augmentation consistently improved model performance for abbreviation disambiguation, with the optimal F1 score of 0.93.

5.
AMIA Jt Summits Transl Sci Proc ; 2024: 670-678, 2024.
Article in English | MEDLINE | ID: mdl-38827089

ABSTRACT

Topic modeling performs poorly on short phrases or sentences and ever-changing slang, which are common in social media, such as X, formerly known as Twitter. This study investigates whether concept annotation tools such as MetaMap can enable topic modeling at the semantic level. Using tweets mentioning "hydroxychloroquine" for a case study, we extracted 56,017 tweets posted between 03/01/2020 and 12/31/2021. The tweets were run through MetaMap to encode concepts with UMLS Concept Unique Identifiers (CUIs) and then we used Latent Dirichlet Allocation (LDA) to identify the optimal model for two datasets: 1) tweets with the original text and 2) tweets with the replaced CUIs. We found that the MetaMap LDA models outperformed the non-MetaMap models in terms of coherence and representativeness and identified topics that were timely and relevant to social and political discussions. We concluded that integrating MetaMap to standardize tweets through UMLS concepts improved semantic topic modeling performance amidst noise in the text.

6.
Front Med (Lausanne) ; 11: 1243659, 2024.
Article in English | MEDLINE | ID: mdl-38711781

ABSTRACT

Skin cancer mortality rates continue to rise, and survival analysis is increasingly needed to understand who is at risk and what interventions improve outcomes. However, current statistical methods are limited by inability to synthesize multiple data types, such as patient genetics, clinical history, demographics, and pathology and reveal significant multimodal relationships through predictive algorithms. Advances in computing power and data science enabled the rise of artificial intelligence (AI), which synthesizes vast amounts of data and applies algorithms that enable personalized diagnostic approaches. Here, we analyze AI methods used in skin cancer survival analysis, focusing on supervised learning, unsupervised learning, deep learning, and natural language processing. We illustrate strengths and weaknesses of these approaches with examples. Our PubMed search yielded 14 publications meeting inclusion criteria for this scoping review. Most publications focused on melanoma, particularly histopathologic interpretation with deep learning. Such concentration on a single type of skin cancer amid increasing focus on deep learning highlights growing areas for innovation; however, it also demonstrates opportunity for additional analysis that addresses other types of cutaneous malignancies and expands the scope of prognostication to combine genetic, histopathologic, and clinical data. Moreover, researchers may leverage multiple AI methods for enhanced benefit in analyses. Expanding AI to this arena may enable improved survival analysis, targeted treatments, and outcomes.

7.
medRxiv ; 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38712122

ABSTRACT

Background: Endometriosis affects 10% of reproductive-age women, and yet, it goes undiagnosed for 3.6 years on average after symptom onset. Despite large GWAS meta-analyses (N > 750,000), only a few dozen causal loci have been identified. We hypothesized that the challenges in identifying causal genes for endometriosis stem from heterogeneity across clinical and biological factors underlying endometriosis diagnosis. Methods: We extracted known endometriosis risk factors, symptoms, and concomitant conditions from the Penn Medicine Biobank (PMBB) and performed unsupervised spectral clustering on 4,078 women with endometriosis. The 5 clusters were characterized by utilizing additional electronic health record (EHR) variables, such as endometriosis-related comorbidities and confirmed surgical phenotypes. From four EHR-linked genetic datasets, PMBB, eMERGE, AOU, and UKBB, we extracted lead variants and tag variants for 39 known endometriosis loci for association testing. We meta-analyzed ancestry-stratified case/control tests for each locus and cluster in addition to a positive control (Total N endometriosis cases = 10,108). Results: We have designated the five subtype clusters as pain comorbidities, uterine disorders, pregnancy complications, cardiometabolic comorbidities, and EHR-asymptomatic based on enriched features from each group. One locus, RNLS, surpassed the genome-wide significant threshold in the positive control. Thirteen more loci reached a Bonferroni threshold of 1.3 × 10^-3 (0.05 / 39) in the positive control. The cluster-stratified tests yielded more significant associations than the positive control for anywhere from 5 to 15 loci depending on the cluster. Bonferroni significant loci were identified for four out of five clusters, including WNT4 and GREB1 for the uterine disorders cluster, RNLS for the cardiometabolic cluster, FSHB for the pregnancy complications cluster, and SYNE1 and CDKN2B-AS1 for the EHR-asymptomatic cluster. This study enhances our understanding of the clinical presentation patterns of endometriosis subtypes, showcasing the innovative approach employed to investigate this complex disease.
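The Bonferroni threshold quoted in the Results (0.05 / 39) can be reproduced with a few lines. This is a minimal sketch: the function names are hypothetical, and the genome-wide cutoff of 5e-8 is the conventional GWAS value, assumed here because the abstract does not state which value was used.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


N_LOCI = 39                      # known endometriosis loci tested, from the abstract
threshold = bonferroni_threshold(0.05, N_LOCI)

GENOME_WIDE = 5e-8               # assumption: conventional genome-wide threshold


def classify(p_value: float) -> str:
    """Label a locus-level p-value against the two thresholds."""
    if p_value < GENOME_WIDE:
        return "genome-wide"
    if p_value < threshold:
        return "bonferroni"
    return "not significant"


print(f"{threshold:.1e}")        # 0.05 / 39 rounded to two significant figures
```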

8.
Article in English | MEDLINE | ID: mdl-38787964

ABSTRACT

OBJECTIVES: To automatically construct a drug indication taxonomy from drug labels using generative Artificial Intelligence (AI) represented by the Large Language Model (LLM) GPT-4 and real-world evidence (RWE). MATERIALS AND METHODS: We extracted indication terms from 46 421 free-text drug labels using GPT-4, iteratively and recursively generated indication concepts and inferred indication concept-to-concept and concept-to-term subsumption relations by integrating GPT-4 with RWE, and created a drug indication taxonomy. Quantitative and qualitative evaluations involving domain experts were performed for cardiovascular (CVD), Endocrine, and Genitourinary system diseases. RESULTS: 2909 drug indication terms were extracted and assigned into 24 high-level indication categories (ie, initially generated concepts), each of which was expanded into a sub-taxonomy. For example, the CVD sub-taxonomy contains 242 concepts, spanning a depth of 11, with 170 being leaf nodes. It collectively covers a total of 234 indication terms associated with 189 distinct drugs. The accuracies of GPT-4 on determining the drug indication hierarchy exceeded 0.7 with "good to very good" inter-rater reliability. However, the accuracies of the concept-to-term subsumption relation checking varied greatly, with "fair to moderate" reliability. DISCUSSION AND CONCLUSION: We successfully used generative AI and RWE to create a taxonomy, with drug indications adequately consistent with domain expert expectations. We show that LLMs are good at deriving their own concept hierarchies but still fall short in determining the subsumption relations between concepts and terms in unregulated language from free-text drug labels, which is the same hard task for human experts.

9.
J Biomed Inform ; 155: 104659, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38777085

ABSTRACT

OBJECTIVE: This study aims to promote interoperability in precision medicine and translational research by aligning the Observational Medical Outcomes Partnership (OMOP) and Phenopackets data models. Phenopackets is an expert knowledge-driven schema designed to facilitate the storage and exchange of multimodal patient data, and support downstream analysis. The first goal of this paper is to explore model alignment by characterizing the common data models using a newly developed data transformation process and evaluation method. Second, using OMOP normalized clinical data, we evaluate the mapping of real-world patient data to Phenopackets. We evaluate the suitability of Phenopackets as a patient data representation for real-world clinical cases. METHODS: We identified mappings between OMOP and Phenopackets and applied them to a real patient dataset to assess the transformation's success. We analyzed gaps between the models and identified key considerations for transforming data between them. Further, to improve ambiguous alignment, we incorporated Unified Medical Language System (UMLS) semantic type-based filtering to direct individual concepts to their most appropriate domain and conducted a domain-expert evaluation of the mapping's clinical utility. RESULTS: The OMOP to Phenopacket transformation pipeline was executed for 1,000 Alzheimer's disease patients and successfully mapped all required entities. However, due to missing values in OMOP for required Phenopacket attributes, 10.2 % of records were lost. The use of UMLS-semantic type filtering for ambiguous alignment of individual concepts resulted in 96 % agreement with clinical thinking, increased from 68 % when mapping exclusively by domain correspondence. CONCLUSION: This study presents a pipeline to transform data from OMOP to Phenopackets. We identified considerations for the transformation to ensure data quality, handling restrictions for successful Phenopacket validation and discrepant data formats. We identified unmappable Phenopacket attributes that focus on specialty use cases, such as genomics or oncology, which OMOP does not currently support. We introduce UMLS semantic type filtering to resolve ambiguous alignment to Phenopacket entities to be most appropriate for real-world interpretation. We provide a systematic approach to align OMOP and Phenopackets schemas. Our work facilitates future use of Phenopackets in clinical applications by addressing key barriers to interoperability when deriving a Phenopacket from real-world patient data.
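The idea of routing a concept by its UMLS semantic type, and falling back to domain correspondence otherwise, might look something like the sketch below. Both lookup tables and the function `route_concept` are hypothetical: the semantic type names, OMOP domains, and Phenopacket element names are real, but the specific pairings are assumptions made for illustration and are not the mappings reported in the paper.

```python
from typing import Iterable, Optional

# Assumed routing from UMLS semantic types to Phenopacket elements.
SEMANTIC_TYPE_TO_ELEMENT = {
    "Disease or Syndrome": "diseases",
    "Sign or Symptom": "phenotypic_features",
    "Finding": "phenotypic_features",
    "Laboratory Procedure": "measurements",
    "Pharmacologic Substance": "medical_actions",
}

# Assumed fallback: plain domain correspondence from the OMOP domain.
OMOP_DOMAIN_TO_ELEMENT = {
    "Condition": "diseases",
    "Observation": "phenotypic_features",
    "Measurement": "measurements",
    "Drug": "medical_actions",
    "Procedure": "medical_actions",
}


def route_concept(omop_domain: str, semantic_types: Iterable[str]) -> Optional[str]:
    """Pick a Phenopacket element for one concept.

    The first recognised semantic type wins; if none is recognised,
    fall back to the OMOP domain. Returns None when nothing matches.
    """
    for semantic_type in semantic_types:
        element = SEMANTIC_TYPE_TO_ELEMENT.get(semantic_type)
        if element is not None:
            return element
    return OMOP_DOMAIN_TO_ELEMENT.get(omop_domain)


# A Condition-domain concept typed as a symptom is routed to phenotypic
# features rather than diseases, which domain correspondence alone would pick.
print(route_concept("Condition", ["Sign or Symptom"]))
print(route_concept("Condition", []))
```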


Subject(s)
Unified Medical Language System, Humans, Semantics, Electronic Health Records, Precision Medicine/methods, Translational Biomedical Research, Medical Informatics/methods, Natural Language Processing, Alzheimer Disease

10.
ArXiv ; 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38711434

ABSTRACT

Individuals with suspected rare genetic disorders often undergo multiple clinical evaluations, imaging studies, laboratory tests and genetic tests, to find a possible answer over a prolonged period of time. Addressing this "diagnostic odyssey" thus has substantial clinical, psychosocial, and economic benefits. Many rare genetic diseases have distinctive facial features, which can be used by artificial intelligence algorithms to facilitate clinical diagnosis, in prioritizing candidate diseases to be further examined by lab tests or genetic assays, or in helping the phenotype-driven reinterpretation of genome/exome sequencing data. Existing methods using frontal facial photos were built on conventional Convolutional Neural Networks (CNNs), rely exclusively on facial images, and cannot capture non-facial phenotypic traits and demographic information essential for guiding accurate diagnoses. Here we introduce GestaltMML, a multimodal machine learning (MML) approach solely based on the Transformer architecture. It integrates facial images, demographic information (age, sex, ethnicity), and clinical notes (optionally, a list of Human Phenotype Ontology terms) to improve prediction accuracy. Furthermore, we also evaluated GestaltMML on a diverse range of datasets, including 528 diseases from the GestaltMatcher Database, several in-house datasets of Beckwith-Wiedemann syndrome (BWS, over-growth syndrome with distinct facial features), Sotos syndrome (overgrowth syndrome with overlapping features with BWS), NAA10-related neurodevelopmental syndrome, Cornelia de Lange syndrome (multiple malformation syndrome), and KBG syndrome (multiple malformation syndrome). Our results suggest that GestaltMML effectively incorporates multiple modalities of data, greatly narrowing candidate genetic diagnoses of rare diseases and may facilitate the reinterpretation of genome/exome sequencing data.

11.
J Biomed Inform ; 154: 104649, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38697494

ABSTRACT

OBJECTIVE: Automated identification of eligible patients is a bottleneck of clinical research. We propose Criteria2Query (C2Q) 3.0, a system that leverages GPT-4 for the semi-automatic transformation of clinical trial eligibility criteria text into executable clinical database queries. MATERIALS AND METHODS: C2Q 3.0 integrated three GPT-4 prompts for concept extraction, SQL query generation, and reasoning. Each prompt was designed and evaluated separately. The concept extraction prompt was benchmarked against manual annotations from 20 clinical trials by two evaluators, who later also measured SQL generation accuracy and identified errors in GPT-generated SQL queries from 5 clinical trials. The reasoning prompt was assessed by three evaluators on four metrics: readability, correctness, coherence, and usefulness, using corrected SQL queries and an open-ended feedback questionnaire. RESULTS: Out of 518 concepts from 20 clinical trials, GPT-4 achieved an F1-score of 0.891 in concept extraction. For SQL generation, 29 errors spanning seven categories were detected, with logic errors being the most common (n = 10; 34.48 %). Reasoning evaluations yielded a high coherence rating, with the mean score being 4.70 but relatively lower readability, with a mean of 3.95. Mean scores of correctness and usefulness were identified as 3.97 and 4.37, respectively. CONCLUSION: GPT-4 significantly improves the accuracy of extracting clinical trial eligibility criteria concepts in C2Q 3.0. Continued research is warranted to ensure the reliability of large language models.


Subject(s)
Clinical Trials as Topic, Humans, Natural Language Processing, Software, Patient Selection

12.
J Biomed Inform ; 153: 104640, 2024 May.
Article in English | MEDLINE | ID: mdl-38608915

ABSTRACT

Evidence-based medicine promises to improve the quality of healthcare by empowering medical decisions and practices with the best available evidence. The rapid growth of medical evidence, which can be obtained from various sources, poses a challenge in collecting, appraising, and synthesizing the evidential information. Recent advancements in generative AI, exemplified by large language models, hold promise in facilitating the arduous task. However, developing accountable, fair, and inclusive models remains a complicated undertaking. In this perspective, we discuss the trustworthiness of generative AI in the context of automated summarization of medical evidence.


Subject(s)
Artificial Intelligence, Evidence-Based Medicine, Humans, Trust, Natural Language Processing

13.
ArXiv ; 2024 Apr 02.
Article in English | MEDLINE | ID: mdl-38562452

ABSTRACT

Phenotype-driven gene prioritization is a critical process in the diagnosis of rare genetic disorders for identifying and ranking potential disease-causing genes based on observed physical traits or phenotypes. While traditional approaches rely on curated knowledge graphs with phenotype-gene relations, recent advancements in large language models have opened doors to the potential of AI predictions through extensive training on diverse corpora and complex models. This study conducted a comprehensive evaluation of five large language models, including two from the Generative Pre-trained Transformer series and three from the Llama2 series, assessing their performance across three key metrics: task completeness, gene prediction accuracy, and adherence to required output structures. Various experiments explored combinations of models, prompts, input types, and task difficulty levels. Our findings reveal that even the best-performing LLM, GPT-4, achieved an accuracy of 16.0%, which still lags behind traditional bioinformatics tools. Prediction accuracy increased with the parameter/model size. A similar increasing trend was observed for the task completion rate, with complicated prompts more likely to increase task completeness in models smaller than GPT-4. However, complicated prompts were also more likely to decrease the structure compliance rate, although no prompt effects were observed for GPT-4. With free-text input, the LLMs also achieved better-than-random prediction accuracy, though slightly lower than with HPO term-based input. Bias analysis showed that certain genes, such as MECP2, CDKL5, and SCN1A, are more likely to be top-ranked, potentially explaining the variances observed across different datasets. This study provides valuable insights into the integration of LLMs within genomic analysis, contributing to the ongoing discussion on the utilization of advanced LLMs in clinical workflows.

14.
medRxiv ; 2024 Apr 10.
Article in English | MEDLINE | ID: mdl-38645167

ABSTRACT

Apart from ancestry, personal or environmental covariates may contribute to differences in polygenic score (PGS) performance. We analyzed effects of covariate stratification and interaction on body mass index (BMI) PGS (PGSBMI) across four cohorts of European (N=491,111) and African (N=21,612) ancestry. Stratifying on binary covariates and quintiles for continuous covariates, 18/62 covariates had significant and replicable R² differences among strata. Covariates with the largest differences included age, sex, blood lipids, physical activity, and alcohol consumption, with R² being nearly double between best and worst performing quintiles for certain covariates. 28 covariates had significant PGSBMI-covariate interaction effects, modifying PGSBMI effects by nearly 20% per standard deviation change. We observed overlap between covariates that had significant R² differences among strata and interaction effects - across all covariates, their main effects on BMI were correlated with their maximum R² differences and interaction effects (0.56 and 0.58, respectively), suggesting high-PGSBMI individuals have highest R² and increase in PGS effect. Using quantile regression, we show the effect of PGSBMI increases as BMI itself increases, and that these differences in effects are directly related to differences in R² when stratifying by different covariates. Given significant and replicable evidence for context-specific PGSBMI performance and effects, we investigated ways to increase model performance taking into account non-linear effects. Machine learning models (neural networks) increased relative model R² (mean 23%) across datasets. Finally, creating PGSBMI directly from GxAge GWAS effects increased relative R² by 7.8%. These results demonstrate that certain covariates, especially those most associated with BMI, significantly affect both PGSBMI performance and effects across diverse cohorts and ancestries, and we provide avenues to improve model performance that consider these effects.

15.
JAMIA Open ; 7(1): ooae021, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38455840

ABSTRACT

Objective: To automate scientific claim verification using PubMed abstracts. Materials and Methods: We developed CliVER, an end-to-end scientific Claim VERification system that leverages retrieval-augmented techniques to automatically retrieve relevant clinical trial abstracts, extract pertinent sentences, and use the PICO framework to support or refute a scientific claim. We also created an ensemble of three state-of-the-art deep learning models to classify rationale of support, refute, and neutral. We then constructed CoVERt, a new COVID VERification dataset comprising 15 PICO-encoded drug claims accompanied by 96 manually selected and labeled clinical trial abstracts that either support or refute each claim. We used CoVERt and SciFact (a public scientific claim verification dataset) to assess CliVER's performance in predicting labels. Finally, we compared CliVER to clinicians in the verification of 19 claims from 6 disease domains, using 189 648 PubMed abstracts extracted from January 2010 to October 2021. Results: In the evaluation of label prediction accuracy on CoVERt, CliVER achieved a notable F1 score of 0.92, highlighting the efficacy of the retrieval-augmented models. The ensemble model outperforms each individual state-of-the-art model by an absolute increase from 3% to 11% in the F1 score. Moreover, when compared with four clinicians, CliVER achieved a precision of 79.0% for abstract retrieval, 67.4% for sentence selection, and 63.2% for label prediction, respectively. Conclusion: CliVER demonstrates its early potential to automate scientific claim verification using retrieval-augmented strategies to harness the wealth of clinical trial abstracts in PubMed. Future studies are warranted to further test its clinical utility.

16.
Appl Clin Inform ; 15(2): 306-312, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38442909

ABSTRACT

OBJECTIVES: Large language models (LLMs) like Generative pre-trained transformer (ChatGPT) are powerful algorithms that have been shown to produce human-like text from input data. Several potential clinical applications of this technology have been proposed and evaluated by biomedical informatics experts. However, few have surveyed health care providers for their opinions about whether the technology is fit for use. METHODS: We distributed a validated mixed-methods survey to gauge practicing clinicians' comfort with LLMs for a breadth of tasks in clinical practice, research, and education, which were selected from the literature. RESULTS: A total of 30 clinicians fully completed the survey. Of the 23 tasks, 16 were rated positively by more than 50% of the respondents. Based on our qualitative analysis, health care providers considered LLMs to have excellent synthesis skills and efficiency. However, our respondents had concerns that LLMs could generate false information and propagate training data bias. Our survey respondents were most comfortable with scenarios that allow LLMs to function in an assistive role, like a physician extender or trainee. CONCLUSION: In a mixed-methods survey of clinicians about LLM use, health care providers were encouraging of having LLMs in health care for many tasks, and especially in assistive roles. There is a need for continued human-centered development of both LLMs and artificial intelligence in general.


Subject(s)
Algorithms, Artificial Intelligence, Humans, Health Facilities, Health Personnel, Language

17.
J Am Med Inform Assoc ; 31(5): 1062-1073, 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38447587

ABSTRACT

BACKGROUND: Alzheimer's disease and related dementias (ADRD) affect over 55 million globally. Current clinical trials suffer from low recruitment rates, a challenge potentially addressable via natural language processing (NLP) technologies for researchers to effectively identify eligible clinical trial participants. OBJECTIVE: This study investigates the sociotechnical feasibility of NLP-driven tools for ADRD research prescreening and analyzes the tools' cognitive complexity's effect on usability to identify cognitive support strategies. METHODS: A randomized experiment was conducted with 60 clinical research staff using three prescreening tools (Criteria2Query, Informatics for Integrating Biology and the Bedside [i2b2], and Leaf). Cognitive task analysis was employed to analyze the usability of each tool using the Health Information Technology Usability Evaluation Scale. Data analysis involved calculating descriptive statistics, interrater agreement via intraclass correlation coefficient, cognitive complexity, and Generalized Estimating Equations models. RESULTS: Leaf scored highest for usability followed by Criteria2Query and i2b2. Cognitive complexity was found to be affected by age, computer literacy, and number of criteria, but was not significantly associated with usability. DISCUSSION: Adopting NLP for ADRD prescreening demands careful task delegation, comprehensive training, precise translation of eligibility criteria, and increased research accessibility. The study highlights the relevance of these factors in enhancing NLP-driven tools' usability and efficacy in clinical research prescreening. CONCLUSION: User-modifiable NLP-driven prescreening tools were favorably received, with system type, evaluation sequence, and user's computer literacy influencing usability more than cognitive complexity. The study emphasizes NLP's potential in improving recruitment for clinical trials, endorsing a mixed-methods approach for future system evaluation and enhancements.


Subject(s)
Alzheimer Disease, Medical Informatics, Humans, Natural Language Processing, Feasibility Studies, Eligibility Determination

18.
J Am Med Inform Assoc ; 31(5): 1163-1171, 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38471120

ABSTRACT

OBJECTIVES: Extracting PICO (Populations, Interventions, Comparison, and Outcomes) entities is fundamental to evidence retrieval. We present a novel method, PICOX, to extract overlapping PICO entities. MATERIALS AND METHODS: PICOX first identifies entities by assessing whether a word marks the beginning or conclusion of an entity. Then, it uses a multi-label classifier to assign one or more PICO labels to a span candidate. PICOX was evaluated using 1 of the best-performing baselines, EBM-NLP, and 3 more datasets, ie, PICO-Corpus and randomized controlled trial publications on Alzheimer's Disease (AD) or COVID-19, using entity-level precision, recall, and F1 scores. RESULTS: PICOX achieved superior precision, recall, and F1 scores across the board, with the micro F1 score improving from 45.05 to 50.87 (P ≪.01). On the PICO-Corpus, PICOX obtained higher recall and F1 scores than the baseline and improved the micro recall score from 56.66 to 67.33. On the COVID-19 dataset, PICOX also outperformed the baseline and improved the micro F1 score from 77.10 to 80.32. On the AD dataset, PICOX demonstrated comparable F1 scores with higher precision when compared to the baseline. CONCLUSION: PICOX excels in identifying overlapping entities and consistently surpasses a leading baseline across multiple datasets. Ablation studies reveal that its data augmentation strategy effectively minimizes false positives and improves precision.
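Entity-level precision, recall, and micro F1 over possibly overlapping entities can be computed along the lines below. This is a sketch under stated assumptions, not PICOX's evaluation code: entities are represented as hypothetical `(start, end, label)` tuples, a prediction counts as correct only on an exact match of all three fields, and `micro_prf` is an illustrative name.

```python
from typing import List, Set, Tuple

Entity = Tuple[int, int, str]  # (start token, end token, PICO label)


def micro_prf(gold: List[Set[Entity]], pred: List[Set[Entity]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 across documents.

    Counts are pooled over all documents before the ratios are taken.
    Because entities are stored in sets, overlapping spans and spans that
    carry more than one label are scored independently of one another.
    """
    tp = fp = fn = 0
    for gold_doc, pred_doc in zip(gold, pred):
        tp += len(gold_doc & pred_doc)   # exact matches
        fp += len(pred_doc - gold_doc)   # predicted but not in gold
        fn += len(gold_doc - pred_doc)   # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# One document: the Population and Intervention spans overlap on token 2,
# and the predicted Outcome span has the wrong start boundary.
gold_entities = [{(0, 3, "Population"), (2, 5, "Intervention"), (7, 9, "Outcome")}]
pred_entities = [{(0, 3, "Population"), (2, 5, "Intervention"), (6, 9, "Outcome")}]

precision, recall, f1 = micro_prf(gold_entities, pred_entities)
```

Under exact matching, the boundary error on the Outcome span counts as both a false positive and a false negative, so two of three entities are credited on each side.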


Asunto(s)
Alzheimer Disease , COVID-19 , Humans , Natural Language Processing
19.
BMC Med Inform Decis Mak ; 22(Suppl 2): 348, 2024 Mar 03.
Article in English | MEDLINE | ID: mdl-38433189

ABSTRACT

BACKGROUND: Systemic lupus erythematosus (SLE) is a rare autoimmune disorder characterized by an unpredictable course of flares and remission with diverse manifestations. Lupus nephritis, one of the major manifestations of SLE contributing to organ damage and mortality, is a key component of lupus classification criteria. Accurately identifying lupus nephritis in electronic health records (EHRs) would therefore benefit large cohort observational studies and clinical trials where characterization of the patient population is critical for recruitment, study design, and analysis. Lupus nephritis can be recognized through procedure codes and structured data, such as laboratory tests. However, other critical information documenting lupus nephritis, such as histologic reports from kidney biopsies and prior medical history narratives, requires sophisticated text processing to mine information from pathology reports and clinical notes. In this study, we developed algorithms to identify lupus nephritis with and without natural language processing (NLP) using EHR data from the Northwestern Medicine Enterprise Data Warehouse (NMEDW). METHODS: We developed five algorithms: a rule-based algorithm using only structured data (baseline algorithm) and four algorithms using different NLP models. The first NLP model applied simple regular expressions for keyword search combined with structured data. The other three NLP models were based on regularized logistic regression and used different sets of features, including positive mention of concept unique identifiers (CUIs), number of appearances of CUIs, and a mixture of three components (i.e., a curated list of CUIs, regular expression concepts, and structured data), respectively. The baseline algorithm and the best performing NLP algorithm were externally validated on a dataset from Vanderbilt University Medical Center (VUMC).
RESULTS: Our best performing NLP model incorporated features from structured data, regular expression concepts, and mapped concept unique identifiers (CUIs) and showed an improved F-measure in both the NMEDW (0.41 vs 0.79) and VUMC (0.52 vs 0.93) datasets compared to the baseline lupus nephritis algorithm. CONCLUSION: Our NLP MetaMap mixed model greatly improved the F-measure compared to the structured-data-only algorithm in both internal and external validation datasets. The NLP algorithms can serve as powerful tools to accurately identify the lupus nephritis phenotype in EHRs for clinical research and better-targeted therapies.
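
The simplest NLP variant described above, a regular-expression keyword search combined with structured data, can be sketched as follows. This is only an illustration under stated assumptions: the keyword pattern, the `Patient` fields, and the OR-combination rule are hypothetical, since the abstract does not give the study's keyword list or decision logic. The cohort is toy data, not study data.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Hypothetical keyword pattern; the study's actual keyword list is not given.
NEPHRITIS_PATTERN = re.compile(r"\blupus\s+nephritis\b", re.IGNORECASE)


@dataclass
class Patient:
    notes: List[str] = field(default_factory=list)
    # Hypothetical flag standing in for structured evidence (codes, lab tests).
    has_structured_evidence: bool = False


def classify(patient: Patient) -> bool:
    """Flag a patient if structured data OR any note keyword match is present.

    The OR combination is an assumption made for this sketch.
    """
    text_hit = any(NEPHRITIS_PATTERN.search(note) for note in patient.notes)
    return patient.has_structured_evidence or text_hit


def f_measure(gold: List[bool], pred: List[bool]) -> float:
    """Balanced F-measure (F1) for binary labels."""
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum((not g) and p for g, p in zip(gold, pred))
    fn = sum(g and (not p) for g, p in zip(gold, pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Toy cohort (invented for illustration only).
cohort = [
    Patient(has_structured_evidence=True),
    Patient(notes=["Biopsy consistent with Lupus  nephritis."]),
    Patient(notes=["No renal involvement."]),
    Patient(notes=["Family history of lupus nephritis."]),  # naive match: false positive
    Patient(notes=["Proteinuria noted."]),                  # no keyword: missed case
]
gold = [True, True, False, False, True]
pred = [classify(p) for p in cohort]
score = f_measure(gold, pred)
print(pred, round(score, 3))
```

The fourth toy patient shows why plain keyword matching is limited: a mention in family history is counted as a positive, which is the kind of error that concept-based features with context handling aim to reduce.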


Subject(s)
Lupus Erythematosus, Systemic , Lupus Nephritis , Humans , Lupus Nephritis/diagnosis , Electronic Health Records , Natural Language Processing , Phenotype , Rare Diseases
20.
HGG Adv ; 5(2): 100281, 2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38414240

ABSTRACT

Research on polygenic risk scores (PRSs) for common, genetically complex chronic diseases aims to improve health-related predictions, tailor risk-reducing interventions, and improve health outcomes. Yet, the study and use of PRSs in clinical settings raise equity, clinical, and regulatory challenges that can be greater for individuals from historically marginalized racial, ethnic, and other minoritized communities. As part of the National Human Genome Research Institute-funded Electronic Medical Records and Genomics IV Network, we conducted online focus groups with patients/community members, clinicians, and members of institutional review boards to explore their views on key issues, including PRS research, return of PRS results, clinical translation, and barriers and facilitators to health behavioral changes in response to PRS results. Across stakeholder groups, our findings indicate support for PRS development and a strong interest in having PRS results returned to research participants. However, we also found multi-level barriers and significant differences in stakeholders' views about what is needed and possible for successful implementation. These include researcher-participant interaction formats, health and genomic literacy, and a range of structural barriers, such as financial instability, insurance coverage, and the absence of health-supporting infrastructure and affordable healthy food options in poorer neighborhoods. Our findings highlight the need to revisit and implement measures in PRS studies (e.g., incentives and resources for follow-up care), as well as system-level policies to promote equity in genomic research and health outcomes.


Subject(s)
Electronic Health Records , Genetic Risk Score , Humans , Focus Groups