Results 1 - 20 of 185
1.
F1000Res ; 13: 664, 2024.
Article in English | MEDLINE | ID: mdl-39220382

ABSTRACT

Background: An abundance of rapidly accumulating scientific evidence presents novel opportunities for researchers and practitioners alike, yet such advantages are often overshadowed by the resource demands of finding and aggregating a continually expanding body of scientific information. Data extraction activities associated with evidence synthesis have been described as time-consuming to the point of critically limiting the usefulness of research. Across social science disciplines, the use of automation technologies for timely and accurate knowledge synthesis can enhance research translation value, better inform key policy development, and expand the current understanding of human interactions, organizations, and systems. Ongoing developments surrounding automation are highly concentrated in research for evidence-based medicine, with limited evidence surrounding tools and techniques applied outside of the clinical research community. The goal of the present study is to extend the automation knowledge base by synthesizing current trends in the application of extraction technologies for key data elements of interest to social scientists. Methods: We report the baseline results of a living systematic review of automated data extraction techniques supporting systematic reviews and meta-analyses in the social sciences. This review follows PRISMA standards for reporting systematic reviews. Results: The baseline review of social science research yielded 23 relevant studies. Conclusions: When considering the process of automating systematic review and meta-analysis information extraction, social science research falls short compared to clinical research, which focuses on automatic processing of information related to the PICO framework. With a few exceptions, most tools were either in their infancy and not accessible to applied researchers, were domain specific, or required substantial manual coding of articles before automation could occur. Additionally, few solutions considered extraction of data from tables, which is where the key data elements that social and behavioral scientists analyze reside.


Subjects
Social Sciences , Social Sciences/methods , Humans , Meta-Analysis as Topic , Automation , Information Storage and Retrieval/methods
2.
JMIR Res Protoc ; 13: e55092, 2024 Sep 06.
Article in English | MEDLINE | ID: mdl-39240683

ABSTRACT

BACKGROUND: The global community has set an ambitious goal to end HIV/AIDS as a public health threat by 2030. Significant progress has been achieved in pursuing these objectives; however, concerns remain regarding the lack of disaggregated routine data for key populations (KPs) for a targeted HIV response. KPs include female sex workers, transgender populations, gay men and other men who have sex with men, people who are incarcerated, and people who use drugs. From an epidemiological perspective, KPs play a fundamental role in shaping the dynamics of HIV transmission due to specific behaviors. In South Africa, routine health information management systems (RHIMS) do not include a unique identifier code (UIC) for KPs. The purpose of this protocol is to develop the framework for improved HIV monitoring and programming by piloting the inclusion of a KPs UIC in the South African RHIMS. OBJECTIVE: This paper aims to describe the protocol for a multiphased study to pilot the inclusion of a KPs UIC in RHIMS. METHODS: We will conduct a multiphased study to pilot the framework for the inclusion of a KPs UIC in RHIMS. The study has received approval from the University of Johannesburg Research Ethics Committee (REC-2518-2023). The study has four objectives: first, a systematic review conducted according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines (objective 1); second, a policy document review and in-depth stakeholder interviews using semistructured questionnaires (objective 2); third, exploratory data analysis of deidentified HIV data sets (objective 3); and finally, piloting the framework to assess the feasibility of incorporating a KPs UIC in RHIMS using findings from objectives 1, 2, and 3 (objective 4). Qualitative and quantitative data will be analyzed using ATLAS.ti (version 6; ATLAS.ti Scientific Software Development GmbH) and the Python (version 3.8; Python Software Foundation) programming language, respectively. 
RESULTS: The results will encompass a systematic review of the literature, qualitative interviews, and document reviews, along with exploratory analysis of deidentified routine program data and findings from the pilot study. The systematic review has been registered in PROSPERO (International Prospective Register of Systematic Reviews; CRD42023440656). Data collection is planned to commence in September 2024, and results for all objectives are expected to be published by December 2025. CONCLUSIONS: The study will produce a framework to be recommended for a national rollout of the KPs UIC. The study results will contribute to the knowledge base around the inclusion of a KPs UIC in RHIMS data. TRIAL REGISTRATION: PROSPERO CRD42023440656; https://tinyurl.com/msnppany. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): PRR1-10.2196/55092.


Subjects
HIV Infections , Health Information Management , Humans , South Africa/epidemiology , HIV Infections/prevention & control , HIV Infections/epidemiology , HIV Infections/transmission , Pilot Projects , Health Information Management/methods , Male , Female
3.
Stud Health Technol Inform ; 316: 949-950, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176948

ABSTRACT

In the field of medical data analysis, converting unstructured text documents into a structured format suitable for further use is a significant challenge. This study introduces an automated, locally deployed, privacy-preserving pipeline that uses open-source Large Language Models (LLMs) with a Retrieval-Augmented Generation (RAG) architecture to convert German-language medical documents containing sensitive health-related information into a structured format. Testing on a proprietary dataset of 800 unstructured original medical reports demonstrated an accuracy of up to 90% in data extraction, compared with data extracted manually by physicians and medical students. This highlights the pipeline's potential as a valuable tool for efficiently extracting relevant data from unstructured sources.
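The abstract does not include the pipeline itself; as a rough sketch of the retrieval step in a RAG architecture, the toy retriever below ranks report chunks by lexical overlap with an extraction query. The German snippets are invented examples, and a production system would use dense embeddings and a locally hosted LLM rather than bag-of-words cosine similarity:

```python
import math
import re
from collections import Counter

def bow(text):
    # Bag-of-words count vector; real RAG pipelines use learned embeddings instead.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve(query, chunks, k=1):
    # Return the k report chunks most similar to the extraction query;
    # these chunks would then be passed to the LLM as grounding context.
    qv = bow(query)
    return sorted(chunks, key=lambda c: cosine(qv, bow(c)), reverse=True)[:k]

chunks = [
    "Diagnose: arterielle Hypertonie, Blutdruck 150/95 mmHg",
    "Medikation: Metformin 500 mg taeglich",
]
top = retrieve("Blutdruck bei Hypertonie", chunks)
```

Only the retrieved chunk, not the whole document, is handed to the model, which is what keeps generation grounded in the source report.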


Subjects
Electronic Health Records , Natural Language Processing , Germany , Information Storage and Retrieval/methods , Humans , Computer Security , Data Mining/methods
4.
Toxicology ; 508: 153933, 2024 Aug 23.
Article in English | MEDLINE | ID: mdl-39181527

ABSTRACT

To underpin scientific evaluations of chemical risks, agencies such as the European Food Safety Authority (EFSA) rely heavily on the outcome of systematic reviews, which currently require extensive manual effort. One specific challenge is the meaningful use of vast amounts of valuable data from new approach methodologies (NAMs), which are mostly reported in an unstructured way in the scientific literature. In the EFSA-initiated project 'AI4NAMS', the potential of large language models (LLMs) was explored. Models from the GPT family, where GPT refers to Generative Pre-trained Transformer, were used for searching, extracting, and integrating data from scientific publications for NAM-based risk assessment. A case study on bisphenol A (BPA), a substance of very high concern due to its adverse effects on human health, focused on the structured extraction of information on test systems measuring biologic activities of BPA. Fine-tuning of a GPT-3 model (Curie base model) for extraction tasks was tested, and the performance of the fine-tuned model was compared to that of a ready-to-use model (text-davinci-002). To update findings from the AI4NAMS project and to check for technical progress, the fine-tuning exercise was repeated and a newer ready-to-use model (text-davinci-003) served as comparison. In both cases, the fine-tuned Curie model was found to be superior to the ready-to-use model. Performance improvement was also evident between text-davinci-002 and the newer text-davinci-003. Our findings demonstrate how fine-tuning and rapid general technical development improve model performance and contribute to the growing number of investigations on the use of AI in scientific and regulatory tasks.

5.
Stud Health Technol Inform ; 316: 1255-1259, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176609

ABSTRACT

This paper presents a chatbot that simplifies accessing and understanding the open-access records of adverse events related to medical devices in the MAUDE database. The chatbot is powered by generative AI technology, enabling count and search queries. The chatbot uses the openFDA API and GPT-4 model to interpret users' natural language queries, generate appropriate API calls, and summarize adverse event reports. The chatbot also provides a downloadable link to the original reports. The model's performance in generating accurate API calls was assessed and improved by training it with few-shot examples of query-URL pairs. Additionally, the quality of content-based summaries was evaluated by human expert ratings. This initiative is a significant step towards making patient safety data accessible, replicable, and easily manageable by a broader range of researchers.
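As a hedged sketch of the query-generation step, the helper below assembles an openFDA count-query URL of the kind such a chatbot might emit for a question like "how many insulin-pump adverse events were there, by event type?". The endpoint follows the public openFDA device/event API, but the specific field names, and whether the paper's chatbot uses exactly these, are assumptions:

```python
from urllib.parse import urlencode

OPENFDA_DEVICE_EVENT = "https://api.fda.gov/device/event.json"

def build_count_url(search_expr, count_field, limit=None):
    # Assemble an openFDA count query: 'search' filters the MAUDE records
    # and 'count' aggregates them by the given field.
    params = {"search": search_expr, "count": count_field}
    if limit is not None:
        params["limit"] = limit
    return OPENFDA_DEVICE_EVENT + "?" + urlencode(params)

# Hypothetical query an LLM might generate from a natural-language question.
url = build_count_url('device.generic_name:"insulin pump"', "event_type.exact")
```

In the paper's design, the LLM produces the API call and then summarizes the returned JSON; validating the generated URL against the API schema before issuing it is the safety-critical step.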


Subjects
Natural Language Processing , Humans , Artificial Intelligence , Databases, Factual , Patient Safety , Electronic Health Records
6.
Stud Health Technol Inform ; 316: 1861-1865, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176854

ABSTRACT

Using clinical decision support systems (CDSSs) for breast cancer management requires extracting relevant patient data from textual reports, a complex task that machine learning achieves efficiently, albeit with black-box methods. We proposed a rule-based natural language processing (NLP) method to automate the translation of breast cancer patient summaries into structured patient profiles suitable for input into the guideline-based CDSS of the DESIREE project. Our method encompasses named entity recognition (NER), relation extraction, and structured data extraction to systematically organize patient data. The method demonstrated strong alignment with treatment recommendations generated for manually created patient profiles (gold standard), with only 2% of cases differing. Moreover, the NER pipeline achieved an average F1-score of 0.9 across the main entities (patient, side, and tumor), 0.87 for relation extraction, and 0.75 for contextual information, showing promising results for rule-based NLP.
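The entity-level F1-scores quoted above combine precision and recall over extracted entities; a minimal sketch, with invented gold and predicted (entity, value) pairs:

```python
def f1(gold, predicted):
    # Entity-level F1: harmonic mean of precision and recall over sets.
    if not gold or not predicted:
        return 0.0
    tp = len(gold & predicted)                 # true positives
    precision = tp / len(predicted)
    recall = tp / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy example: two of three gold entities found, plus one spurious prediction.
gold = {("side", "left"), ("tumor_size", "2 cm"), ("patient", "postmenopausal")}
pred = {("side", "left"), ("tumor_size", "2 cm"), ("side", "right")}
score = f1(gold, pred)  # precision = recall = 2/3
```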


Subjects
Breast Neoplasms , Decision Support Systems, Clinical , Electronic Health Records , Natural Language Processing , Humans , Breast Neoplasms/therapy , Female , Data Mining/methods , Machine Learning
7.
Health Inf Sci Syst ; 12(1): 37, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38974364

ABSTRACT

Obtaining high-quality data sets from raw data is a key step before data exploration and analysis. Nowadays, in the medical domain, a large amount of data needs quality improvement before being used to analyze the health condition of patients. There has been much research on data extraction, data cleaning, and data imputation individually. However, few frameworks integrate these three techniques, leaving datasets lacking in accuracy, consistency, and integrity. In this paper, a multi-source heterogeneous data enhancement framework based on a lakehouse, MHDP, is proposed, which comprises three steps: data extraction, data cleaning, and data imputation. In the data extraction step, a data fusion technique is offered to handle multi-modal and multi-source heterogeneous data. In the data cleaning step, we propose HoloCleanX, which provides a convenient interactive procedure. In the data imputation step, multiple imputation (MI) and the state-of-the-art algorithm SAITS are applied for different situations. We evaluate our framework via three tasks: clustering, classification, and strategy prediction. The experimental results prove the effectiveness of our data enhancement framework.
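A toy rendering of the three-step extract, clean, impute flow described above. Single mean imputation stands in for the paper's multiple imputation and SAITS, and the vital-sign fields are invented:

```python
from statistics import mean

def extract(records):
    # Step 1: pull the fields of interest from heterogeneous source dicts.
    return [{"age": r.get("age"), "sbp": r.get("sbp")} for r in records]

def clean(rows):
    # Step 2: flag physiologically impossible systolic blood pressures
    # as missing so the imputation step can handle them.
    for r in rows:
        if r["sbp"] is not None and not (50 <= r["sbp"] <= 300):
            r["sbp"] = None
    return rows

def impute(rows):
    # Step 3: single mean imputation; the framework itself uses multiple
    # imputation and SAITS, this stand-in only marks where that slot goes.
    observed = [r["sbp"] for r in rows if r["sbp"] is not None]
    fill = mean(observed)
    for r in rows:
        if r["sbp"] is None:
            r["sbp"] = fill
    return rows

records = [{"age": 61, "sbp": 120}, {"age": 58, "sbp": 999}, {"age": 70}]
result = impute(clean(extract(records)))
```

The value of chaining the steps is that cleaning feeds the imputer a consistent notion of "missing", which is exactly the integration gap the paper argues most pipelines leave open.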

8.
J Gynecol Oncol ; 35(4): e54, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38991943

ABSTRACT

OBJECTIVE: In this study, we collected data over 8 years (2012-2019) from the Japan Society of Obstetrics and Gynecology (JSOG) tumor registry to determine the status of endometrial cancer in Japan, and analyzed detailed clinicopathological factors. METHODS: The JSOG maintains a tumor registry that gathers information on endometrial cancer treated at JSOG-registered institutions. Data from patients whose endometrial cancer treatment was initiated from 2012 to 2019 were analyzed retrospectively. RESULTS: A total of 82,969 patients with endometrial cancer underwent treatment from 2012 to 2019. Chemotherapy, alone or in combination with hormonal therapy, was more common among endometrial cancer patients under 40 years than among those over 40 years. The number of patients with endometrial cancer treated with laparoscopic or robot-assisted surgery increased yearly. Small cell carcinomas and undifferentiated carcinomas were more likely to be diagnosed at an advanced stage. Lymphadenectomy was most commonly performed for stage IIIC2 disease, whereas positive peritoneal washing cytology was most common for stage IVB disease and serous carcinoma. CONCLUSION: Multi-year summary reports provided detailed clinicopathological information regarding endometrial cancer that could not be obtained from a single year. These reports were useful in understanding treatment strategies and trends over time based on age, histology, and stage.


Subjects
Endometrial Neoplasms , Neoplasm Staging , Registries , Humans , Female , Endometrial Neoplasms/pathology , Endometrial Neoplasms/therapy , Endometrial Neoplasms/surgery , Japan/epidemiology , Middle Aged , Adult , Aged , Retrospective Studies , Lymph Node Excision/statistics & numerical data , Laparoscopy/statistics & numerical data , Robotic Surgical Procedures/statistics & numerical data , Aged, 80 and over
9.
EFSA J ; 22(7): e8898, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39010863

ABSTRACT

This scientific report provides an update of the Xylella spp. host plant database, aiming to provide information and scientific support to risk assessors, risk managers and researchers dealing with Xylella spp. Upon a mandate of the European Commission, EFSA created and regularly updates a database of host plant species of Xylella spp. The current mandate covers the period 2021-2026. This report relates to the 10th version of the database, published in Zenodo in the EFSA Knowledge Junction community, covering literature published from 1 July 2023 up to 31 December 2023, and recent Europhyt outbreak notifications. Informative data have been extracted from 39 selected publications. Sixteen new host plants, five genera and one family were identified and added to the database. They were found naturally infected by X. fastidiosa subsp. fastidiosa, or by an unknown subspecies, in either Portugal or the United States. No additional data were retrieved for X. taiwanensis, and no additional multilocus sequence types (STs) were identified worldwide. New information on the tolerant/resistant response of plant species to X. fastidiosa infection was added to the database. The Xylella spp. host plant species were listed in different categories based on the number and type of detection methods applied for each finding. The overall number of Xylella spp. host plants determined with at least two different detection methods, or positive with one method either by sequencing or pure culture isolation (category A), now reaches 451 plant species, 204 genera and 70 families. These numbers rise to 712 plant species, 312 genera and 89 families if considered regardless of the detection methods applied (category E).

10.
JMIR Form Res ; 8: e54407, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38980712

ABSTRACT

Social media analyses have become increasingly popular among health care researchers. Social media continues to grow its user base and, when analyzed, offers unique insight into health problems. The process of obtaining data for social media analyses varies greatly and involves ethical considerations. Data extraction is often facilitated by software tools, some of which are open source, while others are costly and therefore not accessible to all researchers. The use of software for data extraction is accompanied by additional challenges related to the uniqueness of social media data. Thus, this paper serves as a tutorial for a simple method of extracting social media data that is accessible to novice health care researchers and public health professionals who are interested in pursuing social media research. The discussed methods were used to extract data from Facebook for a study of maternal perspectives on sudden unexpected infant death.

11.
J Med Internet Res ; 26: e57586, 2024 Jul 31.
Article in English | MEDLINE | ID: mdl-39083789

ABSTRACT

BACKGROUND: The use of telehealth has rapidly increased, yet some populations may be disproportionally excluded from accessing and using this modality of care. Training service users in telehealth may increase accessibility for certain groups. The extent and nature of these training activities have not been explored. OBJECTIVE: The objective of this scoping review is to identify and describe activities for training service users in the use of telehealth. METHODS: Five databases (MEDLINE [via PubMed], Embase, CINAHL, PsycINFO, and Web of Science) were searched in June 2023. Studies that described activities to train service users in the use of synchronous telehealth consultations were eligible for inclusion. Studies that focused on health care professional education were excluded. Papers were limited to those published in the English language. The review followed the Joanna Briggs Institute guidelines for scoping reviews and was reported in line with the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines. Titles and abstracts were screened by 1 reviewer (EG). Full texts were screened by 2 reviewers (EG and JH or SC). Data extraction was guided by the research question. RESULTS: The search identified 8087 unique publications. In total, 13 studies met the inclusion criteria. Telehealth training was commonly described as once-off preparatory phone calls to service users before a telehealth visit, facilitated primarily by student volunteers, and accompanied by written instructions. The training content included guidance on how to download and install software, troubleshoot technical issues, and adjust device settings. Older adults were the most common target population for the training. All but 1 of the studies were conducted during the COVID-19 pandemic. Overall, training was feasible and well-received by service users, and studies mostly reported increased rates of video visits following training. 
There was limited and mixed evidence that training improved participants' competency with telehealth. CONCLUSIONS: The review mapped the literature on training activities for service users in telehealth. The common features of telehealth training for service users included once-off preparatory phone calls on the technical elements of telehealth, targeted at older adults. Key issues for consideration include the need for co-designed training and improving the broader digital skills of service users. There is a need for further studies to evaluate the outcomes of telehealth training activities in geographically diverse areas.


Subjects
Telemedicine , Humans , Telemedicine/statistics & numerical data , COVID-19 , Adult , Aged
12.
Res Synth Methods ; 2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38895747

ABSTRACT

Accurate data extraction is a key component of evidence synthesis and critical to valid results. The advent of publicly available large language models (LLMs) has generated interest in these tools for evidence synthesis and created uncertainty about the choice of LLM. We compare the performance of two widely available LLMs (Claude 2 and GPT-4) in extracting pre-specified data elements from 10 published articles included in a previously completed systematic review. We use prompts and full study PDFs to compare the outputs from the browser versions of Claude 2 and GPT-4. GPT-4 required the use of a third-party plugin to upload and parse PDFs. Accuracy was high for Claude 2 (96.3%). The accuracy of GPT-4 with the plugin was lower (68.8%); however, most of the errors were due to the plugin. Both LLMs correctly recognized when pre-specified data elements were missing from the source PDF and generated correct information for data elements that were not reported explicitly in the articles. A secondary analysis demonstrated that, when provided selected text from the PDFs, Claude 2 and GPT-4 accurately extracted 98.7% and 100% of the data elements, respectively. Limitations include the narrow scope of the study PDFs used, that prompt development was completed using only Claude 2, and that we cannot guarantee the open-access articles were not used to train the LLMs. This study highlights the potential for LLMs to revolutionize data extraction but underscores the importance of accurate PDF parsing. For now, it remains essential for a human investigator to validate LLM extractions.
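A schematic of this kind of extraction setup: one prompt asks for all pre-specified elements as JSON, and the reply is validated before entering the evidence table. The element names and the mock reply are invented; the study itself used the browser versions of the models rather than an API:

```python
import json

ELEMENTS = ["sample_size", "mean_age", "primary_outcome"]  # hypothetical elements

def build_prompt(article_text, elements):
    # One prompt per article requesting all pre-specified elements as JSON;
    # missing items must be reported as null rather than guessed.
    fields = ", ".join(f'"{e}"' for e in elements)
    return (
        "Extract the following data elements from the study below and answer "
        f"only with a JSON object with keys {fields}. "
        "Use null for any element not reported.\n\n" + article_text
    )

def parse_response(raw):
    # Validate the model reply so downstream synthesis never sees free text.
    data = json.loads(raw)
    return {e: data.get(e) for e in ELEMENTS}

prompt = build_prompt("A randomized trial enrolled 120 adults...", ELEMENTS)
mock_reply = '{"sample_size": 120, "mean_age": null, "primary_outcome": "HbA1c"}'
parsed = parse_response(mock_reply)
```

Forcing a fixed JSON schema is what makes a human validation pass cheap: the reviewer checks values against the PDF instead of re-reading model prose.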

13.
BMC Med Res Methodol ; 24(1): 139, 2024 Jun 25.
Article in English | MEDLINE | ID: mdl-38918736

ABSTRACT

BACKGROUND: Large language models (LLMs) that can efficiently screen and identify studies meeting specific criteria would streamline literature reviews. Additionally, those capable of extracting data from publications would enhance knowledge discovery by reducing the burden on human reviewers. METHODS: We created an automated pipeline utilizing the OpenAI GPT-4 32K API version "2023-05-15" to evaluate the accuracy of the LLM GPT-4's responses to queries about published papers on HIV drug resistance (HIVDR), with and without an instruction sheet. The instruction sheet contained specialized knowledge designed to assist a person trying to answer questions about an HIVDR paper. We designed 60 questions pertaining to HIVDR and created markdown versions of 60 published HIVDR papers in PubMed. We presented the 60 papers to GPT-4 in four configurations: (1) all 60 questions simultaneously; (2) all 60 questions simultaneously with the instruction sheet; (3) each of the 60 questions individually; and (4) each of the 60 questions individually with the instruction sheet. RESULTS: GPT-4 achieved a mean accuracy of 86.9%, which was 24.0% higher than when the answers to papers were permuted. The overall recall and precision were 72.5% and 87.4%, respectively. The standard deviation of three replicates for the 60 questions ranged from 0 to 5.3%, with a median of 1.2%. The instruction sheet did not significantly increase GPT-4's accuracy, recall, or precision. GPT-4 was more likely to provide false positive answers when the 60 questions were submitted individually compared to when they were submitted together. CONCLUSIONS: GPT-4 reproducibly answered 3600 questions about 60 papers on HIVDR with moderately high accuracy, recall, and precision. The instruction sheet's failure to improve these metrics suggests that more sophisticated approaches are necessary. Either enhanced prompt engineering or fine-tuning an open-source model could further improve an LLM's ability to answer questions about highly specialized HIVDR papers.
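The permutation control mentioned in the results (scoring each paper's answers against other papers to estimate chance-level accuracy) can be sketched as follows, with toy papers and answers:

```python
import random

def accuracy(preds, gold):
    # Fraction of questions answered identically to the answer key.
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)

def permuted_accuracy(answers, gold, trials=200, seed=0):
    # Average accuracy when answer sheets are shuffled across papers,
    # approximating the chance-level baseline described above.
    rng = random.Random(seed)
    papers = sorted(gold)
    total = 0.0
    for _ in range(trials):
        shuffled = papers[:]
        rng.shuffle(shuffled)
        total += sum(accuracy(answers[p], gold[q])
                     for p, q in zip(papers, shuffled)) / len(papers)
    return total / trials

# Invented answer keys for two papers, three questions each.
gold = {"paper1": ["yes", "no", "no"], "paper2": ["no", "yes", "yes"]}
answers = {"paper1": ["yes", "no", "no"], "paper2": ["no", "yes", "no"]}
true_acc = sum(accuracy(answers[p], gold[p]) for p in gold) / len(gold)
chance_acc = permuted_accuracy(answers, gold)
```

The gap between `true_acc` and `chance_acc` is the quantity the study reports as the 24.0% margin over the permuted baseline.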


Subjects
HIV Infections , Humans , Reproducibility of Results , HIV Infections/drug therapy , PubMed , Publications/statistics & numerical data , Publications/standards , Information Storage and Retrieval/methods , Information Storage and Retrieval/standards , Software
14.
Wellcome Open Res ; 9: 168, 2024.
Article in English | MEDLINE | ID: mdl-38873399

ABSTRACT

Background: The Behaviour Change Intervention Ontology (BCIO) aims to improve the clarity, completeness and consistency of reporting within intervention descriptions and evidence synthesis. However, a recommended method for transparently annotating intervention evaluation reports using the BCIO does not currently exist. This study aimed to develop a data extraction template for annotating using the BCIO. Methods: The BCIO data extraction template was developed in four stages: i) scoping review of papers citing component ontologies within the BCIO, ii) development of a draft template, iii) piloting and revising the template, and iv) dissemination and maintenance of the template. Results: A prototype data extraction template using Microsoft Excel was developed based on BCIO annotations from 14 papers. The 'BCIO data extraction template v1' was produced following piloting and revision, incorporating a facility for user feedback. Discussion: This data extraction template provides a single, accessible resource to extract all necessary characteristics of behaviour change intervention scenarios. It can be used to annotate the presence of BCIO entities for evidence synthesis, including systematic reviews. In the future, we will update this template based on feedback from the community, additions of newly published ontologies within the BCIO, and revisions to existing ontologies.


Behaviour change interventions are often reported in an inconsistent and incomplete manner in study reports. This makes it difficult to build knowledge and predict outcomes. There is a need for a shared language to describe behaviour change interventions. This need was met using 'ontologies', which are classification systems that represent knowledge in a standardised way. The Behaviour Change Intervention Ontology (BCIO) has been developed to describe the different aspects of interventions in a way that is precise enough for computers as well as humans to 'read' study findings. The BCIO can be used to extract information from study reports for evidence synthesis, such as systematic literature reviews. To meet the need for a resource for annotating (coding) study reports according to the BCIO, we developed a data extraction template. The template was developed in four stages: i) reviewing existing papers using the BCIO, ii) development of a draft template, iii) piloting and revising the template, and iv) dissemination and maintenance of the template. The resulting resource is an accessible, easy-to-use template to assist with specifying the content of published papers reporting interventions and their evaluation. The template will be updated based on user feedback and future revisions to the BCIO.

15.
J Bioinform Comput Biol ; 22(2): 2450005, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38779780

ABSTRACT

Enzymes catalyze diverse biochemical reactions and are building blocks of cellular and metabolic pathways. Data and metadata of enzymes are distributed across databases and are archived in various formats. The enzyme databases provide utilities for efficient searches and for downloading enzyme records in batch mode but do not support organism-specific extraction of subsets of data. Users are required to write scripts for parsing entries for customized data extraction prior to downstream analysis. Integrated Customized Extraction of Enzyme Data (iCEED) has been developed to provide organism-specific customized data extraction utilities for seven commonly used enzyme databases and brings these resources under an integrated portal. iCEED provides dropdown menus and search boxes using a typeahead utility for submission of queries, as well as an enzyme class-based browsing utility. A utility to facilitate mapping and visualization of functionally important features on the three-dimensional (3D) structures of enzymes is integrated. The customized data extraction utilities provided in iCEED are expected to be useful for biochemists, biotechnologists, computational biologists, and life science researchers to build curated datasets of their choice through an easy-to-navigate web-based interface. The integrated feature visualization system is useful for a fine-grained understanding of the enzyme structure-function relationship. Desired subsets of data, extracted and curated using iCEED, can subsequently be used for downstream processing, analyses, and knowledge discovery. iCEED can also be used for training and teaching purposes.


Subjects
Databases, Protein , Enzymes , Software , Enzymes/chemistry , Enzymes/metabolism , Computational Biology/methods , User-Computer Interface , Internet
16.
World J Surg ; 48(6): 1297-1300, 2024 06.
Article in English | MEDLINE | ID: mdl-38794809

ABSTRACT

Web scraping holds transformative potential for surgical research, as revealed through a comprehensive analysis of its applications and impact. This manuscript unveils the pivotal role of web scraping in driving innovation, enabling more effective management of human capital dynamics, and enhancing patient outcomes in the surgical field. As an example, we demonstrate how web scraping can uncover insights into international collaboration in surgery research, revealing limited collaboration between surgeons in developed and developing countries.
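As a hedged illustration of the scraping step, the stdlib-only parser below pulls author affiliations out of a saved article page. The `affiliation` class name and the HTML are invented stand-ins for real journal markup, and an actual scraper would first fetch pages while respecting the site's terms and robots.txt:

```python
from html.parser import HTMLParser

class AffiliationParser(HTMLParser):
    # Collect the text of <span class="affiliation"> tags; the class name
    # is assumed and would differ per journal site.
    def __init__(self):
        super().__init__()
        self.in_aff = False
        self.affiliations = []

    def handle_starttag(self, tag, attrs):
        if tag == "span" and ("class", "affiliation") in attrs:
            self.in_aff = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_aff = False

    def handle_data(self, data):
        if self.in_aff:
            self.affiliations.append(data.strip())

html = ('<li><span class="affiliation">University of Nairobi, Kenya</span></li>'
        '<li><span class="affiliation">Johns Hopkins University, USA</span></li>')
parser = AffiliationParser()
parser.feed(html)
```

Aggregating affiliations across many papers is how a collaboration analysis like the one described above would count cross-country co-authorships.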


Subjects
Biomedical Research , International Cooperation , Internet , Humans , Developing Countries , General Surgery
17.
BMC Res Notes ; 17(1): 115, 2024 Apr 23.
Article in English | MEDLINE | ID: mdl-38654333

ABSTRACT

OBJECTIVE: Pulmonary function test (PFT) results are recorded variably across hospitals in the Department of Veterans Affairs (VA) electronic health record (EHR), using both unstructured and semi-structured notes. We developed and validated a hospital-specific code to extract pre-bronchodilator measures of obstruction (ratio of forced expiratory volume in one second [FEV1] to forced vital capacity [FVC]) and severity of obstruction (percent predicted of FEV1). RESULTS: Among 36 VA facilities with the most PFTs completed between 2018 and 2022 from a parent cohort of veterans receiving long-acting controller inhalers, 12 had a consistent syntactical convention or template for reporting PFT data in the EHR. Of the 42,718 PFTs identified from these 12 facilities, the hospital-specific text processing pipeline yielded 24,860 values for the FEV1:FVC ratio and 23,729 values for FEV1. A ratio of FEV1:FVC less than 0.7 was identified in 17,615 of 24,922 studies (70.7%); 8864 of 24,922 (35.6%) had a severe or very severe reduction in FEV1 (< 50% of the predicted value). Among 100 randomly selected PFT reports reviewed by two pulmonary physicians, the coding solution correctly identified the presence of obstruction in 99 out of 100 studies and the degree of obstruction in 96 out of 100 studies.
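A simplified stand-in for such a facility-specific extraction rule, using regular expressions. The report format and patterns are invented rather than the VA templates themselves; the below-50%-predicted severity cut-off follows the abstract:

```python
import re

# Hypothetical patterns; in practice each facility needs its own template.
RATIO_RE = re.compile(r"FEV1\s*[:/]\s*FVC[^0-9]*([0-9]*\.?[0-9]+)", re.I)
FEV1_PRED_RE = re.compile(r"FEV1[^%]*?([0-9]+)\s*%\s*pred", re.I)

def parse_pft(note):
    # Extract the pre-bronchodilator FEV1:FVC ratio and FEV1 percent predicted
    # from a semi-structured report, returning None when a value is absent.
    ratio = RATIO_RE.search(note)
    pred = FEV1_PRED_RE.search(note)
    return {
        "fev1_fvc": float(ratio.group(1)) if ratio else None,
        "fev1_pct_pred": int(pred.group(1)) if pred else None,
    }

note = "Spirometry: FEV1/FVC = 0.62; FEV1 = 1.8 L (48% pred)"
result = parse_pft(note)
# Severe or very severe reduction: FEV1 below 50% of the predicted value.
severe = result["fev1_pct_pred"] is not None and result["fev1_pct_pred"] < 50
```

Returning None for absent values, rather than guessing, is what lets a validation review like the physicians' chart audit measure the pipeline's true hit rate.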


Subjects
Electronic Health Records , Respiratory Function Tests , United States Department of Veterans Affairs , Humans , United States , Electronic Health Records/statistics & numerical data , Respiratory Function Tests/methods , Forced Expiratory Volume , Vital Capacity , Veterans/statistics & numerical data , Male , Female
18.
Res Synth Methods ; 15(4): 576-589, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38432227

ABSTRACT

Data extraction is a crucial, yet labor-intensive and error-prone part of evidence synthesis. To date, efforts to harness machine learning for enhancing efficiency of the data extraction process have fallen short of achieving sufficient accuracy and usability. With the release of large language models (LLMs), new possibilities have emerged to increase efficiency and accuracy of data extraction for evidence synthesis. The objective of this proof-of-concept study was to assess the performance of an LLM (Claude 2) in extracting data elements from published studies, compared with human data extraction as employed in systematic reviews. Our analysis utilized a convenience sample of 10 English-language, open-access publications of randomized controlled trials included in a single systematic review. We selected 16 distinct types of data, posing varying degrees of difficulty (160 data elements across 10 studies). We used the browser version of Claude 2 to upload the portable document format of each publication and then prompted the model for each data element. Across 160 data elements, Claude 2 demonstrated an overall accuracy of 96.3% with a high test-retest reliability (replication 1: 96.9%; replication 2: 95.0% accuracy). Overall, Claude 2 made 6 errors on 160 data items. The most common errors (n = 4) were missed data items. Importantly, Claude 2's ease of use was high; it required no technical expertise or labeled training data for effective operation (i.e., zero-shot learning). Based on findings of our proof-of-concept study, leveraging LLMs has the potential to substantially enhance the efficiency and accuracy of data extraction for evidence syntheses.


Subjects
Machine Learning, Proof of Concept Study, Humans, Reproducibility of Results, Systematic Reviews as Topic, Randomized Controlled Trials as Topic, Algorithms, Information Storage and Retrieval/methods, Language, Software, Natural Language Processing, Research Design
19.
JMIR Res Protoc ; 13: e56933, 2024 Mar 25.
Article in English | MEDLINE | ID: mdl-38526541

ABSTRACT

BACKGROUND: Atypical presentations have been increasingly recognized as a significant contributing factor to diagnostic errors in internal medicine. However, the association between atypical presentations and diagnostic errors has not been systematically evaluated, owing to the lack of widely applicable definitions and criteria for what constitutes an atypical presentation. OBJECTIVE: The aim of this study is to describe how atypical presentations are defined and measured in studies of diagnostic errors in internal medicine and to use this information to develop new criteria for identifying atypical presentations at high risk for diagnostic errors. METHODS: This study will follow an established framework for conducting scoping reviews. Inclusion criteria are developed according to the participants, concept, and context (PCC) framework. This review will consider studies that fulfill all of the following criteria: include adult patients (participants); explore the association between atypical presentations and diagnostic errors using any definition, criteria, or measurement to identify atypical presentations and diagnostic errors (concept); and focus on internal medicine (context). Regarding the type of sources, this scoping review will consider quantitative, qualitative, and mixed methods study designs; systematic reviews; and opinion papers for inclusion. Case reports, case series, and conference abstracts will be excluded. The data will be extracted through MEDLINE, Web of Science, CINAHL, Embase, Cochrane Library, and Google Scholar searches. No limits will be applied to language, and papers indexed from database inception to December 31, 2023, will be included. Two independent reviewers (YH and RK) will conduct study selection and data extraction.
The data extracted will include specific details about the patient characteristics (eg, age, sex, and disease), the definitions and measuring methods for atypical presentations and diagnostic errors, clinical settings (eg, department and outpatient or inpatient), type of evidence source, and the association between atypical presentations and diagnostic errors relevant to the review question. The extracted data will be presented in tabular format with descriptive statistics, allowing us to identify the key components or types of atypical presentations and develop new criteria to identify atypical presentations for future studies of diagnostic errors. Developing the new criteria will follow guidance for a basic qualitative content analysis with an inductive approach. RESULTS: As of January 2024, a literature search through multiple databases is ongoing. We will complete this study by December 2024. CONCLUSIONS: This scoping review aims to provide rigorous evidence to develop new criteria to identify atypical presentations at high risk for diagnostic errors in internal medicine. Such criteria could facilitate the development of a comprehensive conceptual model to understand the associations between atypical presentations and diagnostic errors in internal medicine. TRIAL REGISTRATION: Open Science Framework; www.osf.io/27d5m. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): DERR1-10.2196/56933.
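The participants/concept/context inclusion criteria and the stated exclusions above amount to a screening decision per study. A minimal sketch of that logic follows; the field names are hypothetical, not part of the protocol.

```python
# Illustrative screening check encoding the protocol's PCC inclusion
# criteria and its excluded publication types (field names are invented).

def include(study: dict) -> bool:
    excluded_types = {"case report", "case series", "conference abstract"}
    return (study["participants"] == "adults"                    # participants
            and study["examines_atypical_vs_diagnostic_error"]   # concept
            and study["context"] == "internal medicine"          # context
            and study["type"] not in excluded_types)

print(include({"participants": "adults",
               "examines_atypical_vs_diagnostic_error": True,
               "context": "internal medicine",
               "type": "cohort"}))  # True
```

In practice the two independent reviewers would apply these criteria manually; the sketch only makes the decision rule explicit.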

20.
J Med Internet Res ; 26: e54580, 2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38551633

ABSTRACT

BACKGROUND: The study of disease progression relies on clinical data, including text data, and extracting valuable features from text data has been a research hot spot. With the rise of large language models (LLMs), semantic-based extraction pipelines are gaining acceptance in clinical research. However, the security and feature hallucination issues of LLMs require further attention. OBJECTIVE: This study aimed to introduce a novel modular LLM pipeline, which could semantically extract features from textual patient admission records. METHODS: The pipeline was designed to process a systematic succession of concept extraction, aggregation, question generation, corpus extraction, and question-and-answer scale extraction, which was tested via 2 low-parameter LLMs: Qwen-14B-Chat (QWEN) and Baichuan2-13B-Chat (BAICHUAN). A data set of 25,709 pregnancy cases from the People's Hospital of Guangxi Zhuang Autonomous Region, China, was used for evaluation with the help of a local expert's annotation. The pipeline was evaluated with the metrics of accuracy and precision, null ratio, and time consumption. Additionally, we evaluated its performance via a quantized version of Qwen-14B-Chat on a consumer-grade GPU. RESULTS: The pipeline demonstrated a high level of precision in feature extraction, as evidenced by the accuracy and precision results of Qwen-14B-Chat (95.52% and 92.93%, respectively) and Baichuan2-13B-Chat (95.86% and 90.08%, respectively). Furthermore, the pipeline exhibited low null ratios and variable time consumption. The INT4-quantized version of QWEN delivered enhanced performance with 97.28% accuracy and a 0% null ratio. CONCLUSIONS: The pipeline exhibited consistent performance across different LLMs and efficiently extracted clinical features from textual data. It also showed reliable performance on consumer-grade hardware. This approach offers a viable and effective solution for mining clinical research data from textual records.
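The modular, stage-by-stage structure the abstract describes (concept extraction, aggregation, question generation, and so on) can be sketched as a chain of swappable functions. The stage names follow the abstract; the stage bodies and the example record are placeholders, not the authors' implementations.

```python
# A modular pipeline: each stage is a function from record to record,
# run in succession. Stages can be swapped out (e.g., to back a stage
# with a different LLM) without changing the pipeline driver.
from typing import Callable

def make_pipeline(*stages: Callable[[dict], dict]) -> Callable[[dict], dict]:
    def run(record: dict) -> dict:
        for stage in stages:
            record = stage(record)
        return record
    return run

# Placeholder stages over a mock admission record; a real pipeline would
# call an LLM inside each stage.
def concept_extraction(r):
    r["concepts"] = ["gestational age", "blood pressure"]
    return r

def aggregation(r):
    r["concepts"] = sorted(set(r["concepts"]))  # deduplicate and normalize
    return r

def question_generation(r):
    r["questions"] = [f"What is the patient's {c}?" for c in r["concepts"]]
    return r

pipeline = make_pipeline(concept_extraction, aggregation, question_generation)
result = pipeline({"text": "Admission note ..."})
print(result["questions"][0])  # What is the patient's blood pressure?
```

The null ratio reported in the abstract would correspond to stages returning no answer for a record, which this driver structure makes easy to count per stage.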


Subjects
Data Mining, Electronic Health Records, Humans, Data Mining/methods, Natural Language Processing, China, Language