Results 1 - 20 of 738

1.
J Am Med Inform Assoc ; 31(9): 1939-1952, 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-39042516

ABSTRACT

OBJECTIVE: This paper aims to address the challenges in abstract screening within systematic reviews (SRs) by leveraging the zero-shot capabilities of large language models (LLMs). METHODS: We employ an LLM to prioritize candidate studies by aligning abstracts with the selection criteria outlined in an SR protocol. Abstract screening was transformed into a novel question-answering (QA) framework that treats each selection criterion as a question addressed by the LLM. The framework involves breaking the selection criteria down into multiple questions, properly prompting the LLM to answer each question, scoring and re-ranking each answer, and combining the responses to make nuanced inclusion or exclusion decisions. RESULTS AND DISCUSSION: Large-scale validation was performed on the benchmark of CLEF eHealth 2019 Task 2: Technology-Assisted Reviews in Empirical Medicine. Focusing on GPT-3.5 as a case study, the proposed QA framework consistently exhibited a clear advantage over traditional information retrieval approaches and bespoke BERT-family models fine-tuned for prioritizing candidate studies (i.e., from BERT to PubMedBERT) across 31 datasets spanning 4 categories of SRs, underscoring its high potential for facilitating abstract screening. The experiments also showcased the viability of using selection criteria as a query for reference prioritization, as well as the viability of the framework with different LLMs. CONCLUSION: The investigation confirmed the indispensable value of leveraging selection criteria to improve the performance of automated abstract screening. LLMs demonstrated proficiency in prioritizing candidate studies for abstract screening using the proposed QA framework. Significant performance improvements were obtained by re-ranking answers using the semantic alignment between abstracts and selection criteria, further highlighting the pertinence of utilizing selection criteria to enhance abstract screening.
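As an illustration of the criterion-as-question idea described in this abstract, the following Python sketch scores an abstract against each selection criterion via a stubbed LLM call and averages the answers. The prompt wording, the ask_llm helper, and the simple averaging rule are assumptions for illustration, not the paper's exact prompting or re-ranking scheme.

# Minimal sketch of criterion-as-question screening (illustrative only).
# `ask_llm` stands in for any chat-completion client and is stubbed here so
# the sketch runs end to end; the real framework prompts, scores, and
# re-ranks answers in a more elaborate way.
from typing import Callable, List

def ask_llm(prompt: str) -> str:
    """Stub for an LLM call; replace with a real chat-completion client."""
    return "yes (0.9)"  # canned answer

def parse_confidence(answer: str) -> float:
    """Extract a numeric confidence from an answer like 'yes (0.9)'."""
    try:
        return float(answer.split("(")[1].rstrip(")"))
    except (IndexError, ValueError):
        return 0.5  # neutral fallback

def score_abstract(abstract: str, criteria: List[str],
                   llm: Callable[[str], str] = ask_llm) -> float:
    """Ask one question per selection criterion and combine the answers."""
    scores = []
    for criterion in criteria:
        prompt = ("Does the following abstract satisfy this selection criterion?\n"
                  f"Criterion: {criterion}\nAbstract: {abstract}\n"
                  "Answer yes or no with a confidence in parentheses.")
        answer = llm(prompt)
        confidence = parse_confidence(answer)
        if not answer.lower().startswith("yes"):
            confidence = 1.0 - confidence
        scores.append(confidence)
    return sum(scores) / len(scores)  # simple average of per-criterion scores

criteria = ["Reports a randomized controlled trial", "Includes adult participants"]
abstracts = ["abstract A ...", "abstract B ..."]
ranked = sorted(abstracts, key=lambda a: score_abstract(a, criteria), reverse=True)
print(ranked)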


Subject(s)
Natural Language Processing, Abstracting and Indexing/methods, Systematic Reviews as Topic, Humans, Information Storage and Retrieval/methods
2.
J Med Internet Res ; 26: e52001, 2024 Jun 26.
Article in English | MEDLINE | ID: mdl-38924787

ABSTRACT

BACKGROUND: Due to recent advances in artificial intelligence (AI), language model applications can generate logical text output that is difficult to distinguish from human writing. ChatGPT (OpenAI) and Bard (subsequently rebranded as "Gemini"; Google AI) were developed using distinct approaches, but little has been studied about differences in their capability to generate abstracts. The use of AI to write scientific abstracts in the field of spine surgery is the center of much debate and controversy. OBJECTIVE: The objective of this study is to assess the reproducibility of structured abstracts generated by ChatGPT and Bard compared with human-written abstracts in the field of spine surgery. METHODS: In total, 60 abstracts dealing with the spine were randomly selected from 7 reputable journals, and their titles were supplied to ChatGPT and Bard as input statements to generate abstracts. A total of 174 abstracts, divided into human-written, ChatGPT-generated, and Bard-generated abstracts, were evaluated for compliance with the structured format of journal guidelines and consistency of content. The likelihood of plagiarism and AI output was assessed using the iThenticate and ZeroGPT programs, respectively. A total of 8 reviewers in the spinal field evaluated 30 randomly extracted abstracts to determine whether they were produced by AI or human authors. RESULTS: The proportion of abstracts that met journal formatting guidelines was greater among ChatGPT abstracts (34/60, 56.6%) than among those generated by Bard (6/54, 11.1%; P<.001). However, a higher proportion of Bard abstracts (49/54, 90.7%) had word counts that met journal guidelines compared with ChatGPT abstracts (30/60, 50%; P<.001). The similarity index was significantly lower among ChatGPT-generated abstracts (20.7%) than among Bard-generated abstracts (32.1%; P<.001). The AI-detection program predicted that 21.7% (13/60) of the human group, 63.3% (38/60) of the ChatGPT group, and 87% (47/54) of the Bard group were possibly generated by AI, with an area under the curve of 0.863 (P<.001). The mean detection rate by human reviewers was 53.8% (SD 11.2%), with a sensitivity of 56.3% and a specificity of 48.4%. A total of 56.3% (63/112) of the actual human-written abstracts and 55.9% (62/128) of the AI-generated abstracts were recognized as human-written and AI-generated, respectively, by the human reviewers. CONCLUSIONS: Both ChatGPT and Bard can be used to help write abstracts, but most AI-generated abstracts are currently considered unethical owing to high plagiarism and AI-detection rates. ChatGPT-generated abstracts appear to be superior to Bard-generated abstracts in meeting journal formatting guidelines. Because humans are unable to accurately distinguish abstracts written by humans from those produced by AI programs, special caution is needed in examining the ethical boundaries of using AI programs, including ChatGPT and Bard.
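For readers who want to retrace the reviewer-performance arithmetic in this abstract, the short sketch below recomputes sensitivity and specificity from the reported counts, assuming the 112 human-written and 128 AI-generated judgments are the relevant denominators and treating "human-written" as the positive class.

# Recompute reviewer sensitivity/specificity from the counts reported above,
# treating "human-written" as the positive class (assumed mapping).
human_total, human_called_human = 112, 63  # human-written judgments, correctly recognized
ai_total, ai_called_ai = 128, 62           # AI-generated judgments, correctly recognized

sensitivity = human_called_human / human_total  # 63/112, roughly 0.563
specificity = ai_called_ai / ai_total           # 62/128, roughly 0.484
print(f"sensitivity = {sensitivity:.1%}, specificity = {specificity:.1%}")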


Subject(s)
Abstracting and Indexing, Spine, Humans, Spine/surgery, Abstracting and Indexing/standards, Abstracting and Indexing/methods, Reproducibility of Results, Artificial Intelligence, Writing/standards
3.
BMC Med Res Methodol ; 24(1): 108, 2024 May 09.
Article in English | MEDLINE | ID: mdl-38724903

ABSTRACT

OBJECTIVE: Systematic literature reviews (SLRs) are critical for life-science research. However, the manual selection and retrieval of relevant publications can be a time-consuming process. This study aims to (1) develop two disease-specific annotated corpora, one for human papillomavirus (HPV)-associated diseases and the other for pneumococcal-associated pediatric diseases (PAPD), and (2) optimize machine- and deep-learning models to facilitate automation of SLR abstract screening. METHODS: This study constructed two disease-specific SLR screening corpora for HPV and PAPD, which contained citation metadata and the corresponding abstracts. Performance was evaluated using the precision, recall, accuracy, and F1-score of multiple combinations of machine- and deep-learning algorithms and features such as keywords and MeSH terms. RESULTS AND CONCLUSIONS: The HPV corpus contained 1697 entries, with 538 relevant and 1159 irrelevant articles. The PAPD corpus included 2865 entries, with 711 relevant and 2154 irrelevant articles. Adding features beyond the title and abstract improved the accuracy of machine learning models by 3% for the HPV corpus and 2% for the PAPD corpus. Transformer-based deep learning models consistently outperformed conventional machine learning algorithms, highlighting the strength of domain-specific pre-trained language models for SLR abstract screening. This study provides a foundation for the development of more intelligent SLR systems.
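A minimal baseline of the kind evaluated in this study, combining title/abstract text with an extra MeSH-term feature channel and reporting precision, recall, accuracy, and F1, might look like the scikit-learn sketch below. The toy records are invented; this is not the authors' pipeline.

# Illustrative TF-IDF + logistic regression screener with an extra MeSH-term
# feature channel; toy records only, not the HPV or PAPD corpora.
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

train_text = ["HPV vaccine efficacy in adolescents",
              "pneumococcal carriage in young children",
              "unrelated cardiology imaging study",
              "surgical technique case report"]
train_mesh = ["Papillomavirus Vaccines", "Pneumococcal Infections",
              "Echocardiography", "Surgical Procedures, Operative"]
train_y = [1, 1, 0, 0]  # 1 = relevant to the review, 0 = irrelevant

test_text = ["HPV screening outcomes in women", "orthopedic implant wear analysis"]
test_mesh = ["Papillomavirus Infections", "Prostheses and Implants"]
test_y = [1, 0]

text_vec = TfidfVectorizer().fit(train_text)
mesh_vec = TfidfVectorizer().fit(train_mesh)  # extra features beyond title/abstract
X_train = hstack([text_vec.transform(train_text), mesh_vec.transform(train_mesh)])
X_test = hstack([text_vec.transform(test_text), mesh_vec.transform(test_mesh)])

clf = LogisticRegression().fit(X_train, train_y)
pred = clf.predict(X_test)
p, r, f1, _ = precision_recall_fscore_support(test_y, pred, average="binary",
                                              zero_division=0)
print(f"accuracy={accuracy_score(test_y, pred):.2f} "
      f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")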


Subject(s)
Machine Learning, Papillomavirus Infections, Humans, Papillomavirus Infections/diagnosis, Economics, Medical, Algorithms, Outcome Assessment, Health Care/methods, Deep Learning, Abstracting and Indexing/methods
4.
Med Ref Serv Q ; 43(2): 106-118, 2024.
Article in English | MEDLINE | ID: mdl-38722606

ABSTRACT

The objective of this study was to examine the accuracy of indexing for "Appalachian Region"[Mesh]. Researchers performed a search in PubMed for articles published in 2019 using "Appalachian Region"[Mesh] or "Appalachia" or "Appalachian" in the title or abstract. Only 17.88% of the articles retrieved by the search were about Appalachia according to the Appalachian Regional Commission (ARC) definition. Most retrieved articles appeared because they were indexed with state terms that are included as part of the MeSH term. Transparency in database indexing and searching is of growing importance as indexers rely increasingly on automated systems to catalog information and publications.
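The search strategy described here can be reproduced approximately with NCBI's E-utilities. The sketch below uses Biopython's Entrez wrapper as assumed tooling (the study reports only the PubMed query itself), and running it requires network access and a contact e-mail for NCBI.

# Approximate reproduction of the 2019 search described above via E-utilities
# (Biopython's Entrez wrapper is assumed tooling; requires network access).
from Bio import Entrez

Entrez.email = "you@example.org"  # placeholder contact address required by NCBI

query = '"Appalachian Region"[Mesh] OR Appalachia[tiab] OR Appalachian[tiab]'
handle = Entrez.esearch(db="pubmed", term=query, datetype="pdat",
                        mindate="2019/01/01", maxdate="2019/12/31", retmax=0)
total = int(Entrez.read(handle)["Count"])

# The relevant share (17.88% in the study, per the Appalachian Regional
# Commission definition) comes from manual review of the retrieved records:
relevant = 0  # fill in after manually reviewing the results
print(f"retrieved={total}, relevant share={relevant / total:.2%}" if total
      else "no hits")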


Subject(s)
Abstracting and Indexing, Appalachian Region, Abstracting and Indexing/methods, Humans, Medical Subject Headings, PubMed, Bibliometrics
5.
J Med Libr Assoc ; 111(3): 684-694, 2023 Jul 10.
Article in English | MEDLINE | ID: mdl-37483360

ABSTRACT

Objective: In 2002, the National Library of Medicine (NLM) introduced semi-automated indexing of Medline using the Medical Text Indexer (MTI). In 2021, NLM announced that it would fully automate its indexing of Medline with an improved MTI by mid-2022. This pilot study examines the indexing of a sample of Medline records from 2000 and how the output of an early, public version of MTI compares to records created by human indexers. Methods: This pilot study examines twenty Medline records from 2000, before the MTI was introduced as a MeSH term recommender. We identified twenty higher- and lower-impact biomedical journals based on Journal Impact Factor (JIF) and examined the indexing of papers by feeding their PubMed records into the Interactive MTI tool. Results: In the sample, we found key differences between automated and human-indexed Medline records: MTI assigned more terms and used them more accurately for citations in the higher-JIF group, and MTI tended to rank the Male check tag more highly than the Female check tag and to omit the Aged check tag. Sometimes MTI chose more specific terms than human indexers but was inconsistent in applying specificity principles. Conclusion: NLM's transition to fully automated indexing of the biomedical literature could introduce or perpetuate inconsistencies and biases in Medline. Librarians and searchers should assess changes to index terms and their impact on PubMed's mapping features for a range of topics. Future research should evaluate automated indexing as it pertains to finding clinical information effectively and to performing systematic searches.
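One simple way to quantify the record-level differences discussed in this study is to compare the MTI-suggested term set against the human-assigned set for each record, as in the sketch below. The terms are invented for the example, not drawn from the study's sample.

# Compare MTI-suggested vs. human-assigned MeSH terms for one record
# (terms are invented for the example, not drawn from the study sample).
def term_overlap(mti_terms: set, human_terms: set) -> tuple:
    """Precision/recall of MTI terms, taking the human indexing as reference."""
    if not mti_terms or not human_terms:
        return 0.0, 0.0
    hits = len(mti_terms & human_terms)
    return hits / len(mti_terms), hits / len(human_terms)

mti = {"Humans", "Male", "Neoplasms", "Aged"}
human = {"Humans", "Male", "Female", "Neoplasms/therapy"}
precision, recall = term_overlap(mti, human)
print(f"precision={precision:.2f} recall={recall:.2f}")
# Check-tag discrepancies (e.g., Female present only in the human indexing)
# show up here as lowered precision or recall, mirroring the findings above.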


Subject(s)
Abstracting and Indexing, MEDLINE, Medical Subject Headings, Abstracting and Indexing/methods, Abstracting and Indexing/standards, National Library of Medicine (U.S.), Pilot Projects, United States
6.
Reprod Toxicol ; 113: 150-154, 2022 10.
Article in English | MEDLINE | ID: mdl-36067870

ABSTRACT

The Dutch Teratology Information Service Lareb counsels healthcare professionals and patients about medication use during pregnancy and lactation. To keep the evidence up to date, employees perform a standardized weekly PubMed query in which relevant literature is identified manually. We aimed to develop an accurate machine-learning algorithm to predict the relevance of PubMed entries, thereby reducing the labor-intensive task of manually screening the articles. We fine-tuned a pre-trained natural language processing transformer model to identify relevant entries. We split 15,540 labeled entries into case-control-balanced training, validation, and test datasets. Additionally, we externally validated the model prospectively with 1288 labeled entries obtained from weekly queries after the model was developed. This dataset was also independently labeled by a team of six experienced human raters to evaluate our model's performance. Validation of our machine learning model on the retrospectively collected held-out dataset yielded an area under the sensitivity-versus-specificity curve of 89.3% (CI: 88.2-90.4). In the prospective external validation, the model classified relevant literature with an area under the sensitivity-versus-specificity curve of 87.4% (CI: 85.0-89.8). Our model achieved a higher sensitivity than the team of human raters without sacrificing much specificity. The team of human raters showed weak to moderate agreement in their article classifications (kappa range 0.40-0.64). Human selection of the latest relevant literature remains indispensable for keeping teratology information up to date. We show that automatic preselection of relevant abstracts using machine learning is possible without sacrificing selection performance.
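A minimal sketch of the general approach, scoring entries with a pre-trained transformer and evaluating the area under the ROC curve, is shown below. The checkpoint, texts, and labels are placeholders, not Lareb's fine-tuned model or data, and running it downloads the pretrained weights.

# Score entries with a pre-trained transformer and compute ROC AUC
# (placeholder checkpoint and toy data; not the fine-tuned Lareb model).
import torch
from sklearn.metrics import roc_auc_score
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "distilbert-base-uncased"  # stand-in for the fine-tuned model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
model.eval()

texts = ["Case report of fetal outcome after maternal drug exposure.",
         "Unrelated manufacturing process optimization study."]
labels = [1, 0]  # 1 = relevant for the teratology service, 0 = irrelevant

with torch.no_grad():
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    probs = torch.softmax(model(**batch).logits, dim=-1)[:, 1]  # P(relevant)

# On a real held-out test set this is the "area under the
# sensitivity-versus-specificity curve" reported above; with two toy examples
# and an untrained classification head the value is not meaningful.
print("ROC AUC:", roc_auc_score(labels, probs.numpy()))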


Subject(s)
Abstracting and Indexing, Algorithms, Machine Learning, Teratology, Abstracting and Indexing/methods, Female, Humans, Pregnancy, Prospective Studies, Reproducibility of Results, Retrospective Studies
8.
Dis Colon Rectum ; 65(3): 429-443, 2022 03 01.
Article in English | MEDLINE | ID: mdl-34108364

ABSTRACT

BACKGROUND: A new bibliometric index called the disruption score was recently proposed to identify innovative and paradigm-changing publications. OBJECTIVE: The goal was to apply the disruption score to the colorectal surgery literature to provide the community with a repository of important research articles. DESIGN: This study is a bibliometric analysis. SETTINGS: The 100 most disruptive and developmental publications in Diseases of the Colon & Rectum, Colorectal Disease, International Journal of Colorectal Disease, and Techniques in Coloproctology were identified from a validated dataset of disruption scores and linked with the iCite National Institutes of Health tool to obtain citation counts. MAIN OUTCOME MEASURES: The primary outcome measures were the disruption score and citation count. RESULTS: We identified 12,127 articles published in Diseases of the Colon & Rectum (n = 8109), International Journal of Colorectal Disease (n = 1912), Colorectal Disease (n = 1751), and Techniques in Coloproctology (n = 355) between 1954 and 2014. Diseases of the Colon & Rectum had the most articles in the top-100 most disruptive and most developmental lists. The disruptive articles were in the top 1% of the disruption score distribution in PubMed and were cited between 1 and 671 times. Being highly cited was weakly correlated with high disruption scores (r = 0.09). Developmental articles had disruption scores that were more strongly correlated with citation count (r = 0.18). LIMITATIONS: This study is subject to the limitations of bibliometric indices, which change over time. DISCUSSION: The disruption score identified insightful and paradigm-changing studies in colorectal surgery. These studies span a wide range of topics, and the score consistently identified editorials and case reports/case series as important research. This bibliometric analysis provides colorectal surgeons with a unique archive of research that can often be overlooked but that may have scholarly significance. See Video Abstract at http://links.lww.com/DCR/B639. (A Spanish translation of this abstract, by Dr. Miguel Esquivel-Herrera, accompanies the original article.)
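The disruption score itself is not defined in the abstract; one common formulation of the disruption index, D = (n_i - n_j) / (n_i + n_j + n_k), often attributed to Funk and Owen-Smith and popularized by Wu and colleagues, can be sketched as follows with invented citation counts.

# One common formulation of the disruption index: among papers published after
# the focal paper, n_i cite only the focal paper, n_j cite both the focal paper
# and at least one of its references, and n_k cite its references but not the
# focal paper. The counts below are invented; the study used a validated
# precomputed dataset of disruption scores.
def disruption(n_i: int, n_j: int, n_k: int) -> float:
    total = n_i + n_j + n_k
    if total == 0:
        raise ValueError("no subsequent citing papers")
    return (n_i - n_j) / total  # ranges from -1 (developmental) to +1 (disruptive)

print(disruption(n_i=40, n_j=10, n_k=50))  # 0.30: leans disruptive
print(disruption(n_i=5, n_j=60, n_k=35))   # -0.55: leans developmental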


Subject(s)
Abstracting and Indexing, Colorectal Surgery, Publications, Abstracting and Indexing/methods, Abstracting and Indexing/trends, Bibliometrics, Colorectal Surgery/education, Colorectal Surgery/methods, Colorectal Surgery/trends, Humans, Journal Impact Factor, Outcome Assessment, Health Care, Periodicals as Topic, PubMed/statistics & numerical data, Publications/statistics & numerical data, Publications/trends, Research
19.
PLoS One ; 16(5): e0251094, 2021.
Article in English | MEDLINE | ID: mdl-33945566

ABSTRACT

The embedding of Medical Subject Headings (MeSH) terms has become a foundation for many downstream bioinformatics tasks. Recent studies employ different data sources, such as the corpus (in which each document is indexed by a set of MeSH terms), the MeSH term ontology, and the semantic predications between MeSH terms (extracted from SemMedDB), to learn their embeddings. While these data sources contribute to learning the MeSH term embeddings, current approaches fail to incorporate all of them in the learning process. The challenge is that the structured relationships between MeSH terms differ across the data sources, and no existing approach fuses such heterogeneous data into MeSH term embedding learning. In this paper, we study the problem of incorporating corpus, ontology, and semantic predications to learn the embeddings of MeSH terms. We propose a novel framework, Corpus, Ontology, and Semantic predications-based MeSH term embedding (COS), to generate high-quality MeSH term embeddings. COS converts the corpus, ontology, and semantic predications into MeSH term sequences, merges these sequences, and learns MeSH term embeddings from the sequences. Extensive experiments on different datasets show that COS outperforms various baseline embeddings and traditional non-embedding-based baselines.
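The core of the COS pipeline, converting each source into MeSH-term sequences, merging them, and learning embeddings from the merged sequences, can be sketched with gensim's Word2Vec as a stand-in for the paper's embedding step; the sequences below are toy examples, not the paper's data.

# Toy sketch of the COS idea: turn each data source into MeSH-term sequences,
# merge them, and learn one embedding per term. gensim's Word2Vec stands in
# for the paper's sequence-embedding step; sequences are invented examples.
from gensim.models import Word2Vec

corpus_seqs = [["Neoplasms", "Humans", "Drug Therapy"],
               ["Hypertension", "Humans", "Antihypertensive Agents"]]
ontology_seqs = [["Diseases Category", "Neoplasms", "Breast Neoplasms"]]    # tree paths
predication_seqs = [["Antihypertensive Agents", "TREATS", "Hypertension"]]  # SemMedDB-style

sequences = corpus_seqs + ontology_seqs + predication_seqs  # merge all three sources
model = Word2Vec(sentences=sequences, vector_size=32, window=3,
                 min_count=1, sg=1, epochs=50, seed=7)

print(model.wv["Neoplasms"][:5])  # embedding for one MeSH term
print(model.wv.similarity("Hypertension", "Antihypertensive Agents"))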


Subject(s)
Abstracting and Indexing/methods, Computational Biology/methods, Information Storage and Retrieval/methods, Biological Ontologies, Learning/physiology, Medical Subject Headings, Semantics, Translocation, Genetic/genetics
20.
PLoS One ; 16(5): e0250994, 2021.
Article in English | MEDLINE | ID: mdl-33951072

ABSTRACT

China's carbon emission performance shows significant regional heterogeneity. Accurately identifying the sources of differences in carbon emission performance and the influence of various driving factors across China's eight economic regions is a prerequisite for realizing China's carbon emission reduction goals. Based on provincial panel data from 2005 to 2017, this paper constructs a super-efficiency SBM model and a Malmquist model to measure the static and dynamic changes in regional carbon emission performance. The Theil index is then used to distinguish the impact of inter-regional and intra-regional differences on the carbon emission performance of different regions. Finally, a Tobit model is introduced to quantitatively analyze the effect of various driving factors on differences in carbon emission performance. The results show that: (1) There are significant differences in carbon emission performance across regions, but overall carbon emission performance fluctuates upward. Decomposition of the Malmquist index shows substantial regional differences in the technology progress index and the technical efficiency index, which lead to significant differences in carbon emission performance. (2) Overall, inter-regional differences contribute the most to overall carbon emission performance, accounting for more than 80%. Among them, the inter-regional and intra-regional differences in ERMRYR contribute significantly. (3) Tobit regression analysis shows that residents' living standards, urbanization level, degree of ecological development, and industrial structure positively affect carbon emission performance, whereas energy intensity shows a clear negative correlation with carbon emission performance. Therefore, to improve carbon emission performance, targeted recommendations should be made according to the characteristics of different regional development stages, regional differences in carbon emissions, and the influencing driving factors.
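The between-region/within-region decomposition behind the Theil-index step can be illustrated with a short numerical sketch; the province values and region groupings below are invented for illustration, not the paper's panel data.

# Toy sketch of the Theil-index decomposition into between-region and
# within-region inequality; values and groupings are invented.
import numpy as np

regions = {  # region -> hypothetical provincial carbon emission performance
    "East": np.array([0.9, 1.1, 1.0]),
    "Central": np.array([0.6, 0.7]),
    "West": np.array([0.4, 0.5, 0.45]),
}
values = np.concatenate(list(regions.values()))
n, mu = len(values), values.mean()

def theil(x):
    r = x / x.mean()
    return float(np.mean(r * np.log(r)))

total = theil(values)
between = sum(len(x) / n * (x.mean() / mu) * np.log(x.mean() / mu)
              for x in regions.values())
within = sum(len(x) / n * (x.mean() / mu) * theil(x) for x in regions.values())
print(f"total={total:.4f} between={between:.4f} within={within:.4f}")
print(f"between-region share of total inequality: {between / total:.1%}")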


Subject(s)
Carbon/chemistry, Abstracting and Indexing/methods, China, Economic Development, Industry/methods, Socioeconomic Factors, Urbanization