Results 1 - 20 of 5,023
1.
JAMA Netw Open; 7(8): e2425373, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39093561

ABSTRACT

Importance: Artificial intelligence (AI) has permeated academia, especially OpenAI Chat Generative Pretrained Transformer (ChatGPT), a large language model. However, little has been reported on its use in medical research. Objective: To assess a chatbot's capability to generate and grade medical research abstracts. Design, Setting, and Participants: In this cross-sectional study, ChatGPT versions 3.5 and 4.0 (referred to as chatbot 1 and chatbot 2) were coached to generate 10 abstracts by providing background literature, prompts, analyzed data for each topic, and 10 previously presented, unassociated abstracts to serve as models. The study was conducted between August 2023 and February 2024 (including data analysis). Exposure: Abstract versions using the same topic and data were written by a surgical trainee or a senior physician or generated by chatbot 1 and chatbot 2 for comparison. The 10 training abstracts were written by 8 surgical residents or fellows, edited by the same senior surgeon, at a high-volume hospital in the Southeastern US with an emphasis on outcomes-based research. Abstract comparison was then based on 10 abstracts written by 5 surgical trainees within the first 6 months of their research year, edited by the same senior author. Main Outcomes and Measures: The primary outcome measurements were the abstract grades using 10- and 20-point scales and ranks (first to fourth). Abstract versions by chatbot 1, chatbot 2, junior residents, and the senior author were compared and judged by blinded surgeon-reviewers as well as both chatbot models. Five academic attending surgeons from Denmark, the UK, and the US, with extensive experience in surgical organizations, research, and abstract evaluation, served as reviewers. Results: Surgeon-reviewers were unable to differentiate between abstract versions. Each reviewer ranked an AI-generated version first at least once. Abstracts demonstrated no difference in their median (IQR) 10-point scores (resident, 7.0 [6.0-8.0]; senior author, 7.0 [6.0-8.0]; chatbot 1, 7.0 [6.0-8.0]; chatbot 2, 7.0 [6.0-8.0]; P = .61), 20-point scores (resident, 14.0 [12.0-17.0]; senior author, 15.0 [13.0-17.0]; chatbot 1, 14.0 [12.0-16.0]; chatbot 2, 14.0 [13.0-16.0]; P = .50), or rank (resident, 3.0 [1.0-4.0]; senior author, 2.0 [1.0-4.0]; chatbot 1, 3.0 [2.0-4.0]; chatbot 2, 2.0 [1.0-3.0]; P = .14). The abstract grades given by chatbot 1 were comparable to the surgeon-reviewers' grades. However, chatbot 2 graded more favorably than the surgeon-reviewers and chatbot 1. Median (IQR) chatbot 2-reviewer grades were higher than surgeon-reviewer grades of all 4 abstract versions (resident, 14.0 [12.0-17.0] vs 16.9 [16.0-17.5]; P = .02; senior author, 15.0 [13.0-17.0] vs 17.0 [16.5-18.0]; P = .03; chatbot 1, 14.0 [12.0-16.0] vs 17.8 [17.5-18.5]; P = .002; chatbot 2, 14.0 [13.0-16.0] vs 16.8 [14.5-18.0]; P = .04). When comparing the grades of the 2 chatbots, chatbot 2 gave higher median (IQR) grades for abstracts than chatbot 1 (resident, 14.0 [13.0-15.0] vs 16.9 [16.0-17.5]; P = .003; senior author, 13.5 [13.0-15.5] vs 17.0 [16.5-18.0]; P = .004; chatbot 1, 14.5 [13.0-15.0] vs 17.8 [17.5-18.5]; P = .003; chatbot 2, 14.0 [13.0-15.0] vs 16.8 [14.5-18.0]; P = .01). Conclusions and Relevance: In this cross-sectional study, trained chatbots generated convincing medical abstracts, indistinguishable from resident or senior author drafts. Chatbot 1 graded abstracts similarly to surgeon-reviewers, while chatbot 2 was less stringent. These findings may assist surgeon-scientists in successfully implementing AI in medical research.
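The four-way grade comparison reported above is typical nonparametric territory: ordinal scores summarized as medians with IQRs and compared across groups. A minimal sketch of how such a comparison is usually computed, with invented placeholder scores rather than the study's data:

```python
# Illustrative sketch: comparing 10-point abstract grades across four author groups
# with a Kruskal-Wallis test, as is common for ordinal scores.
# The score lists below are invented placeholders, not data from the study.
import numpy as np
from scipy.stats import kruskal

grades = {
    "resident":      [7, 6, 8, 7, 7, 6, 8, 7, 6, 8],
    "senior_author": [7, 8, 6, 7, 7, 8, 6, 7, 7, 6],
    "chatbot_1":     [7, 7, 6, 8, 7, 6, 8, 7, 7, 6],
    "chatbot_2":     [7, 6, 7, 8, 7, 7, 6, 8, 7, 7],
}

for name, scores in grades.items():
    q1, med, q3 = np.percentile(scores, [25, 50, 75])
    print(f"{name}: median {med:.1f} (IQR {q1:.1f}-{q3:.1f})")

stat, p = kruskal(*grades.values())
print(f"Kruskal-Wallis H = {stat:.2f}, P = {p:.2f}")
```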


Subjects
Abstracting and Indexing, Biomedical Research, Humans, Cross-Sectional Studies, Artificial Intelligence, Surgeons, Internship and Residency/statistics & numerical data, General Surgery/education
2.
Proc Biol Sci; 291(2027): 20241222, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39079668

ABSTRACT

In a growing digital landscape, enhancing the discoverability and resonance of scientific articles is essential. Here, we offer 10 recommendations to amplify the discoverability of studies in search engines and databases. In particular, we argue that the strategic use and placement of key terms in the title, abstract and keyword sections can boost indexing and appeal. By surveying 230 journals in ecology and evolutionary biology, we found that current author guidelines may unintentionally limit article findability. Our survey of 5323 studies revealed that authors frequently exhaust abstract word limits, particularly those capped below 250 words. This suggests that current guidelines may be overly restrictive and not optimized to increase the dissemination and discoverability of digital publications. Additionally, 92% of studies used redundant keywords in the title or abstract, undermining optimal indexing in databases. We encourage adopting structured abstracts to maximize the incorporation of key terms in titles, abstracts and keywords. In addition, we encourage the relaxation of abstract and keyword limitations in journals with strict guidelines, and the inclusion of multilingual abstracts to broaden global accessibility. These recommendations to editors are designed to improve article engagement and facilitate evidence synthesis, thereby aligning scientific publishing with the modern needs of academic research.
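The redundant-keyword finding lends itself to a simple text check: a keyword adds indexing value only if it does not already appear in the title or abstract. A minimal sketch with an invented example record, not one of the surveyed studies:

```python
# Sketch: flag author keywords that already appear in the title or abstract,
# i.e. "redundant" keywords that add little retrieval value in databases.
# The example record is invented for illustration.
record = {
    "title": "Dispersal limitation shapes alpine plant community assembly",
    "abstract": "We tested whether dispersal limitation constrains community assembly ...",
    "keywords": ["dispersal limitation", "community assembly", "elevation gradient"],
}

searchable_text = (record["title"] + " " + record["abstract"]).lower()
redundant = [kw for kw in record["keywords"] if kw.lower() in searchable_text]
novel = [kw for kw in record["keywords"] if kw.lower() not in searchable_text]

print("Redundant keywords:", redundant)   # already indexed via title/abstract
print("Novel keywords:", novel)           # add extra retrieval value
```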


Subjects
Periodicals as Topic, Ecology/methods, Abstracting and Indexing, Publishing/standards
4.
JCO Clin Cancer Inform; 8: e2400077, 2024 May.
Article in English | MEDLINE | ID: mdl-38822755

ABSTRACT

PURPOSE: Artificial intelligence (AI) models can generate scientific abstracts that are difficult to distinguish from the work of human authors. The use of AI in scientific writing and performance of AI detection tools are poorly characterized. METHODS: We extracted text from published scientific abstracts from the ASCO 2021-2023 Annual Meetings. Likelihood of AI content was evaluated by three detectors: GPTZero, Originality.ai, and Sapling. Optimal thresholds for AI content detection were selected using 100 abstracts from before 2020 as negative controls, and 100 produced by OpenAI's GPT-3 and GPT-4 models as positive controls. Logistic regression was used to evaluate the association of predicted AI content with submission year and abstract characteristics, and adjusted odds ratios (aORs) were computed. RESULTS: Fifteen thousand five hundred and fifty-three abstracts met inclusion criteria. Across detectors, abstracts submitted in 2023 were significantly more likely to contain AI content than those in 2021 (aOR range from 1.79 with Originality to 2.37 with Sapling). Online-only publication and lack of clinical trial number were consistently associated with AI content. With optimal thresholds, 99.5%, 96%, and 97% of GPT-3/4-generated abstracts were identified by GPTZero, Originality, and Sapling respectively, and no sampled abstracts from before 2020 were classified as AI generated by the GPTZero and Originality detectors. Correlation between detectors was low to moderate, with Spearman correlation coefficient ranging from 0.14 for Originality and Sapling to 0.47 for Sapling and GPTZero. CONCLUSION: There is an increasing signal of AI content in ASCO abstracts, coinciding with the growing popularity of generative AI models.
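The calibration step described here (negative controls from before 2020, positive controls generated by GPT-3/4) amounts to choosing a score threshold per detector and then reporting detection rates. A minimal sketch with simulated detector scores and one common threshold rule (Youden's J); the paper's exact rule and data may differ:

```python
# Sketch: choose a detector threshold from positive/negative control abstracts,
# then report detection rates. Scores are simulated, not from the study.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
neg_scores = rng.beta(2, 8, 100)   # pre-2020 abstracts (assumed human-written)
pos_scores = rng.beta(8, 2, 100)   # GPT-3/4-generated control abstracts

y_true = np.r_[np.zeros(100), np.ones(100)]
y_score = np.r_[neg_scores, pos_scores]

# Pick the threshold maximizing Youden's J (one common rule; not necessarily the study's).
fpr, tpr, thresholds = roc_curve(y_true, y_score)
best = thresholds[np.argmax(tpr - fpr)]

detected_pos = (pos_scores >= best).mean()
false_pos = (neg_scores >= best).mean()
print(f"threshold={best:.2f}, GPT abstracts flagged={detected_pos:.1%}, "
      f"human controls flagged={false_pos:.1%}")
```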


Subjects
Abstracting and Indexing, Artificial Intelligence, Medical Oncology, Humans, Medical Oncology/methods
6.
Ann Plast Surg; 93(1): 9-13, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38864431

ABSTRACT

ABSTRACT: Current literature fails to examine gender differences of authors presenting abstracts at national plastic surgery meetings. This study aims to assess the ratio of female to male abstract presentations at Plastic Surgery The Meeting (PSTM). The gender of all abstract presenters from PSTM between 2010 and 2020 was recorded. The primary outcome variable was authorship (first, second, or last). Trends in gender authorship were assessed via Cochran-Armitage trend tests. Chi-square was utilized to evaluate the association between author gender and presentation type and author gender and subspecialty. Between 2010 and 2020, 3653 abstracts were presented (oral = 3035, 83.1%; poster = 618, 16.9%) with 19,328 (5175 females, 26.8%) authors. Of these, 34.5%, 32.0%, and 18.6% of first, second, and last authors were female, respectively. The total proportion of female authors increased from 153 (20.4%) in 2010 to 1065 (33.1%) by 2020. The proportion of female first, second, and last authors increased from 21.8% to 44.8%, 24.0% to 45.3%, and 14.3% to 22.1%, respectively, and demonstrated a positive linear trend (P < 0.001). The proportion of female first authors in aesthetics (23.9%) was lower than that for breast (41.8%), cranio/maxillofacial/head & neck (38.5%), practice management (43.3%), and research/technology (39.4%) (P < 0.001). Our study demonstrates a significant increase in female representation as first, second, and last authors in abstract presentations at PSTM within the last decade, although the absolute prevalence remains low.
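The gender-by-subspecialty comparison above is a chi-square test of association on a contingency table of author counts. A minimal sketch with invented counts, not the PSTM data:

```python
# Sketch: chi-square test of association between first-author gender and
# abstract subspecialty, as used in the study. The counts are invented.
import numpy as np
from scipy.stats import chi2_contingency

# rows: female, male; columns: aesthetics, breast, craniofacial, research/technology
counts = np.array([
    [ 48, 210,  96,  83],
    [153, 292, 153, 128],
])
chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi2={chi2:.1f}, dof={dof}, P={p:.3g}")
```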


Subjects
Authorship, Congresses as Topic, Plastic Surgery, Plastic Surgery/trends, Plastic Surgery/statistics & numerical data, Humans, Female, Congresses as Topic/statistics & numerical data, Male, Abstracting and Indexing/statistics & numerical data, Abstracting and Indexing/trends, Publishing/statistics & numerical data, Publishing/trends
8.
J Med Internet Res; 26: e52001, 2024 Jun 26.
Article in English | MEDLINE | ID: mdl-38924787

ABSTRACT

BACKGROUND: Due to recent advances in artificial intelligence (AI), language model applications can generate logical text output that is difficult to distinguish from human writing. ChatGPT (OpenAI) and Bard (subsequently rebranded as "Gemini"; Google AI) were developed using distinct approaches, but little has been studied about the difference in their capability to generate the abstract. The use of AI to write scientific abstracts in the field of spine surgery is the center of much debate and controversy. OBJECTIVE: The objective of this study is to assess the reproducibility of the structured abstracts generated by ChatGPT and Bard compared to human-written abstracts in the field of spine surgery. METHODS: In total, 60 abstracts dealing with spine sections were randomly selected from 7 reputable journals and used as ChatGPT and Bard input statements to generate abstracts based on supplied paper titles. A total of 174 abstracts, divided into human-written abstracts, ChatGPT-generated abstracts, and Bard-generated abstracts, were evaluated for compliance with the structured format of journal guidelines and consistency of content. The likelihood of plagiarism and AI output was assessed using the iThenticate and ZeroGPT programs, respectively. A total of 8 reviewers in the spinal field evaluated 30 randomly extracted abstracts to determine whether they were produced by AI or human authors. RESULTS: The proportion of abstracts that met journal formatting guidelines was greater among ChatGPT abstracts (34/60, 56.6%) compared with those generated by Bard (6/54, 11.1%; P<.001). However, a higher proportion of Bard abstracts (49/54, 90.7%) had word counts that met journal guidelines compared with ChatGPT abstracts (30/60, 50%; P<.001). The similarity index was significantly lower among ChatGPT-generated abstracts (20.7%) compared with Bard-generated abstracts (32.1%; P<.001). The AI-detection program predicted that 21.7% (13/60) of the human group, 63.3% (38/60) of the ChatGPT group, and 87% (47/54) of the Bard group were possibly generated by AI, with an area under the curve value of 0.863 (P<.001). The mean detection rate by human reviewers was 53.8% (SD 11.2%), achieving a sensitivity of 56.3% and a specificity of 48.4%. A total of 56.3% (63/112) of the actual human-written abstracts and 55.9% (62/128) of AI-generated abstracts were recognized as human-written and AI-generated by human reviewers, respectively. CONCLUSIONS: Both ChatGPT and Bard can be used to help write abstracts, but most AI-generated abstracts are currently considered unethical due to high plagiarism and AI-detection rates. ChatGPT-generated abstracts appear to be superior to Bard-generated abstracts in meeting journal formatting guidelines. Because humans are unable to accurately distinguish abstracts written by humans from those produced by AI programs, it is crucial to exercise special caution and examine the ethical boundaries of using AI programs, including ChatGPT and Bard.
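The reviewer sensitivity/specificity and the detector area under the curve reported here are standard binary-classification metrics. A minimal sketch of how they are computed, with simulated labels and scores rather than the study's data:

```python
# Sketch: sensitivity/specificity for human reviewers and AUC for an AI detector.
# All labels and scores are simulated for illustration only.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

rng = np.random.default_rng(1)
is_ai = rng.integers(0, 2, 240)                                        # 1 = AI-generated abstract
reviewer_says_ai = np.where(rng.random(240) < 0.55, is_ai, 1 - is_ai)  # a noisy human reviewer

tn, fp, fn, tp = confusion_matrix(is_ai, reviewer_says_ai).ravel()
print(f"sensitivity={tp/(tp+fn):.1%}, specificity={tn/(tn+fp):.1%}")

detector_score = np.clip(is_ai * 0.6 + rng.normal(0.3, 0.2, 240), 0, 1)  # mock detector output
print(f"AUC={roc_auc_score(is_ai, detector_score):.3f}")
```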


Subjects
Abstracting and Indexing, Spine, Humans, Spine/surgery, Abstracting and Indexing/standards, Abstracting and Indexing/methods, Reproducibility of Results, Artificial Intelligence, Writing/standards
9.
Res Social Adm Pharm; 20(9): 911-917, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38902136

ABSTRACT

BACKGROUND: The Medical Subject Headings (MeSH) thesaurus is the controlled vocabulary used to index articles in MEDLINE. MeSH terms were mainly selected manually until June 2022, when an automated algorithm, the Medical Text Indexer (MTI), was fully implemented. A selection of automatically indexed articles is then reviewed (curated) by human indexers to ensure the quality of the process. OBJECTIVE: To describe the association of MEDLINE indexing methods (i.e., manual, automated, and automated + curated) with MeSH assignment in pharmacy practice journals compared with medical journals. METHODS: Original research articles published between 2016 and 2023 in two groups of journals (i.e., the Big-five general medicine and three pharmacy practice journals) were selected from PubMed using journal-specific search strategies. Metadata of the articles, including MeSH terms and indexing method, was extracted. A list of pharmacy-specific MeSH terms had been compiled from previously published studies, and their presence in pharmacy practice journal records was investigated. Using bivariate and multivariate analyses, as well as effect size measures, the number of MeSH terms per article was compared between journal groups, geographic origin of the journal, and indexing method. RESULTS: A total of 8479 original research articles was retrieved: 6254 from the medical journals and 2225 from pharmacy practice journals. The number of articles indexed by the various methods was disproportionate: 77.8% of medical and 50.5% of pharmacy articles were manually indexed. Among those indexed using the automated system, 51.1% of medical and 10.9% of pharmacy practice articles were then curated to ensure indexing quality. The number of MeSH terms per article varied among the three indexing methods for medical and pharmacy journals, with 15.5 vs. 13.0 in manually indexed, 9.4 vs. 7.4 in automated indexed, and 12.1 vs. 7.8 in automated and then curated articles, respectively. Multivariate analysis showed a significant effect of indexing method and journal group on the number of MeSH terms attributed, but not of the geographical origin of the journal. CONCLUSIONS: Articles indexed using the automated MTI have fewer MeSH terms than manually indexed articles. Articles published in pharmacy practice journals were indexed with fewer MeSH terms than general medical journal articles, regardless of the indexing method used.


Subjects
Abstracting and Indexing, Medical Subject Headings, Periodicals as Topic, Humans, MEDLINE, Pharmacy, Automation
10.
Med Ref Serv Q; 43(2): 106-118, 2024.
Article in English | MEDLINE | ID: mdl-38722606

ABSTRACT

The objective of this study was to examine the accuracy of indexing for "Appalachian Region"[Mesh]. Researchers performed a search in PubMed for articles published in 2019 using "Appalachian Region"[Mesh] or "Appalachia" or "Appalachian" in the title or abstract. Only 17.88% of the articles retrieved by the search were about Appalachia according to the Appalachian Regional Commission (ARC) definition. Most articles retrieved appeared because they were indexed with state terms that were included as part of the MeSH term. Database indexing and searching transparency is of growing importance as indexers rely increasingly on automated systems to catalog information and publications.
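The 17.88% figure is effectively the precision of the retrieval: relevant records divided by all records returned. A minimal sketch of the query shape and the calculation, with invented counts chosen only to illustrate the arithmetic:

```python
# Sketch: precision of a PubMed search, i.e. relevant retrieved / total retrieved.
# The query mirrors the one described in the study; the counts are illustrative.
query = '("Appalachian Region"[Mesh] OR Appalachia[tiab] OR Appalachian[tiab]) AND 2019[dp]'

total_retrieved = 1000   # hypothetical number of records returned by the search
relevant = 179           # hypothetical records about Appalachia per the ARC definition
precision = relevant / total_retrieved
print(query)
print(f"precision = {precision:.2%}")
```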


Subjects
Abstracting and Indexing, Appalachian Region, Abstracting and Indexing/methods, Humans, Medical Subject Headings, PubMed, Bibliometrics
11.
Ann Intern Med; 177(6): 791-799, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38768452

ABSTRACT

BACKGROUND: Systematic reviews are performed manually despite the exponential growth of scientific literature. OBJECTIVE: To investigate the sensitivity and specificity of GPT-3.5 Turbo, from OpenAI, as a single reviewer, for title and abstract screening in systematic reviews. DESIGN: Diagnostic test accuracy study. SETTING: Unannotated bibliographic databases from 5 systematic reviews representing 22 665 citations. PARTICIPANTS: None. MEASUREMENTS: A generic prompt framework to instruct GPT to perform title and abstract screening was designed. The output of the model was compared with decisions from authors under 2 rules. The first rule balanced sensitivity and specificity, for example, to act as a second reviewer. The second rule optimized sensitivity, for example, to reduce the number of citations to be manually screened. RESULTS: Under the balanced rule, sensitivities ranged from 81.1% to 96.5% and specificities ranged from 25.8% to 80.4%. Across all reviews, GPT identified 7 of 708 citations (1%) missed by humans that should have been included after full-text screening, at the cost of 10 279 of 22 665 false-positive recommendations (45.3%) that would require reconciliation during the screening process. Under the sensitive rule, sensitivities ranged from 94.6% to 99.8% and specificities ranged from 2.2% to 46.6%. Limiting manual screening to citations not ruled out by GPT could reduce the number of citations to screen by between 2% (127 of 6334) and 45.4% (1851 of 4077), at the cost of missing 0 to 1 of 26 citations (3.8%) at the full-text level. LIMITATIONS: Time needed to fine-tune prompts; retrospective nature of the study; convenience sample of 5 systematic reviews; and GPT performance sensitive to prompt development and time. CONCLUSION: The GPT-3.5 Turbo model may be used as a second reviewer for title and abstract screening, at the cost of additional work to reconcile added false positives. It also showed potential to reduce the number of citations before screening by humans, at the cost of missing some citations at the full-text level. PRIMARY FUNDING SOURCE: None.
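Whatever prompt framework is used, the evaluation step is the same: compare the model's include/exclude decisions against the human reviewers' decisions and compute sensitivity and specificity under each decision rule. A minimal sketch of that evaluation with simulated decisions; the study's prompts, thresholds, and data are not reproduced here:

```python
# Sketch: evaluating an LLM title/abstract screener against human decisions
# under a "balanced" and a "sensitive" decision rule. All data are simulated.
import numpy as np

rng = np.random.default_rng(2)
human_include = rng.random(5000) < 0.05                                   # gold-standard decisions
llm_score = np.clip(human_include * 0.5 + rng.random(5000) * 0.6, 0, 1)   # mock LLM relevance score

def sens_spec(pred, truth):
    sens = (pred & truth).sum() / truth.sum()
    spec = (~pred & ~truth).sum() / (~truth).sum()
    return sens, spec

for rule, threshold in [("balanced", 0.6), ("sensitive", 0.3)]:
    include = llm_score >= threshold
    sens, spec = sens_spec(include, human_include)
    print(f"{rule}: sensitivity={sens:.1%}, specificity={spec:.1%}, "
          f"citations left to screen manually={include.sum()} of {len(include)}")
```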


Subjects
Meta-Analysis as Topic, Sensitivity and Specificity, Humans, Abstracting and Indexing, Review Literature as Topic, Systematic Reviews as Topic
12.
Am J Obstet Gynecol; 231(2): 276.e1-276.e10, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38710267

ABSTRACT

BACKGROUND: ChatGPT, a publicly available artificial intelligence large language model, has allowed for sophisticated artificial intelligence technology on demand. Indeed, use of ChatGPT has already begun to make its way into medical research. However, the medical community has yet to understand the capabilities and ethical considerations of artificial intelligence within this context, and unknowns exist regarding ChatGPT's writing abilities, accuracy, and implications for authorship. OBJECTIVE: We hypothesize that human reviewers and artificial intelligence detection software differ in their ability to correctly identify original published abstracts and artificial intelligence-written abstracts in the subjects of Gynecology and Urogynecology. We also suspect that concrete differences in writing errors, readability, and perceived writing quality exist between original and artificial intelligence-generated text. STUDY DESIGN: Twenty-five articles published in high-impact medical journals and a collection of Gynecology and Urogynecology journals were selected. ChatGPT was prompted to write 25 corresponding artificial intelligence-generated abstracts, providing the abstract title, journal-dictated abstract requirements, and select original results. The original and artificial intelligence-generated abstracts were reviewed by blinded Gynecology and Urogynecology faculty and fellows to identify the writing as original or artificial intelligence-generated. All abstracts were analyzed by publicly available artificial intelligence detection software GPTZero, Originality, and Copyleaks, and were assessed for writing errors and quality by artificial intelligence writing assistant Grammarly. RESULTS: A total of 157 reviews of 25 original and 25 artificial intelligence-generated abstracts were conducted by 26 faculty and 4 fellows; 57% of original abstracts and 42.3% of artificial intelligence-generated abstracts were correctly identified, yielding an average accuracy of 49.7% across all abstracts. All 3 artificial intelligence detectors rated the original abstracts as less likely to be artificial intelligence-written than the ChatGPT-generated abstracts (GPTZero, 5.8% vs 73.3%; P<.001; Originality, 10.9% vs 98.1%; P<.001; Copyleaks, 18.6% vs 58.2%; P<.001). The performance of the 3 artificial intelligence detection software differed when analyzing all abstracts (P=.03), original abstracts (P<.001), and artificial intelligence-generated abstracts (P<.001). Grammarly text analysis identified more writing issues and correctness errors in original than in artificial intelligence abstracts, including lower Grammarly score reflective of poorer writing quality (82.3 vs 88.1; P=.006), more total writing issues (19.2 vs 12.8; P<.001), critical issues (5.4 vs 1.3; P<.001), confusing words (0.8 vs 0.1; P=.006), misspelled words (1.7 vs 0.6; P=.02), incorrect determiner use (1.2 vs 0.2; P=.002), and comma misuse (0.3 vs 0.0; P=.005). CONCLUSION: Human reviewers are unable to detect the subtle differences between human and ChatGPT-generated scientific writing because of artificial intelligence's ability to generate tremendously realistic text. Artificial intelligence detection software improves the identification of artificial intelligence-generated writing, but still lacks complete accuracy and requires programmatic improvements to achieve optimal detection. 
Given that reviewers and editors may be unable to reliably detect artificial intelligence-generated texts, clear guidelines for reporting artificial intelligence use by authors and implementing artificial intelligence detection software in the review process will need to be established as artificial intelligence chatbots gain more widespread use.


Subjects
Artificial Intelligence, Gynecology, Urology, Humans, Abstracting and Indexing, Periodicals as Topic, Software, Writing, Authorship
13.
PLoS One; 19(5): e0302108, 2024.
Article in English | MEDLINE | ID: mdl-38696383

ABSTRACT

OBJECTIVE: To assess the reporting quality of published RCT abstracts on endometriosis-associated pelvic pain and to investigate the prevalence and characteristics of spin in these abstracts. METHODS: PubMed and Scopus were searched for RCT abstracts addressing endometriosis-associated pelvic pain published from January 1st, 2010 to December 1st, 2023. The reporting quality of RCT abstracts was assessed using the CONSORT statement for abstracts. Additionally, spin was evaluated in the results and conclusions sections of the abstracts, defined as the misleading reporting of study findings that emphasizes the perceived benefits of an intervention or distracts readers from statistically non-significant results. To assess factors affecting reporting quality and the presence of spin, linear and logistic regression were used, respectively. RESULTS: A total of 47 RCT abstracts were included. Out of 16 checklist items, only three (objective, intervention, and conclusions) were sufficiently reported in most abstracts (more than 95%), and none of the abstracts presented precise data as required by the CONSORT-A guidelines. In the materials and methods section, trial design, type of randomization, generation of the random allocation sequence, allocation concealment, and blinding were the items most often reported suboptimally. The total quality score varied between 5 and 15 (mean: 9.59, SD: 3.03, median: 9, IQR: 5). Word count (beta = 0.015, p-value = 0.005) and publication in open-access journals (beta = 2.023, p-value = 0.023) were significant factors affecting reporting quality. Evaluating spin within each included paper, we found that 18 (51.43%) papers had statistically non-significant results. Of these studies, 12 (66.66%) had spin in both the results and conclusion sections. Furthermore, spin intensity increased during 2010-2023, and 38.29% of abstracts had spin in both the results and conclusion sections. CONCLUSION: Overall, poor adherence to CONSORT-A was observed, with spin detected in several RCTs featuring non-significant primary endpoints in the obstetrics and gynecology literature.


Subjects
Endometriosis, Randomized Controlled Trials as Topic, Humans, Female, Randomized Controlled Trials as Topic/standards, Research Design/standards, Pelvic Pain, Abstracting and Indexing/standards
14.
Article in English | MEDLINE | ID: mdl-38775596

ABSTRACT

BACKGROUND: Increasing use of "hype" language (eg, language overstating research impact) has been documented in the scientific community. Evaluating language in abstracts is important because readers may use abstracts to extrapolate findings to entire publications. Our purpose was to assess the frequency of hype language within orthopaedic surgery. METHODS: One hundred thirty-nine hype adjectives were previously identified using a linguistics approach. All publicly available abstracts from 18 orthopaedic surgery journals between 1985 and 2020 were obtained, and hype adjectives were tabulated. Change in frequency of these adjectives was calculated. RESULTS: A total of 112,916 abstracts were identified. 67.0% (948/1414) of abstracts in 1985 contained hype adjectives, compared with 92.5% (5287/5714) in 2020. The average number of hype adjectives per abstract increased by 136% (1.1 to 2.6). Of the 139 adjectives, 87 (62.5%) increased in frequency and 40 (28.7%) decreased in frequency, while 12 (9%) were not used. The hype adjectives with the largest absolute increases in frequency were quality (+324 words per million [wpm]), significant (+320 wpm), systematic (+246 wpm), top (+239 wpm), and international (+201 wpm). The five hype adjectives with the largest relative increases in frequency were novel (+10500%), international (+2850%), urgent (+2600%), robust (+2300%), and emerging (+1400%). CONCLUSION: Promotional language is increasing in orthopaedic surgery abstracts. Authors, editors, and reviewers should seek to minimize the usage of nonobjective language.
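Frequencies expressed per million words can be tabulated directly from the abstract text. A minimal sketch over two toy abstracts rather than the 112,916 analyzed:

```python
# Sketch: counting hype adjectives and expressing frequency per million words (wpm).
# The abstract lists and adjective set are tiny placeholders for illustration.
import re
from collections import Counter

hype_adjectives = {"novel", "robust", "urgent", "emerging", "international"}
abstracts_1985 = ["We report a series of 12 patients treated with a standard technique ..."]
abstracts_2020 = ["We report a novel, robust technique with emerging international interest ..."]

def wpm(abstracts, terms):
    words = [w for a in abstracts for w in re.findall(r"[a-z]+", a.lower())]
    counts = Counter(w for w in words if w in terms)
    return {t: counts[t] / len(words) * 1_000_000 for t in terms}

print("1985:", wpm(abstracts_1985, hype_adjectives))
print("2020:", wpm(abstracts_2020, hype_adjectives))
```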


Subjects
Language, Orthopedics, Humans, Abstracting and Indexing, Periodicals as Topic, Orthopedic Procedures
15.
BMC Med Res Methodol; 24(1): 108, 2024 May 09.
Article in English | MEDLINE | ID: mdl-38724903

ABSTRACT

OBJECTIVE: Systematic literature reviews (SLRs) are critical for life-science research. However, the manual selection and retrieval of relevant publications can be a time-consuming process. This study aims to (1) develop two disease-specific annotated corpora, one for human papillomavirus (HPV) associated diseases and the other for pneumococcal-associated pediatric diseases (PAPD), and (2) optimize machine- and deep-learning models to facilitate automation of SLR abstract screening. METHODS: This study constructed two disease-specific SLR screening corpora for HPV and PAPD, which contained citation metadata and corresponding abstracts. Performance was evaluated using precision, recall, accuracy, and F1-score across multiple combinations of machine- and deep-learning algorithms and features such as keywords and MeSH terms. RESULTS AND CONCLUSIONS: The HPV corpus contained 1697 entries, with 538 relevant and 1159 irrelevant articles. The PAPD corpus included 2865 entries, with 711 relevant and 2154 irrelevant articles. Adding features beyond the title and abstract improved the performance (measured as accuracy) of machine learning models by 3% for the HPV corpus and 2% for the PAPD corpus. Transformer-based deep learning models consistently outperformed conventional machine learning algorithms, highlighting the strength of domain-specific pre-trained language models for SLR abstract screening. This study provides a foundation for the development of more intelligent SLR systems.
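A common baseline for this kind of screening corpus is a TF-IDF bag-of-words classifier over the title and abstract, with keywords or MeSH terms optionally appended as extra features, scored with precision, recall, accuracy, and F1. A minimal sketch under those assumptions, with placeholder records rather than the HPV/PAPD corpora:

```python
# Sketch: a baseline abstract-screening classifier (TF-IDF + logistic regression)
# evaluated with precision, recall, F1, and accuracy. Records are invented placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split

texts = [
    "HPV vaccination and cervical cancer incidence: a cohort study",
    "Pneumococcal conjugate vaccine and otitis media in children",
    "Machining tolerances for industrial turbine blades",
    "Cost-effectiveness of HPV screening programmes",
] * 50                                    # toy corpus; the real corpora have 1697/2865 records
labels = [1, 1, 0, 1] * 50                # 1 = relevant to the review question

X_train, X_test, y_train, y_test = train_test_split(texts, labels, test_size=0.25, random_state=0)
vec = TfidfVectorizer().fit(X_train)      # features from title/abstract text; MeSH terms could be appended
clf = LogisticRegression(max_iter=1000).fit(vec.transform(X_train), y_train)
pred = clf.predict(vec.transform(X_test))

p, r, f1, _ = precision_recall_fscore_support(y_test, pred, average="binary")
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f} accuracy={accuracy_score(y_test, pred):.2f}")
```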


Subjects
Machine Learning, Papillomavirus Infections, Humans, Papillomavirus Infections/diagnosis, Medical Economics, Algorithms, Outcome Assessment (Health Care)/methods, Deep Learning, Abstracting and Indexing/methods
16.
Air Med J; 43(3): 216-220, 2024.
Article in English | MEDLINE | ID: mdl-38821701

ABSTRACT

OBJECTIVE: Pediatric-neonatal transport research projects are presented at the American Academy of Pediatrics (AAP) Section on Transport Medicine (SOTM) scientific abstract program annually. Journal publication increases the impact of these projects. Our objectives were to determine the publication rate of transport abstracts and to identify factors predictive of publication success. METHODS: We reviewed all AAP SOTM abstracts accepted for presentation from 2011 to 2020 and assessed presentation format (oral/platform vs. poster), authors' professional degree (physician vs. nonphysician), and first author's trainee status. We searched PubMed, Ovid, and ResearchGate for publications by abstract title and authors and then compared published versus unpublished abstracts. Categorical variables were expressed as proportions and compared using the chi-square test or the Fisher exact test, whereas continuous variables were summarized using medians and interquartile ranges (IQRs) and compared using the Student t-test or the Kruskal-Wallis test as appropriate. A linear probability model was performed. RESULTS: Of 194 presented abstracts, 67 (34.5%) were published. The publication rate was significantly higher for oral/platform versus poster abstracts (P < .01). In the linear probability model, the probability of publication increased by 19.5% if the abstract was an oral/platform presentation (P < .01) and by 25.6% if the first author was a trainee (P < .05). The model constant was estimated as a 24.9% baseline probability of publication. Hence, if the first author was a physician and a trainee and gave an oral/platform presentation, there was an 85.8% chance of being published. The median time to publication was 2 years (IQR: 2-4 years), with the articles published longest ago having the most citations. Articles were published in 27 different journals, with nearly half (33/67, 49.3%) appearing in 3 journals. CONCLUSION: AAP SOTM abstracts had a 34.5% publication rate over the past 10 years, which is consistent with other medical specialties. Oral abstracts, physician first authors, and trainee first authors had a significantly higher success rate. Special emphasis should be placed nationally on supporting nonphysician transport professionals to publish their work.
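Because a linear probability model is additive, the reported coefficients can be summed directly; the 85.8% figure for a physician trainee giving an oral/platform presentation implies a physician coefficient that the abstract does not state. A small back-of-the-envelope check, with the physician effect derived rather than reported:

```python
# Sketch: linear probability model for publication, using the coefficients reported
# in the abstract. The physician coefficient is NOT reported; it is derived here only
# so that the arithmetic reproduces the quoted 85.8%.
constant = 24.9          # baseline probability of publication (%)
oral_platform = 19.5     # reported increase for an oral/platform presentation (%)
trainee_first = 25.6     # reported increase for a trainee first author (%)
physician_first = 85.8 - (constant + oral_platform + trainee_first)   # implied, about 15.8%

prob = constant + oral_platform + trainee_first + physician_first
print(f"implied physician coefficient = {physician_first:.1f}%")
print(f"oral + trainee + physician first author -> {prob:.1f}% predicted publication probability")
```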


Subjects
Pediatrics, Humans, Transportation of Patients, Abstracting and Indexing/statistics & numerical data, Publishing/statistics & numerical data
18.
JAMA Pediatr; 178(6): 625-626, 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38683595

ABSTRACT

This survey study assesses the ability of health care professionals to discern whether abstracts were written by investigators or by an artificial intelligence (AI) chatbot.


Subjects
Health Personnel, Humans, Abstracting and Indexing, Artificial Intelligence, Biomedical Research
20.
Int J Gynecol Cancer; 34(5): 669-674, 2024 May 06.
Article in English | MEDLINE | ID: mdl-38627032

ABSTRACT

OBJECTIVE: To determine if reviewer experience impacts the ability to discriminate between human-written and ChatGPT-written abstracts. METHODS: Thirty reviewers (10 seniors, 10 juniors, and 10 residents) were asked to differentiate between 10 ChatGPT-written and 10 human-written (fabricated) abstracts. For the study, 10 gynecologic oncology abstracts were fabricated by the authors. For each human-written abstract, a matching ChatGPT abstract was generated using the same title and the same fabricated results. A web-based questionnaire was used to gather demographic data and to record the reviewers' evaluation of the 20 abstracts. Comparative statistics and multivariable regression were used to identify factors associated with a higher correct identification rate. RESULTS: The 30 reviewers each evaluated 20 abstracts, giving a total of 600 abstract evaluations. The reviewers were able to correctly identify 300/600 (50%) of the abstracts: 139/300 (46.3%) of the ChatGPT-generated abstracts and 161/300 (53.7%) of the human-written abstracts (p=0.07). Human-written abstracts had a higher rate of correct identification (median (IQR) 56.7% (49.2-64.1%) vs 45.0% (43.2-48.3%), p=0.023). Senior reviewers had a higher correct identification rate (60%) than junior reviewers and residents (45% each; p=0.043 and p=0.002, respectively). In a linear regression model including the experience level of the reviewers, familiarity with artificial intelligence (AI), and the country in which the majority of medical training was achieved (English speaking vs non-English speaking), the experience of the reviewer (β=10.2 (95% CI 1.8 to 18.7)) and familiarity with AI (β=7.78 (95% CI 0.6 to 15.0)) were independently associated with the correct identification rate (p=0.019 and p=0.035, respectively). In a correlation analysis, the number of publications by the reviewer was positively correlated with the correct identification rate (r(28)=0.61, p<0.001). CONCLUSION: A total of 46.3% of abstracts written by ChatGPT were detected by reviewers. The correct identification rate increased with reviewer and publication experience.


Subjects
Abstracting and Indexing, Humans, Abstracting and Indexing/standards, Female, Research Peer Review, Writing/standards, Gynecology, Surveys and Questionnaires, Publishing/statistics & numerical data