Results 1 - 20 of 3,386
1.
J Med Libr Assoc ; 112(2): 133-139, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-39119157

ABSTRACT

Background: Libraries provide access to databases with auto-cite features embedded into the services; however, the accuracy of these auto-cite buttons is low in humanities and social sciences databases. Case Presentation: This case compares two biomedical databases, Ovid MEDLINE and PubMed, to determine whether either is reliable enough to confidently recommend to students for use when writing papers. A total of 60 citations were assessed, 30 from each citation generator, based on the top 30 articles in PubMed from 2010 to 2020. Conclusions: Error rates were higher in Ovid MEDLINE than in PubMed, but neither platform produced error-free references. The auto-cite tools were not reliable: none of the 60 citations examined was completely correct. Librarians should continue to advise students not to rely solely on citation generators in these biomedical databases.


Subject(s)
MEDLINE , Humans , MEDLINE/statistics & numerical data , PubMed , Bibliometrics , Information Storage and Retrieval/methods , Information Storage and Retrieval/statistics & numerical data
2.
Database (Oxford) ; 20242024 Aug 09.
Article in English | MEDLINE | ID: mdl-39126204

ABSTRACT

The automatic recognition of biomedical relationships is an important step in the semantic understanding of the information contained in the unstructured text of the published literature. The BioRED track at BioCreative VIII aimed to foster the development of such methods by providing participants with the BioRED-BC8 corpus, a collection of 1000 PubMed documents manually curated for diseases, genes/proteins, chemicals, cell lines, gene variants, and species, as well as the pairwise relationships between them: disease-gene, chemical-gene, disease-variant, gene-gene, chemical-disease, chemical-chemical, chemical-variant, and variant-variant. Furthermore, relationships are categorized into the following semantic categories: positive correlation, negative correlation, binding, conversion, drug interaction, comparison, cotreatment, and association. Unlike most previously available corpora, all relationships are expressed at the document level rather than the sentence level, and the entities are normalized to the corresponding concept identifiers of standardized vocabularies: diseases and chemicals to MeSH, genes (and proteins) to National Center for Biotechnology Information (NCBI) Gene, species to NCBI Taxonomy, cell lines to Cellosaurus, and gene/protein variants to the Single Nucleotide Polymorphism Database. Finally, each annotated relationship is flagged as 'novel' or not, depending on whether it is a novel finding or an experimental verification in the publication in which it is expressed. This distinction helps differentiate novel findings from relationships in the same text that provide known facts and/or background knowledge. The BioRED-BC8 corpus uses the previous BioRED corpus of 600 PubMed articles as the training dataset and adds a set of 400 newly published articles to serve as the test data for the challenge. All test articles were manually annotated for the BioCreative VIII challenge by expert biocurators at the National Library of Medicine using the original annotation guidelines; each article was doubly annotated in a three-round annotation process until full agreement was reached among all curators. This manuscript details the characteristics of the BioRED-BC8 corpus as a critical resource for biomedical named entity recognition and relation extraction. Using this new resource, we have demonstrated advancements in biomedical text-mining algorithm development. Database URL: https://codalab.lisn.upsaclay.fr/competitions/16381.
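The abstract above fixes a document-level annotation schema: typed entities normalized to standard identifiers, typed relation pairs, semantic categories, and a novelty flag. The following is a minimal Python sketch of how such an annotation could be represented in code; the field names, label strings, and example values are illustrative assumptions, not the corpus's actual file format.

```python
from dataclasses import dataclass, field

# Controlled vocabularies taken from the abstract above (label spellings are assumptions).
ENTITY_TYPES = {"Disease", "GeneOrGeneProduct", "Chemical", "CellLine", "SequenceVariant", "Species"}
RELATION_CATEGORIES = {"Positive_Correlation", "Negative_Correlation", "Binding", "Conversion",
                       "Drug_Interaction", "Comparison", "Cotreatment", "Association"}

@dataclass
class Entity:
    mention: str          # surface form in the text
    entity_type: str      # one of ENTITY_TYPES
    concept_id: str       # normalized ID (MeSH, NCBI Gene, NCBI Taxonomy, Cellosaurus, dbSNP)

@dataclass
class Relation:
    head: Entity
    tail: Entity
    category: str         # one of RELATION_CATEGORIES
    novel: bool           # True if the relation is a novel finding in this publication

@dataclass
class DocumentAnnotation:
    pmid: str
    entities: list[Entity] = field(default_factory=list)
    relations: list[Relation] = field(default_factory=list)

# Illustrative example record (not taken from the corpus).
doc = DocumentAnnotation(pmid="00000000")
gene = Entity("BRCA1", "GeneOrGeneProduct", "NCBIGene:672")
disease = Entity("breast cancer", "Disease", "MESH:D001943")
doc.entities += [gene, disease]
doc.relations.append(Relation(gene, disease, "Association", novel=False))
print(doc)
```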


Subject(s)
Data Curation , Humans , Data Curation/methods , Data Mining/methods , Semantics , PubMed
3.
Syst Rev ; 13(1): 174, 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38978132

ABSTRACT

BACKGROUND: The demand for high-quality systematic literature reviews (SRs) for evidence-based medical decision-making is growing. SRs are costly and require the scarce resource of highly skilled reviewers. Automation technology has been proposed to save workload and expedite the SR workflow. We aimed to provide a comprehensive overview of SR automation studies indexed in PubMed, focusing on the applicability of these technologies in real-world practice. METHODS: In November 2022, we extracted, combined, and ran an integrated PubMed search for SRs on SR automation. Full-text English peer-reviewed articles were included if they reported studies on SR automation methods (SSAM) or automated SRs (ASR). Bibliographic analyses and knowledge-discovery studies were excluded. Record screening was performed by single reviewers, and the selection of full-text papers was performed in duplicate. We summarized the publication details, automated review stages, automation goals, applied tools, data sources, methods, results, and Google Scholar citations of SR automation studies. RESULTS: From 5321 records screened by title and abstract, we included 123 full-text articles, of which 108 were SSAM and 15 were ASR. Automation was applied for search (19/123, 15.4%), record screening (89/123, 72.4%), full-text selection (6/123, 4.9%), data extraction (13/123, 10.6%), risk of bias assessment (9/123, 7.3%), evidence synthesis (2/123, 1.6%), assessment of evidence quality (2/123, 1.6%), and reporting (2/123, 1.6%). Multiple SR stages were automated by 11 (8.9%) studies. The performance of automated record screening varied largely across SR topics. In published ASRs, we found examples of automated search, record screening, full-text selection, and data extraction. In some ASRs, automation fully complemented manual reviews to increase sensitivity rather than to save workload. Reporting of automation details was often incomplete in ASRs. CONCLUSIONS: Automation techniques are being developed for all SR stages, but with limited real-world adoption. Most SR automation tools target single SR stages, with modest time savings for the entire SR process and varying sensitivity and specificity across studies. Therefore, the real-world benefits of SR automation remain uncertain. Standardizing the terminology, reporting, and metrics of study reports could enhance the adoption of SR automation techniques in real-world practice.


Subject(s)
Automation , PubMed , Systematic Reviews as Topic , Humans
4.
PeerJ ; 12: e17470, 2024.
Article in English | MEDLINE | ID: mdl-38948230

ABSTRACT

TIN-X (Target Importance and Novelty eXplorer) is an interactive visualization tool for illuminating associations between diseases and potential drug targets and is publicly available at newdrugtargets.org. TIN-X uses natural language processing to identify disease and protein mentions within PubMed content, using previously published tools for named entity recognition (NER) of gene/protein and disease names. Target data are obtained from the Target Central Resource Database (TCRD). Two metrics, novelty and importance, are computed from these data and, when plotted as log(importance) vs. log(novelty), help the user visually explore the novelty of drug targets and their importance to diseases. TIN-X Version 3.0 has been significantly improved with an expanded dataset, a modernized architecture including a REST API, and an improved user interface (UI). The dataset has been expanded to include not only PubMed publication titles and abstracts but also full-text articles when available. This results in approximately 9-fold more target/disease associations compared with previous versions of TIN-X. Additionally, the TIN-X database containing this expanded dataset is now hosted in the cloud via Amazon RDS. Recent enhancements to the UI focus on making it more intuitive for users to find diseases or drug targets of interest, and a new, sortable table-view mode accompanies the existing plot-view mode. UI improvements also help the user browse the associated PubMed publications to explore and understand the basis of TIN-X's predicted association between a specific disease and a target of interest. While implementing these upgrades, computational resources were balanced between the web server and the user's web browser to achieve adequate performance while accommodating the expanded dataset. Together, these advances aim to extend the period over which users can benefit from TIN-X while providing both an expanded dataset and new features that researchers can use to better illuminate understudied proteins.
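The abstract describes a plot of log(importance) against log(novelty) as the main exploration view. Below is a minimal matplotlib sketch of that kind of view; the data points are hypothetical and the way novelty and importance are derived is not reproduced here (TIN-X computes them from NER-tagged PubMed content and TCRD).

```python
import math
import matplotlib.pyplot as plt

# Hypothetical (target, novelty, importance) values; real scores come from TIN-X / TCRD.
associations = [
    {"target": "TGT1", "novelty": 0.002, "importance": 15.0},
    {"target": "TGT2", "novelty": 0.050, "importance": 1.2},
    {"target": "TGT3", "novelty": 0.800, "importance": 0.1},
]

xs = [math.log10(a["novelty"]) for a in associations]
ys = [math.log10(a["importance"]) for a in associations]

fig, ax = plt.subplots()
ax.scatter(xs, ys)
for a, x, y in zip(associations, xs, ys):
    ax.annotate(a["target"], (x, y))
ax.set_xlabel("log10(novelty)")
ax.set_ylabel("log10(importance)")
ax.set_title("TIN-X-style target/disease association view (illustrative data)")
plt.show()
```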


Subject(s)
User-Computer Interface , Humans , Natural Language Processing , PubMed , Software
5.
South Med J ; 117(7): 358-363, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38959961

ABSTRACT

OBJECTIVES: Periodically, medical publications are retracted. The reasons vary from minor situations, such as author attributions, which do not undermine the validity of the data or the analysis in the article, to serious reasons, such as fraud. Understanding the reasons for retraction can provide important information for clinicians, educators, researchers, journals, and editorial boards. METHODS: The PubMed database was searched using the term "COVID-19" (coronavirus disease 2019) and the term limitation "retracted publication." The characteristics of the journals with retracted articles, the types of articles, and the reasons for retraction were analyzed. RESULTS: This search retrieved 196 retracted articles, published in 179 different journals; 14 journals had >1 retracted article. The mean impact factor of these journals was 8.4, with a range of 0.32-168.9. The most frequent reasons for retraction were duplicate publication, concerns about data validity and analysis, concerns about peer review, author request, and lack of permission or ethical violations. There were significant differences between the types of articles and the reasons for retraction, but no consistent pattern. A more detailed analysis of two particular retractions demonstrates the complexity and the effort required to make decisions about article retractions. CONCLUSIONS: The retraction of published articles presents a significant challenge to journals, editorial boards, peer reviewers, and authors. This process has the potential to provide important benefits; it also has the potential to undermine confidence in both research and the editorial process.
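A search like the one described above can be issued programmatically through NCBI's E-utilities. The sketch below shows one way to do so; the exact field tag for the retraction limit ("retracted publication"[Publication Type]) is an assumption to be checked against PubMed's search syntax, and this is not the authors' code.

```python
import requests

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Search PubMed for COVID-19 articles limited to retracted publications.
params = {
    "db": "pubmed",
    "term": 'COVID-19 AND "retracted publication"[Publication Type]',
    "retmax": 200,
    "retmode": "json",
}
resp = requests.get(ESEARCH, params=params, timeout=30)
resp.raise_for_status()
result = resp.json()["esearchresult"]

print("Total retracted COVID-19 records:", result["count"])
print("First PMIDs:", result["idlist"][:10])
```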


Subject(s)
COVID-19 , Periodicals as Topic , PubMed , Retraction of Publication as Topic , Humans , COVID-19/epidemiology , Periodicals as Topic/statistics & numerical data , SARS-CoV-2 , Journal Impact Factor , Scientific Misconduct
6.
J Biomed Inform ; 156: 104674, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38871012

ABSTRACT

OBJECTIVE: Biomedical named entity recognition (bio NER) is the task of recognizing named entities in biomedical texts. This paper introduces a new model that addresses bio NER by considering additional external contexts. Unlike prior methods that mainly use the original input sequences for sequence labeling, the model takes additional contexts into account to enhance the representation of entities in the original sequences, since additional contexts can provide richer information about the biomedical concepts mentioned. METHODS: To exploit an additional context for a given original input sequence, the model first retrieves relevant sentences from PubMed and then ranks the retrieved sentences to form the context. It next combines the context with the original input sequence to form a new, enhanced sequence. The original and enhanced sequences are fed into PubMedBERT to learn feature representations. To obtain more fine-grained features, the model stacks a BiLSTM layer on top of PubMedBERT. The final named entity label prediction is made by a CRF layer. The model is jointly trained in an end-to-end manner to take advantage of the additional context for NER of the original sequence. RESULTS: Experimental results on six biomedical datasets show that the proposed model achieves promising performance compared with strong baselines and confirm the contribution of additional contexts to bio NER. CONCLUSION: The promising results confirm three important points. First, the additional context from PubMed helps to improve the quality of biomedical entity recognition. Second, PubMed is more appropriate than the Google search engine for providing relevant information for bio NER. Finally, relevant sentences in the retrieved context are more beneficial than irrelevant ones for enriching the original input sequences. The model can flexibly integrate other types of additional context for the NER task.
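A minimal PyTorch sketch of the context-enhanced tagging architecture outlined above (encoder over the original sentence paired with retrieved context, a BiLSTM on top, and token classification). The checkpoint name is one published PubMedBERT variant and may need substituting, and the CRF decoding layer is replaced by a plain argmax to keep the sketch short, so this is not the authors' implementation.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

ENCODER = "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext"  # assumed checkpoint name

class ContextEnhancedTagger(nn.Module):
    """Encoder over [sentence ; retrieved context], BiLSTM, and a token classifier (CRF omitted)."""
    def __init__(self, encoder_name, num_labels, lstm_hidden=256):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.bilstm = nn.LSTM(hidden, lstm_hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * lstm_hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        reps = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        lstm_out, _ = self.bilstm(reps)
        return self.classifier(lstm_out)  # (batch, seq_len, num_labels) emission scores

tokenizer = AutoTokenizer.from_pretrained(ENCODER)
model = ContextEnhancedTagger(ENCODER, num_labels=3)  # e.g. a simple B/I/O label set

sentence = "BRCA1 mutations increase breast cancer risk."
context = "BRCA1 is a tumor suppressor gene implicated in hereditary breast cancer."  # retrieved context
enc = tokenizer(sentence, context, return_tensors="pt", truncation=True)
with torch.no_grad():
    labels = model(enc["input_ids"], enc["attention_mask"]).argmax(dim=-1)
print(labels)
```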


Subject(s)
Natural Language Processing , PubMed , Humans , Algorithms , Data Mining/methods , Semantics , Medical Informatics/methods
7.
J Med Libr Assoc ; 112(1): 33-41, 2024 Jan 16.
Article in English | MEDLINE | ID: mdl-38911530

ABSTRACT

Objective: With exponential growth in the publication of interprofessional education (IPE) research studies, it has become more difficult to find relevant literature and stay abreast of the latest research. To address this gap, we developed, evaluated, and validated search strategies for IPE studies in PubMed to improve future access to and synthesis of IPE research. These search strategies, or search hedges, provide comprehensive, validated sets of search terms for IPE publications. Methods: The search strategies were created for PubMed using relative recall methodology. The research methods followed the guidance of previous search hedge and search filter validation studies: creating a gold standard set of relevant references from systematic reviews, having expert searchers identify and test search terms, and using relative recall calculations to validate the searches' performance against the gold standard set. Results: The three recommended search hedges for IPE studies had recall of 71.5%, 82.7%, and 95.1%; the first is more focused for efficient literature searching, the last has high recall for comprehensive literature searching, and the remaining hedge is a middle ground between the other two. Conclusion: These validated search hedges can be used in PubMed to expedite finding relevant scholarship, staying up to date with IPE research, and conducting literature reviews and evidence syntheses.
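Relative recall, the validation measure named above, is simply the proportion of a gold-standard reference set that a candidate hedge retrieves. A minimal sketch with hypothetical PMID sets:

```python
def relative_recall(retrieved_pmids, gold_standard_pmids):
    """Fraction of the gold-standard set captured by the search."""
    gold = set(gold_standard_pmids)
    return len(gold & set(retrieved_pmids)) / len(gold)

# Hypothetical example: a hedge retrieving 4 of 5 gold-standard records has 80% relative recall.
gold = ["111", "222", "333", "444", "555"]
hedge_results = ["111", "222", "333", "444", "999"]
print(f"Relative recall: {relative_recall(hedge_results, gold):.1%}")
```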


Subject(s)
Information Storage and Retrieval , Interprofessional Education , PubMed , Humans , Information Storage and Retrieval/methods , Interprofessional Education/methods
8.
J Med Libr Assoc ; 112(1): 22-32, 2024 Jan 16.
Article in English | MEDLINE | ID: mdl-38911528

ABSTRACT

Objective: There is a need for additional comprehensive and validated filters to find relevant references more efficiently in the growing body of research on immigrant populations. Our goal was to create reliable search filters that direct librarians and researchers to pertinent studies indexed in PubMed about health topics specific to immigrant populations. Methods: We applied a systematic, multi-step process that combined information from expert input, authoritative sources, automation, and manual review of sources. We established a focused scope and eligibility criteria, which we used to create the development and validation sets. We formed a term ranking system that resulted in the creation of two filters: an immigrant-specific and an immigrant-sensitive search filter. Results: When tested against the validation set, the specific filter had a sensitivity of 88.09%, specificity of 97.26%, precision of 97.88%, and NNR (number needed to read) of 1.02. The sensitive filter had a sensitivity of 97.76% when tested against the development set, and a sensitivity of 97.14%, specificity of 82.05%, precision of 88.59%, accuracy of 90.94%, and NNR of 1.13 when tested against the validation set. Conclusion: We accomplished our goal of developing PubMed search filters to help researchers retrieve studies about immigrants. The specific and sensitive PubMed search filters give information professionals and researchers options to maximize the specificity and precision, or increase the sensitivity, of their search for relevant studies in PubMed. Both search filters generated strong performance measurements and can be used as-is to capture a subset of immigrant-related literature, or adapted and revised to fit the unique research needs of specific project teams (e.g., remove US-centric language, add location-specific terminology, or expand the search strategy to include terms for the topics being investigated in the immigrant population identified by the filter). Teams can also employ the search filter development process described here for their own topics and uses.
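The performance figures quoted above (sensitivity, specificity, precision, accuracy, NNR) all derive from a confusion matrix built by comparing the filter's hits against a hand-labeled validation set. A minimal sketch with hypothetical counts, assuming NNR is computed as 1/precision:

```python
def filter_metrics(tp, fp, tn, fn):
    sensitivity = tp / (tp + fn)          # relevant records the filter retrieves
    specificity = tn / (tn + fp)          # irrelevant records the filter excludes
    precision = tp / (tp + fp)            # retrieved records that are relevant
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    nnr = 1 / precision                   # records to read per relevant record found
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "accuracy": accuracy, "NNR": nnr}

# Hypothetical validation counts, not the study's data.
print(filter_metrics(tp=340, fp=8, tn=285, fn=46))
```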


Subject(s)
Emigrants and Immigrants , PubMed , Emigrants and Immigrants/statistics & numerical data , Humans , Information Storage and Retrieval/methods , Information Storage and Retrieval/standards , Search Engine/standards
9.
BMC Med Res Methodol ; 24(1): 139, 2024 Jun 25.
Article in English | MEDLINE | ID: mdl-38918736

ABSTRACT

BACKGROUND: Large language models (LLMs) that can efficiently screen and identify studies meeting specific criteria would streamline literature reviews. Additionally, those capable of extracting data from publications would enhance knowledge discovery by reducing the burden on human reviewers. METHODS: We created an automated pipeline utilizing the OpenAI GPT-4 32K API, version "2023-05-15", to evaluate the accuracy of GPT-4 responses to queries about published papers on HIV drug resistance (HIVDR), with and without an instruction sheet. The instruction sheet contained specialized knowledge designed to assist a person trying to answer questions about an HIVDR paper. We designed 60 questions pertaining to HIVDR and created markdown versions of 60 published HIVDR papers in PubMed. We presented the 60 papers to GPT-4 in four configurations: (1) all 60 questions simultaneously; (2) all 60 questions simultaneously with the instruction sheet; (3) each of the 60 questions individually; and (4) each of the 60 questions individually with the instruction sheet. RESULTS: GPT-4 achieved a mean accuracy of 86.9%, which was 24.0% higher than when the answers to the papers were permuted. The overall recall and precision were 72.5% and 87.4%, respectively. The standard deviation of three replicates for the 60 questions ranged from 0 to 5.3%, with a median of 1.2%. The instruction sheet did not significantly increase GPT-4's accuracy, recall, or precision. GPT-4 was more likely to provide false positive answers when the 60 questions were submitted individually than when they were submitted together. CONCLUSIONS: GPT-4 reproducibly answered 3600 questions about 60 papers on HIVDR with moderately high accuracy, recall, and precision. The instruction sheet's failure to improve these metrics suggests that more sophisticated approaches are necessary. Either enhanced prompt engineering or fine-tuning an open-source model could further improve an LLM's ability to answer questions about highly specialized HIVDR papers.
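A question-answering call of the kind the pipeline issues can be sketched with the OpenAI Python client. The model name, file name, prompt wording, and question are assumptions for illustration; the study used a GPT-4 32K deployment with its own question set and an optional instruction sheet.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("hivdr_paper.md", encoding="utf-8") as fh:   # markdown version of one HIVDR paper (hypothetical file)
    paper_markdown = fh.read()

question = "Which reverse transcriptase mutations were reported in this study?"  # illustrative question

response = client.chat.completions.create(
    model="gpt-4",  # placeholder; substitute the deployment you have access to
    messages=[
        {"role": "system", "content": "Answer questions about the following HIV drug resistance paper."},
        {"role": "user", "content": f"{paper_markdown}\n\nQuestion: {question}"},
    ],
    temperature=0,
)
print(response.choices[0].message.content)
```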


Subject(s)
HIV Infections , Humans , Reproducibility of Results , HIV Infections/drug therapy , PubMed , Publications/statistics & numerical data , Publications/standards , Information Storage and Retrieval/methods , Information Storage and Retrieval/standards , Software
10.
Int J Med Inform ; 189: 105500, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38815316

ABSTRACT

OBJECTIVE: The rapid expansion of the biomedical literature challenges traditional review methods, especially during outbreaks of emerging infectious diseases, when quick action is critical. Our study aims to explore the potential of ChatGPT to automate the biomedical literature review for rapid drug discovery. MATERIALS AND METHODS: We introduce a novel automated pipeline that helps identify drugs for a given virus in response to a potential future global health threat. Our approach can be used to select PubMed articles that identify a drug target for the given virus. We tested the approach on two known pathogens: SARS-CoV-2, where the literature is vast, and Nipah, where the literature is sparse. Specifically, a panel of three experts reviewed a set of PubMed articles and labeled each as either describing a drug target for the given virus or not. The same task was given to the automated pipeline, and its performance was assessed by whether it labeled the articles in the same way as the human experts. We applied a number of prompt engineering techniques to improve the performance of ChatGPT. RESULTS: Our best configuration used GPT-4 by OpenAI and achieved an out-of-sample validation performance with accuracy/F1-score/sensitivity/specificity of 92.87%/88.43%/83.38%/97.82% for SARS-CoV-2 and 87.40%/73.90%/74.72%/91.36% for Nipah. CONCLUSION: These results highlight the utility of ChatGPT in drug discovery and development and reveal its potential to enable rapid drug target identification during a pandemic-level health emergency.
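Out-of-sample validation of the kind reported above amounts to comparing the pipeline's article labels with the expert panel's labels. A minimal scikit-learn sketch with hypothetical labels; the study's own data and thresholds are not reproduced here.

```python
from sklearn.metrics import accuracy_score, f1_score, recall_score, confusion_matrix

# Hypothetical labels: 1 = article describes a drug target for the virus, 0 = it does not.
expert_labels   = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
pipeline_labels = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]

tn, fp, fn, tp = confusion_matrix(expert_labels, pipeline_labels).ravel()
print("accuracy:   ", accuracy_score(expert_labels, pipeline_labels))
print("F1-score:   ", f1_score(expert_labels, pipeline_labels))
print("sensitivity:", recall_score(expert_labels, pipeline_labels))
print("specificity:", tn / (tn + fp))
```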


Subject(s)
COVID-19 , Drug Discovery , Pandemics , SARS-CoV-2 , Humans , Drug Discovery/methods , COVID-19/epidemiology , Antiviral Agents/therapeutic use , Antiviral Agents/pharmacology , COVID-19 Drug Treatment , Nipah Virus/drug effects , PubMed , Data Mining/methods
11.
J Biomed Inform ; 155: 104658, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38782169

ABSTRACT

OBJECTIVE: Relation extraction is an essential task in the field of biomedical literature mining and offers significant benefits for various downstream applications, including database curation, drug repurposing, and literature-based discovery. The broad-coverage natural language processing (NLP) tool SemRep has established a solid baseline for extracting subject-predicate-object triples from biomedical text and has served as the backbone of the Semantic MEDLINE Database (SemMedDB), a PubMed-scale repository of semantic triples. While SemRep achieves reasonable precision (0.69), its recall is relatively low (0.42). In this study, we aimed to enhance SemRep using a relation classification approach, in order to eventually increase the size and utility of SemMedDB. METHODS: We combined and extended existing SemRep evaluation datasets to generate training data. We leveraged the pre-trained PubMedBERT model, enhancing it through additional contrastive pre-training and fine-tuning. We experimented with three entity representations: mentions, semantic types, and semantic groups. We evaluated model performance on a portion of the SemRep Gold Standard dataset and compared it with SemRep's performance. We also assessed the effect of the model on a larger set of 12K randomly selected PubMed abstracts. RESULTS: Our results show that the best model yields a precision of 0.62, recall of 0.81, and F1 score of 0.70. Assessment on the 12K abstracts shows that the model could double the size of SemMedDB when applied to the entirety of PubMed. We also manually assessed the quality of 506 triples predicted by the model that SemRep had not previously identified and found that 67% of these triples were correct. CONCLUSION: These findings underscore the promise of our model in achieving a more comprehensive coverage of relationships mentioned in biomedical literature, thereby showing its potential for enhancing various downstream applications of biomedical literature mining. Data and code related to this study are available at https://github.com/Michelle-Mings/SemRep_RelationClassification.
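The abstract mentions experimenting with three entity representations (mentions, semantic types, semantic groups) when forming the classifier input. The sketch below shows one hypothetical way to mark up a sentence for each representation before feeding it to a fine-tuned classifier; the marker format and the example semantic types/groups are assumptions, not SemRep's or the authors' actual preprocessing.

```python
def mark_entities(sentence, subject, obj, representation):
    """Replace the subject/object spans according to the chosen entity representation."""
    sub_text, sub_type, sub_group = subject
    obj_text, obj_type, obj_group = obj
    if representation == "mention":
        sub_repl, obj_repl = f"@{sub_text}@", f"@{obj_text}@"
    elif representation == "semantic_type":
        sub_repl, obj_repl = f"@{sub_type}@", f"@{obj_type}@"
    elif representation == "semantic_group":
        sub_repl, obj_repl = f"@{sub_group}@", f"@{obj_group}@"
    else:
        raise ValueError(representation)
    return sentence.replace(sub_text, sub_repl).replace(obj_text, obj_repl)

sentence = "Aspirin inhibits platelet aggregation."
subject = ("Aspirin", "Pharmacologic Substance", "Chemicals & Drugs")   # (mention, semantic type, semantic group)
obj = ("platelet aggregation", "Cell Function", "Physiology")

for rep in ("mention", "semantic_type", "semantic_group"):
    print(rep, "->", mark_entities(sentence, subject, obj, rep))
```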


Subject(s)
Data Mining , Natural Language Processing , Semantics , Data Mining/methods , MEDLINE , PubMed , Algorithms , Humans , Databases, Factual
12.
J Am Med Inform Assoc ; 31(7): 1551-1560, 2024 Jun 20.
Article in English | MEDLINE | ID: mdl-38758667

ABSTRACT

OBJECTIVE: Synthesizing and evaluating inconsistent medical evidence is essential in evidence-based medicine. This study aimed to employ ChatGPT as a sophisticated scientific reasoning engine to identify conflicting clinical evidence and summarize unresolved questions to inform further research. MATERIALS AND METHODS: We evaluated ChatGPT's effectiveness in identifying conflicting evidence and investigated its principles of logical reasoning. An automated framework was developed to generate a PubMed dataset focused on controversial clinical topics. ChatGPT analyzed this dataset to identify consensus and controversy and to formulate unsolved research questions. Expert evaluations were conducted (1) on the consensus and controversy for factual consistency, comprehensiveness, and potential harm and (2) on the research questions for relevance, innovation, clarity, and specificity. RESULTS: The gpt-4-1106-preview model achieved a 90% recall rate in detecting inconsistent claim pairs within a ternary assertion setup. Notably, without explicit reasoning prompts, ChatGPT provided sound reasoning for the assertions between claims and hypotheses, based on an analysis grounded in relevance, specificity, and certainty. ChatGPT's conclusions on consensus and controversies in the clinical literature were comprehensive and factually consistent. The research questions proposed by ChatGPT received high expert ratings. DISCUSSION: Our experiment implies that, in evaluating the relationship between evidence and claims, ChatGPT considered more detailed information than a straightforward assessment of sentiment orientation. This ability to process intricate information and conduct scientific reasoning about sentiment is noteworthy, particularly because this pattern emerged without explicit guidance or directives in prompts, highlighting ChatGPT's inherent logical reasoning capabilities. CONCLUSION: This study demonstrated ChatGPT's capacity to evaluate and interpret scientific claims. Such proficiency can be generalized to the broader clinical research literature. ChatGPT effectively aids in facilitating clinical studies by proposing unresolved challenges based on analysis of existing studies. However, caution is advised, as ChatGPT's outputs are inferences drawn from the input literature and could be harmful to clinical practice.
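The ternary assertion setup mentioned in the results (deciding whether one claim supports, contradicts, or is neutral toward another) can be sketched as a single classification prompt. The prompt text, claims, and label parsing below are assumptions for illustration, not the study's framework.

```python
from openai import OpenAI

client = OpenAI()

claim_a = "Drug X reduces 30-day mortality in sepsis."            # hypothetical claims
claim_b = "Drug X showed no survival benefit in septic patients."

prompt = (
    "Classify the relationship between the two claims as exactly one of: "
    "SUPPORT, CONTRADICT, NEUTRAL.\n"
    f"Claim 1: {claim_a}\nClaim 2: {claim_b}\nAnswer with the label only."
)
response = client.chat.completions.create(
    model="gpt-4-1106-preview",   # the model version named in the abstract
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)
label = response.choices[0].message.content.strip().upper()
print("Relation:", label)   # e.g. CONTRADICT -> candidate conflicting-evidence pair
```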


Subject(s)
Evidence-Based Medicine , Humans , PubMed
13.
Med Ref Serv Q ; 43(2): 106-118, 2024.
Article in English | MEDLINE | ID: mdl-38722606

ABSTRACT

The objective of this study was to examine the accuracy of indexing for "Appalachian Region"[Mesh]. Researchers performed a PubMed search for articles published in 2019 using "Appalachian Region"[Mesh] or "Appalachia" or "Appalachian" in the title or abstract. Only 17.88% of the articles retrieved by the search were about Appalachia according to the Appalachian Regional Commission (ARC) definition. Most retrieved articles appeared because they were indexed with state terms that are included as part of the MeSH term. Database indexing and searching transparency is of growing importance as indexers rely increasingly on automated systems to catalog information and publications.
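The comparison above rests on two PubMed queries, one by MeSH heading and one by title/abstract keywords, whose results are then checked against the ARC definition. A minimal Biopython sketch of that retrieval step; the email address is a placeholder and the 2019 date filter mirrors the study's time window.

```python
from Bio import Entrez

Entrez.email = "you@example.org"  # placeholder; NCBI requires a contact address

def pubmed_pmids(term, retmax=10000):
    handle = Entrez.esearch(db="pubmed", term=term, retmax=retmax)
    record = Entrez.read(handle)
    handle.close()
    return set(record["IdList"])

mesh_hits = pubmed_pmids('"Appalachian Region"[Mesh] AND 2019[dp]')
tiab_hits = pubmed_pmids('(Appalachia[tiab] OR Appalachian[tiab]) AND 2019[dp]')

print("MeSH-indexed records:", len(mesh_hits))
print("Title/abstract records:", len(tiab_hits))
print("Indexed but never mentioned in title/abstract:", len(mesh_hits - tiab_hits))
```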


Subject(s)
Abstracting and Indexing , Appalachian Region , Abstracting and Indexing/methods , Humans , Medical Subject Headings , PubMed , Bibliometrics
14.
BMC Med Inform Decis Mak ; 24(Suppl 3): 98, 2024 Apr 17.
Article in English | MEDLINE | ID: mdl-38632621

ABSTRACT

BACKGROUND: Tremendous research efforts have been made in the Alzheimer's disease (AD) field to understand disease etiology and progression and to discover treatments. Many mechanistic hypotheses, therapeutic targets, and treatment strategies have been proposed in the last few decades. Reviewing previous work and staying current with this ever-growing body of AD publications is an essential yet difficult task for AD researchers. METHODS: In this study, we designed and implemented a natural language processing (NLP) pipeline to extract gene-specific, neurodegenerative disease (ND)-focused information from the PubMed database. The collected publication information was filtered and cleaned to construct AD-related, gene-specific publication profiles. Six categories of AD-related information are extracted from the processed publication data: publication trend by year, dementia type occurrence, brain region occurrence, mouse model information, keyword occurrence, and co-occurring genes. A user-friendly web portal was then developed using the Django framework to provide gene query functions and data visualizations for the summarized publication information. RESULTS: By implementing the NLP pipeline, we extracted gene-specific ND-related publication information from the abstracts of publications in the PubMed database. The results are summarized and visualized through an interactive web query portal. Multiple visualization windows display ND publication trends, mouse models used, dementia types, involved brain regions, keywords for major AD-related biological processes, and co-occurring genes. Direct links to PubMed are provided for all recorded publications on the query result page of the web portal. CONCLUSION: The resulting portal is a valuable tool and data source for quickly querying and displaying AD publications tailored to users' research areas and gene targets of interest, which is especially convenient for users without informatics or text-mining skills. Our study will not only keep AD researchers updated on the progress of AD research and assist them in conducting preliminary examinations efficiently, but will also offer additional support for hypothesis generation and validation, contributing significantly to the communication, dissemination, and progress of AD research.
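One of the six information categories listed above, the publication trend by year, reduces to counting a gene's publications per year after the extraction step. A minimal sketch over hypothetical, already-extracted records; the real pipeline derives these from PubMed abstracts.

```python
from collections import Counter, defaultdict

# Hypothetical output of the extraction step: (pmid, year, genes mentioned in an AD/ND context).
records = [
    ("100001", 2019, ["APOE", "TREM2"]),
    ("100002", 2020, ["APOE"]),
    ("100003", 2020, ["MAPT", "APOE"]),
    ("100004", 2021, ["TREM2"]),
]

trend_by_gene = defaultdict(Counter)
for pmid, year, genes in records:
    for gene in genes:
        trend_by_gene[gene][year] += 1

print(dict(trend_by_gene["APOE"]))   # e.g. {2019: 1, 2020: 2} -> yearly publication trend for APOE
```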


Subject(s)
Alzheimer Disease , Neurodegenerative Diseases , Animals , Mice , Data Mining/methods , PubMed , Databases, Factual
15.
Nucleic Acids Res ; 52(W1): W540-W546, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38572754

ABSTRACT

PubTator 3.0 (https://www.ncbi.nlm.nih.gov/research/pubtator3/) is a biomedical literature resource using state-of-the-art AI techniques to offer semantic and relation searches for key concepts like proteins, genetic variants, diseases and chemicals. It currently provides over one billion entity and relation annotations across approximately 36 million PubMed abstracts and 6 million full-text articles from the PMC open access subset, updated weekly. PubTator 3.0's online interface and API utilize these precomputed entity relations and synonyms to provide advanced search capabilities and enable large-scale analyses, streamlining many complex information needs. We showcase the retrieval quality of PubTator 3.0 using a series of entity pair queries, demonstrating that PubTator 3.0 retrieves a greater number of articles than either PubMed or Google Scholar, with higher precision in the top 20 results. We further show that integrating ChatGPT (GPT-4) with PubTator APIs dramatically improves the factuality and verifiability of its responses. In summary, PubTator 3.0 offers a comprehensive set of features and tools that allow researchers to navigate the ever-expanding wealth of biomedical literature, expediting research and unlocking valuable insights for scientific discovery.
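PubTator 3.0's precomputed annotations are exposed through a web API. The request below is a sketch; the endpoint URL and parameters are assumptions that should be confirmed against the PubTator 3.0 API documentation before use, and the PMID is an illustrative placeholder.

```python
import requests

# Assumed PubTator 3.0 export endpoint; verify the path and parameters in the official API docs.
EXPORT_URL = "https://www.ncbi.nlm.nih.gov/research/pubtator3-api/publications/export/biocjson"

resp = requests.get(EXPORT_URL, params={"pmids": "12345678"}, timeout=30)  # illustrative PMID
resp.raise_for_status()

# The service returns BioC-JSON with precomputed entity and relation annotations;
# inspect the structure before building anything on top of it.
if resp.headers.get("content-type", "").startswith("application/json"):
    print(resp.json())
else:
    print(resp.text[:500])
```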


Subject(s)
PubMed , Artificial Intelligence , Humans , Software , Data Mining/methods , Semantics , Internet
17.
PLoS One ; 19(4): e0300701, 2024.
Article in English | MEDLINE | ID: mdl-38564591

ABSTRACT

Space medicine is a vital discipline with often time-intensive and costly projects and constrained opportunities for studying various elements such as space missions, astronauts, and simulated environments. Moreover, private interests are gaining increasing influence in this discipline. In scientific disciplines with these features, transparent and rigorous methods are essential. Here, we undertook an evaluation of transparency indicators in publications within the field of space medicine. A meta-epidemiological assessment of PubMed Central Open Access (PMC OA) eligible articles within the field of space medicine was performed to assess the prevalence of code sharing, data sharing, pre-registration, conflicts of interest, and funding. Text mining was performed with the rtransparent text-mining algorithms, with manual validation of 200 random articles to obtain corrected estimates. Across 1215 included articles, 39 (3%) shared code, 258 (21%) shared data, 10 (1%) were registered, 110 (90%) contained a conflict-of-interest statement, and 1141 (93%) included a funding statement. After manual validation, the corrected estimates for code sharing, data sharing, and registration were 5%, 27%, and 1%, respectively. Data sharing was 32% when limited to original articles and highest in space/parabolic flights (46%). Overall, across space medicine we observed modest rates of data sharing, rare sharing of code, and almost non-existent protocol registration. Enhancing transparency in space medicine research is imperative for safeguarding its scientific rigor and reproducibility.


Subject(s)
Aerospace Medicine , Data Mining , Information Dissemination , PubMed , Reproducibility of Results
18.
Bioinformatics ; 40(5)2024 May 02.
Article in English | MEDLINE | ID: mdl-38597890

ABSTRACT

MOTIVATION: The rapid increase of biomedical literature makes it increasingly difficult for scientists to keep pace with the discoveries on which they build their studies. Therefore, computational tools have become more widespread, among which network analysis plays a crucial role in several life-science contexts. Nevertheless, building correct and complete networks for user-defined biomedical topics on top of the available literature is still challenging. RESULTS: We introduce NetMe 2.0, a web-based platform that automatically extracts relevant biomedical entities and their relations from a set of input texts (full texts or abstracts of PubMed Central papers, free text, or PDFs uploaded by users) and models them as a BioMedical Knowledge Graph (BKG). NetMe 2.0 also implements an innovative Retrieval-Augmented Generation module (Graph-RAG) that works on top of the relationships modeled by the BKG and distills well-formed sentences that explain their content. The experimental results show that NetMe 2.0 can infer comprehensive and reliable biological networks, with strong precision and recall compared to state-of-the-art approaches. AVAILABILITY AND IMPLEMENTATION: https://netme.click/.
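The Graph-RAG idea described above, retrieving the part of the knowledge graph around an entity and turning its edges into sentences for generation, can be sketched with networkx. The graph content and the sentence template are illustrative assumptions, not NetMe 2.0's implementation.

```python
import networkx as nx

# Toy biomedical knowledge graph built from extracted (entity, relation, entity) triples.
bkg = nx.MultiDiGraph()
bkg.add_edge("TP53", "apoptosis", relation="regulates")
bkg.add_edge("TP53", "lung cancer", relation="associated_with")
bkg.add_edge("cisplatin", "lung cancer", relation="treats")

def graph_rag_context(graph, entity):
    """Distill the edges around an entity into plain sentences to feed a generator."""
    sentences = []
    for u, v, data in graph.out_edges(entity, data=True):
        sentences.append(f"{u} {data['relation'].replace('_', ' ')} {v}.")
    for u, v, data in graph.in_edges(entity, data=True):
        sentences.append(f"{u} {data['relation'].replace('_', ' ')} {v}.")
    return " ".join(sentences)

print(graph_rag_context(bkg, "lung cancer"))
# -> "TP53 associated with lung cancer. cisplatin treats lung cancer."
```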


Subject(s)
Internet , Software , Data Mining/methods , Computational Biology/methods , PubMed
19.
Database (Oxford) ; 20242024 Apr 02.
Article in English | MEDLINE | ID: mdl-38564426

ABSTRACT

The CoMentG resource contains millions of relationships between terms of biomedical interest obtained from the scientific literature. At the core of the system is a methodology for detecting significant co-mentions of concepts in the entire PubMed corpus. The method was applied to nine sets of terms covering the most important classes of biomedical concepts: diseases, symptoms/clinical signs, molecular functions, biological processes, cellular compartments, anatomical parts, cell types, bacteria, and chemical compounds. We obtained more than 7 million relationships between more than 74,000 terms, and many types of relationships are not available in any other resource. As the terms were obtained from widely used resources and ontologies, the relationships are given using the standard identifiers those resources provide and can therefore be linked to other data. A web interface allows users to browse these associations, searching for relationships among a set of terms of interest provided as input, such as between a disease and its associated symptoms, underlying molecular processes, or affected tissues. The results are presented in an interactive interface where the user can explore the reported relationships in different ways and follow links to other resources. Database URL: https://csbg.cnb.csic.es/CoMentG/.
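The core step, testing whether two terms are co-mentioned in PubMed more often than expected by chance, is not detailed in the abstract. A common way to score such co-mentions is a Fisher's exact test on a 2x2 table of abstract counts, shown below with hypothetical numbers; the choice of test is an assumption, not necessarily CoMentG's method.

```python
from scipy.stats import fisher_exact

# Hypothetical abstract counts for two terms across a PubMed-sized corpus.
n_corpus = 36_000_000          # abstracts screened
n_both = 120                   # abstracts mentioning both term A and term B
n_a_only = 4_880               # abstracts mentioning term A but not B
n_b_only = 9_880               # abstracts mentioning term B but not A
n_neither = n_corpus - n_both - n_a_only - n_b_only

table = [[n_both, n_a_only], [n_b_only, n_neither]]
odds_ratio, p_value = fisher_exact(table, alternative="greater")
print(f"odds ratio = {odds_ratio:.1f}, p = {p_value:.2e}")  # small p -> significant co-mention
```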


Subject(s)
Publications , PubMed , Databases, Factual
20.
CNS Neurosci Ther ; 30(4): e14704, 2024 04.
Article in English | MEDLINE | ID: mdl-38584341

ABSTRACT

BACKGROUND: The gut microbiome is composed of various microorganisms, such as bacteria, fungi, and protozoa, and constitutes an important part of the human gut. Its composition is closely related to human health and disease. Alzheimer's disease (AD) is a neurodegenerative disease whose underlying mechanisms have not been fully elucidated. Recent research has shown significant differences in the gut microbiota between AD patients and healthy individuals. Changes in the composition of the gut microbiota may lead to the development of harmful factors associated with AD. In addition, the gut microbiota may play a role in the development and progression of AD through the gut-brain axis, although the exact nature of this relationship is not fully understood. AIMS: This review describes the types and functions of the gut microbiota and their relationship with AD, and explores in depth the potential mechanisms of the gut microbiota in the occurrence of AD and the prospects for treatment strategies. METHODS: We reviewed literature from PubMed and Web of Science using key terms related to AD and the gut microbiome. RESULTS: Research indicates that the gut microbiota can directly or indirectly influence the occurrence and progression of AD through metabolites, endotoxins, and the vagus nerve. DISCUSSION: This review discusses future challenges and research directions regarding the gut microbiota in AD. CONCLUSION: While many unresolved issues remain regarding the gut microbiota and AD, the feasibility and immense potential of treating AD by modulating the gut microbiota are evident.


Subject(s)
Alzheimer Disease , Gastrointestinal Microbiome , Neurodegenerative Diseases , Humans , Alzheimer Disease/therapy , Brain-Gut Axis , PubMed , Brain