Results 1 - 20 of 27,160
1.
Proc Natl Acad Sci U S A ; 121(28): e2320870121, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38959033

ABSTRACT

Efficient storage and sharing of massive biomedical data would make them widely accessible across institutions and disciplines. However, compressors tailored for natural photos and videos quickly reach their limits on biomedical data, while emerging deep learning-based methods demand huge training sets and generalize poorly. Here, we propose Biomedical data compRession with Implicit nEural Function (BRIEF), which represents the target data with compact neural networks that are data specific and thus free of generalization issues. Benefiting from the strong representation capability of implicit neural functions, BRIEF achieves 2 to 3 orders of magnitude compression on diverse biomedical data at significantly higher fidelity than existing techniques. In addition, BRIEF delivers consistent performance across the whole data volume and supports customized, spatially varying fidelity. These advantages also make BRIEF a reliable basis for downstream tasks at low bandwidth. Our approach will facilitate low-bandwidth data sharing and promote collaboration and progress in the biomedical field.
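For readers who want a concrete picture of implicit neural compression, the sketch below fits a small coordinate-based MLP to a single volume so that the network weights become the compressed representation; the architecture, shapes, and hyperparameters are illustrative assumptions, not the BRIEF implementation.

```python
# Generic sketch of implicit neural compression: overfit a small MLP that maps
# (x, y, z) coordinates to voxel intensities, then keep only the network weights.
# Illustrative only -- not the BRIEF architecture; shapes/hyperparameters are assumed.
import torch
import torch.nn as nn

class CoordMLP(nn.Module):
    def __init__(self, hidden=64, layers=4):
        super().__init__()
        blocks, in_dim = [], 3
        for _ in range(layers):
            blocks += [nn.Linear(in_dim, hidden), nn.ReLU()]
            in_dim = hidden
        blocks += [nn.Linear(hidden, 1)]
        self.net = nn.Sequential(*blocks)

    def forward(self, coords):          # coords: (N, 3) in [-1, 1]
        return self.net(coords)

def compress(volume: torch.Tensor, steps: int = 2000) -> CoordMLP:
    """Fit the MLP to one 3D volume; the trained weights are the compressed code."""
    D, H, W = volume.shape
    zs, ys, xs = torch.meshgrid(
        *[torch.linspace(-1, 1, s) for s in (D, H, W)], indexing="ij")
    coords = torch.stack([xs, ys, zs], dim=-1).reshape(-1, 3)
    target = volume.reshape(-1, 1).float()
    model = CoordMLP()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(steps):
        idx = torch.randint(0, coords.shape[0], (4096,))   # random mini-batch of voxels
        loss = nn.functional.mse_loss(model(coords[idx]), target[idx])
        opt.zero_grad(); loss.backward(); opt.step()
    return model   # decode later by querying the model on a coordinate grid
```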


Subject(s)
Information Dissemination , Neural Networks (Computer) , Humans , Information Dissemination/methods , Data Compression/methods , Deep Learning , Biomedical Research/methods
2.
Annu Rev Genomics Hum Genet ; 24: 369-391, 2023 08 25.
Article in English | MEDLINE | ID: mdl-36791787

ABSTRACT

The Human Cell Atlas (HCA) is striving to build an open community that is inclusive of all researchers adhering to its principles and as open as possible with respect to data access and use. However, open data sharing can pose certain challenges. For instance, being a global initiative, the HCA must contend with a patchwork of local and regional privacy rules. A notable example is the implementation of the European Union General Data Protection Regulation (GDPR), which caused some concern in the biomedical and genomic data-sharing community. We examine how the HCA's large, international group of researchers is investing tremendous effort in ensuring appropriate sharing of data. We describe the HCA's objectives and governance, how it defines open data sharing, and the ethico-legal challenges encountered early in its development, in particular those prompted by the GDPR. Finally, we broaden the discussion to address tools and strategies that can be used to address ethical data governance.


Subject(s)
Amines , Ascomycota , Humans , Drive (Psychology) , European Union , Computer Security
3.
Brief Bioinform ; 25(Supplement_1)2024 Jul 23.
Article in English | MEDLINE | ID: mdl-39041915

ABSTRACT

This manuscript describes the development of a resources module that is part of a learning platform named 'NIGMS Sandbox for Cloud-based Learning' https://github.com/NIGMS/NIGMS-Sandbox. The overall genesis of the Sandbox is described in the editorial NIGMS Sandbox at the beginning of this Supplement. This module delivers learning materials on implementing deep learning algorithms for biomedical image data in an interactive format that uses appropriate cloud resources for data access and analyses. Biomedical datasets are widely used in both research and clinical settings, but they become harder for professionally trained clinicians and researchers to interpret as their size and breadth increase. Artificial intelligence, and specifically deep learning neural networks, has recently become an important tool in biomedical research. However, its use is limited by computational requirements and by confusion regarding the different neural network architectures. The goal of this learning module is to introduce the main types of deep learning neural networks and to cover practices that are commonly used in biomedical research. The module is subdivided into four submodules covering classification, augmentation, segmentation and regression. Each complementary submodule was written on the Google Cloud Platform and contains detailed code and explanations, as well as quizzes and challenges to facilitate user training. Overall, the goal of this learning module is to enable users to identify and integrate the correct type of neural network with their data, while highlighting the ease of use of cloud computing for implementing neural networks.


Subject(s)
Deep Learning , Neural Networks (Computer) , Humans , Biomedical Research , Algorithms , Cloud Computing
4.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38836701

ABSTRACT

Biomedical data are generated and collected from various sources, including medical imaging, laboratory tests and genome sequencing. Sharing these data for research can help address unmet health needs, contribute to scientific breakthroughs, accelerate the development of more effective treatments and inform public health policy. Due to the potential sensitivity of such data, however, privacy concerns have led to policies that restrict data sharing. In addition, sharing sensitive data requires a secure and robust infrastructure with appropriate storage solutions. Here, we examine and compare the centralized and federated data sharing models through the prism of five large-scale and real-world use cases of strategic significance within the European data sharing landscape: the French Health Data Hub, the BBMRI-ERIC Colorectal Cancer Cohort, the federated European Genome-phenome Archive, the Observational Medical Outcomes Partnership/OHDSI network and the EBRAINS Medical Informatics Platform. Our analysis indicates that centralized models facilitate data linkage, harmonization and interoperability, while federated models facilitate scaling up and legal compliance, as the data typically reside on the data generator's premises, allowing for better control of how data are shared. This comparative study thus offers guidance on the selection of the most appropriate sharing strategy for sensitive datasets and provides key insights for informed decision-making in data sharing efforts.


Subject(s)
Biological Science Disciplines , Information Dissemination , Humans , Medical Informatics/methods
5.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38314912

ABSTRACT

Increasing volumes of biomedical data are amassing in databases. Large-scale analyses of these data have wide-ranging applications in biology and medicine. Such analyses require tools to characterize and process entries at scale. However, existing tools, mainly centered on extracting predefined fields, often fail to comprehensively process database entries or correct evident errors, a task humans can easily perform. These tools also lack the ability to reason like domain experts, hindering their robustness and analytical depth. Recent advances with large language models (LLMs) provide a fundamentally new way to query databases. But while a tool such as ChatGPT is adept at answering questions about manually input records, challenges arise when scaling up this process. First, interactions with the LLM need to be automated. Second, limitations on input length may require a record pruning or summarization pre-processing step. Third, to behave reliably as desired, the LLM needs either well-designed, short, 'few-shot' examples, or fine-tuning based on a larger set of well-curated examples. Here, we report ChIP-GPT, based on fine-tuning of the generative pre-trained transformer (GPT) model Llama and on a program that prompts the model iteratively and handles its generation of answer text. This model is designed to extract metadata from the Sequence Read Archive, emphasizing the identification of chromatin immunoprecipitation (ChIP) targets and cell lines. When trained with 100 examples, ChIP-GPT demonstrates 90-94% accuracy. Notably, it can seamlessly extract data from records with typos or absent field labels. Our proposed method is easily adaptable to customized questions and different databases.
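As a rough sketch of this record-by-record prompting workflow (not the ChIP-GPT code, prompts, or fine-tuned weights), one can iterate over database records, prune over-long entries, and query a generative model with a few-shot template; the model name and prompt fields below are placeholders.

```python
# Sketch of batch metadata extraction by iteratively prompting an LLM over database
# records. Illustrative only: the prompt, fields, and model name are placeholders,
# not the ChIP-GPT prompts or fine-tuned weights.
from transformers import pipeline

generator = pipeline("text-generation", model="meta-llama/Llama-2-7b-hf")  # placeholder model

FEW_SHOT = """Record: ChIP-seq of STAT3 in HeLa cells, antibody ab68153.
Target: STAT3
Cell line: HeLa

Record: Input DNA, K562, no antibody.
Target: none (input control)
Cell line: K562
"""

def extract_metadata(record_text: str, max_chars: int = 4000) -> str:
    """Prune an over-long record, then ask the model for the target and cell line."""
    prompt = FEW_SHOT + f"\nRecord: {record_text[:max_chars]}\nTarget:"
    out = generator(prompt, max_new_tokens=40, do_sample=False)[0]["generated_text"]
    return out[len(prompt):].strip()          # keep only the newly generated answer

# for record in sra_records: print(extract_metadata(record))
```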


Subject(s)
Medicine , Humans , Cell Line , Chromatin Immunoprecipitation , Factual Databases , Language
6.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38113073

ABSTRACT

Researchers increasingly turn to explainable artificial intelligence (XAI) to analyze omics data and gain insights into the underlying biological processes. Yet, given the interdisciplinary nature of the field, many findings have only been shared in their respective research community. An overview of XAI for omics data is needed to highlight promising approaches and help detect common issues. Toward this end, we conducted a systematic mapping study. To identify relevant literature, we queried Scopus, PubMed, Web of Science, BioRxiv, MedRxiv and arXiv. Based on keywording, we developed a coding scheme with 10 facets regarding the studies' AI methods, explainability methods and omics data. Our mapping study resulted in 405 included papers published between 2010 and 2023. The inspected papers analyze DNA-based (mostly genomic), transcriptomic, proteomic or metabolomic data by means of neural networks, tree-based methods, statistical methods and further AI methods. The preferred post-hoc explainability methods are feature relevance (n = 166) and visual explanation (n = 52), while papers using interpretable approaches often resort to the use of transparent models (n = 83) or architecture modifications (n = 72). With many research gaps still apparent for XAI for omics data, we deduced eight research directions and discuss their potential for the field. We also provide exemplary research questions for each direction. Many problems with the adoption of XAI for omics data in clinical practice are yet to be resolved. This systematic mapping study outlines extant research on the topic and provides research directions for researchers and practitioners.
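As an illustration of feature relevance, the most frequent post-hoc explainability method among the mapped papers, the sketch below computes permutation importance for a classifier trained on a synthetic omics-like matrix; the data, model, and settings are assumptions and are not drawn from any of the surveyed studies.

```python
# Generic permutation feature importance -- one common form of post-hoc feature
# relevance -- on a synthetic omics-like matrix (samples x genes). Illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                 # 200 samples, 50 "genes"
y = (X[:, 3] + X[:, 7] > 0).astype(int)        # outcome driven by genes 3 and 7

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

result = permutation_importance(model, X_te, y_te, n_repeats=20, random_state=0)
top = np.argsort(result.importances_mean)[::-1][:5]
for i in top:
    print(f"gene_{i}: importance={result.importances_mean[i]:.3f}")
```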


Subject(s)
Artificial Intelligence , Proteomics , Gene Expression Profiling , Genomics , Neural Networks (Computer)
7.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37291761

ABSTRACT

Adverse drug-drug interactions (DDIs) have become an increasingly serious problem in medical and health systems. Recently, the effective application of deep learning and biomedical knowledge graphs (KGs) has improved the DDI prediction performance of computational models. However, problems of feature redundancy and KG noise also arise, bringing new challenges for researchers. To overcome these challenges, we proposed a Multi-Channel Feature Fusion model for multi-typed DDI prediction (MCFF-MTDDI). Specifically, we first extracted drug chemical structure features, drug pairs' extra label features, and KG features of drugs. These different features were then fused by a multi-channel feature fusion module, and multi-typed DDIs were predicted through a fully connected neural network. To our knowledge, we are the first to integrate extra label information into KG-based multi-typed DDI prediction. In addition, we proposed a novel KG feature learning method and a State Encoder that obtain KG-based features for target drug pairs containing richer and more relevant drug-related KG information with less noise, as well as a Gated Recurrent Unit-based multi-channel feature fusion module that yields more comprehensive feature information about drug pairs and effectively alleviates feature redundancy. We experimented with four datasets in multi-class and multi-label prediction tasks to comprehensively evaluate the performance of MCFF-MTDDI for predicting interactions of known-known drugs, known-new drugs and new-new drugs. We further conducted ablation studies and case studies. All the results demonstrate the effectiveness of MCFF-MTDDI.
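The gated multi-channel fusion idea can be sketched generically as below, where three per-pair feature channels are fused by a GRU before multi-typed classification; the dimensions, channel ordering, and classifier head are illustrative assumptions, not the MCFF-MTDDI implementation.

```python
# Generic sketch of gated multi-channel feature fusion for a drug pair: three
# feature channels (chemical structure, extra label, KG) are treated as a short
# sequence and fused with a GRU before multi-typed classification.
# Dimensions and design are illustrative assumptions, not the MCFF-MTDDI model.
import torch
import torch.nn as nn

class GatedFusionDDI(nn.Module):
    def __init__(self, dim=128, n_types=65):
        super().__init__()
        self.gru = nn.GRU(input_size=dim, hidden_size=dim, batch_first=True)
        self.classifier = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                        nn.Linear(dim, n_types))

    def forward(self, chem_feat, label_feat, kg_feat):
        # (batch, 3, dim): each feature channel becomes one "time step" of the GRU
        channels = torch.stack([chem_feat, label_feat, kg_feat], dim=1)
        _, h = self.gru(channels)                # h: (1, batch, dim) final hidden state
        return self.classifier(h.squeeze(0))     # logits over interaction types

model = GatedFusionDDI()
logits = model(torch.randn(8, 128), torch.randn(8, 128), torch.randn(8, 128))
print(logits.shape)   # torch.Size([8, 65])
```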


Subject(s)
Drug Delivery Systems , Neural Networks (Computer) , Humans , Drug Interactions , Research Personnel
8.
Annu Rev Microbiol ; 74: 337-359, 2020 09 08.
Article in English | MEDLINE | ID: mdl-32660390

ABSTRACT

The ability to detect disease early and deliver precision therapy would be transformative for the treatment of human illnesses. To achieve these goals, biosensors that can pinpoint when and where diseases emerge are needed. Rapid advances in synthetic biology are enabling us to exploit the information-processing abilities of living cells to diagnose disease and then treat it in a controlled fashion. For example, living sensors could be designed to precisely sense disease biomarkers, such as by-products of inflammation, and to respond by delivering targeted therapeutics in situ. Here, we provide an overview of ongoing efforts in microbial biosensor design, highlight translational opportunities, and discuss challenges for enabling sense-and-respond precision medicines.


Subject(s)
Bacteria/metabolism , Biomedical Technology , Biosensing Techniques/methods , Synthetic Biology/methods , Bacteria/genetics , Biotechnology/organization & administration , Humans , Inflammation/diagnosis , Post-Translational Protein Processing
9.
Article in English | MEDLINE | ID: mdl-38877204

ABSTRACT

Between early April 2020 and late August 2020, nearly 100,000 patients hospitalized with SARS-CoV-2 infections were treated with COVID-19 convalescent plasma (CCP) in the US under the auspices of an FDA-authorized Expanded Access Program (EAP) housed at the Mayo Clinic. Clinicians wishing to provide CCP to their patients during that 5-month period early in the COVID pandemic had to register their patients and provide clinical information to the EAP program. This program was utilized by some 2,200 US hospitals located in every state, ranging from academic medical centers to small rural hospitals, and facilitated the treatment of an ethnically and socio-economically diverse cross section of patients. Within 6 weeks of program initiation, the first signals of safety were found in 5,000 recipients of CCP, supported by a later analysis of 20,000 recipients (Joyner et al. in J Clin Invest 130:4791-4797, 2020a; Joyner et al. in Mayo Clin Proc 95:1888-1897, 2020b). By mid-summer of 2020, strong evidence was produced showing that high-titer CCP given early in the course of hospitalization could lower mortality by as much as a third (Joyner et al. in N Engl J Med 384:1015-1027, 2021; Senefeld et al. in PLoS Med 18, 2021a). These data were used by the FDA in its August decision to grant Emergency Use Authorization for CCP use in hospitals. This chapter provides a personal narrative by the principal investigator of the EAP that describes the events leading up to the program, some of its key outcomes, and some lessons learned that may be applicable to the next pandemic. This vast effort was a complete team response to a crisis and included an exceptional level of collaboration both inside and outside of the Mayo Clinic. Even writing just 4 years after the initiation of the EAP, this intense professional effort, comprising many moving parts, remains hard to completely understand or fully explain in this brief narrative. As Nelson Mandela said of the perception of time during his decades in prison, "the days seemed like years, and the years seemed like days."

10.
Hum Genomics ; 18(1): 86, 2024 Aug 07.
Article in English | MEDLINE | ID: mdl-39113147

ABSTRACT

BACKGROUND: The international disclosure of Chinese human genetic data continues to be a contentious issue in China, generating public debates in both traditional and social media channels. Concerns intensified after Chinese scientists' research on pangenome data was published in the prestigious journal Nature. METHODS: This study scrutinized microblogs posted on Weibo, a popular Chinese social media site, in the two months immediately following the publication (June 14, 2023 to August 21, 2023). Content analysis was conducted to assess the nature of public responses, the justifications for positive or negative attitudes, and the users' overall knowledge of how Chinese human genetic information is regulated and managed in China. RESULTS: Weibo users displayed contrasting attitudes towards the article's public disclosure of pangenome research data, with 18% positive, 64% negative, and 18% neutral. Positive attitudes came primarily from verified government and media accounts, which praised the publication. In contrast, negative attitudes originated from individual users who were concerned about national security and health risks and often believed that the researchers had betrayed China. The benefits of data sharing highlighted in the commentaries included advancements in disease research and scientific progress. Approximately 16% of the microblogs indicated that Weibo users had misunderstood existing regulations and laws governing data sharing and stewardship. CONCLUSIONS: Given the predominantly negative public attitudes toward scientific data sharing established by our study, we recommend enhanced outreach by scientists and scientific institutions to increase public understanding of developments in genetic research, international data sharing, and the associated regulations. Additionally, governmental agencies can alleviate public fears and concerns by being more transparent about their security reviews of international collaborative research involving Chinese human genetic data and its cross-border transfer.


Subject(s)
Biomedical Research , Information Dissemination , Public Opinion , Social Media , Humans , China , Human Genome/genetics
11.
Methods ; 226: 9-18, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38604412

ABSTRACT

Biomedical event extraction is an information extraction task that obtains events from biomedical text; its targets include the event type, the trigger, and the arguments involved in an event. Traditional biomedical event extraction usually adopts a pipelined approach comprising trigger identification, argument role recognition, and finally event construction, either with specific rules or by machine learning. In this paper, we propose an n-ary relation extraction method based on the BERT pre-trained model to construct Binding events, in order to capture the semantic information about an event's context and its participants. The experimental results show that our method achieves promising results on the GE11 and GE13 corpora of the BioNLP shared task, with F1 scores of 63.14% and 59.40%, respectively. By significantly improving the performance on Binding events, the overall performance of the pipelined event extraction approach matches or even exceeds that of current joint learning methods.
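One common way to cast event construction as n-ary relation extraction with a BERT encoder is to mark the trigger and candidate arguments with special tokens and classify the pooled representation, as in the sketch below; the marker scheme, checkpoint, and untrained classification head are illustrative assumptions rather than the paper's configuration.

```python
# Sketch of framing Binding-event construction as n-ary relation classification:
# mark the trigger and candidate arguments in the text and classify the [CLS]
# representation. Markers, checkpoint, and head are illustrative assumptions;
# the classifier below is untrained and would be fine-tuned on GE11/GE13-style data.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")   # placeholder checkpoint
encoder = AutoModel.from_pretrained("bert-base-cased")
classifier = torch.nn.Linear(encoder.config.hidden_size, 2)    # Binding event: yes / no

def score_candidate(sentence, trigger, arg1, arg2):
    """Insert plain-text markers around the trigger and two candidate themes.
    In practice the markers would be added to the tokenizer as special tokens."""
    marked = (sentence.replace(trigger, f"[T] {trigger} [/T]")
                      .replace(arg1, f"[A1] {arg1} [/A1]")
                      .replace(arg2, f"[A2] {arg2} [/A2]"))
    inputs = tokenizer(marked, return_tensors="pt", truncation=True)
    cls = encoder(**inputs).last_hidden_state[:, 0]             # [CLS] embedding
    return torch.softmax(classifier(cls), dim=-1)

print(score_candidate("GRB2 binds SOS1 in the cytoplasm.", "binds", "GRB2", "SOS1"))
```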


Subject(s)
Data Mining , Machine Learning , Data Mining/methods , Humans , Semantics , Natural Language Processing , Algorithms
12.
Methods ; 226: 71-77, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38641084

ABSTRACT

Biomedical Named Entity Recognition (BioNER), one of the most basic tasks in biomedical text mining, aims to automatically identify and classify biomedical entities in text. Recently, deep learning-based methods have been applied to BioNER and have shown encouraging results. However, many biological entities are polysemous and ambiguous, which is one of the main obstacles to the task. Deep learning methods also require large amounts of training data, so the lack of data affects recognition performance as well. To address polysemous words and insufficient data in biomedical named entity recognition, we propose a multi-task learning framework fused with a language model, based on the BiLSTM-CRF architecture. Our model uses a language model to design a differential encoding of the context, yielding dynamic word vectors that distinguish words across different datasets. Moreover, we use a multi-task learning method to share the dynamic word vectors across different entity types, improving the recognition performance for each type of entity. Experimental results show that our model reduces the false positives caused by polysemous words through differentiated encoding and improves the performance of each subtask by sharing information between different entity datasets. Compared with other state-of-the-art methods, our model achieved superior results on four typical training sets and achieved the best F1 values.
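A minimal BiLSTM tagging skeleton of the kind such frameworks build on is sketched below; the CRF decoding layer and the shared language-model embeddings described above are omitted for brevity, and the vocabulary size and tag set are assumptions.

```python
# Minimal BiLSTM tagger skeleton for BioNER in BIO format; the CRF decoding layer
# and the shared language-model embeddings described above are omitted for brevity.
# Sizes and the tag set are illustrative assumptions.
import torch
import torch.nn as nn

TAGS = ["O", "B-Gene", "I-Gene", "B-Chemical", "I-Chemical"]

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size=30000, emb_dim=100, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, len(TAGS))   # per-token emission scores

    def forward(self, token_ids):                       # (batch, seq_len)
        out, _ = self.lstm(self.embed(token_ids))
        return self.proj(out)                           # (batch, seq_len, n_tags)

model = BiLSTMTagger()
emissions = model(torch.randint(0, 30000, (2, 16)))
print(emissions.shape)   # torch.Size([2, 16, 5]); a CRF layer would decode these emissions
```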


Subject(s)
Data Mining , Deep Learning , Data Mining/methods , Humans , Natural Language Processing , Neural Networks (Computer) , Language
13.
Methods ; 226: 78-88, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38643910

ABSTRACT

In recent years, there has been a surge in the publication of clinical trial reports, making it challenging to conduct systematic reviews. Automatically extracting Population, Intervention, Comparator, and Outcome (PICO) elements from clinical trial studies can alleviate the traditionally time-consuming process of manually scrutinizing systematic reviews. Existing approaches to PICO frame extraction involve supervised methods that rely on manually annotated data points in the form of BIO label tagging. Recent approaches, such as In-Context Learning (ICL), which has been shown to be effective for a number of downstream NLP tasks, still require labeled examples. In this work, we adopt an ICL strategy that employs the pretrained knowledge of Large Language Models (LLMs), gathered during the pretraining phase, to automatically extract PICO-related terminology from clinical trial documents in an unsupervised setup, bypassing the need for a large number of annotated data instances. Additionally, to showcase the effectiveness of LLMs in an oracle scenario where a large number of annotated samples is available, we adopt an instruction tuning strategy that employs Low-Rank Adaptation (LoRA) to train a gigantic model in a low-resource environment for the PICO frame extraction task. Both proposed frameworks use AlpaCare as the base LLM, employing few-shot in-context learning and instruction tuning, respectively, to extract PICO-related terms from clinical trial reports. We applied these approaches to the widely used coarse-grained datasets EBM-NLP and EBM-COMET and to the fine-grained datasets EBM-NLPrev and EBM-NLPh. Our empirical results show that the proposed ICL-based framework produces comparable results on all versions of the EBM-NLP datasets, and the instruction-tuned version of our framework produces state-of-the-art results on all the different EBM-NLP datasets. Our project is available at https://github.com/shrimonmuke0202/AlpaPICO.git.
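A hedged sketch of instruction tuning with Low-Rank Adaptation via the Hugging Face peft library is shown below; the base checkpoint, target modules, and ranks are placeholders rather than the AlpaPICO training configuration.

```python
# Sketch of instruction tuning with Low-Rank Adaptation (LoRA) via the Hugging Face
# peft library. The base checkpoint, ranks, and target modules are placeholders,
# not the AlpaPICO training configuration.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"                      # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora_cfg = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],               # assumed attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()                      # only the adapter weights train

prompt = ("Extract the Population, Intervention, Comparator and Outcome from:\n"
          "\"120 adults with type 2 diabetes received metformin or placebo; "
          "HbA1c was measured at 12 weeks.\"\nAnswer:")
# After fine-tuning on instruction/response pairs, model.generate(...) would emit
# the PICO elements for unseen trial abstracts.
```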


Subject(s)
Clinical Trials as Topic , Natural Language Processing , Humans , Clinical Trials as Topic/methods , Data Mining/methods , Machine Learning
14.
Methods ; 231: 8-14, 2024 Sep 04.
Article in English | MEDLINE | ID: mdl-39241919

ABSTRACT

Biomedical event causal relation extraction (BECRE), a subtask of biomedical information extraction, aims to extract event causal relation facts from unstructured biomedical texts and plays an essential role in many downstream tasks. Existing works have two main problems: (i) shallow features are of limited help in establishing potential relationships between biomedical events; and (ii) using traditional oversampling to address the data imbalance of BECRE tasks ignores the need for data diversity. This paper proposes a novel biomedical event causal relation extraction method that addresses these problems using deep knowledge fusion and RoBERTa-based data augmentation. To address the first problem, we fuse deep knowledge, including structural event representations and entity relation paths, to establish potential semantic connections between biomedical events. We use a Graph Convolutional Network (GCN) and a predicate tensor model to acquire structural event representations, and entity relation paths are encoded based on external knowledge bases (GTD, CDR, CHR, GDA and UMLS). We introduce a triplet attention mechanism to fuse the structural event representations with the entity relation path information. To address the second problem, this paper proposes a RoBERTa-based data augmentation method: words of the biomedical text other than the biomedical events are masked proportionally and at random, and a pretrained RoBERTa model then generates new data instances for the imbalanced BECRE dataset. Extensive experimental results on Hahn-Powell's and BioCause datasets confirm that the proposed method achieves state-of-the-art performance compared to current advances.
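The masked-language-model augmentation step can be sketched as below, where non-event words are masked at random and a pretrained RoBERTa proposes replacements to create new minority-class instances; the masking rate, checkpoint, and example sentence are assumptions, not the paper's setup.

```python
# Sketch of RoBERTa-based data augmentation: randomly mask a fraction of non-event
# words and let a pretrained masked language model propose replacements, yielding
# new instances for the minority class. Masking rate and model are assumptions.
import random
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="roberta-base")
MASK = fill_mask.tokenizer.mask_token                   # "<mask>" for RoBERTa

def augment(sentence: str, protected: set, mask_rate: float = 0.15) -> str:
    """Mask ordinary words (never the protected event triggers), then fill them in."""
    words = sentence.split()
    for i, w in enumerate(words):
        if w not in protected and random.random() < mask_rate:
            masked = words.copy()
            masked[i] = MASK
            best = fill_mask(" ".join(masked), top_k=1)[0]["token_str"].strip()
            words[i] = best
    return " ".join(words)

print(augment("Overexpression of BRCA1 inhibits tumor growth in mice .",
              protected={"Overexpression", "inhibits"}))
```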

15.
Nano Lett ; 2024 Jun 04.
Article in English | MEDLINE | ID: mdl-38833276

ABSTRACT

Inspired by the imbalance between extrinsic and intrinsic tendon healing, this study fabricated a new biofilter scaffold with a hierarchical structure based on a melt electrowriting technique. The outer multilayered fibrous structure with connected porous characteristics provides a novel passageway for vascularization and isolates the penetration of scar fibers, which can be referred to as a biofilter process. In vitro experiments found that the porous architecture in the outer layer can effectively prevent cell infiltration, whereas the aligned fibers in the inner layer can promote cell recruitment and growth, as well as the expression of tendon-associated proteins in a simulated friction condition. It was shown in vivo that the biofilter process could promote tendon healing and reduce scar invasion. Herein, this novel strategy indicates great potential to design new biomaterials for balancing extrinsic and intrinsic healing and realizing scarless tendon healing.

16.
J Infect Dis ; 229(1): 7-9, 2024 Jan 12.
Article in English | MEDLINE | ID: mdl-37345952

ABSTRACT

Since coronavirus disease 2019 (COVID-19) first emerged more than 3 years ago, more than 1200 articles have been written describing "lessons learned" from the pandemic. While these articles may contain valuable insights, reading them all would be impossible. A machine learning clustering analysis was therefore performed to obtain an overview of these publications and to highlight the benefits of using machine learning to analyze the vast and ever-growing COVID-19 literature.
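A generic version of such a clustering analysis, shown below, vectorizes abstracts with TF-IDF and groups them with k-means before inspecting the top terms per cluster; the toy abstracts and parameters are illustrative and do not reproduce the article's pipeline.

```python
# Generic sketch of clustering "lessons learned" abstracts with TF-IDF features and
# k-means, then inspecting the top terms per cluster. Illustrative workflow only,
# not the analysis pipeline used in the article.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

abstracts = [
    "Supply chains for personal protective equipment failed early in the pandemic.",
    "Remote consultations expanded rapidly and improved access to care.",
    "Genomic surveillance enabled tracking of emerging variants.",
    # ... one entry per "lessons learned" article
]

vectorizer = TfidfVectorizer(stop_words="english", max_features=5000)
X = vectorizer.fit_transform(abstracts)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
terms = vectorizer.get_feature_names_out()
for c in range(kmeans.n_clusters):
    top = kmeans.cluster_centers_[c].argsort()[::-1][:5]
    print(f"cluster {c}:", ", ".join(terms[i] for i in top))
```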


Subject(s)
COVID-19 , Humans , SARS-CoV-2 , Machine Learning
17.
BMC Bioinformatics ; 25(1): 1, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38166530

ABSTRACT

Graph embedding techniques use deep learning algorithms in data analysis to solve problems such as node classification, link prediction, community detection, and visualization. Although typically used in the context of predicting friendships in social media, several applications of graph embedding techniques in biomedical data analysis have emerged. While these approaches remain computationally demanding, several developments over the last years have facilitated their application to biomedical data and thus may help advance biological discoveries. In this review, we therefore discuss the principles of graph embedding techniques and explore their usefulness for understanding biological network data derived from mass spectrometry and sequencing experiments, the current workhorses of systems biology studies. In particular, we focus on recent examples of characterizing protein-protein interaction networks and predicting novel drug functions.
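A DeepWalk/node2vec-style embedding of a small protein-protein interaction graph can be sketched as below, where short random walks serve as 'sentences' for word2vec; the toy edge list and hyperparameters are assumptions for illustration only.

```python
# Sketch of a DeepWalk/node2vec-style embedding of a protein-protein interaction
# graph: short random walks become "sentences" for word2vec. The toy edge list and
# hyperparameters are assumptions for illustration.
import random
import networkx as nx
from gensim.models import Word2Vec

edges = [("TP53", "MDM2"), ("MDM2", "MDM4"), ("TP53", "EP300"),
         ("EP300", "CREBBP"), ("TP53", "ATM")]
G = nx.Graph(edges)

def random_walks(graph, num_walks=20, walk_len=10):
    walks = []
    for _ in range(num_walks):
        for node in graph.nodes:
            walk = [node]
            for _ in range(walk_len - 1):
                neighbors = list(graph.neighbors(walk[-1]))
                walk.append(random.choice(neighbors))
            walks.append(walk)
    return walks

model = Word2Vec(random_walks(G), vector_size=32, window=3, min_count=0, sg=1, epochs=5)
print(model.wv.most_similar("TP53", topn=3))   # proteins embedded near TP53
```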


Subject(s)
Algorithms , Social Media , Humans , Mass Spectrometry , Data Analysis , Protein Interaction Maps
18.
BMC Bioinformatics ; 25(1): 112, 2024 Mar 14.
Article in English | MEDLINE | ID: mdl-38486137

ABSTRACT

BACKGROUND: The constant growth of biomedical data is accompanied by the need for new methodologies to effectively and efficiently extract machine-readable knowledge for training and testing purposes. A crucial aspect in this regard is creating large, often manually or semi-manually annotated corpora, which are vital for developing effective and efficient methods for tasks like relation extraction, topic recognition, and entity linking. However, manual annotation is expensive and time-consuming, especially if not assisted by interactive, intuitive, and collaborative computer-aided tools. To support healthcare experts in the annotation process and foster the creation of annotated corpora, we present MetaTron. MetaTron is an open-source and free-to-use web-based annotation tool for annotating biomedical data interactively and collaboratively; it supports both mention-level and document-level annotations and also integrates automatic built-in predictions. Moreover, MetaTron enables relation annotation with the support of ontologies, a functionality often overlooked by off-the-shelf annotation tools. RESULTS: We conducted a qualitative analysis to compare MetaTron with a set of manual annotation tools including TeamTat, INCEpTION, LightTag, MedTAG, and brat, on three sets of criteria: technical, data, and functional. A quantitative evaluation allowed us to assess MetaTron's performance in terms of time and number of clicks needed to annotate a set of documents. The results indicated that MetaTron fulfills almost all the selected criteria and achieves the best performance. CONCLUSIONS: MetaTron stands out as one of the few annotation tools targeting the biomedical domain that support the annotation of relations, and it is fully customizable, handling documents in several formats (PDF included) as well as abstracts retrieved from PubMed, Semantic Scholar, and OpenAIRE. To meet any user need, we release MetaTron both as an online instance and as a Docker image that can be deployed locally.


Subject(s)
Power (Psychological) , Semantics , PubMed
19.
BMC Bioinformatics ; 25(1): 152, 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38627652

ABSTRACT

BACKGROUND: Text summarization is a challenging problem in Natural Language Processing that involves condensing the content of textual documents without losing their overall meaning and information content. In the domain of biomedical research, summaries are critical for efficient data analysis and information retrieval. While several biomedical text summarizers exist in the literature, they often miss an essential aspect of text: its semantics. RESULTS: This paper proposes a novel extractive summarizer that preserves text semantics by utilizing bio-semantic models. We evaluate our approach using ROUGE on a standard dataset and compare it with three state-of-the-art summarizers. Our results show that our approach outperforms existing summarizers. CONCLUSION: The use of semantics can improve summarizer performance and lead to better summaries. Our summarizer has the potential to aid efficient data analysis and information retrieval in the field of biomedical research.
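For readers unfamiliar with the evaluation metric, the sketch below scores a candidate summary against a reference with ROUGE via the rouge-score package; the texts are toy placeholders, not material from the evaluated dataset.

```python
# Sketch of scoring an extractive summary against a reference abstract with ROUGE,
# the metric used above, via the rouge-score package. Texts are toy placeholders.
from rouge_score import rouge_scorer

reference = ("The study shows that gene X regulates inflammation and is a "
             "potential drug target in rheumatoid arthritis.")
candidate = ("Gene X regulates inflammation and may serve as a drug target "
             "in rheumatoid arthritis.")

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
for name, score in scorer.score(reference, candidate).items():
    print(f"{name}: precision={score.precision:.2f} recall={score.recall:.2f} "
          f"f1={score.fmeasure:.2f}")
```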


Subject(s)
Algorithms , Biomedical Research , Semantics , Information Storage and Retrieval , Natural Language Processing
20.
BMC Bioinformatics ; 25(1): 281, 2024 Aug 27.
Article in English | MEDLINE | ID: mdl-39192204

ABSTRACT

BACKGROUND: Mining the vast pool of biomedical literature to extract accurate responses and relevant references is challenging due to the domain's interdisciplinary nature, specialized jargon, and continuous evolution. Early natural language processing (NLP) approaches often led to incorrect answers because they failed to comprehend the nuances of natural language. However, transformer models have significantly advanced the field by enabling the creation of large language models (LLMs), enhancing question-answering (QA) tasks. Despite these advances, current LLM-based solutions for specialized domains like biology and biomedicine still struggle to generate up-to-date responses while avoiding "hallucination", i.e., generating plausible but factually incorrect responses. RESULTS: Our work focuses on enhancing prompts using a retrieval-augmented architecture to guide LLMs in generating meaningful responses for biomedical QA tasks. We evaluated two approaches: one relying on text embedding and vector similarity in a high-dimensional space, and our proposed method, which uses explicit signals in user queries to extract meaningful contexts. For robust evaluation, we tested these methods on 50 specific and challenging questions from diverse biomedical topics, comparing their performance against a baseline model, BM25. Retrieval performance of our method was significantly better than the others, achieving a median Precision@10 of 0.95, i.e., the fraction of the top 10 retrieved chunks that are relevant. We used GPT-4, OpenAI's most advanced LLM, to maximize answer quality and manually assessed the LLM-generated responses. Our method achieved a median answer quality score of 2.5, surpassing both the baseline model and the text embedding-based approach. We developed a QA bot, WeiseEule ( https://github.com/wasimaftab/WeiseEule-LocalHost ), which utilizes these methods for comparative analysis and also offers advanced features for review writing and for identifying relevant articles for citation. CONCLUSIONS: Our findings highlight the importance of prompt enhancement methods that utilize explicit signals in user queries, over traditional text embedding-based approaches, for improving LLM-generated responses to specialized queries in domains such as biology and biomedicine. By providing users with complete control over the information fed into the LLM, our approach addresses some of the major drawbacks of existing web-based chatbots and LLM-based QA systems, including hallucinations and the generation of irrelevant or outdated responses.
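The reported Precision@10 metric can be computed as in the short sketch below, where relevance judgments for the retrieved chunks stand in for the manual assessments described above; the identifiers are placeholders.

```python
# Sketch of the Precision@10 retrieval metric reported above: the fraction of the
# top 10 retrieved chunks judged relevant to the question. Relevance labels here
# are illustrative placeholders for manual judgments.
def precision_at_k(retrieved_ids, relevant_ids, k=10):
    top_k = retrieved_ids[:k]
    return sum(1 for doc in top_k if doc in relevant_ids) / k

retrieved = ["c12", "c03", "c44", "c07", "c19", "c25", "c31", "c02", "c58", "c10"]
relevant = {"c12", "c03", "c07", "c19", "c25", "c31", "c02", "c58", "c10", "c90"}
print(precision_at_k(retrieved, relevant))   # 0.9 for this toy example
```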


Subject(s)
Data Mining , Natural Language Processing , Data Mining/methods , Information Storage and Retrieval/methods