ABSTRACT
We envision "AI scientists" as systems capable of skeptical learning and reasoning that empower biomedical research through collaborative agents that integrate AI models and biomedical tools with experimental platforms. Rather than taking humans out of the discovery process, biomedical AI agents combine human creativity and expertise with AI's ability to analyze large datasets, navigate hypothesis spaces, and execute repetitive tasks. AI agents are poised to be proficient in various tasks, planning discovery workflows and performing self-assessment to identify and mitigate gaps in their knowledge. These agents combine large language models and generative models with structured memory for continual learning, and use machine learning tools to incorporate scientific knowledge, biological principles, and theories. AI agents can impact areas ranging from virtual cell simulation and programmable control of phenotypes to the design of cellular circuits and the development of new therapies.
Subjects
Artificial Intelligence; Biomedical Research; Humans; Machine Learning
ABSTRACT
The biomechanical properties of cells and tissues play an important role in our fundamental understanding of the structures and functions of biological systems at both the cellular and subcellular levels. Recently, Brillouin microscopy, which offers a label-free spectroscopic means of assessing viscoelastic properties in vivo, has emerged as a powerful way to interrogate those properties on a microscopic level in living tissues. However, susceptibility to photodamage and photobleaching, particularly when high-intensity laser beams are used to induce Brillouin scattering, poses a significant challenge. This article introduces a transformative approach designed to mitigate photodamage in biological and biomedical studies, enabling nondestructive, label-free assessments of mechanical properties in live biological samples. By leveraging quantum-light-enhanced stimulated Brillouin scattering (SBS) imaging contrast, the signal-to-noise ratio is significantly elevated, thereby increasing sample viability and extending interrogation times without compromising the integrity of living samples. The tangible impact of this methodology is evidenced by a notable three-fold increase in sample viability observed after subjecting the samples to three hours of continuous squeezed-light illumination, surpassing traditional coherent-light-based approaches. Quantum-enhanced SBS imaging holds promise across diverse fields, such as cancer biology and neuroscience, where preserving sample vitality is of paramount significance. By mitigating concerns regarding photodamage and photobleaching associated with high-intensity lasers, this technological breakthrough expands our horizons for exploring the mechanical properties of live biological systems, paving the way for a new era of research and clinical applications.
Subjects
Light; Animals; Humans; Biomechanical Phenomena; Microscopy/methods; Mice
ABSTRACT
Efficient storage and sharing of massive biomedical data would open up their wide accessibility to different institutions and disciplines. However, compressors tailored to natural photos/videos quickly reach their limits on biomedical data, while emerging deep learning-based methods demand huge training datasets and are difficult to generalize. Here, we propose to conduct Biomedical data compRession with Implicit nEural Function (BRIEF) by representing the target data with compact neural networks, which are data specific and thus have no generalization issues. Benefiting from the strong representation capability of implicit neural functions, BRIEF achieves 2-3 orders of magnitude compression on diverse biomedical data at significantly higher fidelity than existing techniques. BRIEF also delivers consistent performance across the whole data volume and supports customized, spatially varying fidelity. These advantages make BRIEF well suited to reliable downstream tasks at low bandwidth. Our approach will facilitate low-bandwidth data sharing and promote collaboration and progress in the biomedical field.
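To make the idea concrete, here is a minimal sketch (not the authors' implementation) of implicit-neural-function compression: a small coordinate-to-intensity MLP is overfit to a single volume, and the trained weights serve as the compressed representation. The network size, training budget, and learning rate below are illustrative assumptions.

```python
# Hypothetical sketch of implicit-neural-function compression in the spirit of
# BRIEF: overfit a small coordinate->intensity MLP to one data volume; the
# trained weights are the compressed payload. All sizes are illustrative.
import torch
import torch.nn as nn

class ImplicitField(nn.Module):
    def __init__(self, hidden=64, layers=4):
        super().__init__()
        blocks, dim = [], 3  # (x, y, z) coordinate input
        for _ in range(layers):
            blocks += [nn.Linear(dim, hidden), nn.ReLU()]
            dim = hidden
        blocks.append(nn.Linear(dim, 1))  # predicted intensity
        self.net = nn.Sequential(*blocks)

    def forward(self, coords):
        return self.net(coords)

def compress(volume, steps=2000, lr=1e-3):
    """Fit the network to (coordinate, intensity) pairs of one volume."""
    d, h, w = volume.shape
    zz, yy, xx = torch.meshgrid(
        torch.linspace(-1, 1, d), torch.linspace(-1, 1, h),
        torch.linspace(-1, 1, w), indexing="ij")
    coords = torch.stack([xx, yy, zz], dim=-1).reshape(-1, 3)
    target = volume.reshape(-1, 1).float()
    model = ImplicitField()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(coords), target)
        loss.backward()
        opt.step()
    return model

vol = torch.rand(16, 16, 16)    # toy stand-in for a biomedical volume
net = compress(vol, steps=200)  # net.state_dict() is the compressed file
```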
Subjects
Information Dissemination; Neural Networks, Computer; Humans; Information Dissemination/methods; Data Compression/methods; Deep Learning; Biomedical Research/methods
ABSTRACT
The Human Cell Atlas (HCA) is striving to build an open community that is inclusive of all researchers adhering to its principles and as open as possible with respect to data access and use. However, open data sharing can pose certain challenges. For instance, being a global initiative, the HCA must contend with a patchwork of local and regional privacy rules. A notable example is the implementation of the European Union General Data Protection Regulation (GDPR), which caused some concern in the biomedical and genomic data-sharing community. We examine how the HCA's large, international group of researchers is investing tremendous effort in ensuring appropriate sharing of data. We describe the HCA's objectives and governance, how it defines open data sharing, and the ethico-legal challenges encountered early in its development; in particular, we describe the challenges prompted by the GDPR. Finally, we broaden the discussion to address tools and strategies that can be used to address ethical data governance.
Subjects
Amines; Ascomycota; Humans; Drive (Psychology); European Union; Computer Security
ABSTRACT
Batch effects introduce significant variability into high-dimensional data, complicating accurate analysis and leading to potentially misleading conclusions if not adequately addressed. Despite technological and algorithmic advancements in biomedical research, effectively managing batch effects remains a complex challenge requiring comprehensive consideration. This paper underscores the necessity of a flexible and holistic approach to selecting batch effect correction algorithms (BECAs), advocating for proper BECA evaluation and consideration of artificial intelligence-based strategies. We also discuss key challenges in batch effect correction, including the importance of uncovering hidden batch factors and understanding the impact of design imbalance, missing values, and aggressive correction. Our aim is to provide researchers with a robust framework for effective batch effect management, enhancing the reliability of high-dimensional data analyses.
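To illustrate what a BECA does at its simplest, here is a hedged sketch of location-only batch correction, a heavily simplified relative of methods such as ComBat; real algorithms also model scale, covariates, and hidden batch factors. The mean-centering scheme below is an illustrative assumption, not a recommended BECA.

```python
# A minimal, hypothetical sketch of location-only batch correction: center
# each feature within each batch, then restore the global mean so features
# keep their original scale.
import numpy as np

def center_by_batch(X, batches):
    """X: samples x features matrix; batches: per-sample batch labels."""
    X = np.asarray(X, dtype=float)
    corrected = X.copy()
    grand_mean = X.mean(axis=0)
    for b in np.unique(batches):
        idx = np.asarray(batches) == b
        corrected[idx] -= X[idx].mean(axis=0)  # remove batch-specific shift
    return corrected + grand_mean

# Example: two batches with an artificial offset on batch "B"
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
X[3:] += 5.0
print(center_by_batch(X, ["A"] * 3 + ["B"] * 3).mean(axis=0))
```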
Subjects
Algorithms; Humans; Artificial Intelligence; Biomedical Research; Computational Biology/methods; Reproducibility of Results
ABSTRACT
Biomedical data are generated and collected from various sources, including medical imaging, laboratory tests and genome sequencing. Sharing these data for research can help address unmet health needs, contribute to scientific breakthroughs, accelerate the development of more effective treatments and inform public health policy. Due to the potential sensitivity of such data, however, privacy concerns have led to policies that restrict data sharing. In addition, sharing sensitive data requires a secure and robust infrastructure with appropriate storage solutions. Here, we examine and compare the centralized and federated data sharing models through the prism of five large-scale and real-world use cases of strategic significance within the European data sharing landscape: the French Health Data Hub, the BBMRI-ERIC Colorectal Cancer Cohort, the federated European Genome-phenome Archive, the Observational Medical Outcomes Partnership/OHDSI network and the EBRAINS Medical Informatics Platform. Our analysis indicates that centralized models facilitate data linkage, harmonization and interoperability, while federated models facilitate scaling up and legal compliance, as the data typically reside on the data generator's premises, allowing for better control of how data are shared. This comparative study thus offers guidance on the selection of the most appropriate sharing strategy for sensitive datasets and provides key insights for informed decision-making in data sharing efforts.
Subjects
Biological Science Disciplines; Information Dissemination; Humans; Medical Informatics/methods
ABSTRACT
This manuscript describes the development of a resources module that is part of a learning platform named 'NIGMS Sandbox for Cloud-based Learning' https://github.com/NIGMS/NIGMS-Sandbox. The overall genesis of the Sandbox is described in the editorial NIGMS Sandbox at the beginning of this Supplement. This module delivers learning materials on implementing deep learning algorithms for biomedical image data in an interactive format that uses appropriate cloud resources for data access and analyses. Biomedical datasets are widely used in both research and clinical settings, but interpreting them becomes more difficult for professionally trained clinicians and researchers as their size and breadth increase. Artificial intelligence, and specifically deep learning neural networks, have recently become important tools in biomedical research. However, their use is limited by their computational requirements and by confusion regarding the different neural network architectures. The goal of this learning module is to introduce the types of deep learning neural networks and cover practices commonly used in biomedical research. The module is subdivided into four submodules that cover classification, augmentation, segmentation, and regression. Each complementary submodule was written on the Google Cloud Platform and contains detailed code and explanations, as well as quizzes and challenges to facilitate user training. Overall, the goal of this learning module is to enable users to identify and integrate the correct type of neural network with their data while highlighting the ease of use of cloud computing for implementing neural networks.
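As a flavor of the augmentation submodule's topic (not the module's actual notebook code), a small hypothetical torchvision pipeline of label-preserving transforms for biomedical images might look like this; the specific transforms and parameters are illustrative assumptions.

```python
# A small, hypothetical augmentation pipeline for biomedical images.
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),  # many biomedical images have no
    transforms.RandomRotation(degrees=15),   # canonical orientation
    transforms.ColorJitter(brightness=0.2, contrast=0.2),  # stain/exposure variation
    transforms.ToTensor(),
])
# augmented = augment(pil_image)  # applied per PIL image during training
```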
Subjects
Deep Learning; Neural Networks, Computer; Humans; Biomedical Research; Algorithms; Cloud Computing
ABSTRACT
Increasing volumes of biomedical data are amassing in databases. Large-scale analyses of these data have wide-ranging applications in biology and medicine. Such analyses require tools to characterize and process entries at scale. However, existing tools, mainly centered on extracting predefined fields, often fail to comprehensively process database entries or correct evident errors, a task humans can easily perform. These tools also lack the ability to reason like domain experts, hindering their robustness and analytical depth. Recent advances with large language models (LLMs) provide a fundamentally new way to query databases. But while a tool such as ChatGPT is adept at answering questions about manually input records, challenges arise when scaling up this process. First, interactions with the LLM need to be automated. Second, limitations on input length may require a record pruning or summarization pre-processing step. Third, to behave reliably as desired, the LLM needs either well-designed, short, 'few-shot' examples, or fine-tuning based on a larger set of well-curated examples. Here, we report ChIP-GPT, based on fine-tuning of the generative pre-trained transformer (GPT) model Llama and on a program that prompts the model iteratively and handles its generation of answer text. This model is designed to extract metadata from the Sequence Read Archive, emphasizing the identification of chromatin immunoprecipitation (ChIP) targets and cell lines. When trained with 100 examples, ChIP-GPT demonstrates 90-94% accuracy. Notably, it can seamlessly extract data from records with typos or absent field labels. Our proposed method is easily adaptable to customized questions and different databases.
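A minimal sketch of the kind of record-by-record prompting loop described above, assuming a Hugging Face text-generation pipeline and a Llama checkpoint; the model name, prompt wording, and truncation length are illustrative assumptions rather than ChIP-GPT's actual configuration.

```python
# Hypothetical automation of per-record LLM querying: truncate each SRA
# record, wrap it in an instruction prompt, and keep only the continuation.
from transformers import pipeline

generator = pipeline("text-generation", model="meta-llama/Llama-2-7b-hf")

PROMPT = ("Record:\n{record}\n\n"
          "Question: What is the ChIP target and what is the cell line?\n"
          "Answer:")

def extract_metadata(records, max_chars=3000):
    answers = []
    for rec in records:
        prompt = PROMPT.format(record=rec[:max_chars])  # crude length pruning
        out = generator(prompt, max_new_tokens=64, do_sample=False)
        answers.append(out[0]["generated_text"][len(prompt):].strip())
    return answers
```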
Subjects
Medicine; Humans; Cell Line; Chromatin Immunoprecipitation; Databases, Factual; Language
ABSTRACT
Researchers increasingly turn to explainable artificial intelligence (XAI) to analyze omics data and gain insights into the underlying biological processes. Yet, given the interdisciplinary nature of the field, many findings have only been shared in their respective research community. An overview of XAI for omics data is needed to highlight promising approaches and help detect common issues. Toward this end, we conducted a systematic mapping study. To identify relevant literature, we queried Scopus, PubMed, Web of Science, BioRxiv, MedRxiv and arXiv. Based on keywording, we developed a coding scheme with 10 facets regarding the studies' AI methods, explainability methods and omics data. Our mapping study resulted in 405 included papers published between 2010 and 2023. The inspected papers analyze DNA-based (mostly genomic), transcriptomic, proteomic or metabolomic data by means of neural networks, tree-based methods, statistical methods and further AI methods. The preferred post-hoc explainability methods are feature relevance (n = 166) and visual explanation (n = 52), while papers using interpretable approaches often resort to the use of transparent models (n = 83) or architecture modifications (n = 72). With many research gaps still apparent for XAI for omics data, we deduced eight research directions and discuss their potential for the field. We also provide exemplary research questions for each direction. Many problems with the adoption of XAI for omics data in clinical practice are yet to be resolved. This systematic mapping study outlines extant research on the topic and provides research directions for researchers and practitioners.
Subjects
Artificial Intelligence; Proteomics; Gene Expression Profiling; Genomics; Neural Networks, Computer
ABSTRACT
Adverse drug-drug interactions (DDIs) have become an increasingly serious problem in the medical and health system. Recently, the effective application of deep learning and biomedical knowledge graphs (KGs) has improved the DDI prediction performance of computational models. However, the problems of feature redundancy and KG noise also arise, bringing new challenges for researchers. To overcome these challenges, we propose a Multi-Channel Feature Fusion model for multi-typed DDI prediction (MCFF-MTDDI). Specifically, we first extract drug chemical structure features, drug pairs' extra label features, and KG features of drugs. These different features are then fused by a multi-channel feature fusion module, and multi-typed DDIs are finally predicted through a fully connected neural network. To our knowledge, we are the first to integrate extra label information into KG-based multi-typed DDI prediction. In addition, we propose a novel KG feature learning method and a State Encoder to obtain KG-based features of target drug pairs that contain more abundant and more relevant drug-related KG information with less noise, and we introduce a Gated Recurrent Unit-based multi-channel feature fusion module that yields more comprehensive feature information about drug pairs, effectively alleviating the problem of feature redundancy. We experimented with four datasets in multi-class and multi-label prediction tasks to comprehensively evaluate the performance of MCFF-MTDDI in predicting interactions of known-known drugs, known-new drugs, and new-new drugs. In addition, we conducted ablation studies and case studies. All the results fully demonstrate the effectiveness of MCFF-MTDDI.
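To illustrate the fusion idea (a sketch under stated assumptions, not the MCFF-MTDDI implementation), the three feature channels of a drug pair can be stacked as a short sequence and summarized by a GRU before classification; all dimensions and the number of DDI types below are illustrative.

```python
# A minimal, hypothetical GRU-based multi-channel fusion module: each channel
# (chemical structure, extra label, KG) becomes one step of a length-3
# sequence whose final hidden state is classified into DDI types.
import torch
import torch.nn as nn

class MultiChannelFusion(nn.Module):
    def __init__(self, dim=128, n_ddi_types=65):
        super().__init__()
        self.gru = nn.GRU(input_size=dim, hidden_size=dim, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, n_ddi_types))

    def forward(self, chem, label_feat, kg_feat):
        channels = torch.stack([chem, label_feat, kg_feat], dim=1)  # (B, 3, dim)
        _, h = self.gru(channels)          # h: (1, B, dim) fused summary
        return self.classifier(h.squeeze(0))

pair = [torch.randn(4, 128) for _ in range(3)]
logits = MultiChannelFusion()(*pair)       # (4, n_ddi_types) multi-typed scores
```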
Subjects
Drug Delivery Systems; Neural Networks, Computer; Humans; Drug Interactions; Research Personnel
ABSTRACT
The ability to detect disease early and deliver precision therapy would be transformative for the treatment of human illnesses. To achieve these goals, biosensors that can pinpoint when and where diseases emerge are needed. Rapid advances in synthetic biology are enabling us to exploit the information-processing abilities of living cells to diagnose disease and then treat it in a controlled fashion. For example, living sensors could be designed to precisely sense disease biomarkers, such as by-products of inflammation, and to respond by delivering targeted therapeutics in situ. Here, we provide an overview of ongoing efforts in microbial biosensor design, highlight translational opportunities, and discuss challenges for enabling sense-and-respond precision medicines.
Subjects
Bacteria/metabolism; Biomedical Technology; Biosensing Techniques/methods; Synthetic Biology/methods; Bacteria/genetics; Biotechnology/organization & administration; Humans; Inflammation/diagnosis; Protein Processing, Post-Translational
ABSTRACT
Between early April 2020 and late August 2020, nearly 100,000 patients hospitalized with SARS-CoV-2 infections were treated with COVID-19 convalescent plasma (CCP) in the US under the auspices of an FDA-authorized Expanded Access Program (EAP) housed at the Mayo Clinic. Clinicians wishing to provide CCP to their patients during that 5-month period early in the COVID pandemic had to register their patients and provide clinical information to the EAP program. This program was utilized by some 2,200 US hospitals located in every state, ranging from academic medical centers to small rural hospitals, and facilitated the treatment of an ethnically and socio-economically diverse cross section of patients. Within 6 weeks of program initiation, the first signals of safety were found in 5,000 recipients of CCP, supported by a later analysis of 20,000 recipients (Joyner et al. in J Clin Invest 130:4791-4797, 2020a; Joyner et al. in Mayo Clin Proc 95:1888-1897, 2020b). By mid-summer of 2020, strong evidence was produced showing that high-titer CCP given early in the course of hospitalization could lower mortality by as much as a third (Joyner et al. in N Engl J Med 384:1015-1027, 2021; Senefeld et al. in PLoS Med 18, 2021a). These data were used by the FDA in its August decision to grant Emergency Use Authorization for CCP use in hospitals. This chapter provides a personal narrative by the principal investigator of the EAP that describes the events leading up to the program, some of its key outcomes, and some lessons learned that may be applicable to the next pandemic. This vast effort was a complete team response to a crisis and included an exceptional level of collaboration both inside and outside of the Mayo Clinic. Writing just 4 years after the initiation of the EAP, this intense professional effort, comprising many moving parts, remains hard to completely understand or fully explain in this brief narrative. As Nelson Mandela said of the perception of time during his decades in prison, "the days seemed like years, and the years seemed like days."
ABSTRACT
BACKGROUND: The international disclosure of Chinese human genetic data continues to be a contentious issue in China, generating public debate in both traditional and social media channels. Concerns intensified after Chinese scientists' research on pangenome data was published in the prestigious journal Nature. METHODS: This study scrutinized microblogs posted on Weibo, a popular Chinese social media site, in the two months immediately following the publication (June 14, 2023-August 21, 2023). Content analysis was conducted to assess the nature of public responses, the justifications for positive or negative attitudes, and the users' overall knowledge of how Chinese human genetic information is regulated and managed in China. RESULTS: Weibo users displayed contrasting attitudes towards the article's public disclosure of pangenome research data, with 18% positive, 64% negative, and 18% neutral. Positive attitudes came primarily from verified government and media accounts, which praised the publication. In contrast, negative attitudes originated from individual users who were concerned about national security and health risks and often believed that the researchers had betrayed China. The benefits of data sharing highlighted in the commentaries included advancements in disease research and scientific progress. Approximately 16% of the microblogs indicated that Weibo users had misunderstood existing regulations and laws governing data sharing and stewardship. CONCLUSIONS: Based on the predominantly negative public attitudes toward scientific data sharing established by our study, we recommend enhanced outreach by scientists and scientific institutions to increase public understanding of developments in genetic research, international data sharing, and associated regulations. Additionally, governmental agencies can alleviate public fears and concerns by being more transparent about their security reviews of international collaborative research involving Chinese human genetic data and its cross-border transfer.
Subjects
Biomedical Research; Information Dissemination; Public Opinion; Social Media; Humans; China; Genome, Human/genetics
ABSTRACT
Biomedical event extraction is an information extraction task that obtains events from biomedical text; its targets include the type, the trigger, and the respective arguments involved in an event. Traditional biomedical event extraction usually adopts a pipelined approach comprising trigger identification, argument role recognition, and finally event construction, using either specific rules or machine learning. In this paper, we propose an n-ary relation extraction method based on the BERT pre-training model to construct Binding events, in order to capture the semantic information about an event's context and its participants. The experimental results show that our method achieves promising results on the GE11 and GE13 corpora of the BioNLP shared task, with F1 scores of 63.14% and 59.40%, respectively. This demonstrates that, by significantly improving the performance on Binding events, the overall performance of the pipelined event extraction approach matches or even exceeds that of current joint learning methods.
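A rough sketch of how Binding-event construction can be cast as n-ary relation classification with BERT, assuming simple textual entity markers and an untrained classification head that would be fine-tuned on GE11-style data; the markers and model choice are illustrative assumptions, not the paper's exact setup.

```python
# Hypothetical n-ary relation classification with BERT: mark the candidate
# trigger and arguments in the sentence, then classify the marked context as
# a valid Binding event or not.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=2)  # 1 = valid Binding event, 0 = not

def score_candidate(sentence, trigger, args):
    marked = sentence.replace(trigger, f"@ {trigger} @")   # mark the trigger
    for a in args:
        marked = marked.replace(a, f"$ {a} $")             # mark each argument
    enc = tok(marked, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**enc).logits   # head is untrained here; fine-tune first
    return torch.softmax(logits, dim=-1)[0, 1].item()

print(score_candidate(
    "IkappaB-alpha binds RelA and p50.", "binds", ["RelA", "p50"]))
```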
Subjects
Data Mining; Machine Learning; Data Mining/methods; Humans; Semantics; Natural Language Processing; Algorithms
ABSTRACT
Biomedical Named Entity Recognition (BioNER) is one of the most basic tasks in biomedical text mining, which aims to automatically identify and classify biomedical entities in text. Recently, deep learning-based methods have been applied to BioNER and have shown encouraging results. However, many biological entities are polysemous and ambiguous, which is one of the main obstacles to the task. Deep learning methods also require large amounts of training data, so the lack of data affects recognition performance as well. To address polysemy and insufficient data in BioNER, we propose a multi-task learning framework fused with a language model, based on the BiLSTM-CRF architecture. Our model uses a language model to produce differential encodings of the context, yielding dynamic word vectors that distinguish the same word across different datasets. Moreover, we use multi-task learning to share these dynamic word vectors across different entity types, improving the recognition performance for each type. Experimental results show that our model reduces the false positives caused by polysemous words through differentiated encoding and improves the performance of each subtask by sharing information between different entity datasets. Compared with other state-of-the-art methods, our model achieved superior results on four typical benchmark datasets, with the best F1 scores.
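A minimal sketch of the multi-task idea, assuming a shared BiLSTM encoder over language-model word vectors with one tag head per entity dataset; a real system would add a CRF layer on each head, and all sizes below are illustrative assumptions.

```python
# Hypothetical shared-encoder multi-task tagger: one BiLSTM is trained on all
# entity datasets, while each dataset keeps its own per-token BIO tag head.
import torch
import torch.nn as nn

class SharedBiLSTMTagger(nn.Module):
    def __init__(self, emb_dim=100, hidden=128, tagsets=None):
        super().__init__()
        tagsets = tagsets or {"gene": 3, "chemical": 3}  # BIO tags per dataset
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True,
                               bidirectional=True)       # shared across tasks
        self.heads = nn.ModuleDict(
            {task: nn.Linear(2 * hidden, n) for task, n in tagsets.items()})

    def forward(self, word_vecs, task):
        h, _ = self.encoder(word_vecs)  # (batch, seq, 2*hidden)
        return self.heads[task](h)      # per-token BIO logits for this dataset

x = torch.randn(2, 10, 100)               # dynamic word vectors from an LM
logits = SharedBiLSTMTagger()(x, "gene")  # (2, 10, 3)
```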
Subjects
Data Mining; Deep Learning; Data Mining/methods; Humans; Natural Language Processing; Neural Networks, Computer; Language
ABSTRACT
In recent years, there has been a surge in the publication of clinical trial reports, making it challenging to conduct systematic reviews. Automatically extracting Population, Intervention, Comparator, and Outcome (PICO) information from clinical trial studies can alleviate the traditionally time-consuming process of manually scrutinizing systematic reviews. Existing approaches to PICO frame extraction involve supervised learning that relies on manually annotated data points in the form of BIO label tagging. Recent approaches, such as In-Context Learning (ICL), which has been shown to be effective for a number of downstream NLP tasks, still require labeled examples. In this work, we adopt the ICL strategy, employing the pretrained knowledge gathered during the pretraining phase of a Large Language Model (LLM) to automatically extract PICO-related terminology from clinical trial documents in an unsupervised setup, bypassing the need for large numbers of annotated data instances. Additionally, to showcase the effectiveness of an LLM in the oracle scenario where a large number of annotated samples is available, we adopt an instruction-tuning strategy, employing Low-Rank Adaptation (LoRA) to train a very large model in a low-resource environment for the PICO frame extraction task. More specifically, both of the proposed frameworks use AlpaCare as the base LLM, employing few-shot in-context learning and instruction tuning, respectively, to extract PICO-related terms from clinical trial reports. We applied these approaches to the widely used coarse-grained datasets EBM-NLP and EBM-COMET and the fine-grained datasets EBM-NLPrev and EBM-NLPh. Our empirical results show that the proposed ICL-based framework produces comparable results on all versions of the EBM-NLP datasets, and the instruction-tuned version of our framework produces state-of-the-art results on all the different EBM-NLP datasets. Our project is available at https://github.com/shrimonmuke0202/AlpaPICO.git.
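To make the ICL setup concrete, here is a hypothetical sketch of prompt construction: a few labeled demonstrations are prepended to each abstract and the LLM completes the PICO fields. The demonstration text and prompt wording are illustrative assumptions, not the AlpaPICO prompts.

```python
# Hypothetical few-shot prompt builder for PICO extraction; the continuation
# produced by the base LLM (AlpaCare in the paper) would supply the fields.
FEW_SHOT = """Abstract: 120 adults with type 2 diabetes received drug X or placebo; HbA1c was measured at 12 weeks.
Population: adults with type 2 diabetes
Intervention: drug X
Comparator: placebo
Outcome: HbA1c at 12 weeks
"""

def build_prompt(abstract: str) -> str:
    # the trailing "Population:" cue asks the model to complete the frame
    return FEW_SHOT + f"\nAbstract: {abstract}\nPopulation:"

print(build_prompt("Sixty children with asthma were randomized to ..."))
```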
Subjects
Clinical Trials as Topic; Natural Language Processing; Humans; Clinical Trials as Topic/methods; Data Mining/methods; Machine Learning
ABSTRACT
Biomedical event causal relation extraction (BECRE), a subtask of biomedical information extraction, aims to extract event causal relation facts from unstructured biomedical texts and plays an essential role in many downstream tasks. Existing works have two main problems: (i) shallow features are of limited help in establishing potential relationships between biomedical events; (ii) the traditional oversampling methods used to address the data imbalance of BECRE tasks ignore the need for data diversity. This paper proposes a novel biomedical event causal relation extraction method that addresses these problems using deep knowledge fusion and RoBERTa-based data augmentation. To address the first problem, we fuse deep knowledge, including structural event representations and entity relation paths, to establish potential semantic connections between biomedical events. We use a Graph Convolutional Network (GCN) and a predicate tensor model to acquire structural event representations, and entity relation paths are encoded based on external knowledge bases (GTD, CDR, CHR, GDA and UMLS). We introduce a triplet attention mechanism to fuse the structural event representations with the entity relation path information. To address the second problem, we propose a RoBERTa-based data augmentation method: words in the biomedical text, other than the biomedical events themselves, are masked proportionally and at random, and a pretrained RoBERTa model then generates new data instances for the imbalanced BECRE dataset. Extensive experimental results on Hahn-Powell's and BioCause datasets confirm that the proposed method achieves state-of-the-art performance compared to current advances.
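A hedged sketch of the masking-based augmentation step, assuming a fill-mask pipeline with roberta-base: a proportion of non-event words is masked and replaced by the masked language model's top predictions to create new minority-class instances. The mask rate and model choice are illustrative assumptions.

```python
# Hypothetical RoBERTa-based augmentation: mask non-event words at random and
# substitute the masked LM's top prediction, one position at a time.
import random
from transformers import pipeline

fill = pipeline("fill-mask", model="roberta-base")

def augment(sentence, event_words, mask_rate=0.15, seed=0):
    random.seed(seed)
    tokens = sentence.split()
    out = []
    for i, t in enumerate(tokens):
        if t not in event_words and random.random() < mask_rate:
            masked = tokens[:i] + [fill.tokenizer.mask_token] + tokens[i + 1:]
            best = fill(" ".join(masked), top_k=1)[0]["token_str"].strip()
            out.append(best)   # substitute the model's top prediction
        else:
            out.append(t)      # event words are never masked
    return " ".join(out)

print(augment("IL-2 stimulation induces phosphorylation of STAT5.",
              event_words={"stimulation", "phosphorylation"}))
```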
Subjects
Data Mining; Neural Networks, Computer; Humans; Data Mining/methods; Deep Learning; Semantics; Knowledge Bases
ABSTRACT
Inspired by the imbalance between extrinsic and intrinsic tendon healing, this study fabricated a new biofilter scaffold with a hierarchical structure based on a melt electrowriting technique. The outer multilayered fibrous structure with connected porous characteristics provides a novel passageway for vascularization and isolates the penetration of scar fibers, which can be referred to as a biofilter process. In vitro experiments found that the porous architecture of the outer layer can effectively prevent cell infiltration, whereas the aligned fibers of the inner layer can promote cell recruitment and growth, as well as the expression of tendon-associated proteins, under a simulated friction condition. In vivo, the biofilter process was shown to promote tendon healing and reduce scar invasion. This novel strategy thus shows great potential for designing new biomaterials that balance extrinsic and intrinsic healing and achieve scarless tendon healing.
ABSTRACT
Since coronavirus disease 2019 (COVID-19) first emerged more than 3 years ago, more than 1200 articles have been written describing "lessons learned" from the pandemic. While these articles may contain valuable insights, reading them all would be impossible. A machine learning clustering analysis was therefore performed to obtain an overview of these publications and to highlight the benefits of using machine learning to analyze the vast and ever-growing COVID-19 literature.
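A minimal sketch of this kind of clustering analysis, assuming TF-IDF features and k-means; the toy corpus and the number of clusters are illustrative assumptions, not the study's actual pipeline.

```python
# Hypothetical "lessons learned" clustering: embed abstracts with TF-IDF and
# group them with k-means to surface recurring themes.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

abstracts = [
    "supply chains for protective equipment failed early in the pandemic",
    "telehealth adoption accelerated during lockdowns",
    "vaccine communication strategies shaped public trust",
]
X = TfidfVectorizer(stop_words="english").fit_transform(abstracts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # cluster assignment per article
```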
Subjects
COVID-19; Humans; SARS-CoV-2; Machine Learning
ABSTRACT
Graph embedding techniques use deep learning algorithms in data analysis to solve problems such as node classification, link prediction, community detection, and visualization. Although typically used in the context of predicting friendships in social media, several applications of graph embedding techniques in biomedical data analysis have emerged. While these approaches remain computationally demanding, several developments over recent years have facilitated their application to biomedical data and may thus help advance biological discoveries. In this review, we therefore discuss the principles of graph embedding techniques and explore their usefulness for understanding biological network data derived from mass spectrometry and sequencing experiments, the current workhorses of systems biology studies. In particular, we focus on recent examples of characterizing protein-protein interaction networks and predicting novel drug functions.
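To ground the concept, here is a small DeepWalk-style sketch, assuming networkx and gensim: random walks over a graph (a stand-in for a protein-protein interaction network) are fed to word2vec so that neighboring nodes receive nearby vectors, which can then feed link prediction. Walk counts and dimensions are illustrative assumptions.

```python
# Hypothetical DeepWalk-style graph embedding: random walks become "sentences"
# for word2vec, giving each node a dense vector.
import random
import networkx as nx
from gensim.models import Word2Vec

def random_walks(G, walks_per_node=10, walk_len=20, seed=0):
    random.seed(seed)
    walks = []
    for _ in range(walks_per_node):
        for node in G.nodes():
            walk = [node]
            while len(walk) < walk_len:
                nbrs = list(G.neighbors(walk[-1]))
                if not nbrs:
                    break
                walk.append(random.choice(nbrs))
            walks.append([str(n) for n in walk])  # word2vec expects strings
    return walks

G = nx.karate_club_graph()  # stand-in for a PPI network
model = Word2Vec(random_walks(G), vector_size=64, window=5, min_count=0, sg=1)
vec = model.wv["0"]         # embedding of node 0, usable for link prediction
```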