Results 1 - 20 of 27
1.
Nucleic Acids Res ; 2024 Sep 13.
Article in English | MEDLINE | ID: mdl-39271119

ABSTRACT

Escalating costs and high failure rates have slowed the pace of drug development, which has amplified research interest in developing combinatorial/repurposed drugs and understanding off-target adverse drug reactions (ADRs). In other words, there is a demand to delineate the molecular atlas and pharma-information for combinatorial/repurposed drugs and off-target interactions. However, such invaluable data are inadequately covered by existing databases. In this study, a major update was thus made to DrugMAP, which now accumulates (a) 20831 combinatorial drugs and their interacting atlas involving 1583 pharmacologically important molecules; (b) 842 repurposed drugs and their interacting atlas with 795 molecules; (c) 3260 off-targets relevant to the ADRs of 2731 drugs and (d) various types of pharmaceutical information, including diverse ADMET properties, versatile diseases, and various ADRs/off-targets. With the growing demand for discovering combinatorial/repurposed therapies and the rapidly emerging interest in AI-based drug discovery, DrugMAP is expected to act as an indispensable supplement to existing databases facilitating drug discovery. It is accessible at: https://idrblab.org/drugmap/.

2.
Nucleic Acids Res ; 52(D1): D859-D870, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37855686

ABSTRACT

Large-scale studies of single-cell sequencing and biological experiments have successfully revealed expression patterns that distinguish different cell types in tissues, emphasizing the importance of studying cellular heterogeneity and accurately annotating cell types. Analysis of gene expression profiles in these experiments provides two essential types of data for cell type annotation: annotated references and canonical markers. In this study, the first comprehensive database of single-cell transcriptomic annotation resource (CellSTAR) was thus developed. It is unique in (a) offering the comprehensive expertly annotated reference data for annotating hundreds of cell types for the first time and (b) enabling the collective consideration of reference data and marker genes by incorporating tens of thousands of markers. Given its unique features, CellSTAR is expected to attract broad research interests from the technological innovations in single-cell transcriptomics, the studies of cellular heterogeneity & dynamics, and so on. It is now publicly accessible without any login requirement at: https://idrblab.org/cellstar.


Subject(s)
Databases, Factual , Gene Expression Profiling , Single-Cell Analysis , Transcriptome
3.
Nucleic Acids Res ; 2024 Oct 07.
Article in English | MEDLINE | ID: mdl-39373530

ABSTRACT

The measurement of cell-based molecular bioactivity (CMB) is critical for almost every step of drug development. With the booming application of AI in biomedicine, it is essential to have the CMB data to promote the learning of cell-based patterns for guiding modern drug discovery, but no database providing such information has been constructed yet. In this study, we introduce MolBiC, a knowledge base designed to describe valuable data on molecular bioactivity measured within a cellular context. MolBiC features 550 093 experimentally validated CMBs, encompassing 321 086 molecules and 2666 targets across 988 cell lines. Our MolBiC database is unique in describing the valuable data of CMB, which meets the critical demands for CMB-based big data promoting the learning of cell-based molecular/pharmaceutical pattern in drug discovery and development. MolBiC is now freely accessible without any login requirement at: https://idrblab.org/MolBiC/.

4.
Nucleic Acids Res ; 52(D1): D1450-D1464, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37850638

ABSTRACT

Distinct from the traditional diagnostic/prognostic biomarker (adopted as the indicator of disease state/process), the therapeutic biomarker (ThMAR) has emerged as crucial in the clinical development and clinical practice of all therapies. Five types of ThMAR have been found to play indispensable roles in various stages of drug discovery: Pharmacodynamic Biomarker essential for guaranteeing the pharmacological effects of a therapy, Safety Biomarker critical for assessing the extent or likelihood of therapy-induced toxicity, Monitoring Biomarker indispensable for guiding clinical management by serially measuring patients' status, Predictive Biomarker crucial for maximizing the clinical outcome of a therapy for specific individuals, and Surrogate Endpoint fundamental for accelerating the approval of a therapy. However, these ThMAR data have not been comprehensively described by any of the existing databases. Herein, a database named 'TheMarker' was therefore constructed to (a) systematically offer all five types of ThMAR used at different stages of drug development, (b) comprehensively describe ThMAR information for the largest number of drugs among available databases, and (c) extensively cover the widest disease classes by not just focusing on anticancer therapies. The data in TheMarker are expected to have great implications for, and a significant impact on, drug discovery and clinical practice, and the database is freely accessible without any login requirement at: https://idrblab.org/themarker.


Subject(s)
Biomarkers , Databases, Factual , Humans , Drug Discovery , Therapeutics , Prognosis , Disease
5.
Brief Bioinform ; 24(1), 2023 01 19.
Article in English | MEDLINE | ID: mdl-36631399

ABSTRACT

Due to its promising capacity for improving drug efficacy, polypharmacology has emerged as a new theme in drug discovery for complex diseases. In the discovery of novel multi-target drugs (MTDs), in silico strategies have become essential owing to their high throughput and low cost. However, current research mostly targets typical closely related target pairs. Because of the intricate pathogenesis networks of complex diseases, many distantly related targets are found to play crucial roles in synergistic treatment. Therefore, an innovative method to develop drugs that can simultaneously target distantly related target pairs is of utmost importance. At the same time, reducing the false discovery rate in the design of MTDs remains a daunting technical difficulty. In this research, effective small molecule clustering in the positive dataset, together with a putative negative dataset generation strategy, was adopted in model construction. Through comprehensive assessment on 10 target pairs with hierarchical similarity-levels, the proposed strategy was shown to reduce the false discovery rate successfully. Constructed model types with much smaller numbers of inhibitor molecules gained considerable yields and showed better false-hit controllability than before. To further evaluate the generalization ability, an in-depth assessment of high-throughput virtual screening on the ChEMBL database was conducted. As a result, this novel strategy could hierarchically improve the enrichment factors for each target pair (especially for distantly related/unrelated target pairs), corresponding to target pair similarity-levels.


Subject(s)
Drug Discovery , Polypharmacology , Drug Discovery/methods , High-Throughput Screening Assays
6.
Nucleic Acids Res ; 51(D1): D1263-D1275, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36243960

ABSTRACT

Widespread drug resistance has become the key issue in global healthcare. Extensive efforts have been made to reveal not only diverse diseases experiencing drug resistance, but also the six distinct types of molecular mechanisms underlying this resistance. A database that describes a comprehensive list of diseases with drug resistance (not just cancers/infections) and all types of resistance mechanisms is now urgently needed. However, no such database has been available to date. In this study, a comprehensive database describing drug resistance information named 'DRESIS' was therefore developed. It was introduced to (i) systematically provide, for the first time, all existing types of molecular mechanisms underlying drug resistance, (ii) extensively cover the widest range of diseases among all existing databases and (iii) explicitly describe the clinically/experimentally verified resistance data for the largest number of drugs. Since drug resistance has become an ever-increasing clinical issue, DRESIS is expected to have great implications for future new drug discovery and clinical treatment optimization. It is now publicly accessible without any login requirement at: https://idrblab.org/dresis/.


Subject(s)
Drug Discovery , Databases, Factual , Drug Resistance
7.
Nucleic Acids Res ; 51(D1): D1288-D1299, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36243961

ABSTRACT

The efficacy and safety of drugs are widely known to be determined by their interactions with multiple molecules of pharmacological importance, and it is therefore essential to systematically depict the molecular atlas and pharma-information of studied drugs. However, our understanding of such information is neither comprehensive nor precise, which necessitates the construction of a new database providing a network containing a large number of drugs and their interacting molecules. Here, a new database describing the molecular atlas and pharma-information of drugs (DrugMAP) was therefore constructed. It provides a comprehensive list of interacting molecules for >30 000 drugs/drug candidates, gives the differential expression patterns for >5000 interacting molecules among different disease sites, ADME (absorption, distribution, metabolism and excretion)-relevant organs and physiological tissues, and weaves a comprehensive and precise network containing >200 000 interactions among drugs and molecules. With the great efforts made to clarify the complex mechanism underlying drug pharmacokinetics and pharmacodynamics and rapidly emerging interests in artificial intelligence (AI)-based network analyses, DrugMAP is expected to become an indispensable supplement to existing databases to facilitate drug discovery. It is now fully and freely accessible at: https://idrblab.org/drugmap/.


Subject(s)
Artificial Intelligence , Drug Discovery , Databases, Factual , Pharmaceutical Preparations , Atlases as Topic
8.
Nucleic Acids Res ; 51(21): e110, 2023 Nov 27.
Article in English | MEDLINE | ID: mdl-37889083

ABSTRACT

RNAs play essential roles in diverse physiological and pathological processes by interacting with other molecules (RNA/protein/compound), and various computational methods are available for identifying these interactions. However, the encoding features provided by existing methods are limited, and the existing tools do not offer an effective way to integrate the interacting partners. In this study, a task-specific encoding algorithm for RNAs and RNA-associated interactions was therefore developed. This new algorithm is unique in (a) realizing comprehensive RNA feature encoding by introducing a great many novel features and (b) enabling task-specific integration of interacting partners using convolutional autoencoder-directed feature embedding. Compared with existing methods/tools, this novel algorithm demonstrated superior performance in diverse benchmark testing studies. The algorithm, together with its source code, can be readily accessed by all users at: https://idrblab.org/corain/ and https://github.com/idrblab/corain/.


Subject(s)
Computational Biology , RNA , RNA/genetics , Computational Biology/methods , Algorithms , Software
9.
Anal Chem ; 2024 Jul 16.
Article in English | MEDLINE | ID: mdl-39011990

ABSTRACT

Analyzing drug-related interactions in the field of biomedicine has been a critical aspect of drug discovery and development. While various artificial intelligence (AI)-based tools have been proposed to analyze drug biomedical associations (DBAs), their feature encoding did not adequately account for crucial biomedical functions and semantic concepts, thereby still hindering their progress. Since the advent of ChatGPT by OpenAI in 2022, large language models (LLMs) have demonstrated rapid growth and significant success across various applications. Herein, LEDAP was introduced, which uniquely leveraged LLM-based biotext feature encoding for predicting drug-disease associations, drug-drug interactions, and drug-side effect associations. Benefiting from the large-scale knowledgebase pre-training, LLMs had great potential in drug development analysis owing to their holistic understanding of natural language and human topics. LEDAP illustrated its notable competitiveness in comparison with other popular DBA analysis tools. Specifically, even in simple conjunction with classical machine learning methods, LLM-based feature representations consistently enabled satisfactory performance across diverse DBA tasks like binary classification, multiclass classification, and regression. Our findings underpinned the considerable potential of LLMs in drug development research, indicating a catalyst for further progress in related fields.

10.
Anal Chem ; 96(12): 4745-4755, 2024 03 26.
Article in English | MEDLINE | ID: mdl-38417094

ABSTRACT

Despite the well-established connection between systematic metabolic abnormalities and the pathophysiology of pituitary adenoma (PA), current metabolomic studies have reported an extremely limited number of metabolites associated with PA. Moreover, there was very little consistency in the identified metabolite signatures, resulting in a lack of robust metabolic biomarkers for the diagnosis and treatment of PA. Herein, we performed a global untargeted plasma metabolomic profiling on PA and identified a highly robust metabolomic signature based on a strategy. Specifically, this strategy is unique in (1) integrating repeated random sampling and a consensus evaluation-based feature selection algorithm and (2) evaluating the consistency of metabolomic signatures among different sample groups. This strategy demonstrated superior robustness and stronger discriminative ability compared with that of other feature selection methods including Student's t-test, partial least-squares-discriminant analysis, support vector machine recursive feature elimination, and random forest recursive feature elimination. More importantly, a highly robust metabolomic signature comprising 45 PA-specific differential metabolites was identified. Moreover, metabolite set enrichment analysis of these potential metabolic biomarkers revealed altered lipid metabolism in PA. In conclusion, our findings contribute to a better understanding of the metabolic changes in PA and may have implications for the development of diagnostic and therapeutic approaches targeting lipid metabolism in PA. We believe that the proposed strategy serves as a valuable tool for screening robust, discriminating metabolic features in the field of metabolomics.


Subject(s)
Lipid Metabolism , Pituitary Neoplasms , Humans , Pituitary Neoplasms/diagnosis , Metabolomics/methods , Discriminant Analysis , Biomarkers
11.
Brief Bioinform ; 23(6), 2022 11 19.
Article in English | MEDLINE | ID: mdl-36198065

ABSTRACT

In recent years, many studies have illustrated the significant role that non-coding RNA (ncRNA) plays in biological activities, in which lncRNA, miRNA and especially their interactions have been shown to affect many biological processes. Some in silico methods have been proposed and applied to identify novel lncRNA-miRNA interactions (LMIs), but there are still imperfections in their RNA representation and information extraction approaches, which implies that there is still room for improving their performance. Meanwhile, only a few of them are accessible at present, which limits their practical application. The construction of a new tool for LMI prediction is thus imperative for a better understanding of the relevant biological mechanisms. This study proposed a novel method, ncRNAInter, for LMI prediction. A comprehensive strategy for RNA representation and an optimized graph neural network deep learning algorithm were utilized in this study. ncRNAInter was robust and showed better performance, with a 26.7% higher Matthews correlation coefficient, than existing reputable methods for human LMI prediction. In addition, ncRNAInter proved its universal applicability in dealing with LMIs from various species and successfully identified novel LMIs associated with various diseases, which further verified its effectiveness and usability. All source code and datasets are freely available at https://github.com/idrblab/ncRNAInter.


Subject(s)
MicroRNAs , RNA, Long Noncoding , Humans , RNA, Long Noncoding/genetics , MicroRNAs/genetics , Neural Networks, Computer , Software , Algorithms
12.
Brief Bioinform ; 23(5), 2022 09 20.
Article in English | MEDLINE | ID: mdl-35524477

ABSTRACT

In a drug formulation (DFM), the major components by mass are not the Active Pharmaceutical Ingredient (API) but rather Drug Inactive Ingredients (DIGs). DIGs can reach much higher concentrations than those achieved by the API, which raises great concerns about their clinical toxicities. Therefore, the biological activities of DIGs on physiologically relevant targets are widely demanded by both clinical investigation and the pharmaceutical industry. However, such activity data are not available in any existing pharmaceutical knowledge base, and their potential in predicting DIG-target interactions has not yet been evaluated. In this study, a comprehensive assessment and analysis of the biological activities of DIGs was therefore conducted. First, the largest number of DIGs and DFMs were systematically curated and confirmed based on all drugs approved by the US Food and Drug Administration. Second, comprehensive activities for both DIGs and DFMs were provided for the first time to the pharmaceutical community. Third, the biological targets of each DIG and formulation were fully referenced to available databases that described their pharmaceutical/biological characteristics. Finally, a variety of popular artificial intelligence techniques were used to assess the predictive potential of DIGs' activity data, which was the first evaluation of the possibility of predicting DIGs' activity. As the activities of DIGs are critical for current pharmaceutical studies, this work is expected to have significant implications for the future practice of drug discovery and precision medicine.


Subject(s)
Artificial Intelligence , Databases, Factual , Pharmaceutical Preparations , United States , United States Food and Drug Administration
13.
J Chem Inf Model ; 64(7): 2720-2732, 2024 04 08.
Article in English | MEDLINE | ID: mdl-38373720

ABSTRACT

In the context of precision medicine, multiomics data integration provides a comprehensive understanding of underlying biological processes and is critical for disease diagnosis and biomarker discovery. One commonly used integration method is early integration through concatenation of multiple dimensionally reduced omics matrices, owing to its simplicity and ease of implementation. However, this approach is seriously limited by information loss and a lack of latent feature interaction. Herein, a novel multiomics early integration framework (MOINER) based on information enhancement and image representation learning is thus presented to address these challenges. MOINER employs the self-attention mechanism to capture the intrinsic correlations of omics-features, which makes it significantly outperform the existing state-of-the-art methods for multiomics data integration. Moreover, visualizing the attention embedding and identifying potential biomarkers offer interpretable insights into the prediction results. All source code and models for MOINER are freely available at https://github.com/idrblab/MOINER.


Subject(s)
Learning , Multiomics , Software
14.
Brief Bioinform ; 22(3), 2021 05 20.
Article in English | MEDLINE | ID: mdl-32510556

ABSTRACT

Metaproteomics suffers from the issues of dimensionality and sparsity. Data reduction methods can maximally identify the relevant subset of significant differential features and reduce data redundancy. Feature selection (FS) methods are applied to obtain the significant differential subset. So far, a variety of feature selection methods have been developed for metaproteomic study. However, because the performance of FS depends heavily on the data characteristics of a given study, a suitable feature selection method must be carefully chosen to obtain reproducible differential proteins. Moreover, it is critical to evaluate the performance of each FS method according to comprehensive criteria, because a single criterion is not sufficient to reflect the overall performance of the FS method. Therefore, we developed an online tool named MetaFS, which provides 13 types of FS methods and conducts a comprehensive evaluation of the complex FS methods using four widely accepted and independent criteria. Furthermore, the function and reliability of MetaFS were systematically tested and validated via two case studies. In sum, MetaFS could be a distinguished tool for discovering the overall well-performing FS method for selecting potential biomarkers in microbiome studies. The online tool is freely available at https://idrblab.org/metafs/.


Subject(s)
Databases, Protein , Microbiota , Proteomics , Software , Biomarkers/metabolism , Humans
15.
Nucleic Acids Res ; 49(D1): D715-D722, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33045729

ABSTRACT

Besides the environmental factors that have tremendous impacts on the composition of the microbial community, host factors have recently gained extensive attention for their roles in shaping human microbiota. There are two major types of host factors: host genetic factors (HGFs) and host immune factors (HIFs). Factors of each type are essential for defining the chemical and physical landscapes inhabited by microbiota, and the collective consideration of both types has great implications for comprehensive health management. However, no database was available to provide comprehensive factors of both types. Herein, a database entitled 'Host Genetic and Immune Factors Shaping Human Microbiota (GIMICA)' was constructed. Based on the 4257 microbes confirmed to inhabit nine sites of the human body, 2851 HGFs (1368 single nucleotide polymorphisms (SNPs), 186 copy number variations (CNVs), and 1297 non-coding ribonucleic acids (RNAs)) modulating the expression of 370 microbes were collected, and 549 HIFs (126 lymphocytes and phagocytes, 387 immune proteins, and 36 immune pathways) regulating the abundance of 455 microbes were also provided. All in all, GIMICA enables collective consideration not only between different types of host factor but also between host and environmental ones. It is freely accessible without any login requirement at: https://idrblab.org/gimica/.


Subject(s)
Immunologic Factors/genetics , Microbiota/genetics , Software , Humans , Information Storage and Retrieval , Reference Standards
16.
Nucleic Acids Res ; 49(D1): D1233-D1243, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33045737

ABSTRACT

Drug-metabolizing enzymes (DMEs) are critical determinants of drug safety and efficacy, and the interactome of DMEs has attracted extensive attention. There are 3 major interaction types in an interactome: microbiome-DME interaction (MICBIO), xenobiotics-DME interaction (XEOTIC) and host protein-DME interaction (HOSPPI). The interaction data of each type are essential for drug metabolism, and the collective consideration of multiple types has implications for the future practice of precision medicine. However, no database was designed to systematically provide the data of all types of DME interactions. Here, a database of the Interactome of Drug-Metabolizing Enzymes (INTEDE) was therefore constructed to offer these interaction data. First, 1047 unique DMEs (448 host and 599 microbial) were confirmed, for the first time, using their metabolizing drugs. Second, for these newly confirmed DMEs, all types of their interactions (3359 MICBIOs between 225 microbial species and 185 DMEs; 47 778 XEOTICs between 4150 xenobiotics and 501 DMEs; 7849 HOSPPIs between 565 human proteins and 566 DMEs) were comprehensively collected and then provided, which enabled crosstalk analysis among multiple types. Because of the huge amount of accumulated data, INTEDE makes it possible to generalize key features for revealing disease etiology and optimizing clinical treatment. INTEDE is freely accessible at: https://idrblab.org/intede/.


Subject(s)
Databases, Factual , Drugs, Investigational/metabolism , Enzymes/metabolism , Inactivation, Metabolic/genetics , Prescription Drugs/metabolism , Protein Processing, Post-Translational , Xenobiotics/metabolism , Bacteria/enzymology , DNA Methylation , Enzymes/classification , Fungi/enzymology , Histones/genetics , Histones/metabolism , Humans , Internet , Metabolic Clearance Rate , Microbiota/genetics , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , Software
17.
Brief Bioinform ; 21(5): 1825-1836, 2020 09 25.
Article in English | MEDLINE | ID: mdl-31860715

ABSTRACT

The type IV bacterial secretion system (SS) is reported to be one of the most ubiquitous SSs in nature and can induce serious conditions by secreting type IV SS effectors (T4SEs) into the host cells. Recent studies mainly focus on annotating new T4SEs from the huge amount of sequencing data, and various computational tools have therefore been developed to accelerate T4SE annotation. However, these tools are reported to be heavily dependent on the selected methods, and their annotation performance needs to be further enhanced. Herein, a convolutional neural network (CNN) technique was used to annotate T4SEs by integrating multiple protein encoding strategies. First, the annotation accuracies of nine encoding strategies integrated with CNN were assessed and compared with those of the popular T4SE annotation tools based on an independent benchmark. Second, false discovery rates of various models were systematically evaluated by (1) scanning the genome of Legionella pneumophila subsp. ATCC 33152 and (2) predicting the real-world non-T4SEs validated using published experiments. Based on the above analyses, the encoding strategies, (a) position-specific scoring matrix (PSSM), (b) protein secondary structure & solvent accessibility (PSSSA) and (c) one-hot encoding scheme (Onehot), were identified as well-performing when integrated with CNN. Finally, a novel strategy that collectively considers the three well-performing models (CNN-PSSM, CNN-PSSSA and CNN-Onehot) was proposed, and a new tool (CNN-T4SE, https://idrblab.org/cnnt4se/) was constructed to facilitate T4SE annotation. All in all, this study conducted a comprehensive analysis of the performance of a collection of encoding strategies when integrated with CNN, which could facilitate the suppression of T4SS in infection and limit the spread of antimicrobial resistance.


Subject(s)
Neural Networks, Computer , Type IV Secretion Systems , Algorithms , Position-Specific Scoring Matrices
18.
J Chem Inf Model ; 62(23): 5875-5895, 2022 Dec 12.
Article in English | MEDLINE | ID: mdl-36378082

ABSTRACT

Spatial proteomics is an interdisciplinary field that investigates the localization and dynamics of proteins, and it has gained extensive attention in recent years, especially subcellular proteomics. Considerable evidence indicates that the subcellular localization of proteins is associated with various cellular processes and disease progression. Mass spectrometry (MS)-based and imaging-based experimental approaches have been developed to acquire large-scale spatial proteomic data. To allow the reliable analysis of increasingly complex spatial proteomics data, machine learning (ML) methods have been widely used in both MS-based and imaging-based spatial proteomic data analysis pipelines. Here, we comprehensively survey the applications of ML in spatial proteomics from the following aspects: (1) data resources for the spatial proteome are comprehensively introduced; (2) the roles of different ML algorithms in data analysis pipelines are elaborated; (3) successful applications of spatial proteomics and several analytical tools integrating ML methods are presented; (4) challenges existing in modern ML-based spatial proteomics studies are discussed. This review provides guidelines for researchers seeking to apply ML methods to analyze spatial proteomic data and can facilitate insightful understanding of cell biology as well as future research in the medical and drug discovery communities.


Subject(s)
Proteome , Proteomics , Proteomics/methods , Proteome/metabolism , Mass Spectrometry/methods , Machine Learning , Algorithms
19.
Comput Biol Med ; 169: 107811, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38168647

ABSTRACT

Graph Neural Networks (GNNs) have gained significant traction in various sectors of AI-driven drug design. Over recent years, the integration of fragmentation concepts into GNNs has emerged as a potent strategy to augment the efficacy of molecular generative models. Nonetheless, challenges such as symmetry breaking and potential misrepresentation of intricate cycles and undefined functional groups raise questions about the superiority of fragment-based graph representation over traditional methods. In our research, we undertook a rigorous evaluation, contrasting the predictive prowess of eight models-developed using deep learning algorithms-across 12 benchmark datasets that span a range of properties. These models encompass established methods like GCN, AttentiveFP, and D-MPNN, as well as innovative fragment-based representation techniques. Our results indicate that fragment-based methodologies, notably PharmHGT, significantly improve model performance and interpretability, particularly in scenarios characterized by limited data availability. However, in situations with extensive training, fragment-based molecular graph representations may not necessarily eclipse traditional methods. In summation, we posit that the integration of fragmentation, as an avant-garde technique in drug design, harbors considerable promise for the future of AI-enhanced drug design.


Subject(s)
Algorithms , Benchmarking , Drug Design , Models, Molecular , Neural Networks, Computer
20.
Genome Biol ; 25(1): 41, 2024 02 01.
Article in English | MEDLINE | ID: mdl-38303023

ABSTRACT

Protein function annotation has been one of the longstanding issues in biological sciences, and various computational methods have been developed. However, the existing methods suffer from a serious long-tail problem, with a large number of GO families containing few annotated proteins. Herein, an innovative strategy named AnnoPRO was therefore constructed by enabling sequence-based multi-scale protein representation, dual-path protein encoding using pre-training, and function annotation by long short-term memory-based decoding. A variety of case studies based on different benchmarks were conducted, which confirmed the superior performance of AnnoPRO among available methods. Source code and models have been made freely available at: https://github.com/idrblab/AnnoPRO and https://zenodo.org/records/10012272.


Subject(s)
Deep Learning , Humans , Computational Biology/methods , Proteins/metabolism , Software , Molecular Sequence Annotation