Results 1 - 20 of 339
1.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38517693

ABSTRACT

Numerous investigations increasingly indicate the significance of microRNA (miRNA) in human diseases. Hence, unearthing associations between miRNAs and diseases can contribute to precise diagnosis and efficacious remediation of medical conditions. The detection of miRNA-disease linkages via computational techniques utilizing biological information has emerged as a cost-effective and highly efficient approach. Here, we introduce a computational framework named ReHoGCNES, designed for prospective miRNA-disease association prediction (ReHoGCNES-MDA). This method constructs a homogeneous graph convolutional network with a regular graph structure (ReHoGCN) encompassing the disease similarity network, the miRNA similarity network and the known MDA network, and was tested on four experimental tasks. A random edge sampler strategy was utilized to expedite processing and reduce training complexity. Experimental results demonstrate that the proposed ReHoGCNES-MDA method outperforms both the homogeneous graph convolutional network and the heterogeneous graph convolutional network with non-regular graph structure in all four tasks, which implicitly reveals that a steady degree distribution of a graph plays an important role in enhancing model performance. Besides, ReHoGCNES-MDA is superior to several machine learning algorithms and state-of-the-art methods on MDA prediction. Furthermore, three case studies were conducted to further demonstrate the predictive ability of ReHoGCNES. Consequently, 93.3% (breast neoplasms), 90% (prostate neoplasms) and 93.3% (prostate neoplasms) of the top 30 forecasted miRNAs were validated by public databases. Hence, ReHoGCNES-MDA might serve as a dependable and beneficial model for predicting possible MDAs.
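
The random edge sampler mentioned in the abstract can be pictured with a minimal sketch. It assumes uniform sampling without replacement over an edge list; the abstract does not state the sampling distribution, and `sample_edges` and the toy edges below are hypothetical names and data used only for illustration, not the ReHoGCNES implementation.

```python
import random


def sample_edges(edges, num_samples, seed=None):
    """Uniformly sample edges without replacement (an assumed strategy).

    Returns the sampled edges and the set of nodes they touch, i.e. the
    smaller subgraph that a single training step would operate on.
    """
    rng = random.Random(seed)
    k = min(num_samples, len(edges))
    sampled = rng.sample(edges, k)
    nodes = {node for edge in sampled for node in edge}
    return sampled, nodes


# Toy miRNA-disease edge list (made-up identifiers).
toy_edges = [
    ("mir-a", "disease-1"),
    ("mir-a", "disease-2"),
    ("mir-b", "disease-1"),
    ("mir-c", "disease-3"),
]
sub_edges, sub_nodes = sample_edges(toy_edges, num_samples=2, seed=0)
```

Training on a sampled subgraph rather than the full graph is what reduces the per-step cost; the trade-off is that each step sees only part of the known associations.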


Subject(s)
MicroRNAs , Prostatic Neoplasms , Humans , Male , Algorithms , Computational Biology/methods , Databases, Genetic , MicroRNAs/genetics , Prospective Studies , Prostatic Neoplasms/genetics , Female
2.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36631407

ABSTRACT

Recently, peptide-based drugs have gained unprecedented interest in the discovery and development of antifungal drugs due to their high efficacy, broad-spectrum activity, low toxicity and few side effects. However, it is time-consuming and expensive to identify antifungal peptides (AFPs) experimentally. Therefore, computational methods for accurately predicting AFPs are highly desirable. In this work, we develop AFP-MFL, a novel deep learning model that predicts AFPs relying only on peptide sequences, without using any structural information. AFP-MFL first constructs comprehensive feature profiles of AFPs, including contextual semantic information derived from a pre-trained protein language model, evolutionary information, and physicochemical properties. Subsequently, the co-attention mechanism is utilized to integrate contextual semantic information with evolutionary information and physicochemical properties separately. Extensive experiments show that AFP-MFL outperforms state-of-the-art models on four independent test datasets. Furthermore, the SHAP method is employed to explore each feature's contribution to AFP prediction. Finally, a user-friendly web server for the proposed AFP-MFL has been developed and is freely accessible at http://inner.wei-group.net/AFPMFL/, which can be considered a powerful tool for the rapid screening and identification of novel AFPs.


Subject(s)
Antifungal Agents , alpha-Fetoproteins , Antifungal Agents/pharmacology , Algorithms , Peptides/chemistry , Computational Biology/methods
3.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36702755

ABSTRACT

Due to the high heterogeneity and complexity of cancers, patients with different cancer subtypes often have distinct groups of genomic and clinical characteristics. Therefore, the discovery and identification of cancer subtypes are crucial to cancer diagnosis, prognosis and treatment. Recent technological advances have accelerated the increasing availability of multi-omics data for cancer subtyping. To take advantage of the complementary information from multi-omics data, it is necessary to develop computational models that can represent and integrate different layers of data into a single framework. Here, we propose a decoupled contrastive clustering method (Subtype-DCC) based on multi-omics data integration for clustering to identify cancer subtypes. The idea of contrastive learning is introduced into deep clustering based on deep neural networks to learn clustering-friendly representations. Experimental results demonstrate the superior performance of the proposed Subtype-DCC model in identifying cancer subtypes over the currently available state-of-the-art clustering methods. The strength of Subtype-DCC is also supported by the survival and clinical analysis.


Subject(s)
Multiomics , Neoplasms , Humans , Algorithms , Genomics/methods , Neoplasms/genetics , Cluster Analysis , DCC Receptor
4.
PLoS Comput Biol ; 20(2): e1011935, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38416785

ABSTRACT

Spatial transcriptomic (ST) clustering employs spatial and transcription information to group spots that are spatially coherent and transcriptionally similar into the same spatial domain. Graph convolution networks (GCN) and graph attention networks (GAT), fed with an adjacency matrix derived from spatial coordinates and a feature matrix derived from transcription profiles, are often used to solve the problem. Our proposed method STGIC (spatial transcriptomic clustering with graph and image convolution) is designed for techniques with regular lattices on chips. It utilizes an adaptive graph convolution (AGC) to obtain high-quality pseudo-labels and then resorts to a dilated convolution framework (DCF) applied to a virtual image converted from the gene expression information and spatial coordinates of spots. The dilation rates and kernel sizes are set appropriately, and the updating of weight values in the kernels is made subject to the spatial distance from the position of the corresponding elements to the kernel centers, so that feature extraction for each spot is better guided by the spatial distance to neighboring spots. Self-supervision realized by Kullback-Leibler (KL) divergence, spatial continuity loss and cross entropy calculated among spots with high-confidence pseudo-labels make up the training objective of DCF. STGIC attains state-of-the-art (SOTA) clustering performance on the benchmark dataset of 10x Visium human dorsolateral prefrontal cortex (DLPFC). Besides, it is capable of depicting fine structures of other tissues from other species as well as guiding the identification of marker genes. Also, STGIC is expandable to Stereo-seq data with high spatial resolution.


Subject(s)
Gene Expression Profiling , Transcriptome , Humans , Transcriptome/genetics , Benchmarking , Cluster Analysis , Entropy
5.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35180781

ABSTRACT

Although there are a large number of structural variations in the chromosomes of each individual, accurate methods for identifying clinically pathogenic variants are lacking. Here, we propose SVPath, a machine learning-based method to predict the pathogenicity of deletion, insertion and duplication structural variations that occur in exons. We constructed three types of annotation features for each structural variation event in the ClinVar database. First, we treated complex structural variations as multiple consecutive single nucleotide polymorphism events, and annotated them with correlation scores based on single nucleic acid substitutions, such as the impact on protein function. Second, we determined which genes the variation occurred in, and constructed gene-based annotation features for each structural variation. Third, we also calculated related features based on the transcriptome, such as histone signal, the overlap ratio of variation and genomic element definitions, etc. Finally, we employed a gradient boosting decision tree machine learning method, and used the deletions, insertions and duplications in the ClinVar database to train a structural variation pathogenicity prediction model, SVPath. These structural variations are clearly indicated as pathogenic or benign. Experimental results show that SVPath achieves excellent predictive performance and outperforms existing state-of-the-art tools. SVPath is very promising for evaluating the clinical pathogenicity of structural variants. SVPath can be used in clinical research to predict the clinical significance of new structural variations and those of unknown pathogenicity, so as to explore the relationship between diseases and structural variations in a computational way.


Subject(s)
Machine Learning , Polymorphism, Single Nucleotide , Exons , Humans , Molecular Sequence Annotation , Virulence
6.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34671814

ABSTRACT

One of the main problems with the joint use of multiple drugs is that it may cause adverse drug interactions and side effects that damage the body. Therefore, it is important to predict potential drug interactions. However, most of the available prediction methods can only predict whether two drugs interact or not, whereas few methods can predict interaction events between two drugs. Accurately predicting interaction events of two drugs is more useful for researchers studying the mechanism of the interaction of two drugs. In the present study, we propose a novel method, MDF-SA-DDI, which predicts drug-drug interaction (DDI) events based on multi-source drug fusion, multi-source feature fusion and a transformer self-attention mechanism. MDF-SA-DDI is mainly composed of two parts: multi-source drug fusion and multi-source feature fusion. First, we combine two drugs in four different ways and input the combined drug feature representation into four different drug fusion networks (a Siamese network, a convolutional neural network and two auto-encoders) to obtain the latent feature vectors of the drug pairs; the two auto-encoders have the same structure and differ mainly in the number of neurons in their input layers. Then, we use transformer blocks that include a self-attention mechanism to perform latent feature fusion. We conducted experiments on three different tasks with two datasets. On the small dataset, the area under the precision-recall curve (AUPR) and F1 scores of our method on task 1 reached 0.9737 and 0.8878, respectively, which were better than those of the state-of-the-art method. On the large dataset, the AUPR and F1 scores of our method on task 1 reached 0.9773 and 0.9117, respectively. In task 2 and task 3 on the two datasets, our method also achieved performance equal to or better than that of the state-of-the-art method. More importantly, case studies on five DDI events were conducted and achieved satisfactory performance.
The source codes and data are available at https://github.com/ShenggengLin/MDF-SA-DDI.
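
As a reading aid for the F1 scores reported above, here is a minimal sketch of how an F1 score is computed for a multi-class task such as DDI event prediction. The abstract does not say how F1 is averaged across event types; macro averaging is an assumption here, and `f1_for_label` and `macro_f1` are illustrative names, not functions from the MDF-SA-DDI repository.

```python
def f1_for_label(y_true, y_pred, label):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    labels = sorted(set(y_true))
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


# Toy labels for two event types (made-up data).
score = macro_f1([0, 0, 1, 1], [0, 1, 1, 1])
```

Macro averaging weights rare event types as heavily as common ones, which matters when event classes are imbalanced.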


Subject(s)
Drug-Related Side Effects and Adverse Reactions , Neural Networks, Computer , Drug Interactions , Humans , Oligosaccharides , Software
7.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-36027578

ABSTRACT

Anatomical Therapeutic Chemical (ATC) classification for compounds/drugs plays an important role in drug development and basic research. However, previous methods depend on interactions extracted from the STITCH dataset, which may make them dependent on lab experiments. We present a pilot study to explore the possibility of conducting ATC prediction based solely on molecular structures. The motivation is to eliminate the reliance on costly lab experiments so that the characteristics of a drug can be pre-assessed for better decision-making and effort-saving before the actual development. To this end, we construct a new benchmark consisting of 4545 compounds, which is larger in scale than the one used in the previous study. A light-weight prediction model is proposed. The model offers better explainability in the sense that it consists of a straightforward tokenization that extracts and embeds statistically and physicochemically meaningful tokens, and a deep network backed by a set of pyramid kernels to capture multi-resolution chemical structural characteristics. Its efficacy has been validated in experiments where it outperforms the state-of-the-art methods by 15.53% in accuracy and by 69.66% in terms of efficiency. We make the benchmark dataset, source code and web server open to ease the reproduction of this study.


Subject(s)
Benchmarking , Software , Pilot Projects
8.
Bioinformatics ; 39(12)2023 12 01.
Article in English | MEDLINE | ID: mdl-38015872

ABSTRACT

MOTIVATION: Identifying the functional sites of a protein, such as the binding sites of proteins, peptides, or other biological components, is crucial for understanding related biological processes and drug design. However, existing sequence-based methods have limited predictive accuracy, as they only consider sequence-adjacent contextual features and lack structural information. RESULTS: In this study, DeepProSite is presented as a new framework for identifying protein binding site that utilizes protein structure and sequence information. DeepProSite first generates protein structures from ESMFold and sequence representations from pretrained language models. It then uses Graph Transformer and formulates binding site predictions as graph node classifications. In predicting protein-protein/peptide binding sites, DeepProSite outperforms state-of-the-art sequence- and structure-based methods on most metrics. Moreover, DeepProSite maintains its performance when predicting unbound structures, in contrast to competing structure-based prediction methods. DeepProSite is also extended to the prediction of binding sites for nucleic acids and other ligands, verifying its generalization capability. Finally, an online server for predicting multiple types of residue is established as the implementation of the proposed DeepProSite. AVAILABILITY AND IMPLEMENTATION: The datasets and source codes can be accessed at https://github.com/WeiLab-Biology/DeepProSite. The proposed DeepProSite can be accessed at https://inner.wei-group.net/DeepProSite/.


Subject(s)
Peptides , Proteins , Protein Binding , Proteins/chemistry , Binding Sites , Software
9.
Microb Pathog ; 189: 106572, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38354987

ABSTRACT

The JCV (John Cunningham Virus) is known to cause progressive multifocal leukoencephalopathy, a condition that results in the formation of tumors. Symptoms of this condition include sensory defects, cognitive dysfunction, muscle weakness, homonosapobia, difficulties with coordination, and aphasia. To date, there is no specific and effective treatment to completely cure or prevent John Cunningham polyomavirus infections. Since the best way to control the disease is vaccination, in this study immunoinformatic tools were used to predict highly immunogenic and non-allergenic B cell, helper T cell (HTL), and cytotoxic T cell (CTL) epitopes from the capsid, major capsid, and T antigen proteins of JC virus in order to design highly efficient subunit vaccines. Specific immunogenic linkers were used to link the predicted epitopes together, and the constructs were subjected to 3D modeling using the Robetta server. MD simulation was used to confirm that the newly constructed vaccines are stable and fold properly. Additionally, the molecular docking approach revealed that the vaccines have a strong binding affinity with human TLR-7. The codon adaptation index (CAI) and GC content values verified that the constructed vaccines would be highly expressed in the E. coli pET28a (+) plasmid. The immune simulation analysis indicated that the human immune system would have a strong response to the vaccines, with a high titer of IgM and IgG antibodies being produced. In conclusion, this study provides a pre-clinical concept for constructing an effective, highly antigenic, non-allergenic, and thermostable vaccine to combat John Cunningham virus infection.
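
Of the two expression indicators named above, GC content is simple enough to illustrate directly. The sketch below computes the percentage of G and C bases in a DNA sequence; `gc_content` and the example sequence are illustrative and are not taken from the study. The codon adaptation index is not shown because it needs a host-specific codon usage table.

```python
def gc_content(sequence):
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc_count = sum(1 for base in seq if base in "GC")
    return 100.0 * gc_count / len(seq)


# Made-up 12-base sequence containing six G/C bases.
example_gc = gc_content("ATGGCCATTGCA")
```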


Subject(s)
JC Virus , Vaccines , Humans , Epitopes/genetics , Molecular Docking Simulation , Escherichia coli , Vaccinology , Vaccines, Subunit/genetics , Epitopes, T-Lymphocyte/genetics , Computational Biology , Epitopes, B-Lymphocyte , Molecular Dynamics Simulation
10.
Methods ; 220: 1-10, 2023 12.
Article in English | MEDLINE | ID: mdl-37858611

ABSTRACT

The joint use of multiple drugs can result in adverse drug-drug interactions (DDIs) and side effects that harm the body. Accurate identification of DDIs is crucial for avoiding accidental drug side effects and understanding potential mechanisms underlying DDIs. Several computational methods have been proposed for multi-type DDI prediction, but most rely on the similarity profiles of drugs as the drug feature vectors, which may result in information leakage and overoptimistic performance when predicting interactions between new drugs. To address this issue, we propose a novel method, MATT-DDI, for predicting multi-type DDIs based on the original feature vectors of drugs and multiple attention mechanisms. MATT-DDI consists of three main modules: the top k most similar drug pair selection module, heterogeneous attention mechanism module and multi-type DDI prediction module. Firstly, based on the feature vector of the input drug pair (IDP), k drug pairs that are most similar to the input drug pair from the training dataset are selected according to cosine similarity between drug pairs. Then, the vectors of k selected drug pairs are averaged to obtain a new drug pair (NDP). Next, IDP and NDP are fed into heterogeneous attention modules, including scaled dot product attention and bilinear attention, to extract latent feature vectors. Finally, these latent feature vectors are taken as input of the classification module to predict DDI types. We evaluated MATT-DDI on three different tasks. The experimental results show that MATT-DDI provides better or comparable performance compared to several state-of-the-art methods, and its feasibility is supported by case studies. MATT-DDI is a robust model for predicting multi-type DDIs with excellent performance and no information leakage.
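
The first two steps described above (select the k training drug pairs most similar to the input drug pair by cosine similarity, then average them into a new drug pair) can be sketched as follows. This is a minimal illustration in plain Python, not the MATT-DDI code: the function names, the list-of-floats representation of a drug pair and the toy vectors are all assumptions.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def new_drug_pair(idp, training_pairs, k):
    """Average the k training pair vectors most similar to the IDP."""
    if k <= 0 or not training_pairs:
        raise ValueError("need k >= 1 and at least one training pair")
    ranked = sorted(training_pairs,
                    key=lambda pair: cosine_similarity(idp, pair),
                    reverse=True)
    top_k = ranked[:k]
    return [sum(pair[i] for pair in top_k) / len(top_k)
            for i in range(len(idp))]


# Toy 2-dimensional drug pair vectors (made-up data).
idp = [1.0, 0.0]
training = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
ndp = new_drug_pair(idp, training, k=2)
```

Because the neighbours are drawn only from the training set, the NDP carries no information from held-out drug pairs, which is consistent with the abstract's point about avoiding information leakage.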


Subject(s)
Drug-Related Side Effects and Adverse Reactions , Humans , Drug Interactions
11.
Biotechnol Appl Biochem ; 71(2): 402-413, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38287712

ABSTRACT

Malonyl-CoA serves as the main building block for the biosynthesis of many important polyketides, as well as fatty acid-derived compounds, such as biofuel. Escherichia coli, Corynebacterium glutamicum, and Saccharomyces cerevisiae have recently been engineered for the biosynthesis of such compounds. However, the developed processes and strains often have insufficient productivity. In the current study, we used an enzyme-engineering approach to improve the binding of acetyl-CoA with ACC. We generated different mutations and calculated their impact, which showed that three mutations, that is, S343A, T347W, and S350W, significantly improve substrate binding. Molecular docking investigation revealed an altered binding network compared to the wild type. In the mutants, additional interactions stabilize the binding of the inner tail of acetyl-CoA. Using molecular simulation, the stability, compactness, hydrogen bonding, and protein motions were estimated, revealing dynamic properties possessed only by the mutants and not by the wild type. The findings were further validated using the binding free energy (BFE) method, which revealed these mutations to be favorable substitutions. The total BFE was reported to be -52.66 ± 0.11 kcal/mol for the wild type, -55.87 ± 0.16 kcal/mol for the S343A mutant, -60.52 ± 0.25 kcal/mol for the T347W mutant, and -59.64 ± 0.25 kcal/mol for the S350W mutant. This shows that the binding of the substrate is increased by the induced mutations and strongly corroborates the docking results. In sum, this study provides information regarding the essential hotspot residues for substrate binding and can be used for application in industrial processes.
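
The size of the improvement is easier to see as a difference from the wild type. The short sketch below subtracts the wild-type BFE from each mutant's BFE using only the mean values quoted in the abstract; the reported uncertainties are ignored for simplicity, and `relative_bfe` is an illustrative helper name.

```python
# Mean total binding free energies from the abstract, in kcal/mol.
bfe = {
    "wild type": -52.66,
    "S343A": -55.87,
    "T347W": -60.52,
    "S350W": -59.64,
}


def relative_bfe(bfe_by_variant, reference="wild type"):
    """Difference of each variant's BFE from the reference (kcal/mol).

    Negative values mean stronger predicted binding than the reference.
    """
    ref = bfe_by_variant[reference]
    return {name: round(value - ref, 2)
            for name, value in bfe_by_variant.items()
            if name != reference}


differences = relative_bfe(bfe)
# S343A: -3.21, T347W: -7.86, S350W: -6.98 (kcal/mol relative to wild type)
```

On these means, T347W shows the largest predicted gain in binding, followed by S350W and then S343A.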


Subject(s)
Acetyl-CoA Carboxylase , Streptomyces antibioticus , Acetyl-CoA Carboxylase/genetics , Acetyl-CoA Carboxylase/metabolism , Streptomyces antibioticus/metabolism , Acetyl Coenzyme A/genetics , Molecular Docking Simulation , Mutation , Saccharomyces cerevisiae/metabolism , Escherichia coli/metabolism
12.
BMC Genomics ; 24(1): 661, 2023 Nov 02.
Article in English | MEDLINE | ID: mdl-37919660

ABSTRACT

Microproteins, prevalent across all kingdoms of life, play a crucial role in cell physiology and human health. Although global gene transcription is widely explored and transcriptome data are abundantly available, our understanding of microprotein functions from transcriptome data is still limited. To mitigate this problem, we present a database, Mip-mining (https://weilab.sjtu.edu.cn/mipmining/), underpinned by high-quality RNA-sequencing data and aimed exclusively at analyzing microprotein functions. Mip-mining hosts 336 sets of high-quality transcriptome data from 8626 samples and nine representative living organisms, including microorganisms, plants, animals, and humans. Our database focuses specifically on a range of diseases and environmental stress conditions, taking into account chemical, physical, biological, and disease-related stresses. In addition, our platform enables customized analysis by inputting desired data sets with self-determined cutoff values. The practicality of Mip-mining is demonstrated by identifying essential microproteins in different species and revealing the importance of ATP15 in the acetic acid stress tolerance of budding yeast. We believe that Mip-mining will facilitate a greater understanding and application of microproteins in biotechnology. Moreover, it will be beneficial for designing therapeutic strategies under various biological conditions.


Subject(s)
Biotechnology , Transcriptome , Animals , Humans , Sequence Analysis, RNA , Micropeptides
13.
Funct Integr Genomics ; 23(2): 94, 2023 Mar 21.
Article in English | MEDLINE | ID: mdl-36943579

ABSTRACT

Breast cancer is one of the leading causes of death in women worldwide. Initially, it develops in the epithelium of the ducts or lobules of the breast glandular tissues with limited growth and the potential to metastasize. It is a highly heterogeneous malignancy; however, the common molecular mechanisms could help identify new targeted drugs for treating its subtypes. This study uses computational drug repositioning approaches to explore fresh drug candidates for breast cancer treatment. We also implemented reversal gene expression and gene expression-based signatures to explore novel drug candidates computationally. The drug activity profiles and related gene expression changes were acquired from the DrugBank, PubChem, and LINCS databases, and then in silico drug screening, molecular dynamics (MD) simulation, replica exchange MD simulations, and simulated annealing molecular dynamics (SAMD) simulations were conducted to discover and verify the valid drug candidates. We have found that compounds like furosemide, gold, and dopamine showed significant outcomes. Furthermore, the expression of genes related to breast cancer was observed to be reversed by these shortlisted drugs. Therefore, we postulate that combining furosemide, gold, and dopamine would be a potential combination therapy measurement for breast cancer patients.


Subject(s)
Breast Neoplasms , Humans , Female , Breast Neoplasms/drug therapy , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Dopamine/therapeutic use , Furosemide/pharmacology , Furosemide/therapeutic use , Gold/therapeutic use , Transcriptome
14.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32520339

ABSTRACT

Long non-coding RNAs (lncRNAs) are the subject of intensive recent studies due to their association with various human diseases. It is desirable to build artificial intelligence-based models for the prediction of diseases or tissues based on lncRNA data, which will be useful in disease diagnosis and therapy. The accuracy and robustness of existing models based on machine learning techniques are subject to further improvement. In this study, we propose a deep learning model, called Multi-Label Classifications with Deep Forest, termed MLCDForest, to address multi-label classification for tissue prediction for a given lncRNA, which can be regarded as an implementation of the deep forest model in multi-label classification. MLCDForest is a sequential multi-label-grained scanning method, which differs from the standard deep forest model. It is trained sequentially over multiple labels with label correlation considered. A systematic comparison using the lncRNA-disease association datasets demonstrates that our method consistently shows superior performance over the state-of-the-art methods in disease prediction. By considering label correlation in the sequential multi-label-grained scanning, our model provides a powerful tool for multi-label classification and tissue prediction based on given lncRNAs.


Subject(s)
Computational Biology , Deep Learning , Disease/genetics , Models, Genetic , RNA, Long Noncoding/genetics , Humans
15.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32743640

ABSTRACT

BACKGROUND: The most frequently mutated gene pair in pancreatic adenocarcinoma (PAAD) is KRAS and TP53, and our goal is to illustrate the multiomics and molecular dynamics landscapes of KRAS/TP53 mutation and also to obtain prospective novel drugs for KRAS- and TP53-mutated PAAD patients. Moreover, we also attempted to discover the probable link between KRAS and TP53 on the basis of the abovementioned multiomics data. METHOD: We utilized TCGA and Cancer Cell Line Encyclopedia data for the analysis of KRAS/TP53 mutation in a multiomics manner. In addition, we performed molecular dynamics analysis of KRAS and TP53 to produce mechanistic descriptions of particular mutations and carcinogenesis. RESULT: We discover that there is a significant difference in the genomics, transcriptomics, methylomics, and molecular dynamics pattern of KRAS and TP53 mutation from the matching wild type in PAAD, and the prognosis of pancreatic cancer is directly linked with a particular mutation of KRAS and protein stability. Screened drugs are potentially effective in PAAD patients. CONCLUSIONS: KRAS and TP53 prognosis of PAAD is directly associated with a specific mutation of KRAS. Irinotecan and vandetanib are prospective drugs for PAAD patients with KRAS G12D mutation and TP53 mutation.


Subject(s)
Adenocarcinoma , Antineoplastic Combined Chemotherapy Protocols/administration & dosage , Mutation , Pancreatic Neoplasms , Proto-Oncogene Proteins p21(ras)/genetics , Tumor Suppressor Protein p53/genetics , Adenocarcinoma/drug therapy , Adenocarcinoma/genetics , Adenocarcinoma/mortality , Disease-Free Survival , Drug Synergism , Female , Humans , Irinotecan/administration & dosage , Irinotecan/agonists , Male , Pancreatic Neoplasms/drug therapy , Pancreatic Neoplasms/genetics , Pancreatic Neoplasms/mortality , Piperidines/administration & dosage , Piperidines/agonists , Quinazolines/administration & dosage , Quinazolines/agonists , Survival Rate
16.
Brief Bioinform ; 22(1): 451-462, 2021 01 18.
Article in English | MEDLINE | ID: mdl-31885041

ABSTRACT

Drug-target interactions (DTIs) play a crucial role in target-based drug discovery and development. Computational prediction of DTIs can effectively complement experimental wet-lab techniques for the identification of DTIs, which are typically time- and resource-consuming. However, current DTI prediction approaches suffer from low precision and a high false-positive rate. In this study, we aim to develop a novel DTI prediction method that improves prediction performance based on a cascade deep forest (CDF) model, named DTI-CDF, with multiple similarity-based features between drugs and similarity-based features between target proteins extracted from the heterogeneous graph, which contains known DTIs. In the experiments, we built five replicates of 10-fold cross-validation under three different experimental settings of data sets, in which the DTI values of certain drugs (SD), targets (ST), or drug-target pairs (SP) are missing from the training sets but present in the test sets. The experimental results demonstrate that our proposed approach DTI-CDF achieves significantly higher performance than traditional ensemble learning-based methods such as random forest and XGBoost, deep neural networks, and state-of-the-art methods such as DDR. Furthermore, there are 1352 newly predicted DTIs that are confirmed by the KEGG and DrugBank databases. The data sets and source code are freely available at https://github.com//a96123155/DTI-CDF.
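
The three evaluation settings can be made concrete with a minimal sketch of the fold construction. It assumes that SD holds out every pair involving the selected drugs, ST does the same for targets, and SP holds out individual drug-target pairs; the abstract does not give the exact procedure, and `k_fold_splits` and the toy pairs are hypothetical rather than taken from the DTI-CDF repository.

```python
import random


def k_fold_splits(pairs, k, setting, seed=0):
    """Yield (train, test) lists of unique (drug, target) pairs.

    setting: "SP" holds out pairs, "SD" holds out drugs,
    "ST" holds out targets (assumed interpretation).
    """
    if setting == "SP":
        units = sorted(pairs)
        unit_of = lambda pair: pair
    elif setting == "SD":
        units = sorted({drug for drug, _ in pairs})
        unit_of = lambda pair: pair[0]
    elif setting == "ST":
        units = sorted({target for _, target in pairs})
        unit_of = lambda pair: pair[1]
    else:
        raise ValueError("setting must be 'SP', 'SD' or 'ST'")
    rng = random.Random(seed)
    rng.shuffle(units)
    folds = [set(units[i::k]) for i in range(k)]
    for held_out in folds:
        test = [p for p in pairs if unit_of(p) in held_out]
        train = [p for p in pairs if unit_of(p) not in held_out]
        yield train, test


# Toy drug-target pairs (made-up identifiers).
toy_pairs = [("d1", "t1"), ("d1", "t2"), ("d2", "t1"),
             ("d3", "t3"), ("d4", "t2")]
sd_folds = list(k_fold_splits(toy_pairs, k=2, setting="SD"))
```

Replicates, such as the five mentioned in the abstract, would correspond to repeating the call with different `seed` values.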


Subject(s)
Drug Development/methods , Proteomics/methods , Software , Humans , Molecular Docking Simulation/methods , Sequence Analysis, Protein/methods
17.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32964234

ABSTRACT

Identifying drug-target interactions (DTIs) is an important step for drug discovery and drug repositioning. To reduce the experimental cost, a large number of computational approaches have been proposed for this task. The machine learning-based models, especially binary classification models, have been developed to predict whether a drug-target pair interacts or not. However, there is still much room for improvement in the performance of current methods. Multi-label learning can overcome some difficulties caused by single-label learning in order to improve the predictive performance. The key challenge faced by multi-label learning is the exponential-sized output space, and considering label correlations can help to overcome this challenge. In this paper, we facilitate multi-label classification by introducing community detection methods for DTI prediction, named DTI-MLCD. Moreover, we updated the gold standard data set by adding 15,000 more positive DTI samples in comparison to the data set, which has widely been used by most of previously published DTI prediction methods since 2008. The proposed DTI-MLCD is applied to both data sets, demonstrating its superiority over other machine learning methods and several existing methods. The data sets and source code of this study are freely available at https://github.com/a96123155/DTI-MLCD.


Subject(s)
Algorithms , Computational Biology/methods , Machine Learning , Pharmaceutical Preparations/metabolism , Proteins/metabolism , Computer Simulation , Drug Discovery/methods , Drug Repositioning/methods , Internet , Molecular Targeted Therapy/methods , Pharmaceutical Preparations/administration & dosage , Pharmaceutical Preparations/chemistry , Protein Binding , Proteins/antagonists & inhibitors , Proteins/chemistry , Reproducibility of Results
18.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34009265

ABSTRACT

Accurate identification of miRNA-disease associations (MDAs) helps to understand the etiology and mechanisms of various diseases. However, experimental methods are costly and time-consuming, so computational methods for predicting MDAs are urgently needed. Based on graph theory, MDA prediction is regarded as a node classification task in the present study. To solve this task, we propose a novel method, MDA-GCNFTG, which predicts MDAs based on Graph Convolutional Networks (GCNs) via graph sampling through the Feature and Topology Graph to improve training efficiency and accuracy. This method models both the potential connections of the feature space and the structural relationships of MDA data. The nodes of the graphs are represented by disease semantic similarity, miRNA functional similarity and Gaussian interaction profile kernel similarity. Moreover, for the first time we considered six tasks simultaneously on the MDA prediction problem, which ensures that, under both balanced and unbalanced sample distributions, MDA-GCNFTG can predict not only new MDAs but also new diseases without known related miRNAs and new miRNAs without known related diseases. The results of 5-fold cross-validation show that MDA-GCNFTG achieved satisfactory performance on all six tasks and is significantly superior to classic machine learning methods and state-of-the-art MDA prediction methods. The effectiveness of GCNs via the graph sampling strategy and of the feature and topology graph in MDA-GCNFTG has also been demonstrated. More importantly, case studies for two diseases and three miRNAs were conducted and achieved satisfactory performance.
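
The abstract above names Gaussian interaction profile (GIP) kernel similarity as one of the node representations but does not give its formula. The following is a minimal sketch of the commonly used definition, K(i, j) = exp(-gamma * ||IP(i) - IP(j)||^2), with gamma normalised by the mean squared profile norm. The function name `gip_kernel`, the parameter `gamma_prime` and the toy matrix are assumptions made for illustration; they are not taken from MDA-GCNFTG.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile (GIP) kernel similarity between rows.

    profiles: list of equal-length 0/1 lists; row i is the interaction
    profile of entity i (for example, one miRNA against all diseases).
    gamma_prime: hypothetical bandwidth parameter (assumed default 1.0).
    """
    sq_norms = [sum(v * v for v in row) for row in profiles]
    mean_sq_norm = sum(sq_norms) / len(profiles)
    if mean_sq_norm == 0:
        raise ValueError("at least one known association is required")
    # Bandwidth normalised by the average number of known associations.
    gamma = gamma_prime / mean_sq_norm
    n = len(profiles)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = sim[j][i] = math.exp(-gamma * sq_dist)
    return sim


# Toy association matrix (made-up data): 3 miRNAs x 4 diseases.
toy = [
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
]
mirna_sim = gip_kernel(toy)
```

Rows with identical interaction profiles receive similarity 1, and similarity decays as profiles diverge. Transposing the association matrix would give the corresponding disease-side similarity.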


Subject(s)
Biomarkers , Computational Biology/methods , Disease Susceptibility , Gene Expression Regulation , MicroRNAs/genetics , Software , Algorithms , Databases, Genetic , Humans , Reproducibility of Results , Workflow
19.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34396388

ABSTRACT

Neuropeptides, which act as signaling molecules in the nervous system of various animals, play crucial roles in a wide range of physiological functions and hormone regulation behaviors. Neuropeptides offer many opportunities for the discovery of new drugs and targets for the treatment of neurological diseases. In recent years, several data-driven computational predictors have been developed for various types of bioactive peptides, but little work has so far addressed neuropeptides. In this work, we developed an interpretable stacking model, named NeuroPpred-Fuse, for the prediction of neuropeptides by fusing a variety of sequence-derived features and feature selection methods. Specifically, we used six types of sequence-derived features to encode the peptide sequences and then combined them. In the first layer, we ensembled three base classifiers and four feature selection algorithms, which select non-redundant important features complementarily. In the second layer, the outputs of the first layer were merged and fed into a logistic regression (LR) classifier to train the model. Moreover, we analyzed the selected features and explained their feasibility. Experimental results show that our model achieved 90.6% accuracy and 95.8% AUC on the independent test set, outperforming the state-of-the-art models. In addition, we presented the distribution of the features selected by these tree models and compared the results on the training set with those on the test set. These results showed that our model has a certain generalization ability. Therefore, we expect that our model will provide important advances in the discovery of neuropeptides as new drugs for the treatment of neurological diseases.


Subject(s)
Models, Biological , Neuropeptides/chemistry , Algorithms , Computational Biology/methods , Machine Learning
20.
J Med Virol ; 95(2): e28542, 2023 02.
Article in English | MEDLINE | ID: mdl-36727647

ABSTRACT

The emergence of variants with immune evasion potential, particularly the current Omicron subvariants, has further intensified the ongoing pandemic. Although vaccines are available, the immune evasion capabilities of the recent variants demand further efficient therapeutic choices to control the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic. Hence, considering the need for a small-molecule inhibitor, we target the main protease (3CLpro), which is an appealing target for the development of antiviral drugs against SARS-CoV-2. High-throughput in silico molecular screening of the South African natural compounds database identified Isojacareubin and Glabranin as potential inhibitors of the main protease. The calculated docking scores were -8.47 and -8.03 kcal/mol, respectively. Moreover, the structural dynamic assessment showed that Isojacareubin in complex with 3CLpro exhibits more stable dynamic behavior than Glabranin. An inhibition assay indicated that Isojacareubin could inhibit SARS-CoV-2 3CLpro in a time- and dose-dependent manner, with a half-maximal inhibitory concentration value of 16.00 ± 1.35 µM (60 min incubation). Next, the covalent binding sites of Isojacareubin on SARS-CoV-2 3CLpro were identified by biomass spectrometry, which showed that Isojacareubin can covalently bind to thiols or cysteine through Michael addition. To evaluate the inactivation potency of Isojacareubin, the inactivation kinetics were further investigated. The inactivation kinetic curves were plotted according to various concentrations with gradient-ascending incubation times. The KI value of Isojacareubin was determined as 30.71 µM, whereas the Kinact value was calculated as 0.054 min⁻¹. These results suggest that Isojacareubin is a covalent inhibitor of SARS-CoV-2 3CLpro.
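
The abstract reports KI and Kinact but not the model used to fit them. As an illustration only, under the assumption of the standard two-step scheme for irreversible inhibitors (reversible binding followed by covalent inactivation), the observed inactivation rate and the second-order efficiency constant implied by the reported values would be:

```latex
% Assumed two-step inactivation model; the abstract does not state which model was fitted.
\[
  k_{\mathrm{obs}} = \frac{k_{\mathrm{inact}}\,[I]}{K_I + [I]}
\]
% Ratio computed from the values reported in the abstract.
\[
  \frac{k_{\mathrm{inact}}}{K_I}
  = \frac{0.054\ \mathrm{min^{-1}}}{30.71\ \mu\mathrm{M}}
  \approx 1.76 \times 10^{-3}\ \mu\mathrm{M^{-1}\,min^{-1}}
  \approx 29\ \mathrm{M^{-1}\,s^{-1}}
\]
```

Here [I] denotes the inhibitor concentration. The ratio is a derived figure, not a value stated in the abstract.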


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , Coronavirus 3C Proteases , Protease Inhibitors/chemistry , Molecular Docking Simulation , Antiviral Agents/pharmacology