Results 1 - 11 of 11
1.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38600667

ABSTRACT

Human leukocyte antigen (HLA) recognizes foreign threats and triggers immune responses by presenting peptides to T cells. Computationally modeling the binding patterns between peptides and HLA is very important for the development of tumor vaccines. However, accurately predicting which peptides bind to HLA molecules remains a major challenge. In this paper, we develop a new model, TripHLApan, for predicting peptide binding to HLA molecules by integrating a triple coding matrix, BiGRU + Attention models, and a transfer learning strategy. We have found the main interaction site regions between HLA molecules and peptides, as well as the correlation between HLA encoding and binding motifs. Based on this discovery, we make the preprocessing and coding closer to the natural biological process. In addition, because the input is based on multiple types of features and the attention module is focused on the BiGRU hidden layer, TripHLApan learns more sequence-level binding information. The application of transfer learning strategies ensures the accuracy of prediction results for special lengths (peptides of length 8) and model scalability as data volumes grow. Compared with the current optimal models, TripHLApan exhibits strong predictive performance in various prediction environments with different positive and negative sample ratios. In addition, we validate the superiority and scalability of TripHLApan's predictive performance using additional recent data sets, ablation experiments and binding reconstitution ability in samples from a melanoma patient. The results show that TripHLApan is a powerful tool for predicting the binding of peptides to HLA-I and HLA-II molecules for the synthesis of tumor vaccines. TripHLApan is publicly available at https://github.com/CSUBioGroup/TripHLApan.git.


Subjects
Cancer Vaccines, Humans, Protein Binding, Peptides/chemistry, HLA Antigens/chemistry, Histocompatibility Antigens Class II/chemistry, Histocompatibility Antigens Class I/chemistry, Machine Learning
2.
Bioinformatics ; 39(9)2023 09 02.
Article in English | MEDLINE | ID: mdl-37606993

ABSTRACT

MOTIVATION: Cancer heterogeneity drastically affects cancer therapeutic outcomes. Predicting drug response in vitro is expected to help formulate personalized therapy regimens. In recent years, several computational models based on machine learning and deep learning have been proposed to predict drug response in vitro. However, most of these methods capture drug features based on a single drug description (e.g. drug structure), without considering the relationships between drugs and biological entities (e.g. target, diseases, and side effects). Moreover, most of these methods collect features separately for drugs and cell lines but fail to consider the pairwise interactions between drugs and cell lines. RESULTS: In this paper, we propose a deep learning framework, named MSDRP for drug response prediction. MSDRP uses an interaction module to capture interactions between drugs and cell lines, and integrates multiple associations/interactions between drugs and biological entities through similarity network fusion algorithms, outperforming some state-of-the-art models in all performance measures for all experiments. The experimental results of de novo test and independent test demonstrate the excellent performance of our model for new drugs. Furthermore, several case studies illustrate the rationality for using feature vectors derived from drug similarity matrices from multisource data to represent drugs and the interpretability of our model. AVAILABILITY AND IMPLEMENTATION: The codes of MSDRP are available at https://github.com/xyzhang-10/MSDRP.


Subjects
Deep Learning, Drug-Related Side Effects and Adverse Reactions, Humans, Algorithms, Cell Line, Machine Learning
3.
Bioinformatics ; 39(39 Suppl 1): i368-i376, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387178

ABSTRACT

MOTIVATION: Single-cell RNA sequencing (scRNA-seq) offers a powerful tool to dissect the complexity of biological tissues through cell sub-population identification in combination with clustering approaches. Feature selection is a critical step for improving the accuracy and interpretability of single-cell clustering. Existing feature selection methods underutilize the discriminatory potential of genes across distinct cell types. We hypothesize that incorporating such information could further boost the performance of single cell clustering. RESULTS: We develop CellBRF, a feature selection method that considers genes' relevance to cell types for single-cell clustering. The key idea is to identify genes that are most important for discriminating cell types through random forests guided by predicted cell labels. Moreover, it proposes a class balancing strategy to mitigate the impact of unbalanced cell type distributions on feature importance evaluation. We benchmark CellBRF on 33 scRNA-seq datasets representing diverse biological scenarios and demonstrate that it substantially outperforms state-of-the-art feature selection methods in terms of clustering accuracy and cell neighborhood consistency. Furthermore, we demonstrate the outstanding performance of our selected features through three case studies on cell differentiation stage identification, non-malignant cell subtype identification, and rare cell identification. CellBRF provides a new and effective tool to boost single-cell clustering accuracy. AVAILABILITY AND IMPLEMENTATION: All source codes of CellBRF are freely available at https://github.com/xuyp-csu/CellBRF.


Subjects
Benchmarking, Random Forest, Cell Differentiation, Cluster Analysis
4.
IEEE/ACM Trans Comput Biol Bioinform ; 18(6): 2353-2363, 2021.
Article in English | MEDLINE | ID: mdl-32248123

ABSTRACT

A growing amount of evidence suggests that long non-coding RNAs (lncRNAs) play important roles in the regulation of biological processes in many human diseases. However, the number of experimentally verified lncRNA-disease associations is very limited. Thus, various computational approaches are proposed to predict lncRNA-disease associations. Current matrix factorization-based methods cannot capture the complex non-linear relationship between lncRNAs and diseases, and traditional machine learning-based methods are not sufficiently powerful to learn the representation of lncRNAs and diseases. Considering these limitations in existing computational methods, we propose a deep matrix factorization model to predict lncRNA-disease associations (DMFLDA in short). DMFLDA uses a cascade of non-linear hidden layers to learn latent representation to represent lncRNAs and diseases. By using non-linear hidden layers, DMFLDA captures the more complex non-linear relationship between lncRNAs and diseases than traditional matrix factorization-based methods. In addition, DMFLDA learns features directly from the lncRNA-disease interaction matrix and thus can obtain more accurate representation learning for lncRNAs and diseases than traditional machine learning methods. The low dimensional representations of the lncRNAs and diseases are fused to estimate the new interaction value. To evaluate the performance of DMFLDA, we perform leave-one-out cross-validation and 5-fold cross-validation on known experimentally verified lncRNA-disease associations. The experimental results show that DMFLDA performs better than the existing methods. The case studies show that many predicted interactions of colorectal cancer, prostate cancer, and renal cancer have been verified by recent biomedical literature. The source code and datasets can be obtained from https://github.com/CSUBioGroup/DMFLDA.


Subjects
Computational Biology/methods, Deep Learning, Neoplasms/genetics, Long Noncoding RNA/genetics, Genetic Predisposition to Disease/genetics, Humans, Neoplasms/metabolism, Long Noncoding RNA/metabolism, Transcriptome/genetics
5.
Article in English | MEDLINE | ID: mdl-31150344

ABSTRACT

High-throughput screening technologies have provided a large amount of drug sensitivity data for a panel of cancer cell lines and hundreds of compounds. Computational approaches to analyzing these data can benefit anticancer therapeutics by identifying molecular genomic determinants of drug sensitivity and developing new anticancer drugs. In this study, we have developed a deep learning architecture to improve the performance of drug sensitivity prediction based on these data. We integrated both genomic features of cell lines and chemical information of compounds to predict the half maximal inhibitory concentrations (IC50) on the Cancer Cell Line Encyclopedia (CCLE) and the Genomics of Drug Sensitivity in Cancer (GDSC) datasets using a deep neural network, which we called DeepDSC. Specifically, we first applied a stacked deep autoencoder to extract genomic features of cell lines from gene expression data, and then combined the compounds' chemical features with these genomic features to produce final response data. We conducted 10-fold cross-validation to demonstrate the performance of our deep model in terms of root-mean-square error (RMSE) and coefficient of determination (R²). We show that our model outperforms the previous approaches, with an RMSE of 0.23 and R² of 0.78 on the CCLE dataset, and an RMSE of 0.52 and R² of 0.78 on the GDSC dataset. Moreover, to demonstrate the prediction ability of our models on novel cell lines or novel compounds, we left out cell lines originating from the same tissue and each compound, respectively, as the test sets, and used the rest as training sets. The performance was comparable to other methods.
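
The two metrics named in this abstract, RMSE and the coefficient of determination, can be computed as in the minimal sketch below. This is an illustration only and not the authors' evaluation code; the function names `rmse` and `r_squared` are assumptions introduced for this example.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

A perfect predictor gives an RMSE of 0 and an R² of 1, while a predictor that always returns the mean of the observed values gives an R² of 0.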


Subjects
Antineoplastic Agents/pharmacology, Tumor Cell Line/drug effects, Deep Learning, Antineoplastic Drug Resistance, Statistical Models, High-Throughput Screening Assays, Humans
6.
Methods ; 179: 73-80, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32387314

ABSTRACT

In recent years, accumulating studies have shown that long non-coding RNAs (lncRNAs) not only play an important role in the regulation of various biological processes but also are the foundation for understanding mechanisms of human diseases. Due to the high cost of traditional biological experiments, the number of experimentally verified lncRNA-disease associations is very limited. Thus, many computational approaches have been proposed to discover the underlying associations between lncRNAs and diseases. However, the associations between lncRNAs and diseases are too complicated to model by using only traditional matrix factorization-based methods. In this study, we propose a hybrid computational framework (SDLDA) for lncRNA-disease association prediction. In our computational framework, we use singular value decomposition and deep learning to extract linear and non-linear features of lncRNAs and diseases, respectively. Then we train SDLDA by combining the linear and non-linear features. Compared to previous computational methods, the linear and non-linear features reinforce each other, and their combination is better than using either matrix factorization or deep learning alone. The computational results show that SDLDA performs better than existing methods in leave-one-out cross-validation. Furthermore, the case studies show that 28 out of 30 cancer-related lncRNAs (10 for gastric cancer, 10 for colon cancer and 8 for renal cancer) are verified by mining recent biomedical literature. Code and data can be accessed at https://github.com/CSUBioGroup/SDLDA.


Subjects
Computational Biology/methods, Deep Learning, Genetic Association Studies/methods, Long Noncoding RNA/metabolism, Data Mining/methods, Genetic Databases, Datasets as Topic, Gene Expression Regulation, Genetic Predisposition to Disease, Humans, Neoplasms/genetics, Long Noncoding RNA/genetics
7.
IEEE J Biomed Health Inform ; 24(8): 2420-2429, 2020 08.
Article in English | MEDLINE | ID: mdl-31825885

ABSTRACT

Recently, increasing evidence has revealed that dysregulation of long non-coding RNAs (lncRNAs) is relevant to diverse diseases. However, the number of experimentally verified lncRNA-disease associations is limited. Prioritizing potential associations is beneficial not only for disease diagnosis and treatment but, more importantly, for understanding disease mechanisms at the lncRNA level. Various computational methods have been proposed, but precise prediction and full use of the data's intrinsic structure are still challenging. In this work, we design a new method, named GMCLDA (Geometric Matrix Completion lncRNA-Disease Association), to infer underlying associations based on geometric matrix completion. Utilizing association patterns among functionally similar lncRNAs and phenotypically similar diseases, GMCLDA makes use of the intrinsic structure embedded in the association matrix. In addition, limiting the scope of the predicted values gives rise to a certain sparsity in computation and enhances the robustness of GMCLDA. GMCLDA computes disease semantic similarity according to the Disease Ontology (DO) hierarchy and lncRNA Gaussian interaction profile kernel similarity according to known interaction profiles. Then, GMCLDA measures lncRNA sequence similarity using the Needleman-Wunsch algorithm. For a new lncRNA, GMCLDA prefills the interaction profile based on its K-nearest neighbors defined by sequence similarity. Finally, GMCLDA estimates the missing entries of the association matrix based on the geometric matrix completion model. Compared with state-of-the-art methods, GMCLDA provides more accurate lncRNA-disease prediction. Further case studies show that GMCLDA is able to correctly infer possible lncRNAs for renal cancer.
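
The Needleman-Wunsch step mentioned in this abstract is a global alignment by dynamic programming. A minimal sketch of the alignment score follows. The scoring values (`match=1`, `mismatch=-1`, `gap=-1`) and the function name are assumptions chosen for illustration; the abstract does not state which scoring scheme GMCLDA uses or how the score is normalized into a similarity.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of sequences a and b.

    Uses a linear gap penalty and keeps only two rows of the
    dynamic-programming table. Scoring values are illustrative.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("GATTACA", "GCATGCG"))
```

Under this scoring, identical sequences score their own length, and each unavoidable gap or mismatch lowers the score by one.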


Subjects
Computational Biology/methods, Genetic Predisposition to Disease, Long Noncoding RNA/genetics, Algorithms, Factual Databases, Female, Genetic Predisposition to Disease/epidemiology, Genetic Predisposition to Disease/genetics, Humans, Male, Medical Informatics, Neoplasms/epidemiology, Neoplasms/genetics
8.
Article in English | MEDLINE | ID: mdl-28541220

ABSTRACT

A cellular signal transduction network is an important means to describe biological responses to environmental stimuli and the exchange of biological signals. Constructing the cellular signal transduction network provides an important basis for the study of biological activities, disease mechanisms, drug targets and so on. Statistical approaches to network inference are popular in the literature. The Granger test has been used as an effective method for causality inference. Compared with bivariate Granger tests, multivariate Granger tests reduce indirect causality and have been widely used for the construction of cellular signal transduction networks. A multivariate Granger test requires that the number of time points in the time-series data be greater than the number of nodes involved in the network. However, many real datasets have only a few time points, far fewer than the number of nodes in the network. In this study, we propose a new multivariate Granger test-based framework to construct cellular signal transduction networks, called MGT-SM. MGT-SM uses SVD to compute the coefficient matrix from gene expression data and adopts Monte Carlo simulation to estimate the significance of directed edges in the constructed networks. We apply the proposed MGT-SM to the Yeast Synthetic Network and MDA-MB-468, and evaluate its performance in terms of recall and AUC. The results show that MGT-SM achieves better results than other popular methods (CGC2SPR, PGC, and DBN).


Subjects
Computational Biology/methods, Gene Expression Profiling/methods, Algorithms, Computer Simulation, Humans, Linear Models, Monte Carlo Method, Neoplasms/genetics, Signal Transduction/genetics, Yeasts/genetics
9.
Bioinformatics ; 34(19): 3357-3364, 2018 10 01.
Article in English | MEDLINE | ID: mdl-29718113

ABSTRACT

Motivation: Accumulating evidence indicates that long non-coding RNAs (lncRNAs) play pivotal roles in various biological processes. Mutations and dysregulation of lncRNAs are implicated in a wide range of human diseases. Predicting lncRNA-disease associations is beneficial to disease diagnosis as well as treatment. Although many computational methods have been developed, precisely identifying lncRNA-disease associations, especially for novel lncRNAs, remains challenging. Results: In this study, we propose a method (named SIMCLDA) for predicting potential lncRNA-disease associations based on inductive matrix completion. We compute the Gaussian interaction profile kernel of lncRNAs from known lncRNA-disease interactions and the functional similarity of diseases based on disease-gene and gene-gene ontology associations. Then, we extract primary feature vectors from the Gaussian interaction profile kernel of lncRNAs and from the functional similarity of diseases by principal component analysis. For a new lncRNA, we calculate the interaction profile according to the interaction profiles of its neighbors. Finally, we complete the association matrix based on the inductive matrix completion framework using the primary feature vectors from the constructed feature matrices. Computational results show that SIMCLDA can effectively predict lncRNA-disease associations with higher accuracy than previous methods. Furthermore, case studies show that SIMCLDA can effectively predict candidate lncRNAs for renal cancer, gastric cancer and prostate cancer. Availability and implementation: https://github.com//bioinfomaticsCSU/SIMCLDA. Supplementary information: Supplementary data are available at Bioinformatics online.
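
The Gaussian interaction profile kernel mentioned in this abstract can be sketched as follows. The code uses a formulation that is common in the association-prediction literature, in which the bandwidth is normalized by the mean squared norm of the profiles. That normalization, the `gamma_prime` parameter and the function name `gip_kernel` are assumptions for illustration; the abstract does not give SIMCLDA's exact formula.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between rows of a binary matrix.

    profiles[i] is the interaction profile of entity i, for example the
    0/1 associations of one lncRNA with every disease.
    K[i][j] = exp(-gamma * ||profiles[i] - profiles[j]||^2), where gamma is
    gamma_prime divided by the mean squared norm of all profiles
    (an assumed, commonly used normalization).
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm

    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    return [
        [math.exp(-gamma * sq_dist(profiles[i], profiles[j])) for j in range(n)]
        for i in range(n)
    ]
```

Entities with identical profiles get a similarity of 1, and the similarity decays toward 0 as the profiles diverge.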


Subjects
Long Noncoding RNA/genetics, Algorithms, Humans, Kidney Neoplasms/genetics, Software, Stomach Neoplasms/genetics
10.
BMC Bioinformatics ; 14 Suppl 13: S9, 2013.
Article in English | MEDLINE | ID: mdl-24267383

ABSTRACT

BACKGROUND: Disulfide bonds play an important role in protein folding and structure stability. Accurately predicting disulfide bonds from protein sequences is important for modeling the structural and functional characteristics of many proteins. METHODS: In this work, we introduce an approach to enhancing disulfide bonding prediction accuracy by taking advantage of context-based features. We first derive the first-order and second-order mean-force potentials according to the amino acid environment around the cysteine residues from a large number of cysteine samples. The mean-force potentials are integrated as context-based scores to estimate the favorability of a cysteine residue in the disulfide bonding state as well as of a cysteine pair in disulfide bond connectivity. These context-based scores are then incorporated as features together with other sequence and evolutionary information to train neural networks for disulfide bonding state prediction and connectivity prediction. RESULTS: The 10-fold cross-validated accuracy is 90.8% at the residue level and 85.6% at the protein level in classifying an individual cysteine residue as bonded or free, an improvement of around 2% in accuracy. The average accuracy for disulfide bonding connectivity prediction is also improved, yielding an overall sensitivity of 73.42% and specificity of 91.61%. CONCLUSIONS: Our computational results have shown that the context-based scores are effective features to enhance the prediction accuracies of both disulfide bonding state prediction and connectivity prediction. Our disulfide prediction algorithm is implemented on a web server named "Dinosolve" available at: http://hpcr.cs.odu.edu/dinosolve.
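
Sensitivity and specificity for connectivity prediction can be illustrated over cysteine pairs, as in the sketch below. Treating every unordered pair of cysteines in a protein as one candidate is an assumption made for this example, as are the function name `pair_metrics` and its arguments; the abstract does not specify how the reported values were tallied.

```python
from itertools import combinations


def pair_metrics(cysteine_positions, true_bonds, predicted_bonds):
    """Sensitivity and specificity over all unordered cysteine pairs.

    Each candidate pair is a positive if it is a true disulfide bond
    and a negative otherwise (illustrative convention).
    """
    all_pairs = {frozenset(p) for p in combinations(cysteine_positions, 2)}
    true_set = {frozenset(p) for p in true_bonds}
    pred_set = {frozenset(p) for p in predicted_bonds}

    tp = len(true_set & pred_set)               # bonded, predicted bonded
    fn = len(true_set - pred_set)               # bonded, missed
    fp = len(pred_set - true_set)               # not bonded, predicted bonded
    tn = len(all_pairs - true_set - pred_set)   # not bonded, not predicted

    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity
```

With six cysteines there are 15 candidate pairs but at most three bonds, so negatives dominate, which is one reason specificity tends to be much higher than sensitivity in this setting.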


Subjects
Amino Acid Sequence, Cysteine/classification, Disulfides, Computer Neural Networks, Protein Folding, Algorithms, Disulfides/chemistry, Disulfides/metabolism, Predictive Value of Tests
11.
Proteins ; 52(3): 339-48, 2003 Aug 15.
Article in English | MEDLINE | ID: mdl-12866048

ABSTRACT

Predicting the long-time, nonequilibrium dynamics of receptor-ligand interactions for structured proteins in a host fluid is a formidable task, but of great importance to predicting and analyzing cell-signaling processes and small molecule drug efficacies. Such processes take place on timescales on the order of milliseconds to seconds, so "brute-force" real-time, molecular or atomic simulations to determine absolute ligand-binding rates to receptor targets and over a statistical ensemble of systems are not currently feasible. In the current study, we implement on real protein systems a previously developed hybrid molecular dynamics/Brownian dynamics algorithm (refs. 3-5), which takes advantage of the underlying, disparate timescales involved and overcomes the limitations of brute-force approaches. The algorithm is based on a multiple timescale analysis of the total system Hamiltonian, including all atomic and molecular structure information for the system: water, ligand, and receptor. In general, the method can account for the complex hydrodynamic, translational-orientational diffusion aspects of ligand-docking dynamics as well as predict the actual or absolute rates of ligand binding. To test some of the underlying features of the method, simulations were conducted here for an artificially constructed spherical protein "made" from the real protein insulin. Excellent comparisons of simulation calculations of the so-called grand particle friction tensor to analytical values were obtained for this system when protein charge effects were neglected. When protein charges were included, we found anomalous results caused by the alteration of the spatial, microscopic structure of water proximal to the protein surface. Protein charge effects were found to be highly significant and consistent with the recent hypothesis of Hoppert and Mayer (Am Sci 1999;87:518-525) for charged macromolecules in water, which involves the formation of a "water dense region" proximal to the charged protein surface followed by a "dilute water region." We further studied the algorithm on a D-peptide/HIV capsid protein system and demonstrated the algorithm's utility for studying the nonequilibrium docking dynamics in this contemporary problem. In general, protein charge effects, which alter water structural properties in an anomalous fashion proximal to the protein surface, were found to be much more important than the so-called hydrodynamic interaction effects between ligand and receptor. The diminished role of hydrodynamic interactions in protein systems allows for a much simpler overall dynamic algorithm for the nonequilibrium protein-docking process. Further studies are now underway to critically examine this simpler overall algorithm in analyzing the nonequilibrium protein-docking problem.


Subjects
Algorithms, Cell Surface Receptors/chemistry, Competitive Binding, Computer Simulation, HIV Envelope Protein gp41/chemistry, HIV Envelope Protein gp41/metabolism, Insulin/chemistry, Insulin/metabolism, Kinetics, Ligands, Molecular Models, Peptides/chemistry, Peptides/metabolism, Proteins/chemistry, Proteins/metabolism, Cell Surface Receptors/metabolism, Time Factors, Water/chemistry