1.
Front Genet ; 13: 859626, 2022.
Article in English | MEDLINE | ID: mdl-35571037

ABSTRACT

Predicting peptide inter-residue contact maps plays an important role in computational biology, because the contact map determines the topology of the peptide structure. However, because few homologous structures are known, inter-residue contact map prediction still has much room for improvement. Current models cannot capture the relationships between residues with high accuracy, especially for long-range residue pairs. In this article, we developed a novel deep neural network framework that refines the rough contact map produced by existing methods. The rough contact map is used to construct a residue graph that is processed by a graph convolutional neural network (GCN). Because a GCN captures global information well, it is used to model long-range contact relationships. A residual convolutional neural network is also included in the framework to learn local information. We conducted experiments on four different test datasets, and the accuracy of long-range inter-residue contact map prediction demonstrates the effectiveness of the proposed method.
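The GCN refinement step described above can be sketched as follows. This is a minimal illustration, not the authors' framework: it assumes the rough contact map is a symmetric matrix of contact probabilities, binarizes it with a hypothetical `threshold`, and applies one standard graph-convolution layer, ReLU(D^-1/2 (A + I) D^-1/2 H W). The residual CNN branch is omitted, and the names `contact_map_to_adjacency`, `gcn_layer`, `threshold` and `min_separation` are illustrative.

```python
import math


def contact_map_to_adjacency(contact_probs, threshold=0.5, min_separation=1):
    """Turn a rough contact-probability map into a 0/1 residue adjacency matrix.

    Residues i and j are linked when their predicted contact probability is at
    least `threshold` (hypothetical cut-off) and |i - j| >= `min_separation`
    (hypothetical; the default of 1 only drops the diagonal).
    """
    n = len(contact_probs)
    return [
        [1.0 if abs(i - j) >= min_separation and contact_probs[i][j] >= threshold else 0.0
         for j in range(n)]
        for i in range(n)
    ]


def gcn_layer(adj, features, weights):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    in_dim, out_dim = len(weights), len(weights[0])
    # Add self-loops so every residue also keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization (assumes a symmetric contact map).
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate neighbour features: norm @ H.
    agg = [[sum(norm[i][k] * features[k][f] for k in range(n)) for f in range(in_dim)]
           for i in range(n)]
    # Linear transform agg @ W, then ReLU.
    return [[max(0.0, sum(agg[i][f] * weights[f][o] for f in range(in_dim)))
             for o in range(out_dim)]
            for i in range(n)]
```

For example, with three residues where only residues 0 and 2 are predicted to be in contact, residues 0 and 2 average each other's features while residue 1 keeps its own, which is how a graph layer lets distant-in-sequence contacts exchange information in one step.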

2.
Interdiscip Sci ; 14(4): 937-946, 2022 Dec.
Article in English | MEDLINE | ID: mdl-35713780

ABSTRACT

Protein subcellular localization prediction is an important research area in bioinformatics and plays an essential role in understanding protein function and mechanism. Many machine learning and deep learning algorithms have been applied to this task, but most of them do not use the structural information of proteins. With recent advances in protein structure research, protein contact map prediction has improved dramatically. In this paper, we present GraphLoc, a deep learning model that predicts the subcellular localization of proteins. The core of the model consists of a graph convolutional neural network (GCN) module and a multi-head attention module. The protein topology graph is constructed from a contact map predicted from the protein sequence and serves as the input to the GCN module, taking full advantage of protein structural information. The multi-head attention module learns the weighted contribution of different amino acids to subcellular localization in different feature representation subspaces. Experiments on the benchmark dataset show that our model outperforms other methods. The code can be accessed at https://github.com/GoodGuy398/GraphLoc . The proposed GraphLoc model consists of three parts. The first part is a graph convolutional network (GCN) module, which uses the predicted contact maps to construct the protein graph and thereby exploits protein structural information. The second part is the multi-head attention module, which learns the weighted contribution of different amino acids in different feature representation subspaces and computes a weighted average of the feature maps across all amino acid nodes. The last part is a fully connected layer that maps the flattened graph representation vector to a vector whose dimension equals the number of categories, followed by a softmax layer that predicts the protein subcellular localization.


Subjects
Computational Biology , Neural Networks, Computer , Computational Biology/methods , Proteins/chemistry , Machine Learning , Amino Acids
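The attention-pooling and classification stages described for GraphLoc can be sketched as below. This is a simplified illustration under stated assumptions, not code from the GraphLoc repository: each head scores nodes with a hypothetical learned vector (dot-product scoring is an assumption), the per-head weighted averages are concatenated into a flat graph vector, and a fully connected layer with softmax yields one probability per localization class. All weights are placeholders, and `multi_head_attention_pool` and `classify` are illustrative names.

```python
import math


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_attention_pool(node_feats, head_vectors):
    """Pool per-residue features into one flat graph vector.

    Each head (a hypothetical learned vector) scores every node by dot product,
    softmax over nodes gives that head's weights, and the head outputs a
    weighted average of node features. Head outputs are concatenated.
    """
    dim = len(node_feats[0])
    pooled = []
    for w in head_vectors:
        scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in node_feats]
        alpha = softmax(scores)
        pooled.extend(sum(a * x[d] for a, x in zip(alpha, node_feats)) for d in range(dim))
    return pooled


def classify(graph_vec, fc_weights, fc_bias):
    """Fully connected layer (one row per class) followed by softmax."""
    logits = [sum(w * v for w, v in zip(row, graph_vec)) + b
              for row, b in zip(fc_weights, fc_bias)]
    return softmax(logits)
```

With a zero head vector the weights are uniform, so that head reduces to plain mean pooling; a head vector that favours one feature concentrates the weight on the nodes where that feature is large. Using several heads lets the pooled vector combine such different views.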
3.
J Bioinform Comput Biol ; 20(1): 2150032, 2022 02.
Article in English | MEDLINE | ID: mdl-34775920

ABSTRACT

Proteins are engines involved in almost all functions of life. They have specific spatial structures formed by the twisting and folding of one or more polypeptide chains composed of amino acids. Protein sites are microenvironments within a protein structure that can be identified by their three-dimensional locations and the local neighborhoods in which a structure or function exists. Understanding amino acid environment affinity is essential for further structural or functional studies of proteins, such as mutation analysis and functional site detection. In this study, we developed an amino acid environment affinity model based on a graph attention network. First, we constructed a protein graph according to the distances between amino acid pairs. Then, we extracted a set of structural features for each node. Finally, the protein graph and its node feature set were fed into the graph attention network model to obtain the amino acid affinities. Numerical results show that the proposed method significantly outperforms a recent 3DCNN-based method, by almost 30%.


Subjects
Amino Acids , Neural Networks, Computer , Peptides , Proteins/chemistry
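The graph-construction step described above, plus the attention coefficients a graph attention network computes on that graph, can be sketched as follows. The 8 Å cut-off, the use of one coordinate per residue, and the names `build_distance_graph` and `gat_attention` are assumptions for illustration; the paper's exact distance criterion and node features are not given here. The coefficients follow the standard GAT form alpha_ij = softmax_j(LeakyReLU(a . [h_i || h_j])), with GAT's shared linear projection omitted for brevity.

```python
import math


def build_distance_graph(coords, cutoff=8.0):
    """Neighbour lists: residues i != j are linked if their (assumed) single
    coordinates, e.g. C-alpha positions, lie within `cutoff` angstroms."""
    n = len(coords)
    neighbours = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                neighbours[i].append(j)
                neighbours[j].append(i)
    return neighbours


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_attention(node, neighbours, feats, a_vec):
    """Attention weights of `node` over itself and its neighbours.

    Score for neighbour j: LeakyReLU(a . [h_node || h_j]); the scores are then
    softmax-normalized over the node's neighbourhood. The projection W of the
    original GAT is taken as the identity here (a simplification).
    """
    candidates = [node] + neighbours[node]
    scores = [leaky_relu(sum(a * v for a, v in zip(a_vec, feats[node] + feats[j])))
              for j in candidates]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return {j: e / total for j, e in zip(candidates, exps)}
```

In this toy setting, two residues 3.8 Å apart become neighbours while a residue 20 Å away stays isolated, so its attention falls entirely on itself.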
4.
Bioinformatics ; 25(20): 2708-14, 2009 Oct 15.
Article in English | MEDLINE | ID: mdl-19661242

ABSTRACT

MOTIVATION: Mislabeled samples often appear in gene expression profiles because of the similarity between different subtypes of a disease and subjective misdiagnosis. Mislabeled samples degrade supervised learning procedures. The LOOE-sensitivity algorithm is a data-perturbation approach to detecting mislabeled microarray samples. However, because it does not measure the effect of the perturbation, LOOE-sensitivity performs poorly. The purpose of this article is to design a novel method for detecting mislabeled microarray samples that takes advantage of measuring the effect of data perturbations. RESULTS: To measure the effect of data perturbation, we define an index named the perturbing influence value (PIV), based on the support vector machine (SVM) regression model. The Column Algorithm (CAPIV), Row Algorithm (RAPIV) and progressive Row Algorithm (PRAPIV), all based on the PIV, are proposed to detect mislabeled samples. Experimental results on six artificial datasets and five microarray datasets demonstrate that all the proposed methods are superior to LOOE-sensitivity. Moreover, compared with simple SVM and CL-stability, the PRAPIV algorithm shows higher precision and high recall. AVAILABILITY: The program and source code (in Java) are publicly available at http://ccst.jlu.edu.cn/CSBG/PIVS/index.htm


Subjects
Algorithms , Computational Biology/methods , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Databases, Factual , Pattern Recognition, Automated
5.
BMC Med Genomics ; 12(Suppl 1): 14, 2019 01 31.
Article in English | MEDLINE | ID: mdl-30704464

ABSTRACT

BACKGROUND: Many studies have addressed the selection of gene signatures that distinguish cancer patients from normal individuals. However, how to extract robust gene features is still an open question. METHODS: In this work, we propose a gene signature selection strategy for TCGA data that integrates gene expression data, methylation data and prior knowledge about cancer biomarkers. Unlike traditional integration methods, it uses expanded 450 K methylation data instead of the original 450 K array data, and it weights reported biomarkers during feature selection. A fuzzy rule-based classification method and a cross-validation strategy were applied in model construction for performance evaluation. RESULTS: The selected gene features showed prediction accuracy close to 100% in cross-validation with the fuzzy rule-based classification model on 6 cancers from TCGA. The cross-validation performance of the proposed model is similar to that of other integrative models or an RNA-seq-only model, while its prediction performance on independent data is clearly better than that of the other 5 models. The gene signatures extracted with our fuzzy rule-based integrative feature selection strategy were more robust and have the potential to give better prediction results. CONCLUSION: The results indicate that integrating expanded methylation data covers more genes and has a greater capacity to retrieve signature genes than the original 450 K methylation data. Integrating reported biomarkers is also a promising way to improve performance. The PTCHD3 gene was selected as a discriminating gene in 3 of the 6 cancers, which suggests that it might play an important role in cancer risk and merits further investigation.


Subjects
Computational Biology/methods , Fuzzy Logic , Gene Expression Profiling , Genomics/methods , Neoplasms/genetics , Genome, Human/genetics , Humans
6.
Phys Rev E Stat Nonlin Soft Matter Phys ; 70(1 Pt 2): 016701, 2004.
Article in English | MEDLINE | ID: mdl-15324198

ABSTRACT

Traveling salesman problems (TSP) and generalized traveling salesman problems (GTSP) are two well-known and challenging combinatorial optimization problems with widely diverse application fields. Of the two, GTSP is more complex than TSP. TSP has been studied extensively, but relatively few studies address GTSP, especially its solution using genetic algorithms (GA). In this paper, the structure of the conventional chromosome is generalized into a chromosome termed a generalized chromosome (GC). A genetic scheme named the generalized-chromosome-based genetic algorithm (GCGA) is also presented. The proposed GCGA enables GTSP and TSP to be solved under a uniform algorithm mode. To verify its validity, the proposed algorithm was used to solve forty-one benchmark test problems with known optimal solutions. The test results show that GCGA can solve GTSP directly, without intermediate transformation into a TSP.
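To make the GTSP/TSP relationship concrete, the sketch below evaluates a candidate GTSP tour: cities are partitioned into clusters, and a feasible tour visits exactly one city from each cluster before returning to its start. This is an assumed fitness function of the kind a GA would minimize, not the paper's generalized-chromosome encoding, which is not described here; `gtsp_tour_length` and the Euclidean coordinates are illustrative. With one city per cluster it reduces to ordinary TSP tour length.

```python
import math


def gtsp_tour_length(tour, clusters, coords):
    """Length of a closed GTSP tour, after checking feasibility.

    `clusters` maps a cluster id to a set of city ids; a feasible tour visits
    exactly one city from every cluster. Plain TSP is the special case in
    which every cluster holds a single city.
    """
    visited = {}
    for city in tour:
        owners = [c for c, members in clusters.items() if city in members]
        if len(owners) != 1:
            raise ValueError(f"city {city} must belong to exactly one cluster")
        if owners[0] in visited:
            raise ValueError(f"cluster {owners[0]} visited twice")
        visited[owners[0]] = city
    if set(visited) != set(clusters):
        raise ValueError("every cluster must be visited exactly once")
    # Sum Euclidean edge lengths, closing the loop back to the first city.
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))
```

For example, with clusters A = {0}, B = {1, 2} and C = {3}, choosing city 2 rather than city 1 from cluster B yields a 3-4-5 triangle of total length 12, so a GA searching over GTSP tours must choose both the visiting order and which city represents each cluster.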

7.
Sci China Life Sci ; 57(11): 1121-30, 2014 Nov.
Article in English | MEDLINE | ID: mdl-25234108

ABSTRACT

The availability of a large number of sequenced bacterial genomes facilitates in-depth studies of why genes (operons) in a bacterial genome are globally organized the way they are. We have previously discovered that the (relative) transcription-activation frequencies among different biological pathways encoded in a genome play a dominating role in the global arrangement of operons. One complicating factor in such a study is that some operons may be involved in multiple pathways with different activation frequencies. A quantitative model has been developed that captures this information, which tends to be minimized by the current global arrangement of operons in a bacterial (and archaeal) genome compared with possible alternative arrangements. Here, this model is applied to a collection of 52 closely related E. coli genomes. The study revealed interesting new insights into how bacterial genomes evolve to adapt optimally to their environments: once the utilization, and hence the transcription-activation frequencies, of biological pathways change, the genomes adjust the (relative) genomic locations of the operons encoding those pathways to maintain the above energy-efficiency property. More specifically, we observed that two factors play a key role in the observed commonalities and differences in the genomic organization of genes (and operons) encoding specific pathways across different genomes: the transcription-activation frequencies of pathways relative to those of the other pathways encoded in an organism, and the variation in the activation frequencies of a specific pathway across the related genomes.


Subjects
Computational Biology/methods , Genome, Bacterial , Genomics , Environment , Escherichia coli/genetics , Gene Expression Regulation, Bacterial , Models, Genetic , Oligonucleotide Array Sequence Analysis , Transcriptional Activation , Transcriptome
8.
Int J Data Min Bioinform ; 6(3): 255-71, 2012.
Article in English | MEDLINE | ID: mdl-23155761

ABSTRACT

Differential gene expression may occur in only a subset of the disease group, yet traditional detection of differentially expressed genes is constrained by the assumption that cancer genes are up- or down-regulated in all disease samples compared with normal samples. However, in 2005, Tomlins discussed the situation in which only a subset of disease samples is activated; such samples are often referred to as outliers.


Subjects
Gene Expression , Genomics/methods , Neoplasms/genetics , Gene Expression Profiling/methods , Humans , Neoplasms/diagnosis , Oligonucleotide Array Sequence Analysis , Oncogenes/genetics
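The outlier idea attributed to Tomlins above is commonly associated with Cancer Outlier Profile Analysis (COPA). The sketch below follows the usual description of COPA as an assumption; it is not the method of this article, whose details are not given here. A gene's expression values across samples are median-centered, divided by the median absolute deviation (MAD), and summarized by a high percentile, so a gene strongly activated in only a few samples gets a large score. `copa_score` and the 90th-percentile default are illustrative choices.

```python
import statistics


def copa_score(values, percentile=90):
    """Outlier score for one gene across samples, in the spirit of COPA.

    1. Median-center the expression values.
    2. Divide by the median absolute deviation (MAD).
    3. Return the requested percentile of the transformed values.
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; the gene has no spread to scale by")
    scaled = [(v - med) / mad for v in values]
    # quantiles(n=100) returns the 1st..99th percentile cut points.
    return statistics.quantiles(scaled, n=100, method="inclusive")[percentile - 1]
```

A gene that is high in only two of ten samples (for example, values 1 to 8 followed by 30 and 40) scores far above a gene whose values rise evenly from 1 to 10, even though a test that compares all disease samples against normals could rank the two genes quite differently.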
9.
PLoS One ; 6(5): e20060, 2011.
Article in English | MEDLINE | ID: mdl-21655325

ABSTRACT

BACKGROUND: We propose a non-parametric method, named the Non-Parametric Change Point Statistic (NPCPS), which uses a single equation to detect differential gene expression (DGE) in microarray data. NPCPS is based on change point theory and provides effective DGE detection. METHODOLOGY: NPCPS uses the data distribution of the normal samples as input and detects DGE in the cancer samples by locating the change point of the gene expression profile. The estimated change point position generated by NPCPS enables identification of the samples containing DGE. Monte Carlo simulation and ROC analysis were applied to examine the detection accuracy of NPCPS, and an experiment on real breast cancer microarray data was carried out to compare NPCPS with other methods. CONCLUSIONS: The simulation study indicated that NPCPS was more effective at detecting DGE in a cancer subset than five parametric methods and one non-parametric method. When more than 8 cancer samples contained DGE, the type I error of NPCPS was below 0.01. Experimental results showed both good accuracy and reliability of NPCPS. Of the 30 top genes ranked by NPCPS, 16 have been reported as relevant to cancer. Correlations between the detection results of NPCPS and those of the compared methods were less than 0.05, whereas correlations among the other methods ranged from 0.20 to 0.84. This indicates that NPCPS works on different features and thus identifies DGE from a perspective distinct from that of the other mean- or median-based methods.


Subjects
Gene Expression/genetics , Models, Statistical , Oligonucleotide Array Sequence Analysis/methods , Statistics, Nonparametric , Gene Expression Profiling/methods , Monte Carlo Method
10.
IEEE Trans Syst Man Cybern B Cybern ; 39(4): 910-23, 2009 Aug.
Article in English | MEDLINE | ID: mdl-19380276

ABSTRACT

Ant colony optimization (ACO) has been widely applied to combinatorial optimization problems in recent years. There are few studies, however, on its convergence time, which reflects how many iterations ACO algorithms need to converge to the optimal solution. In this paper, we analyze ACO convergence time based on the absorbing Markov chain model. First, we present a general result for estimating convergence time that reveals the relationship between convergence time and pheromone rate. This general result is then extended to a two-step analysis of convergence time, which includes: 1) the number of iterations the pheromone rate needs to reach the objective value and 2) the convergence time calculated with the objective pheromone rate in expectation. Furthermore, four brief ACO algorithms are investigated as case studies using the proposed theoretical results. Finally, the conclusion of the case studies, that the pheromone rate and its deviation determine the expected convergence time, is numerically verified by experiments with four one-ant ACO algorithms and four ten-ant ACO algorithms.


Subjects
Algorithms , Ants/physiology , Cybernetics/methods , Markov Chains , Pheromones/physiology , Animals , Models, Biological , Models, Statistical
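The absorbing-Markov-chain tool behind such convergence-time estimates can be sketched generically. For an absorbing chain whose transient-to-transient transition matrix is Q, the expected number of steps before absorption from each transient state is t = (I - Q)^-1 1, a standard textbook result; the sketch solves (I - Q) t = 1 directly. Mapping ACO states (for example, pheromone configurations) onto Q, with the optimal solution as the absorbing state, is an assumption for illustration and not the paper's construction; `expected_absorption_time` is an illustrative name.

```python
def expected_absorption_time(Q):
    """Expected steps before absorption from each transient state.

    Q[i][j] is the one-step probability of moving from transient state i to
    transient state j. Solves (I - Q) t = 1 by Gauss-Jordan elimination with
    partial pivoting; I - Q is invertible whenever the chain is absorbing.
    """
    n = len(Q)
    # Augmented matrix [I - Q | 1].
    m = [[(1.0 if i == j else 0.0) - Q[i][j] for j in range(n)] + [1.0] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]
```

For example, if each iteration reaches the optimal solution with a fixed probability p, then Q = [[1 - p]] and the expected convergence time is 1/p iterations, so p = 0.25 gives 4. A two-stage chain in which the pheromone rate must first reach an objective level before the optimum becomes reachable mirrors the two-step analysis described above, with the expected times of the stages adding up.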