Results 1 - 14 of 14
1.
IEEE/ACM Trans Comput Biol Bioinform ; 19(6): 3171-3178, 2022.
Article in English | MEDLINE | ID: mdl-34529571

ABSTRACT

Many experimental studies have revealed significant associations between lncRNAs and diseases. Identifying these associations accurately will provide a new perspective for disease therapy. Computational methods have been developed to address this problem, but they have some limitations. In this paper, we propose a method, named MLGCNET, to discover potential lncRNA-disease associations. First, we reconstruct similarity networks for both lncRNAs and diseases using information from the top k most similar nodes, and construct a lncRNA-disease heterogeneous network (LDN). Then, we apply a multi-layer graph convolutional network to the LDN to obtain latent feature representations of the nodes. Finally, an Extra Trees classifier is used to calculate the probability of association between a disease and a lncRNA. The results of extensive 5-fold cross-validation experiments show that MLGCNET has superior prediction performance compared with state-of-the-art methods. Case studies confirm the performance of the model on specific diseases. The experimental results support the effectiveness and practicality of MLGCNET in predicting potential lncRNA-disease associations.
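
The first step described above, keeping only the top k most similar neighbours of each node, can be sketched as follows. This is a minimal illustration and not the MLGCNET code: the function name `top_k_similarity`, the rule used to keep the result symmetric, and the toy matrix are all assumptions, since the abstract does not give these details.

```python
def top_k_similarity(sim, k):
    """Keep, for each row, only the k largest off-diagonal similarities.

    `sim` is a square list-of-lists similarity matrix. An entry survives
    if either of its two endpoints selected the other (an assumed rule;
    the abstract does not state how symmetry is handled).
    """
    n = len(sim)
    keep = [[False] * n for _ in range(n)]
    for i in range(n):
        # Rank the other nodes by similarity to node i, highest first.
        neighbours = sorted((j for j in range(n) if j != i),
                            key=lambda j: sim[i][j], reverse=True)
        for j in neighbours[:k]:
            keep[i][j] = True
    return [[sim[i][j] if (keep[i][j] or keep[j][i]) else 0.0
             for j in range(n)] for i in range(n)]


# Toy 4-node similarity matrix (made-up values for illustration only).
toy = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.4],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.4, 0.8, 1.0],
]
sparse = top_k_similarity(toy, k=1)
```

With k = 1 the toy network keeps only the two strongest pairs, which is the kind of sparsified similarity network that could then be joined with known associations into a heterogeneous graph.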


Subject(s)
Neoplasms , RNA, Long Noncoding , Humans , Neoplasms/genetics , RNA, Long Noncoding/genetics , Computational Biology/methods , Probability , Algorithms
2.
BMC Bioinformatics ; 22(1): 307, 2021 Jun 08.
Article in English | MEDLINE | ID: mdl-34103016

ABSTRACT

BACKGROUND: Circular RNAs (circRNAs) are a class of single-stranded RNA molecules with a closed-loop structure. A growing body of research has shown that circRNAs are closely related to the development of diseases. Because biological experiments to verify circRNA-disease associations are time-consuming and resource-intensive, a reliable computational method is needed to predict candidate circRNA-disease associations and make those experiments more efficient. RESULTS: In this paper, we propose a double matrix completion method (DMCCDA) for predicting potential circRNA-disease associations. First, we construct similarity matrices for circRNAs and diseases from circRNA sequence information and disease semantic information. We also build Gaussian interaction profile similarity matrices for circRNAs and diseases based on experimentally verified circRNA-disease associations. Then, the circRNA sequence similarity and the disease semantic similarity are used to update the association matrix from the circRNA and disease perspectives, respectively, by matrix multiplication. Finally, again from both perspectives, matrix completion is used to update the block matrix formed by concatenating the association matrix obtained in the previous step with the corresponding Gaussian similarity matrix. Compared with other approaches, DMCCDA achieves relatively good results in leave-one-out cross-validation and five-fold cross-validation. Additionally, the case studies illustrate the effectiveness of the DMCCDA model. CONCLUSION: The results show that our method works well for recommending candidate circRNAs for a disease for validation in biological experiments.
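
The Gaussian interaction profile similarity mentioned above can be sketched as below. The abstract gives no formula, so this assumes the commonly used form K(i, j) = exp(-γ‖a_i - a_j‖²), with γ normalised by the mean squared norm of the interaction profiles; the function name `gip_similarity`, the parameter `gamma_prime`, and the toy matrix are hypothetical.

```python
import math


def gip_similarity(assoc, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of `assoc`.

    K[i][j] = exp(-gamma * ||row_i - row_j||^2), where gamma is
    gamma_prime divided by the mean squared norm of the rows.
    """
    n = len(assoc)
    sq_norms = [sum(v * v for v in row) for row in assoc]
    mean_sq = sum(sq_norms) / n
    if mean_sq == 0:
        raise ValueError("association matrix has no known associations")
    gamma = gamma_prime / mean_sq
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))
            kernel[i][j] = math.exp(-gamma * dist_sq)
    return kernel


# Toy association matrix: rows are circRNAs, columns are diseases.
A = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
]
K_circ = gip_similarity(A)                               # circRNA side
K_dis = gip_similarity([list(col) for col in zip(*A)])   # disease side
```

Transposing the association matrix gives the disease-side kernel, so one function covers both perspectives.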


Subject(s)
RNA, Circular , RNA , Normal Distribution , RNA/genetics
3.
Brief Bioinform ; 22(5), 2021 Sep 02.
Article in English | MEDLINE | ID: mdl-33415333

ABSTRACT

Predicting disease-related long non-coding RNAs (lncRNAs) helps in finding new biomarkers for the prevention, diagnosis and treatment of complex human diseases. In this paper, we propose a machine learning-based classification approach, GAERF, that identifies disease-related lncRNAs using a graph auto-encoder (GAE) and a random forest (RF). First, we combine the relationships among lncRNAs, miRNAs and diseases into a heterogeneous network. Then, low-dimensional representation vectors of the nodes are learned from the network by the GAE, which reduces the dimension and heterogeneity of the biological data. Taking these feature vectors as input, we train an RF classifier to predict new lncRNA-disease associations (LDAs). Experimental results show that the learned representations characterize lncRNAs and diseases accurately. GAERF achieves superior performance owing to the ensemble learning method, significantly outperforming other methods. Moreover, case studies further demonstrate that GAERF is an effective method for predicting LDAs.
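
As a rough illustration of the first step, the three kinds of pairwise relationships can be merged into one adjacency matrix over all nodes. This is only a sketch under assumptions: the node ordering (lncRNAs, then miRNAs, then diseases), the unweighted edges, and the name `heterogeneous_adjacency` are not taken from the paper.

```python
def heterogeneous_adjacency(n_lnc, n_mi, n_dis, lnc_mi, mi_dis, lnc_dis):
    """Build a symmetric adjacency matrix over lncRNA, miRNA and disease nodes.

    Nodes are indexed lncRNAs first, then miRNAs, then diseases. Each edge
    list holds index pairs local to the two entity types it connects.
    """
    n = n_lnc + n_mi + n_dis
    adj = [[0] * n for _ in range(n)]

    def connect(u, v):
        adj[u][v] = adj[v][u] = 1

    for l, m in lnc_mi:
        connect(l, n_lnc + m)
    for m, d in mi_dis:
        connect(n_lnc + m, n_lnc + n_mi + d)
    for l, d in lnc_dis:
        connect(l, n_lnc + n_mi + d)
    return adj


# Toy network: 2 lncRNAs, 1 miRNA, 2 diseases (made-up edges).
adj = heterogeneous_adjacency(
    n_lnc=2, n_mi=1, n_dis=2,
    lnc_mi=[(0, 0)], mi_dis=[(0, 1)], lnc_dis=[(1, 0)],
)
```

A matrix of this shape is the usual input to a graph auto-encoder, which would then compress each row into a low-dimensional node vector.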


Subject(s)
Lung Neoplasms/genetics , Machine Learning , Neural Networks, Computer , Prostatic Neoplasms/genetics , RNA, Long Noncoding/genetics , Stomach Neoplasms/genetics , Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism , Computational Biology/methods , Computer Graphics/statistics & numerical data , Decision Trees , Gene Expression Regulation, Neoplastic , Humans , Lung Neoplasms/diagnosis , Lung Neoplasms/metabolism , Lung Neoplasms/pathology , Male , MicroRNAs/classification , MicroRNAs/genetics , MicroRNAs/metabolism , Prostatic Neoplasms/diagnosis , Prostatic Neoplasms/metabolism , Prostatic Neoplasms/pathology , RNA, Long Noncoding/classification , RNA, Long Noncoding/metabolism , ROC Curve , Risk Factors , Stomach Neoplasms/diagnosis , Stomach Neoplasms/metabolism , Stomach Neoplasms/pathology
4.
Comput Biol Med ; 72: 22-9, 2016 May 01.
Article in English | MEDLINE | ID: mdl-26995027

ABSTRACT

New-generation high-throughput technologies, including next-generation sequencing, have been extensively applied to solve biological problems. As a result, large cancer genomics projects such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium are producing large amounts of rich and diverse data across multiple cancer types. Identifying mutated driver genes and driver pathways from these data is a significant challenge. Genome aberrations in cancer cells can be divided into two types: random 'passenger mutations' and functional 'driver mutations'. In this paper, we introduce a Multi-objective Optimization model based on a Genetic Algorithm (MOGA) to solve the maximum weight submatrix problem, which can be employed to identify driver genes and driver pathways promoting cancer proliferation. The maximum weight submatrix problem, defined to find mutated driver pathways, is based on two specific properties: high coverage and high exclusivity. The multi-objective optimization model can adjust the trade-off between high coverage and high exclusivity. We also propose an integrative model that combines gene expression data and mutation data to improve the performance of the MOGA algorithm in a biological context.
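
The two properties named above can be made concrete with a small scoring function. The abstract does not state the objective, so this sketch assumes the weight commonly used for the maximum weight submatrix problem, W(M) = |Γ(M)| - ω(M), where Γ(M) is the set of patients with a mutation in at least one gene of M (coverage) and ω(M) counts the extra, non-exclusive mutations (overlap). Gene and patient names are hypothetical.

```python
def submatrix_weight(mutations, genes):
    """Return (weight, coverage, overlap) for a candidate gene set.

    `mutations` maps each gene to the set of patients in which it is mutated.
    weight = coverage - overlap = 2 * coverage - sum of per-gene counts.
    """
    covered = set().union(*(mutations[g] for g in genes))
    total = sum(len(mutations[g]) for g in genes)
    coverage = len(covered)
    overlap = total - coverage  # zero when the genes are mutually exclusive
    return coverage - overlap, coverage, overlap


# Toy mutation data (made-up, for illustration only).
mutations = {
    "geneA": {"p1", "p2"},
    "geneB": {"p3"},
    "geneC": {"p2", "p3", "p4"},
}
exclusive = submatrix_weight(mutations, ["geneA", "geneB"])
overlapping = submatrix_weight(mutations, ["geneA", "geneC"])
```

Here the second gene set covers more patients but pays for one overlapping mutation, so both sets end with the same weight; a multi-objective formulation would instead keep coverage and overlap as separate objectives and trade them off.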


Subject(s)
Models, Theoretical , Mutation , Neoplasms/genetics , Algorithms , Humans
5.
Biomed Res Int ; 2015: 836929, 2015.
Article in English | MEDLINE | ID: mdl-26339648

ABSTRACT

A growing number of studies have shown that many complex diseases arise jointly from alterations in numerous genes. Genes often act together as a functional biological pathway or network and are highly correlated. Differential coexpression analysis, a more comprehensive technique than differential expression analysis, was introduced to study gene regulatory networks and the biological pathways behind phenotypic changes by measuring changes in gene correlation between disease and normal conditions. In this paper, we propose a gene differential coexpression analysis algorithm at the level of gene sets and apply it to a publicly available type 2 diabetes (T2D) expression dataset. First, we calculate biweight midcorrelation coefficients between all gene pairs. Then, we select informative correlation pairs using a "differential coexpression threshold" strategy. Finally, we identify differential coexpression gene modules using the maximum clique concept and the k-clique algorithm. We apply the proposed method to simulated data and to T2D data. Two differential coexpression gene modules related to T2D were detected, which should be useful for exploring the biological function of the related genes.
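
The biweight midcorrelation used in the first step is a median-based, outlier-resistant alternative to Pearson correlation. The sketch below follows its standard definition (deviations from the median, scaled by 9 times the median absolute deviation, with weights (1 - u²)² for |u| < 1 and 0 otherwise); it is a plain-Python illustration, not the authors' code, and the sample vectors are made up.

```python
import math
from statistics import median


def _bicor_terms(x):
    """Return the weighted deviations (x_i - median) * w_i used by bicor."""
    med = median(x)
    mad = median(abs(v - med) for v in x)
    if mad == 0:
        raise ValueError("MAD is zero; bicor is undefined for this vector")
    terms = []
    for v in x:
        u = (v - med) / (9.0 * mad)
        # Points far from the median get weight 0 and drop out entirely.
        w = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0
        terms.append((v - med) * w)
    return terms


def bicor(x, y):
    """Biweight midcorrelation between two equal-length expression vectors."""
    a, b = _bicor_terms(x), _bicor_terms(y)
    norm_a = math.sqrt(sum(t * t for t in a))
    norm_b = math.sqrt(sum(t * t for t in b))
    return sum(p * q for p, q in zip(a, b)) / (norm_a * norm_b)


# Made-up expression values; the last sample is an extreme outlier.
gene_x = [1.0, 2.0, 3.0, 4.0, 100.0]
gene_y = [2.0 * v + 1.0 for v in gene_x]
r_same = bicor(gene_x, gene_y)
r_opposite = bicor(gene_x, [-v for v in gene_x])
```

Computing this coefficient separately in disease and normal samples, then comparing the two values per gene pair, gives the kind of correlation change that a differential coexpression threshold would be applied to.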


Subject(s)
Diabetes Mellitus, Type 2/genetics , Gene Expression Profiling/methods , Gene Regulatory Networks , Algorithms , Diabetes Mellitus, Type 2/pathology , Gene Expression Regulation/genetics , Humans , Metabolic Networks and Pathways/genetics , Oligonucleotide Array Sequence Analysis
6.
Protein Pept Lett ; 17(9): 1123-8, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20509847

ABSTRACT

In this study, we propose and evaluate a new method for predicting hairpins in proteins based on the support vector machine. Unlike previous methods, it adopts a new feature representation scheme based on auto covariance. We also investigate two structural properties of proteins (protein secondary structure and residue conformation propensity) and examine their effects on prediction. Moreover, we employ an ensemble classifier based on majority voting to improve hairpin prediction accuracy. Experimental results on a dataset of 1926 protein chains show that our approach outperforms those previously published in the literature, which demonstrates the effectiveness of the proposed method.
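
An auto covariance feature turns a variable-length profile of per-residue values into a fixed-length vector, one entry per lag. The sketch below assumes the usual form AC(lag) = 1/(n - lag) · Σ (x_i - mean)(x_{i+lag} - mean); the abstract does not specify which residue properties or lags were used, so the profile and `max_lag` here are purely illustrative.

```python
def auto_covariance(values, max_lag):
    """Auto covariance of one per-residue property along a sequence.

    Returns [AC(1), ..., AC(max_lag)], where
    AC(lag) = 1/(n - lag) * sum_i (x_i - mean) * (x_{i+lag} - mean).
    """
    n = len(values)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    mean = sum(values) / n
    centred = [v - mean for v in values]
    return [
        sum(centred[i] * centred[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


# Made-up alternating property profile for a six-residue fragment.
profile = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
features = auto_covariance(profile, max_lag=2)
```

For this alternating profile, neighbouring residues are anti-correlated and residues two apart are correlated, so the two features come out as -1.0 and 1.0.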


Subject(s)
Computational Biology/methods , Proteins/chemistry , Amino Acids/chemistry , Hydrophobic and Hydrophilic Interactions , Protein Structure, Secondary
7.
Protein Pept Lett ; 17(9): 1117-22, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20509848

ABSTRACT

Phi-turns are irregular secondary structure elements consisting of short backbone fragments (six amino acid residues) in which the backbone reverses its overall direction. They play an important role in proteins from both structural and functional points of view. Recently, some methods have been proposed to predict phi-turns. In this study, we propose a new phi-turn prediction method that uses a two-stage classification scheme based on the support vector machine. In addition, unlike previous methods, it adopts new coding schemes based on the physicochemical and structural properties of proteins. Seven-fold cross-validation on a dataset of 640 non-homologous protein chains is used to evaluate the performance of our method. The experimental results show that our method yields promising performance, which confirms the effectiveness of the proposed approach.
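
The seven-fold cross-validation protocol can be sketched as a split of the 640 chains into seven disjoint folds, each held out once. How the authors actually partitioned the chains is not stated, so the shuffling, the seed, and the name `k_fold_indices` are assumptions.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Split range(n_samples) into k disjoint test folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Taking every k-th index gives folds whose sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


folds = k_fold_indices(640, k=7)
for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(640) if i not in held_out]
    # A model would be trained on train_idx and evaluated on test_idx here.
```

Splitting at the level of whole chains, rather than individual residues, keeps residues from one protein out of both the training and test sets of the same round.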


Subject(s)
Computational Biology/methods , Proteins/chemistry , Algorithms , Protein Structure, Secondary
8.
Protein Pept Lett ; 17(9): 1069-78, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20509849

ABSTRACT

Protein-protein interactions (PPIs) are key components of most cellular processes, so identifying PPIs is at the heart of functional genomics. A number of experimental techniques have been developed to discover the PPI networks of several organisms. However, the accuracy and coverage of these techniques have proven to be limited. Therefore, it is important to develop computational methods that assist in the design and validation of experimental studies and predict interaction partners. Here, we provide a critical overview of existing computational methods, including genomic context, structure-based, domain-based and sequence-based methods. Although the list of methods is not exhaustive, we analyze the relative strengths and weaknesses of each method discussed and offer a broader perspective on computational techniques for determining PPIs. In addition to algorithms for interaction prediction, we also describe many useful databases pertaining to PPIs.


Subject(s)
Computational Biology/methods , Protein Interaction Mapping/methods , Models, Theoretical , Phylogeny , Protein Binding/genetics , Protein Binding/physiology
9.
Protein Pept Lett ; 17(9): 1085-90, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20509850

ABSTRACT

With the huge amount of protein sequence data now available, computational methods that predict protein-protein interactions (PPIs) using only protein sequence information have drawn increasing interest. In this article, we propose a sequence-based method built on a novel representation of local protein sequence descriptors. Local descriptors account for the interactions between residues in both continuous and discontinuous regions of a protein sequence, so this method enables us to extract more PPI information from the sequence. A series of experiments is performed to optimize the prediction model by varying the parameter k and the distance function of the k-nearest neighbors learning system, as well as the way a protein pair is coded. On the PPI data of Saccharomyces cerevisiae, the method achieved 86.15% prediction accuracy with 81.03% sensitivity at a precision of 90.24%. An independent data set of 986 Escherichia coli PPIs was used to evaluate this prediction model, and the prediction accuracy was 73.02%. Given the complex nature of PPIs, the performance of our method is promising, and it can be a helpful supplement for PPI prediction.
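
The three figures reported above are standard confusion-matrix ratios: accuracy is the share of all pairs classified correctly, sensitivity is the share of true interactions that are recovered, and precision is the share of predicted interactions that are real. A minimal sketch follows; the counts are made up for illustration and are not the paper's data.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity and precision from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # recall on truly interacting pairs
        "precision": tp / (tp + fp),    # predicted interactions that are real
    }


# Hypothetical counts for 100 interacting and 100 non-interacting pairs.
m = binary_metrics(tp=80, fp=10, tn=90, fn=20)
```

A result with precision higher than sensitivity, as reported in the abstract, means the classifier misses some true interactions but rarely labels a non-interacting pair as interacting.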


Subject(s)
Computational Biology/methods , Protein Interaction Mapping/methods , Proteins/chemistry , Proteins/metabolism , Amino Acid Sequence , Protein Binding/physiology , Proteins/genetics , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Sequence Analysis, Protein/methods
10.
BMC Bioinformatics ; 11: 174, 2010 Apr 08.
Article in English | MEDLINE | ID: mdl-20377884

ABSTRACT

BACKGROUND: It is well known that most of the binding free energy of protein interaction is contributed by a few key hot spot residues. These residues are crucial for understanding the function of proteins and studying their interactions. Experimental hot spot detection methods such as alanine scanning mutagenesis are not applicable on a large scale because they are time-consuming and expensive. Therefore, reliable and efficient computational methods for identifying hot spots are greatly needed. RESULTS: In this work, we introduce an efficient approach that uses a support vector machine (SVM) to predict hot spot residues in protein interfaces. We systematically investigate 62 features derived from a combination of protein sequence and structure information. Then, to remove redundant and irrelevant features and improve prediction performance, feature selection is performed using the F-score method. Based on the selected features, nine individual-feature based predictors are developed to identify hot spots using SVMs. Furthermore, a new ensemble classifier, APIS (A combined model based on Protrusion Index and Solvent accessibility), is developed to further improve prediction accuracy. The results on two benchmark datasets, ASEdb and BID, show that the proposed method yields significantly better prediction accuracy than those previously published in the literature. In addition, we demonstrate the predictive power of our method by modelling two protein complexes, the calmodulin/myosin light chain kinase complex and the heat shock locus gene products U and V complex, which indicates that our method can identify more hot spots in these two complexes than other state-of-the-art methods. CONCLUSION: We have developed an accurate prediction model for hot spot residues, given the structure of a protein complex. A major contribution of this study is to propose several new features based on the protrusion index of amino acid residues, which are shown to significantly improve the prediction of hot spots. Moreover, we identify a compact and useful feature subset that has important implications for identifying hot spot residues. Our results indicate that these features are more effective than conventional evolutionary conservation, pairwise residue potentials and other traditional features considered previously, and that combining our features with traditional ones may support the creation of a discriminative feature set for efficient prediction of hot spot residues. The data and source code are available on the web site http://home.ustc.edu.cn/~jfxia/hotspot.html.
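
F-score feature selection ranks each feature by how well it separates the two classes on its own. The sketch below assumes the definition commonly used with SVMs: the squared distances of the two class means from the overall mean, divided by the sum of the within-class sample variances. The feature names and values are hypothetical and are not taken from the ASEdb or BID datasets.

```python
def f_score(pos, neg):
    """F-score of one feature given its values in the positive and negative class."""
    n_pos, n_neg = len(pos), len(neg)
    mean_pos = sum(pos) / n_pos
    mean_neg = sum(neg) / n_neg
    mean_all = (sum(pos) + sum(neg)) / (n_pos + n_neg)
    between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    within = (sum((v - mean_pos) ** 2 for v in pos) / (n_pos - 1)
              + sum((v - mean_neg) ** 2 for v in neg) / (n_neg - 1))
    return between / within


# Made-up values: (hot spot residues, non hot spot residues) per feature.
features = {
    "feature_a": ([2.0, 4.0], [0.0, 2.0]),
    "feature_b": ([1.0, 3.0], [1.0, 3.0]),
}
scores = {name: f_score(pos, neg) for name, (pos, neg) in features.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

A higher score marks a more discriminative feature; features below a chosen cutoff would be dropped before training the SVM. Because each feature is scored in isolation, this filter cannot detect redundancy between two features that are both individually informative.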


Subject(s)
Protein Interaction Mapping/methods , Proteins/chemistry , Software , Binding Sites , Databases, Protein , Models, Molecular , Protein Conformation , Protein Folding , Proteins/metabolism , Solvents/chemistry , Surface Properties
11.
Amino Acids ; 39(5): 1595-9, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20386937

ABSTRACT

A novel method is proposed for predicting protein-protein interactions (PPIs) based on a meta approach, in which a support vector machine combines the results of six independent state-of-the-art predictors. A significant improvement in prediction performance is observed on Saccharomyces cerevisiae and Helicobacter pylori datasets. In addition, we used the final prediction model trained on the S. cerevisiae PPI dataset to predict interactions in other species. The results reveal that our meta model is also capable of cross-species prediction. The source code and the datasets are available at http://home.ustc.edu.cn/~jfxia/Meta_PPI.html.


Subject(s)
Bacterial Proteins/chemistry , Bacterial Proteins/metabolism , Protein Interaction Mapping/methods , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/metabolism , Databases, Factual , Helicobacter pylori/chemistry , Protein Binding , Saccharomyces cerevisiae/chemistry
12.
Protein Pept Lett ; 17(1): 137-45, 2010 Jan.
Article in English | MEDLINE | ID: mdl-20214637

ABSTRACT

We propose a sequence-based multiple classifier system, rotation forest, to infer protein-protein interactions (PPIs). The Moran autocorrelation descriptor is used to code each interacting protein pair. Experimental results on Saccharomyces cerevisiae and Helicobacter pylori datasets show that our approach outperforms those previously published in the literature, which demonstrates the effectiveness of the proposed method.
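
The Moran autocorrelation descriptor measures how a residue property correlates with itself at a given sequence separation, normalised by the property's variance. The sketch below assumes the usual form I(d) = [1/(n - d) · Σ (p_j - mean)(p_{j+d} - mean)] / [1/n · Σ (p_j - mean)²]. The pair coding by simple concatenation, the function names, and the profiles are assumptions, since the abstract does not give these details.

```python
def moran_autocorrelation(values, lag):
    """Moran autocorrelation of a per-residue property at one lag."""
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(values)")
    mean = sum(values) / n
    centred = [v - mean for v in values]
    variance = sum(c * c for c in centred) / n
    if variance == 0:
        raise ValueError("property profile is constant")
    cov = sum(centred[j] * centred[j + lag] for j in range(n - lag)) / (n - lag)
    return cov / variance


def encode_pair(profile_a, profile_b, max_lag):
    """Concatenate the descriptors of two proteins (an assumed pair coding)."""
    lags = range(1, max_lag + 1)
    return ([moran_autocorrelation(profile_a, d) for d in lags]
            + [moran_autocorrelation(profile_b, d) for d in lags])


# Made-up property profiles for two short fragments.
protein_a = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
protein_b = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
pair_vector = encode_pair(protein_a, protein_b, max_lag=2)
```

Because the descriptor is divided by the variance, the two profiles above yield the same values even though their scales differ, and every protein pair maps to a vector of the same length regardless of sequence length.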


Subject(s)
Bacterial Proteins/chemistry , Pattern Recognition, Automated/methods , Protein Interaction Mapping/methods , Saccharomyces cerevisiae Proteins/chemistry , Sequence Analysis/methods , Algorithms , Databases, Protein , Helicobacter pylori , Sensitivity and Specificity
13.
Amino Acids ; 38(3): 891-9, 2010 Mar.
Article in English | MEDLINE | ID: mdl-19387790

ABSTRACT

Identifying protein-protein interactions (PPIs) is critical for understanding the cellular function of proteins and the machinery of a proteome. PPI data derived from high-throughput technologies are often incomplete and noisy. Therefore, it is important to develop computational methods and high-quality interaction datasets for predicting PPIs. A sequence-based method is proposed that combines correlation coefficient (CC) transformation with a support vector machine (SVM). CC transformation not only adequately considers the neighboring effect of the protein sequence but also describes the level of CC between two protein sequences. A gold standard positives (interacting) dataset, MIPS Core, and a gold standard negatives (non-interacting) dataset, GO-NEG, of the yeast Saccharomyces cerevisiae were mined to evaluate the method objectively and attenuate bias. The SVM model combined with CC transformation yielded the best performance, with an accuracy of 87.94% on the gold standard positives and negatives datasets. The MATLAB source code and the datasets are available on request from smgsmg@mail.ustc.edu.cn.


Subject(s)
Amino Acids/chemistry , Protein Interaction Mapping , Proteome/chemistry , Proteome/metabolism , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/metabolism , Algorithms , Amino Acid Sequence , Artificial Intelligence , Bacterial Proteins/chemistry , Bacterial Proteins/metabolism , Computational Biology/methods , Data Mining , Databases, Protein , Helicobacter pylori , Models, Biological , Protein Binding , Proteomics/methods
14.
Protein Pept Lett ; 15(5): 488-93, 2008.
Article in English | MEDLINE | ID: mdl-18537739

ABSTRACT

This paper proposes an efficient ensemble system that tackles the protein secondary structure prediction problem with neural networks as base classifiers. The experimental results show that the multi-layer system can lead to better results. When more accurate base classifiers are deployed, the ensemble system achieves higher accuracy.


Subject(s)
Computational Biology/methods , Neural Networks, Computer , Protein Structure, Secondary , Proteins/chemistry , Protein Conformation , Protein Folding