Results 1 - 20 of 76
1.
BMC Bioinformatics ; 20(Suppl 7): 204, 2019 May 01.
Article in English | MEDLINE | ID: mdl-31074375

ABSTRACT

A report of the 12th International Conference on Systems Biology (ISB2018), 18-21 August, Guiyang, China.


Subject(s)
Artificial Intelligence , Computational Biology/methods , Genomics/methods , Single-Cell Analysis/methods , Systems Biology/methods , Congresses as Topic , Humans
2.
Semin Cancer Biol ; 30: 42-51, 2015 Feb.
Article in English | MEDLINE | ID: mdl-24412105

ABSTRACT

Cancer is increasingly perceived as a systems-level, network phenomenon. The major trend of malignant transformation can be described as a two-phase process, where an initial increase of network plasticity is followed by a decrease of plasticity at late stages of tumor development. The fluctuating intensity of stress factors, like hypoxia, inflammation and the either cooperative or hostile interactions of tumor inter-cellular networks, all increase the adaptation potential of cancer cells. This may lead to the bypass of cellular senescence, and to the development of cancer stem cells. We propose that the central tenet of cancer stem cell definition lies exactly in the indefinability of cancer stem cells. Actual properties of cancer stem cells depend on the individual "stress-history" of the given tumor. Cancer stem cells are characterized by an extremely large evolvability (i.e. a capacity to generate heritable phenotypic variation), which corresponds well with the defining hallmarks of cancer stem cells: the possession of the capacity to self-renew and to repeatedly re-build the heterogeneous lineages of cancer cells that comprise a tumor in new environments. Cancer stem cells represent a cell population, which is adapted to adapt. We argue that the high evolvability of cancer stem cells is helped by their repeated transitions between plastic (proliferative, symmetrically dividing) and rigid (quiescent, asymmetrically dividing, often more invasive) phenotypes having plastic and rigid networks. Thus, cancer stem cells reverse and replay cancer development multiple times. We describe network models potentially explaining cancer stem cell-like behavior. Finally, we propose novel strategies including combination therapies and multi-target drugs to overcome the Nietzschean dilemma of cancer stem cell targeting: "what does not kill me makes me stronger".


Subject(s)
Cell Hypoxia/physiology , Cell Transformation, Neoplastic/pathology , Cellular Senescence/physiology , Inflammation/pathology , Neoplastic Stem Cells/pathology , Humans
3.
Bioinformatics ; 31(20): 3330-8, 2015 Oct 15.
Article in English | MEDLINE | ID: mdl-26092859

ABSTRACT

MOTIVATION: In prognosis and survival studies, an important goal is to identify multi-biomarker panels with predictive power using molecular characteristics or clinical observations. Such analysis is often challenged by censored, small-sample-size, but high-dimensional genomic profiles or clinical data. Therefore, sophisticated models and algorithms are urgently needed. RESULTS: In this study, we propose a novel Area Under Curve (AUC) optimization method for multi-biomarker panel identification named Nearest Centroid Classifier for AUC optimization (NCC-AUC). Our method is motivated by the connection between the AUC score used to evaluate classification accuracy and Harrell's concordance index in survival analysis. This connection allows us to convert the survival time regression problem into a binary classification problem. An optimization model is then formulated to directly maximize the AUC while minimizing the number of selected features, constructing a predictor in the nearest centroid classifier framework. NCC-AUC shows strong performance when validated on both genomic data of breast cancer and clinical data of stage IB Non-Small-Cell Lung Cancer (NSCLC). For the genomic data, NCC-AUC outperforms Support Vector Machine (SVM) and Support Vector Machine-based Recursive Feature Elimination (SVM-RFE) in classification accuracy. It tends to select a multi-biomarker panel with low average redundancy and enriched biological meaning. NCC-AUC also separates low- and high-risk cohorts more significantly than the widely used Cox model (Cox proportional-hazards regression model) and the L1-Cox model (L1-penalized Cox model). These performance gains of NCC-AUC are robust across 5 subtypes of breast cancer. Further, on an independent clinical dataset, NCC-AUC outperforms SVM and SVM-RFE in predictive accuracy and is consistently better than the Cox model and the L1-Cox model at grouping patients into high- and low-risk categories.
CONCLUSION: In summary, NCC-AUC provides a rigorous optimization framework to systematically reveal multi-biomarker panels from genomic and clinical data. It can serve as a useful tool to identify prognostic biomarkers for survival analysis. AVAILABILITY AND IMPLEMENTATION: NCC-AUC is available at http://doc.aporc.org/wiki/NCC-AUC. CONTACT: ywang@amss.ac.cn SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Area Under Curve , Biomarkers/analysis , Breast Neoplasms/diagnosis , Carcinoma, Non-Small-Cell Lung/diagnosis , Data Interpretation, Statistical , Genomics/methods , Lung Neoplasms/diagnosis , Breast Neoplasms/genetics , Breast Neoplasms/mortality , Carcinoma, Non-Small-Cell Lung/genetics , Carcinoma, Non-Small-Cell Lung/mortality , Female , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Humans , Lung Neoplasms/genetics , Lung Neoplasms/mortality , Models, Biological , Pattern Recognition, Automated , Prognosis , Proportional Hazards Models , Support Vector Machine , Survival Rate , Systems Biology , Systems Integration
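NCC-AUC builds on the connection between the AUC and Harrell's concordance index. The short Python sketch below illustrates that connection only; it is not the NCC-AUC implementation, and `harrell_c_index` and `pairwise_auc` are hypothetical helper names. Both statistics are fractions of correctly ordered pairs: the AUC compares every positive sample with every negative one, while Harrell's C compares every pair whose shorter follow-up time ends in an observed event (ties in score count as one half). For simplicity, pairs with tied times are treated as non-comparable here.

```python
from itertools import combinations


def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    hits = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return hits / (len(pos) * len(neg))


def harrell_c_index(times, events, risk_scores):
    """Harrell's C for right-censored data (higher risk should mean shorter time)."""
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the subject with the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue  # tied times, or the earlier subject was censored
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Two survival groups (deaths at t=1 and at t=5): the comparable pairs are exactly
# the early-vs-late pairs, so C equals the AUC with early deaths labelled positive.
scores = [0.9, 0.6, 0.7, 0.1]
print(harrell_c_index([1, 1, 5, 5], [1, 1, 1, 1], scores))  # 0.75
print(pairwise_auc([1, 1, 0, 0], scores))                   # 0.75
```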
4.
Nucleic Acids Res ; 41(14): e143, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23761440

ABSTRACT

Gene expression profiling has gradually become a routine procedure for disease diagnosis and classification. In the past decade, many computational methods have been proposed, resulting in great improvements on various levels, including feature selection and algorithms for classification and clustering. In this study, we present iPcc, a novel method from the feature extraction perspective to further propel gene expression profiling technologies from bench to bedside. We define 'correlation feature space' for samples based on the gene expression profiles by iterative employment of Pearson's correlation coefficient. Numerical experiments on both simulated and real gene expression data sets demonstrate that iPcc can greatly highlight the latent patterns underlying noisy gene expression data and thus greatly improve the robustness and accuracy of the algorithms currently available for disease diagnosis and classification based on gene expression profiles.


Subject(s)
Algorithms , Disease/classification , Gene Expression Profiling/methods , Classification/methods , Cluster Analysis , Diagnostic Techniques and Procedures , Disease/genetics , Humans , Leukemia/classification , Leukemia/genetics , Male , Prostatic Neoplasms/classification , Prostatic Neoplasms/genetics , Psoriasis/classification , Psoriasis/genetics
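The iPcc abstract defines a 'correlation feature space' by iterating Pearson's correlation coefficient. A minimal Python sketch of that idea follows, assuming that each iteration simply replaces every sample's vector with its correlations to all samples; the paper's exact procedure (number of iterations, normalization, gene filtering) is not given in the abstract, and the function names and toy data are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length numeric vectors."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def correlation_feature_space(samples, iterations=2):
    """Replace each sample by its correlations with all samples, `iterations` times."""
    features = [list(s) for s in samples]
    for _ in range(iterations):
        features = [[pearson(u, v) for v in features] for u in features]
    return features


# Toy "expression profiles": two samples for each of two hypothetical classes.
samples = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.1, 2.2, 2.9, 4.1, 5.0],
    [5.0, 4.0, 3.0, 2.0, 1.0],
    [5.2, 3.9, 3.1, 2.0, 0.8],
]
for row in correlation_feature_space(samples):
    print([round(v, 3) for v in row])
```

In this toy example the within-class entries end up close to +1 and the between-class entries close to -1, which is the kind of latent group structure the abstract says iPcc highlights in noisy data.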
5.
Nucleic Acids Res ; 41(20): 9230-42, 2013 Nov.
Article in English | MEDLINE | ID: mdl-23945931

ABSTRACT

Chromatin modifications have been comprehensively shown in recent years to play important roles in gene regulation and cell diversity. Given the rapid accumulation of genome-wide chromatin modification maps across multiple cell types, there is an urgent need for computational methods that analyze multiple maps to reveal combinatorial modification patterns and define functional DNA elements, especially those that are specific to cell types or tissues. In this study, we developed a computational method using differential chromatin modification analysis (dCMA) to identify cell-type-specific genomic regions with distinctive chromatin modifications. We then applied this method to a public data set with modification profiles of nine marks for nine cell types to evaluate its effectiveness. We found cell-type-specific elements unique to each cell type investigated. These unique features show significant cell-type-specific biological relevance and tend to be located within functional regulatory elements. These results demonstrate the power of a differential comparative epigenomic strategy in deciphering the human genome and characterizing cell specificity.


Subject(s)
Chromatin/metabolism , Epigenesis, Genetic , Genome, Human , Binding Sites , E1A-Associated p300 Protein/metabolism , Epigenomics/methods , Histones/metabolism , Humans , Transcription, Genetic
6.
Nucleic Acids Res ; 41(4): e53, 2013 Feb 01.
Article in English | MEDLINE | ID: mdl-23262226

ABSTRACT

Computationally identifying effective biomarkers for cancers from gene expression profiles is an important and challenging task. The challenge lies in the complicated pathogenesis of cancers, which often involves the dysfunction of many genes and regulatory interactions. Thus, sophisticated classification models are urgently needed. In this study, we propose an efficient approach, called ellipsoidFN (ellipsoid Feature Net), that models disease complexity with ellipsoids and seeks a set of heterogeneous biomarkers. Our approach achieves a non-linear classification scheme for mixed samples through the ellipsoid concept, while using a linear programming framework to efficiently select biomarkers from a high-dimensional space. ellipsoidFN reduces the redundancy and improves the complementarity between the identified biomarkers, thus significantly enhancing the distinction between cancer and normal samples, and even between cancer types. Numerical evaluation on real prostate cancer, breast cancer and leukemia gene expression datasets suggested that ellipsoidFN outperforms state-of-the-art biomarker identification methods and can serve as a useful tool for cancer biomarker identification in the future. The Matlab code of ellipsoidFN is freely available from http://doc.aporc.org/wiki/EllipsoidFN.


Subject(s)
Biomarkers, Tumor/analysis , Software , Transcriptome , Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Female , Humans , Leukemia/genetics , Leukemia/metabolism , Male , Prostatic Neoplasms/genetics , Prostatic Neoplasms/metabolism
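In ellipsoidFN, samples are modelled by ellipsoids, which is what makes the classifier non-linear. The Python sketch below shows only that geometric ingredient, under a simplifying assumption of axis-aligned ellipsoids: a point x lies inside when the sum of ((x_i - c_i) / r_i)^2 is at most 1. It does not implement the paper's linear-programming biomarker selection, and all names and numbers are hypothetical.

```python
def inside_ellipsoid(x, center, radii):
    """True if x lies in the axis-aligned ellipsoid with the given center and semi-axes."""
    return sum(((xi - ci) / ri) ** 2 for xi, ci, ri in zip(x, center, radii)) <= 1.0


def classify(x, ellipsoids):
    """Label a sample 'cancer' if it falls inside any ellipsoid, else 'normal'."""
    return "cancer" if any(inside_ellipsoid(x, c, r) for c, r in ellipsoids) else "normal"


# Two hypothetical ellipsoids in a two-biomarker space; their union is a
# non-convex region, so the decision boundary is not a single hyperplane.
ellipsoids = [((0.0, 0.0), (2.0, 1.0)), ((5.0, 5.0), (1.0, 1.0))]
print(classify((1.9, 0.0), ellipsoids))  # cancer
print(classify((3.0, 3.0), ellipsoids))  # normal
```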
7.
BMC Bioinformatics ; 15: 271, 2014 Aug 09.
Article in English | MEDLINE | ID: mdl-25106096

ABSTRACT

BACKGROUND: It has been widely realized that pathways rather than individual genes govern the course of carcinogenesis. Therefore, discovering driver pathways is becoming an important step toward understanding the molecular mechanisms underlying cancer and designing efficient treatments for cancer patients. Previous studies have focused mainly on observing alterations in cancer genomes at the individual gene or single pathway level. However, a great deal of evidence indicates that multiple pathways often function cooperatively in carcinogenesis and other key biological processes. RESULTS: In this study, an exact mathematical programming method was proposed to identify co-occurring mutated driver pathways (CoMDP) in carcinogenesis de novo, without any prior information beyond mutation profiles. Two possible properties of mutations occurring in cooperative pathways were exploited to achieve this: (1) each individual pathway has high coverage and high exclusivity; and (2) the mutations in the two pathways of a pair show statistically significant co-occurrence. The efficiency of CoMDP was first validated by testing it on simulated data and comparing it with a previous method. CoMDP was then applied to several real biological datasets, including glioblastoma, lung adenocarcinoma, and ovarian carcinoma datasets. The discovered co-occurring driver pathways were found to be involved in several key biological processes, such as cell survival and protein synthesis. Moreover, CoMDP was modified to (1) identify an extra pathway co-occurring with a known pathway and (2) detect multiple significant co-occurring driver pathways for carcinogenesis. CONCLUSIONS: The present method can be used to identify gene sets with more biological relevance than those found by current methods for discovering single driver pathways.


Subject(s)
Carcinogenesis/genetics , Neoplasms/genetics , Neoplasms/pathology , Software , Systems Biology/methods , Algorithms , Disease Progression , Humans , Mutation , Signal Transduction/genetics
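Property (2) in the CoMDP abstract asks whether mutations in two pathways co-occur across samples more often than chance would predict. The abstract does not name the statistical test, so the Python sketch below assumes a one-sided hypergeometric (Fisher exact) test on per-sample indicators, where a sample counts as 'pathway mutated' if any gene of that pathway is mutated in it. The matrix, gene indices, and function names are hypothetical.

```python
from math import comb


def pathway_mutated(matrix, genes):
    """Per-sample 0/1 indicator: is any gene (column index) of the pathway mutated?"""
    return [int(any(row[g] for g in genes)) for row in matrix]


def cooccurrence_pvalue(a, b):
    """One-sided hypergeometric P(overlap >= observed) for two 0/1 sample indicators."""
    n, k_a, k_b = len(a), sum(a), sum(b)
    observed = sum(1 for x, y in zip(a, b) if x and y)
    tail = sum(
        comb(k_a, k) * comb(n - k_a, k_b - k)
        for k in range(observed, min(k_a, k_b) + 1)
    )
    return tail / comb(n, k_b)


# Rows are samples, columns are genes 0-3; pathway 1 = genes {0, 1}, pathway 2 = genes {2, 3}.
matrix = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
p1 = pathway_mutated(matrix, [0, 1])
p2 = pathway_mutated(matrix, [2, 3])
print(p1, p2, cooccurrence_pvalue(p1, p2))  # [1, 1, 1, 0, 0, 0] [1, 1, 1, 0, 0, 0] 0.05
```

Within each toy pathway the genes are mutually exclusive (property 1), while the two pathways are mutated in exactly the same samples, which gives the smallest possible p-value for this sample size.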
8.
Bioinformatics ; 28(22): 2940-7, 2012 Nov 15.
Article in English | MEDLINE | ID: mdl-22982574

ABSTRACT

MOTIVATION: The first step toward clinical diagnostics, prognostics and targeted therapeutics of cancer is to comprehensively understand its molecular mechanisms. Large-scale cancer genomics projects are providing a large volume of data about genomic, epigenomic and gene expression aberrations in multiple cancer types. One of the remaining challenges is to identify the driver mutations, driver genes and driver pathways that promote cancer proliferation, and to filter out the non-functional and passenger ones. RESULTS: In this study, we propose two methods to solve the so-called maximum weight submatrix problem, which is designed to identify mutated driver pathways de novo from mutation data in cancer. The first is an exact method that can help assess other approximate and/or heuristic algorithms. The second is a flexible stochastic method that can incorporate other types of information to improve on the first. In particular, we propose an integrative model that combines mutation and expression data. We first apply our methods to simulated data to show their efficiency. We then apply them to several real biological datasets, such as the mutation profiles of 74 head and neck squamous cell carcinoma samples, 90 glioblastoma tumor samples and 313 ovarian carcinoma samples. Gene expression profiles were also considered for the latter two datasets. The results show that our integrative model can identify more biologically relevant gene sets. We have implemented all these methods in a package called mutated driver pathway finder (MDPFinder), which can easily be used by other researchers. AVAILABILITY: A MATLAB package of MDPFinder is available at http://zhangroup.aporc.org/ShiHuaZhang. CONTACT: zsh@amss.ac.cn. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Gene Expression Profiling , Mutation , Neoplasms/genetics , DNA Mutational Analysis/methods , Genomics/methods , Humans , Models, Genetic
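The maximum weight submatrix problem is commonly stated (following Dendrix, by Vandin et al.) with the weight W(M) = 2|Γ(M)| - Σ_{g in M} |Γ(g)|, where Γ(g) is the set of samples in which gene g is mutated and Γ(M) is the union of these sets over the gene set M. The weight rewards coverage and penalizes overlapping mutations, so it favours mutually exclusive genes. The Python sketch below assumes this standard weight and, unlike the exact and stochastic methods described in the abstract, simply enumerates every gene set, which is only feasible for toy inputs. The function names and data are hypothetical.

```python
from itertools import combinations


def submatrix_weight(matrix, genes):
    """W(M) = 2 * |samples covered by M| - total mutations in M (rows = samples)."""
    covered = sum(1 for row in matrix if any(row[g] for g in genes))
    mutations = sum(row[g] for row in matrix for g in genes)
    return 2 * covered - mutations


def best_gene_set(matrix, k):
    """Exhaustively search all k-gene sets for the maximum weight (toy sizes only)."""
    n_genes = len(matrix[0])
    return max(combinations(range(n_genes), k), key=lambda genes: submatrix_weight(matrix, genes))


# Rows are samples, columns are genes; 1 = mutated.
mutations = [
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
]
best = best_gene_set(mutations, 2)
print(best, submatrix_weight(mutations, best))  # (0, 1) 4
```

Genes 0 and 1 cover four samples without any overlap, so they beat gene pairs that include gene 3, whose mutations overlap with both.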
9.
Nucleic Acids Res ; 39(13): e87, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21543451

ABSTRACT

Gene ontology analysis has become a popular and important tool in bioinformatics studies, and current ontology analyses are mainly conducted on individual genes or gene lists. However, recent molecular network analysis reveals that the same list of genes with different interactions may perform different functions. Therefore, it is necessary to consider molecular interactions to correctly and specifically annotate biological networks. Here, we propose a novel Network Ontology Analysis (NOA) method to perform gene ontology enrichment analysis on biological networks. Specifically, NOA first defines a link ontology that assigns functions to interactions based on the known annotations of the two genes each interaction joins, by optimizing two novel indexes, 'Coverage' and 'Diversity'. NOA then generates two alternative reference sets to statistically rank the enriched functional terms for a given biological network. We compare NOA with traditional enrichment analysis methods on several biological networks and find that: (i) NOA can capture changes of function not only in dynamic transcription regulatory networks but also in rewired protein interaction networks, whereas the traditional methods cannot; and (ii) NOA can find more relevant and specific functions than traditional methods in different types of static networks. Furthermore, a freely accessible web server for NOA has been developed at http://www.aporc.org/noa/.


Subject(s)
Gene Regulatory Networks , Protein Interaction Mapping , Software , Aging/genetics , Alzheimer Disease/metabolism , Computational Biology/methods , Humans , Internet , Molecular Sequence Annotation , Pancreatic Neoplasms/genetics , Saccharomyces cerevisiae/genetics
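The NOA abstract argues that the same gene list can carry different functions depending on how the genes interact. The Python sketch below illustrates that point under a deliberately simple, hypothetical link-annotation rule: an interaction is annotated with the terms shared by both of its genes. NOA itself selects link annotations by optimizing its 'Coverage' and 'Diversity' indexes and then tests enrichment against reference sets, neither of which is implemented here; gene and term names are placeholders.

```python
from collections import Counter


def link_terms(gene_terms, edge):
    """Hypothetical rule: a link inherits the terms annotated to both endpoint genes."""
    u, v = edge
    return gene_terms.get(u, set()) & gene_terms.get(v, set())


def network_term_counts(gene_terms, edges):
    """Count how many links of a network carry each term."""
    counts = Counter()
    for edge in edges:
        counts.update(link_terms(gene_terms, edge))
    return counts


gene_terms = {"A": {"t1", "t2"}, "B": {"t1", "t2"}, "C": {"t2"}}
net1 = [("A", "B"), ("B", "C")]  # genes A, B, C
net2 = [("A", "C"), ("B", "C")]  # the same genes, rewired
print(network_term_counts(gene_terms, net1))
print(network_term_counts(gene_terms, net2))
```

Both toy networks contain exactly genes A, B and C, so a gene-list analysis would treat them identically, yet only `net1` has a link annotated with `t1`.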
10.
Chin J Cancer ; 32(4): 195-204, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23237213

ABSTRACT

Analyzing the function of gene sets is a critical step in interpreting the results of high-throughput experiments in systems biology. A variety of enrichment analysis tools have been developed in recent years, but most output a long list of significantly enriched terms that are often redundant, making it difficult to extract the most meaningful functions. In this paper, we present GOMA, a novel enrichment analysis method based on the new concept of enriched functional Gene Ontology (GO) modules. With this method, we systematically revealed functional GO modules, i.e., groups of functionally similar GO terms, via an optimization model and then ranked them by enrichment scores. Our new method simplifies enrichment analysis results by reducing redundancy, thereby preventing inconsistent enrichment results among functionally similar terms and providing more biologically meaningful results.


Subject(s)
Computational Biology/methods , Gene Expression Profiling , Gene Ontology , Gene Regulatory Networks , Algorithms , Breast Neoplasms/genetics , Databases, Genetic , Female , Humans , Oligonucleotide Array Sequence Analysis/methods
11.
BMC Bioinformatics ; 13: 70, 2012 May 01.
Article in English | MEDLINE | ID: mdl-22548981

ABSTRACT

BACKGROUND: Gene expression profiling technologies have gradually become a community standard tool for clinical applications. For example, gene expression data has been analyzed to reveal novel disease subtypes (class discovery) and assign particular samples to well-defined classes (class prediction). In the past decade, many effective methods have been proposed for individual applications. However, there is still a pressing need for a unified framework that can reveal the complicated relationships between samples. RESULTS: We propose a novel convex optimization model to perform class discovery and class prediction in a unified framework. An efficient algorithm is designed and software named OTCC (Optimization Tool for Clustering and Classification) is developed. Comparison in a simulated dataset shows that our method outperforms the existing methods. We then applied OTCC to acute leukemia and breast cancer datasets. The results demonstrate that our method not only can reveal the subtle structures underlying those cancer gene expression data but also can accurately predict the class labels of unknown cancer samples. Therefore, our method holds the promise to identify novel cancer subtypes and improve diagnosis. CONCLUSIONS: We propose a unified computational framework for class discovery and class prediction to facilitate the discovery and prediction of subtle subtypes of cancers. Our method can be generally applied to multiple types of measurements, e.g., gene expression profiling, proteomic measuring, and recent next-generation sequencing, since it only requires the similarities among samples as input.


Subject(s)
Algorithms , Computational Biology/methods , Computer Simulation , Gene Expression Profiling/methods , Neoplasms/classification , Breast Neoplasms/classification , Breast Neoplasms/genetics , Cluster Analysis , Female , Humans , Leukemia/classification , Leukemia/genetics , Neoplasms/genetics , Software
12.
BMC Bioinformatics ; 13 Suppl 7: S6, 2012 May 08.
Article in English | MEDLINE | ID: mdl-22595003

ABSTRACT

BACKGROUND: Mycobacterium tuberculosis is an infectious bacterium posing serious threats to human health. Because molecular biology experiments to detect protein interactions are difficult to perform, reconstructing a protein interaction map of M. tuberculosis by computational methods will provide crucial information for understanding the biological processes in this pathogenic microorganism, as well as a framework upon which new therapeutic approaches can be developed. RESULTS: In this paper, we constructed an integrated M. tuberculosis protein interaction network by machine learning and ortholog-based methods. First, we built a support vector machine (SVM) method to infer the protein interactions of M. tuberculosis H37Rv from gene sequence information. We tested our predictors in Escherichia coli and mapped the genetic codon features underlying its protein interactions to M. tuberculosis. Moreover, the documented interactions of 14 other species were mapped to the interactome of M. tuberculosis by the interolog method. The ensemble protein interactions were validated by various functional relationships, i.e., gene coexpression, evolutionary relationship and functional similarity, extracted from heterogeneous data sources. The accuracy and validation demonstrate the effectiveness and efficiency of our framework. CONCLUSIONS: A protein interaction map of M. tuberculosis is inferred from genetic codons and interologs. The prediction accuracy and numerical experimental validation demonstrate the effectiveness and efficiency of our method. Furthermore, our methods can be straightforwardly extended to infer the protein interactions of other bacterial species.


Subject(s)
Host-Pathogen Interactions , Mycobacterium tuberculosis/metabolism , Protein Interaction Maps , Support Vector Machine , Animals , Escherichia coli/metabolism , Humans
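The interolog step in the abstract transfers documented interactions from other species to M. tuberculosis through orthology. A minimal Python sketch, assuming a simple one-to-many ortholog table: an interaction (a, b) in a source species is mapped to every pair of target-species orthologs of a and b. The identifiers are placeholders rather than real gene IDs, and the paper's ortholog detection and scoring are not reproduced.

```python
def map_interologs(source_interactions, orthologs):
    """Transfer source-species interactions to a target species via orthologs.

    `orthologs` maps a source protein to a list of its target-species orthologs.
    Interactions whose partners lack an ortholog are dropped.
    """
    mapped = set()
    for a, b in source_interactions:
        for a2 in orthologs.get(a, []):
            for b2 in orthologs.get(b, []):
                mapped.add(tuple(sorted((a2, b2))))  # undirected: store each pair once
    return mapped


source = [("srcA", "srcB"), ("srcB", "srcC")]
orthologs = {"srcA": ["tgt1"], "srcB": ["tgt2"]}
print(map_interologs(source, orthologs))  # {('tgt1', 'tgt2')}
```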
13.
Bioinformatics ; 27(22): 3173-8, 2011 Nov 15.
Article in English | MEDLINE | ID: mdl-21926127

ABSTRACT

MOTIVATION: A large amount of biomolecular network data for multiple species has been generated by high-throughput experimental techniques, including undirected and directed networks such as protein-protein interaction networks, gene regulatory networks and metabolic networks. There are many conserved, functionally similar modules and pathways among the biomolecular networks of different species; therefore, it is important to analyze the similarity between biomolecular networks. Network querying approaches aim to efficiently discover similar subnetworks among different species. However, many existing methods only partially solve this problem. RESULTS: In this article, we present a novel approach to the network querying problem based on a conditional random field (CRF) model, which can handle both undirected and directed networks, acyclic and cyclic networks, and any number of insertions/deletions. The CRF method is fast and can query pathways in a large network in seconds on a PC. To evaluate the CRF method, extensive computational experiments are conducted on simulated and real data, and the results are compared with those of existing network querying methods. All results show that the CRF method is useful and efficient for finding conserved, functionally similar modules and pathways in multiple biomolecular networks.


Subject(s)
Gene Regulatory Networks , Metabolic Networks and Pathways , Protein Interaction Mapping/methods , Algorithms , Computational Biology/methods , Signal Transduction
14.
Nucleic Acids Res ; 38(18): 5959-69, 2010 Oct.
Article in English | MEDLINE | ID: mdl-20466810

ABSTRACT

When averaged over the full yeast protein-protein interaction and transcriptional regulatory networks, protein hubs with many interaction partners or regulators tend to evolve significantly more slowly due to increased negative selection. However, genome-wide analysis of protein evolution in the subnetworks of associations involving yeast transcription factors (TFs) reveals that TF hubs do not tend to evolve significantly more slowly than TF non-hubs. This result holds for all four major types of TF hubs: interaction hubs, regulatory in-degree and out-degree hubs, as well as co-regulatory hubs that jointly regulate target genes with many TFs. Furthermore, TF regulatory in-degree hubs tend to evolve significantly more quickly than TF non-hubs. Most importantly, the correlations between evolutionary rate (K(A)/K(S)) and degrees for TFs are significantly more positive than those for generic proteins within the same global protein-protein interaction and transcriptional regulatory networks. Compared to generic protein hubs, TF hubs operate at a higher level in the hierarchical structure of cellular networks, and hence experience additional evolutionary forces (relaxed negative selection or positive selection through network rewiring). The striking difference between the evolution of TF hubs and generic protein hubs demonstrates that components within the same global network can be governed by distinct organizational and evolutionary principles.


Subject(s)
Evolution, Molecular , Gene Regulatory Networks , Saccharomyces cerevisiae Proteins/metabolism , Transcription Factors/metabolism , Gene Expression Regulation , Protein Interaction Mapping , Saccharomyces cerevisiae Proteins/genetics , Transcription Factors/genetics
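The TF-hub abstract compares correlations between evolutionary rate (K(A)/K(S)) and degree. It does not say which correlation coefficient was used; the Python sketch below assumes a rank-based (Spearman) correlation, computed as Pearson's correlation of average ranks so that tied degrees are handled. The degrees and rates are made up for illustration.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length numeric vectors."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def spearman(x, y):
    """Spearman's rank correlation: Pearson's correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up degrees and K(A)/K(S) values for five hypothetical TFs.
degrees = [1, 3, 3, 8, 12]
rates = [0.10, 0.20, 0.15, 0.40, 0.35]
print(spearman(degrees, rates))
```

A positive value here would mean that higher-degree TFs tend to evolve faster, the direction the abstract reports for TF regulatory in-degree hubs relative to generic protein hubs.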
15.
BMC Bioinformatics ; 12: 409, 2011 Oct 24.
Article in English | MEDLINE | ID: mdl-22024143

ABSTRACT

BACKGROUND: With the development of genome-sequencing technologies, protein sequences are readily obtained by translating the measured mRNAs. Predicting protein-protein interactions from sequences is therefore in great demand, because identifying protein-protein interactions is becoming a bottleneck for eventually understanding the functions of proteins, especially in poorly characterized organisms. Although a few methods have been proposed, the converse problem, namely whether the features used extract sufficient and unbiased information from protein sequences, remains almost untouched. RESULTS: In this study, we interrogate this problem theoretically through an optimization scheme. Motivated by this theoretical investigation, we find novel encoding methods for both protein sequences and protein pairs. Our new methods sufficiently exploit the information in protein sequences while reducing artificial bias and computational cost. As a result, they significantly outperform the available methods in sensitivity, specificity, precision, and recall under cross-validation, reaching ~80% and ~90% accuracy in Escherichia coli and Saccharomyces cerevisiae, respectively. Our findings have important implications for other sequence-based prediction tasks, because representing a biological sequence is always the first step in computational biology. CONCLUSIONS: By considering the converse problem, we propose new representation methods for both protein sequences and protein pairs. The results show that our method significantly improves the accuracy of protein-protein interaction predictions.


Subject(s)
Protein Interaction Mapping , Proteins/metabolism , Support Vector Machine , Amino Acid Sequence , Escherichia coli/genetics , Escherichia coli/metabolism , Proteins/chemistry , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Sensitivity and Specificity
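The abstract does not spell out its encodings, so the Python sketch below is not the paper's method. It only illustrates the two ingredients the abstract names, using generic choices assumed here: a fixed-length encoding of a single sequence (amino-acid composition) and a pair encoding built from symmetric operations (element-wise sum and product), so that the pairs (P1, P2) and (P2, P1) receive identical features and input order cannot introduce one possible source of artificial bias. The sequences are arbitrary examples.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """20-dimensional amino-acid composition (fraction of each residue type)."""
    seq = seq.upper()
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def pair_features(seq1, seq2):
    """Order-independent pair encoding: element-wise sums followed by element-wise products."""
    f1, f2 = composition(seq1), composition(seq2)
    return [a + b for a, b in zip(f1, f2)] + [a * b for a, b in zip(f1, f2)]


x = pair_features("MKTAYIAKQR", "GSHMLEDPVR")
print(len(x))                                          # 40
print(x == pair_features("GSHMLEDPVR", "MKTAYIAKQR"))  # True
```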
16.
Bioinformatics ; 26(13): 1616-22, 2010 Jul 01.
Article in English | MEDLINE | ID: mdl-20483814

ABSTRACT

MOTIVATION: Protein-RNA interactions play a key role in a number of biological processes, such as protein synthesis, mRNA processing, mRNA assembly, ribosome function and eukaryotic spliceosomes. As a result, reliable identification of the RNA binding sites of a protein is important for functional annotation and site-directed mutagenesis. Accumulated data on experimental protein-RNA interactions reveal that an RNA binding residue with different neighboring amino acids often exhibits different preferences for its RNA partners, which in turn can be assessed by the interacting interdependence of the amino acid fragment and the RNA nucleotide. RESULTS: In this work, we propose a novel classification method to identify RNA binding sites in proteins by combining a new interacting feature (interaction propensity) with other sequence- and structure-based features. Specifically, the interaction propensity represents the binding specificity of a protein residue to the interacting RNA nucleotide by considering its two-sided neighborhood in a protein residue triplet. The sequence- and structure-based features of the residues are combined to discriminate the interaction propensity of amino acids with RNA. We predict RNA interacting residues in proteins with a well-built random forest classifier. The experiments show that our method detects the annotated protein-RNA interaction sites with high accuracy. Our method achieves an accuracy of 84.5%, an F-measure of 0.85 and an AUC of 0.92 in predicting the RNA binding residues of a dataset containing 205 non-homologous RNA binding proteins, and it also outperforms several existing RNA binding residue predictors, such as RNABindR, BindN, RNAProB and PPRint, as well as some alternative machine learning methods, such as support vector machine, naive Bayes and neural network, in the comparison study.
Furthermore, we provide some biological insights into the roles of sequences and structures in protein-RNA interactions by both evaluating the importance of features for their contributions in predictive accuracy and analyzing the binding patterns of interacting residues. AVAILABILITY: All the source data and code are available at http://www.aporc.org/doc/wiki/PRNA or http://www.sysbio.ac.cn/datatools.asp CONTACT: lnchen@sibs.ac.cn SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
RNA-Binding Proteins/chemistry , RNA/chemistry , Sequence Analysis, Protein , Artificial Intelligence , Computational Biology/methods , RNA/metabolism , RNA-Binding Proteins/metabolism
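The interaction propensity in the abstract scores how specifically a residue, together with its two neighbours, contacts a given nucleotide. The exact formula is not given in the abstract; the Python sketch below assumes a common log-odds form, log(P(t, n) / (P(t) P(n))), estimated from observed (triplet, nucleotide) contacts, so positive values mean a triplet contacts that nucleotide more often than independence would predict. The contact list is made up.

```python
from collections import Counter
from math import log


def triplet_propensity(contacts):
    """Log-odds log(P(t, n) / (P(t) * P(n))) for each observed (triplet, nucleotide) pair."""
    pair_counts = Counter(contacts)
    triplet_counts = Counter(t for t, _ in contacts)
    nucleotide_counts = Counter(n for _, n in contacts)
    total = len(contacts)
    return {
        (t, n): log(c * total / (triplet_counts[t] * nucleotide_counts[n]))
        for (t, n), c in pair_counts.items()
    }


# Made-up contacts: (residue triplet centred on the contacting residue, nucleotide).
contacts = [("RKR", "G")] * 3 + [("RKR", "A")] + [("SYS", "A")] * 2
props = triplet_propensity(contacts)
print(round(props[("SYS", "A")], 3))  # 0.693, i.e. log(2)
print(round(props[("RKR", "A")], 3))  # -0.693, i.e. log(0.5)
```

Pairs never observed get no score in this sketch; a real feature would need pseudocounts or smoothing before being fed to a classifier such as the random forest mentioned in the abstract.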
17.
Nucleic Acids Res ; 37(18): 5943-58, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19661283

ABSTRACT

Transcriptional cooperativity among several transcription factors (TFs) is believed to be the main mechanism of complexity and precision in transcriptional regulatory programs. Here, we present a Bayesian network framework to reconstruct a high-confidence whole-genome map of transcriptional cooperativity in Saccharomyces cerevisiae by integrating a comprehensive list of 15 genomic features. We design a Bayesian network structure to capture the dominant correlations among features and TF cooperativity, and introduce a supervised learning framework with a well-constructed gold-standard dataset. This framework allows us to assess the predictive power of each genomic feature, validate the superior performance of our Bayesian network compared to alternative methods, and integrate genomic features for optimal TF cooperativity prediction. Data integration reveals 159 high-confidence predicted cooperative relationships among 105 TFs, most of which are subsequently validated by literature search. The existing and predicted transcriptional cooperativities can be grouped into three categories based on the combination patterns of the genomic features, providing further biological insights into the different types of TF cooperativity. Our methodology is the first supervised learning approach for predicting transcriptional cooperativity, compares favorably to alternative unsupervised methodologies, and can be applied to other genomic data integration tasks where high-quality gold-standard positive data are scarce.


Subject(s)
Gene Regulatory Networks , Genome, Fungal , Saccharomyces cerevisiae/genetics , Transcription Factors/metabolism , Transcription, Genetic , Bayes Theorem
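Entry 17 above integrates genomic features through a supervised Bayesian network trained on a gold-standard dataset. The sketch below is a much simpler stand-in: naive Bayes, the special case of a Bayesian network in which every feature is conditionally independent of the others given the class label. It turns discretized features into a log-odds that a TF pair is cooperative. It is not the authors' network structure. The feature names (`coexpr`, `shared_targets`), their discretization, and the smoothing constant `alpha` are hypothetical.

```python
import math
from collections import Counter

def train_naive_bayes(examples, labels):
    """Count discrete feature values per class.

    examples: list of dicts mapping a (hypothetical) feature name to a discrete value.
    labels: list of bools, True for a gold-standard cooperative TF pair.
    """
    counts = {True: Counter(), False: Counter()}  # keys are (feature, value) pairs
    totals = Counter(labels)                        # number of examples per class
    values = {}                                     # feature -> set of observed values
    for x, y in zip(examples, labels):
        for feature, value in x.items():
            counts[y][(feature, value)] += 1
            values.setdefault(feature, set()).add(value)
    return counts, totals, values

def log_odds(model, x, alpha=1.0):
    """Posterior log-odds that a TF pair is cooperative, assuming features are
    conditionally independent given the class (naive Bayes), with Laplace smoothing."""
    counts, totals, values = model
    score = math.log((totals[True] + alpha) / (totals[False] + alpha))  # smoothed prior
    for feature, value in x.items():
        k = len(values.get(feature, ())) or 1  # number of observed values for this feature
        p_pos = (counts[True][(feature, value)] + alpha) / (totals[True] + alpha * k)
        p_neg = (counts[False][(feature, value)] + alpha) / (totals[False] + alpha * k)
        score += math.log(p_pos / p_neg)
    return score
```

A positive score favors cooperativity. Because each feature contributes an additive log-likelihood ratio, this form also makes it easy to inspect how much predictive weight a single feature carries.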
18.
BMC Bioinformatics ; 11: 26, 2010 Jan 13.
Article in English | MEDLINE | ID: mdl-20070902

ABSTRACT

BACKGROUND: The accumulation of high-throughput data greatly promotes computational investigation of gene function in the context of complex biological systems. However, a biological function is not simply controlled by an individual gene, since genes act cooperatively to carry out biological processes. In the study of human diseases, identifying disease-associated pathways and modules, rather than individual disease-related genes, has become an essential problem in systems biology. RESULTS: In this paper, we propose a novel method to detect disease-related gene modules or dysfunctional pathways based on global characteristics of the interactome coupled with gene expression data. Specifically, we exploit the interactions between genes to define an active score function for each gene based on the kernel trick, which can represent nonlinear effects of gene cooperativity. Modules or pathways are then inferred from the active scores, which are evaluated by support vector regression in a global and integrative manner. The efficiency and robustness of the proposed method are comprehensively validated on both simulated and real data and compared with existing methods. CONCLUSIONS: By applying the proposed method to two cancer-related problems, breast cancer and prostate cancer, we successfully identified active modules or dysfunctional pathways related to these two cancers, supported by evidence from the literature. We show that this network-based method is highly efficient and can be applied to large-scale problems, especially the extraction of human disease-related modules or pathways. Moreover, this method can also be used to prioritize genes associated with a specific phenotype or disease.


Subject(s)
Algorithms , Gene Expression Profiling/methods , Gene Regulatory Networks , Neoplasms/genetics , Databases, Genetic , Humans , Oligonucleotide Array Sequence Analysis , Phenotype
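Entry 18 above scores each gene by combining interactome structure with expression data through a kernel, then infers modules and prioritizes genes from those scores. The sketch below is a loose, hypothetical illustration of network-aware, kernel-weighted scoring. It is not the authors' active score function, and it omits the support vector regression step entirely. A gene's score is its own absolute differential expression plus that of each interaction partner, weighted by a Gaussian (RBF) kernel on the two genes' expression profiles. The `gamma` parameter and the scoring rule are assumptions.

```python
import math

def rbf(u, v, gamma=1.0):
    """Gaussian (RBF) kernel between two expression profiles."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def active_scores(edges, profiles, diff, gamma=1.0):
    """Hypothetical active score: a gene's own |differential expression| plus the
    |differential expression| of each interaction partner, weighted by the RBF
    similarity of their expression profiles."""
    neighbors = {g: set() for g in profiles}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {
        g: abs(diff[g]) + sum(rbf(profiles[g], profiles[h], gamma) * abs(diff[h])
                              for h in neighbors[g])
        for g in profiles
    }

def prioritize(scores):
    """Rank genes from highest to lowest active score."""
    return sorted(scores, key=scores.get, reverse=True)
```

With this rule, a gene with weak differential expression can still rank highly if it interacts with strongly perturbed partners that share its expression pattern. This is the intuition behind scoring modules rather than single genes.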
19.
Amino Acids ; 39(2): 417-25, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20127263

ABSTRACT

Protein structure alignment algorithms play an important role in studies of protein structure and function. In this paper, a novel approach for structure alignment is presented. Specifically, core regions in two protein structures are first aligned by identifying connected components in a network of neighboring, geometrically compatible aligned fragment pairs. The initial alignments are then refined through a multi-objective optimization method. The algorithm can produce both sequential and non-sequential alignments. We demonstrate the superior performance of the proposed algorithm through computational experiments on several benchmark datasets and comparisons with well-known structure alignment algorithms such as DALI, CE and MATT. The proposed method obtains accurate and biologically significant alignments in the presence of internal repeats or indels, identifies circular permutations, and reveals conserved functional sites. A ranking criterion for fold similarity based on our algorithm is also presented; in most of the numerical experiments it is comparable or superior to the Z-score of CE. The software and supplementary computational results are available at http://zhangroup.aporc.org/bioinfo/SANA.


Subject(s)
Algorithms , Computational Biology/methods , Sequence Alignment/methods , Databases, Protein , Models, Molecular , Software
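The core-alignment step of entry 19 builds a network of neighboring, geometrically compatible aligned fragment pairs (AFPs) and takes its connected components. The sketch below illustrates that idea under stated assumptions:

- Each AFP is given as `(start in A, start in B, length)` over Cα coordinates.
- Two AFPs are linked when their fragment centroids lie within `neighbor_cutoff` Å of each other in protein A and that distance is preserved in protein B within `tol` Å.

This centroid test and both cutoffs are hypothetical choices, not the published criteria. The multi-objective refinement step is not shown.

```python
import math

def centroid(coords, start, length):
    """Mean position of residues start..start+length-1."""
    pts = coords[start:start + length]
    return tuple(sum(p[i] for p in pts) / len(pts) for i in range(3))

def compatible(afp1, afp2, coords_a, coords_b, tol=1.0, neighbor_cutoff=12.0):
    """Treat two AFPs (start_a, start_b, length) as neighboring and geometrically
    compatible if their fragments lie close together in protein A and the
    inter-fragment distance is preserved in protein B (hypothetical criterion)."""
    da = math.dist(centroid(coords_a, afp1[0], afp1[2]), centroid(coords_a, afp2[0], afp2[2]))
    db = math.dist(centroid(coords_b, afp1[1], afp1[2]), centroid(coords_b, afp2[1], afp2[2]))
    return da <= neighbor_cutoff and abs(da - db) <= tol

def largest_compatible_component(afps, coords_a, coords_b, **kw):
    """Union-find over the AFP compatibility graph; return the largest component."""
    parent = list(range(len(afps)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(afps)):
        for j in range(i + 1, len(afps)):
            if compatible(afps[i], afps[j], coords_a, coords_b, **kw):
                parent[find(i)] = find(j)

    groups = {}
    for i, afp in enumerate(afps):
        groups.setdefault(find(i), []).append(afp)
    return max(groups.values(), key=len)
```

The compatibility test never compares residue order, so a component found this way may correspond to a non-sequential alignment. Fragments that are far apart can still join one component through intermediate compatible fragments.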
20.
PLoS Comput Biol ; 5(9): e1000521, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19779549

ABSTRACT

One of the challenging problems in biology and medicine is exploring the mechanisms underlying genetic diseases. Recent studies suggest that the relationship between genetic diseases and the aging process is important for understanding the molecular mechanisms of complex diseases. Although some intricate associations have been investigated for a long time, such studies are still at an early stage. In this paper, we construct a human disease-aging network to study the relationship between aging genes and genetic disease genes. Specifically, we integrate human protein-protein interactions (PPIs), disease-gene associations, aging-gene associations, and physiological system-based genetic disease classification information in a single graph-theoretic framework and find that (1) human disease genes are much closer to aging genes than expected by chance; and (2) diseases can be categorized into two types according to their relationships with aging: type I diseases have genes significantly close to aging genes, while type II diseases do not. Furthermore, we examine the topological characteristics of the disease-aging network from a systems perspective. Theoretical results reveal that the genes of type I diseases occupy central positions in the PPI network while those of type II diseases do not; (3) more importantly, we define an asymmetric closeness based on the PPI network to describe relationships between diseases, and find that aging genes contribute significantly to associations among diseases, especially among type I diseases. In conclusion, this network-based study provides not only evidence for the intricate relationship between the aging process and genetic diseases, but also biological implications for probing the nature of human diseases.


Subject(s)
Aging/genetics , Computational Biology/methods , Disease/genetics , Genetic Predisposition to Disease , Models, Genetic , Chromosome Mapping , Cluster Analysis , Humans , Protein Interaction Mapping
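Entry 20 measures how close disease genes lie to aging genes in the PPI network and defines an asymmetric closeness between diseases, but the abstract does not give the formula. The sketch below uses one hypothetical definition. The closeness of gene set A to gene set B is the mean of 1/(1 + d) over the genes of A, where d is the hop distance to the nearest gene of B, found by a multi-source breadth-first search. It is asymmetric because the average runs over A only. The function names and the 1/(1 + d) weighting are assumptions, not the authors' definition.

```python
from collections import deque

def build_adjacency(edges):
    """Undirected adjacency sets from a list of PPI edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def distances_to_set(adj, targets):
    """Multi-source BFS: hop distance from each reachable protein to the nearest target."""
    dist = {t: 0 for t in targets if t in adj}
    queue = deque(dist)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def asymmetric_closeness(adj, genes_a, genes_b):
    """Hypothetical closeness of gene set A to gene set B: the mean of 1 / (1 + d)
    over genes of A present in the network, where d is the hop distance to the
    nearest gene of B (genes that cannot reach B contribute 0)."""
    dist = distances_to_set(adj, genes_b)
    members = [g for g in genes_a if g in adj]
    if not members:
        return 0.0
    return sum(1.0 / (1 + dist[g]) if g in dist else 0.0 for g in members) / len(members)
```

The same function can score a disease gene set against a set of aging genes. Comparing that value with the closeness of randomly drawn gene sets of equal size is one way to ask whether disease genes are closer to aging genes than expected by chance.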