Results 1 - 20 of 76
1.
BMC Bioinformatics ; 20(Suppl 7): 204, 2019 May 01.
Article in English | MEDLINE | ID: mdl-31074375

ABSTRACT

A report of the 12th International Conference on Systems Biology (ISB2018), 18-21 August, Guiyang, China.


Subjects
Artificial Intelligence , Computational Biology/methods , Genomics/methods , Single-Cell Analysis/methods , Systems Biology/methods , Congresses as Topic , Humans
2.
Semin Cancer Biol ; 30: 42-51, 2015 Feb.
Article in English | MEDLINE | ID: mdl-24412105

ABSTRACT

Cancer is increasingly perceived as a systems-level, network phenomenon. The major trend of malignant transformation can be described as a two-phase process, where an initial increase of network plasticity is followed by a decrease of plasticity at late stages of tumor development. The fluctuating intensity of stress factors, like hypoxia, inflammation and the either cooperative or hostile interactions of tumor inter-cellular networks, all increase the adaptation potential of cancer cells. This may lead to the bypass of cellular senescence, and to the development of cancer stem cells. We propose that the central tenet of cancer stem cell definition lies exactly in the indefinability of cancer stem cells. Actual properties of cancer stem cells depend on the individual "stress-history" of the given tumor. Cancer stem cells are characterized by an extremely large evolvability (i.e. a capacity to generate heritable phenotypic variation), which corresponds well with the defining hallmarks of cancer stem cells: the possession of the capacity to self-renew and to repeatedly re-build the heterogeneous lineages of cancer cells that comprise a tumor in new environments. Cancer stem cells represent a cell population, which is adapted to adapt. We argue that the high evolvability of cancer stem cells is helped by their repeated transitions between plastic (proliferative, symmetrically dividing) and rigid (quiescent, asymmetrically dividing, often more invasive) phenotypes having plastic and rigid networks. Thus, cancer stem cells reverse and replay cancer development multiple times. We describe network models potentially explaining cancer stem cell-like behavior. Finally, we propose novel strategies including combination therapies and multi-target drugs to overcome the Nietzschean dilemma of cancer stem cell targeting: "what does not kill me makes me stronger".


Subjects
Cell Hypoxia/physiology , Cell Transformation, Neoplastic/pathology , Cellular Senescence/physiology , Inflammation/pathology , Neoplastic Stem Cells/pathology , Humans
3.
Bioinformatics ; 31(20): 3330-8, 2015 Oct 15.
Article in English | MEDLINE | ID: mdl-26092859

ABSTRACT

MOTIVATION: In prognosis and survival studies, an important goal is to identify multi-biomarker panels with predictive power using molecular characteristics or clinical observations. Such analysis is often challenged by censored, small-sample-size, but high-dimensional genomic profiles or clinical data. Therefore, sophisticated models and algorithms are urgently needed. RESULTS: In this study, we propose a novel Area Under Curve (AUC) optimization method for multi-biomarker panel identification named Nearest Centroid Classifier for AUC optimization (NCC-AUC). Our method is motivated by the connection between the AUC score for classification accuracy evaluation and Harrell's concordance index in survival analysis. This connection allows us to convert the survival time regression problem to a binary classification problem. Then an optimization model is formulated to directly maximize AUC and meanwhile minimize the number of selected features to construct a predictor in the nearest centroid classifier framework. NCC-AUC shows great performance when validated on both genomic data of breast cancer and clinical data of stage IB Non-Small-Cell Lung Cancer (NSCLC). For the genomic data, NCC-AUC outperforms Support Vector Machine (SVM) and Support Vector Machine-based Recursive Feature Elimination (SVM-RFE) in classification accuracy. It tends to select a multi-biomarker panel with low average redundancy and enriched biological meanings. NCC-AUC also separates low- and high-risk cohorts more significantly than the widely used Cox model (Cox proportional-hazards regression model) and L1-Cox model (L1-penalized Cox model). These performance gains of NCC-AUC are quite robust across 5 subtypes of breast cancer. Further, in an independent clinical dataset, NCC-AUC outperforms SVM and SVM-RFE in predictive accuracy and is consistently better than the Cox model and L1-Cox model in grouping patients into high- and low-risk categories. 
CONCLUSION: In summary, NCC-AUC provides a rigorous optimization framework to systematically reveal multi-biomarker panels from genomic and clinical data. It can serve as a useful tool to identify prognostic biomarkers for survival analysis. AVAILABILITY AND IMPLEMENTATION: NCC-AUC is available at http://doc.aporc.org/wiki/NCC-AUC. CONTACT: ywang@amss.ac.cn SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
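The NCC-AUC abstract rests on the link between AUC and Harrell's concordance index. The minimal Python sketch below illustrates that link using the textbook pairwise definitions. Pairs with tied survival times are simply skipped, which is a simplification. The setup assumes every subject has an observed event and survival time takes only two values. Under those conditions, the C-index of a risk score equals the AUC of the same score for the binary label "short survival". This is not the NCC-AUC optimization itself. The function names and toy data are hypothetical.

```python
from itertools import combinations


def concordance_index(times, events, risk):
    """Harrell's C-index: share of comparable pairs whose risk order matches survival.

    A pair is comparable when the subject with the shorter time had an observed
    event (events[i] == 1). Higher risk should mean shorter survival; tied risk
    scores count as half-concordant. Pairs with tied times are skipped here.
    """
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        a, b = (i, j) if times[i] < times[j] else (j, i)  # a has the shorter time
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk[a] > risk[b]:
            concordant += 1.0
        elif risk[a] == risk[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    total = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return total / (len(pos) * len(neg))


# Binary labels: 1 = high risk (short survival), 0 = low risk (long survival).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
times = [1 if y == 1 else 2 for y in labels]  # positives fail earlier
events = [1] * len(labels)                    # no censoring
print(auc(labels, scores), concordance_index(times, events, scores))  # both equal 7/9
```

With censoring or more than two distinct times, the two quantities diverge. That gap is why a conversion step from survival data to a classification problem is needed at all.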


Subjects
Algorithms , Area Under Curve , Biomarkers/analysis , Breast Neoplasms/diagnosis , Carcinoma, Non-Small-Cell Lung/diagnosis , Data Interpretation, Statistical , Genomics/methods , Lung Neoplasms/diagnosis , Breast Neoplasms/genetics , Breast Neoplasms/mortality , Carcinoma, Non-Small-Cell Lung/genetics , Carcinoma, Non-Small-Cell Lung/mortality , Female , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Humans , Lung Neoplasms/genetics , Lung Neoplasms/mortality , Models, Biological , Pattern Recognition, Automated , Prognosis , Proportional Hazards Models , Support Vector Machine , Survival Rate , Systems Biology , Systems Integration
4.
Nucleic Acids Res ; 41(14): e143, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23761440

ABSTRACT

Gene expression profiling has gradually become a routine procedure for disease diagnosis and classification. In the past decade, many computational methods have been proposed, resulting in great improvements on various levels, including feature selection and algorithms for classification and clustering. In this study, we present iPcc, a novel method from the feature extraction perspective to further propel gene expression profiling technologies from bench to bedside. We define 'correlation feature space' for samples based on the gene expression profiles by iterative employment of Pearson's correlation coefficient. Numerical experiments on both simulated and real gene expression data sets demonstrate that iPcc can greatly highlight the latent patterns underlying noisy gene expression data and thus greatly improve the robustness and accuracy of the algorithms currently available for disease diagnosis and classification based on gene expression profiles.
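The abstract describes iPcc only at a high level. The sketch below assumes one plausible reading of "iterative employment of Pearson's correlation coefficient". Each sample's expression profile is replaced by its vector of Pearson correlations with all samples, and the step is then repeated on those vectors. The iteration count (`n_iter`) and the toy data are hypothetical. This is not the published iPcc implementation.

```python
import numpy as np


def iterated_pcc(X, n_iter=2):
    """Represent samples in a 'correlation feature space' by repeated Pearson correlation.

    X has shape (n_samples, n_genes). The first pass describes each sample by its
    Pearson correlation with every sample; each further pass correlates those
    correlation profiles again. n_iter is a hypothetical illustration parameter.
    Rows with zero variance would produce NaN values.
    """
    F = np.asarray(X, dtype=float)
    for _ in range(n_iter):
        F = np.corrcoef(F)  # rows are treated as variables -> (n_samples, n_samples)
    return F


# Toy data: two latent expression patterns, three noisy samples of each.
rng = np.random.default_rng(0)
pattern_a, pattern_b = rng.normal(size=50), rng.normal(size=50)
X = np.vstack([pattern_a + 0.5 * rng.normal(size=50) for _ in range(3)]
              + [pattern_b + 0.5 * rng.normal(size=50) for _ in range(3)])
F = iterated_pcc(X, n_iter=2)
print(np.round(F, 2))  # samples 0-2 and 3-5 form two correlated blocks
```

After the second pass, samples that share a latent pattern have near-identical correlation profiles. This is one way such a transform can make group structure stand out from gene-level noise.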


Subjects
Algorithms , Disease/classification , Gene Expression Profiling/methods , Classification/methods , Cluster Analysis , Diagnostic Techniques and Procedures , Disease/genetics , Humans , Leukemia/classification , Leukemia/genetics , Male , Prostatic Neoplasms/classification , Prostatic Neoplasms/genetics , Psoriasis/classification , Psoriasis/genetics
5.
Nucleic Acids Res ; 41(20): 9230-42, 2013 Nov.
Article in English | MEDLINE | ID: mdl-23945931

ABSTRACT

Chromatin modifications have been comprehensively shown to play important roles in gene regulation and cell diversity in recent years. Given the rapid accumulation of genome-wide chromatin modification maps across multiple cell types, there is an urgent need for computational methods to analyze multiple maps to reveal combinatorial modification patterns and define functional DNA elements, especially those that are specific to cell types or tissues. In the current study, we developed a computational method using differential chromatin modification analysis (dCMA) to identify cell-type-specific genomic regions with distinctive chromatin modifications. We then applied this method to a public data set with modification profiles of nine marks for nine cell types to evaluate its effectiveness. We found cell-type-specific elements unique to each cell type investigated. These unique features show significant cell-type-specific biological relevance and tend to be located within functional regulatory elements. These results demonstrate the power of a differential comparative epigenomic strategy in deciphering the human genome and characterizing cell specificity.


Subjects
Chromatin/metabolism , Epigenesis, Genetic , Genome, Human , Binding Sites , E1A-Associated p300 Protein/metabolism , Epigenomics/methods , Histones/metabolism , Humans , Transcription, Genetic
6.
Nucleic Acids Res ; 41(4): e53, 2013 Feb 01.
Article in English | MEDLINE | ID: mdl-23262226

ABSTRACT

Computationally identifying effective biomarkers for cancers from gene expression profiles is an important and challenging task. The challenge lies in the complicated pathogenesis of cancers, which often involves the dysfunction of many genes and regulatory interactions. Thus, sophisticated classification models are urgently needed. In this study, we propose an efficient approach, called ellipsoidFN (ellipsoid Feature Net), to model the disease complexity by ellipsoids and seek a set of heterogeneous biomarkers. Our approach achieves a non-linear classification scheme for the mixed samples by the ellipsoid concept, and at the same time uses a linear programming framework to efficiently select biomarkers from high-dimensional space. ellipsoidFN reduces the redundancy and improves the complementarity between the identified biomarkers, thus significantly enhancing the distinctiveness between cancer and normal samples, and even between cancer types. Numerical evaluation on real prostate cancer, breast cancer and leukemia gene expression datasets suggested that ellipsoidFN outperforms state-of-the-art biomarker identification methods, and it can serve as a useful tool for cancer biomarker identification in the future. The Matlab code of ellipsoidFN is freely available from http://doc.aporc.org/wiki/EllipsoidFN.


Subjects
Biomarkers, Tumor/analysis , Software , Transcriptome , Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Female , Humans , Leukemia/genetics , Leukemia/metabolism , Male , Prostatic Neoplasms/genetics , Prostatic Neoplasms/metabolism
7.
BMC Bioinformatics ; 15: 271, 2014 Aug 09.
Article in English | MEDLINE | ID: mdl-25106096

ABSTRACT

BACKGROUND: It has been widely realized that pathways rather than individual genes govern the course of carcinogenesis. Therefore, discovering driver pathways is becoming an important step to understand the molecular mechanisms underlying cancer and design efficient treatments for cancer patients. Previous studies have focused mainly on observation of the alterations in cancer genomes at the individual gene or single pathway level. However, a great deal of evidence has indicated that multiple pathways often function cooperatively in carcinogenesis and other key biological processes. RESULTS: In this study, an exact mathematical programming method was proposed to identify de novo co-occurring mutated driver pathways (CoMDP) in carcinogenesis without any prior information beyond mutation profiles. Two possible properties of mutations that occur in cooperative pathways were exploited to achieve this: (1) each individual pathway has high coverage and high exclusivity; and (2) the mutations between the pair of pathways show statistically significant co-occurrence. The efficiency of CoMDP was validated first by testing on simulated data and comparing it with a previous method. Then CoMDP was applied to several real biological datasets, including glioblastoma, lung adenocarcinoma, and ovarian carcinoma. The discovered co-occurring driver pathways were found to be involved in several key biological processes, such as cell survival and protein synthesis. Moreover, CoMDP was modified to (1) identify an extra pathway co-occurring with a known pathway and (2) detect multiple significant co-occurring driver pathways for carcinogenesis. CONCLUSIONS: The present method can be used to identify gene sets with more biological relevance than the ones currently used for the discovery of single driver pathways.
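The abstract does not state which statistic CoMDP uses for property (2), the co-occurrence between two pathways. The sketch below swaps in a standard choice as an assumption: a one-sided hypergeometric (Fisher-type) test on the sets of samples in which each pathway is mutated. The gene names and mutation sets are hypothetical.

```python
from math import comb


def mutated_samples(pathway, mutations):
    """Samples carrying a mutation in at least one gene of the pathway."""
    return set().union(*(mutations[g] for g in pathway))


def cooccurrence_pvalue(samples_a, samples_b, n_samples):
    """One-sided hypergeometric p-value P(overlap >= observed).

    Treats the b samples mutated in pathway B as drawn without replacement
    from n_samples, of which a are mutated in pathway A.
    """
    a, b = len(samples_a), len(samples_b)
    k = len(samples_a & samples_b)
    tail = sum(comb(a, x) * comb(n_samples - a, b - x) for x in range(k, min(a, b) + 1))
    return tail / comb(n_samples, b)


# Hypothetical mutation profiles: gene -> set of mutated sample indices (10 samples).
mutations = {"g1": {0, 1}, "g2": {2, 3}, "g3": {0, 2}, "g4": {1, 3}}
pa = mutated_samples(["g1", "g2"], mutations)  # {0, 1, 2, 3}, g1/g2 mutually exclusive
pb = mutated_samples(["g3", "g4"], mutations)  # {0, 1, 2, 3}, g3/g4 mutually exclusive
print(cooccurrence_pvalue(pa, pb, n_samples=10))  # 1 / comb(10, 4) = 1/210
```

In this toy case each pathway is exclusive internally (property 1), and the two pathways are mutated in exactly the same samples. The overlap is therefore as large as it can be, and the p-value is small.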


Subjects
Carcinogenesis/genetics , Neoplasms/genetics , Neoplasms/pathology , Software , Systems Biology/methods , Algorithms , Disease Progression , Humans , Mutation , Signal Transduction/genetics
8.
Bioinformatics ; 28(22): 2940-7, 2012 Nov 15.
Article in English | MEDLINE | ID: mdl-22982574

ABSTRACT

MOTIVATION: The first step for clinical diagnostics, prognostics and targeted therapeutics of cancer is to comprehensively understand its molecular mechanisms. Large-scale cancer genomics projects are providing a large volume of data about genomic, epigenomic and gene expression aberrations in multiple cancer types. One of the remaining challenges is to identify driver mutations, driver genes and driver pathways promoting cancer proliferation and filter out the nonfunctional and passenger ones. RESULTS: In this study, we propose two methods to solve the so-called maximum weight submatrix problem, which is designed to identify mutated driver pathways de novo from mutation data in cancer. The first one is an exact method that can be helpful for assessing other approximate and/or heuristic algorithms. The second one is a stochastic and flexible method that can be employed to incorporate other types of information to improve the first method. In particular, we propose an integrative model to combine mutation and expression data. We first apply our methods to simulated data to show their efficiency. We further apply the proposed methods to several real biological datasets, such as the mutation profiles of 74 head and neck squamous cell carcinoma samples, 90 glioblastoma tumor samples and 313 ovarian carcinoma samples. The gene expression profiles were also considered for the latter two datasets. The results show that our integrative model can identify more biologically relevant gene sets. We have implemented all these methods in a package called mutated driver pathway finder (MDPFinder), which can easily be used by other researchers. AVAILABILITY: A MATLAB package of MDPFinder is available at http://zhangroup.aporc.org/ShiHuaZhang. CONTACT: zsh@amss.ac.cn. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
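The abstract names the maximum weight submatrix problem but does not give its weight. The sketch below assumes the coverage/exclusivity weight from the Dendrix formulation of this problem: W(M) = 2|Γ(M)| - Σ_{g∈M} |Γ(g)|. Here Γ(g) is the set of samples with a mutation in gene g, and Γ(M) is the union of those sets over the gene set M. The sketch finds the exact optimum by brute force, which only scales to toy inputs. It stands in for the exact and stochastic solvers in MDPFinder and does not reproduce them. The gene names and mutation sets are hypothetical.

```python
from itertools import combinations


def submatrix_weight(genes, mutations):
    """W(M) = 2*|G(M)| - sum over g in M of |G(g)|.

    G(g) is the set of samples in which gene g is mutated and G(M) their union,
    so W rewards coverage and penalizes overlapping (non-exclusive) mutations.
    """
    covered = set().union(*(mutations[g] for g in genes))
    return 2 * len(covered) - sum(len(mutations[g]) for g in genes)


def best_gene_set(mutations, k):
    """Exact maximum-weight k-gene set by exhaustive search (toy sizes only)."""
    return max(combinations(sorted(mutations), k),
               key=lambda genes: submatrix_weight(genes, mutations))


# Hypothetical mutation matrix: gene -> mutated samples.
mutations = {"g1": {0, 1, 2}, "g2": {3, 4}, "g3": {0, 1}, "g4": {5}}
best = best_gene_set(mutations, k=2)
print(best, submatrix_weight(best, mutations))  # ('g1', 'g2') 5
```

The pair g1/g3 covers only three samples and overlaps on two, so its weight drops to 1. The mutually exclusive pair g1/g2 covers five samples with no overlap and wins.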


Subjects
Algorithms , Gene Expression Profiling , Mutation , Neoplasms/genetics , DNA Mutational Analysis/methods , Genomics/methods , Humans , Models, Genetic
9.
Nucleic Acids Res ; 39(13): e87, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21543451

ABSTRACT

Gene ontology analysis has become a popular and important tool in bioinformatics, and current ontology analyses are mainly conducted on individual genes or gene lists. However, recent molecular network analysis reveals that the same list of genes with different interactions may perform different functions. Therefore, it is necessary to consider molecular interactions to correctly and specifically annotate biological networks. Here, we propose a novel Network Ontology Analysis (NOA) method to perform gene ontology enrichment analysis on biological networks. Specifically, NOA first defines link ontology that assigns functions to interactions based on the known annotations of joint genes via optimizing two novel indexes, 'Coverage' and 'Diversity'. Then, NOA generates two alternative reference sets to statistically rank the enriched functional terms for a given biological network. We compare NOA with traditional enrichment analysis methods in several biological networks, and find that: (i) NOA can capture the change of functions not only in dynamic transcription regulatory networks but also in rewiring protein interaction networks, while the traditional methods cannot, and (ii) NOA can find more relevant and specific functions than traditional methods in different types of static networks. Furthermore, a freely accessible web server for NOA has been developed at http://www.aporc.org/noa/.
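The abstract does not specify NOA's Coverage/Diversity optimization for link ontology. The sketch below therefore swaps in a deliberately simple rule: a link inherits the GO terms shared by both of its genes. It then scores a term with a one-sided hypergeometric test of the network's links against a reference set of links. The annotation rule, the choice of test, and the placeholder term IDs ("GO:1", "GO:2") are all assumptions for illustration.

```python
from math import comb


def link_terms(u, v, annotations):
    """Hypothetical link annotation: GO terms shared by both endpoint genes."""
    return annotations.get(u, set()) & annotations.get(v, set())


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N items, K labeled, n drawn)."""
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, min(K, n) + 1))
    return tail / comb(N, n)


def link_enrichment(network, reference, annotations, term):
    """One-sided p-value that `term` is over-represented among network links.

    `network` must be a subset of the `reference` list of links.
    """
    K = sum(term in link_terms(u, v, annotations) for u, v in reference)
    k = sum(term in link_terms(u, v, annotations) for u, v in network)
    return hypergeom_sf(k, len(reference), K, len(network))


annotations = {"a": {"GO:1"}, "b": {"GO:1"}, "c": {"GO:1", "GO:2"}, "d": {"GO:2"}, "e": set()}
reference = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"), ("a", "e")]
network = [("a", "b"), ("a", "c"), ("b", "c")]
print(link_enrichment(network, reference, annotations, "GO:1"))  # 1 / comb(6, 3) = 0.05
```

A purely gene-level test would annotate gene "a" with GO:1 regardless of its partners. Annotating links instead lets the same genes score differently depending on which interactions are present, which is the motivation the abstract gives for NOA.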


Subjects
Gene Regulatory Networks , Protein Interaction Mapping , Software , Aging/genetics , Alzheimer Disease/metabolism , Computational Biology/methods , Humans , Internet , Molecular Sequence Annotation , Pancreatic Neoplasms/genetics , Saccharomyces cerevisiae/genetics
10.
Chin J Cancer ; 32(4): 195-204, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23237213

ABSTRACT

Analyzing the function of gene sets is a critical step in interpreting the results of high-throughput experiments in systems biology. A variety of enrichment analysis tools have been developed in recent years, but most output a long list of significantly enriched terms that are often redundant, making it difficult to extract the most meaningful functions. In this paper, we present GOMA, a novel enrichment analysis method based on the new concept of enriched functional Gene Ontology (GO) modules. With this method, we systematically revealed functional GO modules, i.e., groups of functionally similar GO terms, via an optimization model and then ranked them by enrichment scores. Our new method simplifies enrichment analysis results by reducing redundancy, thereby preventing inconsistent enrichment results among functionally similar terms and providing more biologically meaningful results.


Subjects
Computational Biology/methods , Gene Expression Profiling , Gene Ontology , Gene Regulatory Networks , Algorithms , Breast Neoplasms/genetics , Databases, Genetic , Female , Humans , Oligonucleotide Array Sequence Analysis/methods
11.
BMC Bioinformatics ; 13: 70, 2012 May 01.
Article in English | MEDLINE | ID: mdl-22548981

ABSTRACT

BACKGROUND: Gene expression profiling technologies have gradually become a community standard tool for clinical applications. For example, gene expression data has been analyzed to reveal novel disease subtypes (class discovery) and assign particular samples to well-defined classes (class prediction). In the past decade, many effective methods have been proposed for individual applications. However, there is still a pressing need for a unified framework that can reveal the complicated relationships between samples. RESULTS: We propose a novel convex optimization model to perform class discovery and class prediction in a unified framework. An efficient algorithm is designed and software named OTCC (Optimization Tool for Clustering and Classification) is developed. Comparison in a simulated dataset shows that our method outperforms the existing methods. We then applied OTCC to acute leukemia and breast cancer datasets. The results demonstrate that our method not only can reveal the subtle structures underlying those cancer gene expression data but also can accurately predict the class labels of unknown cancer samples. Therefore, our method holds the promise to identify novel cancer subtypes and improve diagnosis. CONCLUSIONS: We propose a unified computational framework for class discovery and class prediction to facilitate the discovery and prediction of subtle subtypes of cancers. Our method can be generally applied to multiple types of measurements, e.g., gene expression profiling, proteomic measuring, and recent next-generation sequencing, since it only requires the similarities among samples as input.


Subjects
Algorithms , Computational Biology/methods , Computer Simulation , Gene Expression Profiling/methods , Neoplasms/classification , Breast Neoplasms/classification , Breast Neoplasms/genetics , Cluster Analysis , Female , Humans , Leukemia/classification , Leukemia/genetics , Neoplasms/genetics , Software
12.
BMC Bioinformatics ; 13 Suppl 7: S6, 2012 May 08.
Article in English | MEDLINE | ID: mdl-22595003

ABSTRACT

BACKGROUND: Mycobacterium tuberculosis is an infectious bacterium posing serious threats to human health. Due to the difficulty in performing molecular biology experiments to detect protein interactions, reconstruction of a protein interaction map of M. tuberculosis by computational methods will provide crucial information to understand the biological processes in the pathogenic microorganism, as well as provide the framework upon which new therapeutic approaches can be developed. RESULTS: In this paper, we constructed an integrated M. tuberculosis protein interaction network by machine learning and ortholog-based methods. Firstly, we built a support vector machine (SVM) method to infer the protein interactions of M. tuberculosis H37Rv by gene sequence information. We tested our predictors in Escherichia coli and mapped the genetic codon features underlying its protein interactions to M. tuberculosis. Moreover, the documented interactions of 14 other species were mapped to the interactome of M. tuberculosis by the interolog method. The ensemble protein interactions were validated by various functional relationships, i.e., gene coexpression, evolutionary relationship and functional similarity, extracted from heterogeneous data sources. The accuracy and validation demonstrate the effectiveness and efficiency of our framework. CONCLUSIONS: A protein interaction map of M. tuberculosis is inferred from genetic codons and interologs. The prediction accuracy and numerically experimental validation demonstrate the effectiveness and efficiency of our method. Furthermore, our methods can be straightforwardly extended to infer the protein interactions of other bacterial species.


Subjects
Host-Pathogen Interactions , Mycobacterium tuberculosis/metabolism , Protein Interaction Maps , Support Vector Machine , Animals , Escherichia coli/metabolism , Humans
13.
Bioinformatics ; 27(22): 3173-8, 2011 Nov 15.
Article in English | MEDLINE | ID: mdl-21926127

ABSTRACT

MOTIVATION: A large amount of biomolecular network data for multiple species have been generated by high-throughput experimental techniques, including undirected and directed networks such as protein-protein interaction networks, gene regulatory networks and metabolic networks. There are many conserved functionally similar modules and pathways among multiple biomolecular networks in different species; therefore, it is important to analyze the similarity between the biomolecular networks. Network querying approaches aim at efficiently discovering the similar subnetworks among different species. However, many existing methods only partially solve this problem. RESULTS: In this article, a novel approach for network querying problem based on conditional random fields (CRFs) model is presented, which can handle both undirected and directed networks, acyclic and cyclic networks and any number of insertions/deletions. The CRF method is fast and can query pathways in a large network in seconds using a PC. To evaluate the CRF method, extensive computational experiments are conducted on the simulated and real data, and the results are compared with the existing network querying methods. All results show that the CRF method is very useful and efficient to find the conserved functionally similar modules and pathways in multiple biomolecular networks.


Subjects
Gene Regulatory Networks , Metabolic Networks and Pathways , Protein Interaction Mapping/methods , Algorithms , Computational Biology/methods , Signal Transduction
14.
Nucleic Acids Res ; 38(18): 5959-69, 2010 Oct.
Article in English | MEDLINE | ID: mdl-20466810

ABSTRACT

When averaged over the full yeast protein-protein interaction and transcriptional regulatory networks, protein hubs with many interaction partners or regulators tend to evolve significantly more slowly due to increased negative selection. However, genome-wide analysis of protein evolution in the subnetworks of associations involving yeast transcription factors (TFs) reveals that TF hubs do not tend to evolve significantly more slowly than TF non-hubs. This result holds for all four major types of TF hubs: interaction hubs, regulatory in-degree and out-degree hubs, as well as co-regulatory hubs that jointly regulate target genes with many TFs. Furthermore, TF regulatory in-degree hubs tend to evolve significantly more quickly than TF non-hubs. Most importantly, the correlations between evolutionary rate (K(A)/K(S)) and degrees for TFs are significantly more positive than those for generic proteins within the same global protein-protein interaction and transcriptional regulatory networks. Compared to generic protein hubs, TF hubs operate at a higher level in the hierarchical structure of cellular networks, and hence experience additional evolutionary forces (relaxed negative selection or positive selection through network rewiring). The striking difference between the evolution of TF hubs and generic protein hubs demonstrates that components within the same global network can be governed by distinct organizational and evolutionary principles.
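The abstract reports correlations between evolutionary rate (K_A/K_S) and network degree without naming the statistic. A rank correlation is a common choice for heavily skewed degree distributions, so the sketch below assumes Spearman's rho, with average ranks for ties. The degree and K_A/K_S values are made up for illustration and are not data from the study.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Made-up illustration: degree of each transcription factor and its K_A/K_S.
degree = [2, 5, 5, 9, 14, 30]
ka_ks = [0.04, 0.06, 0.05, 0.09, 0.08, 0.12]
print(round(spearman(degree, ka_ks), 3))
```

A positive rho here would correspond to the TF pattern described above, where higher-degree nodes evolve faster. For generic protein hubs, the abstract reports slower evolution, which would correspond to a negative correlation.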


Subjects
Evolution, Molecular , Gene Regulatory Networks , Saccharomyces cerevisiae Proteins/metabolism , Transcription Factors/metabolism , Gene Expression Regulation , Protein Interaction Mapping , Saccharomyces cerevisiae Proteins/genetics , Transcription Factors/genetics
15.
BMC Bioinformatics ; 12: 409, 2011 Oct 24.
Article in English | MEDLINE | ID: mdl-22024143

ABSTRACT

BACKGROUND: With the development of genome-sequencing technologies, protein sequences are readily obtained by translating the measured mRNAs. Therefore, predicting protein-protein interactions from sequences is in great demand, because identifying protein-protein interactions is becoming a bottleneck for eventually understanding the functions of proteins, especially for organisms that are barely characterized. Although a few methods have been proposed, the converse problem, whether the features used extract sufficient and unbiased information from protein sequences, is almost untouched. RESULTS: In this study, we interrogate this problem theoretically by an optimization scheme. Motivated by the theoretical investigation, we find novel encoding methods for both protein sequences and protein pairs. Our new methods sufficiently exploit the information in protein sequences and reduce artificial bias and computational cost. Thus, they significantly outperform the available methods regarding sensitivity, specificity, precision, and recall with cross-validation evaluation, and reach ~80% and ~90% accuracy in Escherichia coli and Saccharomyces cerevisiae respectively. Our findings hold important implications for other sequence-based prediction tasks, because representation of biological sequences is always the first step in computational biology. CONCLUSIONS: By considering the converse problem, we propose new representation methods for both protein sequences and protein pairs. The results show that our method significantly improves the accuracy of protein-protein interaction predictions.


Subjects
Protein Interaction Mapping , Proteins/metabolism , Support Vector Machine , Amino Acid Sequence , Escherichia coli/genetics , Escherichia coli/metabolism , Proteins/chemistry , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Sensitivity and Specificity
16.
Bioinformatics ; 26(13): 1616-22, 2010 Jul 01.
Article in English | MEDLINE | ID: mdl-20483814

ABSTRACT

MOTIVATION: Protein-RNA interactions play a key role in a number of biological processes, such as protein synthesis, mRNA processing, mRNA assembly, ribosome function and eukaryotic spliceosomes. As a result, reliable identification of the RNA binding sites of a protein is important for functional annotation and site-directed mutagenesis. Accumulated data on experimental protein-RNA interactions reveal that an RNA binding residue with different neighboring amino acids often exhibits different preferences for its RNA partners, which in turn can be assessed by the interacting interdependence of the amino acid fragment and RNA nucleotide. RESULTS: In this work, we propose a novel classification method to identify the RNA binding sites in proteins by combining a new interacting feature (interaction propensity) with other sequence- and structure-based features. Specifically, the interaction propensity represents the binding specificity of a protein residue to the interacting RNA nucleotide by considering its two-side neighborhood in a protein residue triplet. The sequence- and structure-based features of the residues are combined to discriminate the interaction propensity of amino acids with RNA. We predict RNA interacting residues in proteins with a well-built random forest classifier. The experiments show that our method is able to detect the annotated protein-RNA interaction sites with high accuracy. Our method achieves an accuracy of 84.5%, an F-measure of 0.85 and an AUC of 0.92 in predicting the RNA binding residues for a dataset containing 205 non-homologous RNA binding proteins, and also outperforms several existing RNA binding residue predictors, such as RNABindR, BindN, RNAProB and PPRint, and some alternative machine learning methods, such as support vector machine, naive Bayes and neural network in the comparison study. 
Furthermore, we provide some biological insights into the roles of sequences and structures in protein-RNA interactions by both evaluating the importance of features for their contributions in predictive accuracy and analyzing the binding patterns of interacting residues. AVAILABILITY: All the source data and code are available at http://www.aporc.org/doc/wiki/PRNA or http://www.sysbio.ac.cn/datatools.asp CONTACT: lnchen@sibs.ac.cn SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
RNA-Binding Proteins/chemistry , RNA/chemistry , Sequence Analysis, Protein , Artificial Intelligence , Computational Biology/methods , RNA/metabolism , RNA-Binding Proteins/metabolism
17.
Nucleic Acids Res ; 37(18): 5943-58, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19661283

ABSTRACT

Transcriptional cooperativity among several transcription factors (TFs) is believed to be the main mechanism of complexity and precision in transcriptional regulatory programs. Here, we present a Bayesian network framework to reconstruct a high-confidence whole-genome map of transcriptional cooperativity in Saccharomyces cerevisiae by integrating a comprehensive list of 15 genomic features. We design a Bayesian network structure to capture the dominant correlations among features and TF cooperativity, and introduce a supervised learning framework with a well-constructed gold-standard dataset. This framework allows us to assess the predictive power of each genomic feature, validate the superior performance of our Bayesian network compared to alternative methods, and integrate genomic features for optimal TF cooperativity prediction. Data integration reveals 159 high-confidence predicted cooperative relationships among 105 TFs, most of which are subsequently validated by literature search. The existing and predicted transcriptional cooperativities can be grouped into three categories based on the combination patterns of the genomic features, providing further biological insights into the different types of TF cooperativity. Our methodology is the first supervised learning approach for predicting transcriptional cooperativity, compares favorably to alternative unsupervised methodologies, and can be applied to other genomic data integration tasks where high-quality gold-standard positive data are scarce.


Subjects
Gene Regulatory Networks , Genome, Fungal , Saccharomyces cerevisiae/genetics , Transcription Factors/metabolism , Transcription, Genetic , Bayes Theorem
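The abstract describes integrating discretized genomic features through a Bayesian network trained on gold-standard cooperative and non-cooperative TF pairs. The sketch below illustrates that idea with naive Bayes, the simplest Bayesian network, in which every feature is conditionally independent given the class. It is not the authors' network structure. The feature names, the discretization, and the Laplace smoothing parameter `alpha` are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(pairs, labels, alpha=1.0):
    """Fit a naive Bayes model (a simplified, hypothetical stand-in for the paper's
    Bayesian network). pairs: list of dicts mapping feature name -> discrete value;
    labels: True for gold-standard cooperative TF pairs, False for negatives.
    Assumes every pair carries every feature."""
    class_counts = Counter(labels)
    value_counts = {True: defaultdict(Counter), False: defaultdict(Counter)}
    values_seen = defaultdict(set)
    for features, label in zip(pairs, labels):
        for name, value in features.items():
            value_counts[label][name][value] += 1
            values_seen[name].add(value)
    return class_counts, value_counts, values_seen, alpha

def log_likelihood_ratio(model, features):
    """Sum over features of log P(value | cooperative) / P(value | not cooperative),
    with Laplace smoothing; positive scores favour cooperativity."""
    class_counts, value_counts, values_seen, alpha = model
    llr = 0.0
    for name, value in features.items():
        k = len(values_seen[name]) + 1  # one extra slot for unseen values
        p_pos = (value_counts[True][name][value] + alpha) / (class_counts[True] + alpha * k)
        p_neg = (value_counts[False][name][value] + alpha) / (class_counts[False] + alpha * k)
        llr += math.log(p_pos / p_neg)
    return llr
```

The log-likelihood ratio lets each feature's contribution be inspected on its own. This loosely mirrors the abstract's point about assessing the predictive power of each genomic feature.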
18.
BMC Bioinformatics ; 11: 26, 2010 Jan 13.
Article in English | MEDLINE | ID: mdl-20070902

ABSTRACT

BACKGROUND: The accumulation of high-throughput data greatly promotes computational investigation of gene function in the context of complex biological systems. However, a biological function is not controlled by an individual gene alone, since genes act cooperatively to carry out biological processes. In the study of human diseases, identifying disease-associated pathways and modules, rather than individual disease-related genes, has become an essential problem in systems biology. RESULTS: In this paper, we propose a novel method to detect disease-related gene modules or dysfunctional pathways based on global characteristics of the interactome coupled with gene expression data. Specifically, we exploit interacting relationships between genes to define a gene's active score function based on the kernel trick, which can represent nonlinear effects of gene cooperativity. Modules or pathways are then inferred from the active scores evaluated by support vector regression in a global and integrative manner. The efficiency and robustness of the proposed method are comprehensively validated on both simulated and real data and by comparison with existing methods. CONCLUSIONS: By applying the proposed method to two cancer-related problems, breast cancer and prostate cancer, we identified active modules or dysfunctional pathways related to these two cancer types, supported by evidence from the literature. We show that this network-based method is highly efficient and can be applied to large-scale problems, especially the extraction of human disease-related modules or pathways. Moreover, the method can also be used to prioritize genes associated with a specific phenotype or disease.


Subjects
Algorithms , Gene Expression Profiling/methods , Gene Regulatory Networks , Neoplasms/genetics , Databases, Genetic , Humans , Oligonucleotide Array Sequence Analysis , Phenotype
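The abstract gives a two-step outline: first score each gene from its interaction neighbours through a kernel, then extract modules from the scored network. The sketch below shows that outline under explicit assumptions. It is not the paper's method. Each gene has a hypothetical differential-expression score `z` and an expression profile, and its active score is a Gaussian (RBF) kernel-weighted sum of its own and its neighbours' scores. The paper's support vector regression step is replaced by this direct sum. Modules are the connected components of the interaction subgraph induced by genes scoring at or above a hypothetical `threshold`.

```python
import math
from collections import deque

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian kernel on expression profiles; equals 1 when x == y."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def active_scores(graph, profiles, z, gamma=1.0):
    """Hypothetical active score: the gene's own score plus kernel-weighted neighbour scores."""
    scores = {}
    for gene, neighbors in graph.items():
        total = z[gene]  # self term, since rbf_kernel(x, x) == 1
        for nb in neighbors:
            total += rbf_kernel(profiles[gene], profiles[nb], gamma) * z[nb]
        scores[gene] = total
    return scores

def active_modules(graph, scores, threshold):
    """Connected components (via BFS) of the subgraph induced by high-scoring genes."""
    active = {g for g, s in scores.items() if s >= threshold}
    seen, modules = set(), []
    for start in sorted(active):
        if start in seen:
            continue
        component, queue = [], deque([start])
        seen.add(start)
        while queue:
            gene = queue.popleft()
            component.append(gene)
            for nb in graph[gene]:
                if nb in active and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        modules.append(sorted(component))
    return modules
```

Because the kernel decays with profile distance, a neighbour with a very different expression profile contributes almost nothing to a gene's score. This is one simple way to let co-expression modulate how much an interaction counts.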
19.
Amino Acids ; 39(2): 417-25, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20127263

ABSTRACT

Protein structure alignment algorithms play an important role in the study of protein structure and function. In this paper, a novel approach for structure alignment is presented. Specifically, core regions in two protein structures are first aligned by identifying connected components in a network of neighboring, geometrically compatible aligned fragment pairs. The initial alignments are then refined through a multi-objective optimization method. The algorithm can produce both sequential and non-sequential alignments. We demonstrate the superior performance of the proposed algorithm through computational experiments on several benchmark datasets and comparisons with well-known structure alignment algorithms such as DALI, CE and MATT. The proposed method can obtain accurate and biologically significant alignments in the presence of internal repeats or indels, identify circular permutations, and reveal conserved functional sites. We also present a ranking criterion for fold similarity, which the numerical experiments show to be comparable or superior to the Z-score of CE in most cases. The software and supplementary data of computational results are available at http://zhangroup.aporc.org/bioinfo/SANA.


Subjects
Algorithms , Computational Biology/methods , Sequence Alignment/methods , Databases, Protein , Models, Molecular , Software
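The first stage described in the abstract builds a network of aligned fragment pairs (AFPs), links those that are neighbouring and geometrically compatible, and takes connected components as core alignments. The sketch below shows that stage only, with union-find for the components. The compatibility test is a hypothetical stand-in, not the published SANA criterion. Two AFPs are compatible when their fragment centroids lie within a distance `cutoff` of each other in both proteins, and when the inter-centroid distances in the two proteins differ by at most `tol` (in Å). The multi-objective refinement stage is not sketched.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class AFP:
    name: str
    center_a: tuple  # hypothetical: centroid (x, y, z) of the fragment in protein A
    center_b: tuple  # hypothetical: centroid of the paired fragment in protein B

def compatible(p, q, tol=1.5, cutoff=12.0):
    """Assumed test: neighbouring in both structures, with consistent inter-fragment distance."""
    da = math.dist(p.center_a, q.center_a)
    db = math.dist(p.center_b, q.center_b)
    return da <= cutoff and db <= cutoff and abs(da - db) <= tol

def afp_components(afps, tol=1.5, cutoff=12.0):
    """Connected components of the AFP compatibility network, largest first."""
    parent = list(range(len(afps)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(afps)):
        for j in range(i + 1, len(afps)):
            if compatible(afps[i], afps[j], tol, cutoff):
                parent[find(i)] = find(j)
    groups = {}
    for i, afp in enumerate(afps):
        groups.setdefault(find(i), []).append(afp.name)
    return sorted(groups.values(), key=len, reverse=True)
```

The test only compares relative distances, so it does not require superimposing the two structures first. AFPs from different regions can land in one component even if they are not consecutive along the chain, which is consistent with the abstract's claim that non-sequential alignments are possible.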
20.
PLoS Comput Biol ; 5(9): e1000521, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19779549

ABSTRACT

One of the challenging problems in biology and medicine is exploring the underlying mechanisms of genetic diseases. Recent studies suggest that the relationship between genetic diseases and the aging process is important for understanding the molecular mechanisms of complex diseases. Although some intricate associations have been investigated for a long time, such studies are still at an early stage. In this paper, we construct a human disease-aging network to study the relationship between aging genes and genetic disease genes. Specifically, we integrate human protein-protein interactions (PPIs), disease-gene associations, aging-gene associations, and physiological system-based genetic disease classification information in a single graph-theoretic framework and find that (1) human disease genes are much closer to aging genes than expected by chance; and (2) diseases can be categorized into two types according to their relationships with aging: type I diseases have genes significantly close to aging genes, while type II diseases do not. Furthermore, we examine the topological characteristics of the disease-aging network from a systems perspective. Theoretical results reveal that the genes of type I diseases occupy a central position in the PPI network, whereas those of type II diseases do not. (3) More importantly, we define an asymmetric closeness based on the PPI network to describe relationships between diseases, and find that aging genes make a significant contribution to associations among diseases, especially among type I diseases. In conclusion, this network-based study provides not only evidence for the intricate relationship between the aging process and genetic diseases, but also biological implications for probing the nature of human diseases.


Subjects
Aging/genetics , Computational Biology/methods , Disease/genetics , Genetic Predisposition to Disease , Models, Genetic , Chromosome Mapping , Cluster Analysis , Humans , Protein Interaction Mapping
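The abstract does not give the formula for its asymmetric closeness between diseases. The sketch below shows one simple directed measure of the same flavour, and the definition is a hypothetical illustration. For a source gene set and a target gene set on an unweighted PPI graph, it reports the mean shortest-path distance from each source gene to its nearest target gene. A multi-source breadth-first search computes these distances. Because the average is taken over the source set, swapping the two sets generally changes the value. Smaller values mean "closer"; source genes that cannot reach the target set are skipped.

```python
from collections import deque

def distance_to_set(graph, targets):
    """Multi-source BFS: hop distance from every reachable node to its nearest target."""
    dist = {t: 0 for t in targets if t in graph}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def directed_set_distance(graph, source_genes, target_genes):
    """Hypothetical asymmetric measure: mean distance from source genes to the target set."""
    dist = distance_to_set(graph, target_genes)
    reachable = [dist[g] for g in source_genes if g in dist]
    return sum(reachable) / len(reachable) if reachable else float("inf")
```

The same function also covers the abstract's first comparison. Passing disease genes as the source set and aging genes as the target set gives their mean distance, which could then be compared against random gene sets of equal size to judge closeness "than expected by chance".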