ABSTRACT
It is widely believed that comparing discrepancies in the protein-protein interaction (PPI) networks of individuals will become an important tool in understanding and preventing diseases. PPI networks for individuals are not yet available, but gene expression data is becoming easier to obtain and allows us to represent each individual by a co-integrated gene expression/protein interaction network. Two major problems hamper the use of graph kernels, the state-of-the-art methods for whole-graph comparison, to compare PPI networks. First, these methods do not scale to graphs the size of a PPI network. Second, missing edges in these interaction networks are biologically relevant for detecting discrepancies, yet these methods do not take them into account. In this article we present graph kernels for biological network comparison that are fast to compute and take missing interactions into account. We evaluate their practical performance on two datasets of co-integrated gene expression/PPI networks.
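The abstract does not give the kernels themselves. As a minimal sketch, assuming a geometric random walk graph kernel, and assuming that "taking missing interactions into account" can be illustrated by also comparing the complement graphs (whose edges are exactly the absent interactions), the idea could look like this. The function names and the decay parameter `lam` are hypothetical. This naive version builds the full Kronecker product of the two adjacency matrices, so it illustrates only the missing-edge idea and does not address the scalability problem the abstract describes.

```python
import numpy as np

def random_walk_kernel(A1, A2, lam=0.01):
    """Geometric random walk kernel between two undirected graphs.

    Counts common walks of every length k, weighted by lam**k, via the
    direct product graph with adjacency kron(A1, A2):
        k(G1, G2) = 1^T (I - lam * A_x)^{-1} 1
    The series converges only if lam < 1 / spectral_radius(A_x).
    """
    Ax = np.kron(A1, A2).astype(float)
    n = Ax.shape[0]
    # A_x is symmetric when A1 and A2 are, so eigvalsh applies.
    rho = np.max(np.abs(np.linalg.eigvalsh(Ax)))
    if lam * rho >= 1:
        raise ValueError("lam too large: the walk series does not converge")
    ones = np.ones(n)
    return float(ones @ np.linalg.solve(np.eye(n) - lam * Ax, ones))

def complement(A):
    """Adjacency matrix of the complement graph (edges = missing interactions)."""
    C = 1 - np.asarray(A)
    np.fill_diagonal(C, 0)  # no self-loops
    return C

def composite_kernel(A1, A2, lam=0.01):
    """Hypothetical composite kernel: compare shared edges and shared non-edges."""
    return random_walk_kernel(A1, A2, lam) + random_walk_kernel(
        complement(A1), complement(A2), lam
    )
```

For two networks over the same protein set, `composite_kernel(A1, A2)` rewards walks that both networks share as well as walks through interactions that both networks lack.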
Subjects
Protein Interaction Mapping/statistics & numerical data; Computational Biology; Databases, Genetic; Disease Progression; Gene Expression Profiling/statistics & numerical data; Humans; Prognosis; Protein Array Analysis/statistics & numerical data
ABSTRACT
We present a kernel-based approach to the classification of time series of gene expression profiles. Our method takes into account both the dynamic evolution of the data over time and its temporal characteristics. More specifically, we model the evolution of the gene expression profiles as a Linear Time Invariant (LTI) dynamical system and estimate its model parameters. A kernel on dynamical systems is then used to classify these time series. We successfully test our approach on a published dataset to predict response to drug therapy in Multiple Sclerosis patients. For pharmacogenomics, our method offers great potential as an advanced computational tool for disease diagnosis and for prognosis of disease and drug therapy outcomes.
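The two steps in the abstract (estimate the parameters of an LTI system, then compare systems with a kernel) can be illustrated with a minimal sketch. It assumes a noise-free autonomous model x_{t+1} = A x_t fitted by least squares, and a truncated trace kernel, in the style of the Binet-Cauchy trace kernel on dynamical systems, that compares the trajectories generated by two fitted models from their initial profiles. The names `fit_lti` and `trace_kernel` and the parameters `lam` and `horizon` are hypothetical; the published method may use a different model and kernel.

```python
import numpy as np

def fit_lti(X):
    """Least-squares fit of x_{t+1} = A x_t to one time series.

    X has shape (T, d): one expression profile of d genes per time point.
    Returns (A, x0) where x0 is the first observed profile.
    """
    X_prev, X_next = X[:-1], X[1:]
    # Each row satisfies x_t^T A^T = x_{t+1}^T, so solve X_prev @ B = X_next for B = A^T.
    # If T - 1 < d the problem is underdetermined and lstsq returns the minimum-norm B.
    B, *_ = np.linalg.lstsq(X_prev, X_next, rcond=None)
    return B.T, X[0]

def trace_kernel(sys1, sys2, lam=0.5, horizon=50):
    """Truncated trace kernel: sum_{t < horizon} lam^t <A1^t x0, A2^t x0'>."""
    (A1, x1), (A2, x2) = sys1, sys2
    k = 0.0
    for t in range(horizon):
        k += lam**t * float(x1 @ x2)
        x1, x2 = A1 @ x1, A2 @ x2  # advance both trajectories one step
    return k
```

Real expression time series usually have far fewer time points than genes, so in practice one would probably need to reduce the gene dimension or regularize the fit before comparing patients this way.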
Subjects
Gene Expression Profiling/statistics & numerical data; Artificial Intelligence; Computational Biology; Databases, Genetic; Humans; Linear Models; Models, Genetic; Multiple Sclerosis/drug therapy; Multiple Sclerosis/genetics; Oligonucleotide Array Sequence Analysis/statistics & numerical data; Pharmacogenetics/statistics & numerical data; Time Factors
ABSTRACT
MOTIVATION: Computational approaches to protein function prediction infer protein function by finding proteins with similar sequence, structure, surface clefts, chemical properties, amino acid motifs, interaction partners or phylogenetic profiles. We present a new approach that combines sequential, structural and chemical information into one graph model of proteins. We predict functional class membership of enzymes and non-enzymes using graph kernels and support vector machine classification on these protein graphs. RESULTS: Our graph model, derivable from protein sequence and structure only, is competitive with vector models that require additional protein information, such as the size of surface pockets. If we include this extra information in our graph model, our classifier yields significantly higher accuracy than the vector models. Hyperkernels allow us to select and optimally combine the most relevant node attributes in our protein graphs. We have laid the foundation for a protein function prediction system that integrates protein information from various sources efficiently and effectively. AVAILABILITY: More information available via www.dbs.ifi.lmu.de/Mitarbeiter/borgwardt.html.
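To make the graph-kernel step concrete, here is a minimal sketch assuming each protein graph is given as an adjacency matrix plus a matrix of numeric node attributes (which attributes those are is not specified here). It uses a simple walk kernel of length one built on node-attribute similarities, and combines per-attribute Gaussian kernels with fixed non-negative weights. Hyperkernels, as in the abstract, would instead learn such a combination; this sketch does not attempt that. All function names and the parameters `weights` and `sigma` are hypothetical.

```python
import numpy as np

def node_kernel(F1, F2, weights, sigma=1.0):
    """Weighted sum of per-attribute Gaussian kernels between all node pairs.

    F1: (n1, m) and F2: (n2, m) node attribute matrices; weights: (m,), non-negative.
    Returns an (n1, n2) matrix of node similarities.
    """
    diff = F1[:, None, :] - F2[None, :, :]          # (n1, n2, m)
    per_attr = np.exp(-diff**2 / (2 * sigma**2))     # one Gaussian kernel per attribute
    return per_attr @ np.asarray(weights, float)     # fixed-weight combination

def edge_walk_kernel(A1, F1, A2, F2, weights, sigma=1.0):
    """Sum over all pairs of edges (walks of length 1) of k(start) * k(end)."""
    K = node_kernel(F1, F2, weights, sigma)
    # sum_{i,j,k,l} A1[i,j] * A2[k,l] * K[i,k] * K[j,l]
    return float(np.einsum("ij,kl,ik,jl->", A1, A2, K, K))
```

A Gram matrix of such values over a set of proteins could then be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.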