Results 1 - 8 of 8
1.
Bioinformatics ; 37(24): 4801-4809, 2021 12 11.
Article in English | MEDLINE | ID: mdl-34375392

ABSTRACT

MOTIVATION: The integration of multi-omic data using machine learning methods has been focused on solving relevant tasks such as predicting sensitivity to a drug or subtyping patients. Recent integration methods, such as joint Non-negative Matrix Factorization, have allowed researchers to exploit the information in the data to unravel the biological processes of multi-omic datasets. RESULTS: We present a novel method called Multi-project and Multi-profile joint Non-negative Matrix Factorization capable of integrating data from different sources, such as experimental and observational multi-omic data. The method can generate co-clusters between observations, predict profiles and relate latent variables. We applied the method to integrate low-grade glioma omic profiles from The Cancer Genome Atlas (TCGA) and Cancer Cell Line Encyclopedia projects. The method allowed us to find gene clusters mainly enriched in cancer-associated terms. We identified groups of patients and cell lines similar to each other by comparing biological processes. We predicted the drug profile for patients, and we identified genetic signatures for tumors resistant and sensitive to a specific drug. AVAILABILITY AND IMPLEMENTATION: Source code repository is publicly available at https://bitbucket.org/dsalazarb/mmjnmf/ (Zenodo DOI: 10.5281/zenodo.5150920). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms; Glioma; Humans; Software; Genome; Multiomics
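
The joint Non-negative Matrix Factorization mentioned in this abstract can be illustrated with a generic sketch. It is not the authors' Multi-project and Multi-profile method: it assumes the simplest joint setting, in which several non-negative data matrices describe the same samples (rows) and are factored as X_k ≈ W H_k with one shared W, using standard multiplicative updates. The function names and the toy matrices `expression` and `methylation` are hypothetical.

```python
import random

EPS = 1e-9  # keeps denominators away from zero


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def total_error(views, w, hs):
    """Sum of squared reconstruction errors over all views."""
    err = 0.0
    for x, h in zip(views, hs):
        approx = matmul(w, h)
        err += sum((xv - av) ** 2
                   for xr, ar in zip(x, approx) for xv, av in zip(xr, ar))
    return err


def joint_nmf(views, rank, iterations=200, seed=0):
    """Factor each view X_k ~ W @ H_k with a single W shared by all views."""
    rng = random.Random(seed)
    n = len(views[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    hs = [[[rng.random() + 0.1 for _ in x[0]] for _ in range(rank)] for x in views]
    for _ in range(iterations):
        # Update every view-specific H_k while W is held fixed.
        wt = transpose(w)
        wtw = matmul(wt, w)
        for k, x in enumerate(views):
            num, den = matmul(wt, x), matmul(wtw, hs[k])
            hs[k] = [[h * a / (b + EPS) for h, a, b in zip(hr, nr, dr)]
                     for hr, nr, dr in zip(hs[k], num, den)]
        # Update the shared W by pooling the numerators and Gram matrices.
        num = [[0.0] * rank for _ in range(n)]
        hht = [[0.0] * rank for _ in range(rank)]
        for x, h in zip(views, hs):
            ht = transpose(h)
            for acc, add in ((num, matmul(x, ht)), (hht, matmul(h, ht))):
                for acc_row, add_row in zip(acc, add):
                    for j, v in enumerate(add_row):
                        acc_row[j] += v
        den = matmul(w, hht)
        w = [[wv * a / (b + EPS) for wv, a, b in zip(wr, nr, dr)]
             for wr, nr, dr in zip(w, num, den)]
    return w, hs


# Hypothetical toy data: four samples measured in two omic layers.
expression = [[5, 4, 0], [4, 5, 0], [0, 1, 5], [0, 0, 4]]
methylation = [[3, 0], [4, 0], [0, 2], [0, 3]]
views = [expression, methylation]
w, hs = joint_nmf(views, rank=2)
```

Because W is shared, each row of `w` gives one sample's loadings on latent factors that must explain both layers at once, which is the property that makes co-clustering across data types possible.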
2.
Bioinformatics ; 37(21): 3839-3847, 2021 11 05.
Article in English | MEDLINE | ID: mdl-34213534

ABSTRACT

MOTIVATION: We are increasingly accumulating complex omics data that capture different aspects of cellular functioning. A key challenge is to untangle their complexity and effectively mine them for new biomedical information. To decipher this new information, we introduce algorithms based on network embeddings. Such algorithms represent biological macromolecules as vectors in d-dimensional space, in which topologically similar molecules are embedded close in space and knowledge is extracted directly by vector operations. Recently, it has been shown that neural networks used to obtain vectorial representations (embeddings) are implicitly factorizing a mutual information matrix, called Positive Pointwise Mutual Information (PPMI) matrix. Thus, we propose the use of the PPMI matrix to represent the human protein-protein interaction (PPI) network and also introduce the graphlet degree vector PPMI matrix of the PPI network to capture different topological (structural) similarities of the nodes in the molecular network. RESULTS: We generate the embeddings by decomposing these matrices with Nonnegative Matrix Tri-Factorization. We demonstrate that genes that are embedded close in these spaces have similar biological functions, so we can extract new biomedical knowledge directly by doing linear operations on their embedding vector representations. We exploit this property to predict new genes participating in protein complexes and to identify new cancer-related genes based on the cosine similarities between the vector representations of the genes. We validate 80% of our novel cancer-related gene predictions in the literature and also by patient survival curves, which demonstrate that 93.3% of them have a potential clinical relevance as biomarkers of cancer. AVAILABILITY AND IMPLEMENTATION: Code and data are available online at https://gitlab.bsc.es/axenos/embedded-omics-data-geometry/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms; Protein Interaction Mapping; Humans; Protein Interaction Mapping/methods; Neural Networks, Computer; Protein Interaction Maps; Oncogenes
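
As a rough illustration of the PPMI matrix and the cosine similarity named in this abstract, the sketch below computes PPMI from a matrix of co-occurrence counts. It assumes the simplest possible input, the raw adjacency matrix of a toy network used directly as counts. The paper may build its counts differently (for example from random walks), and it compares vectors obtained after factorization rather than raw PPMI rows. All names here are hypothetical.

```python
import math


def ppmi_matrix(counts):
    """PPMI_ij = max(0, log(c_ij * total / (row_i * col_j))) for c_ij > 0."""
    n = len(counts)
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(counts[i][j] for i in range(n)) for j in range(n)]
    ppmi = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            c = counts[i][j]
            if c > 0:  # zero counts keep a PPMI of 0
                pmi = math.log(c * total / (row_sums[i] * col_sums[j]))
                ppmi[i][j] = max(0.0, pmi)
    return ppmi


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


# Toy network: the path 0 - 1 - 2 - 3, given as an adjacency matrix.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
ppmi = ppmi_matrix(adjacency)
similarity_0_2 = cosine(ppmi[0], ppmi[2])  # nodes 0 and 2 share neighbor 1
```

Nodes 0 and 2 are not adjacent, yet their PPMI rows overlap because both are linked to node 1, so their cosine similarity is positive.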
3.
Bioinformatics ; 36(Suppl_1): i455-i463, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32657405

ABSTRACT

MOTIVATION: The structure of chromatin impacts gene expression. Its alteration has been shown to coincide with the occurrence of cancer. A key challenge is in understanding the role of chromatin structure (CS) in cellular processes and its implications in diseases. RESULTS: We propose a comparative pipeline to analyze CSs and apply it to study chronic lymphocytic leukemia (CLL). We model the chromatin of the affected and control cells as networks and analyze the network topology by state-of-the-art methods. Our results show that CSs are a rich source of new biological and functional information about DNA elements and cells that can complement protein-protein and co-expression data. Importantly, we show the existence of structural markers of cancer-related DNA elements in the chromatin. Surprisingly, CLL driver genes are characterized by specific local wiring patterns not only in the CS network of CLL cells, but also of healthy cells. This allows us to successfully predict new CLL-related DNA elements. Importantly, this shows that we can identify cancer-related DNA elements in other cancer types by investigating the CS network of the healthy cell of origin, a key new insight paving the road to new therapeutic strategies. This gives us an opportunity to exploit chromosome conformation data in healthy cells to predict new drivers. AVAILABILITY AND IMPLEMENTATION: Our predicted CLL genes and RNAs are provided as a free resource to the community at https://life.bsc.es/iconbi/chromatin/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Chromatin; Leukemia, Lymphocytic, Chronic, B-Cell; Biomarkers; DNA; Humans; Leukemia, Lymphocytic, Chronic, B-Cell/genetics
4.
Breast Cancer Res Treat ; 143(2): 393-401, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24337538

ABSTRACT

Breast cancer accounts for more than 450,000 deaths per year worldwide. Discovery of novel therapeutic targets that will allow patient-tailored treatment of this disease is an emerging area of scientific interest. Recently, nicastrin has been identified as one such therapeutic target. Its overexpression is indicative of worse overall survival in the estrogen-receptor-negative patient population. In this paper, we analyze data from a large invasive breast carcinoma study and confirm nicastrin amplification. In search of genes that are co-amplified with nicastrin, we identify a potential novel breast cancer-related amplicon located on chromosome 1. Furthermore, we search for "influential interactors," i.e., genes that interact with a statistically significantly high number of genes which are co-amplified with nicastrin, and confirm their involvement in this female neoplasm. Among the influential interactors, we find genes which belong to the core diseasome (a recently identified therapeutically relevant set of genes which is known to drive disease formation) and propose that they might be important for breast cancer onset, and serve as its novel therapeutic targets. Finally, we identify a pathway that may play a role in nicastrin's amplification process and we experimentally confirm the downstream signaling mechanism of nicastrin in breast cancer cells.


Subjects
Amyloid Precursor Protein Secretases/genetics; Breast Neoplasms/genetics; Carcinoma, Ductal, Breast/genetics; Gene Amplification/genetics; Membrane Glycoproteins/genetics; Shc Signaling Adaptor Proteins/genetics; Animals; Cell Line, Tumor; Female; Humans; Mice; Mice, Nude; Neoplasm Transplantation; Protein Interaction Maps/genetics; Signal Transduction/genetics; Src Homology 2 Domain-Containing, Transforming Protein 1; Transplantation, Heterologous
5.
Bioinformatics ; 22(8): 974-80, 2006 Apr 15.
Article in English | MEDLINE | ID: mdl-16452112

ABSTRACT

MOTIVATION: Algorithmic and modeling advances in the area of protein-protein interaction (PPI) network analysis could contribute to the understanding of biological processes. Local structure of networks can be measured by the frequency distribution of graphlets, small connected non-isomorphic induced subgraphs. This measure of local structure has been used to show that high-confidence PPI networks have local structure of geometric random graphs. Finding graphlets exhaustively in a large network is computationally intensive. More complete PPI networks, as well as PPI networks of higher organisms, will thus require efficient heuristic approaches. RESULTS: We propose two efficient and scalable heuristics for finding graphlets in high-confidence PPI networks. We show that both PPI networks and their model geometric random networks have defined boundaries that are sparser than the 'inner parts' of the networks. In addition, these networks exhibit 'uniformity' of local structure inside the networks. Our first heuristic exploits these two structural properties of PPI and geometric random networks to find good estimates of graphlet frequency distributions in these networks up to 690 times faster than the exhaustive searches. Our second heuristic is a variant of a more standard sampling technique and it produces accurate approximate results up to 377 times faster than the exhaustive searches. We indicate how the combination of these approaches may result in an even better heuristic. AVAILABILITY: Supplementary information is available at http://www.cs.toronto.edu/~natasha/BIOINF-2005-0946/Supplementary.pdf. Software implementing the algorithms is available at http://www.cs.toronto.edu/~natasha/BIOINF-2005-0946/estimate_grap-hlets.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms; Computer Graphics; Models, Biological; Protein Interaction Mapping/methods; Proteins/metabolism; Signal Transduction/physiology; Computer Simulation; Data Interpretation, Statistical; Models, Statistical; Statistical Distributions
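
To make the notion of a graphlet frequency count concrete, the sketch below performs the exhaustive search that this abstract describes as computationally intensive, restricted to the two connected 3-node graphlets (the open path and the triangle). It is not one of the paper's heuristics, and the function name and toy edge list are hypothetical.

```python
from itertools import combinations


def count_three_node_graphlets(edges):
    """Count induced 3-node paths and triangles by checking every node triple."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    counts = {"path": 0, "triangle": 0}
    for a, b, c in combinations(sorted(adjacency), 3):
        present = (b in adjacency[a]) + (c in adjacency[a]) + (c in adjacency[b])
        if present == 3:
            counts["triangle"] += 1
        elif present == 2:  # two edges on three nodes always form a connected path
            counts["path"] += 1
    return counts


# Toy network: a triangle 0-1-2 with a pendant node 3 attached to node 2.
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
graphlet_counts = count_three_node_graphlets(toy_edges)
# {'path': 2, 'triangle': 1}
```

The loop visits every triple of nodes, so its cost grows cubically with network size, and larger graphlets are more expensive still. That growth is what motivates sampling-based estimates.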
6.
Bioinformatics ; 20(18): 3508-15, 2004 Dec 12.
Article in English | MEDLINE | ID: mdl-15284103

ABSTRACT

MOTIVATION: Networks have been used to model many real-world phenomena to better understand the phenomena and to guide experiments in order to predict their behavior. Since incorrect models lead to incorrect predictions, it is vital to have as accurate a model as possible. As a result, new techniques and models for analyzing and modeling real-world networks have recently been introduced. RESULTS: One example of large and complex networks involves protein-protein interaction (PPI) networks. We analyze PPI networks of yeast Saccharomyces cerevisiae and fruitfly Drosophila melanogaster using a newly introduced measure of local network structure as well as the standardly used measures of global network structure. We examine the fit of four different network models, including Erdős-Rényi, scale-free and geometric random network models, to these PPI networks with respect to the measures of local and global network structure. We demonstrate that the currently accepted scale-free model of PPI networks fails to fit the data in several respects and show that a random geometric model provides a much more accurate model of the PPI data. We hypothesize that only the noise in these networks is scale-free. CONCLUSIONS: We systematically evaluate how well different network models fit the PPI networks. We show that the structure of PPI networks is better modeled by a geometric random graph than by a scale-free model. SUPPLEMENTARY INFORMATION: Supplementary information is available at http://www.cs.utoronto.ca/~juris/data/data/ppiGRG04/


Subjects
Drosophila Proteins/metabolism; Models, Biological; Protein Interaction Mapping/methods; Proteome/metabolism; Saccharomyces cerevisiae/metabolism; Signal Transduction/physiology; Algorithms; Animals; Computer Simulation; Drosophila melanogaster/metabolism; Gene Expression Regulation/physiology; Saccharomyces cerevisiae Proteins/metabolism
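
A geometric random graph of the kind this abstract compares against PPI data can be generated in a few lines. The sketch assumes the textbook construction: nodes are points drawn uniformly at random, and two nodes are linked when their distance is at most a fixed radius. The unit square, the Euclidean metric, and the parameter names are assumptions and may differ from the dimensions and metrics used in the paper.

```python
import math
import random


def geometric_random_graph(n, radius, seed=0):
    """Place n points uniformly in the unit square; link pairs within `radius`."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    edges = [(i, j)
             for i in range(n) for j in range(i + 1, n)
             if math.dist(points[i], points[j]) <= radius]
    return points, edges


# Hypothetical parameters: 50 nodes, connection radius 0.2.
points, edges = geometric_random_graph(50, 0.2)
```

In this construction, `radius` controls the density of the graph, so it would have to be tuned before the generated network could be compared with a real one of a given size.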
7.
Bioinformatics ; 20(17): 3013-20, 2004 Nov 22.
Article in English | MEDLINE | ID: mdl-15180928

ABSTRACT

MOTIVATION: Understanding principles of cellular organization and function can be enhanced if we detect known and predict still undiscovered protein complexes within the cell's protein-protein interaction (PPI) network. Such predictions may be used as an inexpensive tool to direct biological experiments. The increasing amount of available PPI data necessitates an accurate and scalable approach to protein complex identification. RESULTS: We have developed the Restricted Neighborhood Search Clustering Algorithm (RNSC) to efficiently partition networks into clusters using a cost function. We applied this cost-based clustering algorithm to PPI networks of Saccharomyces cerevisiae, Drosophila melanogaster and Caenorhabditis elegans to identify and predict protein complexes. We have determined functional and graph-theoretic properties of true protein complexes from the MIPS database. Based on these properties, we defined filters to distinguish between identified network clusters and true protein complexes. CONCLUSIONS: Our application of the cost-based clustering algorithm provides an accurate and scalable method of detecting and predicting protein complexes within a PPI network.


Subjects
Algorithms; Cluster Analysis; Models, Biological; Protein Interaction Mapping/methods; Proteins/metabolism; Signal Transduction/physiology; Animals; Caenorhabditis elegans Proteins/metabolism; Computer Simulation; Drosophila Proteins/metabolism; Multienzyme Complexes/metabolism; Saccharomyces cerevisiae Proteins/metabolism
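
The idea of partitioning a network by minimizing a cost function can be sketched as follows. This is a simplified illustration and not the published RNSC algorithm, which is more elaborate. The cost used here is an assumption: it counts node pairs that are linked but placed in different clusters, plus pairs that share a cluster without being linked. The search is a plain greedy descent, and all names are hypothetical.

```python
def naive_cost(adjacency, cluster_of):
    """Count 'bad' pairs: linked but separated, or together but not linked."""
    nodes = list(adjacency)
    bad = 0
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            linked = v in adjacency[u]
            together = cluster_of[u] == cluster_of[v]
            if linked != together:
                bad += 1
    return bad


def greedy_cluster(adjacency):
    """Start from singleton clusters and accept any single-node move that lowers the cost."""
    cluster_of = {v: v for v in adjacency}
    cost = naive_cost(adjacency, cluster_of)
    improved = True
    while improved:
        improved = False
        for v in adjacency:
            for target in set(cluster_of.values()):
                if target == cluster_of[v]:
                    continue
                previous = cluster_of[v]
                cluster_of[v] = target
                new_cost = naive_cost(adjacency, cluster_of)
                if new_cost < cost:
                    cost = new_cost
                    improved = True
                else:
                    cluster_of[v] = previous  # undo a move that did not help
    return cluster_of, cost


# Toy network: two triangles, 0-1-2 and 3-4-5, joined by the edge 2-3.
toy_network = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
clusters, final_cost = greedy_cluster(toy_network)
```

Every accepted move strictly lowers an integer cost, so the loop terminates. A greedy descent like this can still stop in a local minimum, which a more careful search strategy would try to avoid.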
8.
Bioinformatics ; 20(3): 340-8, 2004 Feb 12.
Article in English | MEDLINE | ID: mdl-14960460

ABSTRACT

MOTIVATION: The building blocks of biological networks are individual protein-protein interactions (PPIs). The cumulative PPI data set in Saccharomyces cerevisiae now exceeds 78 000. Studying the network of these interactions will provide valuable insight into the inner workings of cells. RESULTS: We performed a systematic graph theory-based analysis of this PPI network to construct computational models for describing and predicting the properties of lethal mutations and proteins participating in genetic interactions, functional groups, protein complexes and signaling pathways. Our analysis suggests that lethal mutations are not only highly connected within the network, but they also satisfy an additional property: their removal causes a disruption in network structure. We also provide evidence for the existence of alternate paths that bypass viable proteins in PPI networks, while such paths do not exist for lethal mutations. In addition, we show that distinct functional classes of proteins have differing network properties. We also demonstrate a way to extract and iteratively predict protein complexes and signaling pathways. We evaluate the power of predictions by comparing them with a random model, and assess the accuracy of predictions by analyzing their overlap with the MIPS database. CONCLUSIONS: Our models provide a means for understanding the complex wiring underlying cellular function, and enable us to predict essentiality, genetic interaction, function, protein complexes and cellular pathways. This analysis uncovers structure-function relationships observable in a large PPI network.


Subjects
Algorithms; Models, Biological; Protein Interaction Mapping/methods; Saccharomyces cerevisiae Proteins/metabolism; Saccharomyces cerevisiae/physiology; Signal Transduction/physiology; Cluster Analysis; Computer Simulation; Multienzyme Complexes/metabolism
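
The observation that removing certain proteins disrupts network structure can be illustrated with a small sketch. It assumes one specific reading of "disruption", namely that deleting the node increases the number of connected components. The paper's own criterion may differ, and the function names and toy network are hypothetical.

```python
def count_components(adjacency, removed=None):
    """Count connected components, optionally ignoring one removed node."""
    seen = set() if removed is None else {removed}
    components = 0
    for start in adjacency:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:  # depth-first traversal of one component
            u = stack.pop()
            for v in adjacency[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
    return components


def disruptive_nodes(adjacency):
    """Nodes whose removal splits the network into more components."""
    base = count_components(adjacency)
    return [v for v in adjacency
            if count_components(adjacency, removed=v) > base]


# Toy network: two triangles, 0-1-2 and 3-4-5, joined by the edge 2-3.
toy_network = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
critical = disruptive_nodes(toy_network)
# [2, 3]
```

Nodes 2 and 3 are the only ones with no alternate path around them. Removing any other node leaves the rest of the network connected, which mirrors the abstract's contrast between lethal and viable proteins.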