Search | Virtual Health Library (Biblioteca Virtual em Saúde) Fiocruz

Identification of co-evolving temporal networks.

Elhesha, Rasha; Sarkar, Aisharjya; Boucher, Christina; Kahveci, Tamer.

BMC Genomics ; 20(Suppl 6): 434, 2019 Jun 13.

Article in English | MEDLINE | ID: mdl-31189471

ABSTRACT

BACKGROUND: Biological networks describe the mechanisms that govern cellular functions. Temporal networks show how these networks evolve over time. Studying the temporal progression of network topologies is of utmost importance since it uncovers how a network evolves and how it resists external stimuli and internal variations. Two temporal networks have co-evolving subnetworks if the evolving topologies of these subnetworks remain similar to each other as the network topology evolves over a period of time. In this paper, we consider the problem of identifying co-evolving subnetworks given a pair of temporal networks, which aim to capture the evolution of molecules and their interactions over time. Although this problem shares some characteristics of the well-known network alignment problems, it differs from existing network alignment formulations as it seeks a mapping of the two network topologies that is invariant to temporal evolution of the given networks. This is a computationally challenging problem as it requires capturing not only similar topologies between two networks but also their similar evolution patterns. RESULTS: We present an efficient algorithm, Tempo, for identifying co-evolving subnetworks in two given temporal networks. We formally prove the correctness of our method. We experimentally demonstrate that Tempo scales efficiently with the size of the network as well as the number of time points, and generates statistically significant alignments, even when the evolution rates of the given networks are high. Our results on a human aging dataset demonstrate that Tempo identifies novel genes contributing to the progression of Alzheimer's, Huntington's and Type II diabetes, while existing methods fail to do so. CONCLUSIONS: Studying temporal networks in general, and human aging specifically, using Tempo enables us to successfully distinguish age-related genes from non-age-related genes. More importantly, Tempo takes the network alignment problem one huge step forward by moving beyond the classical static network models.

Subjects

Algorithms, Molecular Evolution, Gene Regulatory Networks, Metabolic Networks and Pathways, Adult, Aged, Aged 80 and Over, Aging, Alzheimer Disease/genetics, Alzheimer Disease/metabolism, Brain/metabolism, Computational Biology/methods, Type 2 Diabetes Mellitus/genetics, Type 2 Diabetes Mellitus/metabolism, Humans, Huntington Disease/genetics, Huntington Disease/metabolism, Middle Aged, Protein Interaction Mapping, Young Adult

ProMotE: an efficient algorithm for counting independent motifs in uncertain network topologies.

Ren, Yuanfang; Sarkar, Aisharjya; Kahveci, Tamer.

BMC Bioinformatics ; 19(1): 242, 2018 Jun 26.

Article in English | MEDLINE | ID: mdl-29940838

ABSTRACT

BACKGROUND: Identifying motifs in biological networks is essential in uncovering key functions served by these networks. Finding non-overlapping motif instances is, however, a computationally challenging task. The fact that biological interactions are uncertain events further complicates the problem, as it makes the existence of an embedding of a given motif an uncertain event as well. RESULTS: In this paper, we develop a novel method, ProMotE (Probabilistic Motif Embedding), to count non-overlapping embeddings of a given motif in probabilistic networks. We utilize a polynomial model to capture the uncertainty. We develop three strategies to scale our algorithm to large networks. CONCLUSIONS: Our experiments demonstrate that our method scales to large networks in practical time with high accuracy where existing methods fail. Moreover, our experiments on cancer and degenerative disease networks show that our method helps in uncovering key functional characteristics of biological networks.

Subjects

Amino Acid Motifs/genetics, Algorithms

Optimal Supervised Reduction of High Dimensional Transcription Data.

Bailey, Richard; Sarkar, Aisharjya; Singh, Aaditya; Dobra, Alin; Kahveci, Tamer.

IEEE/ACM Trans Comput Biol Bioinform ; 20(5): 3093-3105, 2023.

Article in English | MEDLINE | ID: mdl-37276117

ABSTRACT

Navigating high-dimensional transcription datasets remains a persistent problem. This problem is further amplified for complex disorders, such as cancer, as these disorders are often multigenic traits with multiple subsets of genes collectively affecting the type, stage, and severity of the trait. We are often faced with a trade-off between reducing the dimensionality of our datasets and maintaining the integrity of our data. To accomplish both tasks simultaneously for very high-dimensional transcriptome data for complex multigenic traits, we propose a new supervised technique, Class Separation Transformation (CST). CST accomplishes both tasks by significantly reducing the dimensionality of the input space into a one-dimensional transformed space that provides optimal separation between the differing classes. Furthermore, CST offers a means of explainable ML, as it computes the relative importance of each feature for its contribution to class distinction, which can thus lead to deeper insights and discovery. We compare our method with existing state-of-the-art methods using both real and synthetic datasets, demonstrating that CST is more accurate, robust, scalable, and computationally advantageous than existing methods. Code used in this paper is available at https://github.com/richiebailey74/CST.

Subjects

Transcriptome, Phenotype

Data Perturbation and Recovery of Time Series Gene Expression Data.

Sarkar, Aisharjya; Mishra, Prabhat; Kahveci, Tamer.

IEEE/ACM Trans Comput Biol Bioinform ; 19(2): 830-842, 2022.

Article in English | MEDLINE | ID: mdl-33566765

ABSTRACT

Cells, in order to regulate their activities, process transcripts by controlling which genes to transcribe and by what amount. The transcription levels of genes often change over time. The rate of change of gene transcription varies between genes. It can even change for the same gene across different members of a population. Thus, for a given gene, it is important to study the transcription level not only at a single time point, but across multiple time points to capture changes in patterns of gene expression that underlie several phenotypic or external factors. Such a dataset can be perturbed, leaving it with missing transcription values for different samples at different time points. In this paper, we define three data perturbation models that are significant with respect to random deletion. We also define a recovery method that recovers data loss in the perturbed dataset such that the error is minimized. Our experimental results show that the recovery method compensates for the loss caused by the perturbation models. We show by means of two measures, namely normalized distance and Pearson's correlation coefficient, that the distance between the original and perturbed datasets is greater than the distance between the original and recovered datasets.

Subjects

Gene Expression Profiling, Gene Expression, Gene Expression Profiling/methods, Time Factors
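
The comparison described in the abstract above (original versus perturbed, and original versus recovered) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the abstract does not define its normalized distance or its recovery method, so this sketch uses a Euclidean distance scaled by the norm of the original series, zero-filling as the perturbed baseline, and plain linear interpolation as a stand-in recovery step. The function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def normalized_distance(original, other):
    """Assumed definition: Euclidean distance divided by the norm of the original."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(original, other)))
    den = math.sqrt(sum(a * a for a in original))
    return num / den

def recover_linear(series):
    """Stand-in recovery: fill missing values (None) by linear interpolation."""
    known = [i for i, v in enumerate(series) if v is not None]
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:        # missing at the start: copy nearest observed value
            out[i] = series[right]
        elif right is None:     # missing at the end: copy nearest observed value
            out[i] = series[left]
        else:
            t = (i - left) / (right - left)
            out[i] = series[left] + t * (series[right] - series[left])
    return out

# Toy expression series for one gene over six time points (made-up values).
original = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0]
perturbed = [1.0, None, 4.0, None, 11.0, 16.0]          # two values deleted
zero_filled = [0.0 if v is None else v for v in perturbed]
recovered = recover_linear(perturbed)

print(normalized_distance(original, zero_filled), normalized_distance(original, recovered))
print(pearson(original, zero_filled), pearson(original, recovered))
```

On this toy series the recovered data sit closer to the original than the zero-filled data under both measures, which is the kind of relationship the abstract reports; the numbers themselves carry no meaning beyond the illustration.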

Pattern Discovery in Multilayer Networks.

Ren, Yuanfang; Sarkar, Aisharjya; Veltri, Pierangelo; Ay, Ahmet; Dobra, Alin; Kahveci, Tamer.

IEEE/ACM Trans Comput Biol Bioinform ; 19(2): 741-752, 2022.

Article in English | MEDLINE | ID: mdl-34398763

ABSTRACT

MOTIVATION: In bioinformatics, complex cellular modeling and behavior simulation to identify significant molecular interactions is considered a relevant problem. Traditional methods model such complex systems using a single binary network. However, this model is inadequate to represent biological networks, as different sets of interactions can simultaneously take place for different interaction constraints (such as transcription regulation and protein interaction). Furthermore, biological systems may exhibit varying interaction topologies even for the same interaction type under different developmental stages or stress conditions. Therefore, models which consider biological systems as solitary interactions are inaccurate, as they fail to capture the complex behavior of cellular interactions within organisms. Identification and counting of recurrent motifs within a network is one of the fundamental problems in biological network analysis. Existing methods for motif counting on single network topologies are inadequate to capture patterns of molecular interactions that have significant changes in biological expression when identified across different organisms that are similar, or even time-varying networks within the same organism. That is, they fail to identify recurrent interactions as they consider a single snapshot of a network among a set of multiple networks. Therefore, we need methods geared towards studying multiple network topologies and the pattern conservation among them. CONTRIBUTIONS: In this paper, we consider the problem of counting the number of instances of a user-supplied motif topology in a given multilayer network. We model interactions among a set of entities (e.g., genes) describing various conditions or temporal variation as multilayer networks. Thus, each layer is a separate network that shows the connectivity of the nodes under a unique network state. Existing motif counting and identification methods are limited to single network topologies, and thus cannot be directly applied to multilayer networks. We apply our model and algorithm to study frequent patterns in cellular networks that are common in varying cellular states under different stress conditions, where the cellular network topology under each stress condition describes a unique network layer. RESULTS: We develop a methodology and corresponding algorithm based on the proposed model for motif counting in multilayer networks. We performed experiments on both real and synthetic datasets. We modeled the synthetic datasets under a wide spectrum of parameters, such as network size, density, and motif frequency. Results on synthetic datasets demonstrate that our algorithm finds motif embeddings with very high accuracy compared to existing state-of-the-art methods such as G-tries, ESU (FANMODE) and mfinder. Furthermore, we observe that our method runs from several times to several orders of magnitude faster than existing methods. For experiments on a real dataset, we consider the Escherichia coli (E. coli) transcription regulatory network under different experimental conditions. We observe that the genes selected by our method conserve functional characteristics under various stress conditions with very low false discovery rates. Moreover, the method is scalable to real networks in terms of both network size and number of layers.

Subjects

Escherichia coli, Gene Regulatory Networks, Algorithms, Computational Biology/methods, Escherichia coli/genetics, Gene Regulatory Networks/genetics
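
To make the multilayer setting in the abstract above concrete, the following is a minimal brute-force sketch, not the paper's algorithm. It assumes undirected layers given as edge lists over a shared node set, uses a triangle as the user-supplied motif, and treats a motif instance as conserved only if all of its edges appear in every layer. That conservation rule, the function names, and the toy layers are all assumptions made for illustration.

```python
from itertools import combinations

def triangles(edges):
    """Return all triangles in an undirected edge list as sorted node triples."""
    edge_set = {frozenset(e) for e in edges}
    nodes = sorted({n for e in edge_set for n in e})
    return {
        trio
        for trio in combinations(nodes, 3)
        if all(frozenset(pair) in edge_set for pair in combinations(trio, 2))
    }

def conserved_triangles(layers):
    """Triangles present in every layer (assumed notion of a conserved motif)."""
    per_layer = [triangles(edges) for edges in layers]
    return set.intersection(*per_layer) if per_layer else set()

# Two toy layers over the same genes, e.g. two stress conditions (made-up edges).
layer_1 = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
layer_2 = [("a", "b"), ("b", "c"), ("a", "c"), ("b", "d"), ("c", "d")]

print(triangles(layer_1))
print(triangles(layer_2))
print(conserved_triangles([layer_1, layer_2]))
```

Enumerating every node triple is cubic in the number of nodes, so this only suits toy inputs; it is meant to show what is being counted, not how to count it at scale.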

ANCA: Alignment-Based Network Construction Algorithm.

Chow, Kevin; Sarkar, Aisharjya; Elhesha, Rasha; Cinaglia, Pietro; Ay, Ahmet; Kahveci, Tamer.

IEEE/ACM Trans Comput Biol Bioinform ; 18(2): 512-524, 2021.

Article in English | MEDLINE | ID: mdl-31226082

ABSTRACT

Dynamic biological networks model changes in the network topology over time. However, the topologies of these networks are often not available at specific time points. Existing algorithms for studying dynamic networks often ignore this problem and focus only on the time points at which experimental data is available. In this paper, we develop a novel alignment-based network construction algorithm, ANCA, that constructs the dynamic networks at the missing time points by exploiting the information from a reference dynamic network. Our experiments on synthetic and real networks demonstrate that ANCA predicts the missing target networks accurately, and scales to large-scale biological networks in practical time. Our analysis of an E. coli protein-protein interaction network shows that ANCA successfully identifies key temporal changes in the biological networks. Our analysis also suggests that by focusing on the topological differences in the network, our method can be used to find important genes and temporal functional changes in the biological networks.

Subjects

Algorithms, Computational Biology/methods, Protein Interaction Mapping/methods, Sequence Alignment/methods, Escherichia coli/genetics, Protein Interaction Maps/genetics

An Efficient Algorithm for Identifying Mutated Subnetworks Associated with Survival in Cancer.

Sarkar, Aisharjya; Atay, Yilmaz; Erickson, Alana Lorraine; Arisi, Ivan; Saltini, Cesare; Kahveci, Tamer.

IEEE/ACM Trans Comput Biol Bioinform ; 17(5): 1582-1594, 2020.

Article in English | MEDLINE | ID: mdl-30990435

ABSTRACT

A protein-protein interaction (PPI) network models interconnections between protein-encoding genes. A group of proteins that perform similar functions are often connected to each other in the PPI network. The corresponding genes form pathways or functional modules. Mutations in protein-encoding genes affect the behavior of pathways. This results in the initiation, progression, and severity of diseases that propagate through pathways. In this work, we integrate mutation data, patient survival information, and the PPI network to identify connected subnetworks associated with survival. We define the computational problem using a fitness function called the log-rank statistic to score subnetworks. The log-rank statistic compares the survival between two populations. We propose a novel method, Survival Associated Mutated Subnetwork (SAMS), that adopts a genetic algorithm strategy to find the connected subnetwork within the PPI network whose mutation yields the highest log-rank statistic. We test on real cancer and synthetic datasets. SAMS generates solutions in negligible time, while the state-of-the-art method in the literature takes exponential time. The log-rank statistics of the mutated subnetworks selected by SAMS are comparable to those of that method. Our resulting gene sets show significant overlap with well-known cancer driver genes derived from curated datasets and studies in the literature, display high text-mining scores in terms of the number of citations combined with disease-specific keywords in PubMed, and identify pathways having high biological relevance.

Subjects

Algorithms, Mutation/genetics, Neoplasms/genetics, Neoplasms/mortality, Protein Interaction Maps/genetics, Computational Biology/methods, DNA Copy Number Variations/genetics, Humans
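
The abstract above scores subnetworks with the log-rank statistic, which compares survival between two populations. The sketch below computes the standard two-group log-rank statistic in its chi-square form. How patients are split into the two groups is an assumption here (patients with versus without a mutation in a candidate subnetwork), as are the function name and the toy survival times; this is not the SAMS implementation.

```python
def log_rank_statistic(times_a, events_a, times_b, events_b):
    """Two-group log-rank statistic (chi-square form, one degree of freedom).

    times_*  : follow-up time of each patient
    events_* : 1 if the event (death) was observed, 0 if censored
    """
    samples = [(t, bool(e), "a") for t, e in zip(times_a, events_a)]
    samples += [(t, bool(e), "b") for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in samples if e})

    diff = 0.0      # sum over event times of (observed - expected) deaths in group a
    variance = 0.0  # sum of hypergeometric variances
    for t in event_times:
        risk_a = sum(1 for s, _, g in samples if s >= t and g == "a")
        risk_b = sum(1 for s, _, g in samples if s >= t and g == "b")
        deaths_a = sum(1 for s, e, g in samples if s == t and e and g == "a")
        deaths_b = sum(1 for s, e, g in samples if s == t and e and g == "b")
        n, d = risk_a + risk_b, deaths_a + deaths_b
        diff += deaths_a - d * risk_a / n
        if n > 1:
            variance += d * (risk_a / n) * (risk_b / n) * (n - d) / (n - 1)
    return diff * diff / variance if variance > 0 else 0.0

# Toy cohorts (made-up): mutated patients die early, non-mutated patients late.
mutated_times, mutated_events = [1, 2, 3], [1, 1, 1]
other_times, other_events = [10, 11, 12], [1, 1, 1]

separated = log_rank_statistic(mutated_times, mutated_events, other_times, other_events)
identical = log_rank_statistic(mutated_times, mutated_events, mutated_times, mutated_events)
print(separated, identical)
```

A larger value indicates a stronger survival difference between the two groups, which is why it can serve as a fitness score when searching over candidate subnetworks.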

A New Algorithm for Counting Independent Motifs in Probabilistic Networks.

Sarkar, Aisharjya; Ren, Yuanfang; Elhesha, Rasha; Kahveci, Tamer.

IEEE/ACM Trans Comput Biol Bioinform ; 16(4): 1049-1062, 2019.

Article in English | MEDLINE | ID: mdl-29994098

ABSTRACT

Biological networks provide great potential to understand how cells function. Motifs are topological patterns which are repeated frequently in a specific network. Network motifs are key structures through which biological networks operate. However, counting independent (i.e., non-overlapping) instances of a specific motif remains a computationally hard problem. The motif counting problem becomes even harder for biological networks, as biological interactions are uncertain events. The main challenge behind this problem is that different embeddings of a given motif in a network can share edges. Such edges can create complex computational dependencies between different instances of the given motif when considering the uncertainty of those edges. In this paper, we develop a novel algorithm for counting independent instances of a specific motif topology in probabilistic biological networks. We present a novel mathematical model to capture the dependency between each embedding and all the other embeddings with which it overlaps. We prove the correctness of this model. We evaluate our model on real and synthetic networks with different probability and topology models, as well as a reasonable range of network sizes. Our results demonstrate that our method counts non-overlapping embeddings in practical time for a broad range of networks.

Subjects

Computational Biology/methods, Gene Regulatory Networks, Genetic Models, Algorithms, Animals, Cardiovascular Diseases/diagnosis, Cell Cycle, Gorilla gorilla, Humans, Pan troglodytes, Pongo, Probability, Species Specificity, Uncertainty
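
The dependency problem described in the abstract above (embeddings that share uncertain edges) can be seen in a very small example. The sketch assumes each edge exists independently with a given probability, so an embedding exists with the product of its edge probabilities. When two embeddings share an edge, the probability that both exist is the product over the union of their edges, which differs from the product of their individual probabilities. The independent-edge assumption, the function names, and the toy network are illustrative and are not taken from the paper's model.

```python
from math import prod

def embedding_probability(embedding, edge_prob):
    """Probability that every edge of one embedding exists (independent edges)."""
    return prod(edge_prob[e] for e in embedding)

def joint_probability(emb_1, emb_2, edge_prob):
    """Probability that both embeddings exist: product over the union of edges."""
    return prod(edge_prob[e] for e in set(emb_1) | set(emb_2))

# Toy probabilistic network: every edge exists with probability 0.5.
edge_prob = {
    ("a", "b"): 0.5, ("a", "c"): 0.5, ("b", "c"): 0.5,
    ("b", "d"): 0.5, ("c", "d"): 0.5,
}

# Two triangle embeddings that overlap on the edge (b, c).
triangle_1 = [("a", "b"), ("a", "c"), ("b", "c")]
triangle_2 = [("b", "c"), ("b", "d"), ("c", "d")]

p1 = embedding_probability(triangle_1, edge_prob)
p2 = embedding_probability(triangle_2, edge_prob)
both = joint_probability(triangle_1, triangle_2, edge_prob)

print(p1, p2, p1 * p2, both)
```

Because the shared edge is counted once rather than twice, the joint probability here is larger than the naive product, so the two embeddings cannot be treated as independent events when counting.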
