Search | Portal Regional da BVS

1.

Morphological profiling for drug discovery in the era of deep learning.

Tang, Qiaosi; Ratnayake, Ranjala; Seabra, Gustavo; Jiang, Zhe; Fang, Ruogu; Cui, Lina; Ding, Yousong; Kahveci, Tamer; Bian, Jiang; Li, Chenglong; Luesch, Hendrik; Li, Yanjun.

Brief Bioinform ; 25(4), 2024 May 23.

Article in English | MEDLINE | ID: mdl-38886164

ABSTRACT

Morphological profiling is a valuable tool in phenotypic drug discovery. The advent of high-throughput automated imaging has enabled the capture of a wide range of morphological features of cells or organisms in response to perturbations at single-cell resolution. Concurrently, significant advances in machine learning and deep learning, especially in computer vision, have substantially improved the analysis of large-scale high-content images at high throughput. These efforts have facilitated understanding of compound mechanism of action, drug repurposing, and characterization of cell morphodynamics under perturbation, ultimately contributing to the development of novel therapeutics. In this review, we provide a comprehensive overview of recent advances in the field of morphological profiling. We summarize the image profiling analysis workflow, survey a broad spectrum of analysis strategies encompassing feature engineering- and deep learning-based approaches, and introduce publicly available benchmark datasets. We place particular emphasis on the application of deep learning in this pipeline, covering cell segmentation, image representation learning, and multimodal learning. Additionally, we discuss the application of morphological profiling in phenotypic drug discovery and highlight potential challenges and opportunities in this field.

Subjects

Deep Learning , Drug Discovery , Drug Discovery/methods , Humans , Image Processing, Computer-Assisted/methods , Machine Learning
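The image profiling workflow this review summarizes commonly includes a step that aggregates single-cell features into per-well profiles and normalizes them against negative-control wells. A minimal Python/NumPy sketch of that step, assuming median aggregation and a robust z-score based on the median and median absolute deviation (MAD) of the control wells; the function names and inputs are hypothetical and not taken from the review.

```python
import numpy as np

def aggregate_wells(features, well_ids):
    """Collapse single-cell feature rows into one median profile per well (hypothetical helper)."""
    wells = np.unique(well_ids)  # sorted unique well labels
    profiles = np.vstack([np.median(features[well_ids == w], axis=0) for w in wells])
    return wells, profiles

def robust_zscore(profiles, control_mask, eps=1e-9):
    """Normalize each feature by the median and MAD of the control wells (assumed scheme)."""
    ctrl = profiles[control_mask]
    med = np.median(ctrl, axis=0)
    # Scale the MAD by 1.4826 so it approximates a standard deviation for normal data.
    mad = np.median(np.abs(ctrl - med), axis=0) * 1.4826
    return (profiles - med) / (mad + eps)
```

Median aggregation and MAD scaling are chosen here because they are less sensitive to outlier cells and wells than means and standard deviations; other aggregation and normalization choices are equally plausible.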

2.

Morphological Profiling for Drug Discovery in the Era of Deep Learning.

Tang, Qiaosi; Ratnayake, Ranjala; Seabra, Gustavo; Jiang, Zhe; Fang, Ruogu; Cui, Lina; Ding, Yousong; Kahveci, Tamer; Bian, Jiang; Li, Chenglong; Luesch, Hendrik; Li, Yanjun.

ArXiv ; 2024 Jan 15.

Article in English | MEDLINE | ID: mdl-38168460

ABSTRACT

Morphological profiling is a valuable tool in phenotypic drug discovery. The advent of high-throughput automated imaging has enabled the capture of a wide range of morphological features of cells or organisms in response to perturbations at single-cell resolution. Concurrently, significant advances in machine learning and deep learning, especially in computer vision, have substantially improved the analysis of large-scale high-content images at high throughput. These efforts have facilitated understanding of compound mechanism of action (MOA), drug repurposing, and characterization of cell morphodynamics under perturbation, ultimately contributing to the development of novel therapeutics. In this review, we provide a comprehensive overview of recent advances in the field of morphological profiling. We summarize the image profiling analysis workflow, survey a broad spectrum of analysis strategies encompassing feature engineering- and deep learning-based approaches, and introduce publicly available benchmark datasets. We place particular emphasis on the application of deep learning in this pipeline, covering cell segmentation, image representation learning, and multimodal learning. Additionally, we discuss the application of morphological profiling in phenotypic drug discovery and highlight potential challenges and opportunities in this field.

3.

QuTIE: quantum optimization for target identification by enzymes.

Ngo, Hoang M; Thai, My T; Kahveci, Tamer.

Bioinform Adv ; 3(1): vbad112, 2023.

Article in English | MEDLINE | ID: mdl-37786534

ABSTRACT

Summary: The target identification by enzymes (TIE) problem aims to identify the set of enzymes in a given metabolic network whose inhibition eliminates a given set of target compounds associated with a disease while incurring minimum damage to the remaining compounds. This problem is NP-hard, so optimal solutions on classical computers fail to scale to large metabolic networks. In this article, we develop the first quantum optimization solution to this NP-hard problem, called QuTIE (quantum optimization for target identification by enzymes). We do so by developing an equivalent formulation of the TIE problem in quadratic unconstrained binary optimization form. We then map it to a logical graph and embed the logical graph on a quantum hardware graph. Our experimental results on 27 metabolic networks from Escherichia coli, Homo sapiens, and Mus musculus show that QuTIE yields optimal or near-optimal solutions. Our experiments also demonstrate that QuTIE successfully identifies enzyme targets already verified in wet-lab experiments for 14 major disease classes. Availability and implementation: Code and sample data are available at: https://github.com/ngominhhoang/Quantum-Target-Identification-by-Enzymes.
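A quadratic unconstrained binary optimization (QUBO) problem asks for the binary vector x that minimizes x^T Q x. The sketch below is not QuTIE's TIE formulation; it only illustrates how a QUBO objective is evaluated and how a tiny instance can be solved exactly by exhaustive search over all 2^n assignments, the classical approach whose cost explodes on large networks. The matrix Q is a made-up example.

```python
import itertools
import numpy as np

def qubo_energy(Q, x):
    """Energy x^T Q x of a binary assignment x under QUBO matrix Q."""
    x = np.asarray(x)
    return float(x @ Q @ x)

def brute_force_qubo(Q):
    """Exhaustively search all 2^n binary vectors; only feasible for very small n."""
    n = Q.shape[0]
    best_x, best_e = None, float("inf")
    for bits in itertools.product((0, 1), repeat=n):
        e = qubo_energy(Q, bits)
        if e < best_e:
            best_x, best_e = bits, e
    return best_x, best_e

# Hypothetical 3-variable instance: negative diagonal terms reward selecting a
# variable, positive off-diagonal terms penalize selecting adjacent pairs together.
Q = np.array([[-1.0, 2.0, 0.0],
              [0.0, -1.0, 2.0],
              [0.0, 0.0, -1.0]])

best_x, best_e = brute_force_qubo(Q)
```

For this toy Q, selecting variables 0 and 2 (and not 1) avoids both pair penalties and gives the lowest energy. On quantum annealing hardware, the same Q would instead be embedded onto the hardware graph, as the abstract describes.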

4.

Optimal Supervised Reduction of High Dimensional Transcription Data.

Bailey, Richard; Sarkar, Aisharjya; Singh, Aaditya; Dobra, Alin; Kahveci, Tamer.

IEEE/ACM Trans Comput Biol Bioinform ; 20(5): 3093-3105, 2023.

Article in English | MEDLINE | ID: mdl-37276117

ABSTRACT

Navigating high-dimensional transcription datasets remains a persistent problem. The problem is further amplified for complex disorders such as cancer, as these disorders are often multigenic traits with multiple subsets of genes collectively affecting the type, stage, and severity of the trait. We are often faced with a trade-off between reducing the dimensionality of our datasets and maintaining the integrity of our data. To accomplish both tasks simultaneously for very high-dimensional transcriptomes of complex multigenic traits, we propose a new supervised technique, Class Separation Transformation (CST). CST significantly reduces the dimensionality of the input space, mapping it into a one-dimensional transformed space that provides optimal separation between the classes. Furthermore, CST offers a means of explainable ML, as it computes the relative importance of each feature for its contribution to class distinction, which can lead to deeper insights and discovery. We compare our method with existing state-of-the-art methods using both real and synthetic datasets, demonstrating that CST is more accurate, robust, and scalable than existing methods and computationally advantageous relative to them. Code used in this paper is available at https://github.com/richiebailey74/CST.

Subjects

Transcriptome , Phenotype
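The abstract does not spell out CST's transformation. As a swapped-in classical technique that shares the same two ingredients (a supervised projection to one dimension and per-feature weights), Fisher's linear discriminant projects the data onto w proportional to S_w^-1 (mu1 - mu0), where S_w is the pooled within-class scatter and mu0, mu1 are the class means. This sketch is not CST; the ridge term and the use of normalized |w| as feature importance are assumptions.

```python
import numpy as np

def fisher_direction(X, y, ridge=1e-6):
    """Fisher's linear discriminant for two classes labeled 0 and 1.

    Returns a unit vector w; X @ w is a 1-D space that maximizes
    between-class separation relative to within-class scatter.
    """
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter; a small ridge keeps it invertible when features >> samples.
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    Sw += ridge * np.eye(X.shape[1])
    w = np.linalg.solve(Sw, mu1 - mu0)
    return w / np.linalg.norm(w)

def feature_importance(w):
    """Relative contribution of each feature, taken here as normalized |w| (an assumption)."""
    a = np.abs(w)
    return a / a.sum()
```

Weight magnitudes are only comparable when features share a scale, so in practice the inputs would typically be standardized first.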

5.

AMR-meta: a k-mer and metafeature approach to classify antimicrobial resistance from high-throughput short-read metagenomics data.

Marini, Simone; Oliva, Marco; Slizovskiy, Ilya B; Das, Rishabh A; Noyes, Noelle Robertson; Kahveci, Tamer; Boucher, Christina; Prosperi, Mattia.

Gigascience ; 11, 2022 May 18.

Article in English | MEDLINE | ID: mdl-35583675

ABSTRACT

BACKGROUND: Antimicrobial resistance (AMR) is a global health concern. High-throughput metagenomic sequencing of microbial samples enables profiling of AMR genes through comparison with curated AMR databases. However, the performance of current methods is often hampered by database incompleteness and by homology/homoplasy with non-AMR genes in sequenced samples. RESULTS: We present AMR-meta, a database-free and alignment-free approach based on k-mers, which combines algebraic matrix factorization into metafeatures with regularized regression. Metafeatures capture multi-level gene diversity across the main antibiotic classes. AMR-meta takes reads from metagenomic shotgun sequencing as input and predicts whether those reads contribute to resistance against specific classes of antibiotics. In addition, AMR-meta uses an augmented training strategy that joins an AMR gene database with non-AMR genes (used as negative examples). We compare AMR-meta with AMRPlusPlus, DeepARG, and Meta-MARC, and further test their ensemble via a voting system. In cross-validation, AMR-meta has a median F-score of 0.7 (interquartile range, 0.2-0.9). On semi-synthetic metagenomic data (external test), AMR-meta yields on average a 1.3-fold hit rate increase over existing methods. In terms of run-time, AMR-meta is 3 times faster than DeepARG, 30 times faster than Meta-MARC, and as fast as AMRPlusPlus. Finally, we note that differences in AMR ontologies and the observed variance of all tools in classification outputs call for further work on standardizing benchmarking data and protocols. CONCLUSIONS: AMR-meta is a fast, accurate classifier that exploits non-AMR negative sets to improve sensitivity and specificity. The differences in AMR ontologies and the high variance of all tools in classification outputs call for the deployment of standard benchmarking data and protocols to fairly compare AMR prediction tools.

Subjects

Anti-Bacterial Agents , Metagenomics , Anti-Bacterial Agents/pharmacology , Drug Resistance, Bacterial/genetics , High-Throughput Nucleotide Sequencing , Metagenome , Metagenomics/methods
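AMR-meta derives its features from k-mers of short reads. A minimal sketch of the k-mer counting step, assuming canonical k-mers (a k-mer and its reverse complement are counted together), k=5 by default, and skipping windows that contain non-ACGT bases; these choices are illustrative assumptions rather than AMR-meta's settings, and the matrix factorization and regularized regression stages are omitted.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)

def count_kmers(reads, k=5):
    """Count canonical k-mers across reads, skipping windows with non-ACGT bases."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts
```

Canonicalization matters for shotgun reads because a read may come from either DNA strand; without it, the same locus would be split across two features.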

6.

Data Perturbation and Recovery of Time Series Gene Expression Data.

Sarkar, Aisharjya; Mishra, Prabhat; Kahveci, Tamer.

IEEE/ACM Trans Comput Biol Bioinform ; 19(2): 830-842, 2022.

Article in English | MEDLINE | ID: mdl-33566765

ABSTRACT

To regulate their activities, cells process transcripts by controlling which genes to transcribe and in what amount. The transcription levels of genes often change over time, and the rate of change varies between genes. It can even differ for the same gene across members of a population. Thus, for a given gene, it is important to study the transcription level not only at a single time point but across multiple time points, to capture changes in the gene expression patterns which underlie several phenotypic or external factors. Such datasets can be perturbed, leaving missing transcription values for different samples at different time points. In this paper, we define three data perturbation models that are significant with respect to random deletion. We also define a recovery method that recovers the data lost in the perturbed dataset while minimizing error. Our experimental results show that the recovery method compensates for the loss caused by the perturbation models. Using two measures, normalized distance and Pearson's correlation coefficient, we show that the distance between the original and perturbed datasets is greater than the distance between the original and recovered datasets.

Subjects

Gene Expression Profiling , Gene Expression , Gene Expression Profiling/methods , Time Factors
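The evaluation above compares original, perturbed, and recovered datasets with normalized distance and Pearson's correlation coefficient. The sketch below computes both and, as a simple stand-in recovery (not the paper's method), fills each gene's missing time points by linear interpolation over time. Defining normalized distance as the Frobenius norm of the difference divided by the norm of the original is an assumption.

```python
import numpy as np

def normalized_distance(original, other):
    """||original - other|| / ||original|| (Frobenius norm); an assumed definition."""
    return np.linalg.norm(original - other) / np.linalg.norm(original)

def pearson(original, other):
    """Pearson correlation between the flattened matrices."""
    return np.corrcoef(original.ravel(), other.ravel())[0, 1]

def interpolate_missing(data, times):
    """Fill NaNs in each row (one gene across time points) by linear interpolation.

    Assumes every row keeps at least one observed value; np.interp holds the
    end values constant outside the observed time range.
    """
    filled = data.copy()
    for row in filled:  # each row is a view, so assignments update `filled`
        missing = np.isnan(row)
        if missing.any():
            row[missing] = np.interp(times[missing], times[~missing], row[~missing])
    return filled
```

Linear interpolation only uses a gene's own neighboring time points; methods that also borrow information from correlated genes or other samples would be natural refinements.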

7.

Pattern Discovery in Multilayer Networks.

Ren, Yuanfang; Sarkar, Aisharjya; Veltri, Pierangelo; Ay, Ahmet; Dobra, Alin; Kahveci, Tamer.

IEEE/ACM Trans Comput Biol Bioinform ; 19(2): 741-752, 2022.

Article in English | MEDLINE | ID: mdl-34398763

ABSTRACT

MOTIVATION: In bioinformatics, modeling complex cellular systems and simulating their behavior to identify significant molecular interactions is considered an important problem. Traditional methods model such complex systems using a single, binary network. However, this model is inadequate to represent biological networks, as different sets of interactions can take place simultaneously under different interaction constraints (such as transcription regulation and protein interaction). Furthermore, biological systems may exhibit varying interaction topologies even for the same interaction type under different developmental stages or stress conditions. Therefore, models that treat biological systems as solitary interactions are inaccurate, as they fail to capture the complex behavior of cellular interactions within organisms. Identifying and counting recurrent motifs within a network is one of the fundamental problems in biological network analysis. Existing methods for motif counting on single network topologies are inadequate to capture patterns of molecular interactions that show significant changes in biological expression across similar organisms, or even across time-varying networks within the same organism. That is, they fail to identify recurrent interactions because they consider a single snapshot of a network among a set of multiple networks. Therefore, we need methods geared towards studying multiple network topologies and the pattern conservation among them. Contributions: In this paper, we consider the problem of counting the number of instances of a user-supplied motif topology in a given multilayer network. We model interactions among a set of entities (e.g., genes) describing various conditions or temporal variation as multilayer networks, in which each layer is a separate network showing the connectivity of the nodes under a unique network state. Existing motif counting and identification methods are limited to single network topologies and thus cannot be directly applied to multilayer networks. We apply our model and algorithm to study frequent patterns in cellular networks that are common across cellular states under different stress conditions, where the cellular network topology under each stress condition forms a unique network layer. RESULTS: We develop a methodology and corresponding algorithm based on the proposed model for motif counting in multilayer networks. We performed experiments on both real and synthetic datasets. We generated the synthetic datasets under a wide spectrum of parameters, such as network size, density, and motif frequency. Results on synthetic datasets demonstrate that our algorithm finds motif embeddings with very high accuracy compared to existing state-of-the-art methods such as G-tries, ESU (FANMODE), and mfinder. Furthermore, we observe that our method runs from several times to several orders of magnitude faster than existing methods. For experiments on real data, we consider the Escherichia coli (E. coli) transcription regulatory network under different experimental conditions. We observe that the genes selected by our method conserve functional characteristics under various stress conditions with very low false discovery rates. Moreover, the method scales to real networks in terms of both network size and number of layers.

Subjects

Escherichia coli , Gene Regulatory Networks , Algorithms , Computational Biology/methods , Escherichia coli/genetics , Gene Regulatory Networks/genetics
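The abstract does not define exactly when a motif instance counts in a multilayer network. Under one simple illustrative reading, an instance counts only if it is present in every layer, i.e., it is conserved across all conditions. The sketch below applies that assumed criterion to the triangle motif with naive enumeration; it is not the paper's algorithm, and the layer data in the example are hypothetical.

```python
def triangles(edges):
    """Enumerate triangles {a, b, c} in an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    found = set()
    for u, v in edges:
        # Any common neighbor of an edge's endpoints closes a triangle.
        for w in adj[u] & adj[v]:
            found.add(frozenset((u, v, w)))
    return found

def conserved_triangles(layers):
    """Triangles present in every layer (each layer is a list of undirected edges)."""
    return set.intersection(*(triangles(layer) for layer in layers))

# Hypothetical two-layer network: the b-d edge is lost in the second condition.
layer1 = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("b", "d")]
layer2 = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
```

Here triangle {b, c, d} exists only under the first condition, so only {a, b, c} is conserved. Other criteria, such as requiring an instance in at least a given number of layers, would change the count.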

8.

Prediction of host-pathogen protein interactions by extended network model.

Kösesoy, Irfan; Gök, Murat; Kahveci, Tamer.

Turk J Biol ; 45(2): 138-148, 2021.

Article in English | MEDLINE | ID: mdl-33907496

ABSTRACT

Knowledge of pathogen-host interactions between species is essential for developing strategies against infectious diseases. In vitro methods take extended periods of time to detect interactions and provide very few of the possible interaction pairs. Hence, modeling interactions between proteins has necessitated the development of computational methods. The main scope of this paper is integrating the known protein interactions of the host and pathogen organisms to improve the prediction success rate for unknown pathogen-host interactions, which was expected to increase the true positive rate of the predictions. To perform this study extensively, several protein-encoding methods and learning algorithms were tested. Along with human as the host organism, two different pathogen organisms were used in the experiments. For each combination of protein-encoding and prediction method, the original prediction algorithm was tested using only pathogen-host interactions, and the same method was tested again after integrating the known protein interactions within each organism. The effect of merging the pathogen-host interaction networks of different species on the prediction performance of state-of-the-art methods was also examined. Success was measured in terms of the Matthews correlation coefficient, precision, recall, F1 score, and accuracy. Empirical results showed that integrating the host and pathogen interactions consistently yields better performance in almost all experiments.
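All five reported metrics follow from the confusion-matrix counts of predicted versus known interactions. A small helper, assuming binary labels with 1 marking an interacting pair; the function name is illustrative and not from the paper.

```python
import math

def binary_metrics(y_true, y_pred):
    """Precision, recall, F1, accuracy, and Matthews correlation coefficient (MCC)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    # MCC uses all four cells, so it stays informative when interacting pairs are rare.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": accuracy, "mcc": mcc}
```

Because known interacting pairs are typically far fewer than non-interacting ones, accuracy alone can look high for a trivial predictor; MCC and F1 are less prone to that.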

9.

ANCA: Alignment-Based Network Construction Algorithm.

Chow, Kevin; Sarkar, Aisharjya; Elhesha, Rasha; Cinaglia, Pietro; Ay, Ahmet; Kahveci, Tamer.

IEEE/ACM Trans Comput Biol Bioinform ; 18(2): 512-524, 2021.

Article in English | MEDLINE | ID: mdl-31226082

ABSTRACT

Dynamic biological networks model changes in the network topology over time. However, the topologies of these networks are often not available at specific time points. Existing algorithms for studying dynamic networks often ignore this problem and focus only on the time points at which experimental data are available. In this paper, we develop a novel alignment-based network construction algorithm, ANCA, that constructs the dynamic networks at the missing time points by exploiting information from a reference dynamic network. Our experiments on synthetic and real networks demonstrate that ANCA predicts the missing target networks accurately and scales to large biological networks in practical time. Our analysis of an E. coli protein-protein interaction network shows that ANCA successfully identifies key temporal changes in the biological networks. Our analysis also suggests that, by focusing on the topological differences in the network, our method can be used to find important genes and temporal functional changes in the biological networks.

Subjects

Algorithms , Computational Biology/methods , Protein Interaction Mapping/methods , Sequence Alignment/methods , Escherichia coli/genetics , Protein Interaction Maps/genetics

10.

Selected Research Articles from the 2019 International Workshop on Computational Network Biology: Modeling, Analysis, and Control (CNB-MAC).

Yoon, Byung-Jun; Qian, Xiaoning; Kahveci, Tamer; Pal, Ranadip.

BMC Genomics ; 21(Suppl 9): 584, 2020 Sep 07.

Article in English | MEDLINE | ID: mdl-32900374

Subjects

Computational Biology

11.

An Efficient Algorithm for Identifying Mutated Subnetworks Associated with Survival in Cancer.

Sarkar, Aisharjya; Atay, Yilmaz; Erickson, Alana Lorraine; Arisi, Ivan; Saltini, Cesare; Kahveci, Tamer.

IEEE/ACM Trans Comput Biol Bioinform ; 17(5): 1582-1594, 2020.

Article in English | MEDLINE | ID: mdl-30990435

ABSTRACT

A protein-protein interaction (PPI) network models interconnections between protein-encoding genes. Proteins that perform similar functions are often connected to each other in the PPI network, and the corresponding genes form pathways or functional modules. Mutations in protein-encoding genes affect the behavior of pathways. This results in the initiation, progression, and severity of diseases that propagate through pathways. In this work, we integrate mutation data, patient survival information, and the PPI network to identify connected subnetworks associated with survival. We define the computational problem using a fitness function, the log-rank statistic, to score subnetworks; the log-rank statistic compares survival between two populations. We propose a novel method, Survival Associated Mutated Subnetwork (SAMS), that adopts a genetic algorithm strategy to find the connected subnetwork within the PPI network whose mutation yields the highest log-rank statistic. We test it on real cancer and synthetic datasets. SAMS generates solutions in negligible time, while the state-of-the-art method in the literature takes exponential time. The log-rank statistics of the mutated subnetworks selected by SAMS are comparable to those of that method. Our resulting gene sets show significant overlap with well-known cancer driver genes derived from curated datasets and studies in the literature, display high text-mining scores in terms of the number of citations combined with disease-specific keywords in PubMed, and identify pathways of high biological relevance.

Subjects

Algorithms , Mutation/genetics , Neoplasms/genetics , Neoplasms/mortality , Protein Interaction Maps/genetics , Computational Biology/methods , DNA Copy Number Variations/genetics , Humans
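SAMS scores a candidate subnetwork with the log-rank statistic, comparing the survival of patients whose subnetwork is mutated against those whose subnetwork is not. A minimal sketch of the standard two-group log-rank chi-square statistic; how patients are assigned to the two groups is left to the caller (an assumption), and the genetic-algorithm search is omitted.

```python
import numpy as np

def logrank_statistic(time, event, group):
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    time: follow-up times; event: 1 if death observed, 0 if censored;
    group: 1 for patients with a mutated subnetwork, 0 otherwise.
    """
    time, event, group = map(np.asarray, (time, event, group))
    observed_minus_expected, variance = 0.0, 0.0
    for t in np.unique(time[event == 1]):  # each distinct observed event time
        at_risk = time >= t
        n = at_risk.sum()                        # patients at risk overall
        n1 = (at_risk & (group == 1)).sum()      # at risk in group 1
        d = ((time == t) & (event == 1)).sum()   # deaths at t overall
        d1 = ((time == t) & (event == 1) & (group == 1)).sum()
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of d1 at this event time.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return observed_minus_expected ** 2 / variance if variance > 0 else 0.0
```

Larger values indicate a stronger survival difference between the two groups, which is why a search method can use the statistic directly as a fitness score.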

12.

Stability Analysis of Biological Networks' Diffusion State.

Altuntas, Volkan; Gok, Murat; Kahveci, Tamer.

IEEE/ACM Trans Comput Biol Bioinform ; 17(4): 1406-1418, 2020.

Article in English | MEDLINE | ID: mdl-30452376

ABSTRACT

Computational knowledge acquired from noisy networks is not reliable, and the network topology determines its reliability. Protein-protein interaction networks have uncertain topologies and contain noise in the form of false positive and false negative edges at high rates. In this study, we analyze the effects of mutations in a network topology on the diffusion state of that network. To evaluate the sensitivity of the diffusion state, we derive fitness measures based on the mathematically defined stability of a network. Searching for an influential set of edges in a network is a difficult problem. We handle this computational challenge by developing a novel metaheuristic optimization method that finds influential mutations time-efficiently. Our experiments, conducted on both synthetic and real networks from public databases, demonstrated that our method obtained better results than competitors for all types of network topologies. This is the first time that diffusion has been evaluated under topological mutations. Our analysis identifies significant biological results about the stability of biological and synthetic networks and their diffusion state: mutations in protein-protein interaction network topologies have a significant influence on the diffusion state of the network, and network stability is more affected by the network model than by the network size.

Subjects

Computational Biology/methods , Models, Biological , Algorithms , Animals , Databases, Protein , Diffusion , Humans , Protein Interaction Maps , Synthetic Biology

13.

Replication timing networks reveal a link between transcription regulatory circuits and replication timing control.

Rivera-Mulia, Juan Carlos; Kim, Sebo; Gabr, Haitham; Chakraborty, Abhijit; Ay, Ferhat; Kahveci, Tamer; Gilbert, David M.

Genome Res ; 29(9): 1415-1428, 2019 Sep.

Article in English | MEDLINE | ID: mdl-31434679

ABSTRACT

DNA replication occurs in a defined temporal order known as the replication timing (RT) program and is regulated during development, coordinated with 3D genome organization and transcriptional activity. However, transcription and RT are not sufficiently coordinated to predict each other, suggesting an indirect relationship. Here, we exploit genome-wide RT profiles from 15 human cell types and intermediate differentiation stages derived from human embryonic stem cells to construct different types of RT regulatory networks. First, we constructed networks based on the coordinated RT changes during cell fate commitment to create highly complex RT networks composed of thousands of interactions that form specific functional subnetwork communities. We also constructed directional regulatory networks based on the order of RT changes within cell lineages, and identified master regulators of differentiation pathways. Finally, we explored relationships between RT networks and transcriptional regulatory networks (TRNs) by combining them into more complex circuitries of composite and bipartite networks. Results identified novel trans interactions linking transcription factors that are core to the regulatory circuitry of each cell type to RT changes occurring in those cell types. These core transcription factors were found to bind cooperatively to sites in the affected replication domains, providing provocative evidence that they constitute biologically significant directional interactions. Our findings suggest a regulatory link between the establishment of cell-type-specific TRNs and RT control during lineage specification.

Subjects

DNA Replication Timing , Embryonic Stem Cells/cytology , Transcription Factors/metabolism , Cell Differentiation , Cell Lineage , Cells, Cultured , DNA/metabolism , Embryonic Stem Cells/chemistry , Gene Expression Regulation, Developmental , Gene Regulatory Networks , Humans , Transcription, Genetic

14.

Characterizing building blocks of resource constrained biological networks.

Ren, Yuanfang; Ay, Ahmet; Dobra, Alin; Kahveci, Tamer.

BMC Bioinformatics ; 20(Suppl 12): 318, 2019 Jun 20.

Article in English | MEDLINE | ID: mdl-31216986

ABSTRACT

BACKGROUND: Identification of motifs (recurrent and statistically significant patterns) in biological networks is key to understanding the design principles and inferring the governing mechanisms of biological systems. This, however, is a computationally challenging task. The task is further complicated because biological interactions depend on limited resources, i.e., a reaction takes place if the reactant molecule concentrations are above a certain threshold level. This biochemical property implies that network edges can participate in only a limited number of motifs simultaneously. Existing motif counting methods ignore this problem. This simplification often leads to inaccurate motif counts (over- or under-estimates) and thus to wrong biological interpretations. RESULTS: In this paper, we develop a novel motif counting algorithm, Partially Overlapping MOtif Counting (POMOC), that considers capacity levels for all interactions in counting motifs. CONCLUSIONS: Our experiments on real and synthetic networks demonstrate that motif counts obtained with POMOC differ significantly from those of existing motif counting approaches, and that our method scales to large biological networks in practical time. Our results also show that our method makes it possible to characterize the impact of different stress factors on the cell's network organization. In this regard, analysis of an S. cerevisiae transcriptional regulatory network using our method shows that oxidative stress is more disruptive to the organization and abundance of motifs in this network than mutations of individual genes. Our analysis also suggests that, by focusing on the edges that lead to variation in motif counts, our method can be used to find important genes and to reveal subtle topological and functional differences among biological networks under different cell states.

Subjects

Gene Regulatory Networks/genetics , Saccharomyces cerevisiae/genetics , Algorithms , Databases, Genetic , Genes, Fungal , Models, Biological , Oxidative Stress/genetics
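POMOC lets each interaction take part in only as many motif instances as its capacity allows. The abstract does not give POMOC's counting procedure, so the sketch below uses a simple greedy stand-in: enumerate triangle embeddings in a fixed order and accept each one only if all three of its edges still have spare capacity. Greedy acceptance yields a feasible count, not necessarily the maximum; the triangle motif and the capacity values are illustrative assumptions.

```python
def greedy_capacity_triangle_count(edges, capacity):
    """Count triangles so that no edge is used more than capacity[edge] times.

    edges: undirected (u, v) pairs; capacity: dict keyed by frozenset({u, v}).
    Triangles are accepted greedily in sorted order (not guaranteed optimal).
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    tris = sorted({tuple(sorted((u, v, w))) for u, v in edges for w in adj[u] & adj[v]})
    remaining = dict(capacity)
    count = 0
    for a, b, c in tris:
        tri_edges = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(remaining[e] > 0 for e in tri_edges):
            for e in tri_edges:
                remaining[e] -= 1  # consume one unit of each edge's capacity
            count += 1
    return count

# Two triangles sharing the b-c edge (hypothetical network).
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d")]
unit_capacity = {frozenset(e): 1 for e in edges}
```

With unit capacities only one of the two triangles can be counted, whereas an unconstrained counter reports two, which is the kind of discrepancy the abstract attributes to ignoring limited resources.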

15.

Selected research articles from the 2018 International Workshop on Computational Network Biology: Modeling, Analysis, and Control (CNB-MAC).

Yoon, Byung-Jun; Qian, Xiaoning; Kahveci, Tamer; Pal, Ranadip.

BMC Bioinformatics ; 20(Suppl 12): 316, 2019 Jun 20.

Article in English | MEDLINE | ID: mdl-31217001

16.

Identification of co-evolving temporal networks.

Elhesha, Rasha; Sarkar, Aisharjya; Boucher, Christina; Kahveci, Tamer.

BMC Genomics ; 20(Suppl 6): 434, 2019 Jun 13.

Article in English | MEDLINE | ID: mdl-31189471

ABSTRACT

BACKGROUND: Biological networks describe the mechanisms that govern cellular functions. Temporal networks show how these networks evolve over time. Studying the temporal progression of network topologies is of utmost importance, since it uncovers how a network evolves and how it resists external stimuli and internal variations. Two temporal networks have co-evolving subnetworks if the evolving topologies of these subnetworks remain similar to each other as the network topology evolves over a period of time. In this paper, we consider the problem of identifying co-evolving subnetworks given a pair of temporal networks, which aim to capture the evolution of molecules and their interactions over time. Although this problem shares some characteristics with well-known network alignment problems, it differs from existing network alignment formulations in that it seeks a mapping of the two network topologies that is invariant to the temporal evolution of the given networks. This is a computationally challenging problem, as it requires capturing not only similar topologies between two networks but also their similar evolution patterns. RESULTS: We present an efficient algorithm, Tempo, for identifying co-evolving subnetworks in two given temporal networks. We formally prove the correctness of our method. We experimentally demonstrate that Tempo scales efficiently with the size of the network as well as the number of time points, and generates statistically significant alignments, even when the evolution rates of the given networks are high. Our results on a human aging dataset demonstrate that Tempo identifies novel genes contributing to the progression of Alzheimer's, Huntington's and type II diabetes, while existing methods fail to do so. CONCLUSIONS: Studying temporal networks in general, and human aging specifically, using Tempo enables us to successfully distinguish age-related genes from non-age-related genes. More importantly, Tempo takes the network alignment problem a major step forward by moving beyond the classical static network models.

Subjects

Algorithms , Evolution, Molecular , Gene Regulatory Networks , Metabolic Networks and Pathways , Adult , Aged , Aged, 80 and over , Aging , Alzheimer Disease/genetics , Alzheimer Disease/metabolism , Brain/metabolism , Computational Biology/methods , Diabetes Mellitus, Type 2/genetics , Diabetes Mellitus, Type 2/metabolism , Humans , Huntington Disease/genetics , Huntington Disease/metabolism , Middle Aged , Protein Interaction Mapping , Young Adult

17.

Aligning optical maps to de Bruijn graphs.

Mukherjee, Kingshuk; Alipanahi, Bahar; Kahveci, Tamer; Salmela, Leena; Boucher, Christina.

Bioinformatics ; 35(18): 3250-3256, 2019 Sep 15.

Article in English | MEDLINE | ID: mdl-30698651

ABSTRACT

MOTIVATION: Optical maps are high-resolution restriction maps (Rmaps) that give a unique numeric representation to a genome. Used in concert with sequence reads, they provide a useful tool for genome assembly and for discovering structural variations and rearrangements. Although they have been a regular feature of modern genome assembly projects, optical maps have mainly been used in a post-processing step and not in the genome assembly process itself. Several methods have been proposed for pairwise alignment of single-molecule optical maps (called Rmaps) or for aligning optical maps to assembled reads. However, the problem of aligning an Rmap to a graph representing the sequence data of the same genome has not been studied before. Such an alignment provides a mapping between two sets of data, optical maps and sequence data, which will facilitate the use of optical maps in the sequence assembly step itself. RESULTS: We define the problem of aligning an Rmap to a de Bruijn graph and present the first algorithm for solving this problem, which is based on a seed-and-extend approach. We demonstrate that our method is capable of aligning 73% of Rmaps generated from the Escherichia coli genome to the de Bruijn graph constructed from short reads generated from the same genome. We validate the alignments and show that our method achieves an accuracy of 99.6%. We also show that our method scales to larger genomes. In particular, we show that 76% of Rmaps can be aligned to the de Bruijn graph in the case of human data. AVAILABILITY AND IMPLEMENTATION: The software for aligning optical maps to de Bruijn graphs, omGraph, is written in C++ and is publicly available under the GNU General Public License at https://github.com/kingufl/omGraph. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Algorithms , Software , Genome , Restriction Mapping , Sequence Analysis, DNA
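The seed-and-extend idea in the record above can be illustrated with a minimal sketch. It is not omGraph and does not use a de Bruijn graph: as a simplifying assumption it digests a linear reference sequence in silico and compares fragment lengths. The helper names (`digest`, `matches`, `find_seeds`, `extend`), the seed length `k`, and the relative tolerance `tol` are hypothetical, and real Rmaps also contain missing or extra cuts and sizing errors that this sketch ignores.

```python
import re

def digest(sequence: str, site: str) -> list[int]:
    """Fragment lengths from cutting `sequence` at the start of every
    occurrence of `site` (the real cut offset inside the site is ignored)."""
    # A zero-width lookahead lets re.finditer report overlapping sites.
    cuts = [m.start() for m in re.finditer(f"(?={re.escape(site)})", sequence)]
    bounds = [0] + cuts + [len(sequence)]
    # Drop empty fragments, e.g. when a site starts at position 0.
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]

def matches(x: int, y: int, tol: float) -> bool:
    """True if fragment size x lies within relative tolerance tol of y."""
    return abs(x - y) <= tol * y

def find_seeds(rmap: list[int], ref: list[int], k: int = 3,
               tol: float = 0.1) -> list[tuple[int, int]]:
    """Seed step: (i, j) pairs where k consecutive Rmap fragments starting
    at i match k consecutive reference fragments starting at j."""
    seeds = []
    for i in range(len(rmap) - k + 1):
        for j in range(len(ref) - k + 1):
            if all(matches(rmap[i + t], ref[j + t], tol) for t in range(k)):
                seeds.append((i, j))
    return seeds

def extend(rmap: list[int], ref: list[int], i: int, j: int,
           tol: float = 0.1) -> int:
    """Extend step: grow a seed to the right while fragments keep matching;
    return the number of consecutive matched fragments."""
    n = 0
    while (i + n < len(rmap) and j + n < len(ref)
           and matches(rmap[i + n], ref[j + n], tol)):
        n += 1
    return n
```

For example, `digest("AAGAATTCAAAAGAATTCA", "GAATTC")` (the EcoRI site) yields `[2, 10, 7]`. The brute-force seed search is quadratic; an index over fragment-size tuples would be the natural next step, but it is omitted to keep the sketch short.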

18.

A New Algorithm for Counting Independent Motifs in Probabilistic Networks.

Sarkar, Aisharjya; Ren, Yuanfang; Elhesha, Rasha; Kahveci, Tamer.

IEEE/ACM Trans Comput Biol Bioinform ; 16(4): 1049-1062, 2019.

Article in English | MEDLINE | ID: mdl-29994098

ABSTRACT

Biological networks provide great potential to understand how cells function. Motifs are topological patterns which are repeated frequently in a specific network. Network motifs are key structures through which biological networks operate. However, counting independent (i.e., non-overlapping) instances of a specific motif remains a computationally hard problem. The motif counting problem becomes computationally even harder for biological networks, as biological interactions are uncertain events. The main challenge behind this problem is that different embeddings of a given motif in a network can share edges. Such edges can create complex computational dependencies between different instances of the given motif when considering the uncertainty of those edges. In this paper, we develop a novel algorithm for counting independent instances of a specific motif topology in probabilistic biological networks. We present a novel mathematical model to capture the dependency between each embedding and all the other embeddings with which it overlaps. We prove the correctness of this model. We evaluate our model on real and synthetic networks with different probability and topology models, as well as a reasonable range of network sizes. Our results demonstrate that our method counts non-overlapping embeddings in practical time for a broad range of networks.

Subjects

Computational Biology/methods , Gene Regulatory Networks , Models, Genetic , Algorithms , Animals , Cardiovascular Diseases/diagnosis , Cell Cycle , Gorilla gorilla , Humans , Pan troglodytes , Pongo , Probability , Species Specificity , Uncertainty
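To make the edge-sharing problem in the record above concrete, here is a minimal sketch that uses a triangle as the motif. It is a naive Monte Carlo baseline, not the authors' analytical model: it assumes edges exist independently with their given probabilities, samples deterministic networks, and in each one greedily keeps triangle embeddings that share no edge with an already kept one. Greedy selection only gives a lower bound on the maximum number of non-overlapping embeddings. All function names are hypothetical.

```python
import itertools
import random

def triangle_embeddings(edges):
    """Enumerate triangle embeddings, each as a frozenset of its three
    undirected edges (each edge itself a frozenset {u, v})."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    found = []
    for u, v, w in itertools.combinations(sorted(adj), 3):
        if v in adj[u] and w in adj[u] and w in adj[v]:
            found.append(frozenset({frozenset((u, v)),
                                    frozenset((u, w)),
                                    frozenset((v, w))}))
    return found

def greedy_independent_count(embeddings):
    """Keep embeddings greedily so that no two kept ones share an edge;
    this lower-bounds the maximum number of non-overlapping embeddings."""
    used, count = set(), 0
    for emb in embeddings:
        if used.isdisjoint(emb):
            used |= emb
            count += 1
    return count

def sampled_independent_triangles(prob_edges, samples=1000, seed=0):
    """Average greedy count over sampled networks; each edge (u, v) is kept
    independently with its probability p (an assumption of this sketch)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        world = [e for e, p in prob_edges.items() if rng.random() < p]
        total += greedy_independent_count(triangle_embeddings(world))
    return total / samples
```

In a network with triangles {1, 2, 3} and {2, 3, 4}, two embeddings exist but they share edge (2, 3), so only one counts as independent. Sampling makes the cost grow with the number of samples, which is one reason an analytical model such as the one the abstract describes is attractive.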

19.

Shortest path counting in probabilistic biological networks.

Ren, Yuanfang; Ay, Ahmet; Kahveci, Tamer.

BMC Bioinformatics ; 19(1): 465, 2018 Dec 04.

Article in English | MEDLINE | ID: mdl-30514202

ABSTRACT

BACKGROUND: Biological regulatory networks, representing the interactions between genes and their products, control almost every biological activity in the cell. Shortest path search is critical for understanding the structure of these networks and for detecting their key components. Counting the number of shortest paths between pairs of genes in biological networks is a polynomial time problem. However, the fact that biological interactions are uncertain events drastically complicates the problem, as it makes the topology of a given network uncertain. RESULTS: In this paper, we develop a novel method to count the number of shortest paths between two nodes in probabilistic networks. Unlike earlier approaches, which use shortest path counting methods specifically designed for deterministic networks, our method builds a new mathematical model to express and compute the number of shortest paths. We prove the correctness of this model. CONCLUSIONS: We compare our novel method to three existing shortest path counting methods on synthetic and real gene regulatory networks. Our experiments demonstrate that our method is scalable and outperforms the existing methods in accuracy. Application of our shortest path counting method to detect communities in probabilistic networks shows that our method successfully finds communities in probabilistic networks. Moreover, our experiments on the cell cycle pathway among different cancer types show that our method helps uncover key functional characteristics of biological networks.

Subjects

Biological Products/metabolism , Gene Regulatory Networks/genetics , Humans
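The record above notes that counting shortest paths is polynomial for a deterministic network. A minimal sketch of that deterministic case is a breadth-first search that accumulates path counts level by level. For the probabilistic case, the sketch only shows a naive sampling baseline, assuming independent edges and defining the target quantity as the average number of shortest s-t paths over sampled networks; it is not the authors' mathematical model, and the function names are hypothetical.

```python
import random
from collections import deque

def count_shortest_paths(adj, s, t):
    """Count shortest s-t paths in an unweighted directed graph given as
    {node: [successors]}; returns 0 if t is unreachable."""
    dist = {s: 0}
    count = {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                # First time v is reached: it lies one level below u.
                dist[v] = dist[u] + 1
                count[v] = count[u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                # Another shortest route into v through u.
                count[v] += count[u]
    return count.get(t, 0)

def sampled_shortest_path_count(prob_edges, s, t, samples=1000, seed=0):
    """Average shortest-path count over networks sampled by keeping each
    directed edge (u, v) independently with probability p."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        adj = {}
        for (u, v), p in prob_edges.items():
            if rng.random() < p:
                adj.setdefault(u, []).append(v)
        total += count_shortest_paths(adj, s, t)
    return total / samples
```

Because BFS dequeues every node of level d before any node of level d + 1, each node's count is final when it is dequeued, so one pass costs O(V + E). The sampling wrapper repeats that pass per sample, which illustrates why sampling-based estimates become expensive on large networks.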

20.

Identification of jointly correlated gene sets.

Ren, Yuanfang; Ay, Ahmet; Gerke, Travis A; Kahveci, Tamer.

J Bioinform Comput Biol ; 16(5): 1840019, 2018 Oct.

Article in English | MEDLINE | ID: mdl-30419787

ABSTRACT

Associations between expressions of genes play a key role in deciphering their functions. The correlation score between pairs of genes is often utilized to associate two genes. However, the relationship between genes is often more complex; multiple genes might collaborate to control the transcription of a gene. In this paper, we introduce the problem of searching for pairs of genes that collectively correlate with another gene. This problem is computationally much harder than the classical problem of identifying pairwise gene associations. Exhaustive search is also infeasible for transcriptomic datasets, since for [Formula: see text] genes there are [Formula: see text] possible gene combinations. Our method builds three filters to avoid computing the association for the large fraction of gene combinations that do not produce high correlation. Our experiments on a synthetic dataset and a prostate cancer dataset demonstrate that our method produces accurate results at the transcriptome level in practical time. Moreover, our method identifies biologically novel results which classical pairwise gene association studies are unlikely to discover.

Subjects

Computational Biology/methods , Models, Genetic , Prostatic Neoplasms/genetics , Transcriptome , Algorithms , Databases, Genetic , Gene Expression Profiling/methods , Humans , Male
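The search problem in the record above can be illustrated by the exhaustive baseline that the authors' filters are designed to avoid. This minimal sketch assumes the joint score is the squared multiple correlation R² of a target gene on a pair of genes, computed from the three pairwise Pearson correlations; the abstract does not state the exact score, and the three filters are not reproduced. Function names and the `threshold` parameter are hypothetical, and expression profiles are assumed to be non-constant.

```python
import itertools
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def joint_r2(r_ta, r_tb, r_ab):
    """Squared multiple correlation of target t on predictors a and b."""
    return (r_ta ** 2 + r_tb ** 2 - 2 * r_ta * r_tb * r_ab) / (1 - r_ab ** 2)

def jointly_correlated(expr, threshold=0.9):
    """Brute force over every target gene and every pair of other genes;
    returns (target, a, b, R^2) for triples scoring at least `threshold`."""
    genes = sorted(expr)
    r = {(g, h): pearson(expr[g], expr[h])
         for g, h in itertools.permutations(genes, 2)}
    hits = []
    for t in genes:
        others = [g for g in genes if g != t]
        for a, b in itertools.combinations(others, 2):
            if 1 - abs(r[a, b]) < 1e-12:
                continue  # collinear predictors: R^2 formula is undefined
            score = joint_r2(r[t, a], r[t, b], r[a, b])
            if score >= threshold:
                hits.append((t, a, b, round(score, 3)))
    return hits
```

The double loop visits every (target, pair) combination, so the work grows cubically with the number of genes; this is the cost that makes exhaustive search impractical at the transcriptome level and motivates pruning filters such as those described in the abstract.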
