Results 1 - 20 of 77
1.
Brief Bioinform ; 25(4), 2024 May 23.
Article in English | MEDLINE | ID: mdl-38886164

ABSTRACT

Morphological profiling is a valuable tool in phenotypic drug discovery. The advent of high-throughput automated imaging has enabled the capture of a wide range of morphological features of cells or organisms in response to perturbations at single-cell resolution. Concurrently, significant advances in machine learning and deep learning, especially in computer vision, have led to substantial improvements in analyzing large-scale high-content images at high throughput. These efforts have facilitated understanding of compound mechanism of action, drug repurposing, and characterization of cell morphodynamics under perturbation, ultimately contributing to the development of novel therapeutics. In this review, we provide a comprehensive overview of the recent advances in the field of morphological profiling. We summarize the image profiling analysis workflow, survey a broad spectrum of analysis strategies encompassing feature engineering- and deep learning-based approaches, and introduce publicly available benchmark datasets. We place a particular emphasis on the application of deep learning in this pipeline, covering cell segmentation, image representation learning, and multimodal learning. Additionally, we illuminate the application of morphological profiling in phenotypic drug discovery and highlight potential challenges and opportunities in this field.


Subjects
Deep Learning , Drug Discovery , Drug Discovery/methods , Humans , Image Processing, Computer-Assisted/methods , Machine Learning
2.
Genome Res ; 29(9): 1415-1428, 2019 09.
Article in English | MEDLINE | ID: mdl-31434679

ABSTRACT

DNA replication occurs in a defined temporal order known as the replication timing (RT) program and is regulated during development, coordinated with 3D genome organization and transcriptional activity. However, transcription and RT are not sufficiently coordinated to predict each other, suggesting an indirect relationship. Here, we exploit genome-wide RT profiles from 15 human cell types and intermediate differentiation stages derived from human embryonic stem cells to construct different types of RT regulatory networks. First, we constructed networks based on the coordinated RT changes during cell fate commitment to create highly complex RT networks composed of thousands of interactions that form specific functional subnetwork communities. We also constructed directional regulatory networks based on the order of RT changes within cell lineages, and identified master regulators of differentiation pathways. Finally, we explored relationships between RT networks and transcriptional regulatory networks (TRNs) by combining them into more complex circuitries of composite and bipartite networks. Results identified novel trans interactions linking transcription factors that are core to the regulatory circuitry of each cell type to RT changes occurring in those cell types. These core transcription factors were found to bind cooperatively to sites in the affected replication domains, providing provocative evidence that they constitute biologically significant directional interactions. Our findings suggest a regulatory link between the establishment of cell-type-specific TRNs and RT control during lineage specification.


Subjects
DNA Replication Timing , Embryonic Stem Cells/cytology , Transcription Factors/metabolism , Cell Differentiation , Cell Lineage , Cells, Cultured , DNA/metabolism , Embryonic Stem Cells/chemistry , Gene Expression Regulation, Developmental , Gene Regulatory Networks , Humans , Transcription, Genetic
3.
Bioinformatics ; 35(18): 3250-3256, 2019 09 15.
Article in English | MEDLINE | ID: mdl-30698651

ABSTRACT

MOTIVATION: Optical maps are high-resolution restriction maps (Rmaps) that give a unique numeric representation to a genome. Used in concert with sequence reads, they provide a useful tool for genome assembly and for discovering structural variations and rearrangements. Although they have been a regular feature of modern genome assembly projects, optical maps have mainly been used in a post-processing step and not in the genome assembly process itself. Several methods have been proposed for pairwise alignment of single-molecule optical maps (called Rmaps) or for aligning optical maps to assembled reads. However, the problem of aligning an Rmap to a graph representing the sequence data of the same genome has not been studied before. Such an alignment provides a mapping between two sets of data, optical maps and sequence data, which will facilitate the use of optical maps in the sequence assembly step itself. RESULTS: We define the problem of aligning an Rmap to a de Bruijn graph and present the first algorithm for solving this problem, which is based on a seed-and-extend approach. We demonstrate that our method is capable of aligning 73% of Rmaps generated from the Escherichia coli genome to the de Bruijn graph constructed from short reads generated from the same genome. We validate the alignments and show that our method achieves an accuracy of 99.6%. We also show that our method scales to larger genomes. In particular, we show that 76% of Rmaps can be aligned to the de Bruijn graph in the case of human data. AVAILABILITY AND IMPLEMENTATION: The software for aligning optical maps to a de Bruijn graph, omGraph, is written in C++ and is publicly available under the GNU General Public License at https://github.com/kingufl/omGraph. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Software , Genome , Restriction Mapping , Sequence Analysis, DNA
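
Editorial illustration for record 3: the abstract describes an Rmap as a numeric representation of a genome. A minimal sketch of that idea, assuming an idealized, error-free digestion in which the cut falls at the first base of every recognition site, is shown below. The function name `in_silico_rmap` and the toy sequence are hypothetical and are not taken from omGraph.

```python
def in_silico_rmap(sequence: str, site: str) -> list[int]:
    """Return the ordered fragment lengths of an idealized restriction digest.

    Assumption: the enzyme cuts exactly at the first base of each occurrence
    of `site`, and there are no missed or spurious cuts.
    """
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start)
        start = sequence.find(site, start + 1)

    # Fragment boundaries: sequence start, every cut position, sequence end.
    bounds = [0] + cuts + [len(sequence)]
    # Drop zero-length fragments (e.g. a site at position 0).
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


if __name__ == "__main__":
    # Toy sequence with two EcoRI-like sites (GAATTC).
    print(in_silico_rmap("AAGAATTCCCGAATTCT", "GAATTC"))
```

Real optical maps carry sizing errors and missed or false cut sites, which this sketch deliberately ignores.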
4.
Nature ; 515(7527): 402-5, 2014 Nov 20.
Article in English | MEDLINE | ID: mdl-25409831

ABSTRACT

Eukaryotic chromosomes replicate in a temporal order known as the replication-timing program. In mammals, replication timing is cell-type-specific with at least half the genome switching replication timing during development, primarily in units of 400-800 kilobases ('replication domains'), whose positions are preserved in different cell types, conserved between species, and appear to confine long-range effects of chromosome rearrangements. Early and late replication correlate, respectively, with open and closed three-dimensional chromatin compartments identified by high-resolution chromosome conformation capture (Hi-C), and, to a lesser extent, late replication correlates with lamina-associated domains (LADs). Recent Hi-C mapping has unveiled substructure within chromatin compartments called topologically associating domains (TADs) that are largely conserved in their positions between cell types and are similar in size to replication domains. However, TADs can be further sub-stratified into smaller domains, challenging the significance of structures at any particular scale. Moreover, attempts to reconcile TADs and LADs to replication-timing data have not revealed a common, underlying domain structure. Here we localize boundaries of replication domains to the early-replicating border of replication-timing transitions and map their positions in 18 human and 13 mouse cell types. We demonstrate that, collectively, replication domain boundaries share a near one-to-one correlation with TAD boundaries, whereas within a cell type, adjacent TADs that replicate at similar times obscure replication domain boundaries, largely accounting for the previously reported lack of alignment. Moreover, cell-type-specific replication timing of TADs partitions the genome into two large-scale sub-nuclear compartments revealing that replication-timing transitions are indistinguishable from late-replicating regions in chromatin composition and lamina association and accounting for the reduced correlation of replication timing to LADs and heterochromatin. Our results reconcile cell-type-specific sub-nuclear compartmentalization and replication timing with developmentally stable structural domains and offer a unified model for large-scale chromosome structure and function.


Subjects
Chromatin/chemistry , Chromatin/genetics , DNA Replication Timing , DNA/biosynthesis , Animals , Cell Compartmentation , Chromatin/metabolism , Chromatin Assembly and Disassembly , DNA/genetics , Genome/genetics , Heterochromatin/chemistry , Heterochromatin/genetics , Heterochromatin/metabolism , Humans , Mice , Organ Specificity , Time Factors
5.
BMC Bioinformatics ; 20(Suppl 12): 318, 2019 Jun 20.
Article in English | MEDLINE | ID: mdl-31216986

ABSTRACT

BACKGROUND: Identification of motifs, recurrent and statistically significant patterns, in biological networks is key to understanding the design principles and inferring the governing mechanisms of biological systems. This, however, is a computationally challenging task. The task is further complicated because biological interactions depend on limited resources, i.e., a reaction takes place only if the reactant molecule concentrations are above a certain threshold level. This biochemical property implies that network edges can participate in only a limited number of motifs simultaneously. Existing motif counting methods ignore this problem. This simplification often leads to inaccurate motif counts (over- or under-estimates) and thus to wrong biological interpretations. RESULTS: In this paper, we develop a novel motif counting algorithm, Partially Overlapping MOtif Counting (POMOC), that considers capacity levels for all interactions in counting motifs. CONCLUSIONS: Our experiments on real and synthetic networks demonstrate that motif counts obtained with the POMOC method differ significantly from those of existing motif counting approaches, and that our method scales to large biological networks in practical time. Our results also show that our method makes it possible to characterize the impact of different stress factors on the cell's network organization. In this regard, analysis of an S. cerevisiae transcriptional regulatory network using our method shows that oxidative stress is more disruptive to the organization and abundance of motifs in this network than mutations of individual genes. Our analysis also suggests that, by focusing on the edges that lead to variation in motif counts, our method can be used to find important genes and to reveal subtle topological and functional differences of biological networks under different cell states.


Subjects
Gene Regulatory Networks/genetics , Saccharomyces cerevisiae/genetics , Algorithms , Databases, Genetic , Genes, Fungal , Models, Biological , Oxidative Stress/genetics
6.
BMC Genomics ; 20(Suppl 6): 434, 2019 Jun 13.
Article in English | MEDLINE | ID: mdl-31189471

ABSTRACT

BACKGROUND: Biological networks describe the mechanisms that govern cellular functions. Temporal networks show how these networks evolve over time. Studying the temporal progression of network topologies is of utmost importance since it uncovers how a network evolves and how it resists external stimuli and internal variations. Two temporal networks have co-evolving subnetworks if the evolving topologies of these subnetworks remain similar to each other as the network topology evolves over a period of time. In this paper, we consider the problem of identifying co-evolving subnetworks given a pair of temporal networks, which aim to capture the evolution of molecules and their interactions over time. Although this problem shares some characteristics of the well-known network alignment problems, it differs from existing network alignment formulations as it seeks a mapping of the two network topologies that is invariant to temporal evolution of the given networks. This is a computationally challenging problem as it requires capturing not only similar topologies between two networks but also their similar evolution patterns. RESULTS: We present an efficient algorithm, Tempo, for identifying co-evolving subnetworks in two given temporal networks. We formally prove the correctness of our method. We experimentally demonstrate that Tempo scales efficiently with the size of the network as well as the number of time points, and generates statistically significant alignments, even when the evolution rates of the given networks are high. Our results on a human aging dataset demonstrate that Tempo identifies novel genes contributing to the progression of Alzheimer's, Huntington's and Type II diabetes, while existing methods fail to do so. CONCLUSIONS: Studying temporal networks in general, and human aging specifically, using Tempo enables us to successfully distinguish age-related genes from non-age-related genes. More importantly, Tempo takes the network alignment problem one huge step forward by moving beyond the classical static network models.


Subjects
Algorithms , Evolution, Molecular , Gene Regulatory Networks , Metabolic Networks and Pathways , Adult , Aged , Aged, 80 and over , Aging , Alzheimer Disease/genetics , Alzheimer Disease/metabolism , Brain/metabolism , Computational Biology/methods , Diabetes Mellitus, Type 2/genetics , Diabetes Mellitus, Type 2/metabolism , Humans , Huntington Disease/genetics , Huntington Disease/metabolism , Middle Aged , Protein Interaction Mapping , Young Adult
7.
BMC Bioinformatics ; 19(1): 465, 2018 Dec 04.
Article in English | MEDLINE | ID: mdl-30514202

ABSTRACT

BACKGROUND: Biological regulatory networks, representing the interactions between genes and their products, control almost every biological activity in the cell. Shortest path search is critical to understanding the structure of these networks and to detecting their key components. Counting the number of shortest paths between pairs of genes in biological networks is a polynomial-time problem. The fact that biological interactions are uncertain events, however, drastically complicates the problem, as it makes the topology of a given network uncertain. RESULTS: In this paper, we develop a novel method to count the number of shortest paths between two nodes in probabilistic networks. Unlike earlier approaches, which use shortest path counting methods that are specifically designed for deterministic networks, our method builds a new mathematical model to express and compute the number of shortest paths. We prove the correctness of this model. CONCLUSIONS: We compare our novel method to three existing shortest path counting methods on synthetic and real gene regulatory networks. Our experiments demonstrate that our method is scalable and that it outperforms the existing methods in accuracy. Applying our shortest path counting method to community detection shows that it successfully finds communities in probabilistic networks. Moreover, our experiments on the cell cycle pathway in different cancer types show that our method helps in uncovering key functional characteristics of biological networks.


Subjects
Biological Products/metabolism , Gene Regulatory Networks/genetics , Humans
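
Editorial illustration for record 7: the abstract notes that counting shortest paths in a deterministic network is a polynomial-time problem. A minimal sketch of that baseline case, assuming an unweighted directed network stored as an adjacency dictionary, is a breadth-first search that accumulates path counts. This is the standard deterministic technique, not the probabilistic model the paper proposes; the function name and the toy network are hypothetical.

```python
from collections import deque


def count_shortest_paths(adj, source, target):
    """Count shortest paths from `source` to `target` in an unweighted graph.

    `adj` maps each node to an iterable of its successors. Returns 0 when
    `target` is unreachable.
    """
    dist = {source: 0}   # shortest distance found for each visited node
    count = {source: 1}  # number of shortest paths reaching each node
    queue = deque([source])

    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                # First visit: v is one step beyond u.
                dist[v] = dist[u] + 1
                count[v] = count[u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                # Another shortest route into v, via u.
                count[v] += count[u]

    return count.get(target, 0)


if __name__ == "__main__":
    # Diamond-shaped toy network: two equally short routes from g1 to g4.
    toy = {"g1": ["g2", "g3"], "g2": ["g4"], "g3": ["g4"]}
    print(count_shortest_paths(toy, "g1", "g4"))
```

Each node and edge is handled once, so the running time is linear in the size of the network. When edges exist only with some probability, the count becomes a random variable, which is the complication the abstract describes.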
8.
BMC Bioinformatics ; 19(1): 242, 2018 06 26.
Article in English | MEDLINE | ID: mdl-29940838

ABSTRACT

BACKGROUND: Identifying motifs in biological networks is essential in uncovering key functions served by these networks. Finding non-overlapping motif instances is however a computationally challenging task. The fact that biological interactions are uncertain events further complicates the problem, as it makes the existence of an embedding of a given motif an uncertain event as well. RESULTS: In this paper, we develop a novel method, ProMotE (Probabilistic Motif Embedding), to count non-overlapping embeddings of a given motif in probabilistic networks. We utilize a polynomial model to capture the uncertainty. We develop three strategies to scale our algorithm to large networks. CONCLUSIONS: Our experiments demonstrate that our method scales to large networks in practical time with high accuracy where existing methods fail. Moreover, our experiments on cancer and degenerative disease networks show that our method helps in uncovering key functional characteristics of biological networks.


Subjects
Amino Acid Motifs/genetics , Algorithms
9.
Genome Res ; 25(8): 1091-103, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26055160

ABSTRACT

Duplication of the genome in mammalian cells occurs in a defined temporal order referred to as its replication-timing (RT) program. RT changes dynamically during development, regulated in units of 400-800 kb referred to as replication domains (RDs). Changes in RT are generally coordinated with transcriptional competence and changes in subnuclear position. We generated genome-wide RT profiles for 26 distinct human cell types, including embryonic stem cell (hESC)-derived, primary cells and established cell lines representing intermediate stages of endoderm, mesoderm, ectoderm, and neural crest (NC) development. We identified clusters of RDs that replicate at unique times in each stage (RT signatures) and confirmed global consolidation of the genome into larger synchronously replicating segments during differentiation. Surprisingly, transcriptome data revealed that the well-accepted correlation between early replication and transcriptional activity was restricted to RT-constitutive genes, whereas two-thirds of the genes that switched RT during differentiation were strongly expressed when late replicating in one or more cell types. Closer inspection revealed that transcription of this class of genes was frequently restricted to the lineage in which the RT switch occurred, but was induced prior to a late-to-early RT switch and/or down-regulated after an early-to-late RT switch. Analysis of transcriptional regulatory networks showed that this class of genes contains strong regulators of genes that were only expressed when early replicating. These results provide intriguing new insight into the complex relationship between transcription and RT regulation during human development.


Subjects
Cell Lineage , DNA Replication Timing , Gene Expression Profiling/methods , Pluripotent Stem Cells/physiology , Cell Differentiation , Cells, Cultured , Cluster Analysis , Gene Expression Regulation, Developmental , Gene Regulatory Networks , Genome, Human , Humans , Pluripotent Stem Cells/cytology
10.
BMC Bioinformatics ; 17(1): 408, 2016 Oct 06.
Article in English | MEDLINE | ID: mdl-27716036

ABSTRACT

BACKGROUND: Biological networks provide great potential to understand how cells function. Network motifs, frequent topological patterns, are key structures through which biological networks operate. Finding motifs in biological networks remains a computationally challenging task as the size of the motif and the underlying network grow. Often, different copies of a given motif topology in a network share nodes or edges. Counting such overlapping copies introduces significant problems in motif identification. RESULTS: In this paper, we develop a scalable algorithm for finding network motifs. Unlike most existing studies, our algorithm counts independent copies of each motif topology. We introduce a set of small patterns and prove that we can construct any larger pattern by joining those patterns iteratively. By iteratively joining already identified motifs with those patterns, our algorithm avoids (i) constructing topologies that do not exist in the target network and (ii) repeatedly counting the frequency of the motifs generated in subsequent iterations. Our experiments on real and synthetic networks demonstrate that our method is significantly faster and more accurate than existing methods, including SUBDUE and FSG. CONCLUSIONS: We conclude that our method for finding network motifs is scalable and computationally feasible for large motif sizes and a broad range of networks with different sizes and densities. We proved that any motif with four or more edges can be constructed as a join of the small patterns.


Subjects
Algorithms , Amino Acid Motifs/genetics , Computational Biology/methods , Gene Regulatory Networks , Pattern Recognition, Automated , Viral Proteins/metabolism , Herpesvirus 8, Human/physiology , Humans , Protein Interaction Mapping , Viral Proteins/genetics
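
Editorial illustration for record 10: the abstract contrasts counting every copy of a motif with counting independent copies. A minimal sketch of the difference, assuming the motif is a triangle in a simple undirected network and taking "independent" to mean edge-disjoint (an assumption, since the abstract does not define the term), is given below. The greedy selection is only a simple lower bound on the number of edge-disjoint copies; it is not the join-based algorithm the paper describes, and both function names are hypothetical.

```python
from itertools import combinations


def triangles(edges):
    """Enumerate all triangles in a simple undirected graph (no self-loops)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    found = set()
    for u, v in edges:
        # Any common neighbour of u and v closes a triangle.
        for w in adj[u] & adj[v]:
            found.add(tuple(sorted((u, v, w))))
    return sorted(found)


def greedy_edge_disjoint(instances):
    """Greedily keep motif instances that share no edge with earlier picks."""
    used = set()
    chosen = []
    for nodes in instances:
        inst_edges = {frozenset(pair) for pair in combinations(nodes, 2)}
        if used.isdisjoint(inst_edges):
            chosen.append(nodes)
            used |= inst_edges
    return chosen


if __name__ == "__main__":
    # Complete graph on four nodes: every triple of nodes forms a triangle.
    k4 = list(combinations([1, 2, 3, 4], 2))
    all_copies = triangles(k4)
    print(len(all_copies), len(greedy_edge_disjoint(all_copies)))
```

In the toy network the overlapping count is several times the edge-disjoint count, which shows how strongly the counting convention can change a motif's apparent frequency.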
11.
BMC Bioinformatics ; 16 Suppl 17: S6, 2015.
Article in English | MEDLINE | ID: mdl-26679404

ABSTRACT

BACKGROUND: Studying biological networks is of extreme importance in understanding cellular functions. These networks model interactions between molecules in each cell. A large volume of research has been done to uncover different characteristics of biological networks, such as large-scale organization, node centrality and network robustness. Nevertheless, the vast majority of research done in this area assumes that biological networks have deterministic topologies. Biological interactions are, however, probabilistic events that may or may not appear in different cells or even in the same cell at different times. RESULTS: In this paper, we present novel methods for characterizing probabilistic signaling networks. Our methods do this by computing the probability that a signal propagates successfully from receptor to reporter genes through interactions in the network. We characterize such networks with respect to (i) centrality of individual nodes, (ii) stability of the entire network, and (iii) important functions served by the network. We use these methods to characterize major H. sapiens signaling networks including Wnt, ErbB and MAPK.


Subjects
Probability , Signal Transduction , Gene Ontology , Humans , Models, Theoretical
12.
BMC Bioinformatics ; 16: 326, 2015 Oct 09.
Article in English | MEDLINE | ID: mdl-26453444

ABSTRACT

BACKGROUND: The network query problem aligns a small query network with an arbitrarily large target network. The complexity of this problem grows exponentially with the number of nodes in the query network if confidence in the optimality of the result is desired. Scaling this problem to large query and target networks remains a challenge. RESULTS: In this article, we develop a novel index structure that dramatically reduces the cost of the network query problem. Our index structure maintains a small set of reference networks where each reference network is a small, carefully chosen subnetwork from the target network. Along with each reference, we also store all of its non-overlapping and statistically significant alignments with the target network. Given a query network, we first align the query with the reference networks. If the alignment with a reference network yields a sufficiently large score, we compute an upper bound on the alignment score between the query and the target using the alignments of that reference and the target (which are stored in our index). If the upper bound is large enough, we employ a second round of alignment between the query and the target by respecting the mapping found in the first alignment. Our experiments on protein-protein interaction networks demonstrate that our index achieves a significant speed-up in running time over state-of-the-art methods such as ColT. The alignment subnetworks obtained by our method are also statistically significant. Finally, we observe that our method finds biologically and statistically significant alignments across multiple species. CONCLUSIONS: We developed a reference-network-based indexing structure that accelerates network query and produces functionally and statistically significant results.


Subjects
Protein Interaction Domains and Motifs/genetics , Protein Interaction Maps/genetics , Algorithms
13.
BMC Bioinformatics ; 16: 161, 2015 May 15.
Article in English | MEDLINE | ID: mdl-25976669

ABSTRACT

BACKGROUND: Gene regulatory networks describe the interplay between genes and their products. These networks control almost every biological activity in the cell through interactions. The hierarchy of genes in these networks, as defined by their interactions, gives important insights into how these functions are governed. Accurately determining the hierarchy of genes is, however, a computationally difficult problem. This problem is further complicated by the fact that an intrinsic characteristic of regulatory networks is that the wiring of interactions can change over time. Determining how the hierarchy in gene regulatory networks changes with a dynamically evolving network topology remains an unsolved challenge. RESULTS: In this study, we develop a new method, named D-HIDEN (Dynamic-HIerarchical DEcomposition of Networks), to find the hierarchy of the genes in dynamically evolving gene regulatory network topologies. Unlike earlier methods, which recompute the hierarchy from scratch when the network topology changes, our method adapts the hierarchy based on the wiring of the interactions only for the nodes which have the potential to move in the hierarchy. CONCLUSIONS: We compare D-HIDEN to five currently available hierarchical decomposition methods on synthetic and real gene regulatory networks. Our experiments demonstrate that D-HIDEN significantly outperforms existing methods in running time, accuracy, or both. Furthermore, our method is robust against dynamic changes in hierarchy. Our experiments on human gene regulatory networks suggest that our method may be used to reconstruct hierarchy in gene regulatory networks.


Subjects
Algorithms , Cell Physiological Phenomena , Computational Biology/methods , Gene Regulatory Networks , Lymphocytes/metabolism , Stem Cells/metabolism , Cell Lineage , Gene Expression Profiling , Humans , Lymphocytes/cytology , Stem Cells/cytology
15.
EMBO J ; 30(5): 882-93, 2011 Mar 02.
Article in English | MEDLINE | ID: mdl-21285948

ABSTRACT

The YgjD/Kae1 family (COG0533) has been on the top-10 list of universally conserved proteins of unknown function for over 5 years. It has been linked to DNA maintenance in bacteria and mitochondria and transcription regulation and telomere homeostasis in eukaryotes, but its actual function has never been found. Based on a comparative genomic and structural analysis, we predicted this family was involved in the biosynthesis of N(6)-threonylcarbamoyl adenosine, a universal modification found at position 37 of tRNAs decoding ANN codons. This was confirmed as a yeast mutant lacking Kae1 is devoid of t(6)A. t(6)A(-) strains were also used to reveal that t(6)A has a critical role in initiation codon restriction to AUG and in restricting frameshifting at tandem ANN codons. We also showed that YaeZ, a YgjD paralog, is required for YgjD function in vivo in bacteria. This work lays the foundation for understanding the pleiotropic role of this universal protein family.


Subjects
Adenosine/analogs & derivatives , Metalloendopeptidases/metabolism , Mitochondrial Proteins/metabolism , RNA, Transfer/chemistry , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/genetics , Adenosine/metabolism , Genetic Complementation Test , Metalloendopeptidases/genetics , Mitochondrial Proteins/genetics , Multiprotein Complexes , RNA, Transfer/genetics , RNA, Transfer/metabolism , Saccharomyces cerevisiae/growth & development , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics
16.
Genome Res ; 22(7): 1334-49, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22456606

ABSTRACT

Gaining insights on gene regulation from large-scale functional data sets is a grand challenge in systems biology. In this article, we develop and apply methods for transcriptional regulatory network inference from diverse functional genomics data sets and demonstrate their value for gene function and gene expression prediction. We formulate the network inference problem in a machine-learning framework and use both supervised and unsupervised methods to predict regulatory edges by integrating transcription factor (TF) binding, evolutionarily conserved sequence motifs, gene expression, and chromatin modification data sets as input features. Applying these methods to Drosophila melanogaster, we predict ∼300,000 regulatory edges in a network of ∼600 TFs and 12,000 target genes. We validate our predictions using known regulatory interactions, gene functional annotations, tissue-specific expression, protein-protein interactions, and three-dimensional maps of chromosome conformation. We use the inferred network to identify putative functions for hundreds of previously uncharacterized genes, including many in nervous system development, which are independently confirmed based on their tissue-specific expression patterns. Last, we use the regulatory network to predict target gene expression levels as a function of TF expression, and find significantly higher predictive power for integrative networks than for motif or ChIP-based networks. Our work reveals the complementarity between physical evidence of regulatory interactions (TF binding, motif conservation) and functional evidence (coordinated expression or chromatin patterns) and demonstrates the power of data integration for network inference and studies of gene regulation at the systems level.


Subjects
Computational Biology/methods , Drosophila melanogaster/genetics , Gene Expression Regulation, Developmental , Gene Regulatory Networks , Genome, Insect , Animals , Base Sequence , Chromatin Assembly and Disassembly , Chromatin Immunoprecipitation , Chromosome Mapping/methods , Chromosomes/genetics , Chromosomes/metabolism , Conserved Sequence , Drosophila melanogaster/embryology , Drosophila melanogaster/metabolism , Gene Expression Profiling/methods , Gene Expression Regulation , Linear Models , Models, Genetic , Molecular Sequence Annotation , Nervous System/cytology , Nervous System/embryology , Nervous System/metabolism , Nucleotide Motifs , Organ Specificity , Protein Binding , Protein Interaction Mapping , Regulatory Elements, Transcriptional , Transcription Factors/genetics , Transcription Factors/metabolism
17.
Bioinformatics ; 30(12): i96-104, 2014 Jun 15.
Article in English | MEDLINE | ID: mdl-24932011

ABSTRACT

MOTIVATION: Major disorders, such as leukemia, have been shown to alter the transcription of genes. Understanding how gene regulation is affected by such aberrations is of utmost importance. One promising strategy toward this objective is to compute whether signals can reach the transcription factors through the transcription regulatory network (TRN). Due to the uncertainty of the regulatory interactions, this is a #P-complete problem, and thus solving it for very large TRNs remains a challenge. RESULTS: We develop a novel and scalable method to compute the probability that a signal originating at any given set of source genes can arrive at any given set of target genes (i.e., transcription factors) when the topology of the underlying signaling network is uncertain. Our method tackles this problem for large networks while providing a provably accurate result. Our method follows a divide-and-conquer strategy. We break down the given network into a sequence of non-overlapping subnetworks such that reachability can be computed autonomously and sequentially on each subnetwork. We represent each interaction using a small polynomial. The product of these polynomials expresses the different scenarios in which a signal can or cannot reach the target genes from the source genes. We introduce polynomial collapsing operators for each subnetwork. These operators dramatically reduce the size of the resulting polynomial and thus the computational complexity. We show that our method scales to entire human regulatory networks in only seconds, while the existing methods fail beyond a few tens of genes and interactions. We demonstrate that our method can successfully characterize key reachability characteristics of the entire transcriptional regulatory networks of patients affected by eight different subtypes of leukemia, as well as those from healthy control samples.
AVAILABILITY: All the datasets and code used in this article are available at bioinformatics.cise.ufl.edu/PReach/scalable.htm.


Subjects
Gene Regulatory Networks , Algorithms , Computational Biology/methods , Gene Expression Regulation , Humans , Leukemia/genetics , Leukemia/metabolism , Signal Transduction , Transcription Factors/genetics
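
Editorial illustration for record 17: the abstract concerns the probability that a signal reaches target genes when each interaction exists only with some probability. A minimal sketch of the naive baseline, assuming independent edges and a single source and target, enumerates every possible network and sums the probabilities of those in which the target is reachable. This is exponential in the number of edges and is not the polynomial-collapsing method the paper proposes; the function name and the toy network are hypothetical.

```python
from itertools import product


def reach_probability(edges, source, target):
    """Exact source-to-target reachability probability by brute force.

    `edges` maps a directed edge (u, v) to its independent existence
    probability. Every subset of edges is enumerated, so this is only
    practical for a handful of edges.
    """
    items = list(edges.items())
    total = 0.0

    for present in product([False, True], repeat=len(items)):
        weight = 1.0  # probability of this particular deterministic network
        adj = {}
        for ((u, v), p), keep in zip(items, present):
            weight *= p if keep else 1.0 - p
            if keep:
                adj.setdefault(u, []).append(v)

        # Depth-first search over the edges present in this scenario.
        stack, seen = [source], {source}
        while stack:
            node = stack.pop()
            for nxt in adj.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)

        if target in seen:
            total += weight

    return total


if __name__ == "__main__":
    # Two routes from s to t: a direct edge and a two-step path through a.
    toy = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "t"): 0.5}
    print(reach_probability(toy, "s", "t"))
```

With n uncertain edges the loop visits 2^n scenarios, which illustrates why exact enumeration stops being feasible after a few tens of interactions.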
19.
Bioinformatics ; 29(2): 166-74, 2013 Jan 15.
Article in English | MEDLINE | ID: mdl-23162082

ABSTRACT

MOTIVATION: Phylogenetics, or reconstructing the evolutionary relationships of organisms, is critical for understanding evolution. A large number of heuristic algorithms for phylogenetics have been developed, some of which enable estimates of trees with tens of thousands of taxa. Such trees may not be robust, as small changes in the input data can cause major differences in the optimal topology. Tools that can assess the quality and stability of phylogenetic tree estimates and identify the most reliable parts of the tree are needed. RESULTS: We define measures that assess the stability of trees, subtrees and individual taxa with respect to changes in the input sequences. Our measures consider changes at the finest granularity in the input data (i.e. individual nucleotides). We demonstrate the effectiveness of our measures on large published datasets. Our measures are computationally feasible for phylogenetic datasets consisting of tens of thousands of taxa. AVAILABILITY: This software is available at http://bioinformatics.cise.ufl.edu/phylostab CONTACT: sheikh@cise.ufl.edu


Subjects
Phylogeny , Algorithms , Animals , Mammals/classification , Mammals/genetics , Plants/classification , Plants/genetics , Sequence Alignment , Sequence Analysis, DNA , Software
20.
ArXiv ; 2024 Jan 15.
Article in English | MEDLINE | ID: mdl-38168460

ABSTRACT

Morphological profiling is a valuable tool in phenotypic drug discovery. The advent of high-throughput automated imaging has enabled the capture of a wide range of morphological features of cells or organisms in response to perturbations at single-cell resolution. Concurrently, significant advances in machine learning and deep learning, especially in computer vision, have led to substantial improvements in analyzing large-scale high-content images at high throughput. These efforts have facilitated understanding of compound mechanism of action (MOA), drug repurposing, and characterization of cell morphodynamics under perturbation, ultimately contributing to the development of novel therapeutics. In this review, we provide a comprehensive overview of the recent advances in the field of morphological profiling. We summarize the image profiling analysis workflow, survey a broad spectrum of analysis strategies encompassing feature engineering- and deep learning-based approaches, and introduce publicly available benchmark datasets. We place a particular emphasis on the application of deep learning in this pipeline, covering cell segmentation, image representation learning, and multimodal learning. Additionally, we illuminate the application of morphological profiling in phenotypic drug discovery and highlight potential challenges and opportunities in this field.
