Results 1 - 20 of 38
1.
Bioinformatics ; 35(14): i605-i614, 2019 07 15.
Article in English | MEDLINE | ID: mdl-31510678

ABSTRACT

MOTIVATION: Perturbation experiments constitute the central means to study cellular networks. Several confounding factors complicate computational modeling of signaling networks from these data. First, the technique of RNA interference (RNAi), designed and commonly used to knock down specific genes, suffers from off-target effects. As a result, each experiment is a combinatorial perturbation of multiple genes. Second, the perturbations propagate along unknown connections in the signaling network. Once the signal is blocked by perturbation, proteins downstream of the targeted proteins also become inactivated. Finally, all perturbed network members, either directly targeted by the experiment or reached by propagation in the network, contribute to the observed effect, either in a positive or negative manner. One of the key questions in computational inference of signaling networks from such data is how many and what combinations of perturbations are required to uniquely and accurately infer the model. RESULTS: Here, we introduce an enhanced version of linear effects models (LEMs), which extends the original by accounting for both negative and positive contributions of the perturbed network proteins to the observed phenotype. We prove that the enhanced LEMs are identified from data measured under perturbations of all single, pairs and triplets of network proteins. For small networks of up to five nodes, only perturbations of single and pairs of proteins are required for identifiability. Extensive simulations demonstrate that enhanced LEMs achieve excellent accuracy of parameter estimation and network structure learning, outperforming the previous version on realistic data. LEMs applied to Bartonella henselae infection RNAi screening data identified known interactions between eight nodes of the infection network, confirming the high specificity of our model, and suggested one new interaction. AVAILABILITY AND IMPLEMENTATION: https://github.com/EwaSzczurek/LEM.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
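
The generative assumptions stated in this abstract (a knock-down inactivates its targets and everything downstream of them, and every inactivated protein adds a positive or negative contribution to the phenotype) can be illustrated with a toy sketch. This is not the code from the LEM repository: the function names `downstream_closure` and `predicted_effect`, the edge dictionary, and the contribution values are hypothetical and chosen only for illustration.

```python
def downstream_closure(targets, edges):
    """Return every node inactivated when `targets` are knocked down.

    `edges` maps a node to the nodes directly downstream of it
    (hypothetical toy network, not data from the paper).
    """
    perturbed = set(targets)
    stack = list(targets)
    while stack:
        node = stack.pop()
        for child in edges.get(node, ()):
            if child not in perturbed:
                perturbed.add(child)
                stack.append(child)
    return perturbed


def predicted_effect(targets, edges, contribution):
    """Sum the signed contributions of all perturbed nodes (a linear effect)."""
    return sum(contribution[node] for node in downstream_closure(targets, edges))


# Toy chain A -> B -> C with one negative contribution (assumed values).
edges = {"A": ["B"], "B": ["C"]}
contribution = {"A": 1.0, "B": -0.5, "C": 2.0}
effect_of_b = predicted_effect(["B"], edges, contribution)
effect_of_a = predicted_effect(["A"], edges, contribution)
```

In this toy chain, knocking down B also inactivates C, so the two experiments differ only by the contribution of A. This gives some intuition for why combinations of perturbations are needed to separate individual contributions.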


Subject(s)
RNA, Small Interfering/genetics , Computational Biology , Linear Models , Proteins , RNA Interference
2.
Bioinformatics ; 32(16): 2419-26, 2016 08 15.
Article in English | MEDLINE | ID: mdl-27153645

ABSTRACT

MOTIVATION: Computational prediction of transcription factor (TF) binding sites in the genome remains a challenging task. Here, we present Romulus, a novel computational method for identifying individual TF binding sites from genome sequence information and cell-type-specific experimental data, such as DNase-seq. It combines the strengths of previous approaches, and improves robustness by reducing the number of free parameters in the model by an order of magnitude. RESULTS: We show that Romulus significantly outperforms existing methods across three sources of DNase-seq data, by assessing the performance of these tools against ChIP-seq profiles. The difference was particularly significant when applied to binding site prediction for low-information-content motifs. Our method is capable of inferring multiple binding modes for a single TF, which differ in their DNase I cut profile. Finally, using the model learned by Romulus and ChIP-seq data, we introduce Binding in Closed Chromatin (BCC) as a quantitative measure of TF pioneer factor activity. Uniquely, our measure quantifies a defining feature of pioneer factors, namely their ability to bind closed chromatin. AVAILABILITY AND IMPLEMENTATION: Romulus is freely available as an R package at http://github.com/ajank/Romulus CONTACT: ajank@mimuw.edu.pl SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Binding Sites , Computational Biology/methods , Protein Binding , Transcription Factors , Chromatin , Chromatin Immunoprecipitation , Sequence Analysis, DNA
3.
Genome Res ; 23(8): 1307-18, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23554463

ABSTRACT

The binding of transcription factors (TFs) to their specific motifs in genomic regulatory regions is commonly studied in isolation. However, in order to elucidate the mechanisms of transcriptional regulation, it is essential to determine which TFs bind DNA cooperatively as dimers and to infer the precise nature of these interactions. So far, only a small number of such dimeric complexes are known. Here, we present an algorithm for predicting cell-type-specific TF-TF dimerization on DNA on a large scale, using DNase I hypersensitivity data from 78 human cell lines. We represented the universe of possible TF complexes by their corresponding motif complexes, and analyzed their occurrence at cell-type-specific DNase I hypersensitive sites. Based on ∼1.4 billion tests for motif complex enrichment, we predicted 603 highly significant cell-type-specific TF dimers, the vast majority of which are novel. Our predictions included 76% (19/25) of the known dimeric complexes and showed significant overlap with an experimental database of protein-protein interactions. They were also independently supported by evolutionary conservation, as well as quantitative variation in DNase I digestion patterns. Notably, the known and predicted TF dimers were almost always highly compact and rigidly spaced, suggesting that TFs dimerize in close proximity to their partners, which results in strict constraints on the structure of the DNA-bound complex. Overall, our results indicate that chromatin openness profiles are highly predictive of cell-type-specific TF-TF interactions. Moreover, cooperative TF dimerization seems to be a widespread phenomenon, with multiple TF complexes predicted in most cell types.


Subject(s)
Hepatocyte Nuclear Factor 3-alpha/metabolism , Models, Biological , Algorithms , Base Sequence , Binding Sites , Cell Line, Tumor , Cluster Analysis , Computer Simulation , Consensus Sequence , DNA Cleavage , Deoxyribonuclease I/chemistry , Evolution, Molecular , Humans , Protein Binding , Protein Interaction Mapping , Protein Interaction Maps , Protein Multimerization , Transcription Factors/metabolism
4.
Plant Physiol ; 169(3): 2080-101, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26351307

ABSTRACT

Linker (H1) histones play critical roles in chromatin compaction in higher eukaryotes. They are also the most variable of the histones, with numerous nonallelic variants cooccurring in the same cell. Plants contain a distinct subclass of minor H1 variants that are induced by drought and abscisic acid and have been implicated in mediating adaptive responses to stress. However, how these variants facilitate adaptation remains poorly understood. Here, we show that the single Arabidopsis (Arabidopsis thaliana) stress-inducible variant H1.3 occurs in plants in two separate and most likely autonomous pools: a constitutive guard cell-specific pool and a facultative environmentally controlled pool localized in other tissues. Physiological and transcriptomic analyses of h1.3 null mutants demonstrate that H1.3 is required for both proper stomatal functioning under normal growth conditions and adaptive developmental responses to combined light and water deficiency. Using fluorescence recovery after photobleaching analysis, we show that H1.3 has superfast chromatin dynamics, and in contrast to the main Arabidopsis H1 variants H1.1 and H1.2, it has no stable bound fraction. The results of global occupancy studies demonstrate that, while H1.3 has the same overall binding properties as the main H1 variants, including predominant heterochromatin localization, it differs from them in its preferences for chromatin regions with epigenetic signatures of active and repressed transcription. We also show that H1.3 is required for a substantial part of DNA methylation associated with environmental stress, suggesting that the likely mechanism underlying H1.3 function may be the facilitation of chromatin accessibility by direct competition with the main H1 variants.


Subject(s)
Abscisic Acid/metabolism , Adaptation, Physiological , Arabidopsis/genetics , Gene Expression Regulation, Plant , Histones/genetics , Plant Growth Regulators/metabolism , Arabidopsis/growth & development , Arabidopsis/physiology , Arabidopsis/radiation effects , Chromatin/genetics , Chromatin/metabolism , DNA Methylation , Droughts , Epigenesis, Genetic , Genes, Reporter , Heterochromatin/genetics , Heterochromatin/metabolism , Histones/metabolism , Light , Plant Proteins/genetics , Plant Proteins/metabolism , Stress, Physiological
5.
BMC Bioinformatics ; 15: 65, 2014 Mar 05.
Article in English | MEDLINE | ID: mdl-24597904

ABSTRACT

BACKGROUND: Inconsistencies are often observed in the genome annotations of bacterial strains. Moreover, these inconsistencies are often not reflected by sequence discrepancies, but are caused by wrongly annotated gene starts as well as mis-identified gene presence. Thus, tools are needed for improving annotation consistency and accuracy among sets of bacterial strain genomes. RESULTS: We have developed eCAMBer, a tool for efficiently supporting comparative analysis of multiple bacterial strains within the same species. eCAMBer is a highly optimized revision of our earlier tool, CAMBer, scaling it up for significantly larger datasets comprising hundreds of bacterial strains. eCAMBer works in two phases. First, it transfers gene annotations among all considered bacterial strains. In this phase, it also identifies homologous gene families and annotation inconsistencies. Second, eCAMBer tries to improve the quality of annotations by resolving the gene start inconsistencies and filtering out gene families arising from annotation errors propagated in the previous phase. CONCLUSIONS: [corrected] eCAMBer efficiently identifies and resolves annotation inconsistencies among closely related bacterial genomes. It outperforms other competing tools both in terms of running time and accuracy of produced annotations. Software, user manual, and case study results are available at the project website: http://bioputer.mimuw.edu.pl/ecamber.


Subject(s)
Genome, Bacterial , Molecular Sequence Annotation , Software , Bacteria/classification , Bacteria/genetics , Multigene Family
6.
BMC Genomics ; 15 Suppl 10: S10, 2014.
Article in English | MEDLINE | ID: mdl-25559874

ABSTRACT

BACKGROUND: Development of drug resistance in bacteria causes antibiotic therapies to be less effective and more costly. Moreover, our understanding of the process remains incomplete. One promising approach to improve our understanding of how resistance is acquired is to use whole-genome comparative approaches for detection of drug resistance-associated mutations. RESULTS: We present GWAMAR, a tool we have developed for detecting drug resistance-associated mutations in bacteria through comparative analysis of whole-genome sequences. The pipeline of GWAMAR comprises several steps. First, for a set of closely related bacterial genomes, it employs eCAMBer to identify homologous gene families. Second, based on multiple alignments of the gene families, it identifies mutations among the strains of interest. Third, it calculates several statistics to identify which mutations are the most associated with drug resistance. CONCLUSIONS: Based on our analysis of two large datasets retrieved from publicly available data for M. tuberculosis, we identified a set of novel putative drug resistance-associated mutations. As a part of this work, we also present an application of our tool to detect putative compensatory mutations.


Subject(s)
DNA Mutational Analysis/methods , Drug Resistance, Bacterial , Mycobacterium tuberculosis/genetics , Point Mutation , Algorithms , Databases, Genetic , Genome, Bacterial , Genome-Wide Association Study , Models, Statistical , Multigene Family , Phylogeny , Software
7.
BMC Genomics ; 15: 208, 2014 Mar 19.
Article in English | MEDLINE | ID: mdl-24640962

ABSTRACT

BACKGROUND: Cooperative binding of transcription factor (TF) dimers to DNA is increasingly recognized as a major contributor to binding specificity. However, it is likely that the set of known TF dimers is highly incomplete, given that they were discovered using ad hoc approaches, or through computational analyses of limited datasets. RESULTS: Here, we present TACO (Transcription factor Association from Complex Overrepresentation), a general-purpose standalone software tool that takes as input any genome-wide set of regulatory elements and predicts cell-type-specific TF dimers based on enrichment of motif complexes. TACO is the first tool that can accommodate motif complexes composed of overlapping motifs, a characteristic feature of many known TF dimers. Our method comprehensively outperforms existing tools when benchmarked on a reference set of 29 known dimers. We demonstrate the utility and consistency of TACO by applying it to 152 DNase-seq datasets and 94 ChIP-seq datasets. CONCLUSIONS: Based on these results, we uncover a general principle governing the structure of TF-TF-DNA ternary complexes, namely that the flexibility of the complex is correlated with, and most likely a consequence of, inter-motif spacing.


Subject(s)
Algorithms , Software , Binding Sites , Chromatin Immunoprecipitation , DNA/chemistry , DNA/metabolism , Dimerization , Protein Binding , Sequence Analysis, DNA , Transcription Factors/chemistry , Transcription Factors/metabolism
8.
BMC Genomics ; 13 Suppl 7: S23, 2012.
Article in English | MEDLINE | ID: mdl-23281931

ABSTRACT

BACKGROUND: Drug resistance in bacterial pathogens is an increasing problem, which stimulates research. However, our understanding of drug resistance mechanisms remains incomplete. Fortunately, the fast-growing number of fully sequenced bacterial strains now enables us to develop new methods to identify mutations associated with drug resistance. RESULTS: We present a new comparative approach to identify genes and mutations that are likely to be associated with drug resistance mechanisms. In order to test the approach, we collected genotype and phenotype data of 100 fully sequenced strains of S. aureus and 10 commonly used drugs. Then, applying the method, we re-discovered the most common genetic determinants of drug resistance and identified some novel putative associations. CONCLUSIONS: Firstly, the collected data may help other researchers to develop and verify similar techniques. Secondly, the proposed method is successful in identifying drug resistance determinants. Thirdly, the in-silico identified genetic mutations, which are putatively involved in drug resistance mechanisms, may increase our understanding of the drug resistance mechanisms.


Subject(s)
Algorithms , Drug Resistance, Bacterial/genetics , Staphylococcus aureus/genetics , Anti-Bacterial Agents/pharmacology , Drug Resistance, Bacterial/drug effects , Genotype , Mutation , Phenotype , Phylogeny , Staphylococcus aureus/classification , Staphylococcus aureus/drug effects
9.
Bioinformatics ; 27(23): 3313-4, 2011 Dec 01.
Article in English | MEDLINE | ID: mdl-21984770

ABSTRACT

MOTIVATION: A number of inconsistencies in genome annotations are documented among bacterial strains. Visualization of the differences may help biologists to make correct decisions in spurious cases. RESULTS: We have developed a visualization tool, CAMBerVis, to support comparative analysis of multiple bacterial strains. The software manages simultaneous visualization of multiple bacterial genomes, enabling visual analysis focused on genome structure annotations. AVAILABILITY: The CAMBerVis software is freely available at the project website: http://bioputer.mimuw.edu.pl/camber. Input datasets for Mycobacterium tuberculosis and Staphylococcus aureus are integrated with the software as examples. CONTACT: m.wozniak@mimuw.edu.pl SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Bacteria/classification , Bacteria/genetics , Genome, Bacterial , Software , Molecular Sequence Annotation , Sequence Alignment/methods
10.
BMC Bioinformatics ; 12: 249, 2011 Jun 21.
Article in English | MEDLINE | ID: mdl-21693013

ABSTRACT

BACKGROUND: Deregulation between two different cell populations manifests itself in changing gene expression patterns and changing regulatory interactions. Accumulating knowledge about biological networks creates an opportunity to study these changes in their cellular context. RESULTS: We analyze re-wiring of regulatory networks based on cell population-specific perturbation data and knowledge about signaling pathways and their target genes. We quantify deregulation by merging regulatory signal from the two cell populations into one score. This joint approach, called JODA, proves advantageous over separate analysis of the cell populations and analysis without incorporation of knowledge. JODA is implemented and freely available in a Bioconductor package 'joda'. CONCLUSIONS: Using JODA, we show wide-spread re-wiring of gene regulatory networks upon neocarzinostatin-induced DNA damage in human cells. We recover 645 deregulated genes in thirteen functional clusters performing the rich program of response to damage. We find that the clusters contain many previously characterized neocarzinostatin target genes. We investigate connectivity between those genes, explaining their cooperation in performing the common functions. We review genes with the most extreme deregulation scores, reporting their involvement in response to DNA damage. Finally, we investigate the indirect impact of the ATM pathway on the deregulated genes, and build a hypothetical hierarchy of direct regulation. These results prove that JODA is a step toward a systems-level, mechanistic understanding of changes in gene regulation between different cell populations.


Subject(s)
DNA Damage , Gene Expression Regulation , Software , DNA Repair , Disease/genetics , Gene Expression Profiling , Humans , Signal Transduction
11.
BMC Genomics ; 12 Suppl 2: S6, 2011.
Article in English | MEDLINE | ID: mdl-21989220

ABSTRACT

BACKGROUND: There is a large amount of inconsistency in gene structure annotations of bacterial strains. This inconsistency is a frustrating impediment to effective comparative genomic analysis of bacterial strains in promising applications such as gaining insights into bacterial drug resistance. RESULTS: Here, we propose CAMBer as an approach to support comparative analysis of multiple bacterial strains. CAMBer produces what we call multigene families. Each multigene family reveals genes that are in one-to-one correspondence in the bacterial strains, thereby permitting their annotations to be integrated. We present results of our method applied to three human pathogens: Escherichia coli, Mycobacterium tuberculosis and Staphylococcus aureus. CONCLUSIONS: As a result, more accurate and more comprehensive annotations of the bacterial strains can be produced.


Subject(s)
Escherichia coli/classification , Genome, Bacterial , Mycobacterium tuberculosis/classification , Software , Staphylococcus aureus/classification , Codon, Initiator/genetics , Computational Biology , Computer Graphics , Escherichia coli/genetics , Molecular Sequence Annotation , Multigene Family , Mycobacterium tuberculosis/genetics , Staphylococcus aureus/genetics , Time Factors
12.
Bioinformatics ; 26(14): 1790-1, 2010 Jul 15.
Article in English | MEDLINE | ID: mdl-20507893

ABSTRACT

SUMMARY: Interrogating protein complexes and pathways in an evolutionary context provides insights into the formation of the basic functional components of the cell. We developed two independent Cytoscape plugins that can be cooperatively used to map evolving protein interaction networks at the module level. The APCluster plugin implements a recent affinity propagation (AP) algorithm for graph clustering and can be applied to decompose networks into coherent modules. The NetworkEvolution plugin provides the capability to visualize selected modules in consecutive evolutionary stages. AVAILABILITY: The plugins, input data and usage scenarios are freely available from the project web site: http://bioputer.mimuw.edu.pl/modevo. The plugins are also available from the Cytoscape plugin repository.


Subject(s)
Evolution, Molecular , Protein Interaction Mapping/methods , Proteins/chemistry , Software , Algorithms , Databases, Protein
13.
IEEE/ACM Trans Comput Biol Bioinform ; 18(6): 2125-2135, 2021.
Article in English | MEDLINE | ID: mdl-31150345

ABSTRACT

Tree reconciliation costs are a popular choice to account for the discordance between the evolutionary history of a gene family (i.e., a gene tree), and the species tree through which this family has evolved. This discordance is accounted for by the minimum number of postulated evolutionary events necessary for reconciling the two trees. Such events include gene duplication, loss, and deep coalescence, and are used to define different types of tree reconciliation costs. For example, the duplication-loss cost for a gene tree and species tree accounts for the minimum number of gene duplications and losses necessary to reconcile these trees. Fundamental to the understanding of how gene trees and species trees relate to each other are the diameters of tree reconciliation costs. While such diameters have been well-researched, still absent from these studies are the unconstrained diameters for two of the classic tree reconciliation costs, namely the duplication-loss cost and the loss cost. Here, we show the essential mathematical properties of these diameters and provide efficient solutions for computing them. Finally, we analyze the distributions of these diameters using simulated datasets.
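
The duplication-loss cost defined in this abstract is commonly computed with the least-common-ancestor (LCA) mapping between a rooted binary gene tree and a rooted binary species tree. The sketch below illustrates that textbook procedure only; it is not the diameter algorithm of the paper. The nested-tuple tree encoding and the names `index_species_tree`, `lca`, and `duplication_loss_cost` are assumptions made for this example.

```python
def index_species_tree(tree):
    """Return (parent, depth) dictionaries for all nodes of a nested-tuple tree."""
    parent, depth = {tree: None}, {tree: 0}
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):  # internal node: register its children
            for child in node:
                parent[child] = node
                depth[child] = depth[node] + 1
                stack.append(child)
    return parent, depth


def lca(a, b, parent, depth):
    """Least common ancestor of species-tree nodes a and b."""
    while depth[a] > depth[b]:
        a = parent[a]
    while depth[b] > depth[a]:
        b = parent[b]
    while a != b:
        a, b = parent[a], parent[b]
    return a


def duplication_loss_cost(gene_tree, species_tree):
    """Return (duplications, losses) under the LCA mapping.

    Gene-tree leaves are labeled with the species they were sampled from.
    """
    parent, depth = index_species_tree(species_tree)
    dups = losses = 0

    def visit(g):
        nonlocal dups, losses
        if not isinstance(g, tuple):
            return g  # a gene leaf maps to the species leaf with the same label
        left, right = (visit(child) for child in g)
        m = lca(left, right, parent, depth)
        is_dup = m == left or m == right  # node maps to the same place as a child
        dups += is_dup
        gap = (depth[left] - depth[m]) + (depth[right] - depth[m])
        losses += gap if is_dup else gap - 2
        return m

    visit(gene_tree)
    return dups, losses


species = (("a", "b"), "c")
concordant = duplication_loss_cost((("a", "b"), "c"), species)
discordant = duplication_loss_cost((("a", "c"), "b"), species)
```

For the discordant gene tree `((a,c),b)`, the root maps to the species root and is a duplication. One copy loses the gene in `b` and the other loses it in `a` and `c`, giving one duplication and three losses, while the concordant tree costs nothing.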


Subject(s)
Computational Biology/methods , Gene Duplication/genetics , Models, Genetic , Evolution, Molecular , Phylogeny
14.
Mol Syst Biol ; 5: 287, 2009.
Article in English | MEDLINE | ID: mdl-19584836

ABSTRACT

Signaling cascades are triggered by environmental stimulation and propagate the signal to regulate transcription. Systematic reconstruction of the underlying regulatory mechanisms requires pathway-targeted, informative experimental data. However, practical experimental design approaches are still in their infancy. Here, we propose a framework that iterates design of experiments and identification of regulatory relationships downstream of a given pathway. The experimental design component, called MEED, aims to minimize the amount of laboratory effort required in this process. To avoid ambiguity in the identification of regulatory relationships, the choice of experiments maximizes diversity between expression profiles of genes regulated through different mechanisms. The framework takes advantage of expert knowledge about the pathways under study, formalized in a predictive logical model. By considering model-predicted dependencies between experiments, MEED is able to suggest a whole set of experiments that can be carried out simultaneously. Our framework was applied to investigate interconnected signaling pathways in yeast. In comparison with other approaches, MEED suggested the most informative experiments for unambiguous identification of transcriptional regulation in this system.


Subject(s)
Algorithms , Gene Expression Regulation , Models, Biological , Signal Transduction , Transcription, Genetic , Fungal Proteins/biosynthesis , Fungal Proteins/genetics , Gene Expression Profiling , Gene Regulatory Networks , Regulatory Elements, Transcriptional , Yeasts/genetics , Yeasts/metabolism
15.
Comput Biol Chem ; 89: 107260, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33038778

ABSTRACT

BACKGROUND: The study of genomic duplication is fundamental to understanding the process of evolution. In evolutionary molecular biology, many approaches focus on discovering the occurrences of gene duplications and multiple gene duplication episodes and their locations in the Tree of Life. To reconstruct such episodes, one can cluster single gene duplications inferred by reconciling a set of gene trees with a species tree. RESULTS: We propose an efficient quadratic time algorithm to solve the problem of genomic duplication clustering, in which input gene trees are rooted, episode locations are restricted to preserve the minimal number of single gene duplications, clustering rules are described by the minimum episodes method, and the goal is based on the recently introduced approach of minimizing the maximal number of duplication episodes on a single path, called here the MP score. Based on our theoretical results, we show new algorithmic relationships between the MP score and the minimum episodes (ME) score, defined as the minimal number of duplication episodes. CONCLUSIONS: Our evaluation analysis on three empirical datasets demonstrates that, under the model in which the minimal number of duplications is preserved, the duplication clusterings with minimal MP score support the clusterings with the minimal total number of duplication episodes. AVAILABILITY: The software is available at https://bitbucket.org/pgor17/rmp.


Subject(s)
Algorithms , Gene Duplication , Models, Genetic , Databases, Genetic/statistics & numerical data , Evolution, Molecular
16.
BMC Bioinformatics ; 10: 393, 2009 Nov 30.
Article in English | MEDLINE | ID: mdl-19948065

ABSTRACT

BACKGROUND: The assembly of reliable and complete protein-protein interaction (PPI) maps remains one of the significant challenges in systems biology. Computational methods which integrate and prioritize interaction data can greatly aid in approaching this goal. RESULTS: We developed a Bayesian inference framework which uses phylogenetic relationships to guide the integration of PPI evidence across multiple datasets and species, providing more accurate predictions. We apply our framework to reconcile seven eukaryotic interactomes: H. sapiens, M. musculus, R. norvegicus, D. melanogaster, C. elegans, S. cerevisiae and A. thaliana. Comprehensive GO-based quality assessment indicates a 5% to 44% score increase in predicted interactomes compared to the input data. Further support is provided by gold-standard MIPS, CYC2008 and HPRD datasets. We demonstrate the ability to recover known PPIs in well-characterized yeast and human complexes (26S proteasome, endosome and exosome) and suggest possible new partners interacting with the putative SWI/SNF chromatin remodeling complex in A. thaliana. CONCLUSION: Our phylogeny-guided approach compares favorably to two standard methods for mapping PPIs across species. Detailed analysis of predictions in selected functional modules uncovers specific PPI profiles among homologous proteins, establishing interaction-based partitioning of protein families. The evidence provided also suggests that interactions within core complex subunits are in general more conserved and easier to transfer accurately to other organisms than interactions between these subunits.


Subject(s)
Computational Biology/methods , Eukaryota/genetics , Phylogeny , Proteins/chemistry , Animals , Databases, Protein , Humans
17.
BMC Bioinformatics ; 10: 82, 2009 Mar 10.
Article in English | MEDLINE | ID: mdl-19284541

ABSTRACT

BACKGROUND: Finding functional regulatory elements in DNA sequences is a very important problem in computational biology, and providing a reliable algorithm for this task would be a major step towards understanding regulatory mechanisms on a genome-wide scale. Major obstacles in this respect are that the amount of non-coding DNA is vast, and that the methods for predicting functional transcription factor binding sites tend to produce results with a high percentage of false positives. This makes the problem of finding regions significantly enriched in binding sites difficult. RESULTS: We develop a novel method for predicting regulatory regions in DNA sequences, which is designed to exploit the evolutionary conservation of regulatory elements between species without assuming that the order of motifs is preserved across species. We have implemented our method and tested its predictive abilities on various datasets from different organisms. CONCLUSION: We show that our approach enables us to find a majority of the known CRMs using only sequence information from different species together with currently publicly available motif data. Also, our method is robust enough to perform well in predicting CRMs, despite differences in tissue specificity and even across species, provided that the evolutionary distances between compared species do not change substantially. The complexity of the proposed algorithm is polynomial, and the observed running times show that it may be readily applied.


Subject(s)
Algorithms , Computational Biology/methods , Evolution, Molecular , Regulatory Sequences, Nucleic Acid/genetics , Sequence Analysis, DNA/methods , Base Sequence , DNA/chemistry
19.
Bioinformatics ; 23(13): i149-58, 2007 Jul 01.
Article in English | MEDLINE | ID: mdl-17646291

ABSTRACT

MOTIVATION: The increasing availability of large-scale protein-protein interaction (PPI) data has fueled the efforts to elucidate the building blocks and organization of cellular machinery. Previous studies have shown cross-species comparison to be an effective approach in uncovering functional modules in protein networks. This has in turn driven the search for new network alignment methods with a more solid grounding in network evolution models and better scalability, to allow multiple network comparison. RESULTS: We develop a new framework for protein network alignment, based on reconstruction of an ancestral PPI network. The reconstruction algorithm is built upon a proposed model of protein network evolution, which takes into account the phylogenetic history of the proteins and the evolution of their interactions. The application of our methodology to the PPI networks of yeast, worm and fly reveals that the most probable conserved ancestral interactions are often related to known protein complexes. By projecting the conserved ancestral interactions back onto the input networks we are able to identify the corresponding conserved protein modules in the considered species. In contrast to most of the previous methods, our algorithm is able to compare many networks simultaneously. The performed experiments demonstrate the ability of our method to uncover many functional modules with high specificity. AVAILABILITY: Information for obtaining software and supplementary results is available at http://bioputer.mimuw.edu.pl/papers/cappi.


Subject(s)
Biological Evolution , Conserved Sequence/genetics , Evolution, Molecular , Models, Biological , Protein Interaction Mapping/methods , Proteome/chemistry , Proteome/genetics , Computer Simulation , Sequence Homology, Nucleic Acid , Signal Transduction/physiology
20.
Bioinformatics ; 23(2): e116-22, 2007 Jan 15.
Article in English | MEDLINE | ID: mdl-17237078

ABSTRACT

MOTIVATION: Inferring species phylogenies with a history of gene losses and duplications is a challenging and important task in computational biology. This problem can be solved by duplication-loss models in which the primary step is to reconcile a rooted gene tree with a rooted species tree. Most modern methods of phylogenetic reconstruction (from sequences) produce unrooted gene trees. This limitation leads to the problem of transforming an unrooted gene tree into a rooted tree, and then reconciling rooted trees. The main questions are 'What is the biological interpretation of choosing a rooting?', 'Can we efficiently find the optimal rootings?', 'Is the optimal rooting unique?'. RESULTS: In this paper we present a model for reconciling an unrooted gene tree with a rooted species tree, which is based on the concept of choosing the rooting that has minimal reconciliation cost. Our analysis leads to the surprising property that all the minimal rootings have identical distributions of gene duplications and gene losses in the species tree. It implies, in our opinion, that the concept of an optimal rooting is very robust, and thus biologically meaningful. Also, it has nice computational properties. We present a linear time and space algorithm for computing optimal rooting(s). This algorithm was used in two different ways to reconstruct the optimal species phylogeny of five known yeast genomes from approximately 4700 gene trees. Moreover, we determined locations (history) of all gene duplications and gene losses in the final species tree. It is interesting to note that the top five species trees are the same for both methods. AVAILABILITY: Software and documentation are freely available from http://bioputer.mimuw.edu.pl/~gorecki/urec


Subject(s)
Algorithms , Chromosome Mapping/methods , Evolution, Molecular , Genome, Fungal/genetics , Phylogeny , Proteome/genetics , Sequence Analysis, Protein/methods , Conserved Sequence , Sequence Alignment/methods