Results 1 - 7 of 7
1.
Bioinformatics ; 35(22): 4812-4814, 2019 Nov 01.
Article in English | MEDLINE | ID: mdl-31225867

ABSTRACT

SUMMARY: Statistical dependencies are present in a variety of sequence data, but are not discernible from traditional sequence logos. Here, we present the R package DepLogo for visualizing inter-position dependencies in aligned sequence data as dependency logos. Dependency logos make dependency structures, which correspond to regular co-occurrences of symbols at dependent positions, visually perceptible. To this end, sequences are partitioned based on their symbols at highly dependent positions as measured by mutual information, and each partition obtains its own visual representation. We illustrate the utility of the DepLogo package in several use cases generating dependency logos from DNA, RNA and protein sequences. AVAILABILITY AND IMPLEMENTATION: The DepLogo R package is available from CRAN and its source code is available at https://github.com/Jstacs/DepLogo. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Software , DNA , Position-Specific Scoring Matrices , Sequence Analysis, DNA
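The partitioning step described in the DepLogo abstract can be illustrated in a few lines. The following is a minimal Python sketch under our own assumptions, not the DepLogo R API: it scores each position by its summed pairwise mutual information with all other positions, picks the most dependent position, and splits the sequences by their symbol there. The function names are hypothetical, and the package's actual selection criteria, stopping rules, and plotting are not reproduced.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(seqs, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(seqs)
    joint = Counter((s[i], s[j]) for s in seqs)
    left = Counter(s[i] for s in seqs)
    right = Counter(s[j] for s in seqs)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        mi += p_ab * log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi


def most_dependent_position(seqs):
    """Position whose summed MI with all other positions is largest."""
    length = len(seqs[0])
    totals = [0.0] * length
    for i, j in combinations(range(length), 2):
        mi = mutual_information(seqs, i, j)
        totals[i] += mi
        totals[j] += mi
    # max() returns the first position among ties
    return max(range(length), key=lambda k: totals[k])


def partition(seqs, position):
    """Group sequences by their symbol at the given position."""
    groups = {}
    for s in seqs:
        groups.setdefault(s[position], []).append(s)
    return groups
```

For the toy alignment `["AGTC", "AGTC", "TGTG", "TGTG", "ACTC", "TCTG"]`, positions 0 and 3 always co-vary (A with C, T with G), so their mutual information is 1 bit and the sequences split into an A-group and a T-group. Applying the same step recursively to each group would give nested partitions, each of which could be drawn as its own logo.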
2.
Bioinformatics ; 33(11): 1639-1646, 2017 Jun 01.
Article in English | MEDLINE | ID: mdl-28130227

ABSTRACT

MOTIVATION: The computational investigation of DNA binding motifs from binding sites is one of the classic tasks in bioinformatics and a prerequisite for understanding gene regulation as a whole. Due to the development of sequencing technologies and the increasing number of available genomes, approaches based on phylogenetic footprinting are becoming increasingly attractive. Phylogenetic footprinting requires phylogenetic trees with attached substitution probabilities for quantifying the evolution of binding sites, but these trees and substitution probabilities are typically not known and cannot be estimated easily. RESULTS: Here, we investigate the influence of phylogenetic trees with different substitution probabilities on the classification performance of phylogenetic footprinting using synthetic and real data. For synthetic data, we find that the classification performance is highest when the substitution probability used for phylogenetic footprinting is similar to that used for data generation. For real data, however, we typically find that the classification performance of phylogenetic footprinting surprisingly increases with increasing substitution probabilities and is often highest for unrealistically high substitution probabilities close to one. This finding suggests that choosing realistic model assumptions might not always yield optimal predictions and that choosing unrealistically high substitution probabilities close to one might actually improve the classification performance of phylogenetic footprinting. AVAILABILITY AND IMPLEMENTATION: The proposed phylogenetic footprinting (PF) approach is implemented in Java and can be downloaded from https://github.com/mgledi/PhyFoo. CONTACT: martin.nettling@informatik.uni-halle.de. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology/methods , DNA-Binding Proteins/genetics , Gene Regulatory Networks , Phylogeny , Sequence Analysis, DNA/methods , Software , Animals , Binding Sites/genetics , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/metabolism , Humans , Sequence Analysis, Protein
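To make the role of a substitution probability concrete, the sketch below computes the likelihood of one aligned motif column under a deliberately simple model that we assume for illustration: a star tree whose root symbol is drawn from the motif distribution `theta`, with every species evolving along its own branch under a Felsenstein-81-style substitution probability `s` (keep the root symbol with probability 1 - s, otherwise redraw it from `background`). The tree shape, the single shared `s`, and all names are assumptions; PhyFoo's actual trees and parameterization may differ.

```python
from math import prod

ALPHABET = "ACGT"


def substitution_prob(a, b, s, background):
    """F81-style branch: keep symbol a with prob 1 - s, else redraw from background."""
    return (1.0 - s) * (a == b) + s * background[b]


def column_likelihood(column, theta, s, background):
    """Likelihood of one aligned motif column (one symbol per species) on a star tree.

    The root symbol is drawn from the motif distribution `theta`; each species'
    observed symbol descends from the root independently with substitution prob `s`.
    """
    return sum(
        theta[a] * prod(substitution_prob(a, x, s, background) for x in column)
        for a in ALPHABET
    )
```

In this toy model, `s = 0` forces every species to carry the root symbol, so an all-A column has likelihood `theta["A"]`; at `s = 1` every observed symbol is redrawn from `background` regardless of the root.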
3.
BMC Bioinformatics ; 18(1): 141, 2017 Mar 01.
Article in English | MEDLINE | ID: mdl-28249564

ABSTRACT

BACKGROUND: Transcriptional gene regulation is a fundamental process in nature, and the experimental and computational investigation of DNA binding motifs and their binding sites is a prerequisite for elucidating this process. Approaches for de novo motif discovery can be subdivided into phylogenetic footprinting, which takes into account phylogenetic dependencies in aligned sequences of more than one species, and non-phylogenetic approaches based on sequences from only one species, which typically take into account intra-motif dependencies. It has been shown that modeling either (i) phylogenetic dependencies or (ii) intra-motif dependencies separately improves de novo motif discovery, but there is no approach capable of modeling both (i) and (ii) simultaneously. RESULTS: Here, we present an approach for de novo motif discovery that combines phylogenetic footprinting with motif models capable of taking into account intra-motif dependencies. We study the degree of intra-motif dependencies inferred by this approach from ChIP-seq data of 35 transcription factors. We find that significant intra-motif dependencies of orders 1 and 2 are present in all 35 datasets and that intra-motif dependencies of order 2 are typically stronger than those of order 1. We also find that the presented approach improves the classification performance of phylogenetic footprinting in all 35 datasets and that incorporating intra-motif dependencies of order 2 yields a higher classification performance than incorporating such dependencies of only order 1. CONCLUSION: Combining phylogenetic footprinting with motif models incorporating intra-motif dependencies leads to an improved performance in the classification of transcription factor binding sites. This may advance our understanding of transcriptional gene regulation and its evolution.


Subjects
Models, Molecular , Transcription Factors/classification , Algorithms , Amino Acid Motifs , Binding Sites/genetics , Chromatin/metabolism , DNA/chemistry , DNA/metabolism , Humans , Phylogeny , Protein Binding , Protein Domains , Sequence Analysis, DNA , Transcription Factors/genetics , Transcription Factors/metabolism
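Intra-motif dependencies of order k are commonly captured by an inhomogeneous Markov model, in which the symbol at each motif position is conditioned on the k preceding symbols; order 0 reduces to a position weight matrix. The sketch below is an illustration under our own assumptions (pseudocount smoothing, contexts truncated at the motif start, hypothetical names), not the authors' implementation, which additionally embeds such models in a phylogenetic footprinting framework.

```python
from collections import Counter
from dataclasses import dataclass
from math import log

ALPHABET = "ACGT"


@dataclass
class InhomogeneousMarkovModel:
    order: int
    pseudocount: float
    counts: list  # per position: Counter over (context, symbol)
    totals: list  # per position: Counter over context


def train_imm(sites, order, pseudocount=1.0):
    """Count symbols per position, conditioned on the `order` preceding symbols."""
    counts, totals = [], []
    for pos in range(len(sites[0])):
        start = max(0, pos - order)  # contexts are shorter at the motif start
        counts.append(Counter((s[start:pos], s[pos]) for s in sites))
        totals.append(Counter(s[start:pos] for s in sites))
    return InhomogeneousMarkovModel(order, pseudocount, counts, totals)


def log_likelihood(seq, model):
    """Natural-log likelihood of `seq` using pseudocount-smoothed conditionals."""
    ll = 0.0
    for pos, symbol in enumerate(seq):
        ctx = seq[max(0, pos - model.order):pos]
        numerator = model.counts[pos][(ctx, symbol)] + model.pseudocount
        denominator = model.totals[pos][ctx] + model.pseudocount * len(ALPHABET)
        ll += log(numerator / denominator)
    return ll
```

With toy sites `["AC", "AC", "GT", "GT"]`, an order-0 model assigns "AC" and "AT" the same likelihood, whereas an order-1 model prefers "AC" because it has learned that C follows A.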
4.
BMC Genomics ; 17: 347, 2016 May 10.
Article in English | MEDLINE | ID: mdl-27165633

ABSTRACT

BACKGROUND: Transcriptional gene regulation is a fundamental process in nature, and the experimental and computational investigation of DNA binding motifs and their binding sites is a prerequisite for elucidating this process. ChIP-seq has become the major technology for uncovering genomic regions containing those binding sites, but motifs predicted by traditional computational approaches from these data are distorted by a ubiquitous binding-affinity bias. Here, we present an approach for detecting and correcting this bias using inter-species information. RESULTS: We find that the binding-affinity bias caused by the ChIP-seq experiment in the reference species is stronger than the indirect binding-affinity bias in orthologous regions from phylogenetically related species. We use this difference to develop a phylogenetic footprinting model that is capable of detecting and correcting the binding-affinity bias. We find that this model improves motif prediction and that the corrected motifs are typically softer than those predicted by traditional approaches. CONCLUSIONS: These findings indicate that motifs published in databases and in the literature are artificially sharpened compared to the native motifs. These findings also indicate that our current understanding of transcriptional gene regulation might be blurred, but that it is possible to advance this understanding by taking into account the inter-species information available today, and even more so in the future.


Subjects
Binding Sites , Chromatin Immunoprecipitation , High-Throughput Nucleotide Sequencing , Nucleotide Motifs , Transcription Factors , Computational Biology/methods , Gene Expression Regulation , Humans , Models, Genetic , Reproducibility of Results , Transcription Factors/metabolism
5.
BMC Bioinformatics ; 16: 387, 2015 Nov 17.
Article in English | MEDLINE | ID: mdl-26577052

ABSTRACT

BACKGROUND: For three decades, sequence logos have been the de facto standard for the visualization of sequence motifs in biology and bioinformatics. Reasons for this success story are their simplicity and clarity. The number of inferred and published motifs grows with the number of data sets and motif extraction algorithms. Hence, it becomes more and more important to perceive differences between motifs. However, motif differences are hard to detect from individual sequence logos in the case of multiple motifs for one transcription factor, highly similar binding motifs of different transcription factors, or multiple motifs for one protein domain. RESULTS: Here, we present DiffLogo, a freely available, extensible, and user-friendly R package for visualizing motif differences. DiffLogo is capable of showing differences between DNA motifs as well as protein motifs in a pairwise manner, resulting in publication-ready figures. In the case of more than two motifs, DiffLogo is capable of visualizing pairwise differences in a tabular form. Here, the motifs are ordered by similarity, and the difference logos are colored for clarity. We demonstrate the benefit of DiffLogo on CTCF motifs from different human cell lines and on E-box motifs of three basic helix-loop-helix transcription factors as examples of comparing DNA motifs, and on F-box domains from three different families as an example of comparing protein motifs. CONCLUSIONS: DiffLogo provides an intuitive visualization of motif differences. It enables the illustration and investigation of differences between highly similar motifs such as binding patterns of transcription factors for different cell types, treatments, and algorithmic approaches.


Subjects
Algorithms , Amino Acid Motifs/genetics , Basic Helix-Loop-Helix Transcription Factors/genetics , Computer Graphics , Nucleotide Motifs/genetics , Sequence Analysis, DNA/methods , Software , CCCTC-Binding Factor , Computational Biology/methods , Humans , Protein Structure, Tertiary , Repressor Proteins/genetics , Tumor Cells, Cultured
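A difference logo needs, for each motif position, a stack height and signed symbol heights. The sketch below is loosely modeled on that idea and is not the DiffLogo R API: it uses the Jensen-Shannon divergence between the two position-specific distributions as stack height and probability differences normalized by their total absolute value as signed symbol shares. These are our assumed choices and the function names are hypothetical; the DiffLogo documentation lists the measures the package actually provides and its defaults.

```python
from math import log2

ALPHABET = "ACGT"


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (bits) between two distributions over ALPHABET."""
    m = {a: 0.5 * (p[a] + q[a]) for a in ALPHABET}

    def kl(x, y):
        # terms with x[a] == 0 contribute nothing; y[a] > 0 wherever x[a] > 0
        return sum(x[a] * log2(x[a] / y[a]) for a in ALPHABET if x[a] > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def difference_column(p, q):
    """Stack height and signed per-symbol shares for one motif position.

    Positive shares (enriched in p) would be drawn upward, negative ones downward.
    """
    height = jensen_shannon(p, q)
    diffs = {a: p[a] - q[a] for a in ALPHABET}
    total = sum(abs(d) for d in diffs.values())
    shares = {a: (d / total if total > 0 else 0.0) for a, d in diffs.items()}
    return height, shares


def difference_logo(pwm1, pwm2):
    """Column-wise differences between two equally long PWMs (lists of dicts)."""
    return [difference_column(p, q) for p, q in zip(pwm1, pwm2)]
```

Two columns that put all mass on A and on C, respectively, yield a stack height of 1 bit, with A and C each taking half of the stack in opposite directions; identical columns yield a height of 0.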
6.
BMC Bioinformatics ; 15: 38, 2014 Feb 04.
Article in English | MEDLINE | ID: mdl-24495746

ABSTRACT

BACKGROUND: New technologies for analyzing biological samples, like next-generation sequencing, are producing a growing amount of data together with quality scores. Moreover, software tools, e.g., for mapping sequence reads, calculating transcription factor binding probabilities, estimating regions enriched in epigenetic modifications, or determining single nucleotide polymorphisms, increase this amount of position-specific DNA-related data even further. Hence, querying data becomes challenging and expensive and is often implemented using specialized hardware. In addition, retrieving specific data as fast as possible becomes increasingly important in many fields of science. The general problem of handling big data sets was addressed by developing specialized databases like HBase, HyperTable or Cassandra. However, these database solutions also require specialized or distributed hardware, leading to expensive investments. To the best of our knowledge, there is no database capable of (i) storing billions of position-specific DNA-related records, (ii) performing fast and resource-saving requests, and (iii) running on a single standard computer. RESULTS: Here, we present DRUMS (Disk Repository with Update Management and Select option), satisfying demands (i)-(iii). It tackles the weaknesses of traditional databases while handling position-specific DNA-related data in an efficient manner. DRUMS is capable of storing up to billions of records. Moreover, it focuses on optimizing single lookups as well as range requests, which are needed constantly for computations in bioinformatics. To validate the power of DRUMS, we compare it to the widely used MySQL database. The test setting considers two biological data sets. We use standard desktop hardware as the test environment. CONCLUSIONS: DRUMS outperforms MySQL in writing and reading records by a factor of two up to a factor of 10000. Furthermore, it can work with significantly larger data sets. Our work focuses on mid-sized data sets of up to several billion records without requiring cluster technology. Storing position-specific data is a general problem, and the concept we present here is a generalized approach. Hence, it can be easily applied to other fields of bioinformatics.


Subjects
Database Management Systems , Databases, Genetic , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Software , Endogenous Retroviruses/genetics , Genome, Human/genetics , Humans , Information Storage and Retrieval , Polymorphism, Single Nucleotide
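Single lookups and range requests on position-keyed records are cheap when the records are kept sorted by (chromosome, position), because both reduce to binary search. The toy in-memory class below illustrates only that access pattern; it is not DRUMS's on-disk layout or API, and the class and method names are hypothetical.

```python
from bisect import bisect_left, bisect_right


class SortedPositionStore:
    """Toy in-memory store of records keyed by (chromosome, position), kept sorted."""

    def __init__(self):
        self._keys = []    # sorted list of (chrom, pos) tuples
        self._values = []  # values aligned index-by-index with self._keys

    def _find(self, key):
        """Index of `key` if present, else None (binary search)."""
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return i
        return None

    def put(self, chrom, pos, value):
        """Insert a record, or overwrite the value if the key already exists."""
        key = (chrom, pos)
        i = self._find(key)
        if i is not None:
            self._values[i] = value
            return
        i = bisect_left(self._keys, key)
        self._keys.insert(i, key)
        self._values.insert(i, value)

    def get(self, chrom, pos):
        """Single lookup: the value stored at (chrom, pos), or None."""
        i = self._find((chrom, pos))
        return None if i is None else self._values[i]

    def range_query(self, chrom, start, end):
        """Range request: all (pos, value) on `chrom` with start <= pos <= end."""
        lo = bisect_left(self._keys, (chrom, start))
        hi = bisect_right(self._keys, (chrom, end))
        return [(self._keys[i][1], self._values[i]) for i in range(lo, hi)]
```

Each `list.insert` is O(n), so a store handling billions of records would batch and buffer writes rather than insert one by one; the sketch favors readability over write throughput.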
7.
Front Microbiol ; 9: 2384, 2018.
Article in English | MEDLINE | ID: mdl-30455669

ABSTRACT

More than eight percent of the human genome consists of human endogenous retroviruses (HERVs). Typically, the expression of HERVs is repressed, but varying activities of HERVs have been observed in diseases ranging from cancer to neurodegeneration. Such activities can include the transcription of HERV-derived open reading frames, which can be translated into proteins. However, as a consequence of mutations that disrupt open reading frames, most HERV-like sequences have lost their protein-coding capacity. Nevertheless, these loci can still influence the expression of adjacent genes and, hence, mediate biological effects. Here, we present WebHERV (http://calypso.informatik.uni-halle.de/WebHERV/), a web server that enables the computational prediction of active HERV-like sequences in the human genome based on a comparison of genome coordinates of expressed sequences uploaded by the user and genome coordinates of HERV-like sequences stored in the specialized key-value store DRUMS. Using WebHERV, we predicted putative candidates of active HERV-like sequences in Hodgkin lymphoma (HL) cell lines, validated one of them using a modified SMART (switching mechanism at 5' end of RNA template) technique, and identified a new alternative transcription start site for cytochrome P450, family 4, subfamily Z, polypeptide 1 (CYP4Z1).
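The comparison of genome coordinates that WebHERV performs can be sketched as an interval-overlap query. This minimal version assumes inclusive coordinates and treats any shared base as an overlap; WebHERV's actual matching criteria are not stated in the abstract, it reads HERV coordinates from DRUMS rather than from an in-memory list, and the function name and example loci are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def overlaps(expressed, hervs):
    """Report (expressed_name, herv_name) pairs that share at least one base.

    Both inputs are lists of (chrom, start, end, name) with inclusive coordinates.
    """
    by_chrom = defaultdict(list)
    for h in hervs:
        by_chrom[h[0]].append(h)
    starts = {}
    for chrom, items in by_chrom.items():
        items.sort(key=lambda h: h[1])  # sort HERVs by start per chromosome
        starts[chrom] = [h[1] for h in items]
    hits = []
    for chrom, qs, qe, qname in expressed:
        items = by_chrom.get(chrom, [])
        # only HERVs starting at or before the query end can overlap it
        k = bisect_right(starts.get(chrom, []), qe)
        for _, hs, he, hname in items[:k]:
            if he >= qs:  # and they must also end at or after the query start
                hits.append((qname, hname))
    return hits
```

For example, an expressed region `("chr1", 880, 1000, "tx3")` overlaps a hypothetical locus `("chr1", 500, 900, "HERV-H_b")` because the two share positions 880 to 900. The candidate scan is linear in the worst case; an interval tree would bound it more tightly on large inputs.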
