Results 1 - 20 of 29
1.
PLoS One ; 17(9): e0274338, 2022.
Article in English | MEDLINE | ID: mdl-36084008

ABSTRACT

Gene expression is regulated through cis-regulatory elements (CREs), among which are promoters, enhancers, Polycomb/Trithorax Response Elements (PREs), silencers and insulators. Computational prediction of CREs can be achieved using a variety of statistical and machine learning methods combined with different feature space formulations. Although Python packages for DNA sequence feature sets and for machine learning are available, no existing package facilitates the combination of DNA sequence feature sets with machine learning methods for the genome-wide prediction of candidate CREs. We here present Gnocis, a Python package that streamlines the analysis and the modelling of CRE sequences by providing extensible APIs and implementing the glue required for combining feature sets and models for genome-wide prediction. Gnocis implements a variety of base feature sets, including motif pair occurrence frequencies and the k-spectrum mismatch kernel. It integrates with Scikit-learn and TensorFlow for state-of-the-art machine learning. Gnocis additionally implements a broad suite of tools for the handling and preparation of sequence, region and curve data, which can be useful for general DNA bioinformatics in Python. We also present Deep-MOCCA, a neural network architecture inspired by SVM-MOCCA that achieves moderate to high generalization without prior motif knowledge. To demonstrate the use of Gnocis, we applied multiple machine learning methods to the modelling of D. melanogaster PREs, including a Convolutional Neural Network (CNN), making this the first study to model PREs with CNNs. The models are readily adapted to new CRE modelling problems and to other organisms. In order to produce a high-performance, compiled package for Python 3, we implemented Gnocis in Cython. Gnocis can be installed using the PyPI package manager by running 'pip install gnocis'. The source code is available on GitHub, at https://github.com/bjornbredesen/gnocis.
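
As an illustration of the kind of sequence feature set mentioned in the abstract above, the following is a minimal sketch of an exact-match k-spectrum (k-mer count) feature vector and the corresponding kernel. It is not the Gnocis API and it omits the mismatch extension; the function names `k_spectrum` and `spectrum_kernel` are hypothetical and chosen only for this example.

```python
from collections import Counter
from itertools import product


def k_spectrum(sequence: str, k: int) -> list[int]:
    """Return k-mer counts of `sequence` in a fixed lexicographic order over ACGT."""
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # A fixed feature order keeps vectors from different sequences comparable.
    return [counts["".join(kmer)] for kmer in product("ACGT", repeat=k)]


def spectrum_kernel(a: str, b: str, k: int) -> int:
    """Dot product of two k-spectrum vectors (exact matches only, no mismatches)."""
    return sum(x * y for x, y in zip(k_spectrum(a, k), k_spectrum(b, k)))
```

A vector of this form has 4^k entries, so it can be passed to any general-purpose classifier that accepts fixed-length numeric features.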


Subjects
Drosophila melanogaster , Software , Algorithms , Animals , DNA/genetics , Drosophila melanogaster/genetics , Neural Networks, Computer , Response Elements
3.
BMC Bioinformatics ; 23(1): 39, 2022 Jan 14.
Article in English | MEDLINE | ID: mdl-35030988

ABSTRACT

BACKGROUND: Total-RNA sequencing (total-RNA-seq) allows the simultaneous study of both the coding and the non-coding transcriptome. Yet, computational pipelines have traditionally focused on particular biotypes, making assumptions that are not fulfilled by total-RNA-seq datasets. Transcripts from distinct RNA biotypes vary in length, biogenesis, and function, can overlap in a genomic region, and may be present in the genome with a high copy number. Consequently, reads from total-RNA-seq libraries may cause ambiguous genomic alignments, demanding flexible quantification approaches. RESULTS: Here we present Multi-Graph count (MGcount), a total-RNA-seq quantification tool combining two strategies for handling ambiguous alignments. First, MGcount assigns reads hierarchically to small-RNA and long-RNA features to account for length disparity when transcripts overlap in the same genomic position. Next, MGcount aggregates RNA products with similar sequences where reads systematically multi-map using a graph-based approach. MGcount outputs a transcriptomic count matrix compatible with RNA-sequencing downstream analysis pipelines, with both bulk and single-cell resolution, and the graphs that model repeated transcript structures for different biotypes. The software can be used as a Python module or as a single-file executable program. CONCLUSIONS: MGcount is a flexible total-RNA-seq quantification tool that successfully integrates reads that align to multiple genomic locations or that overlap with multiple gene features. Its approach is suitable for the simultaneous estimation of protein-coding, long non-coding and small non-coding transcript concentration, in both precursor and processed forms. Both source code and compiled software are available at https://github.com/hitaandrea/MGcount .
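
The graph-based aggregation described in the abstract above can be pictured with a small sketch. The abstract does not specify how MGcount partitions its graphs, so connected components are used here purely as a simple stand-in; the function name `multimap_groups` and the read and feature identifiers are hypothetical.

```python
from collections import defaultdict


def multimap_groups(read_hits: dict[str, set[str]]) -> list[set[str]]:
    """Group features that share multi-mapping reads (connected components)."""
    # Build an undirected graph: features are nodes, and two features are
    # linked whenever a single read aligns to both of them.
    neighbours: dict[str, set[str]] = defaultdict(set)
    for features in read_hits.values():
        for feature in features:
            neighbours[feature] |= features - {feature}

    groups, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        groups.append(component)
    return groups
```

Counting reads per group rather than per feature is one way in which systematically multi-mapping reads could be kept instead of discarded.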


Subjects
Software , Transcriptome , RNA , RNA-Seq , Sequence Analysis, RNA
4.
BMC Bioinformatics ; 22(1): 234, 2021 May 07.
Article in English | MEDLINE | ID: mdl-33962556

ABSTRACT

BACKGROUND: Cis-regulatory elements (CREs) are DNA sequence segments that regulate gene expression. Among CREs are promoters, enhancers, Boundary Elements (BEs) and Polycomb Response Elements (PREs), all of which are enriched in specific sequence motifs that form particular occurrence landscapes. We have recently introduced a hierarchical machine learning approach (SVM-MOCCA) in which Support Vector Machines (SVMs) are applied on the level of individual motif occurrences, modelling local sequence composition, and then combined for the prediction of whole regulatory elements. We used SVM-MOCCA to predict PREs in Drosophila and found that it was superior to other methods. However, we did not publish a polished implementation of SVM-MOCCA, which can be useful for other researchers, and we only tested SVM-MOCCA with IUPAC motifs and PREs. RESULTS: We here present an expanded suite for modelling CRE sequences in terms of motif occurrence combinatorics: Motif Occurrence Combinatorics Classification Algorithms (MOCCA). MOCCA contains efficient implementations of several modelling methods, including SVM-MOCCA, and a new method, RF-MOCCA, a Random Forest-derivative of SVM-MOCCA. We used SVM-MOCCA and RF-MOCCA to model Drosophila PREs and BEs in cross-validation experiments, making this the first study to model PREs with Random Forests and the first study that applies the hierarchical MOCCA approach to the prediction of BEs. Both models significantly improve generalization to PREs and boundary elements beyond that of previous methods (including 4-spectrum and motif occurrence frequency Support Vector Machines and Random Forests), with RF-MOCCA yielding the best results. CONCLUSION: MOCCA is a flexible and powerful suite of tools for the motif-based modelling of CRE sequences in terms of motif composition. MOCCA can be applied to any new CRE modelling problems where motifs have been identified. MOCCA supports IUPAC and Position Weight Matrix (PWM) motifs. For ease of use, MOCCA implements generation of negative training data, and additionally a mode that requires only that the user specifies positives, motifs and a genome. MOCCA is licensed under the MIT license and is available on GitHub at https://github.com/bjornbredesen/MOCCA .
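
Motif-based models of this kind start from a list of motif occurrences. The sketch below shows one common way to locate occurrences of an IUPAC motif, by expanding the standard IUPAC nucleotide codes into a regular expression. It is not taken from the MOCCA source; the function name `motif_occurrences` is hypothetical, and only the forward strand is scanned.

```python
import re

# Standard IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_occurrences(motif: str, sequence: str) -> list[int]:
    """Return 0-based start positions of all (overlapping) forward-strand matches."""
    pattern = "".join(IUPAC[base] for base in motif.upper())
    # The lookahead lets finditer report overlapping occurrences as well.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", sequence.upper())]
```

A full scan would typically also search the reverse complement; that step is left out to keep the example short.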


Subjects
Algorithms , Support Vector Machine , Base Sequence , Nucleotide Motifs/genetics , Position-Specific Scoring Matrices
5.
Nucleic Acids Res ; 47(15): 7781-7797, 2019 09 05.
Article in English | MEDLINE | ID: mdl-31340029

ABSTRACT

Polycomb Response Elements (PREs) are cis-regulatory DNA elements that maintain gene transcription states through DNA replication and mitosis. PREs have little sequence similarity, but are enriched in a number of sequence motifs. Previous methods for modelling Drosophila melanogaster PRE sequences (PREdictor and EpiPredictor) have used a set of 7 motifs and a training set of 12 PREs and 16-23 non-PREs. Advances in experimental methods for mapping chromatin binding factors and modifications have led to the publication of several genome-wide sets of Polycomb targets. In addition to the seven motifs previously used, PREs are enriched in the GTGT motif, recently associated with the sequence-specific DNA binding protein Combgap. We investigated whether models trained on genome-wide Polycomb sites generalize to independent PREs when trained with control sequences generated by naive PRE models and including the GTGT motif. We also developed a new PRE predictor: SVM-MOCCA. Training PRE predictors with genome-wide experimental data improves generalization to independent data, and SVM-MOCCA predicts the majority of PREs in three independent experimental sets. We present 2908 candidate PREs enriched in sequence and chromatin signatures. 2412 of these are also enriched in H3K4me1, a mark of Trithorax activated chromatin, suggesting that PREs/TREs have a common sequence code.


Subjects
Algorithms , DNA/genetics , Drosophila melanogaster/genetics , Genome, Insect , Polycomb-Group Proteins/genetics , Response Elements , Animals , Binding Sites , Chromatin/chemistry , Chromatin/metabolism , Chromosomal Proteins, Non-Histone/genetics , Chromosomal Proteins, Non-Histone/metabolism , DNA/chemistry , DNA/metabolism , Drosophila Proteins/genetics , Drosophila Proteins/metabolism , Drosophila melanogaster/metabolism , Embryo, Nonmammalian , Gene Ontology , Histones/genetics , Histones/metabolism , Larva/genetics , Larva/metabolism , Molecular Sequence Annotation , Nucleotide Motifs , Polycomb-Group Proteins/metabolism , Protein Binding , Software , Transcription Factors/genetics , Transcription Factors/metabolism
6.
Bioinformatics ; 33(1): 145-147, 2017 01 01.
Article in English | MEDLINE | ID: mdl-27591081

ABSTRACT

The precision-recall plot is more informative than the ROC plot when evaluating classifiers on imbalanced datasets, but fast and accurate curve calculation tools for precision-recall plots are currently not available. We have developed Precrec, an R library that aims to overcome this limitation of the plot. Our tool provides fast and accurate precision-recall calculations together with multiple functionalities that work efficiently under different conditions. AVAILABILITY AND IMPLEMENTATION: Precrec is licensed under GPL-3 and freely available from CRAN (https://cran.r-project.org/package=precrec). It is implemented in R with C++. CONTACT: takaya.saito@ii.uib.no. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology/methods , ROC Curve , Software
7.
F1000Res ; 5: 1531, 2016.
Article in English | MEDLINE | ID: mdl-27540470

ABSTRACT

Identifying functional modules or novel active pathways, recently termed de novo pathway enrichment, is a computational systems biology challenge that has gained much attention during the last decade. Given a large biological interaction network, KeyPathwayMiner extracts connected subnetworks that are enriched for differentially active entities from a series of molecular profiles encoded as binary indicator matrices. Since interaction networks constantly evolve, an important question is how robust the extracted results are when the network is modified. We enable users to study this effect through several network perturbation techniques and over a range of perturbation degrees. In addition, users may now provide a gold-standard set to determine how enriched extracted pathways are with relevant genes compared to randomized versions of the original network.

8.
Nucleic Acids Res ; 44(14): 6639-48, 2016 08 19.
Article in English | MEDLINE | ID: mdl-27330136

ABSTRACT

High-throughput screening (HTS) is an indispensable tool for drug (target) discovery that currently lacks user-friendly software tools for the robust identification of putative hits from HTS experiments and for the interpretation of these findings in the context of systems biology. We developed HiTSeekR as a one-stop solution for chemical compound screens, siRNA knock-down and CRISPR/Cas9 knock-out screens, as well as microRNA inhibitor and -mimics screens. We chose three use cases that demonstrate the potential of HiTSeekR to fully exploit HTS screening data in quite heterogeneous contexts to generate novel hypotheses for follow-up experiments: (i) a genome-wide RNAi screen to uncover modulators of TNFα, (ii) a combined siRNA and miRNA mimics screen on vorinostat resistance and (iii) a small compound screen on KRAS synthetic lethality. HiTSeekR is publicly available at http://hitseekr.compbio.sdu.dk . It is the first approach to close the gap between raw data processing, network enrichment and wet lab target generation for various HTS screen types.


Subjects
Drug Evaluation, Preclinical , High-Throughput Screening Assays/methods , Caspases/metabolism , Drug Delivery Systems , Humans , MicroRNAs/genetics , MicroRNAs/metabolism , Quality Control , RNA Interference , Robotics , Signal Transduction , Tumor Necrosis Factor-alpha/metabolism
9.
PLoS One ; 10(3): e0118432, 2015.
Article in English | MEDLINE | ID: mdl-25738806

ABSTRACT

Binary classifiers are routinely evaluated with performance measures such as sensitivity and specificity, and performance is frequently illustrated with Receiver Operating Characteristics (ROC) plots. Alternative measures such as positive predictive value (PPV) and the associated Precision/Recall (PRC) plots are used less frequently. Many bioinformatics studies develop and evaluate classifiers that are to be applied to strongly imbalanced datasets in which the number of negatives outweighs the number of positives significantly. While ROC plots are visually appealing and provide an overview of a classifier's performance across a wide range of specificities, one can ask whether ROC plots could be misleading when applied in imbalanced classification scenarios. We show here that the visual interpretability of ROC plots in the context of imbalanced datasets can be deceptive with respect to conclusions about the reliability of classification performance, owing to an intuitive but wrong interpretation of specificity. PRC plots, on the other hand, can provide the viewer with an accurate prediction of future classification performance due to the fact that they evaluate the fraction of true positives among positive predictions. Our findings have potential implications for the interpretation of a large number of studies that use ROC plots on imbalanced datasets.
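
The effect described in the abstract above can be reproduced with simple arithmetic. The sketch below uses made-up confusion-matrix counts (not data from the paper) to show that a classifier with identical recall and false-positive rate, and therefore the same point in ROC space, can have very different precision once negatives greatly outnumber positives. The function name `rates` is hypothetical.

```python
def rates(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    """Compute the quantities plotted in ROC and precision-recall space."""
    return {
        "recall": tp / (tp + fn),      # sensitivity: y-axis of ROC, x-axis of PRC
        "fpr": fp / (fp + tn),         # 1 - specificity: x-axis of ROC
        "precision": tp / (tp + fp),   # positive predictive value: y-axis of PRC
    }


# Same classifier behaviour (80% recall, 5% false-positive rate) on two datasets.
balanced = rates(tp=800, fn=200, fp=50, tn=950)        # 1,000 positives, 1,000 negatives
imbalanced = rates(tp=800, fn=200, fp=5000, tn=95000)  # 1,000 positives, 100,000 negatives
```

With these illustrative counts the ROC point is unchanged, while precision falls from roughly 0.94 in the balanced case to roughly 0.14 in the imbalanced one.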


Subjects
Datasets as Topic , ROC Curve , Classification/methods
10.
Genetics ; 193(4): 1083-94, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23335332

ABSTRACT

Mathematical models of meiosis that relate offspring to parental genotypes through parameters such as meiotic recombination frequency have been difficult to develop for polyploids. Existing models have limitations with respect to their analytic potential, their compatibility with insights into mechanistic aspects of meiosis, and their treatment of model parameters in terms of parameter dependencies. In this article I put forward a computational approach to the probabilistic modeling of meiosis. A computer program enumerates all possible paths through the phases of replication, pairing, recombination, and segregation, while keeping track of the probabilities of the paths according to the various parameters involved. Probabilities for classes of genotypes or phenotypes are added, and the resulting formulas are simplified by the symbolic-computation system Mathematica. An example application to autotetraploids results in a model that remedies the limitations of previous models mentioned above. In addition to the immediate implications, the computational approach presented here can be expected to be useful through opening avenues for modeling a host of processes, including meiosis in higher-order ploidies.
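
The path-enumeration idea in the abstract above can be illustrated on a much simpler case than the one treated in the article. The sketch below assumes a diploid parent with genotype AB/ab at two linked loci and a numeric recombination frequency `r`; it enumerates the recombination and segregation steps and adds up path probabilities per gamete. It is not the author's program, which handles polyploids and simplifies symbolic formulas with Mathematica; the function name `gamete_probabilities` is hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def gamete_probabilities(r: Fraction) -> dict[str, Fraction]:
    """Enumerate recombination and segregation paths for a diploid AB/ab parent."""
    half = Fraction(1, 2)
    probabilities: dict[str, Fraction] = defaultdict(Fraction)
    # Step 1: the transmitted haplotypes are recombinant (r) or parental (1 - r).
    for haplotypes, p_step in ((("AB", "ab"), 1 - r), (("Ab", "aB"), r)):
        # Step 2: segregation passes on either haplotype with probability 1/2.
        for gamete in haplotypes:
            probabilities[gamete] += p_step * half
    return dict(probabilities)
```

For example, `gamete_probabilities(Fraction(1, 10))` assigns 9/20 to each parental gamete and 1/20 to each recombinant gamete.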


Subjects
Meiosis/genetics , Models, Genetic , Polyploidy , Plants/genetics
11.
BMC Biol ; 10: 32, 2012 Apr 18.
Article in English | MEDLINE | ID: mdl-22513177

ABSTRACT

This article is a response to Wang and Luo. See correspondence article http://www.biomedcentral.com/1741-7007/10/30 and the original research article http://www.biomedcentral.com/1741-7007/9/24.


Subjects
Arabidopsis/genetics , Chromosome Pairing , Chromosomes, Plant/genetics , Recombination, Genetic , Tetraploidy
12.
PLoS Genet ; 7(6): e1002126, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21698132

ABSTRACT

Genomic imprinting is an epigenetic phenomenon leading to parent-of-origin specific differential expression of maternally and paternally inherited alleles. In plants, genomic imprinting has mainly been observed in the endosperm, an ephemeral triploid tissue derived after fertilization of the diploid central cell with a haploid sperm cell. In an effort to identify novel imprinted genes in Arabidopsis thaliana, we generated deep sequencing RNA profiles of F1 hybrid seeds derived after reciprocal crosses of Arabidopsis Col-0 and Bur-0 accessions. Using polymorphic sites to quantify allele-specific expression levels, we could identify more than 60 genes with potential parent-of-origin specific expression. By analyzing the distribution of DNA methylation and epigenetic marks established by Polycomb group (PcG) proteins using publicly available datasets, we suggest that for maternally expressed genes (MEGs) repression of the paternally inherited alleles largely depends on DNA methylation or PcG-mediated repression, whereas repression of the maternal alleles of paternally expressed genes (PEGs) predominantly depends on PcG proteins. While maternal alleles of MEGs are also targeted by PcG proteins, such targeting does not cause complete repression. Candidate MEGs and PEGs are enriched for cis-proximal transposons, suggesting that transposons might be a driving force for the evolution of imprinted genes in Arabidopsis. In addition, we find that MEGs and PEGs are significantly faster evolving when compared to other genes in the genome. In contrast to the predominant location of mammalian imprinted genes in clusters, cluster formation was only detected for few MEGs and PEGs, suggesting that clustering is not a major requirement for imprinted gene regulation in Arabidopsis.


Subjects
Alleles , Arabidopsis/genetics , Endosperm/genetics , Gene Expression Regulation, Plant , Animals , DNA Methylation/genetics , DNA Transposable Elements/genetics , Evolution, Molecular , Gene Expression Profiling , Genes, Plant , Genome, Plant/genetics , Genomic Imprinting , Multigene Family/genetics , Polycomb-Group Proteins , Repressor Proteins/metabolism , Seeds/genetics
13.
BMC Biol ; 9: 24, 2011 Apr 21.
Article in English | MEDLINE | ID: mdl-21510849

ABSTRACT

BACKGROUND: Polyploidization is the multiplication of the whole chromosome complement and has occurred frequently in vascular plants. Maintenance of stable polyploid state over generations requires special mechanisms to control pairing and distribution of more than two homologous chromosomes during meiosis. Since a minimal number of crossover events is essential for correct chromosome segregation, we investigated whether polyploidy has an influence on the frequency of meiotic recombination. RESULTS: Using two genetically linked transgenes providing seed-specific fluorescence, we compared a high number of progeny from diploid and tetraploid Arabidopsis plants. We show that rates of meiotic recombination in reciprocal crosses of genetically identical diploid and autotetraploid Arabidopsis plants were significantly higher in tetraploids compared to diploids. Although male and female gametogenesis differ substantially in meiotic recombination frequency, both rates were equally increased in tetraploids. To investigate whether multivalent formation in autotetraploids was responsible for the increased recombination rates, we also performed corresponding experiments with allotetraploid plants showing strict bivalent pairing. We found similarly increased rates in auto- and allotetraploids, suggesting that the ploidy effect is independent of chromosome pairing configurations. CONCLUSIONS: The evolutionary success of polyploid plants in nature and under domestication has been attributed to buffering of mutations and sub- and neo-functionalization of duplicated genes. Should the data described here be representative for polyploid plants, enhanced meiotic recombination, and the resulting rapid creation of genetic diversity, could have also contributed to their prevalence.


Subjects
Arabidopsis/genetics , Chromosome Pairing , Chromosomes, Plant/genetics , Recombination, Genetic , Tetraploidy , Arabidopsis/cytology , Biological Evolution , Gametogenesis, Plant , Plants, Genetically Modified/genetics
14.
Nucleic Acids Res ; 37(12): 4010-21, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19417064

ABSTRACT

MicroRNAs (miRNAs) are 20-24 nt long endogenous non-coding RNAs that act as post-transcriptional regulators in metazoa and plants. Plant miRNA targets typically contain a single sequence motif with near-perfect complementarity to the miRNA. Here, we extended and applied the program RNAhybrid to identify novel miRNA targets in the complete annotated Arabidopsis thaliana transcriptome. RNAhybrid predicts the energetically most favorable miRNA:mRNA hybrids that are consistent with user-defined structural constraints. These were: (i) perfect base pairing of the duplex from nucleotide 8 to 12 counting from the 5'-end of the miRNA; (ii) loops with a maximum length of one nucleotide in either strand; (iii) bulges with no more than one nucleotide in size; and (iv) unpaired end overhangs not longer than two nucleotides. G:U base pairs are not treated as mismatches, but contribute less favorably to the overall free energy. The resulting hybrids were filtered according to their minimum free energy, resulting in an overall prediction of more than 600 novel miRNA targets. The specificity and signal-to-noise ratio of the prediction were assessed with either randomized miRNAs or randomized target sequences as negative controls. Our results are in line with recent observations that the majority of miRNA targets are not transcription factors.


Subjects
Arabidopsis/genetics , MicroRNAs/chemistry , RNA, Messenger/chemistry , RNA, Plant/chemistry , Arabidopsis/metabolism , Arabidopsis Proteins/genetics , Gene Expression Profiling , Gene Expression Regulation, Plant , Software
15.
Nat Cell Biol ; 11(6): 705-16, 2009 Jun.
Article in English | MEDLINE | ID: mdl-19465924

ABSTRACT

The microRNA pathway has been implicated in the regulation of synaptic protein synthesis and ultimately in dendritic spine morphogenesis, a phenomenon associated with long-lasting forms of memory. However, the particular microRNAs (miRNAs) involved are largely unknown. Here we identify specific miRNAs that function at synapses to control dendritic spine structure by performing a functional screen. One of the identified miRNAs, miR-138, is highly enriched in the brain, localized within dendrites and negatively regulates the size of dendritic spines in rat hippocampal neurons. miR-138 controls the expression of acyl protein thioesterase 1 (APT1), an enzyme regulating the palmitoylation status of proteins that are known to function at the synapse, including the alpha(13) subunits of G proteins (Galpha(13)). RNA-interference-mediated knockdown of APT1 and the expression of membrane-localized Galpha(13) both suppress spine enlargement caused by inhibition of miR-138, suggesting that APT1-regulated depalmitoylation of Galpha(13) might be an important downstream event of miR-138 function. Our results uncover a previously unknown miRNA-dependent mechanism in neurons and demonstrate a previously unrecognized complexity of miRNA-dependent control of dendritic spine morphogenesis.


Subjects
Dendritic Spines , MicroRNAs/metabolism , Synapses , Thiolester Hydrolases/metabolism , Animals , Base Sequence , Cell Line , Dendritic Spines/enzymology , Dendritic Spines/ultrastructure , GTP-Binding Protein alpha Subunits, G12-G13/metabolism , Gene Expression Profiling , Hippocampus/cytology , Humans , Lipoylation , Mice , Mice, Inbred C57BL , MicroRNAs/genetics , Molecular Sequence Data , Morphogenesis , Neurons/cytology , Neurons/metabolism , Oligonucleotide Array Sequence Analysis , Rats , Receptors, Glutamate/metabolism , Synapses/metabolism , Synapses/ultrastructure , Thiolester Hydrolases/antagonists & inhibitors , Thiolester Hydrolases/genetics
16.
Bioinformatics ; 25(8): 1084-5, 2009 Apr 15.
Article in English | MEDLINE | ID: mdl-19246510

ABSTRACT

We introduce the tool mkESA, an open source program for constructing enhanced suffix arrays (ESAs), striving for low memory consumption, yet high practical speed. mkESA is a user-friendly program written in portable C99, based on a parallelized version of the Deep-Shallow suffix array construction algorithm, which is known for its high speed and small memory usage. The tool handles large FASTA files with multiple sequences, and computes suffix arrays and various additional tables, such as the LCP table (longest common prefix) or the inverse suffix array, from given sequence data.
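
For readers unfamiliar with the tables named in the abstract above, the sketch below builds a suffix array, its inverse, and the LCP table for a short string. It uses a naive sort and the linear-time LCP computation of Kasai et al., not the parallelized Deep-Shallow algorithm that mkESA is based on, and it is written for clarity rather than for the memory efficiency the tool aims at. All function names are hypothetical.

```python
def suffix_array(text: str) -> list[int]:
    """Start positions of all suffixes of `text` in lexicographic order (naive sort)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def inverse_suffix_array(sa: list[int]) -> list[int]:
    """rank[i] is the position of suffix i within the suffix array."""
    rank = [0] * len(sa)
    for position, start in enumerate(sa):
        rank[start] = position
    return rank


def lcp_table(text: str, sa: list[int]) -> list[int]:
    """lcp[j] is the longest common prefix length of suffixes sa[j-1] and sa[j]."""
    rank = inverse_suffix_array(sa)
    lcp = [0] * len(sa)
    matched = 0
    for i in range(len(text)):  # visit suffixes in text order
        if rank[i] == 0:
            matched = 0
            continue
        j = sa[rank[i] - 1]  # suffix that precedes suffix i in sorted order
        while (i + matched < len(text) and j + matched < len(text)
               and text[i + matched] == text[j + matched]):
            matched += 1
        lcp[rank[i]] = matched
        if matched:
            matched -= 1  # the next suffix shares at least matched - 1 characters
    return lcp
```

For the string "banana", the suffix array is [5, 3, 1, 0, 4, 2] and the LCP table is [0, 1, 3, 0, 0, 2].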


Subjects
Algorithms , Computational Biology/methods , Sequence Analysis/methods , Software , Sequence Analysis, DNA/methods , Sequence Analysis, Protein/methods , Sequence Analysis, RNA/methods
17.
PLoS Biol ; 6(10): e261, 2008 Oct 28.
Article in English | MEDLINE | ID: mdl-18959483

ABSTRACT

cis-Regulatory DNA elements contain multiple binding sites for activators and repressors of transcription. Among these elements are enhancers, which establish gene expression states, and Polycomb/Trithorax response elements (PREs), which take over from enhancers and maintain transcription states of several hundred developmentally important genes. PREs are essential to the correct identities of both stem cells and differentiated cells. Evolutionary differences in cis-regulatory elements are a rich source of phenotypic diversity, and functional binding sites within regulatory elements turn over rapidly in evolution. However, more radical evolutionary changes that go beyond motif turnover have been difficult to assess. We used a combination of genome-wide bioinformatic prediction and experimental validation at specific loci, to evaluate PRE evolution across four Drosophila species. Our results show that PRE evolution is extraordinarily dynamic. First, we show that the numbers of PREs differ dramatically between species. Second, we demonstrate that functional binding sites within PREs at conserved positions turn over rapidly in evolution, as has been observed for enhancer elements. Finally, although it is theoretically possible that new elements can arise out of nonfunctional sequence, evidence that they do so is lacking. We show here that functional PREs are found at nonorthologous sites in conserved gene loci. By demonstrating that PRE evolution is not limited to the adaptation of preexisting elements, these findings document a novel dimension of cis-regulatory evolution.


Subjects
Chromosomal Proteins, Non-Histone/genetics , Drosophila Proteins/genetics , Drosophila/genetics , Evolution, Molecular , Response Elements/genetics , Animals , Blotting, Western , Chromatin Immunoprecipitation , Chromosomal Proteins, Non-Histone/metabolism , Computational Biology/methods , Drosophila/classification , Drosophila/metabolism , Drosophila Proteins/metabolism , Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , Genome/genetics , Phylogeny , Polycomb Repressive Complex 1 , Species Specificity
18.
Methods Mol Biol ; 342: 87-99, 2006.
Article in English | MEDLINE | ID: mdl-16957369

ABSTRACT

I describe the use of RNAhybrid, a program that predicts multiple potential binding sites of microRNAs (miRNAs) in large target RNAs. The core algorithm finds the energetically most favorable hybridization sites of a miRNA in a large potential target RNA. Intramolecular hybridizations, i.e., base pairings between target nucleotides or between miRNA nucleotides, are not allowed. For large targets, the time complexity of the algorithm is linear in the target length, allowing many long targets to be searched in a short time. Starting from the observation that the binding energies are results from an optimization procedure, we can model them as following an extreme value distribution. From this, we can calculate the statistical significance of individual binding sites, of multiple binding sites in a single target sequence, and of binding sites in comparative analyses of orthologous sequences across species. The latter involves the calculation of the effective number of orthologous sequences, which can be considerably smaller than the actual number, reflecting the statistical dependence of evolutionarily related sequences.
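
The significance calculation outlined in the abstract above rests on the tail probability of an extreme value distribution. The sketch below evaluates that tail for a Gumbel (type I extreme value) distribution. It assumes that the score is oriented so that larger means stronger binding (for instance a negated energy) and that the `location` and `scale` parameters have been estimated elsewhere; how RNAhybrid normalises energies and fits its parameters is not stated here, and the function name `gumbel_p_value` is hypothetical.

```python
import math


def gumbel_p_value(score: float, location: float, scale: float) -> float:
    """P(S >= score) for a Gumbel distributed score S (larger score = stronger site)."""
    # Gumbel CDF: F(s) = exp(-exp(-(s - location) / scale)); the p-value is 1 - F(s).
    # -expm1(-x) computes 1 - exp(-x) without losing precision for small x.
    return -math.expm1(-math.exp(-(score - location) / scale))
```

A score equal to the location parameter yields a p-value of 1 - 1/e (about 0.63), and the p-value shrinks rapidly as the score moves into the upper tail.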


Subjects
MicroRNAs/genetics , MicroRNAs/metabolism , Algorithms , Animals , Binding Sites , Computer Graphics , Humans , Internet , MicroRNAs/chemistry , Models, Genetic , Nucleic Acid Conformation , Software , Thermodynamics , User-Computer Interface
19.
Nucleic Acids Res ; 34(Web Server issue): W451-4, 2006 Jul 01.
Article in English | MEDLINE | ID: mdl-16845047

ABSTRACT

In the elucidation of the microRNA regulatory network, knowledge of potential targets is of highest importance. Among existing target prediction methods, RNAhybrid [M. Rehmsmeier, P. Steffen, M. Höchsmann and R. Giegerich (2004) RNA, 10, 1507-1517] is unique in offering a flexible online prediction. Recently, some useful features have been added, among these the possibility to disallow G:U base pairs in the seed region, and a seed-match speed-up, which accelerates the program by a factor of 8. In addition, the program can now be used as a webservice for remote calls from user-implemented programs. We demonstrate RNAhybrid's flexibility with the prediction of a non-canonical target site for Caenorhabditis elegans miR-241 in the 3'-untranslated region of lin-39. RNAhybrid is available at http://bibiserv.techfak.uni-bielefeld.de/rnahybrid.


Subjects
MicroRNAs/chemistry , RNA Interference , Software , 3' Untranslated Regions/chemistry , Animals , Binding Sites , Caenorhabditis elegans/genetics , Caenorhabditis elegans Proteins/genetics , Homeodomain Proteins/genetics , Internet
20.
Nucleic Acids Res ; 34(Web Server issue): W546-50, 2006 Jul 01.
Article in English | MEDLINE | ID: mdl-16845067

ABSTRACT

Gene regulation is the process through which an organism effects spatial and temporal differences in gene expression levels. Knowledge of cis-regulatory elements as key players in gene regulation is indispensable for the understanding of the latter and of the development of organisms. Here we present the tool jPREdictor for the fast and versatile prediction of cis-regulatory elements on a genome-wide scale. The prediction is based on clusters of individual motifs and any combination of these into multi-motifs with selectable minimal and maximal distances. Individual motifs can be of heterogeneous classes, such as simple sequence motifs or position-specific scoring matrices. Cluster scores are weighted occurrences of multi-motifs, where the weights are derived from positive and negative training sets. We illustrate the flexibility of the jPREdictor with a new prediction of Polycomb/Trithorax Response Elements in Drosophila melanogaster. jPREdictor is available as a graphical user interface for online use and for download at http://bibiserv.techfak.uni-bielefeld.de/jpredictor.
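
The scoring scheme summarised in the abstract above, weighted motif occurrences within a cluster, can be sketched as follows. The example scores single exact-match motifs rather than multi-motifs with distance constraints, and it assumes log-odds weights computed from motif frequencies in positive and negative training sets; the abstract does not give the exact weighting formula, so that choice and the function names `log_odds_weights` and `window_scores` are assumptions made for illustration.

```python
import math


def log_odds_weights(pos_freq: dict[str, float], neg_freq: dict[str, float]) -> dict[str, float]:
    """Weight each motif by the log ratio of its frequency in positives vs. negatives."""
    return {motif: math.log(pos_freq[motif] / neg_freq[motif]) for motif in pos_freq}


def window_scores(sequence: str, weights: dict[str, float],
                  window: int, step: int) -> list[tuple[int, float]]:
    """Score sliding windows as the weighted sum of exact motif occurrences."""
    scores = []
    for start in range(0, len(sequence) - window + 1, step):
        segment = sequence[start:start + window]
        score = 0.0
        for motif, weight in weights.items():
            # Count overlapping occurrences of the motif inside the window.
            hits = sum(segment.startswith(motif, i)
                       for i in range(len(segment) - len(motif) + 1))
            score += weight * hits
        scores.append((start, score))
    return scores
```

Windows whose score exceeds a chosen cutoff would then be reported as candidate elements; how that cutoff is calibrated is outside the scope of this sketch.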


Subjects
Genomics/methods , Regulatory Elements, Transcriptional , Software , Animals , Binding Sites , Drosophila Proteins/metabolism , Drosophila melanogaster/genetics , Internet , Polycomb Repressive Complex 1 , Response Elements , Transcription Factors/metabolism , User-Computer Interface