Results 1 - 17 of 17
1.
Nucleic Acids Res ; 48(15): 8276-8289, 2020 09 04.
Article in English | MEDLINE | ID: mdl-32735675

ABSTRACT

The manual production of reliable RNA structure models from chemical probing experiments benefits from the integration of information derived from multiple protocols and reagents. However, the interpretation of multiple probing profiles remains a complex task, hindering the quality and reproducibility of modeling efforts. We introduce IPANEMAP, the first automated method for the modeling of RNA structure from multiple probing reactivity profiles. Input profiles can result from experiments based on diverse protocols, reagents, or collections of variants, and are jointly analyzed to predict the dominant conformations of an RNA. IPANEMAP combines sampling, clustering and multi-optimization to produce secondary structure models that are both stable and well supported by experimental evidence. The analysis of multiple reactivity profiles, both publicly available and produced in our study, demonstrates the good performance of IPANEMAP, even in a mono-probing setting. It confirms the potential of integrating multiple sources of probing data, informing the design of informative probing assays.


Subjects
Nucleic Acid Conformation , RNA/chemistry , Software , Amoebozoa/genetics , Benchmarking , Datasets as Topic , Mutation , RNA/genetics
2.
Bioinformatics ; 32(7): 984-92, 2016 04 01.
Article in English | MEDLINE | ID: mdl-26740523

ABSTRACT

MOTIVATION: Whole genome sequencing of paired-end reads can be applied to characterize the landscape of large somatic rearrangements of cancer genomes. Several methods for detecting structural variants with whole genome sequencing data have been developed. So far, none of these methods has combined information about abnormally mapped read pairs connecting rearranged regions and associated global copy number changes automatically inferred from the same sequencing data file. Our aim was to create a computational method that could use both types of information, i.e. normal and abnormal reads, and to demonstrate that doing so substantially improves both the sensitivity and the specificity of structural variant prediction. RESULTS: We developed a computational method, SV-Bay, to detect structural variants from whole genome sequencing mate-pair or paired-end data using a probabilistic Bayesian approach. This approach takes into account depth of coverage by normal reads and abnormalities in read pair mappings. To estimate the model likelihood, SV-Bay considers GC-content and read mappability of the genome, thus making important corrections to the expected read count. For the detection of somatic variants, SV-Bay makes use of a matched normal sample when it is available. We validated SV-Bay on simulated datasets and an experimental mate-pair dataset for the CLB-GA neuroblastoma cell line. The comparison of SV-Bay with several other methods for structural variant detection demonstrated that SV-Bay has better prediction accuracy both in terms of sensitivity and false-positive detection rate. AVAILABILITY AND IMPLEMENTATION: https://github.com/InstitutCurie/SV-Bay CONTACT: valentina.boeva@inserm.fr SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Bayes Theorem , Genome-Wide Association Study , Genomic Structural Variation , Neoplasms/genetics , Base Composition , Genome , High-Throughput Nucleotide Sequencing , Humans , Metagenomics
3.
RNA Biol ; 14(8): 1075-1085, 2017 08 03.
Article in English | MEDLINE | ID: mdl-28277897

ABSTRACT

It is only recently that the abundant presence of circular RNAs (circRNAs) in all kingdoms of life, including the hyperthermophilic archaeon Pyrococcus abyssi, has emerged. This led us to investigate the physiologic significance of a previously observed weak intramolecular ligation activity of Pab1020 RNA ligase. Here we demonstrate that this enzyme, despite sharing significant sequence similarity with DNA ligases, is indeed an RNA-specific polynucleotide ligase efficiently acting on physiologically significant substrates. Using a combination of RNA immunoprecipitation assays and RNA-seq, our genome-wide studies revealed 133 individual circRNA loci in P. abyssi. The large majority of these loci interacted with Pab1020 in cells, and circularization of selected C/D Box and 5S rRNA transcripts was confirmed biochemically. Altogether these studies revealed that Pab1020 is required for RNA circularization. Our results further suggest the functional speciation of an ancestral NTase domain and/or DNA ligase toward RNA ligase activity and prompt further characterization of the widespread functions of circular RNAs in prokaryotes. Detailed insight into the cellular substrates of Pab1020 may facilitate the development of new biotechnological applications, e.g. in the ligation of preadenylated adaptors to RNA molecules.


Subjects
Alternative Splicing , Archaeal Proteins/genetics , Archaeal Genome , Pyrococcus abyssi/genetics , RNA Ligase (ATP)/genetics , Archaeal RNA/genetics , RNA/genetics , Archaeal Proteins/metabolism , Computational Biology , Immunoprecipitation , Pyrococcus abyssi/enzymology , RNA/metabolism , RNA Ligase (ATP)/metabolism , RNA Stability , Archaeal RNA/metabolism , Circular RNA , 5S Ribosomal RNA/genetics , 5S Ribosomal RNA/metabolism , RNA Sequence Analysis , Substrate Specificity
4.
Chem Res Toxicol ; 24(12): 2061-70, 2011 Dec 19.
Article in English | MEDLINE | ID: mdl-21732636

ABSTRACT

The toxicity of carbon dioxide has been established for close to a century. A number of animal experiments have explored both acute and long-term toxicity with respect to the lungs, the cardiovascular system, and the bladder, showing inflammatory and possible carcinogenic effects. Carbon dioxide also induces multiple fetal malformations and probably reduces fertility in animals. The aim of the review is to recapitulate the physiological and metabolic mechanisms resulting from CO(2) inhalation. As smokers are exposed to a high level of carbon dioxide (13%) that is about 350 times the level in normal air, we propose the hypothesis that carbon dioxide plays a major role in the long term toxicity of tobacco smoke.


Subjects
Carbon Dioxide/toxicity , Respiratory Acidosis/metabolism , Respiratory Acidosis/pathology , Animals , Bicarbonates/chemistry , Carcinogens/toxicity , Cardiovascular System/drug effects , Cardiovascular System/metabolism , Central Nervous System/drug effects , Central Nervous System/metabolism , Humans , Hypercapnia/metabolism , Hypercapnia/pathology , Lung/drug effects , Lung/metabolism , Reproduction/drug effects
5.
Nat Biotechnol ; 23(1): 137-44, 2005 Jan.
Article in English | MEDLINE | ID: mdl-15637633

ABSTRACT

The prediction of regulatory elements is a problem where computational methods offer great hope. Over the past few years, numerous tools have become available for this task. The purpose of the current assessment is twofold: to provide some guidance to users regarding the accuracy of currently available tools in various settings, and to provide a benchmark of data sets for assessing future tools.


Subjects
Computational Biology/methods , Gene Expression , Genetic Transcription , Amino Acid Motifs , Animals , Binding Sites , Protein Databases , Drosophila , Fungal Proteins/chemistry , Humans , Internet , Mice , Reproducibility of Results , Software
6.
J Bioinform Comput Biol ; 4(2): 537-51, 2006 Apr.
Article in English | MEDLINE | ID: mdl-16819801

ABSTRACT

We study and compare two classes of statistical criteria to assess the significance of exceptional words. Indeed, the Z-score-like criteria, or the normal approximation that is a strict equivalent, suffer from several drawbacks in terms of sensitivity and specificity. Thanks to the combinatorial structure of words, a computation of the exact P-value has been made possible by recent mathematical results. We study here the drawbacks of the Z-score, the choice of the threshold and the tightness to the P-value. A major conclusion is that the normal approximation is always very poor and overestimates statistical significance.


Subjects
Algorithms , Chromosome Mapping/methods , Genetic Models , Sequence Alignment/methods , DNA Sequence Analysis/methods , Transcription Factors/genetics , Base Sequence , Binding Sites , Computer Simulation , Statistical Data Interpretation , Entropy , Statistical Models , Molecular Sequence Data , Software
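The contrast between a Z-score (normal approximation) and an exact P-value can be illustrated with a small sketch. It assumes an i.i.d. (Bernoulli) letter model and a single word, computes the exact distribution of the number of possibly overlapping occurrences by dynamic programming over a string-matching automaton, and compares the exact tail P(N ≥ k) with a normal tail built from the mean and variance of that same distribution. This is a generic illustration, not the method of the paper; all function names are hypothetical.

```python
import math
from itertools import product


def matching_automaton(word, alphabet):
    """delta[q][a]: next state after letter a, where state q is the length of the
    longest prefix of `word` that is a suffix of the text read so far."""
    m = len(word)
    delta = []
    for q in range(m + 1):
        row = {}
        for a in alphabet:
            s = word[:q] + a
            k = min(m, len(s))
            while k > 0 and not s.endswith(word[:k]):
                k -= 1
            row[a] = k
        delta.append(row)
    return delta


def count_distribution(word, probs, n):
    """Exact {count: probability} of overlapping occurrences of `word` in a random
    text of length n whose letters are drawn i.i.d. from `probs`."""
    delta = matching_automaton(word, probs)
    m = len(word)
    dist = {(0, 0): 1.0}  # (automaton state, occurrences so far) -> probability
    for _ in range(n):
        nxt = {}
        for (q, c), p in dist.items():
            for a, pa in probs.items():
                q2 = delta[q][a]
                key = (q2, c + (q2 == m))  # entering state m completes one occurrence
                nxt[key] = nxt.get(key, 0.0) + p * pa
        dist = nxt
    counts = {}
    for (_, c), p in dist.items():
        counts[c] = counts.get(c, 0.0) + p
    return counts


def exact_pvalue(counts, k):
    """P(N >= k) read off the exact distribution."""
    return sum(p for c, p in counts.items() if c >= k)


def normal_pvalue(counts, k):
    """Normal (Z-score) approximation of P(N >= k) using the exact mean and variance."""
    mu = sum(c * p for c, p in counts.items())
    var = sum((c - mu) ** 2 * p for c, p in counts.items())
    z = (k - mu) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2))


def brute_force_distribution(word, probs, n):
    """Enumerates every text; only usable for very small n, as a cross-check."""
    counts = {}
    for letters in product(probs, repeat=n):
        text = "".join(letters)
        c = sum(text.startswith(word, i) for i in range(n))
        p = math.prod(probs[a] for a in letters)
        counts[c] = counts.get(c, 0.0) + p
    return counts


if __name__ == "__main__":
    probs = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    counts = count_distribution("ACGT", probs, 300)
    for k in (3, 5, 8):
        print(k, exact_pvalue(counts, k), normal_pvalue(counts, k))
```

Comparing the two columns for increasing k shows how far the normal tail drifts from the exact one in the extreme region where significance thresholds are usually set.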
7.
Article in English | MEDLINE | ID: mdl-27376057

ABSTRACT

Repetitive patterns in genomic sequences have great biological significance and also algorithmic implications. Analytic combinatorics allows one to derive formulas for the expected length of repetitions in a random sequence. Asymptotic results, which generalize previous works on a binary alphabet, are easily computable. Simulations on random sequences show their accuracy. As an application, the sample case of Archaea genomes illustrates how biological sequences may differ from random sequences.
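A simulation of the kind mentioned above can be sketched as follows. It measures the longest repeated substring of i.i.d. random sequences and compares its average with the classical first-order term 2 log n / log(1/q), where q is the sum of squared letter probabilities. That leading term is an assumption taken from the general literature on longest repeats, not the refined formulas derived in the paper, so noticeable deviations at small n are expected. Function names are hypothetical.

```python
import math
import random


def longest_repeat(s):
    """Length of the longest substring occurring at least twice in s (occurrences may
    overlap): the maximum common prefix of lexicographically adjacent suffixes.
    Naive suffix sorting, adequate for a few thousand letters."""
    n = len(s)
    order = sorted(range(n), key=lambda i: s[i:])
    best = 0
    for a, b in zip(order, order[1:]):
        k = 0
        while a + k < n and b + k < n and s[a + k] == s[b + k]:
            k += 1
        best = max(best, k)
    return best


def first_order_estimate(n, probs):
    """Leading-order term 2 log n / log(1/q), with q = sum of squared letter probabilities."""
    q = sum(p * p for p in probs)
    return 2 * math.log(n) / math.log(1 / q)


def mean_longest_repeat(n, letters, probs, trials, seed=0):
    """Monte Carlo mean of the longest repeat over i.i.d. random sequences of length n."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        s = "".join(rng.choices(letters, weights=probs, k=n))
        total += longest_repeat(s)
    return total / trials


if __name__ == "__main__":
    letters, probs = "ACGT", [0.25, 0.25, 0.25, 0.25]
    for n in (500, 1000, 2000):
        print(n, mean_longest_repeat(n, letters, probs, trials=20),
              first_order_estimate(n, probs))
```

Running the same function on a real genome segment instead of `rng.choices` output gives the kind of random-versus-biological comparison the abstract describes.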

8.
Bioinformation ; 10(7): 472-3, 2014.
Article in English | MEDLINE | ID: mdl-25187691

ABSTRACT

microRNAs are small RNA molecules that inhibit the translation of target genes. microRNA binding sites are located in the untranslated regions as well as in the coding domains. We describe the TmiRUSite and TmiROSite scripts, developed in Python as tools for extracting the nucleotide sequences of miRNA binding sites together with their encoded amino acid residue sequences. The scripts can also retrieve a set of additional flanking sequences to the left and right of the binding site. The scripts present all retrieved data in table formats that are easy to analyse further. The predicted data find utility in molecular and evolutionary biology studies, including studies of miRNA binding sites in animals and plants. AVAILABILITY: The TmiRUSite and TmiROSite scripts are available free of charge from the authors upon request and for download at https://sites.google.com/site/malaheenee/downloads.

9.
Bioinformation ; 10(8): 539-43, 2014.
Article in English | MEDLINE | ID: mdl-25258491

ABSTRACT

In recent times, information on miRNAs and their binding sites has been gaining momentum. There is therefore interest in developing tools that extract miRNA-related information from the published literature. Hence, we describe the GeneAFinder and miRAFinder scripts (open source), developed in Python for the semi-automatic extraction and arrangement of updated information on miRNAs, genes and additional data from article abstracts published in PubMed. The scripts are suitable for custom modification as required. AVAILABILITY: The miRAFinder and GeneAFinder scripts are free and available for download at http://sites.google.com/site/malaheenee/software.

10.
Algorithms Mol Biol ; 9(1): 25, 2014.
Article in English | MEDLINE | ID: mdl-25648087

ABSTRACT

BACKGROUND: Finding new functional fragments in biological sequences is a challenging problem. Methods addressing this problem commonly search for clusters of pattern occurrences that are statistically significant. A measure of statistical significance is the P-value of a number of pattern occurrences, i.e. the probability of finding at least S occurrences of words from a pattern in a random text of length N generated according to a given probability model. All words of the pattern are assumed to have the same length. RESULTS: We present a novel algorithm, SufPref, that computes an exact P-value for Hidden Markov models (HMM). The algorithm is based on recursive equations on text sets related to pattern occurrences; the equations can be used for any probability model. The algorithm inductively traverses a specific data structure, an overlap graph. The nodes of the graph are associated with the overlaps of words from the pattern. The edges are associated with the prefix and suffix relations between overlaps. An original feature of our data structure is that the pattern need not be explicitly represented in nodes or leaves. The algorithm relies on the Cartesian product of the overlap graph and the graph of HMM states; this approach is analogous to the automaton approach from JBCB 4: 553-569. The smaller size of the SufPref data structure leads to significant improvements in space and time complexity compared to existing algorithms. SufPref was implemented as a C++ program; the program can be used both as a web server and as a stand-alone program for Linux and Windows. The program interface admits special formats to describe probability models of various types (HMM, Bernoulli, Markov); a pattern can be described with a list of words, a PSSM, a degenerate pattern, or a word and a number of mismatches. It is available at http://server2.lpm.org.ru/bio/online/sf/.
The program was applied to compare sensitivity and specificity of methods for TFBS prediction based on P-values computed for Bernoulli models, Markov models of orders one and two and HMMs. The experiments show that the methods have approximately the same qualities.
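The abstract compares SufPref with the automaton approach. The sketch below illustrates that baseline idea in its simplest form, not SufPref's overlap graph: a naively built automaton whose states are the prefixes of the pattern words, combined with the HMM states, with the occurrence count capped at S so that the final layer directly gives P(at least S occurrences). The HMM encoding (dicts `init`, `trans`, `emit`) and all function names are assumptions for illustration; a brute-force enumeration using the forward algorithm is included to cross-check tiny cases.

```python
from itertools import product


def prefix_automaton(words, alphabet):
    """States are all prefixes of the pattern words; delta[u, a] is the longest suffix
    of u + a that is itself a prefix. hits[v] counts the words ending at state v."""
    prefixes = {w[:i] for w in words for i in range(len(w) + 1)}
    delta = {}
    for u in prefixes:
        for a in alphabet:
            s = u + a
            delta[u, a] = next(s[j:] for j in range(len(s) + 1) if s[j:] in prefixes)
    hits = {v: sum(v.endswith(w) for w in words) for v in prefixes}
    return delta, hits


def pattern_pvalue_hmm(words, n, S, init, trans, emit):
    """P(at least S occurrences of words from `words` in a text of length n >= 1 emitted
    by an HMM). init[h]: initial probability, trans[h][g]: transition h -> g,
    emit[h][a]: probability that state h emits letter a."""
    alphabet = sorted({a for h in emit for a in emit[h]})
    delta, hits = prefix_automaton(words, alphabet)
    # f[h, u, c]: probability of the prefix read so far, with the HMM in state h,
    # the automaton in state u and c = min(occurrences, S)
    f = {}
    for h, ph in init.items():
        for a, pa in emit[h].items():
            u = delta["", a]
            key = (h, u, min(hits[u], S))
            f[key] = f.get(key, 0.0) + ph * pa
    for _ in range(n - 1):
        g = {}
        for (h, u, c), p in f.items():
            for h2, pt in trans[h].items():
                for a, pa in emit[h2].items():
                    v = delta[u, a]
                    key = (h2, v, min(c + hits[v], S))
                    g[key] = g.get(key, 0.0) + p * pt * pa
        f = g
    return sum(p for (_, _, c), p in f.items() if c >= S)


def brute_force_pvalue(words, n, S, init, trans, emit):
    """Cross-check for tiny n: enumerate texts, score each with the forward algorithm."""
    alphabet = sorted({a for h in emit for a in emit[h]})
    total = 0.0
    for letters in product(alphabet, repeat=n):
        text = "".join(letters)
        occ = sum(text.startswith(w, i) for w in words for i in range(n))
        if occ < S:
            continue
        alpha = {h: init[h] * emit[h].get(text[0], 0.0) for h in init}
        for a in text[1:]:
            alpha = {g: sum(alpha[h] * trans[h].get(g, 0.0) for h in alpha)
                     * emit[g].get(a, 0.0) for g in init}
        total += sum(alpha.values())
    return total


if __name__ == "__main__":
    # hypothetical two-state HMM over DNA: an AT-rich and a GC-rich state
    init = {"AT": 0.5, "GC": 0.5}
    trans = {"AT": {"AT": 0.9, "GC": 0.1}, "GC": {"AT": 0.1, "GC": 0.9}}
    emit = {"AT": {"A": 0.35, "T": 0.35, "C": 0.15, "G": 0.15},
            "GC": {"A": 0.15, "T": 0.15, "C": 0.35, "G": 0.35}}
    print(pattern_pvalue_hmm(["TATA", "TAAA"], 200, 3, init, trans, emit))
```

The state space here is (HMM states) × (prefixes of all words) × (S + 1); the abstract's point is that SufPref's overlap graph needs far fewer nodes than this prefix set.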

11.
Bioinformation ; 8(11): 513-8, 2012.
Article in English | MEDLINE | ID: mdl-22829721

ABSTRACT

miRNAs regulate gene expression by binding to the mRNAs of many genes. Studying their effects on genes involved in oncogenesis is important in cancer diagnostics and therapeutics. The RNAHybrid 2.1 program was used to predict the strong miRNA binding sites (p < 0.0005) in target mRNAs. The program Finder 2.2 was created to verify the origin of 784 intergenic miRNAs (ig-miRNAs). Among 54 considered oncogenes and tumor suppressor genes, 47 genes are the best targets for ig-miRNAs. Accordingly, these genes are strongly regulated by 111 ig-miRNAs. Some miRNAs bind several mRNAs, and some mRNAs have several binding sites for miRNAs. Of the 54 mRNAs, 21.8%, 43.0%, and 35.2% of the miRNA binding sites are present in the 5'UTRs, CDSes, and 3'UTRs, respectively. The average density of the binding sites for miRNAs in the 5'UTR was 4.4 times and 4.1 times greater than in the CDS and the 3'UTR, respectively. Three types of interactions between miRNAs and mRNAs were identified, which differ according to the region of the miRNA bound to the mRNA: 1) binding occurs predominantly via the 3'-region of the miRNA; 2) binding occurs predominantly through the central region of the miRNA; and 3) binding occurs predominantly via the 5'-region of the miRNA. Several miRNAs effectively regulate only one gene, and this information could be useful in molecular medicine to modulate translation of the target mRNA. We recommend the newly described sites for experimental validation.

12.
J Comput Biol ; 18(10): 1339-51, 2011 Oct.
Article in English | MEDLINE | ID: mdl-21548808

ABSTRACT

In 2004, Condon and coauthors gave a hierarchical classification of exact RNA structure prediction algorithms according to the generality of structure classes that they handle. We complete this classification by adding two recent prediction algorithms. More importantly, we precisely quantify the hierarchy by giving closed or asymptotic formulas for the theoretical number of structures of given size n in all the classes but one. This allows us to assess the tradeoff between the expressiveness and the computational complexity of RNA structure prediction algorithms.


Subjects
Algorithms , Computational Biology/methods , Molecular Models , RNA/chemistry , Computer Simulation , Nucleic Acid Conformation , Probability , RNA Sequence Analysis
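For the simplest class in such a hierarchy, pseudoknot-free secondary structures, the number of structures of size n can be counted with the classical Stein-Waterman style recurrence sketched below: the last base is either unpaired, or paired with an earlier base, which splits the remaining bases into an exterior part and an enclosed part. The minimum hairpin size `h` is an illustrative parameter we introduce; the paper's closed and asymptotic formulas for the richer classes are not reproduced here.

```python
def count_secondary_structures(n, h=3):
    """s[m] = number of pseudoknot-free secondary structures on m bases, where any
    base may pair with any other and every hairpin loop has at least h unpaired bases."""
    s = [1] * (n + 1)
    for m in range(1, n + 1):
        # last base unpaired: s[m - 1]; or paired with base k + 1 (0 <= k <= m - 2 - h),
        # leaving k bases outside the pair and m - 2 - k bases enclosed by it
        s[m] = s[m - 1] + sum(s[k] * s[m - 2 - k] for k in range(m - 1 - h))
    return s


if __name__ == "__main__":
    s = count_secondary_structures(60)
    # the ratio of consecutive counts approaches the exponential growth rate of the class
    for m in (20, 40, 60):
        print(m, s[m], s[m] / s[m - 1])
```

With `h=0` the sequence is 1, 1, 2, 4, 9, 21, 51, 127, ... (the Motzkin numbers); the growth rate of such counts is what makes the expressiveness versus complexity trade-off quantitative.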
13.
Int J Biol Sci ; 5(1): 13-9, 2009.
Article in English | MEDLINE | ID: mdl-19119309

ABSTRACT

The splice-site sequences of U2-type introns are highly degenerate, so many different sequences can function as U2-type splice sites. Using our new profiles based on hydrophobicity properties, we identified specific properties of the regions surrounding splice sites. We built a set T of flanking regions of genes with 1-3 introns from chromosomes 21 and 22, extracted from GenBank, to define positions with conserved properties, namely hydrophobicity, that are potentially essential for recognition by the spliceosome. GT-AG introns exist in both the U2 and U12 types; therefore, the intron type cannot be determined simply from the dinucleotide termini. We attempted to distinguish U2- and U12-type introns with the help of hydrophobicity profiles on sets of splice sites for U2- or U12-type introns extracted from the SpliceRack database. The positions given by our method, which may be important for recognition by the spliceosome, were compared to the nucleotide consensus provided by a classical method, Pictogram. We showed that hydrophobicity profiles are similar within an intron type. On the one hand, GT-AG and GC-AG introns belonging to the U2 type have similar hydrophobicity profiles, as do AT-AC and GT-AG introns belonging to the U12 type. On the other hand, the hydrophobicity profiles of U2- and U12-type GT-AG introns are completely different. We suggest that hydrophobicity profiles facilitate the definition of intron type, distinguish U2 and U12 intron types, and can be used to build programs that search for splice sites and evaluate their strength. Therefore, our study proves that hydrophobicity profiles are an informative method providing insights into the mechanisms of splice-site recognition.


Subjects
Human Chromosome Pair 21/genetics , Human Chromosome Pair 22/genetics , Computational Biology/methods , Hydrophobic and Hydrophilic Interactions , Introns/genetics , Nucleotides/chemistry , RNA Splice Sites/genetics , Humans
14.
Algorithms Mol Biol ; 2: 13, 2007 Oct 10.
Article in English | MEDLINE | ID: mdl-17927813

ABSTRACT

BACKGROUND: cis-Regulatory modules (CRMs) of eukaryotic genes often contain multiple binding sites for transcription factors. The phenomenon that binding sites form clusters in CRMs is exploited in many algorithms to locate CRMs in a genome. This gives rise to the problem of calculating the statistical significance of the event that multiple sites, recognized by different factors, would be found simultaneously in a text of a fixed length. The main difficulty comes from overlapping occurrences of motifs. So far, no tools have been developed allowing the computation of p-values for simultaneous occurrences of different motifs which can overlap. RESULTS: We developed and implemented an algorithm computing the p-value that $s$ different motifs occur respectively $k_1, \dots, k_s$ or more times, possibly overlapping, in a random text. Motifs can be represented with a majority of popular motif models, but in all cases without indels. Zero- or first-order Markov chains can be adopted as a model for the random text. The computational tool was tested on the set of cis-regulatory modules involved in D. melanogaster early development, for which there exists an annotation of binding sites for transcription factors. Our test allowed us to correctly identify transcription factors cooperatively/competitively binding to DNA. METHOD: The algorithm that precisely computes the probability of simultaneous motif occurrences is inspired by the Aho-Corasick automaton and employs a prefix tree together with a transition function. The algorithm runs in $O(n|\Sigma| m |\mathcal{H}| + K|\Sigma|^K \prod_i k_i)$ time, where $n$ is the length of the text, $|\Sigma|$ is the alphabet size, $m$ is the maximal motif length, $|\mathcal{H}|$ is the total number of words in motifs, $K$ is the order of the Markov model, and $k_i$ is the number of occurrences of the $i$-th motif. CONCLUSION: The primary objective of the program is to assess the likelihood that a given DNA segment is CRM regulated with a known set of regulatory factors.
In addition, the program can also be used to select the appropriate threshold for PWM scanning. Another application is assessing similarity of different motifs. AVAILABILITY: Project web page, stand-alone version and documentation can be found at http://bioinform.genetika.ru/AhoPro/
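A minimal sketch of the kind of computation described under METHOD, under simplifying assumptions: a zero-order (Bernoulli) text model, each motif given as an explicit finite set of words, and a naively built prefix automaton standing in for the paper's Aho-Corasick construction. The dynamic programme tracks the automaton state together with the vector of counts, each capped at its $k_i$, so the probability mass that ends with every count capped is the probability that each motif $i$ occurs at least $k_i$ times. Function names and the example motifs are hypothetical.

```python
from itertools import product


def motif_automaton(motifs, alphabet):
    """Naive automaton over all prefixes of all motif words. delta[u, a] is the longest
    suffix of u + a that is a prefix; hits[v][i] counts words of motif i ending at v."""
    words = {w for motif in motifs for w in motif}
    prefixes = {w[:i] for w in words for i in range(len(w) + 1)}
    delta = {}
    for u in prefixes:
        for a in alphabet:
            s = u + a
            delta[u, a] = next(s[j:] for j in range(len(s) + 1) if s[j:] in prefixes)
    hits = {v: tuple(sum(v.endswith(w) for w in motif) for motif in motifs)
            for v in prefixes}
    return delta, hits


def joint_pvalue(motifs, ks, probs, n):
    """P(motif i occurs at least ks[i] times, for every i) in an i.i.d. text of length n."""
    delta, hits = motif_automaton(motifs, probs)
    ks = tuple(ks)
    f = {("", (0,) * len(motifs)): 1.0}  # (automaton state, capped counts) -> probability
    for _ in range(n):
        g = {}
        for (u, c), p in f.items():
            for a, pa in probs.items():
                v = delta[u, a]
                c2 = tuple(min(ci + hi, ki) for ci, hi, ki in zip(c, hits[v], ks))
                g[v, c2] = g.get((v, c2), 0.0) + p * pa
        f = g
    return sum(p for (_, c), p in f.items() if c == ks)


def brute_force_joint(motifs, ks, probs, n):
    """Cross-check for tiny n by enumerating every text."""
    total = 0.0
    for letters in product(probs, repeat=n):
        text = "".join(letters)
        counts = [sum(text.startswith(w, i) for w in motif for i in range(n))
                  for motif in motifs]
        if all(c >= k for c, k in zip(counts, ks)):
            p = 1.0
            for a in letters:
                p *= probs[a]
            total += p
    return total


if __name__ == "__main__":
    probs = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    # hypothetical motifs, each given as a word and its reverse complement
    motifs = [{"TAAT", "ATTA"}, {"GATA", "TATC"}]
    print(joint_pvalue(motifs, (2, 1), probs, 200))
```

Capping each count at $k_i$ keeps the number of count vectors at $\prod_i (k_i + 1)$, which mirrors the product term in the complexity quoted above.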

15.
Bioinformatics ; 22(6): 676-84, 2006 Mar 15.
Article in English | MEDLINE | ID: mdl-16403795

ABSTRACT

MOTIVATION: Genomic sequences are highly redundant and contain many types of repetitive DNA. Fuzzy tandem repeats (FTRs) are of particular interest. They are found in regulatory regions of eukaryotic genes and are reported to interact with transcription factors. However, accurate assessment of FTR occurrences in different genome segments requires a specific algorithm for efficient FTR identification and classification. RESULTS: We have obtained formulas for P-values of FTR occurrence and developed an FTR identification algorithm implemented in the TandemSWAN software. Using TandemSWAN, we compared the structure and the occurrence of FTRs with short period length (up to 24 bp) in coding and non-coding regions, including UTRs, heterochromatic, intergenic and enhancer sequences of Drosophila melanogaster and Drosophila pseudoobscura. Tandems with period three and its multiples were found in coding segments, whereas FTRs with periods that are multiples of six are overrepresented in all non-coding segments. Periods equal to 5-7 and 11-14 were characteristic of the enhancer regions and other non-coding regions close to genes. AVAILABILITY: TandemSWAN web page, stand-alone version and documentation can be found at http://bioinform.genetika.ru/projects/swan/www/ SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Chromosome Mapping/methods , Drosophila/genetics , Fuzzy Logic , Gene Expression Regulation/genetics , DNA Sequence Analysis/methods , Tandem Repeat Sequences/genetics , Animals , Sequence Alignment/methods
16.
Comput Chem ; 26(5): 521-30, 2002 Jul.
Article in English | MEDLINE | ID: mdl-12144180

ABSTRACT

This paper presents an algorithm, DCFold, that automatically predicts the common secondary structure of a set of aligned homologous RNA sequences. It is based on the comparative approach. Helices are searched in one of the sequences, called the 'target sequence', and compared to the helices in the other sequences, called the 'test sequences'. Our algorithm searches the target sequence for palindromes that have a high probability of defining helices that are conserved in the test sequences. This selection of significant palindromes is based on criteria that take into account their length and their mutation rate. A recursive search of helices, starting from these likely ones, is implemented using the 'divide and conquer' approach. Indeed, as pseudo-knots are not searched by DCFold, a selected palindrome (p, p') makes it possible to divide the initial sequence into two sequences, the internal one and the one resulting from the concatenation of the two external ones. New palindromes can be searched independently in these subsequences. This algorithm was run on ribosomal RNA sequences and recovered their common secondary structures very efficiently.


Subjects
Computational Biology/methods , Nucleic Acid Conformation , RNA/chemistry , RNA/genetics , Algorithms , Base Sequence , Conserved Sequence , Molecular Sequence Data , Mutation , RNA/classification , Ribosomal RNA/chemistry , Ribosomal RNA/genetics , Software
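The divide-and-conquer step described for DCFold can be sketched on a single sequence. This is a simplification, not DCFold itself: the helix chosen at each step is just the longest stem of Watson-Crick or G-U pairs, whereas DCFold selects palindromes using their length and their mutation rate across the aligned test sequences. The sketch tracks original positions so that, once a helix (p, p') is chosen, the interior and the concatenation of the two exterior parts are folded independently, which is why no pseudoknots can appear. `PAIRS`, `min_loop` and `min_stem` are illustrative assumptions.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def longest_stem(seq, idx, min_loop=3):
    """Longest run of stacked pairs (idx[a+t], idx[b-t]) within the subsequence idx.
    Returns (length, a, b) or None. Brute force, cubic time."""
    best = None
    for a in range(len(idx)):
        for b in range(len(idx) - 1, a, -1):
            L = 0
            # each new pair must keep at least min_loop subsequence positions inside it
            while a + L < b - L - min_loop and (seq[idx[a + L]], seq[idx[b - L]]) in PAIRS:
                L += 1
            if L and (best is None or L > best[0]):
                best = (L, a, b)
    return best


def dc_fold(seq, idx=None, min_stem=3, min_loop=3):
    """Choose a stem, then fold the interior and the concatenated exterior
    independently. Returns a list of (i, j) base pairs with i < j."""
    if idx is None:
        idx = list(range(len(seq)))
    best = longest_stem(seq, idx, min_loop)
    if best is None or best[0] < min_stem:
        return []
    L, a, b = best
    pairs = [(idx[a + t], idx[b - t]) for t in range(L)]
    inner = idx[a + L:b - L + 1]       # positions enclosed by the helix
    outer = idx[:a] + idx[b + 1:]      # the two external parts, concatenated
    return (pairs + dc_fold(seq, inner, min_stem, min_loop)
            + dc_fold(seq, outer, min_stem, min_loop))


if __name__ == "__main__":
    seq = "GCGCAAAGCGCUUAGGCCAAAGGCCUA"
    db = ["."] * len(seq)
    for i, j in dc_fold(seq):
        db[i], db[j] = "(", ")"
    print(seq)
    print("".join(db))
```

Because every later pair lies either entirely inside or entirely outside an already chosen helix, the result is always a nested structure and can be printed in dot-bracket form.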
17.
Genome Res ; 12(3): 470-81, 2002 Mar.
Article in English | MEDLINE | ID: mdl-11875036

ABSTRACT

The early developmental enhancers of Drosophila melanogaster comprise one of the most sophisticated regulatory systems in higher eukaryotes. An elaborate code in their DNA sequence translates both maternal and early embryonic regulatory signals into spatial distribution of transcription factors. One of the most striking features of this code is the redundancy of binding sites for these transcription factors (BSTF). Using this redundancy, we explored the possibility of predicting functional binding sites in a single enhancer region without any prior consensus/matrix description or evolutionary sequence comparisons. We developed a conceptually simple algorithm, Scanseq, that employs an original statistical evaluation for identifying the most redundant motifs and locates the position of potential BSTF in a given regulatory region. To estimate the biological relevance of our predictions, we built thorough literature-based annotations for the best-known Drosophila developmental enhancers and we generated detailed distribution maps for the most robust binding sites. The high statistical correlation between the location of BSTF in these experiment-based maps and the location predicted in silico by Scanseq confirmed the relevance of our approach. We also discuss the definition of true binding sites and the possible biological principles that govern patterning of regulatory regions and the distribution of transcriptional signals.


Subjects
Bacterial Proteins , Drosophila Proteins , Drosophila melanogaster/genetics , Genetic Enhancer Elements/genetics , Developmental Gene Expression Regulation/genetics , Nucleic Acid Regulatory Sequences/genetics , Transcription Factors , Algorithms , Animals , Base Sequence , Binding Sites/genetics , Chromosome Mapping/methods , Chromosome Mapping/statistics & numerical data , DNA-Binding Proteins/genetics , Insect Genes/genetics , Homeodomain Proteins/genetics , Molecular Sequence Data , Multigene Family/genetics , Nuclear Proteins , Protozoan Proteins/genetics , Tandem Repeat Sequences/genetics