Results 1 - 13 of 13
1.
Genome Res ; 27(8): 1371-1383, 2017 08.
Article in English | MEDLINE | ID: mdl-28487280

ABSTRACT

Structured elements of RNA molecules are essential in, e.g., RNA stabilization, localization, and protein interaction, and their conservation across species suggests a common functional role. We computationally screened vertebrate genomes for conserved RNA structures (CRSs), leveraging structure-based, rather than sequence-based, alignments. After careful correction for sequence identity and GC content, we predict ∼516,000 human genomic regions containing CRSs. We find that a substantial fraction of human-mouse CRS regions (1) colocalize consistently with binding sites of the same RNA binding proteins (RBPs) or (2) are transcribed in corresponding tissues. Additionally, a CaptureSeq experiment revealed expression of many of our CRS regions in human fetal brain, including 662 novel ones. For selected human and mouse candidate pairs, qRT-PCR and in vitro RNA structure probing supported both shared expression and shared structure despite low abundance and low sequence identity. About 30,000 CRS regions are located near coding or long noncoding RNA genes or within enhancers. Structured (CRS overlapping) enhancer RNAs and extended 3' ends have significantly increased expression levels over their nonstructured counterparts. Our findings of transcribed uncharacterized regulatory regions that contain CRSs support their RNA-mediated functionality.


Subjects
Gene Expression Regulation; Nucleic Acid Conformation; RNA/chemistry; RNA/genetics; Regulatory Elements, Transcriptional; Vertebrates/genetics; Animals; Base Sequence; Conserved Sequence; Genome, Human; Humans; Mice; RNA/metabolism; RNA-Binding Proteins/metabolism; Sequence Homology; Transcription, Genetic
2.
Methods Mol Biol ; 703: 53-65, 2011.
Article in English | MEDLINE | ID: mdl-21125483

ABSTRACT

Genome browsers are important tools for studying genomes, given the vast amounts of data available. This chapter aims to give the reader the skills needed to perform relatively simple, yet powerful, analyses of the structure of the transcription unit. Studying available data should be one of the very first steps in designing experiments. This can save considerable time in your research, or, as Alan Bleasby put it, "Two months in the lab can easily save an afternoon on the computer."


Subjects
Databases, Genetic; Genome/genetics; Genomics/methods; Internet; Search Engine; Software
3.
Trends Biotechnol ; 28(1): 9-19, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19942311

ABSTRACT

Growing recognition of the numerous, diverse and important roles played by non-coding RNA in all organisms motivates better elucidation of these cellular components. Comparative genomics is a powerful tool for this task and is arguably preferable to any high-throughput experimental technology currently available, because evolutionary conservation highlights functionally important regions. Conserved secondary structure, rather than primary sequence, is the hallmark of many functionally important RNAs, because compensatory substitutions in base-paired regions preserve structure. Unfortunately, such substitutions also obscure sequence identity and confound alignment algorithms, greatly complicating analysis. This paper surveys recent computational advances in this difficult arena, which have enabled genome-scale prediction of cross-species conserved RNA elements. These predictions suggest that a wealth of these elements does indeed exist.
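To make the point about compensatory substitutions concrete, here is a minimal Python sketch (not taken from the paper) that takes two aligned RNA sequences and a shared secondary structure in dot-bracket notation, and lists the base pairs where both positions changed yet both sequences can still form a canonical or G-U pair. The function names and example sequences are hypothetical illustrations.

```python
from typing import List, Tuple

# Canonical Watson-Crick pairs plus G-U wobble pairs.
CANONICAL = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def pair_list(structure: str) -> List[Tuple[int, int]]:
    """Return sorted (i, j) base pairs from a dot-bracket string."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for idx, ch in enumerate(structure):
        if ch == "(":
            stack.append(idx)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket string")
            pairs.append((stack.pop(), idx))
    if stack:
        raise ValueError("unbalanced dot-bracket string")
    return sorted(pairs)


def compensatory_pairs(seq_a: str, seq_b: str, structure: str) -> List[Tuple[int, int]]:
    """Base pairs that both sequences can form although both positions differ."""
    if not len(seq_a) == len(seq_b) == len(structure):
        raise ValueError("sequences and structure must be aligned")
    hits = []
    for i, j in pair_list(structure):
        both_pair = (seq_a[i], seq_a[j]) in CANONICAL and (seq_b[i], seq_b[j]) in CANONICAL
        both_changed = seq_a[i] != seq_b[i] and seq_a[j] != seq_b[j]
        if both_pair and both_changed:
            hits.append((i, j))
    return hits


if __name__ == "__main__":
    print(compensatory_pairs("GGGAAAAAUCCC", "AGCAAAAAUGCU", "((((....))))"))
    # prints [(0, 11), (2, 9)]
```

In the example the two sequences are only 8 of 12 positions identical, yet every base pair of the hairpin is preserved, which is the kind of signal that sequence-only alignment scores discount.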


Subjects
Computational Biology/methods; Genome/genetics; Nucleic Acid Conformation; RNA, Untranslated/genetics; Sequence Analysis, RNA/methods; Base Sequence/genetics; Comparative Genomic Hybridization; Conserved Sequence; Evolution, Molecular
4.
Bioinformatics ; 25(5): 668-9, 2009 Mar 01.
Article in English | MEDLINE | ID: mdl-19136551

ABSTRACT

SUMMARY: Assessing the statistical significance of structured RNAs predicted from multiple sequence alignments relies on the existence of a good null model. We present here a random shuffling algorithm, Multiperm, that preserves not only the gap and local conservation structure in alignments of arbitrarily many sequences, but also the approximate dinucleotide frequencies. No shuffling algorithm that simultaneously preserves these three characteristics of a multiple (beyond pairwise) alignment has previously been available. As one benchmark, we show that it produces shuffled exonic sequences with folding free energies closer to those of native sequences than shuffled alignments that do not preserve dinucleotide frequencies. AVAILABILITY: The Multiperm GNU C++ source code is available at http://www.anandam.name/multiperm
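Multiperm's own algorithm is more involved than can be shown here. As a hedged illustration of the general idea behind alignment-shuffling null models, the sketch below permutes alignment columns only among columns that share the same gap pattern and the same coarse conservation class (all non-gap residues identical or not), so every position keeps its gap pattern and conservation class. It does not attempt to preserve dinucleotide frequencies, the third property Multiperm maintains; the two-level conservation class and all names are assumptions made for this example.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

ClassKey = Tuple[Tuple[bool, ...], bool]


def column_class(column: str) -> ClassKey:
    """Class key for one alignment column: its gap pattern plus a conservation flag."""
    gap_pattern = tuple(ch == "-" for ch in column)
    residues = {ch for ch in column if ch != "-"}
    return gap_pattern, len(residues) <= 1  # True if all non-gap residues agree


def shuffle_alignment(rows: List[str], seed: Optional[int] = None) -> List[str]:
    """Permute columns only among columns that share the same class key."""
    if not rows or any(len(r) != len(rows[0]) for r in rows):
        raise ValueError("need a non-empty alignment with equal-length rows")
    rng = random.Random(seed)
    n_cols = len(rows[0])
    columns = ["".join(r[c] for r in rows) for c in range(n_cols)]

    positions_by_class: Dict[ClassKey, List[int]] = defaultdict(list)
    for idx, col in enumerate(columns):
        positions_by_class[column_class(col)].append(idx)

    shuffled = list(columns)
    for positions in positions_by_class.values():
        sources = list(positions)
        rng.shuffle(sources)  # random permutation within this class only
        for target, source in zip(positions, sources):
            shuffled[target] = columns[source]

    # Transpose the shuffled columns back into rows.
    return ["".join(col[r] for col in shuffled) for r in range(len(rows))]


if __name__ == "__main__":
    print(shuffle_alignment(["ACG-UA", "ACGAUG", "AUG-UA"], seed=1))
```

Shuffling within classes keeps gap placement fixed, so a folding program sees the same indel structure in native and shuffled alignments; only the residue content of comparable columns moves.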


Subjects
Algorithms; Computational Biology/methods; Sequence Alignment/methods; Sequence Analysis, RNA; Nucleotides/analysis; RNA/chemistry; Thermodynamics
5.
Bioinformatics ; 25(3): 291-4, 2009 Feb 01.
Article in English | MEDLINE | ID: mdl-19059941

ABSTRACT

UNLABELLED: MicroRNAs (miRNAs) are a group of small riboregulators, approximately 21 nt long, that inhibit gene expression at the post-transcriptional level. Their most distinctive structural feature is the foldback hairpin of their precursors, the pre-miRNAs. Although each pre-miRNA deposited in miRBase already has a predicted secondary structure, little is known about the patterns of structural conservation among pre-miRNAs. We address this issue by clustering the human pre-miRNA sequences based on pairwise sequence and secondary structure alignment using FOLDALIGN, followed by global multiple alignment of the resulting clusters with WAR. As a result, the common secondary structure was successfully determined for four FOLDALIGN clusters: the RF00027 structural family of the Rfam database and three clusters with previously undescribed consensus structures. AVAILABILITY: http://genome.ku.dk/resources/mirclust
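The abstract does not spell out the clustering rule. A minimal sketch, assuming single-linkage clustering in which two pre-miRNAs join the same cluster whenever their pairwise FOLDALIGN score reaches a chosen threshold, could use a union-find structure; the names, scores, and threshold below are made up for illustration.

```python
from typing import Dict, List, Tuple


def single_linkage_clusters(
    names: List[str], scores: Dict[Tuple[str, str], float], threshold: float
) -> List[List[str]]:
    """Group names into clusters linked by any pairwise score >= threshold."""
    parent = {name: name for name in names}

    def find(x: str) -> str:
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in scores.items():
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    groups: Dict[str, List[str]] = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    hypothetical_scores = {
        ("mir-a", "mir-b"): 120.0,
        ("mir-b", "mir-c"): 95.0,
        ("mir-c", "mir-d"): 10.0,
    }
    print(single_linkage_clusters(["mir-a", "mir-b", "mir-c", "mir-d"], hypothetical_scores, 90.0))
    # prints [['mir-a', 'mir-b', 'mir-c'], ['mir-d']]
```

Single linkage lets mir-a and mir-c share a cluster through mir-b even without a direct high score, which is one reason a subsequent multiple alignment step (such as the WAR step described above) is useful for checking whether each cluster really shares a structure.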


Subjects
MicroRNAs/chemistry; Base Sequence; Cluster Analysis; Computational Biology; Databases, Genetic; Genome, Human; Humans; MicroRNAs/genetics; Molecular Sequence Data; Nucleic Acid Conformation; Software
6.
Nucleic Acids Res ; 36(Web Server issue): W79-84, 2008 Jul 01.
Article in English | MEDLINE | ID: mdl-18492721

ABSTRACT

We present an easy-to-use webserver that makes it possible to simultaneously use a number of state-of-the-art methods for multiple alignment and secondary structure prediction of noncoding RNA sequences. Users can therefore run the programs without having to download and install them. The results of all the programs are presented on a webpage and can easily be downloaded for further analysis. Additional measures are calculated for each program to make it easier to judge the individual predictions, and a consensus prediction that takes all the programs into account is also calculated. The website is free and open to all users, with no login requirement. The webserver can be found at: http://genome.ku.dk/resources/war.
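The abstract does not say how WAR builds its consensus. One simple possibility, shown here purely as an assumption, is a strict-majority vote over the base pairs of the individual predictions, each given in dot-bracket notation.

```python
from collections import Counter
from typing import List, Tuple


def base_pairs(structure: str) -> List[Tuple[int, int]]:
    """(i, j) pairs from a balanced dot-bracket string (balance is assumed)."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for idx, ch in enumerate(structure):
        if ch == "(":
            stack.append(idx)
        elif ch == ")":
            pairs.append((stack.pop(), idx))
    return pairs


def majority_consensus(predictions: List[str]) -> str:
    """Keep base pairs predicted by a strict majority of the input structures."""
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must cover the same alignment columns")
    support: Counter = Counter()
    for structure in predictions:
        support.update(base_pairs(structure))
    out = ["."] * length
    for (i, j), count in support.items():
        if 2 * count > len(predictions):  # strict majority
            out[i], out[j] = "(", ")"
    return "".join(out)


if __name__ == "__main__":
    print(majority_consensus(["((..))", "((..))", "(....)"]))
    # prints ((..))
```

A strict majority has a convenient property: any two pairs that each appear in more than half of the inputs must co-occur in at least one input structure, so they can neither share a position nor cross, and the result is always a valid nested dot-bracket string.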


Subjects
RNA, Untranslated/chemistry; Sequence Alignment; Sequence Analysis, RNA; Software; Internet; Nucleic Acid Conformation
7.
Genome Res ; 18(2): 242-51, 2008 Feb.
Article in English | MEDLINE | ID: mdl-18096747

ABSTRACT

Recent computational scans for non-coding RNAs (ncRNAs) in multiple organisms have relied on existing multiple sequence alignments. However, as sequence similarity drops, a key signal of RNA structure--frequent compensating base changes--is increasingly likely to cause sequence-based alignment methods to misalign, or even refuse to align, homologous ncRNAs, consequently obscuring that structural signal. We have used CMfinder, a structure-oriented local alignment tool, to search the ENCODE regions of vertebrate multiple alignments. In agreement with other studies, we find a large number of potential RNA structures in the ENCODE regions. We report 6587 candidate regions with an estimated false-positive rate of 50%. More intriguingly, many of these candidates may be better represented by alignments taking the RNA secondary structure into account than those based on primary sequence alone, often quite dramatically. For example, approximately one-quarter of our predicted motifs show revisions in >50% of their aligned positions. Furthermore, our results are strongly complementary to those discovered by sequence-alignment-based approaches--84% of our candidates are not covered by Washietl et al., increasing the number of ncRNA candidates in the ENCODE region by 32%. In a group of 11 ncRNA candidates that were tested by RT-PCR, 10 were confirmed to be present as RNA transcripts in human tissue, and most show evidence of significant differential expression across tissues. Our results broadly suggest caution in any analysis relying on multiple sequence alignments in less well-conserved regions, clearly support growing appreciation for the biological significance of ncRNAs, and strongly support the argument for considering RNA structure directly in any searches for these elements.


Subjects
Computational Biology/methods; Genomics/methods; Nucleic Acid Conformation; RNA, Untranslated/genetics; Sequence Analysis/methods; Base Sequence; Computational Biology/trends; Databases, Genetic; Genomics/trends; Humans
8.
PLoS Comput Biol ; 3(10): 1896-908, 2007 Oct.
Article in English | MEDLINE | ID: mdl-17937495

ABSTRACT

It has become clear that noncoding RNAs (ncRNAs) play important roles in cells, and emerging studies indicate that mammalian genomes may contain a large number of unknown ncRNAs. Computational methods exist that search for ncRNAs by comparing sequences from different genomes. A main problem with these methods is their computational complexity, so heuristics are employed. Two heuristics are currently very popular: pre-folding and pre-aligning. However, neither is ideal: pre-aligning depends on sequence similarity that may not be present, and pre-folding ignores the comparative information. Here, pruning of the dynamic programming matrix is presented as a novel alternative heuristic constraint. All subalignments that do not exceed a length-dependent minimum score are discarded as the matrix is filled out, so the constraints are applied dynamically. This has been included in a new implementation of the FOLDALIGN algorithm for pairwise local or global structural alignment of RNA sequences. It is shown that time and memory requirements are dramatically lowered while overall performance is maintained. Furthermore, a new divide-and-conquer method is introduced to limit the memory requirement during global alignment and during the backtrack of local alignment. All branch points in the computed RNA structure are found and used to divide the structure into smaller unbranched segments. Each segment is then realigned and backtracked in the usual way. Finally, the FOLDALIGN algorithm has also been updated with a better memory implementation and an improved energy model. With these improvements, the FOLDALIGN software package provides molecular biologists with an efficient and user-friendly tool for searching for new ncRNAs. The software package is available for download at http://foldalign.ku.dk.
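The pruning idea can be sketched on a much simpler problem. The Python code below is a hypothetical, sequence-only local alignment (no folding and no energy model, unlike FOLDALIGN) in which each dynamic programming cell records its score and the length of the subalignment ending there. A cell is discarded as soon as its score fails to exceed a caller-supplied, length-dependent threshold `min_score`, so later cells cannot extend it. The scoring values and the threshold function are illustrative assumptions, not FOLDALIGN's parameters.

```python
from typing import Callable, Dict, Tuple


def pruned_local_alignment(
    a: str,
    b: str,
    min_score: Callable[[int], float],
    match: int = 2,
    mismatch: int = -1,
    gap: int = -2,
) -> Tuple[int, int]:
    """Local alignment whose DP cells are dropped unless score > min_score(length).

    Returns (best score found, number of DP cells kept).
    """
    best, kept = 0, 0
    prev: Dict[int, Tuple[int, int]] = {}  # surviving cells of row i-1: j -> (score, length)
    for i in range(1, len(a) + 1):
        cur: Dict[int, Tuple[int, int]] = {}
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            options = [(s, 1)]  # start a new subalignment at (i, j)
            if j - 1 in prev:  # extend diagonally
                score, length = prev[j - 1]
                options.append((score + s, length + 1))
            if j in prev:  # gap in b
                score, length = prev[j]
                options.append((score + gap, length + 1))
            if j - 1 in cur:  # gap in a
                score, length = cur[j - 1]
                options.append((score + gap, length + 1))
            score, length = max(options)
            if score > min_score(length):  # prune weak subalignments immediately
                cur[j] = (score, length)
                best = max(best, score)
        kept += len(cur)
        prev = cur  # only the previous row is retained
    return best, kept


if __name__ == "__main__":
    print(pruned_local_alignment("GGACUUC", "GGACUUC", lambda n: float("-inf")))
    print(pruned_local_alignment("GGACUUC", "GGACUUC", lambda n: n - 1))
```

With the threshold disabled, all 49 cells of the 7-by-7 matrix are kept; with `lambda n: n - 1`, weak off-diagonal cells are dropped while the optimal score of 14 is unchanged. Storing surviving cells in a dictionary is what turns pruning into actual memory savings, which mirrors the time and memory reductions the abstract reports for the full structural case.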


Subjects
Computational Biology/methods; RNA, Untranslated/chemistry; RNA/chemistry; RNA/genetics; Algorithms; Animals; Humans; Mice; Models, Statistical; Models, Theoretical; Nucleic Acid Conformation; Software; Time Factors
9.
RNA ; 13(11): 1850-9, 2007 Nov.
Article in English | MEDLINE | ID: mdl-17804647

ABSTRACT

We have developed a semiautomated RNA sequence editor (SARSE) that integrates tools for analyzing RNA alignments. The editor highlights different properties of the alignment by color, and its integrated analysis tools prevent errors from being introduced during alignment editing. SARSE readily connects to external tools to provide a flexible semiautomatic editing environment. A new method, Pcluster, is introduced for dividing the sequences of an RNA alignment into subgroups with secondary structure differences. Pcluster was used to evaluate 574 seed alignments obtained from the Rfam database, and we identified 71 alignments with significant prediction of inconsistent base pairs and 102 alignments with significant prediction of novel base pairs. Four RNA families were used to illustrate how SARSE can be used to manually or automatically correct the inconsistent base pairs detected by Pcluster: the mir-399 RNA, vertebrate telomerase RNA (vert-TR), bacterial transfer-messenger RNA (tmRNA), and the signal recognition particle (SRP) RNA. The general applicability of the method is illustrated by its ability to accommodate pseudoknots and to handle even large and divergent RNA families. The open architecture of the SARSE editor makes it a flexible tool for improving all RNA alignments with relatively little human intervention. Online documentation and software are available at http://sarse.ku.dk.


Subjects
Sequence Alignment/methods; Sequence Analysis, RNA; Software; Computational Biology; Databases, Genetic; Nucleic Acid Conformation; RNA/chemistry; Sequence Homology, Nucleic Acid; User-Computer Interface
10.
Bioinformatics ; 23(8): 926-32, 2007 Apr 15.
Article in English | MEDLINE | ID: mdl-17324941

ABSTRACT

MOTIVATION: An apparent paradox in computational RNA structure prediction is that many methods require a multiple alignment of a set of related sequences in advance when searching for a common structure between them. However, such a multiple alignment is hard to obtain, even for a few sequences with low sequence similarity, without simultaneously folding and aligning them. Furthermore, it is of interest to construct a multiple alignment of RNA sequence candidates found by searching as few as two genomic sequences. RESULTS: Here, based on the PMcomp program, we present a global multiple alignment program, foldalignM, which performs especially well on small sets of sequences with low sequence similarity and is comparable in performance with state-of-the-art programs in general. In addition, it can cluster sequences based on sequence and structure similarity and output a multiple alignment for each cluster. Furthermore, preliminary results with local datasets indicate that the program is useful for post-processing foldalign pairwise scans. AVAILABILITY: The program foldalignM is implemented in Java and is, along with some accompanying Perl scripts, available at http://foldalign.ku.dk/


Subjects
Cluster Analysis; RNA/chemistry; RNA/genetics; Sequence Alignment/methods; Sequence Analysis, RNA/methods; Software; Algorithms; Base Sequence; Molecular Sequence Data; Sequence Homology, Nucleic Acid
11.
Genome Res ; 16(7): 885-9, 2006 Jul.
Article in English | MEDLINE | ID: mdl-16751343

ABSTRACT

Human and mouse genome sequences contain roughly 100,000 regions that are unalignable in primary sequence and that flank corresponding alignable regions in both organisms. These pairs are generally assumed to be nonconserved, although their level of structural conservation had never been investigated. Owing to limitations of computational methods, comparative genomics has lacked the ability to compare such nonconserved sequence regions for conserved structural RNA elements. We have investigated the presence of structural RNA elements by conducting a local structural alignment with FOLDALIGN on a subset of these 100,000 corresponding regions, and we estimate that 1800 contain common RNA structures. Comparing our results with the recent mapping of transcribed fragments (transfrags) in human, we find that high-scoring candidates are twice as likely to be found in regions overlapped by transfrags as in regions not overlapped by transfrags. To verify coexpression of predicted candidates in human and mouse, we conducted expression studies by RT-PCR and Northern blotting on mouse candidates that overlap with transfrags on human chromosome 20. RT-PCR confirmed expression of 32 out of 36 candidates, whereas Northern blots confirmed four out of 12 candidates. Furthermore, many RT-PCR results indicate differential expression across tissues. Hence, our findings suggest that there are corresponding regions between human and mouse that contain expressed non-coding RNA sequences not alignable in primary sequence.


Subjects
Genome, Human; Genome; Mice/genetics; RNA/chemistry; Animals; Base Pairing; Base Sequence; Chickens/genetics; Chromosome Mapping; Chromosomes, Human, Pair 20; Conserved Sequence; Dogs; Humans; Nucleic Acid Conformation; Rats; Sequence Analysis, RNA/statistics & numerical data; Sequence Homology, Nucleic Acid; Software; Transcription, Genetic
12.
J Bacteriol ; 187(14): 4992-9, 2005 Jul.
Article in English | MEDLINE | ID: mdl-15995215

ABSTRACT

Sulfolobus acidocaldarius is an aerobic thermoacidophilic crenarchaeon that grows optimally at 80 °C and pH 2 in terrestrial solfataric springs. Here, we describe the genome sequence of strain DSM639, which has been used for many seminal studies on archaeal and crenarchaeal biology. The circular genome carries 2,225,959 bp (37% G+C) with 2,292 predicted protein-encoding genes. Many of the smaller genes were identified for the first time by comparing three Sulfolobus genome sequences. Of the protein-coding genes, 305 are exclusive to S. acidocaldarius and 866 are specific to the Sulfolobus genus. Moreover, 82 genes for untranslated RNAs were identified and annotated. Owing to the probable absence of active autonomous and nonautonomous mobile elements, the genome stability and organization of S. acidocaldarius differ radically from those of Sulfolobus solfataricus and Sulfolobus tokodaii. The S. acidocaldarius genome contains an integrated, and probably encaptured, pARN-type conjugative plasmid that may facilitate intercellular chromosomal gene exchange in S. acidocaldarius. It also contains genes for a characteristic restriction-modification system, a UV damage excision repair system, thermopsin, and an aromatic ring dioxygenase, all of which are absent from the genomes of other Sulfolobus species. However, it lacks genes for some of their sugar transporters, consistent with its growth on a more limited range of carbon sources. These results, together with the many newly identified protein-coding genes for Sulfolobus, are incorporated into a public Sulfolobus database, which can be accessed at http://dac.molbio.ku.dk/dbs/Sulfolobus.


Subjects
Genome, Archaeal; Sulfolobus acidocaldarius/genetics; Base Sequence; Chromosome Mapping; DNA Replication/genetics; DNA, Archaeal/genetics; DNA, Circular/genetics; Genome, Bacterial; Models, Genetic; Molecular Sequence Data; Restriction Mapping/methods; Sulfolobus acidocaldarius/classification
13.
Environ Microbiol ; 7(1): 47-54, 2005 Jan.
Article in English | MEDLINE | ID: mdl-15643935

ABSTRACT

Many Archaea, in contrast to bacteria, produce a high proportion of leaderless transcripts, show wide variation in their consensus Shine-Dalgarno (S-D) sequences and frequently use GUG and UUG start codons. To understand the basis for these differences, 18 complete archaeal genomes were examined for sequence signals that are positionally conserved upstream of genes. These functional motifs include box A promoter sequences for leaderless transcripts and S-D sequences for transcripts with leaders. Most of the box A sequences were preceded by a BRE-like motif and followed by a previously undetected A/T peak centred on position -10. Moreover, the sequence of the predominant S-D motifs in an archaeon is shown to depend on the precise number of nucleotides between the conserved anti-S-D sequence CCUCC and the 3'-terminal nucleotide of 16S rRNA. Correlations with phylogenetic trees constructed for the 18 Archaea reveal that high usage of both S-D motifs and GUG and UUG start codons occurs exclusively in the shorter-branched Archaea, whereas high levels of leaderless transcripts are found in the longer-branched Archaea.
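The spacing that the abstract relates to S-D motif choice, between the anti-S-D core CCUCC and the 3' end of 16S rRNA, is straightforward to compute. In this sketch the counting convention (nucleotides 3' of the last CCUCC occurrence, up to and including the terminal nucleotide) and the example tail sequence are assumptions; the reverse complement of the anti-S-D core gives the corresponding S-D core.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def anti_sd_offset(rrna_tail: str, anti_sd: str = "CCUCC") -> int:
    """Nucleotides 3' of the last anti-S-D core, including the 3'-terminal one.

    The counting convention is an assumption made for this sketch.
    """
    pos = rrna_tail.rfind(anti_sd)  # occurrence closest to the 3' end
    if pos < 0:
        raise ValueError("anti-S-D core not found in the 16S rRNA tail")
    return len(rrna_tail) - (pos + len(anti_sd))


if __name__ == "__main__":
    tail = "GGAUCACCUCCUUA"  # hypothetical illustrative 3' tail
    print(reverse_complement("CCUCC"), anti_sd_offset(tail))
    # prints GGAGG 3
```

Different archaea would then be compared by their offset values, which is the quantity the study correlates with which S-D variant predominates.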


Subjects
Archaea/genetics; Codon, Initiator; Gene Expression Regulation, Archaeal; Protein Biosynthesis; Transcription, Genetic; Archaea/classification; Archaeal Proteins/chemistry; Archaeal Proteins/genetics; Base Sequence; DNA, Archaeal/analysis; DNA, Ribosomal/analysis; Genome, Archaeal; Genomics; Molecular Sequence Data; Phylogeny; Promoter Regions, Genetic; RNA, Ribosomal, 16S/genetics