Results 1 - 20 of 32
1.
Nature ; 505(7485): 701-5, 2014 Jan 30.
Article in English | MEDLINE | ID: mdl-24336214

ABSTRACT

RNA has a dual role as an informational molecule and a direct effector of biological tasks. The latter function is enabled by RNA's ability to adopt complex secondary and tertiary folds and thus has motivated extensive computational and experimental efforts for determining RNA structures. Existing approaches for evaluating RNA structure have been largely limited to in vitro systems, yet the thermodynamic forces which drive RNA folding in vitro may not be sufficient to predict stable RNA structures in vivo. Indeed, the presence of RNA-binding proteins and ATP-dependent helicases can influence which structures are present inside cells. Here we present an approach for globally monitoring RNA structure in native conditions in vivo with single-nucleotide precision. This method is based on in vivo modification with dimethyl sulphate (DMS), which reacts with unpaired adenine and cytosine residues, followed by deep sequencing to monitor modifications. Our data from yeast and mammalian cells are in excellent agreement with known messenger RNA structures and with the high-resolution crystal structure of the Saccharomyces cerevisiae ribosome. Comparison between in vivo and in vitro data reveals that in rapidly dividing cells there are vastly fewer structured mRNA regions in vivo than in vitro. Even thermostable RNA structures are often denatured in cells, highlighting the importance of cellular processes in regulating RNA structure. Indeed, analysis of mRNA structure under ATP-depleted conditions in yeast shows that energy-dependent processes strongly contribute to the predominantly unfolded state of mRNAs inside cells. Our studies broadly enable the functional analysis of physiological RNA structures and reveal that, in contrast to the Anfinsen view of protein folding whereby the structure formed is the most thermodynamically favourable, thermodynamics have an incomplete role in determining mRNA structure in vivo.


Subjects
Genome, Fungal/genetics , Nucleic Acid Conformation , RNA Folding , RNA Stability , RNA, Messenger/chemistry , RNA, Messenger/genetics , Saccharomyces cerevisiae/genetics , Fibroblasts , High-Throughput Nucleotide Sequencing , Humans , K562 Cells , Nucleic Acid Denaturation , RNA Folding/genetics , RNA Stability/genetics , RNA, Fungal/chemistry , RNA, Fungal/genetics , RNA, Fungal/metabolism , RNA, Messenger/metabolism , Sulfuric Acid Esters/chemistry , Thermodynamics
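
As a rough illustration of the readout described in the abstract above, the sketch below turns per-position modification counts into a signal that is defined only at adenine and cytosine residues, since DMS reacts with unpaired A and C. The function name `dms_signal`, the input layout, and the scale-to-maximum normalization are assumptions made for illustration; the abstract does not describe the authors' normalization.

```python
def dms_signal(sequence, counts):
    """Scale raw DMS modification counts to [0, 1] at A and C positions.

    Hypothetical helper: positions holding G or U are reported as None
    because DMS is not informative there. Normalizing by the largest A/C
    count is an assumption, not the published procedure.
    """
    if len(sequence) != len(counts):
        raise ValueError("sequence and counts must have the same length")
    ac_counts = [c for base, c in zip(sequence, counts) if base in "AC"]
    top = max(ac_counts, default=0)
    signal = []
    for base, count in zip(sequence, counts):
        if base not in "AC":
            signal.append(None)       # G and U carry no DMS information here
        elif top == 0:
            signal.append(0.0)        # no modifications observed at all
        else:
            signal.append(count / top)
    return signal


if __name__ == "__main__":
    print(dms_signal("ACGU", [10, 5, 7, 0]))
```

High values would suggest unpaired bases and low values paired or protected ones; any threshold for calling a base "structured" would have to be chosen separately.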
2.
Genome Res ; 24(4): 616-28, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24429298

ABSTRACT

Long intergenic noncoding RNAs (lincRNAs) play diverse regulatory roles in human development and disease, but little is known about their evolutionary history and constraint. Here, we characterize human lincRNA expression patterns in nine tissues across six mammalian species and multiple individuals. Of the 1898 human lincRNAs expressed in these tissues, we find orthologous transcripts for 80% in chimpanzee, 63% in rhesus, 39% in cow, 38% in mouse, and 35% in rat. Mammalian-expressed lincRNAs show remarkably strong conservation of tissue specificity, suggesting that it is selectively maintained. In contrast, abundant splice-site turnover suggests that exact splice sites are not critical. Relative to evolutionarily young lincRNAs, mammalian-expressed lincRNAs show higher primary sequence conservation in their promoters and exons, increased proximity to protein-coding genes enriched for tissue-specific functions, fewer repeat elements, and more frequent single-exon transcripts. Remarkably, we find that ∼20% of human lincRNAs are not expressed beyond chimpanzee and are undetectable even in rhesus. These hominid-specific lincRNAs are more tissue specific, enriched for testis, and faster evolving within the human lineage.


Subjects
Conserved Sequence/genetics , Evolution, Molecular , Promoter Regions, Genetic , RNA, Long Noncoding/genetics , Animals , Cattle , Exons , Humans , Mice , Organ Specificity , Rats
3.
Nature ; 478(7370): 476-82, 2011 Oct 12.
Article in English | MEDLINE | ID: mdl-21993624

ABSTRACT

The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ∼4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ∼60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease.


Subjects
Evolution, Molecular , Genome, Human/genetics , Genome/genetics , Mammals/genetics , Animals , Disease , Exons/genetics , Genomics , Health , Humans , Molecular Sequence Annotation , Phylogeny , RNA/classification , RNA/genetics , Selection, Genetic/genetics , Sequence Alignment , Sequence Analysis, DNA
4.
Genome Res ; 21(11): 1916-28, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21994248

ABSTRACT

The degeneracy of the genetic code allows protein-coding DNA and RNA sequences to simultaneously encode additional, overlapping functional elements. A sequence in which both protein-coding and additional overlapping functions have evolved under purifying selection should show increased evolutionary conservation compared to typical protein-coding genes--especially at synonymous sites. In this study, we use genome alignments of 29 placental mammals to systematically locate short regions within human ORFs that show conspicuously low estimated rates of synonymous substitution across these species. The 29-species alignment provides statistical power to locate more than 10,000 such regions with resolution down to nine-codon windows, which are found within more than a quarter of all human protein-coding genes and contain ∼2% of their synonymous sites. We collect numerous lines of evidence that the observed synonymous constraint in these regions reflects selection on overlapping functional elements including splicing regulatory elements, dual-coding genes, RNA secondary structures, microRNA target sites, and developmental enhancers. Our results show that overlapping functional elements are common in mammalian genes, despite the vast genomic landscape.


Subjects
Genome , Mammals/genetics , Open Reading Frames/genetics , Selection, Genetic , Animals , Base Composition , Base Sequence , Codon , Codon, Initiator , Computational Biology , Conserved Sequence , Enhancer Elements, Genetic , Exons , Gene Order , Genes, BRCA1 , Homeodomain Proteins/genetics , Humans , MicroRNAs/metabolism , Molecular Sequence Data , Mutation Rate , Nucleic Acid Conformation , Nucleosomes/metabolism , Peptide Chain Initiation, Translational , RNA Splicing , Sequence Alignment , Transcription, Genetic
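
The idea of scanning an ORF with nine-codon windows for unusually low synonymous rates, as described in the abstract above, can be sketched as a simple sliding-window scan. This is only a toy stand-in: the per-codon rate estimates, the function name `low_rate_windows`, and the fixed `threshold` are hypothetical, and the abstract does not describe the statistical test the authors actually applied.

```python
def low_rate_windows(codon_rates, window=9, threshold=0.5):
    """Return (start, mean_rate) for windows with a low mean synonymous rate.

    codon_rates: hypothetical per-codon synonymous rate estimates, scaled
    so that 1.0 is the gene-wide average. A plain mean-below-threshold rule
    is an illustrative assumption, not the published method.
    """
    hits = []
    for start in range(len(codon_rates) - window + 1):
        mean_rate = sum(codon_rates[start:start + window]) / window
        if mean_rate < threshold:
            hits.append((start, mean_rate))
    return hits


if __name__ == "__main__":
    # Five typical codons, nine strongly constrained codons, five typical codons.
    rates = [1.0] * 5 + [0.1] * 9 + [1.0] * 5
    for start, mean_rate in low_rate_windows(rates):
        print(start, round(mean_rate, 3))
```

Overlapping hits would normally be merged into a single region, and a real analysis would need a significance test rather than a fixed cutoff.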
5.
Genome Res ; 21(11): 1929-43, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21994249

ABSTRACT

Regulatory RNA structures are often members of families with multiple paralogous instances across the genome. Family members share functional and structural properties, which allow them to be studied as a whole, facilitating both bioinformatic and experimental characterization. We have developed a comparative method, EvoFam, for genome-wide identification of families of regulatory RNA structures, based on primary sequence and secondary structure similarity. We apply EvoFam to a 41-way genomic vertebrate alignment. Genome-wide, we identify 220 human, high-confidence families outside protein-coding regions comprising 725 individual structures, including 48 families with known structural RNA elements. Known families identified include both noncoding RNAs, e.g., miRNAs and the recently identified MALAT1/MEN β lincRNA family; and cis-regulatory structures, e.g., iron-responsive elements. We also identify tens of new families supported by strong evolutionary evidence and other statistical evidence, such as GO term enrichments. For some of these, detailed analysis has led to the formulation of specific functional hypotheses. Examples include two hypothesized auto-regulatory feedback mechanisms: one involving six long hairpins in the 3'-UTR of MAT2A, a key metabolic gene that produces the primary human methyl donor S-adenosylmethionine; the other involving a tRNA-like structure in the intron of the tRNA maturation gene POP1. We experimentally validate the predicted MAT2A structures. Finally, we identify potential new regulatory networks, including large families of short hairpins enriched in immunity-related genes, e.g., TNF, FOS, and CTLA4, which include known transcript destabilizing elements. Our findings exemplify the diversity of post-transcriptional regulation and provide a resource for further characterization of new regulatory mechanisms and families of noncoding RNAs.


Subjects
Genome , Genomics , RNA, Untranslated/chemistry , Regulatory Sequences, Ribonucleic Acid , Vertebrates/genetics , 3' Untranslated Regions , Animals , Base Sequence , Conserved Sequence , Gene Expression Regulation , Humans , Immunity/genetics , Methionine Adenosyltransferase/genetics , Molecular Sequence Data , Nucleic Acid Conformation , Phylogeny , Protein Biosynthesis , RNA Editing , RNA Precursors/metabolism , RNA Processing, Post-Transcriptional , RNA Stability , RNA, Messenger/metabolism , RNA, Transfer/chemistry , RNA, Transfer/metabolism , RNA, Untranslated/genetics , Sequence Alignment
6.
Nucleic Acids Res ; 40(10): 4261-72, 2012 May.
Article in English | MEDLINE | ID: mdl-22287623

ABSTRACT

Thermodynamic folding algorithms and structure probing experiments are commonly used to determine the secondary structure of RNAs. Here we propose a formal framework to reconcile information from both prediction algorithms and probing experiments. The thermodynamic energy parameters are adjusted using 'pseudo-energies' to minimize the discrepancy between prediction and experiment. Our framework differs from related approaches that used pseudo-energies in several key aspects. (i) The energy model is only changed when necessary and no adjustments are made if prediction and experiment are consistent. (ii) Pseudo-energies remain biophysically interpretable and hold positional information where experiment and model disagree. (iii) The whole thermodynamic ensemble of structures is considered, thus allowing mixtures of suboptimal structures to be reconstructed from seemingly contradictory data. (iv) The noise of the energy model and the experimental data is explicitly modeled, leading to an intuitive weighting factor through which the problem can be seen as folding with 'soft' constraints of different strength. We present an efficient algorithm to iteratively calculate pseudo-energies within this framework and demonstrate how this approach can be used in combination with SHAPE chemical probing data to improve secondary structure prediction. We further demonstrate that the pseudo-energies correlate with biophysical effects that are known to affect RNA folding such as chemical nucleotide modifications and protein binding.


Subjects
Algorithms , RNA Folding , Thermodynamics , Base Sequence , Nucleotides/chemistry , RNA/chemistry , RNA, Transfer/chemistry , RNA-Binding Proteins/chemistry
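
The trade-off described in the abstract above, between leaving the energy model unchanged and matching the probing data, could be written as a least-squares objective of the following kind. This is a sketch under stated assumptions rather than the paper's exact formulation: the symbols are introduced here for illustration, with \epsilon_i a pseudo-energy at position i, p_i(\vec{\epsilon}) the predicted probability that position i is unpaired under the adjusted model, q_i the corresponding experimental estimate, and \tau^2 and \sigma^2 the assumed variances of the energy model and of the measurement.

```latex
F(\vec{\epsilon}) \;=\; \sum_{i=1}^{n} \frac{\epsilon_i^{2}}{\tau^{2}}
  \;+\; \sum_{i=1}^{n} \frac{\bigl(p_i(\vec{\epsilon}) - q_i\bigr)^{2}}{\sigma^{2}},
\qquad
\vec{\epsilon}^{\,*} \;=\; \arg\min_{\vec{\epsilon}} F(\vec{\epsilon})
```

Under this reading, the ratio \tau^2/\sigma^2 plays the role of the weighting factor mentioned in the abstract: a large ratio trusts the experiment and permits large pseudo-energies, while a small ratio keeps the solution close to the unmodified energy model, and \vec{\epsilon} = 0 is optimal whenever prediction and experiment already agree.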
7.
RNA ; 17(4): 578-94, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21357752

ABSTRACT

With the availability of genome-wide transcription data and massive comparative sequencing, the discrimination of coding from noncoding RNAs and the assessment of coding potential in evolutionarily conserved regions arose as a core analysis task. Here we present RNAcode, a program to detect coding regions in multiple sequence alignments that is optimized for emerging applications not covered by current protein gene-finding software. Our algorithm combines information from nucleotide substitution and gap patterns in a unified framework and also deals with real-life issues such as alignment and sequencing errors. It uses an explicit statistical model with no machine learning component and can therefore be applied "out of the box," without any training, to data from all domains of life. We describe the RNAcode method and apply it in combination with mass spectrometry experiments to predict and confirm seven novel short peptides in Escherichia coli and to analyze the coding potential of RNAs previously annotated as "noncoding." RNAcode is open source software and available for all major platforms at http://wash.github.com/rnacode.


Subjects
Genetic Code , RNA, Messenger/genetics , Sequence Alignment/methods , Sequence Analysis, RNA/methods , Software , Algorithms , Animals , Base Pairing , Drosophila melanogaster/genetics , Escherichia coli/genetics , Mass Spectrometry , Molecular Sequence Annotation , Molecular Sequence Data , Open Reading Frames , Peptides/genetics , RNA, Untranslated/genetics
8.
Trends Genet ; 24(12): 583-7, 2008 Dec.
Article in English | MEDLINE | ID: mdl-18951646

ABSTRACT

Using genome-wide maps of nucleosome positions in yeast, we have analyzed the influence of chromatin structure on the molecular evolution of genomic DNA. We have observed, on average, 10-15% lower substitution rates in linker regions than in nucleosomal DNA. This widespread local rate heterogeneity represents an evolutionary footprint of nucleosome positions and reveals that nucleosome organization is a genomic feature conserved over evolutionary timescales.


Subjects
Evolution, Molecular , Nucleosomes/genetics , Saccharomyces cerevisiae/genetics , Base Composition , Conserved Sequence , DNA, Intergenic/genetics , Mutation/genetics , Open Reading Frames/genetics
9.
Anal Bioanal Chem ; 398(7-8): 2867-81, 2010 Dec.
Article in English | MEDLINE | ID: mdl-20803007

ABSTRACT

Proteins with molecular weights of <25 kDa are involved in major biological processes such as ribosome formation, stress adaptation (e.g., temperature reduction) and cell cycle control. Despite their importance, the coverage of smaller proteins in standard proteome studies is rather sparse. Here we investigated biochemical and mass spectrometric parameters that influence coverage and validity of identification. The underrepresentation of low molecular weight (LMW) proteins may be attributed to the low numbers of proteolytic peptides formed by tryptic digestion as well as their tendency to be lost in protein separation and concentration/desalting procedures. In a systematic investigation of the LMW proteome of Escherichia coli, a total of 455 LMW proteins (27% of the 1672 listed in the SwissProt protein database) were identified, corresponding to a coverage of 62% of the known cytosolic LMW proteins. Of these proteins, 93 had not yet been functionally classified, and five had not previously been confirmed at the protein level. In this study, the influences of protein extraction (either urea or TFA), proteolytic digestion (trypsin alone or combined with AspN as endoproteases) and protein separation (gel- or non-gel-based) were investigated. Compared to the standard procedure based solely on the use of urea lysis buffer, in-gel separation and tryptic digestion, the complementary use of TFA for extraction or endoprotease AspN for proteolysis permits the identification of an extra 72 (32%) and 51 proteins (23%), respectively. Regarding mass spectrometry analysis with an LTQ Orbitrap mass spectrometer, collision-induced fragmentation (CID and HCD) and electron transfer dissociation using the linear ion trap (IT) or the Orbitrap as the analyzer were compared. IT-CID was found to yield the best identification rate, whereas IT-ETD provided almost comparable results in terms of LMW proteome coverage. The high overlap between the proteins identified with IT-CID and IT-ETD allowed the validation of 75% of the identified proteins using this orthogonal fragmentation technique. Furthermore, a new approach to evaluating and improving the completeness of protein databases that utilizes the program RNAcode was introduced and examined.


Subjects
Chromatography, Liquid/methods , Escherichia coli K12/chemistry , Escherichia coli Proteins/isolation & purification , Spectrometry, Mass, Electrospray Ionization/methods , Tandem Mass Spectrometry/methods , Escherichia coli Proteins/analysis , Molecular Weight
10.
Nucleic Acids Res ; 35(Web Server issue): W335-8, 2007 Jul.
Article in English | MEDLINE | ID: mdl-17452347

ABSTRACT

Many non-coding RNA genes and cis-acting regulatory elements of mRNAs contain RNA secondary structures that are critical for their function. Such functional RNAs can be predicted on the basis of thermodynamic stability and evolutionary conservation. We present a web server that uses the RNAz algorithm to detect functional RNA structures in multiple alignments of nucleotide sequences. The server provides access to a complete and fully automatic analysis pipeline that allows users not only to analyze single alignments in a variety of formats, but also to conduct complex screens of large genomic regions. Results are presented on a website that is illustrated by various structure representations and can be downloaded for local viewing. The web server is available at: rna.tbi.univie.ac.at/RNAz.


Subjects
Algorithms , Computational Biology/methods , Internet , RNA/chemistry , Regulatory Sequences, Ribonucleic Acid , Sequence Analysis, RNA/methods , Base Sequence , Conserved Sequence , Evolution, Molecular , Genome , Molecular Sequence Data , Nucleic Acid Conformation , Sequence Alignment , Software , Thermodynamics
11.
BMC Bioinformatics ; 9: 248, 2008 May 27.
Article in English | MEDLINE | ID: mdl-18505553

ABSTRACT

BACKGROUND: Comparative prediction of RNA structures can be used to identify functional noncoding RNAs in genomic screens. It was shown recently by Babak et al. [BMC Bioinformatics. 8:33] that RNA gene prediction programs can be biased by the genomic dinucleotide content, in particular those programs using a thermodynamic folding model including stacking energies. As a consequence, there is a need for dinucleotide-preserving control strategies to assess the significance of such predictions. While there have been randomization algorithms for single sequences for many years, the problem has remained challenging for multiple alignments and there is currently no algorithm available. RESULTS: We present a program called SISSIz that simulates multiple alignments of a given average dinucleotide content. Meeting additional requirements of an accurate null model, the randomized alignments are on average of the same sequence diversity and preserve local conservation and gap patterns. We make use of a phylogenetic substitution model that includes overlapping dependencies and site-specific rates. Using fast heuristics and a distance-based approach, a tree is estimated under this model, which is used to guide the simulations. The new algorithm is tested on vertebrate genomic alignments and the effect on RNA structure predictions is studied. In addition, we directly combined the new null model with the RNAalifold consensus folding algorithm, giving a new variant of a thermodynamic structure-based RNA gene finding program that is not biased by the dinucleotide content. CONCLUSION: SISSIz implements an efficient algorithm to randomize multiple alignments preserving dinucleotide content. It can be used to get more accurate estimates of false positive rates of existing programs, to produce negative controls for the training of machine learning based programs, or as a standalone RNA gene finding program. Other applications in comparative genomics that require randomization of multiple alignments can be considered. AVAILABILITY: SISSIz is available as open source C code that can be compiled for every major platform and downloaded here: http://sourceforge.net/projects/sissiz.


Subjects
Computational Biology/methods , RNA, Untranslated/chemistry , Sequence Analysis, RNA/methods , Algorithms , Animals , Base Composition , Humans , Markov Chains , Sequence Alignment , Sequence Homology, Nucleic Acid , Software
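
One common way to use a set of randomized control alignments, such as the null model described in the abstract above, is to express the folding energy of the real alignment as a z-score against the energies of the controls. The sketch below shows only that final step. The function name `z_score` and its inputs are hypothetical, the energies are assumed to come from a consensus folding program that is not shown, and nothing here reproduces SISSIz's own implementation.

```python
import statistics


def z_score(observed, null_values):
    """Z-score of an observed value against values from randomized controls.

    Hypothetical helper: `observed` would be the consensus folding energy of
    the native alignment and `null_values` the energies of dinucleotide-
    preserving randomizations. Uses the sample standard deviation.
    """
    if len(null_values) < 2:
        raise ValueError("need at least two null values")
    mean = statistics.mean(null_values)
    spread = statistics.stdev(null_values)
    if spread == 0:
        raise ValueError("null values have zero spread")
    return (observed - mean) / spread


if __name__ == "__main__":
    # Made-up energies in kcal/mol, for illustration only.
    print(z_score(-15.0, [-9.0, -10.0, -11.0]))
```

Because folding energies are negative, a strongly negative z-score indicates a structure more stable than expected under the null model.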
12.
BMC Bioinformatics ; 9: 122, 2008 Feb 26.
Article in English | MEDLINE | ID: mdl-18302738

ABSTRACT

BACKGROUND: Evolutionary conservation of RNA secondary structure is a typical feature of many functional non-coding RNAs. Since almost all of the available methods used for prediction and annotation of non-coding RNA genes rely on this evolutionary signature, accurate measures for structural conservation are essential. RESULTS: We systematically assessed the ability of various measures to detect conserved RNA structures in multiple sequence alignments. We tested three existing and eight novel strategies that are based on metrics of folding energies, metrics of single optimal structure predictions, and metrics of structure ensembles. We find that the folding energy based SCI score used in the RNAz program and a simple base-pair distance metric are by far the most accurate. The use of more complex metrics, such as tree editing, does not improve performance. A variant of the SCI performed particularly well on highly conserved alignments and is thus a viable alternative when only little evolutionary information is available. Surprisingly, ensemble-based methods that, in principle, could benefit from the additional information contained in sub-optimal structures perform particularly poorly. As a general trend, we observed that methods that include a consensus structure prediction outperformed equivalent methods that only consider pairwise comparisons. CONCLUSION: Structural conservation can be measured accurately with relatively simple and intuitive metrics. They have the potential to form the basis of future RNA gene finders that face new challenges like finding lineage-specific structures or detecting mis-aligned sequences.


Subjects
Algorithms , Conserved Sequence/genetics , Evolution, Molecular , RNA/genetics , Sequence Alignment/methods , Sequence Analysis, RNA/methods , Base Pair Mismatch , Base Sequence , Molecular Sequence Data , Sequence Homology, Nucleic Acid
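
The folding-energy-based SCI mentioned in the abstract above is usually defined as the consensus folding energy of the alignment divided by the mean minimum free energy of the individual sequences. The sketch below computes that ratio from precomputed energies. The function name and argument layout are assumptions for illustration, and the energies themselves would have to come from folding programs that are not shown here.

```python
def structure_conservation_index(consensus_energy, single_energies):
    """Ratio of consensus folding energy to mean single-sequence energy.

    consensus_energy: energy of the consensus structure for the alignment.
    single_energies: minimum free energies of the individual sequences.
    Both are assumed to be in kcal/mol and supplied by the caller.
    """
    if not single_energies:
        raise ValueError("at least one single-sequence energy is required")
    mean_single = sum(single_energies) / len(single_energies)
    if mean_single == 0:
        raise ValueError("mean single-sequence energy is zero")
    return consensus_energy / mean_single


if __name__ == "__main__":
    # Made-up energies, for illustration only.
    print(structure_conservation_index(-20.0, [-25.0, -15.0, -20.0]))
    print(structure_conservation_index(-5.0, [-20.0, -20.0]))
```

Values near 1 suggest that the sequences fold into a shared structure about as well as they fold individually, while values near 0 suggest that no common structure exists.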
13.
Nat Biotechnol ; 23(11): 1383-90, 2005 Nov.
Article in English | MEDLINE | ID: mdl-16273071

ABSTRACT

In contrast to the fairly reliable and complete annotation of the protein coding genes in the human genome, comparable information is lacking for noncoding RNAs (ncRNAs). We present a comparative screen of vertebrate genomes for structural noncoding RNAs, which evaluates conserved genomic DNA sequences for signatures of structural conservation of base-pairing patterns and exceptional thermodynamic stability. We predict more than 30,000 structured RNA elements in the human genome, almost 1,000 of which are conserved across all vertebrates. Roughly a third are found in introns of known genes, a sixth are potential regulatory elements in untranslated regions of protein-coding mRNAs and about half are located far away from any known gene. Only a small fraction of these sequences has been described previously. A comparison with recent tiling array data shows that more than 40% of the predicted structured RNAs overlap with experimentally detected sites of transcription. The widespread conservation of secondary structure points to a large number of functional ncRNAs and cis-acting mRNA structures in the human genome.


Subjects
Genome, Human , Nucleic Acid Conformation , RNA, Untranslated/chemistry , Animals , Base Pairing , Base Sequence , Chromosome Mapping , Computational Biology/methods , Conserved Sequence , Humans , Introns , Models, Statistical , Phylogeny , RNA/chemistry , RNA, Messenger/metabolism , Regulatory Elements, Transcriptional , Sensitivity and Specificity , Sequence Analysis, DNA , Thermodynamics , Transcription, Genetic
14.
Nucleic Acids Res ; 34(Database issue): D135-9, 2006 Jan 01.
Article in English | MEDLINE | ID: mdl-16381831

ABSTRACT

Recent work has demonstrated that microRNAs (miRNAs) are involved in critical biological processes by suppressing the translation of coding genes. This work develops an integrated database, miRNAMap, to store the known miRNA genes, the putative miRNA genes, the known miRNA targets and the putative miRNA targets. The known miRNA genes in four mammalian genomes (human, mouse, rat and dog) are obtained from miRBase, and experimentally validated miRNA targets are identified in a survey of the literature. Putative miRNA precursors were identified by RNAz, which is a non-coding RNA prediction tool based on comparative sequence analysis. The mature miRNA of the putative miRNA genes is accurately determined using a machine learning approach, mmiRNA. Then, miRanda was applied to predict the miRNA targets within the conserved regions in the 3'-UTRs of the genes in the four mammalian genomes. The miRNAMap also provides the expression profiles of the known miRNAs, cross-species comparisons, gene annotations and cross-links to other biological databases. Both textual and graphical web interfaces are provided to facilitate the retrieval of data from the miRNAMap. The database is freely available at http://mirnamap.mbc.nctu.edu.tw/.


Subjects
Databases, Nucleic Acid , Gene Expression Regulation , MicroRNAs/genetics , MicroRNAs/physiology , Animals , Chromosome Mapping , Databases, Nucleic Acid/statistics & numerical data , Dogs , Genome , Genomics , Humans , Internet , Mice , MicroRNAs/chemistry , RNA Precursors/chemistry , Rats , User-Computer Interface
15.
BMC Genomics ; 8: 406, 2007 Nov 08.
Article in English | MEDLINE | ID: mdl-17996037

ABSTRACT

BACKGROUND: Recent experimental and computational studies have provided overwhelming evidence for a plethora of diverse transcripts that are unrelated to protein-coding genes. One subclass consists of those RNAs that require distinctive secondary structure motifs to exert their biological function and hence exhibit distinctive patterns of sequence conservation characteristic for positive selection on RNA secondary structure. The deep-sequencing of 12 drosophilid species coordinated by the NHGRI provides an ideal data set for comparative computational approaches to determine those genomic loci that code for evolutionarily conserved RNA motifs. This class of loci includes the majority of the known small ncRNAs as well as structured RNA motifs in mRNAs. We report here on a genome-wide survey using RNAz. RESULTS: We obtain 16 000 high quality predictions among which we recover the majority of the known ncRNAs. Taking a pessimistically estimated false discovery rate of 40% into account, this implies that at least some ten thousand loci in the Drosophila genome show the hallmarks of stabilizing selection acting on RNA structure, and hence are most likely functional at the RNA level. A subset of RNAz predictions overlapping with TRF1 and BRF binding sites [Isogai et al., EMBO J. 26: 79-89 (2007)], which are plausible candidates for Pol III transcripts, has been studied in more detail. Among these sequences we identify several "clusters" of ncRNA candidates with striking structural similarities. CONCLUSION: The statistical evaluation of the RNAz predictions in comparison with a similar analysis of vertebrate genomes [Washietl et al., Nat. Biotech. 23: 1383-1390 (2005)] shows that qualitatively similar fractions of structured RNAs are found in introns, UTRs, and intergenic regions. The intergenic RNA structures, however, are concentrated much more closely around known protein-coding loci, suggesting that flies have a significantly smaller complement of independent structured ncRNAs compared to mammals.


Subjects
Drosophila melanogaster/genetics , RNA/genetics , Animals , Humans , Nucleic Acid Conformation , Phylogeny , RNA/chemistry , Sensitivity and Specificity
16.
Methods Mol Biol ; 395: 503-26, 2007.
Article in English | MEDLINE | ID: mdl-17993695

ABSTRACT

The function of many noncoding RNAs (ncRNAs) depends on a defined secondary structure. RNAz detects evolutionarily conserved and thermodynamically stable RNA secondary structures in multiple sequence alignments and, thus, efficiently filters for candidate ncRNAs. In this chapter, we provide a step-by-step guide on how to use RNAz. Starting with basic concepts, we also cover advanced analysis techniques and, as an example of a large-scale application, demonstrate a complete screen of the Saccharomyces cerevisiae genome.


Subjects
Nucleic Acid Conformation , RNA, Untranslated/chemistry , Base Sequence , Sequence Homology, Nucleic Acid , Thermodynamics
17.
Nucleic Acids Res ; 33(8): 2433-9, 2005.
Article in English | MEDLINE | ID: mdl-15860779

ABSTRACT

To date, few attempts have been made to benchmark alignment algorithms on nucleic acid sequences. Frequently, sophisticated PAM or BLOSUM like models are used to align proteins, yet equivalents are not considered for nucleic acids; instead, rather ad hoc models are generally favoured. Here, we systematically test the performance of existing alignment algorithms on structural RNAs. This work was aimed at achieving the following goals: (i) to determine conditions where it is appropriate to apply common sequence alignment methods to the structural RNA alignment problem. This indicates where and when researchers should consider augmenting the alignment process with auxiliary information, such as secondary structure and (ii) to determine which sequence alignment algorithms perform well under the broadest range of conditions. We find that sequence alignment alone, using the current algorithms, is generally inappropriate below 50-60% sequence identity. Second, we note that the probabilistic method ProAlign and the aging Clustal algorithms generally outperform other sequence-based algorithms, under the broadest range of applications.


Subjects
Algorithms , RNA/chemistry , Sequence Alignment/methods , Sequence Analysis, RNA/methods , Nucleic Acid Conformation , RNA, Untranslated/chemistry , Reproducibility of Results
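
To apply the 50-60% sequence identity guideline from the abstract above, the identity of a pairwise alignment has to be computed first. The sketch below uses one common convention: identical columns divided by all columns that are not gaps in both sequences. Other definitions exist, the abstract does not say which one the authors used, and the function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two rows of a pairwise alignment.

    Columns where both rows hold a gap are ignored. A column with a gap in
    only one row counts as compared but not identical. This convention is
    an assumption; other definitions give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue                  # skip columns that are all gaps
        compared += 1
        if x == y:
            matches += 1              # neither is a gap here, so a true match
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    identity = percent_identity("ACGU-A", "ACGA-A")
    print(identity, "structure-aware alignment advisable:", identity < 60.0)
```

Using 60% as the cutoff in the usage line is the conservative end of the range quoted in the abstract.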
18.
J Mol Biol ; 342(1): 19-30, 2004 Sep 03.
Article in English | MEDLINE | ID: mdl-15313604

ABSTRACT

Facing the ever-growing list of newly discovered classes of functional RNAs, it can be expected that further types of functional RNAs are still hidden in recently completed genomes. The computational identification of such RNA genes is, therefore, of major importance. While most known functional RNAs have characteristic secondary structures, their free energies are generally not statistically significant enough to distinguish RNA genes from the genomic background. Additional information is required. Considering the wide availability of new genomic data of closely related species, comparative studies seem to be the most promising approach. Here, we show that prediction of consensus structures of aligned sequences can be a significant measure to detect functional RNAs. We report a new method to test multiple sequence alignments for the existence of an unusually structured and conserved fold. We show for alignments of six types of well-known functional RNA that an energy score consisting of free energy and a covariation term significantly improves sensitivity compared to single sequence predictions. We further test our method on a number of non-coding RNAs from Caenorhabditis elegans/Caenorhabditis briggsae and seven Saccharomyces species. Most RNAs can be detected with high significance. We provide a Perl implementation that can be used readily to score single alignments and discuss how the methods described here can be extended to allow for efficient genome-wide screens.


Subjects
Base Sequence , Genomics , Nucleic Acid Conformation , RNA, Untranslated , Sequence Alignment , Algorithms , Animals , Caenorhabditis/genetics , Genome , Molecular Sequence Data , RNA, Untranslated/chemistry , RNA, Untranslated/genetics , RNA, Untranslated/metabolism , Random Allocation , Saccharomyces/genetics
19.
Theory Biosci ; 123(4): 301-69, 2005 Apr.
Article in English | MEDLINE | ID: mdl-18202870

ABSTRACT

A plethora of new functions of non-coding RNAs (ncRNAs) have been discovered in the past few years. In fact, RNA is emerging as the central player in cellular regulation, taking on active roles in multiple regulatory layers from transcription, RNA maturation, and RNA modification to translational regulation. Nevertheless, very little is known about the evolution of this "Modern RNA World" and its components. In this contribution, we attempt to provide at least a cursory overview of the diversity of ncRNAs and functional RNA motifs in non-translated regions of regular messenger RNAs (mRNAs) with an emphasis on evolutionary questions. This survey is complemented by an in-depth analysis of examples from different classes of RNAs focusing mostly on their evolution in the vertebrate lineage. We present a survey of Y RNA genes in vertebrates and study the molecular evolution of the U7 snRNA, the snoRNAs E1/U17, E2, and E3, the Y RNA family, the let-7 microRNA (miRNA) family, and the mRNA-like evf-1 gene. We furthermore discuss the statistical distribution of miRNAs in metazoans, which suggests an explosive increase in the miRNA repertoire in vertebrates. The analysis of the transcription of ncRNAs suggests that small RNAs in general are genetically mobile in the sense that their association with a host gene (e.g. when transcribed from introns of an mRNA) can change on evolutionary time scales. The let-7 family demonstrates that even the mode of transcription (as intron or as exon) can change among paralogous ncRNAs.

20.
BMC Bioinformatics ; 4: 55, 2003 Nov 07.
Article in English | MEDLINE | ID: mdl-14604445

ABSTRACT

BACKGROUND: The genome of the avian adenovirus Chicken Embryo Lethal Orphan (CELO) has two terminal regions without detectable homology in mammalian adenoviruses that were left without annotation in the initial analysis. Since adenoviruses have been a rich source of new insights into molecular cell biology and practical applications of CELO as a gene delivery vector are being considered, this genome appeared worth revisiting. We conducted a systematic reannotation and in-depth sequence analysis of the CELO genome. RESULTS: We describe a strongly diverged paralogous cluster including ORF-2, ORF-12, ORF-13, and ORF-14 with an ATPase/helicase domain most likely acquired from adeno-associated parvoviruses. None of these ORFs appear to have retained ATPase/helicase function, and alternative functions (e.g. modulation of gene expression during the early life cycle) must be considered in an adenoviral context. Further, we identified a cluster of three putative type-1 transmembrane glycoproteins with IG-like domains (ORF-9, ORF-10, ORF-11), which are good candidates to substitute for the missing immunomodulatory functions of mammalian adenoviruses. ORF-16 (located directly adjacent) displays distant homology to vertebrate mono-ADP-ribosyltransferases. Members of this family are known to be involved in immuno-regulation, and similar functions during the CELO life cycle can be considered for this ORF. Finally, we describe a putative triglyceride lipase (merged ORF-18/19) with additional domains, which can be expected to have specific roles during the infection of birds, since they are unique to avian adenoviruses and Marek's disease-like viruses, a group of pathogenic avian herpesviruses. CONCLUSIONS: We could characterize most of the previously unassigned ORFs, pointing to functions in host-virus interaction. The results provide new directives for rationally designed experiments.


Subjects
Fowl adenovirus A/genetics , Fowl adenovirus A/pathogenicity , Genes, Viral/physiology , Genome, Viral , Open Reading Frames/physiology , Viral Structural Proteins/genetics , ADP Ribose Transferases/genetics , Adenosine Triphosphatases/physiology , Adenoviridae Infections/genetics , Alphaherpesvirinae/enzymology , Alphaherpesvirinae/genetics , Amino Acid Sequence , Conserved Sequence/physiology , DNA Helicases/physiology , Fowl adenovirus A/enzymology , Glycoproteins/physiology , Immunoglobulins/physiology , Lipase/physiology , Membrane Proteins/physiology , Molecular Sequence Data , Peptides/physiology , Protein Structure, Tertiary/physiology , Sequence Alignment/methods , Sequence Homology, Amino Acid , Viral Proteins/physiology