Results 1 - 20 of 32
1.
Nature ; 505(7485): 701-5, 2014 Jan 30.
Article in English | MEDLINE | ID: mdl-24336214

ABSTRACT

RNA has a dual role as an informational molecule and a direct effector of biological tasks. The latter function is enabled by RNA's ability to adopt complex secondary and tertiary folds and thus has motivated extensive computational and experimental efforts for determining RNA structures. Existing approaches for evaluating RNA structure have been largely limited to in vitro systems, yet the thermodynamic forces which drive RNA folding in vitro may not be sufficient to predict stable RNA structures in vivo. Indeed, the presence of RNA-binding proteins and ATP-dependent helicases can influence which structures are present inside cells. Here we present an approach for globally monitoring RNA structure in native conditions in vivo with single-nucleotide precision. This method is based on in vivo modification with dimethyl sulphate (DMS), which reacts with unpaired adenine and cytosine residues, followed by deep sequencing to monitor modifications. Our data from yeast and mammalian cells are in excellent agreement with known messenger RNA structures and with the high-resolution crystal structure of the Saccharomyces cerevisiae ribosome. Comparison between in vivo and in vitro data reveals that in rapidly dividing cells there are vastly fewer structured mRNA regions in vivo than in vitro. Even thermostable RNA structures are often denatured in cells, highlighting the importance of cellular processes in regulating RNA structure. Indeed, analysis of mRNA structure under ATP-depleted conditions in yeast shows that energy-dependent processes strongly contribute to the predominantly unfolded state of mRNAs inside cells. Our studies broadly enable the functional analysis of physiological RNA structures and reveal that, in contrast to the Anfinsen view of protein folding whereby the structure formed is the most thermodynamically favourable, thermodynamics have an incomplete role in determining mRNA structure in vivo.


Subject(s)
Fungal Genome/genetics , Nucleic Acid Conformation , RNA Folding , RNA Stability , Messenger RNA/chemistry , Messenger RNA/genetics , Saccharomyces cerevisiae/genetics , Fibroblasts , High-Throughput Nucleotide Sequencing , Humans , K562 Cells , Nucleic Acid Denaturation , RNA Folding/genetics , RNA Stability/genetics , Fungal RNA/chemistry , Fungal RNA/genetics , Fungal RNA/metabolism , Messenger RNA/metabolism , Sulfuric Acid Esters/chemistry , Thermodynamics
2.
Genome Res ; 24(4): 616-28, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24429298

ABSTRACT

Long intergenic noncoding RNAs (lincRNAs) play diverse regulatory roles in human development and disease, but little is known about their evolutionary history and constraint. Here, we characterize human lincRNA expression patterns in nine tissues across six mammalian species and multiple individuals. Of the 1898 human lincRNAs expressed in these tissues, we find orthologous transcripts for 80% in chimpanzee, 63% in rhesus, 39% in cow, 38% in mouse, and 35% in rat. Mammalian-expressed lincRNAs show remarkably strong conservation of tissue specificity, suggesting that it is selectively maintained. In contrast, abundant splice-site turnover suggests that exact splice sites are not critical. Relative to evolutionarily young lincRNAs, mammalian-expressed lincRNAs show higher primary sequence conservation in their promoters and exons, increased proximity to protein-coding genes enriched for tissue-specific functions, fewer repeat elements, and more frequent single-exon transcripts. Remarkably, we find that ∼20% of human lincRNAs are not expressed beyond chimpanzee and are undetectable even in rhesus. These hominid-specific lincRNAs are more tissue specific, enriched for testis, and faster evolving within the human lineage.


Subject(s)
Conserved Sequence/genetics , Molecular Evolution , Genetic Promoter Regions , Long Noncoding RNA/genetics , Animals , Cattle , Exons , Humans , Mice , Organ Specificity , Rats
3.
Nature ; 478(7370): 476-82, 2011 Oct 12.
Article in English | MEDLINE | ID: mdl-21993624

ABSTRACT

The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ∼4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ∼60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease.


Subject(s)
Molecular Evolution , Human Genome/genetics , Genome/genetics , Mammals/genetics , Animals , Disease , Exons/genetics , Genomics , Health , Humans , Molecular Sequence Annotation , Phylogeny , RNA/classification , RNA/genetics , Genetic Selection/genetics , Sequence Alignment , DNA Sequence Analysis
4.
Genome Res ; 21(11): 1916-28, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21994248

ABSTRACT

The degeneracy of the genetic code allows protein-coding DNA and RNA sequences to simultaneously encode additional, overlapping functional elements. A sequence in which both protein-coding and additional overlapping functions have evolved under purifying selection should show increased evolutionary conservation compared to typical protein-coding genes, especially at synonymous sites. In this study, we use genome alignments of 29 placental mammals to systematically locate short regions within human ORFs that show conspicuously low estimated rates of synonymous substitution across these species. The 29-species alignment provides statistical power to locate more than 10,000 such regions with resolution down to nine-codon windows, which are found within more than a quarter of all human protein-coding genes and contain ∼2% of their synonymous sites. We collect numerous lines of evidence that the observed synonymous constraint in these regions reflects selection on overlapping functional elements including splicing regulatory elements, dual-coding genes, RNA secondary structures, microRNA target sites, and developmental enhancers. Our results show that overlapping functional elements are common in mammalian genes, despite the vast genomic landscape.


Subject(s)
Genome , Mammals/genetics , Open Reading Frames/genetics , Genetic Selection , Animals , Base Composition , Base Sequence , Codon , Initiator Codon , Computational Biology , Conserved Sequence , Genetic Enhancer Elements , Exons , Gene Order , BRCA1 Genes , Homeodomain Proteins/genetics , Humans , MicroRNAs/metabolism , Molecular Sequence Data , Mutation Rate , Nucleic Acid Conformation , Nucleosomes/metabolism , Translational Peptide Chain Initiation , RNA Splicing , Sequence Alignment , Genetic Transcription
5.
Genome Res ; 21(11): 1929-43, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21994249

ABSTRACT

Regulatory RNA structures are often members of families with multiple paralogous instances across the genome. Family members share functional and structural properties, which allow them to be studied as a whole, facilitating both bioinformatic and experimental characterization. We have developed a comparative method, EvoFam, for genome-wide identification of families of regulatory RNA structures, based on primary sequence and secondary structure similarity. We apply EvoFam to a 41-way genomic vertebrate alignment. Genome-wide, we identify 220 human, high-confidence families outside protein-coding regions comprising 725 individual structures, including 48 families with known structural RNA elements. Known families identified include both noncoding RNAs, e.g., miRNAs and the recently identified MALAT1/MEN β lincRNA family; and cis-regulatory structures, e.g., iron-responsive elements. We also identify tens of new families supported by strong evolutionary evidence and other statistical evidence, such as GO term enrichments. For some of these, detailed analysis has led to the formulation of specific functional hypotheses. Examples include two hypothesized auto-regulatory feedback mechanisms: one involving six long hairpins in the 3'-UTR of MAT2A, a key metabolic gene that produces the primary human methyl donor S-adenosylmethionine; the other involving a tRNA-like structure in the intron of the tRNA maturation gene POP1. We experimentally validate the predicted MAT2A structures. Finally, we identify potential new regulatory networks, including large families of short hairpins enriched in immunity-related genes, e.g., TNF, FOS, and CTLA4, which include known transcript destabilizing elements. Our findings exemplify the diversity of post-transcriptional regulation and provide a resource for further characterization of new regulatory mechanisms and families of noncoding RNAs.


Subject(s)
Genome , Genomics , Untranslated RNA/chemistry , Ribonucleic Acid Regulatory Sequences , Vertebrates/genetics , 3' Untranslated Regions , Animals , Base Sequence , Conserved Sequence , Gene Expression Regulation , Humans , Immunity/genetics , Methionine Adenosyltransferase/genetics , Molecular Sequence Data , Nucleic Acid Conformation , Phylogeny , Protein Biosynthesis , RNA Editing , RNA Precursors/metabolism , Post-Transcriptional RNA Processing , RNA Stability , Messenger RNA/metabolism , Transfer RNA/chemistry , Transfer RNA/metabolism , Untranslated RNA/genetics , Sequence Alignment
6.
Nucleic Acids Res ; 40(10): 4261-72, 2012 May.
Article in English | MEDLINE | ID: mdl-22287623

ABSTRACT

Thermodynamic folding algorithms and structure probing experiments are commonly used to determine the secondary structure of RNAs. Here we propose a formal framework to reconcile information from both prediction algorithms and probing experiments. The thermodynamic energy parameters are adjusted using 'pseudo-energies' to minimize the discrepancy between prediction and experiment. Our framework differs from related approaches that used pseudo-energies in several key aspects. (i) The energy model is only changed when necessary and no adjustments are made if prediction and experiment are consistent. (ii) Pseudo-energies remain biophysically interpretable and hold positional information where experiment and model disagree. (iii) The whole thermodynamic ensemble of structures is considered, thus allowing mixtures of suboptimal structures to be reconstructed from seemingly contradictory data. (iv) The noise of the energy model and the experimental data is explicitly modeled, leading to an intuitive weighting factor through which the problem can be seen as folding with 'soft' constraints of different strength. We present an efficient algorithm to iteratively calculate pseudo-energies within this framework and demonstrate how this approach can be used in combination with SHAPE chemical probing data to improve secondary structure prediction. We further demonstrate that the pseudo-energies correlate with biophysical effects that are known to affect RNA folding such as chemical nucleotide modifications and protein binding.


Subject(s)
Algorithms , RNA Folding , Thermodynamics , Base Sequence , Nucleotides/chemistry , RNA/chemistry , Transfer RNA/chemistry , RNA-Binding Proteins/chemistry
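
The trade-off described in the abstract of record 6, changing the energy model as little as possible while matching the probing data, can be written as a least-squares objective. The form below is only a sketch under assumed Gaussian noise. The symbols (pseudo-energy \epsilon_i at position i, predicted unpaired probability p_i, observed unpaired probability q_i, and noise scales \tau and \sigma) are illustrative choices and are not stated in the abstract.

```latex
% Assumed notation, not taken from the abstract:
%   \epsilon_i : pseudo-energy added at nucleotide i
%   p_i(\epsilon) : predicted probability that i is unpaired under the perturbed model
%   q_i : probability that i is unpaired according to the probing experiment
%   \tau, \sigma : assumed noise scales of the energy model and of the experiment
\min_{\epsilon}\; F(\epsilon)
  = \sum_{i} \frac{\epsilon_i^{2}}{\tau^{2}}
  + \sum_{i} \frac{\bigl(p_i(\epsilon) - q_i\bigr)^{2}}{\sigma^{2}}
```

Under this reading, the ratio of \tau to \sigma plays the role of the weighting factor: a large \sigma (noisy experiment) keeps the pseudo-energies near zero, while a small \sigma pushes the prediction towards the data, which corresponds to 'soft' constraints of different strength.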
7.
RNA ; 17(4): 578-94, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21357752

ABSTRACT

With the availability of genome-wide transcription data and massive comparative sequencing, the discrimination of coding from noncoding RNAs and the assessment of coding potential in evolutionarily conserved regions has become a core analysis task. Here we present RNAcode, a program to detect coding regions in multiple sequence alignments that is optimized for emerging applications not covered by current protein gene-finding software. Our algorithm combines information from nucleotide substitution and gap patterns in a unified framework and also deals with real-life issues such as alignment and sequencing errors. It uses an explicit statistical model with no machine learning component and can therefore be applied "out of the box," without any training, to data from all domains of life. We describe the RNAcode method and apply it in combination with mass spectrometry experiments to predict and confirm seven novel short peptides in Escherichia coli and to analyze the coding potential of RNAs previously annotated as "noncoding." RNAcode is open source software and available for all major platforms at http://wash.github.com/rnacode.


Subject(s)
Genetic Code , Messenger RNA/genetics , Sequence Alignment/methods , RNA Sequence Analysis/methods , Software , Algorithms , Animals , Base Pairing , Drosophila melanogaster/genetics , Escherichia coli/genetics , Mass Spectrometry , Molecular Sequence Annotation , Molecular Sequence Data , Open Reading Frames , Peptides/genetics , Untranslated RNA/genetics
8.
Trends Genet ; 24(12): 583-7, 2008 Dec.
Article in English | MEDLINE | ID: mdl-18951646

ABSTRACT

Using genome-wide maps of nucleosome positions in yeast, we have analyzed the influence of chromatin structure on the molecular evolution of genomic DNA. We have observed, on average, 10-15% lower substitution rates in linker regions than in nucleosomal DNA. This widespread local rate heterogeneity represents an evolutionary footprint of nucleosome positions and reveals that nucleosome organization is a genomic feature conserved over evolutionary timescales.


Subject(s)
Molecular Evolution , Nucleosomes/genetics , Saccharomyces cerevisiae/genetics , Base Composition , Conserved Sequence , Intergenic DNA/genetics , Mutation/genetics , Open Reading Frames/genetics
9.
Anal Bioanal Chem ; 398(7-8): 2867-81, 2010 Dec.
Article in English | MEDLINE | ID: mdl-20803007

ABSTRACT

Proteins with molecular weights of <25 kDa are involved in major biological processes such as ribosome formation, stress adaptation (e.g., temperature reduction) and cell cycle control. Despite their importance, the coverage of smaller proteins in standard proteome studies is rather sparse. Here we investigated biochemical and mass spectrometric parameters that influence coverage and validity of identification. The underrepresentation of low molecular weight (LMW) proteins may be attributed to the low numbers of proteolytic peptides formed by tryptic digestion as well as their tendency to be lost in protein separation and concentration/desalting procedures. In a systematic investigation of the LMW proteome of Escherichia coli, a total of 455 LMW proteins (27% of the 1672 listed in the SwissProt protein database) were identified, corresponding to a coverage of 62% of the known cytosolic LMW proteins. Of these proteins, 93 had not yet been functionally classified, and five had not previously been confirmed at the protein level. In this study, the influences of protein extraction (either urea or TFA), proteolytic digestion (sole or combined use of trypsin and AspN as endoproteases) and protein separation (gel- or non-gel-based) were investigated. Compared to the standard procedure based solely on the use of urea lysis buffer, in-gel separation and tryptic digestion, the complementary use of TFA for extraction or endoprotease AspN for proteolysis permits the identification of an extra 72 (32%) and 51 proteins (23%), respectively. Regarding mass spectrometry analysis with an LTQ Orbitrap mass spectrometer, collision-induced fragmentation (CID and HCD) and electron transfer dissociation using the linear ion trap (IT) or the Orbitrap as the analyzer were compared. IT-CID was found to yield the best identification rate, whereas IT-ETD provided almost comparable results in terms of LMW proteome coverage. The high overlap between the proteins identified with IT-CID and IT-ETD allowed the validation of 75% of the identified proteins using this orthogonal fragmentation technique. Furthermore, a new approach to evaluating and improving the completeness of protein databases that utilizes the program RNAcode was introduced and examined.


Subject(s)
Liquid Chromatography/methods , Escherichia coli K12/chemistry , Escherichia coli Proteins/isolation & purification , Electrospray Ionization Mass Spectrometry/methods , Tandem Mass Spectrometry/methods , Escherichia coli Proteins/analysis , Molecular Weight
10.
Nucleic Acids Res ; 35(Web Server issue): W335-8, 2007 Jul.
Article in English | MEDLINE | ID: mdl-17452347

ABSTRACT

Many non-coding RNA genes and cis-acting regulatory elements of mRNAs contain RNA secondary structures that are critical for their function. Such functional RNAs can be predicted on the basis of thermodynamic stability and evolutionary conservation. We present a web server that uses the RNAz algorithm to detect functional RNA structures in multiple alignments of nucleotide sequences. The server provides access to a complete and fully automatic analysis pipeline that allows users not only to analyze single alignments in a variety of formats, but also to conduct complex screens of large genomic regions. Results are presented on a website that is illustrated by various structure representations and can be downloaded for local viewing. The web server is available at: rna.tbi.univie.ac.at/RNAz.


Subject(s)
Algorithms , Computational Biology/methods , Internet , RNA/chemistry , Ribonucleic Acid Regulatory Sequences , RNA Sequence Analysis/methods , Base Sequence , Conserved Sequence , Molecular Evolution , Genome , Molecular Sequence Data , Nucleic Acid Conformation , Sequence Alignment , Software , Thermodynamics
11.
BMC Bioinformatics ; 9: 248, 2008 May 27.
Article in English | MEDLINE | ID: mdl-18505553

ABSTRACT

BACKGROUND: Comparative prediction of RNA structures can be used to identify functional noncoding RNAs in genomic screens. It was shown recently by Babak et al. [BMC Bioinformatics. 8:33] that RNA gene prediction programs can be biased by the genomic dinucleotide content, in particular those programs using a thermodynamic folding model including stacking energies. As a consequence, there is a need for dinucleotide-preserving control strategies to assess the significance of such predictions. While there have been randomization algorithms for single sequences for many years, the problem has remained challenging for multiple alignments and there is currently no algorithm available. RESULTS: We present a program called SISSIz that simulates multiple alignments of a given average dinucleotide content. Meeting additional requirements of an accurate null model, the randomized alignments are on average of the same sequence diversity and preserve local conservation and gap patterns. We make use of a phylogenetic substitution model that includes overlapping dependencies and site-specific rates. Using fast heuristics and a distance based approach, a tree is estimated under this model which is used to guide the simulations. The new algorithm is tested on vertebrate genomic alignments and the effect on RNA structure predictions is studied. In addition, we directly combined the new null model with the RNAalifold consensus folding algorithm giving a new variant of a thermodynamic structure based RNA gene finding program that is not biased by the dinucleotide content. CONCLUSION: SISSIz implements an efficient algorithm to randomize multiple alignments preserving dinucleotide content. It can be used to get more accurate estimates of false positive rates of existing programs, to produce negative controls for the training of machine learning based programs, or as a standalone RNA gene finding program. Other applications in comparative genomics that require randomization of multiple alignments can be considered. AVAILABILITY: SISSIz is available as open source C code that can be compiled for every major platform and downloaded here: http://sourceforge.net/projects/sissiz.


Subject(s)
Computational Biology/methods , Untranslated RNA/chemistry , RNA Sequence Analysis/methods , Algorithms , Animals , Base Composition , Humans , Markov Chains , Sequence Alignment , Nucleic Acid Sequence Homology , Software
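
The quantity that a dinucleotide-preserving null model such as the one in record 11 has to keep fixed is the dinucleotide content. The Python sketch below only measures that content for a single sequence; it is not the SISSIz simulation itself. The function name, and the choice to drop gap characters before counting, are assumptions made for illustration.

```python
from collections import Counter


def dinucleotide_frequencies(seq):
    """Return relative frequencies of overlapping dinucleotides in seq.

    Assumption: gap characters ('-') are removed first, so residues on
    either side of a gap are treated as adjacent.
    """
    cleaned = seq.upper().replace("-", "")
    # Overlapping windows of length two: positions (0,1), (1,2), ...
    pairs = [cleaned[i:i + 2] for i in range(len(cleaned) - 1)]
    counts = Counter(pairs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {dinuc: n / total for dinuc, n in counts.items()}


# Illustrative input, not data from the paper.
freqs = dinucleotide_frequencies("ACG-ACG")
```

Comparing these frequencies before and after a randomization step is one simple way to check whether a shuffling procedure has preserved dinucleotide content.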
12.
BMC Bioinformatics ; 9: 122, 2008 Feb 26.
Article in English | MEDLINE | ID: mdl-18302738

ABSTRACT

BACKGROUND: Evolutionary conservation of RNA secondary structure is a typical feature of many functional non-coding RNAs. Since almost all of the available methods used for prediction and annotation of non-coding RNA genes rely on this evolutionary signature, accurate measures for structural conservation are essential. RESULTS: We systematically assessed the ability of various measures to detect conserved RNA structures in multiple sequence alignments. We tested three existing and eight novel strategies that are based on metrics of folding energies, metrics of single optimal structure predictions, and metrics of structure ensembles. We find that the folding energy based SCI score used in the RNAz program and a simple base-pair distance metric are by far the most accurate. The use of more complex metrics, such as tree editing, does not improve performance. A variant of the SCI performed particularly well on highly conserved alignments and is thus a viable alternative when only little evolutionary information is available. Surprisingly, ensemble based methods that, in principle, could benefit from the additional information contained in sub-optimal structures, perform particularly poorly. As a general trend, we observed that methods that include a consensus structure prediction outperformed equivalent methods that only consider pairwise comparisons. CONCLUSION: Structural conservation can be measured accurately with relatively simple and intuitive metrics. They have the potential to form the basis of future RNA gene finders that face new challenges like finding lineage specific structures or detecting mis-aligned sequences.


Subject(s)
Algorithms , Conserved Sequence/genetics , Molecular Evolution , RNA/genetics , Sequence Alignment/methods , RNA Sequence Analysis/methods , Base Pair Mismatch , Base Sequence , Molecular Sequence Data , Nucleic Acid Sequence Homology
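
The folding-energy-based SCI (structure conservation index) named in record 12 is commonly defined as the consensus folding energy of the alignment divided by the mean minimum free energy of the individual sequences. The Python sketch below assumes that these energies have already been computed by an external folding program. The function name and the example values are illustrative and do not come from the paper.

```python
from statistics import mean


def structure_conservation_index(consensus_energy, single_energies):
    """Return the consensus folding energy divided by the mean
    single-sequence minimum free energy (both in kcal/mol)."""
    if not single_energies:
        raise ValueError("need at least one single-sequence energy")
    mean_single = mean(single_energies)
    if mean_single == 0:
        raise ValueError("mean single-sequence energy is zero; SCI undefined")
    return consensus_energy / mean_single


# Illustrative energies only (kcal/mol).
sci = structure_conservation_index(-18.0, [-20.0, -22.0, -18.0])
```

Values near 0 indicate that the sequences share no common fold, while values near or above 1 indicate a well-conserved structure, typically supported by compensatory mutations.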
13.
Nat Biotechnol ; 23(11): 1383-90, 2005 Nov.
Article in English | MEDLINE | ID: mdl-16273071

ABSTRACT

In contrast to the fairly reliable and complete annotation of the protein coding genes in the human genome, comparable information is lacking for noncoding RNAs (ncRNAs). We present a comparative screen of vertebrate genomes for structural noncoding RNAs, which evaluates conserved genomic DNA sequences for signatures of structural conservation of base-pairing patterns and exceptional thermodynamic stability. We predict more than 30,000 structured RNA elements in the human genome, almost 1,000 of which are conserved across all vertebrates. Roughly a third are found in introns of known genes, a sixth are potential regulatory elements in untranslated regions of protein-coding mRNAs and about half are located far away from any known gene. Only a small fraction of these sequences has been described previously. A comparison with recent tiling array data shows that more than 40% of the predicted structured RNAs overlap with experimentally detected sites of transcription. The widespread conservation of secondary structure points to a large number of functional ncRNAs and cis-acting mRNA structures in the human genome.


Subject(s)
Human Genome , Nucleic Acid Conformation , Untranslated RNA/chemistry , Animals , Base Pairing , Base Sequence , Chromosome Mapping , Computational Biology/methods , Conserved Sequence , Humans , Introns , Statistical Models , Phylogeny , RNA/chemistry , Messenger RNA/metabolism , Transcriptional Regulatory Elements , Sensitivity and Specificity , DNA Sequence Analysis , Thermodynamics , Genetic Transcription
14.
Nucleic Acids Res ; 34(Database issue): D135-9, 2006 Jan 01.
Article in English | MEDLINE | ID: mdl-16381831

ABSTRACT

Recent work has demonstrated that microRNAs (miRNAs) are involved in critical biological processes by suppressing the translation of coding genes. This work develops an integrated database, miRNAMap, to store the known miRNA genes, the putative miRNA genes, the known miRNA targets and the putative miRNA targets. The known miRNA genes in four mammalian genomes (human, mouse, rat and dog) are obtained from miRBase, and experimentally validated miRNA targets are identified in a survey of the literature. Putative miRNA precursors were identified by RNAz, which is a non-coding RNA prediction tool based on comparative sequence analysis. The mature miRNA of the putative miRNA genes is accurately determined using a machine learning approach, mmiRNA. Then, miRanda was applied to predict the miRNA targets within the conserved regions in the 3'-UTR of the genes in the four mammalian genomes. The miRNAMap also provides the expression profiles of the known miRNAs, cross-species comparisons, gene annotations and cross-links to other biological databases. Both textual and graphical web interfaces are provided to facilitate the retrieval of data from the miRNAMap. The database is freely available at http://mirnamap.mbc.nctu.edu.tw/.


Subject(s)
Nucleic Acid Databases , Gene Expression Regulation , MicroRNAs/genetics , MicroRNAs/physiology , Animals , Chromosome Mapping , Nucleic Acid Databases/statistics & numerical data , Dogs , Genome , Genomics , Humans , Internet , Mice , MicroRNAs/chemistry , RNA Precursors/chemistry , Rats , User-Computer Interface
15.
BMC Genomics ; 8: 406, 2007 Nov 08.
Article in English | MEDLINE | ID: mdl-17996037

ABSTRACT

BACKGROUND: Recent experimental and computational studies have provided overwhelming evidence for a plethora of diverse transcripts that are unrelated to protein-coding genes. One subclass consists of those RNAs that require distinctive secondary structure motifs to exert their biological function and hence exhibit distinctive patterns of sequence conservation characteristic for positive selection on RNA secondary structure. The deep-sequencing of 12 drosophilid species coordinated by the NHGRI provides an ideal data set for comparative computational approaches to determine those genomic loci that code for evolutionarily conserved RNA motifs. This class of loci includes the majority of the known small ncRNAs as well as structured RNA motifs in mRNAs. We report here on a genome-wide survey using RNAz. RESULTS: We obtain 16 000 high quality predictions among which we recover the majority of the known ncRNAs. Taking a pessimistically estimated false discovery rate of 40% into account, this implies that at least some ten thousand loci in the Drosophila genome show the hallmarks of stabilizing selection acting on RNA structure, and hence are most likely functional at the RNA level. A subset of RNAz predictions overlapping with TRF1 and BRF binding sites [Isogai et al., EMBO J. 26: 79-89 (2007)], which are plausible candidates of Pol III transcripts, have been studied in more detail. Among these sequences we identify several "clusters" of ncRNA candidates with striking structural similarities. CONCLUSION: The statistical evaluation of the RNAz predictions in comparison with a similar analysis of vertebrate genomes [Washietl et al., Nat. Biotech. 23: 1383-1390 (2005)] shows that qualitatively similar fractions of structured RNAs are found in introns, UTRs, and intergenic regions. The intergenic RNA structures, however, are concentrated much more closely around known protein-coding loci, suggesting that flies have a significantly smaller complement of independent structured ncRNAs compared to mammals.


Subject(s)
Drosophila melanogaster/genetics , RNA/genetics , Animals , Humans , Nucleic Acid Conformation , Phylogeny , RNA/chemistry , Sensitivity and Specificity
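
The "some ten thousand loci" figure in record 15 follows from the reported number of predictions and the assumed false discovery rate. The short Python check below uses only the two numbers given in the abstract. The variable names are illustrative.

```python
# Numbers quoted in the abstract.
predictions = 16_000          # high-quality RNAz predictions
false_discovery_rate = 0.40   # pessimistic estimate

# Expected number of true positives if 40% of predictions are false.
expected_true = round(predictions * (1 - false_discovery_rate))
```

This gives 9,600 expected true positives, which the abstract rounds to "some ten thousand".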
16.
Methods Mol Biol ; 395: 503-26, 2007.
Article in English | MEDLINE | ID: mdl-17993695

ABSTRACT

The function of many noncoding RNAs (ncRNAs) depends on a defined secondary structure. RNAz detects evolutionarily conserved and thermodynamically stable RNA secondary structures in multiple sequence alignments and, thus, efficiently filters for candidate ncRNAs. In this chapter, we provide a step-by-step guide on how to use RNAz. Starting with basic concepts, we also cover advanced analysis techniques and, as an example for a large scale application, demonstrate a complete screen of the Saccharomyces cerevisiae genome.


Subject(s)
Nucleic Acid Conformation , Untranslated RNA/chemistry , Base Sequence , Nucleic Acid Sequence Homology , Thermodynamics
17.
Nucleic Acids Res ; 33(8): 2433-9, 2005.
Article in English | MEDLINE | ID: mdl-15860779

ABSTRACT

To date, few attempts have been made to benchmark alignment algorithms on nucleic acid sequences. Frequently, sophisticated PAM or BLOSUM like models are used to align proteins, yet equivalents are not considered for nucleic acids; instead, rather ad hoc models are generally favoured. Here, we systematically test the performance of existing alignment algorithms on structural RNAs. This work was aimed at achieving the following goals: (i) to determine conditions where it is appropriate to apply common sequence alignment methods to the structural RNA alignment problem. This indicates where and when researchers should consider augmenting the alignment process with auxiliary information, such as secondary structure; and (ii) to determine which sequence alignment algorithms perform well under the broadest range of conditions. We find that sequence alignment alone, using the current algorithms, is generally inappropriate below 50-60% sequence identity. Second, we note that the probabilistic method ProAlign and the aging Clustal algorithms generally outperform other sequence-based algorithms, under the broadest range of applications.


Subject(s)
Algorithms , RNA/chemistry , Sequence Alignment/methods , RNA Sequence Analysis/methods , Nucleic Acid Conformation , Untranslated RNA/chemistry , Reproducibility of Results
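
The 50-60% threshold in record 17 is stated in terms of sequence identity. Several definitions of identity are in use, and the abstract does not say which one the authors applied. The Python sketch below implements one of them: identical columns divided by all columns in which at least one sequence has a residue. That choice of denominator, and the function name, are assumptions made for illustration.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two aligned sequences of equal length.

    Assumption: columns where both sequences have a gap are ignored;
    columns with a gap in only one sequence count as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # column carries no information for this pair
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Illustrative pair, not data from the paper: 4 of 6 columns match.
pid = percent_identity("ACGU-A", "ACCUGA")
```

A pair scoring below roughly 50-60% by such a measure falls in the range where the abstract suggests adding structural information to the alignment process.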
18.
J Mol Biol ; 342(1): 19-30, 2004 Sep 03.
Article in English | MEDLINE | ID: mdl-15313604

ABSTRACT

Facing the ever-growing list of newly discovered classes of functional RNAs, it can be expected that further types of functional RNAs are still hidden in recently completed genomes. The computational identification of such RNA genes is, therefore, of major importance. While most known functional RNAs have characteristic secondary structures, their free energies are generally not statistically significant enough to distinguish RNA genes from the genomic background. Additional information is required. Considering the wide availability of new genomic data of closely related species, comparative studies seem to be the most promising approach. Here, we show that prediction of consensus structures of aligned sequences can be a significant measure to detect functional RNAs. We report a new method to test multiple sequence alignments for the existence of an unusually structured and conserved fold. We show for alignments of six types of well-known functional RNA that an energy score consisting of free energy and a covariation term significantly improves sensitivity compared to single sequence predictions. We further test our method on a number of non-coding RNAs from Caenorhabditis elegans/Caenorhabditis briggsae and seven Saccharomyces species. Most RNAs can be detected with high significance. We provide a Perl implementation that can be used readily to score single alignments and discuss how the methods described here can be extended to allow for efficient genome-wide screens.


Subject(s)
Base Sequence , Genomics , Nucleic Acid Conformation , RNA, Untranslated , Sequence Alignment , Algorithms , Animals , Caenorhabditis/genetics , Genome , Molecular Sequence Data , RNA, Untranslated/chemistry , RNA, Untranslated/genetics , RNA, Untranslated/metabolism , Random Allocation , Saccharomyces/genetics
19.
Theory Biosci ; 123(4): 301-69, 2005 Apr.
Article in English | MEDLINE | ID: mdl-18202870

ABSTRACT

A plethora of new functions of non-coding RNAs (ncRNAs) have been discovered in the past few years. In fact, RNA is emerging as the central player in cellular regulation, taking on active roles in multiple regulatory layers from transcription, RNA maturation, and RNA modification to translational regulation. Nevertheless, very little is known about the evolution of this "Modern RNA World" and its components. In this contribution, we attempt to provide at least a cursory overview of the diversity of ncRNAs and functional RNA motifs in non-translated regions of regular messenger RNAs (mRNAs) with an emphasis on evolutionary questions. This survey is complemented by an in-depth analysis of examples from different classes of RNAs focusing mostly on their evolution in the vertebrate lineage. We present a survey of Y RNA genes in vertebrates and study the molecular evolution of the U7 snRNA, the snoRNAs E1/U17, E2, and E3, the Y RNA family, the let-7 microRNA (miRNA) family, and the mRNA-like evf-1 gene. We furthermore discuss the statistical distribution of miRNAs in metazoans, which suggests an explosive increase in the miRNA repertoire in vertebrates. The analysis of the transcription of ncRNAs suggests that small RNAs in general are genetically mobile in the sense that their association with a host gene (e.g. when transcribed from introns of an mRNA) can change on evolutionary time scales. The let-7 family demonstrates that even the mode of transcription (as intron or as exon) can change among paralogous ncRNAs.

20.
BMC Bioinformatics ; 4: 55, 2003 Nov 07.
Article in English | MEDLINE | ID: mdl-14604445

ABSTRACT

BACKGROUND: The genome of the avian adenovirus Chicken Embryo Lethal Orphan (CELO) has two terminal regions without detectable homology in mammalian adenoviruses that were left without annotation in the initial analysis. Since adenoviruses have been a rich source of new insights into molecular cell biology and practical applications of CELO as a gene delivery vector are being considered, this genome appeared worth revisiting. We conducted a systematic reannotation and in-depth sequence analysis of the CELO genome. RESULTS: We describe a strongly diverged paralogous cluster including ORF-2, ORF-12, ORF-13, and ORF-14 with an ATPase/helicase domain most likely acquired from adeno-associated parvoviruses. None of these ORFs appear to have retained ATPase/helicase function and alternative functions (e.g. modulation of gene expression during the early life cycle) must be considered in an adenoviral context. Further, we identified a cluster of three putative type-1-transmembrane glycoproteins with IG-like domains (ORF-9, ORF-10, ORF-11) which are good candidates to substitute for the missing immunomodulatory functions of mammalian adenoviruses. ORF-16 (located directly adjacent) displays distant homology to vertebrate mono-ADP-ribosyltransferases. Members of this family are known to be involved in immuno-regulation and similar functions during the CELO life cycle can be considered for this ORF. Finally, we describe a putative triglyceride lipase (merged ORF-18/19) with additional domains, which can be expected to have specific roles during the infection of birds, since they are unique to avian adenoviruses and Marek's disease-like viruses, a group of pathogenic avian herpesviruses. CONCLUSIONS: We could characterize most of the previously unassigned ORFs pointing to functions in host-virus interaction. The results provide new directives for rationally designed experiments.


Subject(s)
Fowl adenovirus A/genetics , Fowl adenovirus A/pathogenicity , Genes, Viral/physiology , Genome, Viral , Open Reading Frames/physiology , Viral Structural Proteins/genetics , ADP Ribose Transferases/genetics , Adenosine Triphosphatases/physiology , Adenoviridae Infections/genetics , Alphaherpesvirinae/enzymology , Alphaherpesvirinae/genetics , Amino Acid Sequence , Conserved Sequence/physiology , DNA Helicases/physiology , Fowl adenovirus A/enzymology , Glycoproteins/physiology , Immunoglobulins/physiology , Lipase/physiology , Membrane Proteins/physiology , Molecular Sequence Data , Peptides/physiology , Protein Structure, Tertiary/physiology , Sequence Alignment/methods , Sequence Homology, Amino Acid , Viral Proteins/physiology