Search | BVS (Virtual Health Library) Search Portal

Probing-directed identification of novel structured RNAs.

Vinogradova, Svetlana V; Sutormin, Roman A; Mironov, Andrey A; Soldatov, Ruslan A.

RNA Biol ; 13(2): 232-42, 2016.

Article in English | MEDLINE | ID: mdl-26732206

ABSTRACT

Transcripts often harbor RNA elements, which regulate cell processes co- or post-transcriptionally. The functions of many regulatory RNA elements depend on their structure, thus it is important to determine the structure as well as to scan genomes for structured elements. State of the art ab initio approaches to predict structured RNAs rely on DNA sequence analysis. They use 2 major types of information inferred from a sequence: thermodynamic stability of an RNA structure and evolutionary footprints of base-pair interactions. In recent years, chemical probing of RNA has arisen as an alternative source of structural information. RNA probing experiments detect positions accessible to specific types of chemicals or enzymes indicating their propensity to be in a paired or unpaired state. There exist several strategies to integrate probing data into RNA secondary structure prediction algorithms that substantially improve the prediction quality. However, whether and how probing data could contribute to detection of structured RNAs remains an open question. We previously developed the energy-based approach RNASurface to detect locally optimal structured RNA elements. Here, we integrate probing data into the RNASurface energy model using a general framework. We show that the use of experimental data allows for better discrimination of ncRNAs from other transcripts. Application of RNASurface to genome-wide analysis of the human transcriptome with PARS data identifies previously undetectable segments, with evidence of functionality for some of them.

Subject(s)

Nucleic Acid Conformation, RNA/genetics, DNA Sequence Analysis, Transcriptome/genetics, Algorithms, Human Genome, Humans, Molecular Sequence Annotation, RNA/chemistry

Comparative genome analysis of Pseudogymnoascus spp. reveals primarily clonal evolution with small genome fragments exchanged between lineages.

Leushkin, Evgeny V; Logacheva, Maria D; Penin, Aleksey A; Sutormin, Roman A; Gerasimov, Evgeny S; Kochkina, Galina A; Ivanushkina, Natalia E; Vasilenko, Oleg V; Kondrashov, Alexey S; Ozerskaya, Svetlana M.

BMC Genomics ; 16: 400, 2015 May 21.

Article in English | MEDLINE | ID: mdl-25994131

ABSTRACT

BACKGROUND: Pseudogymnoascus spp. is a wide group of fungi lineages in the family Pseudorotiaceae including an aggressive pathogen of bats P. destructans. Although several lineages of P. spp. were shown to produce ascospores in culture, the vast majority of P. spp. demonstrates no evidence of sexual reproduction. P. spp. can tolerate a wide range of different temperatures and salinities and can survive even in permafrost layer. Adaptability of P. spp. to different environments is accompanied by extremely variable morphology and physiology. RESULTS: We sequenced genotypes of 14 strains of P. spp., 5 of which were extracted from permafrost, 1 from a cryopeg, a layer of unfrozen ground in permafrost, and 8 from temperate surface environments. All sequenced genotypes are haploid. Nucleotide diversity among these genomes is very high, with a typical evolutionary distance at synonymous sites dS ≈ 0.5, suggesting that the last common ancestor of these strains lived >50 Mya. The strains extracted from permafrost do not form a separate clade. Instead, each permafrost strain has close relatives from temperate environments. We observed a strictly clonal population structure with no conflicting topologies for ~99% of genome sequences. However, there is a number of short (~100-10,000 nt) genomic segments with the total length of 67.6 Kb which possess phylogenetic patterns strikingly different from the rest of the genome. The most remarkable case is a MAT-locus, which has 2 distinct alleles interspersed along the whole-genome phylogenetic tree. CONCLUSIONS: Predominantly clonal structure of genome sequences is consistent with the observations that sexual reproduction is rare in P. spp. Small number of regions with noncanonical phylogenies seem to arise due to some recombination events between derived lineages of P. spp., with MAT-locus being transferred on multiple occasions. All sequenced strains have heterothallic configuration of MAT-locus.

Subject(s)

Ascomycota/physiology, Clonal Evolution, Fungal Genome, Ascomycota/classification, Ascomycota/genetics, Molecular Evolution, Phylogeny, Asexual Reproduction, DNA Sequence Analysis, Species Specificity

Functional implications of splicing polymorphisms in the human genome.

Kurmangaliyev, Yerbol Z; Sutormin, Roman A; Naumenko, Sergey A; Bazykin, Georgii A; Gelfand, Mikhail S.

Hum Mol Genet ; 22(17): 3449-59, 2013 Sep 01.

Article in English | MEDLINE | ID: mdl-23640990

ABSTRACT

Proper splicing is often crucial for gene functioning and its disruption may be strongly deleterious. Nevertheless, even the essential for splicing canonical dinucleotides of the splice sites are often polymorphic. Here, we use data from The 1000 Genomes Project to study single-nucleotide polymorphisms (SNPs) in the canonical dinucleotides. Splice sites carrying SNPs are enriched in weakly expressed genes and in rarely used alternative splice sites. Genes with disrupted splice sites tend to have low selective constraint, and the splice sites disrupted by SNPs are less likely to be conserved in mouse. Furthermore, SNPs are enriched in splice sites whose effects on gene function are minor: splice sites located outside of protein-coding regions, in shorter exons, closer to the 3'-ends of proteins, and outside of functional protein domains. Most of these effects are more pronounced for high-frequency SNPs. Despite these trends, many of the polymorphic sites may still substantially affect the function of the corresponding genes. A number of the observed splice site-disrupting SNPs, including several high-frequency ones, were found among mutations described in OMIM.

Subject(s)

Human Genome, Single Nucleotide Polymorphism, RNA Splice Sites, Animals, Genetic Databases, Molecular Evolution, Genetic Variation, Genome, Humans, Mice, Protein Conformation, Proteins/chemistry, RNA Splicing, Sequence Alignment

CORECLUST: identification of the conserved CRM grammar together with prediction of gene regulation.

Nikulova, Anna A; Favorov, Alexander V; Sutormin, Roman A; Makeev, Vsevolod J; Mironov, Andrey A.

Nucleic Acids Res ; 40(12): e93, 2012 Jul.

Article in English | MEDLINE | ID: mdl-22422836

ABSTRACT

Identification of transcriptional regulatory regions and tracing their internal organization are important for understanding the eukaryotic cell machinery. Cis-regulatory modules (CRMs) of higher eukaryotes are believed to possess a regulatory 'grammar', or preferred arrangement of binding sites, that is crucial for proper regulation and thus tends to be evolutionarily conserved. Here, we present a method CORECLUST (COnservative REgulatory CLUster STructure) that predicts CRMs based on a set of positional weight matrices. Given regulatory regions of orthologous and/or co-regulated genes, CORECLUST constructs a CRM model by revealing the conserved rules that describe the relative location of binding sites. The constructed model may be consequently used for the genome-wide prediction of similar CRMs, and thus detection of co-regulated genes, and for the investigation of the regulatory grammar of the system. Compared with related methods, CORECLUST shows better performance at identification of CRMs conferring muscle-specific gene expression in vertebrates and early-developmental CRMs in Drosophila.

Subject(s)

Gene Expression Regulation, Transcriptional Regulatory Elements, DNA Sequence Analysis, Algorithms, Animals, Body Patterning/genetics, Drosophila/embryology, Drosophila/genetics, Drosophila/metabolism, Genetic Enhancer Elements, Developmental Gene Expression Regulation, Muscles/metabolism, Position-Specific Scoring Matrices, Software

The miniature genome of a carnivorous plant Genlisea aurea contains a low number of genes and short non-coding sequences.

Leushkin, Evgeny V; Sutormin, Roman A; Nabieva, Elena R; Penin, Aleksey A; Kondrashov, Alexey S; Logacheva, Maria D.

BMC Genomics ; 14: 476, 2013 Jul 15.

Article in English | MEDLINE | ID: mdl-23855885

ABSTRACT

BACKGROUND: Genlisea aurea (Lentibulariaceae) is a carnivorous plant with unusually small genome size - 63.6 Mb - one of the smallest known among higher plants. Data on the genome sizes and the phylogeny of Genlisea suggest that this is a derived state within the genus. Thus, G. aurea is an excellent model organism for studying evolutionary mechanisms of genome contraction. RESULTS: Here we report sequencing and de novo draft assembly of G. aurea genome. The assembly consists of 10,687 contigs of the total length of 43.4 Mb and includes 17,755 complete and partial protein-coding genes. Its comparison with the genome of Mimulus guttatus, another representative of higher core Lamiales clade, reveals striking differences in gene content and length of non-coding regions. CONCLUSIONS: Genome contraction was a complex process, which involved gene loss and reduction of lengths of introns and intergenic regions, but not intron loss. The gene loss is more frequent for the genes that belong to multigenic families indicating that genetic redundancy is an important prerequisite for genome size reduction.

Subject(s)

Genome Size, Plant Genome, Magnoliopsida/genetics, Biological Evolution, Comparative Genomic Hybridization, Intergenic DNA/genetics, Plant DNA/genetics, Introns, Molecular Sequence Annotation, Phylogeny, DNA Sequence Analysis, Transcriptome

RegPrecise 3.0--a resource for genome-scale exploration of transcriptional regulation in bacteria.

Novichkov, Pavel S; Kazakov, Alexey E; Ravcheev, Dmitry A; Leyn, Semen A; Kovaleva, Galina Y; Sutormin, Roman A; Kazanov, Marat D; Riehl, William; Arkin, Adam P; Dubchak, Inna; Rodionov, Dmitry A.

BMC Genomics ; 14: 745, 2013 Nov 01.

Article in English | MEDLINE | ID: mdl-24175918

ABSTRACT

BACKGROUND: Genome-scale prediction of gene regulation and reconstruction of transcriptional regulatory networks in prokaryotes is one of the critical tasks of modern genomics. Bacteria from different taxonomic groups, whose lifestyles and natural environments are substantially different, possess highly diverged transcriptional regulatory networks. The comparative genomics approaches are useful for in silico reconstruction of bacterial regulons and networks operated by both transcription factors (TFs) and RNA regulatory elements (riboswitches). DESCRIPTION: RegPrecise (http://regprecise.lbl.gov) is a web resource for collection, visualization and analysis of transcriptional regulons reconstructed by comparative genomics. We significantly expanded a reference collection of manually curated regulons we introduced earlier. RegPrecise 3.0 provides access to inferred regulatory interactions organized by phylogenetic, structural and functional properties. Taxonomy-specific collections include 781 TF regulogs inferred in more than 160 genomes representing 14 taxonomic groups of Bacteria. TF-specific collections include regulogs for a selected subset of 40 TFs reconstructed across more than 30 taxonomic lineages. Novel collections of regulons operated by RNA regulatory elements (riboswitches) include nearly 400 regulogs inferred in 24 bacterial lineages. RegPrecise 3.0 provides four classifications of the reference regulons implemented as controlled vocabularies: 55 TF protein families; 43 RNA motif families; ~150 biological processes or metabolic pathways; and ~200 effectors or environmental signals. Genome-wide visualization of regulatory networks and metabolic pathways covered by the reference regulons is available for all studied genomes. A separate section of RegPrecise 3.0 contains draft regulatory networks in 640 genomes obtained by a conservative propagation of the reference regulons to closely related genomes.
CONCLUSIONS: RegPrecise 3.0 gives access to the transcriptional regulons reconstructed in bacterial genomes. Analytical capabilities include exploration of: regulon content, structure and function; TF binding site motifs; conservation and variations in genome-wide regulatory networks across all taxonomic groups of Bacteria. RegPrecise 3.0 was selected as a core resource on transcriptional regulation of the Department of Energy Systems Biology Knowledgebase, an emerging software and data environment designed to enable researchers to collaboratively generate, test and share new hypotheses about gene and protein functions, perform large-scale analyses, and model interactions in microbes, plants, and their communities.

Subject(s)

Bacteria/genetics, Genetic Databases, Bacterial Genome, Bacteria/classification, Gene Regulatory Networks/genetics, Internet, Metabolic Networks and Pathways/genetics, Transcription Factors/genetics, User-Computer Interface

Recognition of transmembrane segments in proteins: review and consistency-based benchmarking of internet servers.

Sadovskaya, Nataliya S; Sutormin, Roman A; Gelfand, Mikhail S.

J Bioinform Comput Biol ; 4(5): 1033-56, 2006 Oct.

Article in English | MEDLINE | ID: mdl-17099940

ABSTRACT

Membrane proteins perform a number of crucial functions as transporters, receptors, and components of enzyme complexes. Identification of membrane proteins and prediction of their topology is thus an important part of genome annotation. We present here an overview of transmembrane segments in protein sequences, summarize data from large-scale genome studies, and report results of benchmarking of several popular internet servers.

Subject(s)

Algorithms, Cell Membrane/chemistry, Internet, Membrane Proteins/chemistry, Protein Sequence Analysis/methods, Software Validation, Amino Acid Sequence, Benchmarking, Molecular Sequence Data, Reproducibility of Results, Sensitivity and Specificity, Protein Sequence Analysis/standards

BATMAS30: amino acid substitution matrix for alignment of bacterial transporters.

Sutormin, Roman A; Rakhmaninova, Aleksandra B; Gelfand, Mikhail S.

Proteins ; 51(1): 85-95, 2003 Apr 01.

Article in English | MEDLINE | ID: mdl-12596266

ABSTRACT

Aligned amino acid sequences of three functionally independent samples of transmembrane (TM) transport proteins have been analyzed. The concept of TM-kernel is proposed as the most probable transmembrane region of a sequence. The average amino acid composition of TM-kernels differs from the published amino acid composition of transmembrane segments. TM-kernels contain more alanines, glycines, and less polar, charged, and aromatic residues in contrast to non-TM-proteins. There are also differences between TM-kernels of bacterial and eukaryotic proteins. We have constructed amino acid substitution matrices for bacterial TM-kernels, named the BATMAS (BActerial Transmembrane MAtrix of Substitutions) series. In TM-kernels, polar and charged residues, as well as proline and tyrosine, are highly conserved, whereas there are more substitutions within the group of hydrophobic residues, in contrast to non-TM-proteins that have fewer, relatively more conserved, hydrophobic residues. These results demonstrate that alignment of transmembrane proteins should be based on at least two amino acid substitution matrices, one for loops (e.g., the BLOSUM series) and one for TM-segments (the BATMAS series), and the choice of the TM-matrix should be different for eukaryotic and bacterial proteins.

Subject(s)

Bacterial Proteins/chemistry, Membrane Transport Proteins/chemistry, Sequence Alignment/methods, Protein Sequence Analysis, ATP-Binding Cassette Transporters/chemistry, Algorithms, Amino Acid Sequence, Amino Acid Substitution, Molecular Sequence Data

The channel in transporters is formed by residues that are rare in transmembrane helices.

Kalinina, Olga V; Makeev, Vsevolod J; Sutormin, Roman A; Gelfand, Mikhail S; Rakhmaninova, Aleksandra B.

In Silico Biol ; 3(1-2): 197-204, 2003.

Article in English | MEDLINE | ID: mdl-14524337

ABSTRACT

Transmembrane transport is an essential component of the cell life. Many genes encoding known or putative transport proteins are found in bacterial genomes. In most cases their substrate specificity is not experimentally determined and only approximately predicted by comparative genomic analysis. Even less is known about the 3D structure of transporters. Nevertheless, the published experimental data demonstrate that channel-forming residues determine the substrate specificity of secondary transporters and analysis of these residues would provide better understanding of the transport mechanism. We developed a simple computational method for identification of channel-forming residues in transporter sequences. It is based on the analysis of amino acids frequencies in bacterial secondary transporters. We applied this method to a variety of transmembrane proteins with resolved 3D structure. The predictions are in sufficiently good agreement with the real protein structure.

Subject(s)

Bacterial Proteins/chemistry, Ion Channels/chemistry, Ion Channels/physiology, Membrane Proteins/chemistry, ATP-Binding Cassette Transporters/chemistry, Algorithms, Amino Acid Sequence, Bacterial Proteins/physiology, Biological Transport, Membrane Proteins/physiology, Molecular Models, Theoretical Models, Peptide Fragments/chemistry, Protein Conformation, Protein Secondary Structure
