Search | BVS CLAP/SMR-OPS/OMS

1.

Tissue-specific regulation of gene expression via unproductive splicing.

Mironov, Alexei; Petrova, Marina; Margasyuk, Sergey; Vlasenok, Maria; Mironov, Andrey A; Skvortsov, Dmitry; Pervouchine, Dmitri D.

Nucleic Acids Res ; 51(7): 3055-3066, 2023 04 24.

Article in English | MEDLINE | ID: mdl-36912101

ABSTRACT

Eukaryotic gene expression is regulated post-transcriptionally by a mechanism called unproductive splicing, in which mRNA is triggered to degrade by the nonsense-mediated decay (NMD) pathway as a result of regulated alternative splicing (AS). Only a few dozen unproductive splicing events (USEs) are currently documented, and many more remain to be identified. Here, we analyzed RNA-seq experiments from the Genotype-Tissue Expression (GTEx) Consortium to identify USEs, in which an increase in the NMD isoform splicing rate is accompanied by tissue-specific down-regulation of the host gene. To characterize RNA-binding proteins (RBPs) that regulate USEs, we superimposed these results with RBP footprinting data and experiments on the response of the transcriptome to the perturbation of expression of a large panel of RBPs. Concordant tissue-specific changes between the expression of RBP and USE splicing rate revealed a high-confidence regulatory network including 27 tissue-specific USEs with strong evidence of RBP binding. Among them, we found previously unknown PTBP1-controlled events in the DCLK2 and IQGAP1 genes, for which we confirmed the regulatory effect using small interfering RNA (siRNA) knockdown experiments in the A549 cell line. In sum, we present a transcriptomic pipeline that allows the identification of tissue-specific USEs, potentially many more than were reported here using stringent filters.

Subject(s)

Alternative Splicing , RNA Splicing , Gene Expression Regulation , Nonsense Mediated mRNA Decay , Protein Isoforms/genetics , RNA, Messenger/metabolism , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism , Humans , Cell Line

2.

Foreign RNA spike-ins enable accurate allele-specific expression analysis at scale.

Mendelevich, Asia; Gupta, Saumya; Pakharev, Aleksei; Teodosiadis, Athanasios; Mironov, Andrey A; Gimelbrant, Alexander A.

Bioinformatics ; 39(39 Suppl 1): i431-i439, 2023 06 30.

Article in English | MEDLINE | ID: mdl-37387154

ABSTRACT

MOTIVATION: Analysis of allele-specific expression is strongly affected by the technical noise present in RNA-seq experiments. Previously, we showed that technical replicates can be used for precise estimates of this noise, and we provided a tool for correction of technical noise in allele-specific expression analysis. This approach is very accurate but costly due to the need for two or more replicates of each library. Here, we develop a spike-in approach which is highly accurate at only a small fraction of the cost. RESULTS: We show that a distinct RNA added as a spike-in before library preparation reflects technical noise of the whole library and can be used in large batches of samples. We experimentally demonstrate the effectiveness of this approach using combinations of RNA from species distinguishable by alignment, namely, mouse, human, and Caenorhabditis elegans. Our new approach, controlFreq, enables highly accurate and computationally efficient analysis of allele-specific expression in (and between) arbitrarily large studies at an overall cost increase of ~5%. AVAILABILITY AND IMPLEMENTATION: Analysis pipeline for this approach is available at GitHub as R package controlFreq (github.com/gimelbrantlab/controlFreq).

Subject(s)

Caenorhabditis elegans , Libraries , Humans , Animals , Mice , Alleles , Caenorhabditis elegans/genetics , Gene Library , RNA/genetics

3.

OrthoQuantum: visualizing evolutionary repertoire of eukaryotic proteins.

Ilnitskiy, Ivan S; Zharikova, Anastasia A; Mironov, Andrey A.

Nucleic Acids Res ; 50(W1): W534-W540, 2022 07 05.

Article in English | MEDLINE | ID: mdl-35610035

ABSTRACT

Extensive amounts of data from next-generation sequencing and omics studies have led to the accumulation of information that provides insight into the evolutionary landscape of related proteins. Here, we present OrthoQuantum, a web server that allows for time-efficient analysis and visualization of phylogenetic profiles of any set of eukaryotic proteins. It is a simple-to-use tool capable of searching large input sets of proteins. Using data from open source databases of orthologous sequences in a wide range of taxonomic groups, it enables users to assess coupled evolutionary patterns and helps define lineage-specific innovations. The web interface allows users to perform queries with gene names and UniProt identifiers in different phylogenetic clades and to supplement presence data with an additional BLAST search. The conservation patterns of proteins are coded as binary vectors, i.e., strings that encode the presence or absence of orthologous proteins in other genomes. These strings are used to calculate top-scoring correlation pairs needed for finding co-inherited proteins which are simultaneously present or simultaneously absent in specific lineages. Profiles are visualized in combination with phylogenetic trees in a JavaScript-based interface. The OrthoQuantum v1.0 web server is freely available at http://orthoq.bioinf.fbb.msu.ru along with documentation and tutorial.

Subject(s)

Eukaryota , Phylogeny , Proteins , Software , Eukaryota/genetics , Genome , Internet , Proteins/genetics
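
The OrthoQuantum abstract above describes coding conservation patterns as binary presence/absence vectors and ranking protein pairs by correlation. A minimal sketch of that idea follows, assuming Pearson correlation as the score; the abstract does not name the exact measure, and the protein names, profiles, and function names here are hypothetical and not taken from the web server.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def top_pairs(profiles, k=3):
    """Return the k highest-scoring (score, name_a, name_b) pairs of profiles."""
    scored = [
        (pearson(profiles[a], profiles[b]), a, b)
        for a, b in combinations(sorted(profiles), 2)
    ]
    scored.sort(key=lambda t: -t[0])
    return scored[:k]


# Hypothetical profiles: 1 = ortholog present in a genome, 0 = absent.
profiles = {
    "protA": [1, 1, 0, 0, 1, 0],
    "protB": [1, 1, 0, 0, 1, 0],
    "protC": [0, 0, 1, 1, 0, 1],
    "protD": [1, 0, 1, 0, 1, 1],
}

for score, a, b in top_pairs(profiles):
    print(f"{a} - {b}: {score:.2f}")
```

Proteins with identical profiles (protA and protB here) score highest, which is the co-inheritance signal the abstract refers to; complementary profiles score negatively.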

4.

Studying RNA-DNA interactome by Red-C identifies noncoding RNAs associated with various chromatin types and reveals transcription dynamics.

Gavrilov, Alexey A; Zharikova, Anastasiya A; Galitsyna, Aleksandra A; Luzhin, Artem V; Rubanova, Natalia M; Golov, Arkadiy K; Petrova, Nadezhda V; Logacheva, Maria D; Kantidze, Omar L; Ulianov, Sergey V; Magnitov, Mikhail D; Mironov, Andrey A; Razin, Sergey V.

Nucleic Acids Res ; 48(12): 6699-6714, 2020 07 09.

Article in English | MEDLINE | ID: mdl-32479626

ABSTRACT

Non-coding RNAs (ncRNAs) participate in various biological processes, including regulating transcription and sustaining genome 3D organization. Here, we present a method termed Red-C that exploits proximity ligation to identify contacts with the genome for all RNA molecules present in the nucleus. Using Red-C, we uncovered the RNA-DNA interactome of human K562 cells and identified hundreds of ncRNAs enriched in active or repressed chromatin, including previously undescribed RNAs. Analysis of the RNA-DNA interactome also allowed us to trace the kinetics of messenger RNA production. Our data support the model of co-transcriptional intron splicing, but not the hypothesis of the circularization of actively transcribed genes.

Subject(s)

Chromatin/genetics , DNA/genetics , Genome/genetics , RNA, Untranslated/genetics , Transcription, Genetic , Cell Nucleus/genetics , Humans , RNA, Messenger/genetics , RNA, Untranslated/isolation & purification , Transcription Factors/genetics

5.

Coloc-stats: a unified web interface to perform colocalization analysis of genomic features.

Simovski, Boris; Kanduri, Chakravarthi; Gundersen, Sveinung; Titov, Dmytro; Domanska, Diana; Bock, Christoph; Bossini-Castillo, Lara; Chikina, Maria; Favorov, Alexander; Layer, Ryan M; Mironov, Andrey A; Quinlan, Aaron R; Sheffield, Nathan C; Trynka, Gosia; Sandve, Geir K.

Nucleic Acids Res ; 46(W1): W186-W193, 2018 07 02.

Article in English | MEDLINE | ID: mdl-29873782

ABSTRACT

Functional genomics assays produce sets of genomic regions as one of their main outputs. To biologically interpret such region-sets, researchers often use colocalization analysis, where the statistical significance of colocalization (overlap, spatial proximity) between two or more region-sets is tested. Existing colocalization analysis tools vary in the statistical methodology and analysis approaches, thus potentially providing different conclusions for the same research question. As the findings of colocalization analysis are often the basis for follow-up experiments, it is helpful to use several tools in parallel and to compare the results. We developed the Coloc-stats web service to facilitate such analyses. Coloc-stats provides a unified interface to perform colocalization analysis across various analytical methods and method-specific options (e.g. colocalization measures, resolution, null models). Coloc-stats helps the user to find a method that supports their experimental requirements and allows for a straightforward comparison across methods. Coloc-stats is implemented as a web server with a graphical user interface that assists users with configuring their colocalization analyses. Coloc-stats is freely available at https://hyperbrowser.uio.no/coloc-stats/.

Subject(s)

Genomics/methods , Software , Chromatin Immunoprecipitation , GATA1 Transcription Factor/metabolism , Internet , Sequence Analysis, DNA , User-Computer Interface

6.

Application of sorting and next generation sequencing to study 5'-UTR influence on translation efficiency in Escherichia coli.

Evfratov, Sergey A; Osterman, Ilya A; Komarova, Ekaterina S; Pogorelskaya, Alexandra M; Rubtsova, Maria P; Zatsepin, Timofei S; Semashko, Tatiana A; Kostryukova, Elena S; Mironov, Andrey A; Burnaev, Evgeny; Krymova, Ekaterina; Gelfand, Mikhail S; Govorun, Vadim M; Bogdanov, Alexey A; Sergiev, Petr V; Dontsova, Olga A.

Nucleic Acids Res ; 45(6): 3487-3502, 2017 04 07.

Article in English | MEDLINE | ID: mdl-27899632

ABSTRACT

Yield of protein per translated mRNA may vary by four orders of magnitude. Many studies analyzed the influence of mRNA features on the translation yield. However, a detailed understanding of how mRNA sequence determines its propensity to be translated is still missing. Here, we constructed a set of reporter plasmid libraries encoding CER fluorescent protein preceded by randomized 5' untranslated regions (5'-UTR) and Red fluorescent protein (RFP) used as an internal control. Each library was transformed into Escherichia coli cells, separated by efficiency of CER mRNA translation by a cell sorter and subjected to next generation sequencing. We tested efficiency of translation of the CER gene preceded by each of 48 natural 5'-UTR sequences and introduced random and designed mutations into natural and artificially selected 5'-UTRs. Several distinct properties could be ascribed to a group of 5'-UTRs most efficient in translation. In addition to known ones, several previously unrecognized features that contribute to the translation enhancement were found, such as low proportion of cytidine residues, multiple SD sequences and AG repeats. The latter could be identified as translation enhancer, albeit less efficient than SD sequence in several natural 5'-UTRs.

Subject(s)

5' Untranslated Regions , Escherichia coli/genetics , Protein Biosynthesis , Regulatory Sequences, Ribonucleic Acid , Cell Separation , Flow Cytometry , Genes, Reporter , High-Throughput Nucleotide Sequencing , Mutation , Nucleic Acid Conformation , Nucleotides/physiology

7.

StereoGene: rapid estimation of genome-wide correlation of continuous or interval feature data.

Stavrovskaya, Elena D; Niranjan, Tejasvi; Fertig, Elana J; Wheelan, Sarah J; Favorov, Alexander V; Mironov, Andrey A.

Bioinformatics ; 33(20): 3158-3165, 2017 Oct 15.

Article in English | MEDLINE | ID: mdl-29028265

ABSTRACT

MOTIVATION: Genomic features with similar genome-wide distributions are generally hypothesized to be functionally related; for example, colocalization of histones and transcription start sites indicates chromatin regulation of transcription factor activity. Therefore, statistical algorithms to perform spatial, genome-wide correlation among genomic features are required. RESULTS: Here, we propose a method, StereoGene, that rapidly estimates genome-wide correlation among pairs of genomic features. These features may represent high-throughput data mapped to a reference genome or sets of genomic annotations in that reference genome. StereoGene enables correlation of continuous data directly, avoiding the data binarization and subsequent data loss. Correlations are computed among neighboring genomic positions using kernel correlation. Representing the correlation as a function of the genome position, StereoGene outputs the local correlation track as part of the analysis. StereoGene also accounts for confounders such as input DNA by partial correlation. We apply our method to numerous comparisons of ChIP-Seq datasets from the Human Epigenome Atlas and FANTOM CAGE to demonstrate its wide applicability. We observe the changes in the correlation between epigenomic features across developmental trajectories of several tissue types consistent with known biology and find a novel spatial correlation of CAGE clusters with donor splice sites and with poly(A) sites. These analyses provide examples for the broad applicability of StereoGene for regulatory genomics. AVAILABILITY AND IMPLEMENTATION: The StereoGene C++ source code, program documentation, Galaxy integration scripts and examples are available from the project homepage http://stereogene.bioinf.fbb.msu.ru/. CONTACT: favorov@sensi.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Gene Expression Regulation , Genomics/methods , Sequence Analysis, DNA/methods , Software , Algorithms , Chromatin Immunoprecipitation/methods , Epigenomics/methods , Genome, Human , Humans

8.

Probing-directed identification of novel structured RNAs.

Vinogradova, Svetlana V; Sutormin, Roman A; Mironov, Andrey A; Soldatov, Ruslan A.

RNA Biol ; 13(2): 232-42, 2016.

Article in English | MEDLINE | ID: mdl-26732206

ABSTRACT

Transcripts often harbor RNA elements, which regulate cell processes co- or post-transcriptionally. The functions of many regulatory RNA elements depend on their structure, thus it is important to determine the structure as well as to scan genomes for structured elements. State of the art ab initio approaches to predict structured RNAs rely on DNA sequence analysis. They use 2 major types of information inferred from a sequence: thermodynamic stability of an RNA structure and evolutionary footprints of base-pair interactions. In recent years, chemical probing of RNA has arisen as an alternative source of structural information. RNA probing experiments detect positions accessible to specific types of chemicals or enzymes indicating their propensity to be in a paired or unpaired state. There exist several strategies to integrate probing data into RNA secondary structure prediction algorithms that substantially improve the prediction quality. However, whether and how probing data could contribute to detection of structured RNAs remains an open question. We previously developed the energy-based approach RNASurface to detect locally optimal structured RNA elements. Here, we integrate probing data into the RNASurface energy model using a general framework. We show that the use of experimental data allows for better discrimination of ncRNAs from other transcripts. Application of RNASurface to genome-wide analysis of the human transcriptome with PARS data identifies previously undetectable segments, with evidence of functionality for some of them.

Subject(s)

Nucleic Acid Conformation , RNA/genetics , Sequence Analysis, DNA , Transcriptome/genetics , Algorithms , Genome, Human , Humans , Molecular Sequence Annotation , RNA/chemistry

9.

RNASurface: fast and accurate detection of locally optimal potentially structured RNA segments.

Soldatov, Ruslan A; Vinogradova, Svetlana V; Mironov, Andrey A.

Bioinformatics ; 30(4): 457-63, 2014 Feb 15.

Article in English | MEDLINE | ID: mdl-24292360

ABSTRACT

MOTIVATION: During the past decade, new classes of non-coding RNAs (ncRNAs) and their unexpected functions were discovered. Stable secondary structure is the key feature of many non-coding RNAs. Taking into account huge amounts of genomic data, development of computational methods to survey genomes for structured RNAs remains an actual problem, especially when homologous sequences are not available for comparative analysis. Existing programs scan genomes with a fixed window by efficiently constructing a matrix of RNA minimum free energies. A wide range of lengths of structured RNAs necessitates the use of many different window lengths that substantially increases the output size and computational efforts. RESULTS: In this article, we present an algorithm RNASurface to efficiently scan genomes by constructing a matrix of significance of RNA secondary structures and to identify all locally optimal structured RNA segments up to a predefined size. RNASurface significantly improves precision of identification of known ncRNA in Bacillus subtilis. AVAILABILITY AND IMPLEMENTATION: RNASurface C source code is available from http://bioinf.fbb.msu.ru/RNASurface/downloads.html.

Subject(s)

Bacillus subtilis/genetics , Genome, Bacterial , RNA, Untranslated/genetics , Sequence Analysis, RNA/methods , Algorithms , Computer Simulation , Genomics , Nucleic Acid Conformation

10.

CORECLUST: identification of the conserved CRM grammar together with prediction of gene regulation.

Nikulova, Anna A; Favorov, Alexander V; Sutormin, Roman A; Makeev, Vsevolod J; Mironov, Andrey A.

Nucleic Acids Res ; 40(12): e93, 2012 Jul.

Article in English | MEDLINE | ID: mdl-22422836

ABSTRACT

Identification of transcriptional regulatory regions and tracing their internal organization are important for understanding the eukaryotic cell machinery. Cis-regulatory modules (CRMs) of higher eukaryotes are believed to possess a regulatory 'grammar', or preferred arrangement of binding sites, that is crucial for proper regulation and thus tends to be evolutionarily conserved. Here, we present a method CORECLUST (COnservative REgulatory CLUster STructure) that predicts CRMs based on a set of positional weight matrices. Given regulatory regions of orthologous and/or co-regulated genes, CORECLUST constructs a CRM model by revealing the conserved rules that describe the relative location of binding sites. The constructed model may be consequently used for the genome-wide prediction of similar CRMs, and thus detection of co-regulated genes, and for the investigation of the regulatory grammar of the system. Compared with related methods, CORECLUST shows better performance at identification of CRMs conferring muscle-specific gene expression in vertebrates and early-developmental CRMs in Drosophila.

Subject(s)

Gene Expression Regulation , Regulatory Elements, Transcriptional , Sequence Analysis, DNA , Algorithms , Animals , Body Patterning/genetics , Drosophila/embryology , Drosophila/genetics , Drosophila/metabolism , Enhancer Elements, Genetic , Gene Expression Regulation, Developmental , Muscles/metabolism , Position-Specific Scoring Matrices , Software

11.

BaRDIC: robust peak calling for RNA-DNA interaction data.

Mylarshchikov, Dmitry E; Nikolskaya, Arina I; Bogomaz, Olesja D; Zharikova, Anastasia A; Mironov, Andrey A.

NAR Genom Bioinform ; 6(2): lqae054, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38774512

ABSTRACT

Chromatin-associated non-coding RNAs play important roles in various cellular processes by targeting genomic loci. Two types of genome-wide NGS experiments exist to detect such targets: 'one-to-all', which focuses on targets of a single RNA, and 'all-to-all', which captures targets of all RNAs in a sample. As with many NGS experiments, they are prone to biases and noise, so it becomes essential to detect 'peaks', that is, specific interactions of an RNA with genomic targets. Here, we present BaRDIC (Binomial RNA-DNA Interaction Caller), a tailored method to detect peaks in both types of RNA-DNA interaction data. BaRDIC is the first tool to simultaneously take into account the two most prominent biases in the data: chromatin heterogeneity and distance-dependent decay of interaction frequency. Since RNAs differ in their interaction preferences, BaRDIC adapts peak sizes according to the abundances and contact patterns of individual RNAs. These features enable BaRDIC to make more robust predictions than currently applied peak-calling algorithms and better handle the characteristic sparsity of all-to-all data. The BaRDIC package is freely available at https://github.com/dmitrymyl/BaRDIC.
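
The BaRDIC abstract above describes binomial peak calling against a background. The sketch below illustrates only the general idea of a one-sided binomial test per genomic bin and is not BaRDIC's actual model: it assumes fixed bins, takes the expected contact probability of a bin directly from its share of background counts, and ignores distance-dependent decay, adaptive bin sizes, and multiple-testing correction. All names and counts are hypothetical.

```python
from math import comb


def binom_sf(k, n, p):
    """Upper tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def call_peaks(rna_counts, background_counts, alpha=0.01):
    """Return (bin_index, p_value) for bins enriched over the background share."""
    n = sum(rna_counts)                # total contacts of this RNA
    total_bg = sum(background_counts)  # total background contacts
    peaks = []
    for idx, (k, b) in enumerate(zip(rna_counts, background_counts)):
        p = b / total_bg               # expected share of contacts in this bin
        p_value = binom_sf(k, n, p)
        if p_value < alpha:
            peaks.append((idx, p_value))
    return peaks


# Hypothetical contact counts of one RNA in five bins and a flat background.
rna_counts = [2, 3, 20, 2, 3]
background_counts = [10, 10, 10, 10, 10]

for idx, p_value in call_peaks(rna_counts, background_counts):
    print(f"bin {idx}: p = {p_value:.3g}")
```

A real caller would also correct the resulting p-values for the number of bins tested.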

12.

Exploring massive, genome scale datasets with the GenometriCorr package.

Favorov, Alexander; Mularoni, Loris; Cope, Leslie M; Medvedeva, Yulia; Mironov, Andrey A; Makeev, Vsevolod J; Wheelan, Sarah J.

PLoS Comput Biol ; 8(5): e1002529, 2012 May.

Article in English | MEDLINE | ID: mdl-22693437

ABSTRACT

UNLABELLED: We have created a statistically grounded tool for determining the correlation of genomewide data with other datasets or known biological features, intended to guide biological exploration of high-dimensional datasets, rather than providing immediate answers. The software enables several biologically motivated approaches to these data and here we describe the rationale and implementation for each approach. Our models and statistics are implemented in an R package that efficiently calculates the spatial correlation between two sets of genomic intervals (data and/or annotated features), for use as a metric of functional interaction. The software handles any type of pointwise or interval data and instead of running analyses with predefined metrics, it computes the significance and direction of several types of spatial association; this is intended to suggest potentially relevant relationships between the datasets. AVAILABILITY AND IMPLEMENTATION: The package, GenometriCorr, can be freely downloaded at http://genometricorr.sourceforge.net/. Installation guidelines and examples are available from the sourceforge repository. The package is pending submission to Bioconductor.

Subject(s)

Databases, Genetic , Genomics/methods , Information Storage and Retrieval , Models, Genetic , Models, Statistical , Software , Animals , Chromosomes , Epigenomics , Genetic Loci , Genome , Humans , Internet , RNA, Transfer/genetics , Statistics, Nonparametric , User-Computer Interface

13.

Foreign RNA spike-ins enable accurate allele-specific expression analysis at scale.

Mendelevich, Asia; Gupta, Saumya; Pakharev, Aleksei; Teodosiadis, Athanasios; Mironov, Andrey A; Gimelbrant, Alexander A.

bioRxiv ; 2023 Feb 12.

Article in English | MEDLINE | ID: mdl-36798258

ABSTRACT

Motivation: Analysis of allele-specific expression is strongly affected by the technical noise present in RNA-seq experiments. Previously, we showed that technical replicates can be used for precise estimates of this noise, and we provided a tool for correction of technical noise in allele-specific expression analysis. This approach is very accurate but costly due to the need for two or more replicates of each library. Here, we develop a spike-in approach that is highly accurate at only a small fraction of the cost. Results: We show that a distinct RNA added as a spike-in before library preparation reflects technical noise of the whole library and can be used in large batches of samples. We experimentally demonstrate the effectiveness of this approach using combinations of RNA from species distinguishable by alignment, namely, mouse, human, and C. elegans. Our new approach, controlFreq, enables highly accurate and computationally efficient analysis of allele-specific expression in (and between) arbitrarily large studies at an overall cost increase of ~5%. Availability: Analysis pipeline for this approach is available at GitHub as R package controlFreq (github.com/gimelbrantlab/controlFreq). Contact: agimelbrant@altius.org.

14.

RegPredict: an integrated system for regulon inference in prokaryotes by comparative genomics approach.

Novichkov, Pavel S; Rodionov, Dmitry A; Stavrovskaya, Elena D; Novichkova, Elena S; Kazakov, Alexey E; Gelfand, Mikhail S; Arkin, Adam P; Mironov, Andrey A; Dubchak, Inna.

Nucleic Acids Res ; 38(Web Server issue): W299-307, 2010 Jul.

Article in English | MEDLINE | ID: mdl-20542910

ABSTRACT

RegPredict web server is designed to provide comparative genomics tools for reconstruction and analysis of microbial regulons using comparative genomics approach. The server allows the user to rapidly generate reference sets of regulons and regulatory motif profiles in a group of prokaryotic genomes. The new concept of a cluster of co-regulated orthologous operons allows the user to distribute the analysis of large regulons and to perform the comparative analysis of multiple clusters independently. Two major workflows currently implemented in RegPredict are: (i) regulon reconstruction for a known regulatory motif and (ii) ab initio inference of a novel regulon using several scenarios for the generation of starting gene sets. RegPredict provides a comprehensive collection of manually curated positional weight matrices of regulatory motifs. It is based on genomic sequences, ortholog and operon predictions from the MicrobesOnline. An interactive web interface of RegPredict integrates and presents diverse genomic and functional information about the candidate regulon members from several web resources. RegPredict is freely accessible at http://regpredict.lbl.gov.

Subject(s)

Genome, Bacterial , Regulon , Software , Genomics , Internet , Operon , Staphylococcaceae/genetics , Systems Integration , User-Computer Interface

15.

Investigation of the Role of PUFA Metabolism in Breast Cancer Using a Rank-Based Random Forest Algorithm.

Guryleva, Mariia V; Penzar, Dmitry D; Chistyakov, Dmitry V; Mironov, Andrey A; Favorov, Alexander V; Sergeeva, Marina G.

Cancers (Basel) ; 14(19), 2022 Sep 25.

Article in English | MEDLINE | ID: mdl-36230586

ABSTRACT

Polyunsaturated fatty acid (PUFA) metabolism is currently a focus in cancer research due to PUFAs functioning as structural components of the membrane matrix, as fuel sources for energy production, and as sources of secondary messengers, so-called oxylipins, important players of inflammatory processes. Although breast cancer (BC) is the leading cause of cancer death among women worldwide, no systematic study of PUFA metabolism as a system of interrelated processes in this disease has been carried out. Here, we implemented a Boruta-based feature selection algorithm to determine the list of most important PUFA metabolism genes altered in breast cancer tissues compared with normal tissues. A rank-based Random Forest (RF) model was built on the selected gene list (33 genes) and applied to predict the cancer phenotype to ascertain the PUFA genes involved in cancerogenesis. It showed high performance of dichotomic classification (balanced accuracy of 0.94, ROC AUC 0.99). We also retrieved a list of the important PUFA genes (46 genes) that differed between breast cancer molecular subtypes. The balanced accuracy of the classification model built on the specified genes was 0.82, while the ROC AUC for the sensitivity analysis was 0.85. Specific patterns of PUFA metabolic changes were obtained for each molecular subtype of breast cancer. These results show evidence that (1) PUFA metabolism genes are critical for the pathogenesis of breast cancer; (2) BC subtypes differ in PUFA metabolism genes expression; and (3) the lists of genes selected in the models are enriched with genes involved in the metabolism of signaling lipids.

16.

Ectopic expression of HIV-1 Tat modifies gene expression in cultured B cells: implications for the development of B-cell lymphomas in HIV-1-infected patients.

Valyaeva, Anna A; Tikhomirova, Maria A; Potashnikova, Daria M; Bogomazova, Alexandra N; Snigiryova, Galina P; Penin, Aleksey A; Logacheva, Maria D; Arifulin, Eugene A; Shmakova, Anna A; Germini, Diego; Kachalova, Anastasia I; Saidova, Aleena A; Zharikova, Anastasia A; Musinova, Yana R; Mironov, Andrey A; Vassetzky, Yegor S; Sheval, Eugene V.

PeerJ ; 10: e13986, 2022.

Article in English | MEDLINE | ID: mdl-36275462

ABSTRACT

An increased frequency of B-cell lymphomas is observed in human immunodeficiency virus-1 (HIV-1)-infected patients, although HIV-1 does not infect B cells. Development of B-cell lymphomas may be potentially due to the action of the HIV-1 Tat protein, which is actively released from HIV-1-infected cells, on uninfected B cells. The exact mechanism of Tat-induced B-cell lymphomagenesis has not yet been precisely identified. Here, we ectopically expressed either Tat or its TatC22G mutant devoid of transactivation activity in the RPMI 8866 lymphoblastoid B cell line and performed a genome-wide analysis of host gene expression. Stable expression of both Tat and TatC22G led to substantial modifications of the host transcriptome, including pronounced changes in antiviral response and cell cycle pathways. We did not find any strong action of Tat on cell proliferation, but during prolonged culturing, Tat-expressing cells were displaced by non-expressing cells, indicating that Tat expression slightly inhibited cell growth. We also found an increased frequency of chromosome aberrations in cells expressing Tat. Thus, Tat can modify gene expression in cultured B cells, leading to subtle modifications in cellular growth and chromosome instability, which could promote lymphomagenesis over time.

Subject(s)

HIV-1 , Lymphoma, B-Cell , Humans , HIV-1/genetics , tat Gene Products, Human Immunodeficiency Virus/genetics , Ectopic Gene Expression , Lymphoma, B-Cell/genetics , Gene Expression

17.

Evolution of prokaryotic genes by shift of stop codons.

Vakhrusheva, Anna A; Kazanov, Marat D; Mironov, Andrey A; Bazykin, Georgii A.

J Mol Evol ; 72(2): 138-46, 2011 Feb.

Article in English | MEDLINE | ID: mdl-21082168

ABSTRACT

De novo origin of coding sequence remains an obscure issue in molecular evolution. One of the possible paths for addition (subtraction) of DNA segments to (from) a gene is stop codon shift. Single nucleotide substitutions can destroy the existing stop codon, leading to uninterrupted translation up to the next stop codon in the gene's reading frame, or create a premature stop codon via a nonsense mutation. Furthermore, frameshifts caused by short indels near a gene's end may lead to premature stop codons or to translation past the existing stop codon. Here, we describe the evolution of the length of coding sequence of prokaryotic genes by change of positions of stop codons. We observed cases of addition of regions of 3'UTR to genes due to mutations at the existing stop codon, and cases of subtraction of C-terminal coding segments due to nonsense mutations upstream of the stop codon. Many of the observed stop codon shifts cannot be attributed to sequencing errors or rare deleterious variants segregating within bacterial populations. The additions of regions of 3'UTR tend to occur in those genes in which they are facilitated by nearby downstream in-frame triplets which may serve as new stop codons. Conversely, subtractions of coding sequence often give rise to in-frame stop codons located nearby. The amino acid composition of the added region is significantly biased, compared to the overall amino acid composition of the genes. Our results show that in prokaryotes, shift of stop codon is an underappreciated contributor to functional evolution of gene length.

Subject(s)

Bacteria/genetics , Codon, Terminator , Evolution, Molecular , Genes, Bacterial , Algorithms , Cluster Analysis , Databases, Genetic , INDEL Mutation , Models, Genetic , Open Reading Frames , Point Mutation
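
The stop codon shift described in the abstract above (a substitution either destroys the existing stop, so translation runs to the next in-frame stop, or creates a premature one) can be illustrated with a small sketch. It is not the authors' pipeline: it assumes a DNA string that starts in frame at the start codon and includes downstream sequence, handles substitutions only (no indels), and uses the standard stop codons TAA, TAG, and TGA. The sequences and function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop(seq):
    """Index of the first in-frame stop codon in seq (frame 0), or None."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


def stop_shift(seq, pos, base):
    """Change in coding length (in codons) after substituting `base` at `pos`.

    Positive: the stop moved downstream (coding sequence extended).
    Negative: a premature stop appeared (coding sequence shortened).
    None: no in-frame stop before or after the substitution.
    """
    old = first_stop(seq)
    mutated = seq[:pos] + base + seq[pos + 1:]
    new = first_stop(mutated)
    if old is None or new is None:
        return None
    return (new - old) // 3


# Stop loss: TAA -> CAA, translation continues to the downstream TGA.
print(stop_shift("ATGGCTTAAGGCTGA", 6, "C"))

# Nonsense mutation: CAA -> TAA creates a premature stop.
print(stop_shift("ATGCAAGCTTAA", 3, "T"))
```

The first call reports an extension of two codons and the second a loss of two codons, matching the two directions of shift the abstract describes.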

18.

Replicate sequencing libraries are important for quantification of allelic imbalance.

Mendelevich, Asia; Vinogradova, Svetlana; Gupta, Saumya; Mironov, Andrey A; Sunyaev, Shamil R; Gimelbrant, Alexander A.

Nat Commun ; 12(1): 3370, 2021 06 07.

Article in English | MEDLINE | ID: mdl-34099647

ABSTRACT

A sensitive approach to quantitative analysis of transcriptional regulation in diploid organisms is analysis of allelic imbalance (AI) in RNA sequencing (RNA-seq) data. A near-universal practice in such studies is to prepare and sequence only one library per RNA sample. We present theoretical and experimental evidence that data from a single RNA-seq library is insufficient for reliable quantification of the contribution of technical noise to the observed AI signal; consequently, reliance on one-replicate experimental design can lead to unaccounted-for variation in error rates in allele-specific analysis. We develop a computational approach, Qllelic, that accurately accounts for technical noise by making use of replicate RNA-seq libraries. Testing on new and existing datasets shows that application of Qllelic greatly decreases false positive rate in allele-specific analysis while conserving appropriate signal, and thus greatly improves reproducibility of AI estimates. We explore sources of technical overdispersion in observed AI signal and conclude by discussing design of RNA-seq studies addressing two biologically important questions: quantification of transcriptome-wide AI in one sample, and differential analysis of allele-specific expression between samples.

Subject(s)

Allelic Imbalance , Gene Library , Single Nucleotide Polymorphism , RNA/genetics , RNA Sequence Analysis/methods , Transcriptome/genetics , Algorithms , Alleles , Animals , Female , 129 Strain Mice , Genetic Models , RNA/metabolism

19.

Cumulative contact frequency of a chromatin region is an intrinsic property linked to its function.

Samborskaia, Margarita D; Galitsyna, Aleksandra; Pletenev, Ilya; Trofimova, Anna; Mironov, Andrey A; Gelfand, Mikhail S; Khrameeva, Ekaterina E.

PeerJ ; 8: e9566, 2020.

Article in English | MEDLINE | ID: mdl-32864204

ABSTRACT

Regulation of gene transcription is a complex process controlled by many factors, including the conformation of chromatin in the nucleus. Insights into chromatin conformation on both local and global scales can be provided by the Hi-C (high-throughput chromosome conformation capture) method. One of the drawbacks of Hi-C analysis and interpretation is the presence of systematic biases, such as different accessibility to enzymes, amplification, and mappability of DNA regions, which all result in different visibility of the regions. Iterative correction (IC) is one of the most popular techniques developed for the elimination of these systematic biases. IC is based on the assumption that all chromatin regions have an equal number of observed contacts in Hi-C. In other words, the IC procedure equalizes the experimental visibility, approximated by the cumulative contact frequency (CCF), for all genomic regions. However, the differences in experimental visibility might be explained by biological factors such as chromatin openness, which is characteristic of distinct chromatin states. Here we show that CCF is positively correlated with active transcription. It is associated with compartment organization, since compartment A demonstrates higher CCF and gene expression levels than compartment B. Notably, this observation holds for a wide range of species, including human, mouse, and Drosophila. Moreover, we track the CCF state for syntenic blocks between human and mouse and conclude that active state assessed by CCF is an intrinsic property of the DNA region, which is independent of local genomic and epigenomic context. Our findings establish a missing link between Hi-C normalization procedures removing CCF from the data and poorly investigated and possibly relevant biological factors contributing to CCF.

20.

Origin of the nuclear proteome on the basis of pre-existing nuclear localization signals in prokaryotic proteins.

Lisitsyna, Olga M; Kurnaeva, Margarita A; Arifulin, Eugene A; Shubina, Maria Y; Musinova, Yana R; Mironov, Andrey A; Sheval, Eugene V.

Biol Direct ; 15(1): 9, 2020 04 28.

Article in English | MEDLINE | ID: mdl-32345340

ABSTRACT

BACKGROUND: The origin of the selective nuclear protein import machinery, which consists of nuclear pore complexes and adaptor molecules interacting with the nuclear localization signals (NLSs) of cargo molecules, is one of the most important events in the evolution of eukaryotic cells. How proteins were selected for import into the forming nucleus remains an open question. RESULTS: Here, we demonstrate that functional NLSs may be integrated in the nucleotide-binding domains of both eukaryotic and prokaryotic proteins and may coevolve with these domains. CONCLUSION: The presence of sequences similar to NLSs in the DNA-binding domains of prokaryotic proteins might have created an advantage for nuclear accumulation of these proteins during evolution of the nuclear-cytoplasmic barrier, influencing which proteins accumulated and became compartmentalized inside the forming nucleus (i.e., the content of the nuclear proteome). REVIEWERS: This article was reviewed by Sergey Melnikov and Igor Rogozin. OPEN PEER REVIEW: Reviewed by Sergey Melnikov and Igor Rogozin. For the full reviews, please go to the Reviewers' comments section.

Subject(s)

Archaeal Proteins/chemistry , Bacterial Proteins/chemistry , Cell Nucleus/physiology , Molecular Evolution , Nuclear Localization Signals/chemistry , Proteome , Eukaryotic Cells/chemistry , Prokaryotic Cells/chemistry
