Results 1-20 of 51
1.
Nucleic Acids Res ; 51(7): 3055-3066, 2023 04 24.
Article in English | MEDLINE | ID: mdl-36912101

ABSTRACT

Eukaryotic gene expression is regulated post-transcriptionally by a mechanism called unproductive splicing, in which regulated alternative splicing (AS) produces mRNA isoforms that are degraded by the nonsense-mediated decay (NMD) pathway. Only a few dozen unproductive splicing events (USEs) are currently documented, and many more remain to be identified. Here, we analyzed RNA-seq experiments from the Genotype-Tissue Expression (GTEx) Consortium to identify USEs, in which an increase in the NMD isoform splicing rate is accompanied by tissue-specific down-regulation of the host gene. To characterize RNA-binding proteins (RBPs) that regulate USEs, we overlaid these results with RBP footprinting data and with experiments on the response of the transcriptome to perturbations in the expression of a large panel of RBPs. Concordant tissue-specific changes between the expression of RBPs and USE splicing rates revealed a high-confidence regulatory network including 27 tissue-specific USEs with strong evidence of RBP binding. Among them, we found previously unknown PTBP1-controlled events in the DCLK2 and IQGAP1 genes, for which we confirmed the regulatory effect using small interfering RNA (siRNA) knockdown experiments in the A549 cell line. In sum, we present a transcriptomic pipeline that allows the identification of tissue-specific USEs, potentially many more than were reported here using stringent filters.


Subjects
Alternative Splicing; RNA Splicing; Gene Expression Regulation; Nonsense Mediated mRNA Decay; Protein Isoforms/genetics; RNA, Messenger/metabolism; RNA-Binding Proteins/genetics; RNA-Binding Proteins/metabolism; Humans; Cell Line
2.
Bioinformatics ; 39(39 Suppl 1): i431-i439, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387154

ABSTRACT

MOTIVATION: Analysis of allele-specific expression is strongly affected by the technical noise present in RNA-seq experiments. Previously, we showed that technical replicates can be used for precise estimates of this noise, and we provided a tool for correction of technical noise in allele-specific expression analysis. This approach is very accurate but costly due to the need for two or more replicates of each library. Here, we develop a spike-in approach which is highly accurate at only a small fraction of the cost. RESULTS: We show that a distinct RNA added as a spike-in before library preparation reflects technical noise of the whole library and can be used in large batches of samples. We experimentally demonstrate the effectiveness of this approach using combinations of RNA from species distinguishable by alignment, namely, mouse, human, and Caenorhabditis elegans. Our new approach, controlFreq, enables highly accurate and computationally efficient analysis of allele-specific expression in (and between) arbitrarily large studies at an overall cost increase of ∼5%. AVAILABILITY AND IMPLEMENTATION: Analysis pipeline for this approach is available at GitHub as R package controlFreq (github.com/gimelbrantlab/controlFreq).


Subjects
Caenorhabditis elegans; Libraries; Humans; Animals; Mice; Alleles; Caenorhabditis elegans/genetics; Gene Library; RNA/genetics
3.
Nucleic Acids Res ; 50(W1): W534-W540, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35610035

ABSTRACT

Extensive amounts of data from next-generation sequencing and omics studies have led to the accumulation of information that provides insight into the evolutionary landscape of related proteins. Here, we present OrthoQuantum, a web server that allows for time-efficient analysis and visualization of phylogenetic profiles of any set of eukaryotic proteins. It is a simple-to-use tool capable of searching large input sets of proteins. Using data from open-source databases of orthologous sequences in a wide range of taxonomic groups, it enables users to assess coupled evolutionary patterns and helps define lineage-specific innovations. The web interface allows users to perform queries with gene names and UniProt identifiers in different phylogenetic clades and to supplement presence data with an additional BLAST search. The conservation patterns of proteins are coded as binary vectors, i.e., strings that encode the presence or absence of orthologous proteins in other genomes. These strings are used to calculate the top-scoring correlation pairs needed to find co-inherited proteins, which are simultaneously present or simultaneously absent in specific lineages. Profiles are visualized in combination with phylogenetic trees in a JavaScript-based interface. The OrthoQuantum v1.0 web server is freely available at http://orthoq.bioinf.fbb.msu.ru along with documentation and a tutorial.


Subjects
Eukaryota; Phylogeny; Proteins; Software; Eukaryota/genetics; Genome; Internet; Proteins/genetics
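The OrthoQuantum abstract above describes coding each protein's conservation pattern as a binary presence/absence vector and ranking protein pairs by correlation to find co-inherited proteins. A minimal Python sketch of that idea follows. The choice of Pearson correlation on 0/1 vectors (the phi coefficient), the helper names and the toy profiles are assumptions for illustration, not OrthoQuantum's implementation.

```python
import math
from itertools import combinations


def phi_correlation(a, b):
    """Pearson correlation of two equal-length 0/1 profiles (the phi coefficient)."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a protein present (or absent) everywhere carries no co-inheritance signal
    return cov / math.sqrt(var_a * var_b)


def top_pairs(profiles, k=3):
    """Rank all protein pairs by profile correlation, highest first (hypothetical helper)."""
    scored = [
        (phi_correlation(profiles[p], profiles[q]), p, q)
        for p, q in combinations(sorted(profiles), 2)
    ]
    scored.sort(reverse=True)
    return scored[:k]


# Toy profiles: 1 = ortholog present in that genome, 0 = absent (invented data).
profiles = {
    "protA": [1, 1, 0, 0, 1, 0],
    "protB": [1, 1, 0, 0, 1, 0],
    "protC": [0, 0, 1, 1, 0, 1],
    "protD": [1, 0, 1, 0, 1, 0],
}
print(top_pairs(profiles, k=2))
```

Pairs that are simultaneously present or simultaneously absent score near +1, while mutually exclusive profiles score near -1, which is why a plain correlation is a natural first ranking for co-inheritance.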
4.
Nucleic Acids Res ; 48(12): 6699-6714, 2020 07 09.
Article in English | MEDLINE | ID: mdl-32479626

ABSTRACT

Non-coding RNAs (ncRNAs) participate in various biological processes, including regulating transcription and sustaining genome 3D organization. Here, we present a method termed Red-C that exploits proximity ligation to identify contacts with the genome for all RNA molecules present in the nucleus. Using Red-C, we uncovered the RNA-DNA interactome of human K562 cells and identified hundreds of ncRNAs enriched in active or repressed chromatin, including previously undescribed RNAs. Analysis of the RNA-DNA interactome also allowed us to trace the kinetics of messenger RNA production. Our data support the model of co-transcriptional intron splicing, but not the hypothesis of the circularization of actively transcribed genes.


Subjects
Chromatin/genetics; DNA/genetics; Genome/genetics; RNA, Untranslated/genetics; Transcription, Genetic; Cell Nucleus/genetics; Humans; RNA, Messenger/genetics; RNA, Untranslated/isolation & purification; Transcription Factors/genetics
5.
Nucleic Acids Res ; 46(W1): W186-W193, 2018 07 02.
Article in English | MEDLINE | ID: mdl-29873782

ABSTRACT

Functional genomics assays produce sets of genomic regions as one of their main outputs. To biologically interpret such region-sets, researchers often use colocalization analysis, where the statistical significance of colocalization (overlap, spatial proximity) between two or more region-sets is tested. Existing colocalization analysis tools vary in the statistical methodology and analysis approaches, thus potentially providing different conclusions for the same research question. As the findings of colocalization analysis are often the basis for follow-up experiments, it is helpful to use several tools in parallel and to compare the results. We developed the Coloc-stats web service to facilitate such analyses. Coloc-stats provides a unified interface to perform colocalization analysis across various analytical methods and method-specific options (e.g. colocalization measures, resolution, null models). Coloc-stats helps the user to find a method that supports their experimental requirements and allows for a straightforward comparison across methods. Coloc-stats is implemented as a web server with a graphical user interface that assists users with configuring their colocalization analyses. Coloc-stats is freely available at https://hyperbrowser.uio.no/coloc-stats/.


Subjects
Genomics/methods; Software; Chromatin Immunoprecipitation; GATA1 Transcription Factor/metabolism; Internet; Sequence Analysis, DNA; User-Computer Interface
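Coloc-stats wraps several colocalization methods rather than a single statistic. As a rough illustration of one common kind of test (overlap measured in base pairs, compared against a null model), the sketch below computes the observed overlap between two region sets and an empirical p-value from uniformly shuffling one set along a single linear "genome". The uniform null model, the helper names and the toy coordinates are assumptions; real tools typically restrict shuffling to chromosomes, mappable regions or other method-specific null models.

```python
import random


def overlap_bp(a, b):
    """Total base pairs shared between two lists of half-open (start, end) intervals.

    Quadratic pairwise scan: fine for a toy example, too slow for genome-scale sets.
    """
    total = 0
    for s1, e1 in a:
        for s2, e2 in b:
            total += max(0, min(e1, e2) - max(s1, s2))
    return total


def shuffle_regions(regions, genome_length, rng):
    """Place each region at a uniformly random position, preserving its length."""
    shuffled = []
    for start, end in regions:
        length = end - start
        new_start = rng.randrange(0, genome_length - length + 1)
        shuffled.append((new_start, new_start + length))
    return shuffled


def permutation_pvalue(a, b, genome_length, n_perm=1000, seed=0):
    """Empirical one-sided p-value for 'a overlaps b more than expected by chance'."""
    rng = random.Random(seed)
    observed = overlap_bp(a, b)
    at_least_as_extreme = sum(
        1
        for _ in range(n_perm)
        if overlap_bp(shuffle_regions(a, genome_length, rng), b) >= observed
    )
    # The +1 terms count the observed configuration itself and keep p > 0.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


peaks = [(1_000, 1_200), (50_000, 50_300)]
promoters = [(1_100, 1_300), (50_100, 50_400), (80_000, 80_500)]
print(permutation_pvalue(peaks, promoters, genome_length=100_000))
```

Swapping the overlap measure or the shuffling scheme changes the answer, which is the kind of method dependence the abstract gives as the reason for comparing several tools side by side.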
6.
Nucleic Acids Res ; 45(6): 3487-3502, 2017 04 07.
Article in English | MEDLINE | ID: mdl-27899632

ABSTRACT

Yield of protein per translated mRNA may vary by four orders of magnitude. Many studies have analyzed the influence of mRNA features on the translation yield. However, a detailed understanding of how mRNA sequence determines its propensity to be translated is still missing. Here, we constructed a set of reporter plasmid libraries encoding CER fluorescent protein preceded by randomized 5' untranslated regions (5'-UTR) and red fluorescent protein (RFP) used as an internal control. Each library was transformed into Escherichia coli cells, separated by efficiency of CER mRNA translation using a cell sorter and subjected to next-generation sequencing. We tested the efficiency of translation of the CER gene preceded by each of 48 natural 5'-UTR sequences and introduced random and designed mutations into natural and artificially selected 5'-UTRs. Several distinct properties could be ascribed to a group of 5'-UTRs most efficient in translation. In addition to known ones, several previously unrecognized features that contribute to translation enhancement were found, such as a low proportion of cytidine residues, multiple SD sequences and AG repeats. The latter could be identified as a translation enhancer in several natural 5'-UTRs, albeit a less efficient one than the SD sequence.


Subjects
5' Untranslated Regions; Escherichia coli/genetics; Protein Biosynthesis; Regulatory Sequences, Ribonucleic Acid; Cell Separation; Flow Cytometry; Genes, Reporter; High-Throughput Nucleotide Sequencing; Mutation; Nucleic Acid Conformation; Nucleotides/physiology
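The 5'-UTR study above names three sequence features associated with efficient translation in E. coli: a low proportion of cytidine residues, multiple SD sequences and AG repeats. The sketch below computes simple versions of these features for a UTR sequence. The SD core motif (GGAGG) and the definition of an AG repeat as three or more tandem AG dinucleotides are assumptions chosen for illustration, not the criteria used in the study.

```python
import re

SD_CORE = "GGAGG"  # assumed Shine-Dalgarno core; the study's SD definition may differ
AG_REPEAT = re.compile(r"(?:AG){3,}")  # assumed: three or more tandem AG dinucleotides


def utr_features(utr):
    """Cytidine fraction, SD-core count and AG-repeat run count for one 5'-UTR."""
    utr = utr.upper().replace("U", "T")  # accept RNA or DNA alphabet
    if not utr:
        raise ValueError("empty UTR")
    sd_sites = sum(
        1
        for i in range(len(utr) - len(SD_CORE) + 1)
        if utr[i:i + len(SD_CORE)] == SD_CORE
    )  # overlapping occurrences are counted separately
    return {
        "c_fraction": utr.count("C") / len(utr),
        "sd_sites": sd_sites,
        "ag_repeat_runs": len(AG_REPEAT.findall(utr)),
    }


print(utr_features("AAGGAGGACAGAGAGAG"))  # invented example sequence
```

Features like these could serve as columns in a table of sorted library variants, which is one way to relate sequence properties to the measured translation efficiency.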
7.
Bioinformatics ; 33(20): 3158-3165, 2017 Oct 15.
Article in English | MEDLINE | ID: mdl-29028265

ABSTRACT

MOTIVATION: Genomic features with similar genome-wide distributions are generally hypothesized to be functionally related; for example, colocalization of histones and transcription start sites indicates chromatin regulation of transcription factor activity. Therefore, statistical algorithms to perform spatial, genome-wide correlation among genomic features are required. RESULTS: Here, we propose a method, StereoGene, that rapidly estimates genome-wide correlation among pairs of genomic features. These features may represent high-throughput data mapped to a reference genome or sets of genomic annotations in that reference genome. StereoGene enables correlation of continuous data directly, avoiding data binarization and the subsequent loss of information. Correlations are computed among neighboring genomic positions using kernel correlation. Representing the correlation as a function of genome position, StereoGene outputs the local correlation track as part of the analysis. StereoGene also accounts for confounders such as input DNA by partial correlation. We apply our method to numerous comparisons of ChIP-Seq datasets from the Human Epigenome Atlas and FANTOM CAGE to demonstrate its wide applicability. We observe changes in the correlation between epigenomic features across developmental trajectories of several tissue types consistent with known biology and find a novel spatial correlation of CAGE clusters with donor splice sites and with poly(A) sites. These analyses provide examples of the broad applicability of StereoGene for regulatory genomics. AVAILABILITY AND IMPLEMENTATION: The StereoGene C++ source code, program documentation, Galaxy integration scripts and examples are available from the project homepage http://stereogene.bioinf.fbb.msu.ru/. CONTACT: favorov@sensi.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Gene Expression Regulation; Genomics/methods; Sequence Analysis, DNA/methods; Software; Algorithms; Chromatin Immunoprecipitation/methods; Epigenomics/methods; Genome, Human; Humans
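StereoGene computes kernel correlation between genomic tracks, so that signal at neighboring positions, not only identical ones, contributes to the correlation. A simplified Python sketch of that idea: smooth both tracks with a normalized Gaussian kernel, then take the Pearson correlation of the smoothed tracks. Up to edge effects this equals correlating the raw tracks through a wider Gaussian kernel. StereoGene's actual kernel, its FFT-based computation, the local correlation track and the partial-correlation correction for input DNA are not reproduced here; the kernel width and the toy tracks are assumptions.

```python
import math


def gaussian_kernel(half_width, sigma):
    """Discrete Gaussian weights on [-half_width, half_width], normalized to sum to 1."""
    weights = [math.exp(-0.5 * (d / sigma) ** 2) for d in range(-half_width, half_width + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(track, kernel):
    """Convolve a track with a symmetric kernel; positions past the ends count as zero."""
    half = len(kernel) // 2
    n = len(track)
    return [
        sum(w * track[i + k - half] for k, w in enumerate(kernel) if 0 <= i + k - half < n)
        for i in range(n)
    ]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant track")
    return cov / math.sqrt(vx * vy)


def kernel_correlation(track_a, track_b, half_width=5, sigma=2.0):
    """Pearson correlation of Gaussian-smoothed tracks (simplified kernel correlation)."""
    kernel = gaussian_kernel(half_width, sigma)
    return pearson(smooth(track_a, kernel), smooth(track_b, kernel))


# Two toy tracks with single peaks two positions apart (invented data).
a = [0.0] * 20
b = [0.0] * 20
a[8], b[10] = 1.0, 1.0
print(pearson(a, b), kernel_correlation(a, b))
```

The raw correlation of the two peaks is slightly negative because they never coincide, while the kernel version is clearly positive, which illustrates why kernel correlation suits features that are near each other rather than exactly co-located.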
8.
RNA Biol ; 13(2): 232-42, 2016.
Article in English | MEDLINE | ID: mdl-26732206

ABSTRACT

Transcripts often harbor RNA elements, which regulate cell processes co- or post-transcriptionally. The functions of many regulatory RNA elements depend on their structure; thus, it is important to determine the structure as well as to scan genomes for structured elements. State-of-the-art ab initio approaches to predict structured RNAs rely on DNA sequence analysis. They use two major types of information inferred from a sequence: the thermodynamic stability of an RNA structure and the evolutionary footprints of base-pair interactions. In recent years, chemical probing of RNA has arisen as an alternative source of structural information. RNA probing experiments detect positions accessible to specific types of chemicals or enzymes, indicating their propensity to be in a paired or unpaired state. Several strategies exist to integrate probing data into RNA secondary structure prediction algorithms, and they substantially improve prediction quality. However, whether and how probing data could contribute to the detection of structured RNAs remains an open question. We previously developed the energy-based approach RNASurface to detect locally optimal structured RNA elements. Here, we integrate probing data into the RNASurface energy model using a general framework. We show that the use of experimental data allows for better discrimination of ncRNAs from other transcripts. Application of RNASurface to genome-wide analysis of the human transcriptome with PARS data identifies previously undetectable segments, with evidence of functionality for some of them.


Subjects
Nucleic Acid Conformation; RNA/genetics; Sequence Analysis, DNA; Transcriptome/genetics; Algorithms; Genome, Human; Humans; Molecular Sequence Annotation; RNA/chemistry
9.
Bioinformatics ; 30(4): 457-63, 2014 Feb 15.
Article in English | MEDLINE | ID: mdl-24292360

ABSTRACT

MOTIVATION: During the past decade, new classes of non-coding RNAs (ncRNAs) and their unexpected functions were discovered. Stable secondary structure is the key feature of many non-coding RNAs. Given the huge amounts of genomic data, developing computational methods to survey genomes for structured RNAs remains a pressing problem, especially when homologous sequences are not available for comparative analysis. Existing programs scan genomes with a fixed window by efficiently constructing a matrix of RNA minimum free energies. The wide range of lengths of structured RNAs necessitates the use of many different window lengths, which substantially increases the output size and computational effort. RESULTS: In this article, we present RNASurface, an algorithm that efficiently scans genomes by constructing a matrix of significance of RNA secondary structures and identifies all locally optimal structured RNA segments up to a predefined size. RNASurface significantly improves the precision of identification of known ncRNAs in Bacillus subtilis. AVAILABILITY AND IMPLEMENTATION: RNASurface C source code is available from http://bioinf.fbb.msu.ru/RNASurface/downloads.html.


Subjects
Bacillus subtilis/genetics; Genome, Bacterial; RNA, Untranslated/genetics; Sequence Analysis, RNA/methods; Algorithms; Computer Simulation; Genomics; Nucleic Acid Conformation
10.
Nucleic Acids Res ; 40(12): e93, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22422836

ABSTRACT

Identification of transcriptional regulatory regions and tracing their internal organization are important for understanding the eukaryotic cell machinery. Cis-regulatory modules (CRMs) of higher eukaryotes are believed to possess a regulatory 'grammar', or preferred arrangement of binding sites, that is crucial for proper regulation and thus tends to be evolutionarily conserved. Here, we present a method CORECLUST (COnservative REgulatory CLUster STructure) that predicts CRMs based on a set of positional weight matrices. Given regulatory regions of orthologous and/or co-regulated genes, CORECLUST constructs a CRM model by revealing the conserved rules that describe the relative location of binding sites. The constructed model may subsequently be used for the genome-wide prediction of similar CRMs, and thus the detection of co-regulated genes, and for the investigation of the regulatory grammar of the system. Compared with related methods, CORECLUST shows better performance at identifying CRMs conferring muscle-specific gene expression in vertebrates and early-developmental CRMs in Drosophila.


Subjects
Gene Expression Regulation; Regulatory Elements, Transcriptional; Sequence Analysis, DNA; Algorithms; Animals; Body Patterning/genetics; Drosophila/embryology; Drosophila/genetics; Drosophila/metabolism; Enhancer Elements, Genetic; Gene Expression Regulation, Developmental; Muscles/metabolism; Position-Specific Scoring Matrices; Software
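CORECLUST builds its CRM models on top of positional weight matrices. The basic PWM operation, scoring every window of a sequence with a log-odds matrix, can be sketched as below. This is only the generic PWM-scanning primitive, not CORECLUST's model of conserved binding-site arrangement; the pseudocount, the uniform 0.25 background, the threshold and the toy TATA-like motif are assumptions.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """counts: one dict per motif position mapping base -> observed count."""
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + len(BASES) * pseudocount
        matrix.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background)
            for b in BASES
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Return (start, score) for every window scoring at or above threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(matrix[i][b] for i, b in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Hypothetical 4-position motif built from 10 aligned "TATA" sites.
pwm = log_odds_matrix([{"T": 10}, {"A": 10}, {"T": 10}, {"A": 10}])
print(scan("GGTATAGG", pwm, threshold=5.0))
```

A CRM predictor would then reason about the positions of such hits for several matrices at once; the sketch stops at producing the hit list.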
11.
NAR Genom Bioinform ; 6(2): lqae054, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38774512

ABSTRACT

Chromatin-associated non-coding RNAs play important roles in various cellular processes by targeting genomic loci. Two types of genome-wide NGS experiments exist to detect such targets: 'one-to-all', which focuses on targets of a single RNA, and 'all-to-all', which captures targets of all RNAs in a sample. As with many NGS experiments, they are prone to biases and noise, so it becomes essential to detect 'peaks', i.e., specific interactions of an RNA with genomic targets. Here, we present BaRDIC (Binomial RNA-DNA Interaction Caller), a tailored method to detect peaks in both types of RNA-DNA interaction data. BaRDIC is the first tool to simultaneously take into account the two most prominent biases in the data: chromatin heterogeneity and distance-dependent decay of interaction frequency. Since RNAs differ in their interaction preferences, BaRDIC adapts peak sizes according to the abundances and contact patterns of individual RNAs. These features enable BaRDIC to make more robust predictions than currently applied peak-calling algorithms and to better handle the characteristic sparsity of all-to-all data. The BaRDIC package is freely available at https://github.com/dmitrymyl/BaRDIC.
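BaRDIC's name points to a binomial model of RNA-DNA contacts. Without the parts that make BaRDIC distinctive (its background model for chromatin heterogeneity and distance-dependent decay, and its RNA-specific adaptive bin sizes), the core test can be sketched as follows: given one RNA's contacts distributed over genomic bins and an expected per-bin contact probability, flag bins whose counts are improbably high under a binomial null. The Bonferroni correction, the uniform background in the example and the toy counts are assumptions.

```python
import math


def binomial_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly from the probability mass function."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def call_peaks(bin_counts, background_probs, alpha=0.05):
    """Flag bins with more contacts than the background predicts.

    bin_counts: contacts of one RNA per genomic bin.
    background_probs: expected share of that RNA's contacts per bin (normalized here).
    """
    n = sum(bin_counts)
    total_bg = sum(background_probs)
    probs = [p / total_bg for p in background_probs]
    m = len(bin_counts)
    peaks = []
    for index, (k, p) in enumerate(zip(bin_counts, probs)):
        p_value = binomial_sf(k, n, p)
        if min(1.0, p_value * m) <= alpha:  # Bonferroni-adjusted across bins
            peaks.append((index, p_value))
    return peaks


contacts = [5, 5, 5, 40, 5]  # invented counts for one RNA over five bins
print(call_peaks(contacts, background_probs=[1, 1, 1, 1, 1]))
```

In this framing, modelling chromatin heterogeneity or distance decay amounts to replacing the uniform background_probs with bin-specific expectations, which is where a dedicated tool does most of its work.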

12.
PLoS Comput Biol ; 8(5): e1002529, 2012 May.
Article in English | MEDLINE | ID: mdl-22693437

ABSTRACT

UNLABELLED: We have created a statistically grounded tool for determining the correlation of genomewide data with other datasets or known biological features, intended to guide biological exploration of high-dimensional datasets, rather than providing immediate answers. The software enables several biologically motivated approaches to these data and here we describe the rationale and implementation for each approach. Our models and statistics are implemented in an R package that efficiently calculates the spatial correlation between two sets of genomic intervals (data and/or annotated features), for use as a metric of functional interaction. The software handles any type of pointwise or interval data and instead of running analyses with predefined metrics, it computes the significance and direction of several types of spatial association; this is intended to suggest potentially relevant relationships between the datasets. AVAILABILITY AND IMPLEMENTATION: The package, GenometriCorr, can be freely downloaded at http://genometricorr.sourceforge.net/. Installation guidelines and examples are available from the sourceforge repository. The package is pending submission to Bioconductor.


Subjects
Databases, Genetic; Genomics/methods; Information Storage and Retrieval; Models, Genetic; Models, Statistical; Software; Animals; Chromosomes; Epigenomics; Genetic Loci; Genome; Humans; Internet; RNA, Transfer/genetics; Statistics, Nonparametric; User-Computer Interface
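GenometriCorr reports several types of spatial association between interval sets. One statistic often used for this purpose is a relative distance: for each query point, the distance to the nearer of its two flanking reference points divided by the gap between them. It is roughly uniform on [0, 0.5] when the two sets are independent and skews toward 0 when queries are attracted to the reference. The sketch below computes it for point data; that GenometriCorr uses exactly this definition is an assumption here, and the function name and toy coordinates are hypothetical.

```python
import bisect


def relative_distances(query_points, reference_points):
    """Relative distance of each query point to its flanking reference points.

    Queries outside the span of the reference set (no left or no right neighbor)
    are skipped. bisect_right guarantees left <= q < right, so the gap is positive.
    """
    ref = sorted(reference_points)
    distances = []
    for q in query_points:
        i = bisect.bisect_right(ref, q)
        if i == 0 or i == len(ref):
            continue
        left, right = ref[i - 1], ref[i]
        distances.append(min(q - left, right - q) / (right - left))
    return distances


rd = relative_distances([11, 25, 50], [10, 30, 70])
print(rd, sum(rd) / len(rd))  # a mean well below 0.25 would hint at attraction
```

A test of association would compare the observed distribution of these values with the uniform expectation, for example with a Kolmogorov-Smirnov test or by permutation.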
13.
bioRxiv ; 2023 Feb 12.
Article in English | MEDLINE | ID: mdl-36798258

ABSTRACT

Motivation: Analysis of allele-specific expression is strongly affected by the technical noise present in RNA-seq experiments. Previously, we showed that technical replicates can be used for precise estimates of this noise, and we provided a tool for correction of technical noise in allele-specific expression analysis. This approach is very accurate but costly due to the need for two or more replicates of each library. Here, we develop a spike-in approach that is highly accurate at only a small fraction of the cost. Results: We show that a distinct RNA added as a spike-in before library preparation reflects technical noise of the whole library and can be used in large batches of samples. We experimentally demonstrate the effectiveness of this approach using combinations of RNA from species distinguishable by alignment, namely, mouse, human, and C. elegans. Our new approach, controlFreq, enables highly accurate and computationally efficient analysis of allele-specific expression in (and between) arbitrarily large studies at an overall cost increase of ~5%. Availability: Analysis pipeline for this approach is available at GitHub as R package controlFreq (github.com/gimelbrantlab/controlFreq). Contact: agimelbrant@altius.org.

14.
Nucleic Acids Res ; 38(Web Server issue): W299-307, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20542910

ABSTRACT

The RegPredict web server is designed to provide comparative genomics tools for the reconstruction and analysis of microbial regulons. The server allows the user to rapidly generate reference sets of regulons and regulatory motif profiles in a group of prokaryotic genomes. The new concept of a cluster of co-regulated orthologous operons allows the user to distribute the analysis of large regulons and to perform the comparative analysis of multiple clusters independently. The two major workflows currently implemented in RegPredict are: (i) regulon reconstruction for a known regulatory motif and (ii) ab initio inference of a novel regulon using several scenarios for the generation of starting gene sets. RegPredict provides a comprehensive collection of manually curated positional weight matrices of regulatory motifs. It is based on genomic sequences and on ortholog and operon predictions from MicrobesOnline. The interactive web interface of RegPredict integrates and presents diverse genomic and functional information about the candidate regulon members from several web resources. RegPredict is freely accessible at http://regpredict.lbl.gov.


Subjects
Genome, Bacterial; Regulon; Software; Genomics; Internet; Operon; Staphylococcaceae/genetics; Systems Integration; User-Computer Interface
15.
Cancers (Basel) ; 14(19)2022 Sep 25.
Article in English | MEDLINE | ID: mdl-36230586

ABSTRACT

Polyunsaturated fatty acid (PUFA) metabolism is currently a focus in cancer research because PUFAs function as structural components of the membrane matrix, as fuel sources for energy production, and as sources of secondary messengers, so-called oxylipins, which are important players in inflammatory processes. Although breast cancer (BC) is the leading cause of cancer death among women worldwide, no systematic study of PUFA metabolism as a system of interrelated processes in this disease has been carried out. Here, we implemented a Boruta-based feature selection algorithm to determine the list of the most important PUFA metabolism genes altered in breast cancer tissues compared with normal tissues. A rank-based Random Forest (RF) model was built on the selected gene list (33 genes) and applied to predict the cancer phenotype to ascertain the PUFA genes involved in carcinogenesis. It showed high performance in dichotomous classification (balanced accuracy of 0.94, ROC AUC of 0.99). We also retrieved a list of important PUFA genes (46 genes) that differed between breast cancer molecular subtypes. The balanced accuracy of the classification model built on these genes was 0.82, while the ROC AUC for the sensitivity analysis was 0.85. Specific patterns of PUFA metabolic changes were obtained for each molecular subtype of breast cancer. These results provide evidence that (1) PUFA metabolism genes are critical for the pathogenesis of breast cancer; (2) BC subtypes differ in the expression of PUFA metabolism genes; and (3) the lists of genes selected in the models are enriched with genes involved in the metabolism of signaling lipids.

16.
PeerJ ; 10: e13986, 2022.
Article in English | MEDLINE | ID: mdl-36275462

ABSTRACT

An increased frequency of B-cell lymphomas is observed in human immunodeficiency virus-1 (HIV-1)-infected patients, although HIV-1 does not infect B cells. Development of B-cell lymphomas may be potentially due to the action of the HIV-1 Tat protein, which is actively released from HIV-1-infected cells, on uninfected B cells. The exact mechanism of Tat-induced B-cell lymphomagenesis has not yet been precisely identified. Here, we ectopically expressed either Tat or its TatC22G mutant devoid of transactivation activity in the RPMI 8866 lymphoblastoid B cell line and performed a genome-wide analysis of host gene expression. Stable expression of both Tat and TatC22G led to substantial modifications of the host transcriptome, including pronounced changes in antiviral response and cell cycle pathways. We did not find any strong action of Tat on cell proliferation, but during prolonged culturing, Tat-expressing cells were displaced by non-expressing cells, indicating that Tat expression slightly inhibited cell growth. We also found an increased frequency of chromosome aberrations in cells expressing Tat. Thus, Tat can modify gene expression in cultured B cells, leading to subtle modifications in cellular growth and chromosome instability, which could promote lymphomagenesis over time.


Subjects
HIV-1; Lymphoma, B-Cell; Humans; HIV-1/genetics; tat Gene Products, Human Immunodeficiency Virus/genetics; Ectopic Gene Expression; Lymphoma, B-Cell/genetics; Gene Expression
17.
J Mol Evol ; 72(2): 138-46, 2011 Feb.
Article in English | MEDLINE | ID: mdl-21082168

ABSTRACT

De novo origin of coding sequence remains an obscure issue in molecular evolution. One of the possible paths for addition (subtraction) of DNA segments to (from) a gene is stop codon shift. Single nucleotide substitutions can destroy the existing stop codon, leading to uninterrupted translation up to the next stop codon in the gene's reading frame, or create a premature stop codon via a nonsense mutation. Furthermore, frameshifts caused by short indels near a gene's end may lead to premature stop codons or to translation past the existing stop codon. Here, we describe the evolution of the length of the coding sequence of prokaryotic genes through changes in the positions of stop codons. We observed cases of addition of regions of 3'UTR to genes due to mutations at the existing stop codon, and cases of subtraction of C-terminal coding segments due to nonsense mutations upstream of the stop codon. Many of the observed stop codon shifts cannot be attributed to sequencing errors or rare deleterious variants segregating within bacterial populations. The additions of regions of 3'UTR tend to occur in those genes in which they are facilitated by nearby downstream in-frame triplets that may serve as new stop codons. Conversely, subtractions of coding sequence often give rise to in-frame stop codons located nearby. The amino acid composition of the added region is significantly biased compared to the overall amino acid composition of the genes. Our results show that in prokaryotes, stop codon shift is an underappreciated contributor to the functional evolution of gene length.


Subjects
Bacteria/genetics; Codon, Terminator; Evolution, Molecular; Genes, Bacterial; Algorithms; Cluster Analysis; Databases, Genetic; INDEL Mutation; Models, Genetic; Open Reading Frames; Point Mutation
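The stop-codon-shift abstract above describes a substitution that destroys the existing stop codon, so that translation continues to the next in-frame stop downstream. The sketch below measures how many codons a gene would gain in that case. It assumes the standard stop codons TAA, TAG and TGA (some bacteria, such as Mycoplasma, reassign TGA), a sequence that starts in frame, and uses hypothetical function names and a toy sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumed genetic code; some lineages reassign TGA


def first_in_frame_stop(seq, start=0):
    """Index of the first stop codon in the frame of `start`, or None if none is found."""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


def codons_gained_on_stop_loss(seq, stop_pos):
    """Codons added to the protein if the stop codon at stop_pos mutates to a sense codon.

    Counts the mutated codon itself plus the downstream codons up to (not including)
    the next in-frame stop. Returns None if no downstream stop exists in `seq`.
    """
    next_stop = first_in_frame_stop(seq, stop_pos + 3)
    if next_stop is None:
        return None
    return (next_stop - stop_pos) // 3


# Toy gene (ATG AAA TAA) followed by its 3'UTR (GGC TTT TGA CC); invented sequence.
seq = "ATGAAATAAGGCTTTTGACC"
stop = first_in_frame_stop(seq)
print(stop, codons_gained_on_stop_loss(seq, stop))
```

A short distance to the next in-frame stop means a small extension, consistent with the abstract's observation that 3'UTR additions are facilitated by nearby downstream in-frame stop triplets.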
18.
Nat Commun ; 12(1): 3370, 2021 06 07.
Article in English | MEDLINE | ID: mdl-34099647

ABSTRACT

A sensitive approach to quantitative analysis of transcriptional regulation in diploid organisms is analysis of allelic imbalance (AI) in RNA sequencing (RNA-seq) data. A near-universal practice in such studies is to prepare and sequence only one library per RNA sample. We present theoretical and experimental evidence that data from a single RNA-seq library is insufficient for reliable quantification of the contribution of technical noise to the observed AI signal; consequently, reliance on one-replicate experimental design can lead to unaccounted-for variation in error rates in allele-specific analysis. We develop a computational approach, Qllelic, that accurately accounts for technical noise by making use of replicate RNA-seq libraries. Testing on new and existing datasets shows that application of Qllelic greatly decreases false positive rate in allele-specific analysis while conserving appropriate signal, and thus greatly improves reproducibility of AI estimates. We explore sources of technical overdispersion in observed AI signal and conclude by discussing design of RNA-seq studies addressing two biologically important questions: quantification of transcriptome-wide AI in one sample, and differential analysis of allele-specific expression between samples.


Subjects
Allelic Imbalance; Gene Library; Polymorphism, Single Nucleotide; RNA/genetics; Sequence Analysis, RNA/methods; Transcriptome/genetics; Algorithms; Alleles; Animals; Female; Mice, 129 Strain; Models, Genetic; RNA/metabolism
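The Qllelic abstract above argues that a single library cannot reveal how much technical noise inflates allelic imbalance (AI) estimates, while replicate libraries can. A simplified illustration, not Qllelic's actual model: compute AI as the fraction of reads from one allele per gene in two technical replicates, and compare their differences with what pure binomial sampling would predict. A ratio near 1 is consistent with binomial-level noise, and a ratio well above 1 suggests technical overdispersion. The statistic, the min_total filter and the toy counts are assumptions.

```python
import math


def allelic_imbalance(ref, alt):
    """Fraction of reads from the reference allele (None if the gene has no reads)."""
    total = ref + alt
    return ref / total if total else None


def overdispersion_ratio(rep1, rep2, min_total=10):
    """Root-mean-square of replicate AI differences scaled by their binomial expectation.

    rep1, rep2: dicts mapping gene -> (ref_count, alt_count) from technical replicates.
    Hypothetical statistic for illustration only.
    """
    scaled_sq = []
    for gene in sorted(rep1.keys() & rep2.keys()):
        (r1, a1), (r2, a2) = rep1[gene], rep2[gene]
        n1, n2 = r1 + a1, r2 + a2
        if min(n1, n2) < max(min_total, 1):
            continue  # too few reads for a stable AI estimate
        pooled = (r1 + r2) / (n1 + n2)
        if pooled in (0.0, 1.0):
            continue  # monoallelic: binomial variance is zero
        expected_var = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
        diff = allelic_imbalance(r1, a1) - allelic_imbalance(r2, a2)
        scaled_sq.append(diff ** 2 / expected_var)
    if not scaled_sq:
        return None
    return math.sqrt(sum(scaled_sq) / len(scaled_sq))


rep1 = {"g1": (50, 50), "g2": (30, 70), "g3": (5, 3)}  # invented counts
rep2 = {"g1": (62, 38), "g2": (41, 59), "g3": (4, 4)}
print(overdispersion_ratio(rep1, rep2))
```

With a single library there is no second AI value per gene, so this kind of comparison is impossible, which is the practical point the abstract makes about one-replicate designs.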
19.
PeerJ ; 8: e9566, 2020.
Article in English | MEDLINE | ID: mdl-32864204

ABSTRACT

Regulation of gene transcription is a complex process controlled by many factors, including the conformation of chromatin in the nucleus. Insights into chromatin conformation on both local and global scales can be provided by the Hi-C (high-throughput chromosome conformation capture) method. One of the drawbacks of Hi-C analysis and interpretation is the presence of systematic biases, such as differential accessibility to enzymes, amplification, and mappability of DNA regions, which all result in different visibility of the regions. Iterative correction (IC) is one of the most popular techniques developed to eliminate these systematic biases. IC is based on the assumption that all chromatin regions have an equal number of observed contacts in Hi-C. In other words, the IC procedure equalizes the experimental visibility, approximated by the cumulative contact frequency (CCF), across all genomic regions. However, the differences in experimental visibility might be explained by biological factors such as chromatin openness, which is characteristic of distinct chromatin states. Here we show that CCF is positively correlated with active transcription. It is associated with compartment organization, since compartment A demonstrates higher CCF and gene expression levels than compartment B. Notably, this observation holds for a wide range of species, including human, mouse, and Drosophila. Moreover, we track the CCF state for syntenic blocks between human and mouse and conclude that the active state assessed by CCF is an intrinsic property of the DNA region, independent of local genomic and epigenomic context. Our findings establish a missing link between Hi-C normalization procedures that remove CCF from the data and poorly investigated, possibly relevant biological factors contributing to CCF.
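The Hi-C abstract above summarizes iterative correction (IC) as equalizing each region's cumulative contact frequency (CCF), i.e., its total number of contacts. A minimal sketch of that balancing on a small symmetric contact matrix: repeatedly divide each entry by the relative coverage of its row and of its column until all row sums match. The iteration limit, tolerance, handling of empty rows and the toy matrix are assumptions; production tools also filter low-coverage bins before balancing.

```python
def iterative_correction(matrix, max_iter=1000, tol=1e-8):
    """Balance a symmetric contact matrix so every non-empty row has the same sum.

    Returns the corrected matrix and the accumulated per-bin bias factors.
    """
    n = len(matrix)
    m = [list(map(float, row)) for row in matrix]
    bias = [1.0] * n
    for _ in range(max_iter):
        coverage = [sum(row) for row in m]  # current cumulative contact frequency per bin
        nonzero = [c for c in coverage if c > 0]
        if not nonzero:
            break
        mean_cov = sum(nonzero) / len(nonzero)
        factors = [c / mean_cov if c > 0 else 1.0 for c in coverage]
        for i in range(n):
            for j in range(n):
                m[i][j] /= factors[i] * factors[j]
        bias = [b * f for b, f in zip(bias, factors)]
        if max(abs(f - 1.0) for f in factors) < tol:
            break
    return m, bias


raw = [
    [10, 5, 1],
    [5, 20, 4],
    [1, 4, 2],
]  # invented contact counts for three bins
balanced, bias = iterative_correction(raw)
print([round(sum(row), 6) for row in balanced], [round(b, 3) for b in bias])
```

The accumulated bias vector holds the per-bin visibility that IC divides out, derived from each bin's CCF; the abstract's argument is that this removed factor may itself carry biological signal such as transcriptional activity.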

20.
Biol Direct ; 15(1): 9, 2020 04 28.
Article in English | MEDLINE | ID: mdl-32345340

ABSTRACT

BACKGROUND: The origin of the selective nuclear protein import machinery, which consists of nuclear pore complexes and adaptor molecules interacting with the nuclear localization signals (NLSs) of cargo molecules, is one of the most important events in the evolution of eukaryotic cells. How proteins were selected for import into the forming nucleus remains an open question. RESULTS: Here, we demonstrate that functional NLSs may be integrated in the nucleotide-binding domains of both eukaryotic and prokaryotic proteins and may coevolve with these domains. CONCLUSION: The presence of sequences similar to NLSs in the DNA-binding domains of prokaryotic proteins might have created an advantage for nuclear accumulation of these proteins during evolution of the nuclear-cytoplasmic barrier, influencing which proteins accumulated and became compartmentalized inside the forming nucleus (i.e., the content of the nuclear proteome). REVIEWERS: This article was reviewed by Sergey Melnikov and Igor Rogozin. OPEN PEER REVIEW: Reviewed by Sergey Melnikov and Igor Rogozin. For the full reviews, please go to the Reviewers' comments section.


Subjects
Archaeal Proteins/chemistry; Bacterial Proteins/chemistry; Cell Nucleus/physiology; Evolution, Molecular; Nuclear Localization Signals/chemistry; Proteome; Eukaryotic Cells/chemistry; Prokaryotic Cells/chemistry