Results 1 - 20 of 51
1.
NAR Genom Bioinform ; 6(2): lqae054, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38774512

ABSTRACT

Chromatin-associated non-coding RNAs play important roles in various cellular processes by targeting genomic loci. Two types of genome-wide NGS experiments exist to detect such targets: 'one-to-all', which focuses on targets of a single RNA, and 'all-to-all', which captures targets of all RNAs in a sample. As with many NGS experiments, they are prone to biases and noise, so it becomes essential to detect 'peaks', i.e. specific interactions of an RNA with genomic targets. Here, we present BaRDIC (Binomial RNA-DNA Interaction Caller), a tailored method to detect peaks in both types of RNA-DNA interaction data. BaRDIC is the first tool to simultaneously take into account the two most prominent biases in the data: chromatin heterogeneity and distance-dependent decay of interaction frequency. Since RNAs differ in their interaction preferences, BaRDIC adapts peak sizes according to the abundances and contact patterns of individual RNAs. These features enable BaRDIC to make more robust predictions than currently applied peak-calling algorithms and to better handle the characteristic sparsity of all-to-all data. The BaRDIC package is freely available at https://github.com/dmitrymyl/BaRDIC.
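The binomial test named in BaRDIC's title can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not BaRDIC's implementation or API: `binomial_upper_tail` and `call_peaks` are hypothetical names, each bin's background probability is passed in directly instead of being estimated from chromatin heterogeneity and distance decay, bins have a fixed size instead of being adapted per RNA, and no multiple-testing correction is applied.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), by exact summation.

    Adequate for the small counts used here; for large n a log-space or
    library survival function would be preferable.
    """
    if k <= 0:
        return 1.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def call_peaks(counts, expected_probs, alpha=0.01):
    """Return indices of bins whose contact count is improbably high.

    counts: contacts of one RNA in each genomic bin.
    expected_probs: hypothetical background probability that a contact of
        this RNA lands in each bin (a real caller would derive it from a
        bias-aware background model).
    """
    n = sum(counts)  # total contacts of this RNA
    return [
        i
        for i, (k, p) in enumerate(zip(counts, expected_probs))
        if binomial_upper_tail(k, n, p) < alpha
    ]
```

For example, `call_peaks([5, 6, 40, 4, 5], [0.2] * 5)` flags only bin index 2, whose 40 contacts far exceed the 12 expected under the uniform background.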

2.
Bioinformatics ; 39(39 Suppl 1): i431-i439, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387154

ABSTRACT

MOTIVATION: Analysis of allele-specific expression is strongly affected by the technical noise present in RNA-seq experiments. Previously, we showed that technical replicates can be used for precise estimates of this noise, and we provided a tool for correction of technical noise in allele-specific expression analysis. This approach is very accurate but costly due to the need for two or more replicates of each library. Here, we develop a spike-in approach that is highly accurate at only a small fraction of the cost. RESULTS: We show that a distinct RNA added as a spike-in before library preparation reflects technical noise of the whole library and can be used in large batches of samples. We experimentally demonstrate the effectiveness of this approach using combinations of RNA from species distinguishable by alignment, namely, mouse, human, and Caenorhabditis elegans. Our new approach, controlFreq, enables highly accurate and computationally efficient analysis of allele-specific expression in (and between) arbitrarily large studies at an overall cost increase of ∼5%. AVAILABILITY AND IMPLEMENTATION: The analysis pipeline for this approach is available on GitHub as the R package controlFreq (github.com/gimelbrantlab/controlFreq).


Subject(s)
Caenorhabditis elegans , Libraries , Humans , Animals , Mice , Alleles , Caenorhabditis elegans/genetics , Gene Library , RNA/genetics
3.
Nucleic Acids Res ; 51(7): 3055-3066, 2023 04 24.
Article in English | MEDLINE | ID: mdl-36912101

ABSTRACT

Eukaryotic gene expression is regulated post-transcriptionally by a mechanism called unproductive splicing, in which regulated alternative splicing (AS) produces mRNA that is targeted for degradation by the nonsense-mediated decay (NMD) pathway. Only a few dozen unproductive splicing events (USEs) are currently documented, and many more remain to be identified. Here, we analyzed RNA-seq experiments from the Genotype-Tissue Expression (GTEx) Consortium to identify USEs, in which an increase in the NMD isoform splicing rate is accompanied by tissue-specific down-regulation of the host gene. To characterize RNA-binding proteins (RBPs) that regulate USEs, we superimposed these results with RBP footprinting data and experiments on the response of the transcriptome to the perturbation of expression of a large panel of RBPs. Concordant tissue-specific changes between the expression of RBP and USE splicing rate revealed a high-confidence regulatory network including 27 tissue-specific USEs with strong evidence of RBP binding. Among them, we found previously unknown PTBP1-controlled events in the DCLK2 and IQGAP1 genes, for which we confirmed the regulatory effect using small interfering RNA (siRNA) knockdown experiments in the A549 cell line. In sum, we present a transcriptomic pipeline that allows the identification of tissue-specific USEs, potentially many more than were reported here using stringent filters.


Subject(s)
Alternative Splicing , RNA Splicing , Gene Expression Regulation , Nonsense Mediated mRNA Decay , Protein Isoforms/genetics , RNA, Messenger/metabolism , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism , Humans , Cell Line
4.
bioRxiv ; 2023 Feb 12.
Article in English | MEDLINE | ID: mdl-36798258

ABSTRACT

Motivation: Analysis of allele-specific expression is strongly affected by the technical noise present in RNA-seq experiments. Previously, we showed that technical replicates can be used for precise estimates of this noise, and we provided a tool for correction of technical noise in allele-specific expression analysis. This approach is very accurate but costly due to the need for two or more replicates of each library. Here, we develop a spike-in approach that is highly accurate at only a small fraction of the cost. Results: We show that a distinct RNA added as a spike-in before library preparation reflects technical noise of the whole library and can be used in large batches of samples. We experimentally demonstrate the effectiveness of this approach using combinations of RNA from species distinguishable by alignment, namely, mouse, human, and C. elegans. Our new approach, controlFreq, enables highly accurate and computationally efficient analysis of allele-specific expression in (and between) arbitrarily large studies at an overall cost increase of ~5%. Availability: The analysis pipeline for this approach is available on GitHub as the R package controlFreq (github.com/gimelbrantlab/controlFreq). Contact: agimelbrant@altius.org.

5.
Cancers (Basel) ; 14(19)2022 Sep 25.
Article in English | MEDLINE | ID: mdl-36230586

ABSTRACT

Polyunsaturated fatty acid (PUFA) metabolism is currently a focus in cancer research because PUFAs function as structural components of the membrane matrix, as fuel sources for energy production, and as sources of secondary messengers, so-called oxylipins, which are important players in inflammatory processes. Although breast cancer (BC) is the leading cause of cancer death among women worldwide, no systematic study of PUFA metabolism as a system of interrelated processes in this disease has been carried out. Here, we implemented a Boruta-based feature selection algorithm to determine the list of the most important PUFA metabolism genes altered in breast cancer tissues compared with normal tissues. A rank-based Random Forest (RF) model was built on the selected gene list (33 genes) and applied to predict the cancer phenotype to ascertain the PUFA genes involved in carcinogenesis. It showed high performance in dichotomous classification (balanced accuracy of 0.94, ROC AUC 0.99). We also retrieved a list of the important PUFA genes (46 genes) that differed between breast cancer molecular subtypes. The balanced accuracy of the classification model built on these genes was 0.82, while the ROC AUC for the sensitivity analysis was 0.85. Specific patterns of PUFA metabolic changes were obtained for each molecular subtype of breast cancer. These results show evidence that (1) PUFA metabolism genes are critical for the pathogenesis of breast cancer; (2) BC subtypes differ in PUFA metabolism gene expression; and (3) the lists of genes selected in the models are enriched with genes involved in the metabolism of signaling lipids.

6.
PeerJ ; 10: e13986, 2022.
Article in English | MEDLINE | ID: mdl-36275462

ABSTRACT

An increased frequency of B-cell lymphomas is observed in human immunodeficiency virus-1 (HIV-1)-infected patients, although HIV-1 does not infect B cells. Development of B-cell lymphomas may potentially be due to the action of the HIV-1 Tat protein, which is actively released from HIV-1-infected cells, on uninfected B cells. The exact mechanism of Tat-induced B-cell lymphomagenesis has not yet been identified. Here, we ectopically expressed either Tat or its TatC22G mutant devoid of transactivation activity in the RPMI 8866 lymphoblastoid B cell line and performed a genome-wide analysis of host gene expression. Stable expression of both Tat and TatC22G led to substantial modifications of the host transcriptome, including pronounced changes in antiviral response and cell cycle pathways. We did not find any strong effect of Tat on cell proliferation, but during prolonged culturing, Tat-expressing cells were displaced by non-expressing cells, indicating that Tat expression slightly inhibited cell growth. We also found an increased frequency of chromosome aberrations in cells expressing Tat. Thus, Tat can modify gene expression in cultured B cells, leading to subtle modifications in cellular growth and chromosome instability, which could promote lymphomagenesis over time.


Subject(s)
HIV-1 , Lymphoma, B-Cell , Humans , HIV-1/genetics , tat Gene Products, Human Immunodeficiency Virus/genetics , Ectopic Gene Expression , Lymphoma, B-Cell/genetics , Gene Expression
7.
Nucleic Acids Res ; 50(W1): W534-W540, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35610035

ABSTRACT

Extensive amounts of data from next-generation sequencing and omics studies have led to the accumulation of information that provides insight into the evolutionary landscape of related proteins. Here, we present OrthoQuantum, a web server that allows for time-efficient analysis and visualization of phylogenetic profiles of any set of eukaryotic proteins. It is a simple-to-use tool capable of searching large input sets of proteins. Using data from open source databases of orthologous sequences in a wide range of taxonomic groups, it enables users to assess coupled evolutionary patterns and helps define lineage-specific innovations. The web interface allows users to perform queries with gene names and UniProt identifiers in different phylogenetic clades and to supplement presence data with an additional BLAST search. The conservation patterns of proteins are coded as binary vectors, i.e., strings that encode the presence or absence of orthologous proteins in other genomes. These strings are used to calculate top-scoring correlation pairs needed for finding co-inherited proteins, which are simultaneously present or simultaneously absent in specific lineages. Profiles are visualized in combination with phylogenetic trees in a JavaScript-based interface. The OrthoQuantum v1.0 web server is freely available at http://orthoq.bioinf.fbb.msu.ru along with documentation and a tutorial.
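Scoring co-inheritance from binary presence/absence profiles can be sketched with a correlation coefficient. The sketch assumes Pearson correlation on 0/1 vectors (the phi coefficient); the abstract does not state which measure OrthoQuantum uses, and `pearson` and `top_pairs` are hypothetical helpers, not part of the web server.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors (phi for 0/1 data)."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:  # a protein present (or absent) everywhere carries no signal
        return 0.0
    return cov / sqrt(vx * vy)


def top_pairs(profiles, k=3):
    """Return the k most positively correlated protein pairs as (r, name_a, name_b).

    profiles: dict mapping protein name -> list of 0/1 flags, one per genome.
    """
    scored = [
        (pearson(profiles[a], profiles[b]), a, b)
        for a, b in combinations(sorted(profiles), 2)
    ]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:k]
```

With profiles `A = [1, 1, 0, 0, 1]`, `B = [1, 1, 0, 0, 1]` and `C = [0, 0, 1, 1, 0]`, the top pair is A and B (r = 1), while A and C, being present in complementary genomes, have r = -1.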


Subject(s)
Eukaryota , Phylogeny , Proteins , Software , Eukaryota/genetics , Genome , Internet , Proteins/genetics
8.
Nat Commun ; 12(1): 3370, 2021 06 07.
Article in English | MEDLINE | ID: mdl-34099647

ABSTRACT

A sensitive approach to quantitative analysis of transcriptional regulation in diploid organisms is analysis of allelic imbalance (AI) in RNA sequencing (RNA-seq) data. A near-universal practice in such studies is to prepare and sequence only one library per RNA sample. We present theoretical and experimental evidence that data from a single RNA-seq library is insufficient for reliable quantification of the contribution of technical noise to the observed AI signal; consequently, reliance on one-replicate experimental design can lead to unaccounted-for variation in error rates in allele-specific analysis. We develop a computational approach, Qllelic, that accurately accounts for technical noise by making use of replicate RNA-seq libraries. Testing on new and existing datasets shows that application of Qllelic greatly decreases the false positive rate in allele-specific analysis while conserving appropriate signal, and thus greatly improves reproducibility of AI estimates. We explore sources of technical overdispersion in the observed AI signal and conclude by discussing the design of RNA-seq studies addressing two biologically important questions: quantification of transcriptome-wide AI in one sample, and differential analysis of allele-specific expression between samples.


Subject(s)
Allelic Imbalance , Gene Library , Polymorphism, Single Nucleotide , RNA/genetics , Sequence Analysis, RNA/methods , Transcriptome/genetics , Algorithms , Alleles , Animals , Female , Mice, 129 Strain , Models, Genetic , RNA/metabolism
9.
PeerJ ; 8: e9566, 2020.
Article in English | MEDLINE | ID: mdl-32864204

ABSTRACT

Regulation of gene transcription is a complex process controlled by many factors, including the conformation of chromatin in the nucleus. Insights into chromatin conformation on both local and global scales can be provided by the Hi-C (high-throughput chromosome conformation capture) method. One of the drawbacks of Hi-C analysis and interpretation is the presence of systematic biases, such as different accessibility to enzymes, amplification, and mappability of DNA regions, which all result in different visibility of the regions. Iterative correction (IC) is one of the most popular techniques developed for the elimination of these systematic biases. IC is based on the assumption that all chromatin regions have an equal number of observed contacts in Hi-C. In other words, the IC procedure equalizes the experimental visibility, approximated by the cumulative contact frequency (CCF), across all genomic regions. However, the differences in experimental visibility might be explained by biological factors such as chromatin openness, which is characteristic of distinct chromatin states. Here we show that CCF is positively correlated with active transcription. It is associated with compartment organization, since compartment A demonstrates higher CCF and gene expression levels than compartment B. Notably, this observation holds for a wide range of species, including human, mouse, and Drosophila. Moreover, we track the CCF state for syntenic blocks between human and mouse and conclude that the active state assessed by CCF is an intrinsic property of the DNA region, which is independent of local genomic and epigenomic context. Our findings establish a missing link between Hi-C normalization procedures that remove CCF from the data and poorly investigated, possibly relevant biological factors contributing to CCF.
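The IC idea described above, rescaling regions until their cumulative contact frequencies (CCF) are equal, can be sketched as matrix balancing in NumPy. This is a minimal illustration, not a published IC implementation: `iterative_correction` is a hypothetical name, and dividing each side by the square root of a region's relative CCF is a damping choice assumed here to keep the iteration stable on small dense examples.

```python
import numpy as np


def iterative_correction(matrix, n_iter=1000, tol=1e-8):
    """Balance a symmetric contact matrix so that every row has the same sum.

    Returns (corrected, bias) with corrected == matrix / np.outer(bias, bias).
    """
    m = np.asarray(matrix, dtype=float).copy()
    bias = np.ones(m.shape[0])
    for _ in range(n_iter):
        ccf = m.sum(axis=1)  # cumulative contact frequency of each region
        nonzero = ccf > 0
        if not nonzero.any():  # empty matrix: nothing to balance
            break
        s = np.ones_like(ccf)
        s[nonzero] = ccf[nonzero] / ccf[nonzero].mean()  # relative visibility
        if np.abs(s - 1.0).max() < tol:  # all visible regions already equal
            break
        f = np.sqrt(s)  # damped step (assumed), split symmetrically over rows and columns
        m /= np.outer(f, f)
        bias *= f
    return m, bias
```

After balancing, every row of the corrected matrix sums to the same value, so any CCF differences between regions, including those the abstract links to active transcription, are removed by construction.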

10.
Nucleic Acids Res ; 48(12): 6699-6714, 2020 07 09.
Article in English | MEDLINE | ID: mdl-32479626

ABSTRACT

Non-coding RNAs (ncRNAs) participate in various biological processes, including regulating transcription and sustaining genome 3D organization. Here, we present a method termed Red-C that exploits proximity ligation to identify contacts with the genome for all RNA molecules present in the nucleus. Using Red-C, we uncovered the RNA-DNA interactome of human K562 cells and identified hundreds of ncRNAs enriched in active or repressed chromatin, including previously undescribed RNAs. Analysis of the RNA-DNA interactome also allowed us to trace the kinetics of messenger RNA production. Our data support the model of co-transcriptional intron splicing, but not the hypothesis of the circularization of actively transcribed genes.


Subject(s)
Chromatin/genetics , DNA/genetics , Genome/genetics , RNA, Untranslated/genetics , Transcription, Genetic , Cell Nucleus/genetics , Humans , RNA, Messenger/genetics , RNA, Untranslated/isolation & purification , Transcription Factors/genetics
11.
Biol Direct ; 15(1): 9, 2020 04 28.
Article in English | MEDLINE | ID: mdl-32345340

ABSTRACT

BACKGROUND: The origin of the selective nuclear protein import machinery, which consists of nuclear pore complexes and adaptor molecules interacting with the nuclear localization signals (NLSs) of cargo molecules, is one of the most important events in the evolution of eukaryotic cells. How proteins were selected for import into the forming nucleus remains an open question. RESULTS: Here, we demonstrate that functional NLSs may be integrated in the nucleotide-binding domains of both eukaryotic and prokaryotic proteins and may coevolve with these domains. CONCLUSION: The presence of sequences similar to NLSs in the DNA-binding domains of prokaryotic proteins might have created an advantage for nuclear accumulation of these proteins during evolution of the nuclear-cytoplasmic barrier, influencing which proteins accumulated and became compartmentalized inside the forming nucleus (i.e., the content of the nuclear proteome). REVIEWERS: This article was reviewed by Sergey Melnikov and Igor Rogozin. For the full reviews, please go to the Reviewers' comments section.


Subject(s)
Archaeal Proteins/chemistry , Bacterial Proteins/chemistry , Cell Nucleus/physiology , Evolution, Molecular , Nuclear Localization Signals/chemistry , Proteome , Eukaryotic Cells/chemistry , Prokaryotic Cells/chemistry
12.
Nucleic Acids Res ; 46(W1): W186-W193, 2018 07 02.
Article in English | MEDLINE | ID: mdl-29873782

ABSTRACT

Functional genomics assays produce sets of genomic regions as one of their main outputs. To biologically interpret such region-sets, researchers often use colocalization analysis, where the statistical significance of colocalization (overlap, spatial proximity) between two or more region-sets is tested. Existing colocalization analysis tools vary in the statistical methodology and analysis approaches, thus potentially providing different conclusions for the same research question. As the findings of colocalization analysis are often the basis for follow-up experiments, it is helpful to use several tools in parallel and to compare the results. We developed the Coloc-stats web service to facilitate such analyses. Coloc-stats provides a unified interface to perform colocalization analysis across various analytical methods and method-specific options (e.g. colocalization measures, resolution, null models). Coloc-stats helps the user to find a method that supports their experimental requirements and allows for a straightforward comparison across methods. Coloc-stats is implemented as a web server with a graphical user interface that assists users with configuring their colocalization analyses. Coloc-stats is freely available at https://hyperbrowser.uio.no/coloc-stats/.


Subject(s)
Genomics/methods , Software , Chromatin Immunoprecipitation , GATA1 Transcription Factor/metabolism , Internet , Sequence Analysis, DNA , User-Computer Interface
13.
Methods Mol Biol ; 1745: 315-335, 2018.
Article in English | MEDLINE | ID: mdl-29476477

ABSTRACT

Recently developed high-throughput analytical techniques (e.g., protein mass spectrometry and nucleic acid sequencing) allow unprecedentedly sensitive, in-depth studies in the molecular biology of cell proliferation, differentiation, aging, and death. However, the initial population of asynchronous cultured cells is highly heterogeneous by cell cycle stage, which complicates immediate analysis of some biological processes. Widely used cell synchronization protocols are time-consuming and can affect finely tuned biochemical pathways, leading to biased results. Moreover, certain cell lines cannot be effectively synchronized. The current methodological challenge is thus to provide an effective tool for cell cycle phase-based population enrichment compatible with other required experimental procedures. Here, we describe an optimized approach to live cell FACS based on Hoechst 33342 cell-permeable DNA-binding fluorochrome staining. The proposed protocol is fast compared to traditional synchronization methods and yields reasonably pure fractions of viable cells for further experimental studies, including high-throughput RNA-seq analysis.


Subject(s)
Biological Variation, Population , Cell Cycle/genetics , Flow Cytometry , Sequence Analysis, RNA , Single-Cell Analysis , Computational Biology , DNA Replication , Flow Cytometry/methods , Humans , K562 Cells , Microscopy , Single-Cell Analysis/methods , Staining and Labeling
14.
Bioinformatics ; 33(20): 3158-3165, 2017 Oct 15.
Article in English | MEDLINE | ID: mdl-29028265

ABSTRACT

MOTIVATION: Genomic features with similar genome-wide distributions are generally hypothesized to be functionally related; for example, colocalization of histones and transcription start sites indicates chromatin regulation of transcription factor activity. Therefore, statistical algorithms to compute spatial, genome-wide correlation among genomic features are required. RESULTS: Here, we propose a method, StereoGene, that rapidly estimates genome-wide correlation among pairs of genomic features. These features may represent high-throughput data mapped to a reference genome or sets of genomic annotations in that reference genome. StereoGene correlates continuous data directly, avoiding data binarization and the subsequent information loss. Correlations are computed among neighboring genomic positions using kernel correlation. Representing the correlation as a function of the genome position, StereoGene outputs the local correlation track as part of the analysis. StereoGene also accounts for confounders such as input DNA by partial correlation. We apply our method to numerous comparisons of ChIP-Seq datasets from the Human Epigenome Atlas and FANTOM CAGE to demonstrate its wide applicability. We observe changes in the correlation between epigenomic features across developmental trajectories of several tissue types consistent with known biology and find a novel spatial correlation of CAGE clusters with donor splice sites and with poly(A) sites. These analyses provide examples of the broad applicability of StereoGene for regulatory genomics. AVAILABILITY AND IMPLEMENTATION: The StereoGene C++ source code, program documentation, Galaxy integration scripts and examples are available from the project homepage http://stereogene.bioinf.fbb.msu.ru/. CONTACT: favorov@sensi.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
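A simplified picture of kernel correlation: smooth both tracks with a Gaussian kernel so that nearby, not only identical, positions contribute, then take the Pearson correlation of the smoothed tracks. This is an assumption-laden sketch, not StereoGene's algorithm: it computes a single global coefficient instead of a windowed, FFT-based one, produces no local correlation track, and ignores partial correlation with input DNA. `gaussian_kernel`, `kernel_correlation` and the `sigma` parameter are hypothetical.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Discrete Gaussian kernel truncated at 4 sigma and normalized to sum 1."""
    half = int(np.ceil(4 * sigma))
    t = np.arange(-half, half + 1)
    k = np.exp(-(t**2) / (2.0 * sigma**2))
    return k / k.sum()


def kernel_correlation(x, y, sigma=5.0):
    """Pearson correlation of two tracks after Gaussian smoothing.

    Both tracks must have the same length, longer than the kernel,
    so that mode="same" keeps the original length.
    """
    k = gaussian_kernel(sigma)
    xs = np.convolve(np.asarray(x, dtype=float), k, mode="same")
    ys = np.convolve(np.asarray(y, dtype=float), k, mode="same")
    return float(np.corrcoef(xs, ys)[0, 1])
```

Two spike tracks of length 200 with peaks at positions 50/150 and 53/147 have a raw correlation near zero, because no spike positions coincide, but with `sigma=5` their smoothed correlation is well above 0.8.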


Subject(s)
Gene Expression Regulation , Genomics/methods , Sequence Analysis, DNA/methods , Software , Algorithms , Chromatin Immunoprecipitation/methods , Epigenomics/methods , Genome, Human , Humans
15.
Sci Rep ; 7: 46080, 2017 04 13.
Article in English | MEDLINE | ID: mdl-28452371

ABSTRACT

The accumulation of misfolded proteins in the endoplasmic reticulum (ER) lumen due to the disruption of ER homeostasis induces the ER stress response. Cellular stress-induced pathways globally transform gene expression at both the transcriptional and post-transcriptional levels, with small RNAs involved as regulators of the stress response. The modulation of small RNA processing might represent an additional layer of a complex stress response program; however, it is poorly understood. Here, we studied changes in the expression and processing of small RNAs upon ER stress in Jurkat T-cells. ER stress-induced depletion of miRNAs within the small RNA pool was accompanied by a global decrease of 3' mono-adenylated and mono-cytidinylated miRNA isoforms and a global increase of 3' mono-uridinylated isoforms. We observed a specific subset of differentially expressed microRNAs, as well as a dramatic induction of 32-nt tRNA fragments precisely phased to the 5' and 3' ends of tRNAs from a subset of tRNA isotypes. The induction of these tRNA fragments was linked to Angiogenin RNase, which mediates translation inhibition. Overall, global perturbations of the expression and processing of miRNAs and tiRNAs were the most prominent features of small RNA transcriptome changes upon ER stress.


Subject(s)
Endoplasmic Reticulum Stress/genetics , MicroRNAs/genetics , RNA Processing, Post-Transcriptional/genetics , Base Sequence , Dithiothreitol/pharmacology , Endoplasmic Reticulum Stress/drug effects , Gene Expression Profiling , Gene Expression Regulation, Neoplastic/drug effects , Gene Library , Humans , Jurkat Cells , MicroRNAs/metabolism , Molecular Sequence Annotation , Nucleic Acid Conformation , Nucleotides/genetics , RNA Processing, Post-Transcriptional/drug effects , RNA, Transfer/chemistry , RNA, Transfer/genetics , RNA, Transfer/metabolism , T-Lymphocytes/drug effects , T-Lymphocytes/metabolism , Transcriptome/drug effects , Transcriptome/genetics
16.
Genome Biol Evol ; 9(2): 340-349, 2017 02 01.
Article in English | MEDLINE | ID: mdl-28201729

ABSTRACT

Many RNA molecules possess complicated secondary structure critical to their function. Mutations in double-helical regions of RNA may disrupt Watson-Crick (WC) interactions causing structure destabilization or even complete loss of function. Such disruption can be compensated by another mutation restoring base pairing, as has been shown for mRNA, rRNA and tRNA. Here, we investigate the evolution of intrinsic transcription terminators between closely related strains of Bacillus cereus. While the terminator structure is maintained by strong natural selection, as evidenced by the low frequency of disrupting mutations, we observe multiple instances of pairs of disrupting-compensating mutations in RNA structure stems. Such two-step switches between different WC pairs occur very fast, consistent with the low fitness conferred by the intermediate non-WC variant. Still, they are not instantaneous, and probably involve transient fixation of the intermediate variant. The GU wobble pair is the most frequent intermediate, and remains fixed longer than other intermediates, consistent with its less disruptive effect on the RNA structure. Double switches involving non-GU intermediates are more frequent at the ends of RNA stems, probably because they are associated with smaller fitness loss. Together, these results show that the fitness landscape of bacterial transcription terminators is rather rugged, but that the fitness valleys associated with unpaired stem nucleotides are rather shallow, facilitating evolution.


Subject(s)
Bacillus cereus/genetics , Evolution, Molecular , Terminator Regions, Genetic , Base Pairing , Genetic Fitness , Selection, Genetic
17.
Nucleic Acids Res ; 45(6): 3487-3502, 2017 04 07.
Article in English | MEDLINE | ID: mdl-27899632

ABSTRACT

Yield of protein per translated mRNA may vary by four orders of magnitude. Many studies have analyzed the influence of mRNA features on translation yield. However, a detailed understanding of how mRNA sequence determines its propensity to be translated is still missing. Here, we constructed a set of reporter plasmid libraries encoding CER fluorescent protein preceded by randomized 5΄ untranslated regions (5΄-UTR) and Red fluorescent protein (RFP) used as an internal control. Each library was transformed into Escherichia coli cells, which were sorted by efficiency of CER mRNA translation using a cell sorter and subjected to next-generation sequencing. We tested the efficiency of translation of the CER gene preceded by each of 48 natural 5΄-UTR sequences and introduced random and designed mutations into natural and artificially selected 5΄-UTRs. Several distinct properties could be ascribed to a group of 5΄-UTRs most efficient in translation. In addition to known features, several previously unrecognized features that contribute to translation enhancement were found, such as a low proportion of cytidine residues, multiple SD sequences and AG repeats. The latter could be identified as a translation enhancer, albeit less efficient than the SD sequence, in several natural 5΄-UTRs.


Subject(s)
5' Untranslated Regions , Escherichia coli/genetics , Protein Biosynthesis , Regulatory Sequences, Ribonucleic Acid , Cell Separation , Flow Cytometry , Genes, Reporter , High-Throughput Nucleotide Sequencing , Mutation , Nucleic Acid Conformation , Nucleotides/physiology
18.
PLoS One ; 11(9): e0162681, 2016.
Article in English | MEDLINE | ID: mdl-27690309

ABSTRACT

The large and increasing volume of genomic data analyzed by comparative methods provides information about transcription factors and their binding sites that, in turn, enables statistical analysis of correlations between factors and sites, uncovering mechanisms and evolution of specific protein-DNA recognition. Here we present an online tool, Prot-DNA-Korr, designed to identify and analyze crucial protein-DNA pairs of positions in a family of transcription factors. Correlations are identified by analysis of mutual information between columns of protein and DNA alignments. The algorithm reduces the effects of common phylogenetic history and of abundance of closely related proteins and binding sites. We apply it to five closely related subfamilies of the MerR family of bacterial transcription factors that regulate heavy metal resistance systems. We validate the approach using known 3D structures of MerR-family proteins in complexes with their cognate DNA binding sites and demonstrate that a significant fraction of correlated positions indeed form specific side-chain-to-base contacts. The joint distribution of amino acids and nucleotides hence may be used to predict changes of specificity for point mutations in transcription factors.
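Mutual information between a protein alignment column and a DNA alignment column can be computed from co-occurrence frequencies, MI = Σ p(a,b) log2[p(a,b) / (p(a)p(b))]. The sketch below uses plain frequencies only: it does not implement the corrections for shared phylogenetic history and for abundance of closely related proteins described above, and `column_mi` is a hypothetical helper, not Prot-DNA-Korr's code.

```python
from collections import Counter
from math import log2


def column_mi(protein_col, dna_col):
    """Mutual information (bits) between two aligned columns.

    protein_col[i] and dna_col[i] must come from the same
    transcription factor / binding site pair i.
    """
    if len(protein_col) != len(dna_col) or len(protein_col) == 0:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(protein_col)
    count_a = Counter(protein_col)  # amino acid frequencies
    count_b = Counter(dna_col)  # nucleotide frequencies
    count_ab = Counter(zip(protein_col, dna_col))  # joint frequencies
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        mi += p_ab * log2(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi
```

A perfectly coupled pair of columns, such as amino acids `"RRKK"` against nucleotides `"GGAA"`, gives 1 bit, while independent columns such as `"RRKK"` against `"GAGA"` give 0.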

19.
RNA Biol ; 13(2): 232-42, 2016.
Article in English | MEDLINE | ID: mdl-26732206

ABSTRACT

Transcripts often harbor RNA elements, which regulate cell processes co- or post-transcriptionally. The functions of many regulatory RNA elements depend on their structure; thus, it is important to determine the structure as well as to scan genomes for structured elements. State-of-the-art ab initio approaches to predict structured RNAs rely on DNA sequence analysis. They use two major types of information inferred from a sequence: thermodynamic stability of an RNA structure and evolutionary footprints of base-pair interactions. In recent years, chemical probing of RNA has arisen as an alternative source of structural information. RNA probing experiments detect positions accessible to specific types of chemicals or enzymes, indicating their propensity to be in a paired or unpaired state. There exist several strategies to integrate probing data into RNA secondary structure prediction algorithms that substantially improve the prediction quality. However, whether and how probing data could contribute to detection of structured RNAs remains an open question. We previously developed the energy-based approach RNASurface to detect locally optimal structured RNA elements. Here, we integrate probing data into the RNASurface energy model using a general framework. We show that the use of experimental data allows for better discrimination of ncRNAs from other transcripts. Application of RNASurface to genome-wide analysis of the human transcriptome with PARS data identifies previously undetectable segments, with evidence of functionality for some of them.


Subject(s)
Nucleic Acid Conformation , RNA/genetics , Sequence Analysis, DNA , Transcriptome/genetics , Algorithms , Genome, Human , Humans , Molecular Sequence Annotation , RNA/chemistry
20.
PLoS One ; 10(5): e0126125, 2015.
Article in English | MEDLINE | ID: mdl-25961318

ABSTRACT

The recent advent of conformation capture techniques has provided unprecedented insights into the spatial organization of chromatin. We present a large-scale investigation of the inter-chromosomal segment and gene contact networks in embryonic stem cells of two mammalian organisms: humans and mice. Both interaction networks are characterized by a high degree of clustering of genome regions and the existence of hubs. Both genomes exhibit similar structural characteristics such as increased flexibility of certain Y chromosome regions and co-localization of centromere-proximal regions. Spatial proximity is correlated with the functional similarity of genes in both species. We also found a significant association between spatial proximity and the co-expression of genes in the human genome. The structural properties of chromatin are also species-specific, including the presence of two highly interactive regions in mouse chromatin and an increased contact density on short, gene-rich human chromosomes, thereby indicating their central nuclear position. Trans-interacting segments are enriched in active marks in human but have no distinct feature profile in mouse. Thus, in contrast to interactions within individual chromosomes, the inter-chromosomal interactions in human and mouse embryonic stem cells do not appear to be conserved.


Subject(s)
Chromatin/genetics , Chromosomes, Mammalian/genetics , Epistasis, Genetic , Gene Regulatory Networks , Genomics , Models, Genetic , Algorithms , Animals , Cluster Analysis , Embryonic Stem Cells , Evolution, Molecular , Gene Ontology , Genetic Heterogeneity , Genomics/methods , Humans , Mice , Multigene Family