ABSTRACT
"Epitranscriptomics" is the new RNA code, an ensemble of posttranscriptional RNA chemical modifications that can precisely coordinate gene expression and biological processes. Several RNA base modifications, such as N6-methyladenosine (m6A), 5-methylcytosine (m5C), and pseudouridine (Ψ), play pivotal roles in fine-tuning gene expression in almost all eukaryotes, and emerging evidence suggests that parasitic protists are no exception. In this review, we primarily focus on m6A, which is the most abundant epitranscriptomic mark and regulates numerous cellular processes, including nuclear export, mRNA splicing, polyadenylation, stability, and translation. We highlight the universal features of spatiotemporal m6A RNA modifications in eukaryotic phylogeny, their homologs, and unique processes in three unicellular parasites (Plasmodium sp., Toxoplasma sp., and Trypanosoma sp.), as well as some technological advances in this rapidly developing research area that can significantly improve our understanding of gene expression regulation in parasites.
Subjects
Parasites, RNA, Animals, RNA/metabolism, Parasites/genetics, Parasites/metabolism, Gene Expression Regulation, Post-Transcriptional RNA Processing, Eukaryota/genetics, Polyadenylation
ABSTRACT
2′-O-methylation (Nm) is one of the most abundant modifications found in both mRNAs and noncoding RNAs. It contributes to many biological processes, such as the normal functioning of tRNA, the protection of mRNA against degradation by the decapping and exoribonuclease (DXO) protein, and the biogenesis and specificity of rRNA. Recent advancements in single-molecule, long-read RNA sequencing offered by Oxford Nanopore Technologies have enabled the direct detection of RNA modifications from sequencing data. In this study, we propose a bio-computational framework, Nm-Nano, for predicting the presence of Nm sites in direct RNA sequencing data generated from two human cell lines. The Nm-Nano framework integrates two supervised machine learning (ML) models for predicting Nm sites: Extreme Gradient Boosting (XGBoost) and Random Forest (RF) with k-mer embedding. Evaluation on benchmark datasets from direct RNA sequencing of HeLa and HEK293 cell lines demonstrates high accuracy (99% with XGBoost and 92% with RF) in identifying Nm sites. Deploying Nm-Nano on HeLa and HEK293 cell lines reveals genes that are frequently modified with Nm. In HeLa cell lines, 125 genes are identified as frequently Nm-modified, showing enrichment in 30 ontologies related to immune response and cellular processes. In HEK293 cell lines, 61 genes are identified as frequently Nm-modified, with enrichment in processes such as glycolysis and protein localization. These findings underscore the diverse regulatory roles of Nm modifications in metabolic pathways, protein degradation, and cellular processes. The source code of Nm-Nano can be freely accessed at https://github.com/Janga-Lab/Nm-Nano.
Subjects
Machine Learning, RNA Sequence Analysis, Transcriptome, Humans, Methylation, RNA Sequence Analysis/methods, HeLa Cells, Nanopore Sequencing/methods, HEK293 Cells, Computational Biology/methods, Post-Transcriptional RNA Processing, Nanopores, Software, Messenger RNA/genetics, Messenger RNA/metabolism
ABSTRACT
BACKGROUND: Elucidating genome-wide structural variants, including copy number variations (CNVs), has gained increased significance in recent times owing to their contribution to genetic diversity and their association with important pathophysiological states. The present study aimed to elucidate the high-resolution CNV map of six different global buffalo breeds using whole genome resequencing data at two coverages (10X and 30X). After quality control, the sequence reads were aligned to the latest draft release of the Bubaline genome. The genome-wide CNVs were elucidated using a read-depth approach in CNVnator with different bin sizes. Adjacent CNVs were concatenated into copy number variation regions (CNVRs) in different breeds and their genomic coverage was elucidated. RESULTS: Overall, the average size of CNVR was lower at 30X coverage, providing finer details. Most of the CNVRs were either deletion or duplication type, while mixed events were comparatively fewer in all breeds. The average CNVR size was lower at 30X coverage (0.201 Mb) as compared to 10X (0.013 Mb) with the finest variants in Banni buffaloes. The maximum number of CNVs was observed in Murrah (2627) and Pandharpuri (25,688) at 10X and 30X coverages, respectively. The minimum number of CNVs was scored in Surti at both coverages (2092 and 17,373). On the other hand, the highest number of CNVRs was scored in Jaffarabadi (833 and 10,179 events) and the lowest in Surti (783 and 7553 events) at both coverages. Deletion events outnumbered duplications in all breeds at both coverages. Gene profiling of common overlapped genes and longest CNVRs provided important insights into the evolutionary history of these breeds and indicated the genomic regions under selection in the respective breeds.
CONCLUSION: The present study is the first of its kind to elucidate the high-resolution CNV map in major buffalo populations using a read-depth approach on whole genome resequencing data. The results revealed important insights into the divergence of major global buffalo breeds along the evolutionary timescale.
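The concatenation of adjacent CNVs into CNVRs described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the tuple layout, and the `max_gap` parameter are assumptions made for illustration, and the rule used here (merge calls on the same chromosome that overlap or touch, and label a region "mixed" when it contains both deletions and duplications) is only one plausible reading of the abstract.

```python
from collections import defaultdict


def _close_region(chrom, region):
    """Turn an open region [start, end, kinds] into a CNVR tuple."""
    start, end, kinds = region
    # A region built from a single event type keeps that type; otherwise it is "mixed".
    label = next(iter(kinds)) if len(kinds) == 1 else "mixed"
    return (chrom, start, end, label)


def merge_cnvs_into_cnvrs(cnvs, max_gap=0):
    """Merge CNV calls into CNVRs (hypothetical helper, not from the paper).

    cnvs: iterable of (chrom, start, end, kind), kind being "deletion" or "duplication".
    max_gap: assumed tolerance in bp; calls closer than this are treated as adjacent.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, kind in cnvs:
        by_chrom[chrom].append((start, end, kind))

    cnvrs = []
    for chrom in sorted(by_chrom):
        current = None
        # Sorting by start lets a single left-to-right pass find all merges.
        for start, end, kind in sorted(by_chrom[chrom]):
            if current is not None and start <= current[1] + max_gap:
                current[1] = max(current[1], end)
                current[2].add(kind)
            else:
                if current is not None:
                    cnvrs.append(_close_region(chrom, current))
                current = [start, end, {kind}]
        if current is not None:
            cnvrs.append(_close_region(chrom, current))
    return cnvrs


# Toy calls (invented coordinates, not data from the study).
calls = [
    ("chr1", 100, 500, "deletion"),
    ("chr1", 400, 900, "duplication"),
    ("chr1", 2000, 2500, "deletion"),
    ("chr2", 50, 150, "duplication"),
]
for cnvr in merge_cnvs_into_cnvrs(calls):
    print(cnvr)
```

Average CNVR size and genomic coverage per breed would then follow from the merged intervals, for example by summing `end - start` over the returned regions.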
Subjects
Buffaloes, DNA Copy Number Variations, Animals, Buffaloes/genetics, Genome, DNA Sequence Analysis, Genomics/methods
ABSTRACT
Pseudouridine is one of the most abundant RNA modifications, arising when uridines are isomerized by pseudouridine synthase proteins. It plays an important role in many biological processes and has been reported to have applications in drug development. Recently, single-molecule sequencing techniques such as the direct RNA sequencing platform offered by Oxford Nanopore Technologies have enabled direct detection of RNA modifications on the molecule being sequenced. In this study, we introduce a tool called Penguin that integrates several machine learning (ML) models to identify RNA pseudouridine sites on Nanopore direct RNA sequencing reads. Pseudouridine sites were identified on single-molecule sequencing data collected from direct RNA sequencing, resulting in 723K reads in HEK293 and 500K reads in HeLa cell lines. Penguin extracts a set of features from the raw signal measured by the Oxford Nanopore device and the corresponding basecalled k-mer. Those features are used to train the predictors included in Penguin, which in turn can predict, in the testing phase, whether the signal is modified by the presence of pseudouridine sites. We have included various predictors in Penguin, including support vector machines (SVM), Random Forest (RF), and neural network (NN). The results on the two benchmark datasets for HEK293 and HeLa cell lines show outstanding performance of Penguin in both random split testing and independent validation testing. In random split testing, Penguin identified pseudouridine sites with a high accuracy of 93.38% by applying SVM to the HEK293 benchmark dataset. In independent validation testing, Penguin achieves an accuracy of 92.61% by training SVM on the HEK293 benchmark dataset and testing it on the HeLa benchmark dataset. Thus, in independent validation testing, Penguin achieves 16% higher accuracy than the existing pseudouridine predictors in the literature.
Employing Penguin to predict pseudouridine sites revealed a significant enrichment of "regulation of mRNA 3'-end processing" in the HEK293 cell line and "positive regulation of transcription from RNA polymerase II promoter involved in cellular response to chemical stimulus" in the HeLa cell line. Penguin software and models are available on GitHub at https://github.com/Janga-Lab/Penguin and can be readily employed for predicting Ψ sites from Nanopore direct RNA sequencing datasets.
Subjects
Nanopore Sequencing, Nanopores, Spheniscidae, Animals, HEK293 Cells, HeLa Cells, High-Throughput Nucleotide Sequencing, Humans, Pseudouridine/chemistry, RNA/genetics, RNA Sequence Analysis/methods, Spheniscidae/genetics, Spheniscidae/metabolism
ABSTRACT
BACKGROUND: The recent discovery of the gene editing system based on CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) and its associated proteins (Cas) has resulted in its widespread use for improved understanding of a variety of biological systems. Cas13, a lesser-studied Cas protein, has been repurposed to allow for efficient and precise editing of RNA molecules. The Cas13 system utilizes base complementarity between a crRNA/sgRNA (CRISPR RNA or single guide RNA) and a target RNA transcript to preferentially bind to only the target transcript. Compared with the upstream regulatory regions of protein-coding genes in the genome, the transcriptome is significantly more redundant, with many transcripts sharing wide stretches of identical nucleotide sequence. Transcripts also exhibit complex three-dimensional structures and interact with an array of RBPs (RNA-binding proteins), both of which may impact the effectiveness of transcript depletion of target sequences. However, our understanding of the features and corresponding methods that can predict whether a specific sgRNA will effectively knock down a transcript is very limited. RESULTS: Here we present a novel machine learning and computational tool, CASowary, to predict the efficacy of an sgRNA. We used publicly available RNA knockdown data from Cas13 characterization experiments for 555 sgRNAs targeting the transcriptome in HEK293 cells, in conjunction with transcriptome-wide protein occupancy information. Our model utilizes a Decision Tree architecture with a set of 112 sequence and target availability features to classify sgRNA efficacy into one of four classes, based upon the expected level of target transcript knockdown. After accounting for noise in the training dataset, the noise-normalized accuracy exceeds 70%.
Additionally, highly effective sgRNA predictions have been experimentally validated using an independent RNA-targeting Cas system, CIRTS, confirming the robustness and reproducibility of our model's sgRNA predictions. By comparing a transcriptome-wide protein occupancy map generated using POP-seq in HeLa cells against a publicly available protein-RNA interaction map in HEK293 cells, we show that CASowary can predict high-quality guides for numerous transcripts in a cell line-specific manner. CONCLUSIONS: Application of CASowary to whole transcriptomes should enable rapid deployment of CRISPR/Cas13 systems, facilitating the development of therapeutic interventions linked with aberrations in RNA regulatory processes.
Subjects
CRISPR-Cas Systems, Kinetoplastida Guide RNA, Gene Editing/methods, HEK293 Cells, HeLa Cells, Humans, Kinetoplastida Guide RNA/genetics, Reproducibility of Results
ABSTRACT
OBJECTIVES: There is limited knowledge about the role of the esophageal microbiome in pediatric esophageal eosinophilia (EE). We aimed to characterize the esophageal microbiome in pediatric patients with and without EE. METHODS: In the present prospective study, esophageal mucosal biopsies were obtained from 41 children. Of these, 22 had normal esophageal mucosal biopsies ("healthy"), 6 had reflux esophagitis (RE), 4 had proton pump inhibitor (PPi)-responsive esophageal eosinophilia (PPi-REE), and 9 had eosinophilic esophagitis (EoE). The microbiome composition was analyzed using 16S rRNA gene sequencing. The median age (range) in years for the healthy, RE, PPi-REE, and EoE groups was 10 (1.5-18), 6 (2-15), 6.5 (5-15), and 9 (1.5-17), respectively. RESULTS: The bacterial phyla Actinobacteria, Bacteroidetes, Firmicutes, Fusobacteria, and Proteobacteria were the most predominant. The classes Epsilonproteobacteria, Betaproteobacteria, Flavobacteria, Fusobacteria, and Sphingobacteria were underrepresented across groups. The order Vibrionales was predominant in the healthy and EoE groups but lower in the RE and PPi-REE groups. The genera Streptococcus, Rahnella, and Leptotrichia explained 29.65% of the variation in the data, and the genera Microbacterium, Prevotella, and Vibrio explained an additional 10.86%. The healthy group had a higher diversity and richness index compared to other groups, but the difference was not statistically significant. CONCLUSIONS: The pediatric esophagus has an abundant and diverse microbiome, both in the healthy and diseased states. The healthy group had a higher, but not significantly different, diversity and richness index compared to other groups.
Subjects
Eosinophilic Esophagitis, Peptic Esophagitis, Microbiota, Child, Enteritis, Eosinophilia, Eosinophilic Esophagitis/pathology, Gastritis, Humans, Prospective Studies, Proton Pump Inhibitors/therapeutic use, 16S Ribosomal RNA/genetics
ABSTRACT
BACKGROUND: With advancements in omics technologies, the range of biological processes in which long non-coding RNAs (lncRNAs) are involved is expanding extensively, generating the need to develop lncRNA annotation resources. Although there is a plethora of resources for annotating genes, resources with lncRNA ontology annotations remain rare despite the extensive corpus of lncRNA literature. RESULTS: We present a lncRNA annotation extractor and repository (Lantern), developed using PubMed's abstract retrieval engine and NCBO's recommender annotation system. Lantern's annotations were benchmarked against lncRNAdb's manually curated free text. Benchmarking analysis suggested that Lantern has a recall of 0.62 against lncRNAdb for 182 lncRNAs and a precision of 0.8. We also annotated lncRNAs with multiple omics annotations, including predicted cis-regulatory TFs, interactions with RBPs, tissue-specific expression profiles, protein co-expression networks, coding potential, sub-cellular localization, and SNPs for ~11,000 lncRNAs in the human genome, providing a one-stop dynamic visualization platform. CONCLUSIONS: Lantern integrates annotations derived from a novel, accurate, semi-automatic ontology annotation engine with a variety of multi-omics annotations for lncRNAs, to provide a central web resource for dissecting the functional dynamics of long non-coding RNAs and to facilitate future hypothesis-driven experiments. The annotation pipeline and a web resource with current annotations for human lncRNAs are freely available on sysbio.lab.iupui.edu/lantern.
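For readers unfamiliar with the benchmarking metrics quoted above, the following minimal sketch shows how precision and recall are computed when a set of extracted annotations is compared with a curated reference set. It does not reproduce Lantern's benchmarking procedure; the function name and the toy ontology terms are hypothetical.

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) of predicted annotations against a reference set."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    # Precision: fraction of predicted annotations that are in the reference.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of reference annotations that were recovered.
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Toy example with invented terms for a single lncRNA (not Lantern output).
extracted = {"apoptosis", "cell cycle", "chromatin", "splicing", "hypoxia"}
curated = {"apoptosis", "cell cycle", "chromatin", "splicing", "imprinting", "X inactivation"}

precision, recall = precision_recall(extracted, curated)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A high precision with a lower recall, as in the reported figures, indicates that most extracted annotations agree with the curated text while some curated annotations are missed.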
Subjects
Long Noncoding RNA, Human Genome, Humans, Molecular Sequence Annotation, Long Noncoding RNA/genetics
ABSTRACT
BACKGROUND: Direct-sequencing technologies, such as Oxford Nanopore's, are delivering long RNA reads with great efficacy and convenience. These technologies afford an ability to detect post-transcriptional modifications at single-molecule resolution, promising new insights into the functional roles of RNA. However, realizing this potential requires new tools to analyze and explore this type of data. RESULTS: Here, we present Sequoia, a visual analytics tool that allows users to interactively explore nanopore sequences. Sequoia combines a Python-based backend with a multi-view visualization interface, enabling users to import raw nanopore sequencing data in Fast5 format, cluster sequences based on electric-current similarities, and drill down into signals to identify properties of interest. We demonstrate the application of Sequoia by generating and analyzing ~500k reads from direct RNA sequencing data of the human HeLa cell line. We focus on comparing signal features from m6A and m5C RNA modifications as the first step towards building automated classifiers. We show how, through iterative visual exploration and tuning of dimensionality reduction parameters, we can separate modified RNA sequences from their unmodified counterparts. We also document new, qualitative signal signatures, discovered through the visualization, that distinguish these modifications from otherwise normal RNA bases. CONCLUSIONS: Sequoia's interactive features complement existing computational approaches in nanopore-based RNA workflows. The insights gleaned through visual analysis should help users in developing rationales, hypotheses, and insights into the dynamic nature of RNA. Sequoia is available at https://github.com/dnonatar/Sequoia.
Subjects
Nanopore Sequencing, Nanopores, Sequoia, HeLa Cells, High-Throughput Nucleotide Sequencing, Humans, DNA Sequence Analysis, Software
ABSTRACT
Several protein-RNA cross-linking protocols have been established in recent years to delineate the molecular interaction of an RNA-binding protein (RBP) and its target RNAs. However, functional dissection of the role of RBP binding sites in modulating the post-transcriptional fate of the target RNA remains challenging. The CRISPR/Cas9 genome editing system is commonly employed to perturb both coding and noncoding regions in the genome. With the advancements in genome-scale CRISPR/Cas9 screens, it is now possible not only to perturb specific binding sites but also to probe the global impact of protein-RNA interaction sites across cell types. Here, we present SliceIt (http://sliceit.soic.iupui.edu/), a database of an in silico sgRNA (single guide RNA) library to facilitate conducting such high-throughput screens. SliceIt comprises ~4.8 million unique sgRNAs, with an estimated range of 2-8 sgRNAs designed per RBP binding site, for eCLIP experiments of >100 RBPs in HepG2 and K562 cell lines from the ENCODE project. SliceIt provides a user-friendly environment, developed using the advanced search engine framework Elasticsearch. It is available in both table and genome browser views, facilitating easy navigation of RBP binding sites, designed sgRNAs, and exon expression levels across 53 human tissues, along with the prevalence of SNPs and GWAS hits on binding sites. Exon expression profiles enable examination of locus-specific changes proximal to the binding sites. Users can also upload custom tracks in various file formats directly onto the genome browser, to navigate additional genomic features and compare with other types of omics profiles. All the binding site-centric information is dynamically accessible via "search by gene", "search by coordinates" and "search by RBP" options and is readily available to download. Validation of the sgRNA library in SliceIt was performed by selecting RBP binding sites in the Lipt1 gene and designing sgRNAs.
The effect of CRISPR/Cas9 perturbations on the selected binding sites in the HepG2 cell line was confirmed based on altered proximal exon expression levels using qPCR, further supporting the utility of the resource to design experiments for perturbing protein-RNA interaction networks. Thus, SliceIt provides a one-stop guide RNA library for perturbing RBP binding sites, along with several layers of functional information to design both low- and high-throughput CRISPR/Cas9 screens for studying the phenotypes and diseases associated with RBP binding sites.
Subjects
CRISPR-Cas Systems/genetics, Gene Editing/methods, Genomics/methods, Human Genome/genetics, Humans, Kinetoplastida Guide RNA/genetics
ABSTRACT
BACKGROUND: Bcl6 is required for the development of T follicular helper cells and T follicular regulatory (Tfr) cells that regulate germinal center responses. Bcl6 also affects the function of regulatory T (Treg) cells. OBJECTIVE: The goal of this study was to define the functions of Bcl6 in Treg cells, including Tfr cells, in the context of allergic airway inflammation. METHODS: We used a model of house dust mite sensitization to challenge wild-type, Bcl6fl/fl Foxp3-Cre, and Prdm1 (Blimp1)fl/fl Foxp3-Cre mice to study the reciprocal roles of Bcl6 and Blimp1 in allergic airway inflammation. RESULTS: In the house dust mite model, Tfr cells repress the production of IgE and Bcl6+ Treg cells suppress the generation of type 2 cytokine-producing cells in the lungs. In mice with Bcl6-deficient Treg cells, twice as many ST2+ (IL-33R+) Treg cells develop as are observed in wild-type mice. ST2+ Treg cells in the context of allergic airway inflammation are Blimp1 dependent, express type 2 cytokines, and share features of visceral adipose tissue Treg cells. Bcl6-deficient Treg cells are more susceptible, and Blimp1-deficient Treg cells are resistant, to acquiring the ST2+ Treg-cell phenotype in vitro and in vivo in response to IL-33. Bcl6-deficient ST2+ Treg cells, but not Bcl6-deficient ST2+ conventional T cells, strongly promote allergic airway inflammation when transferred into recipient mice. Lastly, ST2 is required for the exacerbated allergic airway inflammation in Bcl6fl/fl Foxp3-Cre mice. CONCLUSIONS: During allergic airway inflammation, Bcl6 and Blimp1 play dual roles in regulating Tfr-cell activity in the germinal center and in the development of ST2+ Treg cells that promote type 2 cytokine responses.
Subjects
Germinal Center/immunology, Hypersensitivity/immunology, Pneumonia/immunology, Positive Regulatory Domain I-Binding Factor 1/metabolism, Proto-Oncogene Proteins c-bcl-6/metabolism, Regulatory T-Lymphocytes/immunology, Th2 Cells/immunology, Adoptive Transfer, Animals, Dermatophagoides Antigens/immunology, Cell Differentiation, Cultured Cells, Cytokines/metabolism, Humans, Interleukin-1 Receptor-Like 1 Protein/metabolism, Lymphocyte Activation, Mice, Inbred C57BL Mice, Knockout Mice, Positive Regulatory Domain I-Binding Factor 1/genetics, Proto-Oncogene Proteins c-bcl-6/genetics, Pyroglyphidae
ABSTRACT
Transcription factors (TFs) and histone octamers are two abundant classes of DNA binding proteins that coordinate the transcriptional program in cells. Detailed studies of individual TFs have shown that TFs bind to nucleosome-occluded DNA sequences and induce nucleosome disruption/repositioning, while recent global studies suggest this is not the only mechanism used by all TFs. We have analyzed to what extent the intrinsic DNA binding preferences of TFs and histones play a role in determining nucleosome occupancy, in addition to nonintrinsic factors such as the enzymatic activity of chromatin remodelers. The majority of TFs in budding yeast have an intrinsic sequence preference overlapping with nucleosomal histones. TFs with intrinsic DNA binding properties highly correlated with those of histones tend to be associated with gene activation and might compete with histones to bind to genomic DNA. Consistent with this, we show that activators induce more nucleosome disruption upon transcriptional activation than repressors.
Subjects
DNA-Binding Proteins/metabolism, Fungal Gene Expression Regulation, Nucleosomes/metabolism, Transcriptional Activation, Binding Sites, Competitive Binding, Chromatin/metabolism, Histones/chemistry, Histones/metabolism, Biological Models, Genetic Promoter Regions, Protein Binding, Saccharomyces cerevisiae/metabolism, Genetic Transcription
ABSTRACT
The outbreak of the novel coronavirus SARS-CoV-2, responsible for the COVID-19 pandemic, has caused a worldwide public health emergency. Due to the constantly evolving nature of coronaviruses, SARS-CoV-2-mediated alterations in post-transcriptional gene regulation across human tissues remain elusive. In this study, we analyzed publicly available genomic datasets to systematically dissect the crosstalk and dysregulation of the human post-transcriptional regulatory networks governed by RNA-binding proteins (RBPs) and micro-RNAs (miRs) due to SARS-CoV-2 infection. We uncovered that 13 out of 29 SARS-CoV-2-encoded proteins directly interacted with 51 human RBPs, the majority of which were abundantly expressed in gonadal tissues and immune cells. We further performed a functional analysis of differentially expressed genes in mock-treated versus SARS-CoV-2-infected lung cells that revealed enrichment for the immune response, cytokine-mediated signaling, and metabolism-associated genes. This study also characterized the alternative splicing events in SARS-CoV-2-infected cells compared to the control, demonstrating that skipped exons and mutually exclusive exons were the most abundant events that potentially contributed to differential outcomes in response to the viral infection. A motif enrichment analysis on the RNA genomic sequence of SARS-CoV-2 revealed enrichment for RBPs such as SRSFs, PCBPs, ELAVs, and HNRNPs, suggesting the sponging of RBPs by the SARS-CoV-2 genome. A similar analysis of the interactions of miRs with SARS-CoV-2 revealed functionally important miRs that were highly expressed in immune cells, suggesting that these interactions may contribute to the progression of the viral infection and modulate the host immune response across other human tissues.
Given the need to understand the interactions of SARS-CoV-2 with key post-transcriptional regulators in the human genome, this study provided a systematic computational analysis to dissect the role of dysregulated post-transcriptional regulatory networks controlled by RBPs and miRs across tissue types during a SARS-CoV-2 infection.
Subjects
Betacoronavirus/genetics, Betacoronavirus/metabolism, Coronavirus Infections/virology, Gene Regulatory Networks, MicroRNAs/genetics, Viral Pneumonia/virology, Post-Transcriptional RNA Processing, RNA-Binding Proteins/metabolism, COVID-19, Gene Expression Regulation, Viral Genome, Humans, MicroRNAs/metabolism, Pandemics, Protein Interaction Maps, RNA-Binding Proteins/genetics, SARS-CoV-2
ABSTRACT
BACKGROUND: RNA-binding proteins (RBPs) are crucial in modulating RNA metabolism in eukaryotes thereby controlling an extensive network of RBP-RNA interactions. Although previous studies on the conservation of RBP targets have been carried out in lower eukaryotes such as yeast, relatively little is known about the extent of conservation of the binding sites of RBPs across mammalian species. RESULTS: In this study, we employ CLIP-seq datasets for 60 human RBPs and demonstrate that most binding sites for a third of these RBPs are conserved in at least 50% of the studied vertebrate species. Across the studied RBPs, binding sites were found to exhibit a median conservation of 58%, ~ 20% higher than random genomic locations, suggesting a significantly higher preservation of RBP-RNA interaction networks across vertebrates. RBP binding sites were highly conserved across primates with weak conservation profiles in birds and fishes. We also note that phylogenetic relationship between members of an RBP family does not explain the extent of conservation of their binding sites across species. Multivariate analysis to uncover features contributing to differences in the extents of conservation of binding sites across RBPs revealed RBP expression level and number of post-transcriptional targets to be the most prominent factors. Examination of the location of binding sites at the gene level confirmed that binding sites occurring on the 3' region of a gene are highly conserved across species with 90% of the RBPs exhibiting a significantly higher conservation of binding sites in 3' regions of a gene than those occurring in the 5'. Gene set enrichment analysis on the extent of conservation of binding sites to identify significantly associated human phenotypes revealed an enrichment for multiple developmental abnormalities. 
CONCLUSIONS: Our results suggest that binding sites of human RBPs are highly conserved across primates with weak conservation profiles in lower vertebrates and evolutionary relationship between members of an RBP family does not explain the extent of conservation of their binding sites. Expression level and number of targets of an RBP are important factors contributing to the differences in the extent of conservation of binding sites. RBP binding sites on 3' ends of a gene are the most conserved across species. Phenotypic analysis on the extent of conservation of binding sites revealed the importance of lineage-specific developmental events in post-transcriptional regulatory network evolution.
Subjects
Mammals/genetics, Protein Interaction Maps/genetics, RNA-Binding Proteins/genetics, Animals, Binding Sites/genetics, Chromatin Immunoprecipitation Sequencing, Gene Expression Regulation, Gene Regulatory Networks, Genome/genetics, Humans, Phenotype, Phylogeny, RNA-Binding Proteins/metabolism, RNA Sequence Analysis, Vertebrates/genetics
ABSTRACT
RNA-binding proteins (RBPs) control the regulation of gene expression in eukaryotic genomes at post-transcriptional level by binding to their cognate RNAs. Although several variants of CLIP (crosslinking and immunoprecipitation) protocols are currently available to study the global protein-RNA interaction landscape at single-nucleotide resolution in a cell, currently there are very few tools that can facilitate understanding and dissecting the functional associations of RBPs from the resulting binding maps. Here, we present Seten, a web-based and command line tool, which can identify and compare processes, phenotypes, and diseases associated with RBPs from condition-specific CLIP-seq profiles. Seten uses BED files resulting from most peak calling algorithms, which include scores reflecting the extent of binding of an RBP on the target transcript, to provide both traditional functional enrichment as well as gene set enrichment results for a number of gene set collections including BioCarta, KEGG, Reactome, Gene Ontology (GO), Human Phenotype Ontology (HPO), and MalaCards Disease Ontology for several organisms including fruit fly, human, mouse, rat, worm, and yeast. It also provides an option to dynamically compare the associated gene sets across data sets as bubble charts, to facilitate comparative analysis. Benchmarking of Seten using eCLIP data for IGF2BP1, SRSF7, and PTBP1 against their corresponding CRISPR RNA-seq in K562 cells as well as randomized negative controls, demonstrated that its gene set enrichment method outperforms functional enrichment, with scores significantly contributing to the discovery of true annotations. Comparative performance analysis using these CRISPR control data sets revealed significantly higher precision and comparable recall to that observed using ChIP-Enrich. 
Seten's web interface currently provides precomputed results for about 200 CLIP-seq data sets, and both the command line and web interfaces can be used to analyze CLIP-seq data sets. We highlight several examples to show the utility of Seten for rapid profiling of various CLIP-seq data sets. Seten is available at http://www.iupui.edu/~sysbio/seten/.
Subjects
Binding Sites, Chromatin Immunoprecipitation, Computational Biology/methods, High-Throughput Nucleotide Sequencing, RNA-Binding Proteins/metabolism, RNA Sequence Analysis, Software, Cell Line, Cluster Analysis, Clustered Regularly Interspaced Short Palindromic Repeats, Gene Ontology, Genetic Association Studies/methods, Humans, Molecular Sequence Annotation, Phenotype, Workflow
ABSTRACT
IL-2 is a pleiotropic cytokine that promotes the differentiation of Th cell subsets, including Th1, Th2, and Th9 cells, but it impairs the development of Th17 and T follicular helper cells. Although IL-2 is produced by all polarized Th subsets to some level, how it impacts cytokine production when effector T cells are restimulated is unknown. We show in this article that Golgi transport inhibitors (GTIs) blocked IL-9 production. Mechanistically, GTIs blocked secretion of IL-2 that normally feeds back in a paracrine manner to promote STAT5 activation and IL-9 production. IL-2 feedback had no effect on Th1- or Th17-signature cytokine production, but it promoted Th2- and Th9-associated cytokine expression. These data suggest that the use of GTIs results in an underestimation of the presence of type 2 cytokine-secreting cells and highlight IL-2 as a critical component in optimal cytokine production by Th2 and Th9 cells in vitro and in vivo.
Subjects
Cytokines/biosynthesis , Interleukin-2/metabolism , Interleukin-9/biosynthesis , Paracrine Communication , Th2 Cells/immunology , Animals , Brefeldin A/pharmacology , Cell Differentiation , Cytokines/immunology , Interleukin-9/antagonists & inhibitors , Interleukin-9/immunology , Lymphocyte Activation , Mice , Mice, Inbred C57BL , Monensin/pharmacology , Protein Synthesis Inhibitors/pharmacology , Proton Ionophores/pharmacology , STAT5 Transcription Factor/metabolism , Th1 Cells/immunology , Th17 Cells/immunology
ABSTRACT
Advances in sequencing have facilitated nucleotide-resolution, genome-wide transcriptomic profiles across multiple mouse eye tissues. However, these RNA sequencing (RNA-seq) based eye developmental transcriptomes are not organized for easy public access, making any further analysis challenging. Here, we present a new database, "Express" (http://www.iupui.edu/~sysbio/express/), that unifies various mouse lens and retina RNA-seq data and provides user-friendly visualization of the transcriptome to facilitate gene discovery in the eye. We obtained RNA-seq data encompassing 7 developmental stages of the lens, in addition to data on isolated lens epithelium and fibers, as well as 11 developmental stages of retina/isolated retinal rod photoreceptor cells, from publicly available wild-type mouse datasets. These datasets were pre-processed, aligned, quantified and normalized for expression levels of known and novel transcripts using a unified expression quantification framework. Express provides heatmap and browser views, allowing easy navigation of the genomic organization of transcripts or gene loci. Further, it allows users to search candidate genes and export both the visualizations and the embedded data to facilitate downstream analysis. We identified a total of >81,000 transcripts in the lens and >178,000 transcripts in the retina across all the included developmental stages. This analysis revealed that a significant number of the retina-expressed transcripts are novel. Expression of several transcripts in the lens and retina across multiple developmental stages was independently validated by RT-qPCR for established genes such as Pax6 and Lhx2 as well as for new candidates such as Elavl4, Rbm5, Pabpc1, Tia1 and Tubb2b. Thus, Express serves as an effective portal for analyzing pruned RNA-seq expression datasets presently collected for the lens and retina. 
It will provide a wild-type context for the detailed analysis of targeted gene-knockout mouse ocular defect models and facilitate the prioritization of candidate genes from Exome-seq data of eye disease patients.
Subjects
Databases, Factual , Eye Proteins/metabolism , Gene Expression Profiling , Lens, Crystalline/metabolism , RNA, Messenger/metabolism , Retina/metabolism , Transcriptome , Animals , Mice , Sequence Analysis, RNA
ABSTRACT
RNA-binding proteins (RBPs) are a class of post-transcriptional regulatory molecules which are increasingly documented to be dysfunctional in cancer genomes. However, our current understanding of these alterations is limited. Here, we delineate the mutational landscape of ~1300 RBPs in ~6000 cancer genomes. Our analysis revealed that RBPs have an average of ~3 mutations per Mb across 26 cancer types. We identified 281 RBPs to be enriched for mutations (GEMs) in at least one cancer type. GEM RBPs were found to undergo frequent frameshift and inframe deletions as well as missense, nonsense and silent mutations when compared to those that are not enriched for mutations. Functional analysis of these RBPs revealed the enrichment of pathways associated with apoptosis, splicing and translation. Using the OncodriveFM framework, we also identified more than 200 candidate driver RBPs that were found to accumulate functionally impactful mutations in at least one cancer. Expression levels of 15% of these driver RBPs exhibited significant differences when transcriptome groups with and without deleterious mutations were compared. The functional interaction network of the driver RBPs revealed the enrichment of spliceosomal machinery, suggesting a plausible mechanism for tumorigenesis, while network analysis of the protein interactions between RBPs unambiguously revealed higher degree, betweenness and closeness centrality for driver RBPs compared to non-drivers. Analysis to reveal cancer-specific ribonucleoprotein (RNP) mutational hotspots showed extensive rewiring even among common drivers between cancer types. Knockdown experiments on pan-cancer drivers such as SF3B1 and PRPF8 in breast cancer cell lines revealed cancer-subtype-specific functions like selective stem cell features, indicating a plausible means for RBPs to mediate cancer-specific phenotypes. 
Hence, this study forms a foundation to uncover the contribution of the mutational spectrum of RBPs in dysregulating the post-transcriptional regulatory networks in different cancer types.
Subjects
Carcinogenesis/genetics , Neoplasms/genetics , RNA-Binding Proteins/genetics , Transcriptome/genetics , Gene Expression Regulation, Neoplastic/genetics , Gene Knockdown Techniques , Genome, Human/genetics , Humans , Mutation , Neoplasms/pathology , Phosphoproteins/genetics , RNA Splicing/genetics , RNA Splicing Factors/genetics , Spliceosomes/genetics
ABSTRACT
Breast cancer (BC) is a highly heterogeneous disease, both at the pathological and molecular level, and several chromatin-associated proteins play crucial roles in BC initiation and progression. Here, we demonstrate the role of PSIP1 (PC4 and SF2 interacting protein)/p75 (LEDGF) in BC progression. PSIP1/p75, previously identified as a chromatin-adaptor protein, is found to be upregulated in basal-like/triple negative breast cancer (TNBC) patient samples and cell lines. Immunohistochemistry in tissue arrays showed elevated levels of PSIP1 in metastatic invasive ductal carcinoma. Survival data analyses revealed that the levels of PSIP1 showed a negative association with TNBC patient survival. Depletion of PSIP1/p75 significantly reduced the tumorigenicity and metastatic properties of TNBC cell lines while its over-expression promoted tumorigenicity. Further, gene expression studies revealed that PSIP1 regulates the expression of genes controlling cell-cycle progression, cell migration and invasion. Finally, by interacting with RNA polymerase II, PSIP1/p75 facilitates the association of RNA pol II to the promoter of cell cycle genes and thereby regulates their transcription. Our findings demonstrate an important role of PSIP1/p75 in TNBC tumorigenicity by promoting the expression of genes that control the cell cycle and tumor metastasis.
Subjects
Adaptor Proteins, Signal Transducing/genetics , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Cell Cycle/genetics , Transcription Factors/genetics , Adaptor Proteins, Signal Transducing/metabolism , Breast Neoplasms/mortality , Cell Line, Tumor , Cell Proliferation/genetics , Chromatin/genetics , Chromatin/metabolism , Female , Gene Expression Regulation, Neoplastic , Humans , Oncogenes , Promoter Regions, Genetic , RNA Polymerase II/genetics , RNA Polymerase II/metabolism , Tissue Array Analysis , Transcription Factors/metabolism , Triple Negative Breast Neoplasms/genetics , Triple Negative Breast Neoplasms/pathology
ABSTRACT
BACKGROUND: Improved DNA sequencing methods have transformed the field of genomics over the last decade. This has become possible due to the development of inexpensive short-read sequencing technologies, which have now resulted in three generations of sequencing platforms. More recently, a new fourth generation of Nanopore-based single-molecule sequencing technology was developed, based on the MinION® sequencer, which is portable, inexpensive and fast. It is capable of generating reads of length greater than 100 kb. Though it has many specific advantages, the two major limitations of MinION reads are high error rates and the need for the development of downstream pipelines. Algorithms for error correction have already emerged, while the development of pipelines is still at a nascent stage. RESULTS: In this study, we benchmarked available assembler algorithms to find an appropriate framework that can efficiently assemble Nanopore sequenced reads. To address this, we employed genome-scale Nanopore sequenced datasets available for E. coli and yeast genomes. In order to comprehensively evaluate multiple algorithmic frameworks, we included assemblers based on de Bruijn graphs (Velvet and ABySS), Overlap Layout Consensus (OLC) (Celera) and Greedy extension (SSAKE) approaches. We analyzed the quality and accuracy of the assemblies as well as the computational performance of each of the assemblers included in our benchmark. Our analysis unveiled that the OLC-based algorithm, Celera, could generate a high quality assembly with ten times higher N50 and mean contig values as well as one-fifth the total number of contigs compared to other tools. Celera was also found to exhibit an average genome coverage of 12 % in the E. coli dataset and 70 % in the yeast dataset as well as relatively shorter run times. 
In contrast, the de Bruijn graph based assemblers Velvet and ABySS generated assemblies of moderate quality in less time when there is no limitation on the memory allocation, while the greedy extension based algorithm SSAKE generated an assembly of very poor quality but with genome coverage of 90 % on the yeast dataset. CONCLUSION: OLC can be considered a favorable algorithmic framework for the development of assembler tools for Nanopore-based data, followed by de Bruijn based algorithms, as they consume relatively less or similar run times as OLC-based algorithms for generating an assembly, irrespective of the memory allocated for the task. However, a few improvements must be made to the existing de Bruijn implementations in order to generate an assembly of reasonable quality. Our findings should help in stimulating the development of novel assemblers for handling Nanopore sequence data.
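The assembly metrics compared in this benchmark (number of contigs, mean contig length and N50) can be computed from a list of contig lengths. The following is a minimal sketch using the standard definition of N50, the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The function name and the toy contig lengths are hypothetical and do not come from the study.

```python
def assembly_stats(contig_lengths):
    """Return (number of contigs, mean contig length, N50) for an assembly."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    n50 = lengths[-1]
    # Walk from the longest contig down until half the assembly is covered.
    for length in lengths:
        running += length
        if 2 * running >= total:
            n50 = length
            break
    return len(lengths), total / len(lengths), n50


# Toy contig lengths in bp (hypothetical, not from the benchmark).
n_contigs, mean_len, n50 = assembly_stats([100, 80, 60, 40, 20])
print(n_contigs, mean_len, n50)
```

Fewer contigs together with a higher N50 and mean contig length generally indicate a more contiguous assembly, which is the sense in which the abstract describes the Celera result as higher quality.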
Subjects
Genomics/methods , Sequence Analysis, DNA/methods , Software , Algorithms , Benchmarking , Databases, Factual , Escherichia coli/genetics , Genome, Bacterial/genetics , Genome, Fungal/genetics , Nanopores
ABSTRACT
De novo or acquired resistance to endocrine therapy limits its utility in a significant number of estrogen receptor-positive (ER-positive) breast cancers. It is crucial to identify novel targets for therapeutic intervention and improve the success of endocrine therapies. Splicing factor 3b, subunit 1 (SF3B1) mutations are described in luminal breast cancer albeit in low frequency. In this study, we evaluated the role of SF3B1 and SF3B3, critical parts of the SF3b splicing complex, in ER-positive endocrine resistance. To ascertain the role of SF3B1/SF3B3 in endocrine resistance, their expression levels were evaluated in ER-positive/endocrine-resistant cell lines (MCF-7/LCC2 and MCF-7/LCC9) using a real-time quantitative reverse transcription PCR (qRT-PCR). To further determine their clinical relevance, expression analysis was performed in a cohort of 60 paraffin-embedded ER-positive, node-negative breast carcinomas with low, intermediate, and high Oncotype DX recurrence scores. Expression levels of SF3B1 and SF3B3 and their prognostic value were validated in large cohorts using publicly available gene expression data sets including The Cancer Genome Atlas. SF3B1 and SF3B3 levels were significantly increased in ERα-positive cells with acquired tamoxifen (MCF-7/LCC2; both P<0.0002) and fulvestrant/tamoxifen resistance (MCF-7/LCC9; P=0.008 for SF3B1 and P=0.0006 for SF3B3). Expression levels of both MCF-7/LCC2 and MCF-7/LCC9 were not affected by additional treatments with E2 and/or tamoxifen. Furthermore, qRT-PCR analysis confirmed that SF3B3 expression is significantly upregulated in Oncotype DX high-risk groups when compared with low risk (P=0.019). Similarly, in publicly available breast cancer gene expression data sets, overexpression of SF3B3, but not SF3B1, was significantly correlated with overall survival. 
Furthermore, the correlation was significant in ER-positive, but not in ER-negative tumors. This is the first study to document the role of SF3B3 in endocrine resistance and prognosis in ER-positive breast cancer. Potential strategies for therapeutic targeting of the splicing mechanism(s) need to be evaluated.