Search | VHL Regional Portal

1.

Nm-Nano: a machine learning framework for transcriptome-wide single-molecule mapping of 2´-O-methylation (Nm) sites in nanopore direct RNA sequencing datasets.

Hassan, Doaa; Ariyur, Aditya; Daulatabad, Swapna Vidhur; Mir, Quoseena; Janga, Sarath Chandra.

RNA Biol ; 21(1): 1-15, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38758523

ABSTRACT

2´-O-methylation (Nm) is one of the most abundant modifications found in both mRNAs and noncoding RNAs. It contributes to many biological processes, such as the normal functioning of tRNA, the protection of mRNA against degradation by the decapping and exoribonuclease (DXO) protein, and the biogenesis and specificity of rRNA. Recent advancements in single-molecule sequencing techniques for long read RNA sequencing data offered by Oxford Nanopore Technologies have enabled the direct detection of RNA modifications from sequencing data. In this study, we propose a bio-computational framework, Nm-Nano, for predicting the presence of Nm sites in direct RNA sequencing data generated from two human cell lines. The Nm-Nano framework integrates two supervised machine learning (ML) models for predicting Nm sites: Extreme Gradient Boosting (XGBoost) and Random Forest (RF) with K-mer embedding. Evaluation on benchmark datasets from direct RNA sequencing of HeLa and HEK293 cell lines demonstrates high accuracy (99% with XGBoost and 92% with RF) in identifying Nm sites. Deploying Nm-Nano on HeLa and HEK293 cell lines reveals genes that are frequently modified with Nm. In HeLa cell lines, 125 genes are identified as frequently Nm-modified, showing enrichment in 30 ontologies related to immune response and cellular processes. In HEK293 cell lines, 61 genes are identified as frequently Nm-modified, with enrichment in processes like glycolysis and protein localization. These findings underscore the diverse regulatory roles of Nm modifications in metabolic pathways, protein degradation, and cellular processes. The source code of Nm-Nano can be freely accessed at https://github.com/Janga-Lab/Nm-Nano.

Subject(s)

Machine Learning , Sequence Analysis, RNA , Transcriptome , Humans , Methylation , Sequence Analysis, RNA/methods , HeLa Cells , Nanopore Sequencing/methods , HEK293 Cells , Computational Biology/methods , RNA Processing, Post-Transcriptional , Nanopores , Software , RNA, Messenger/genetics , RNA, Messenger/metabolism

2.

Datawiz-IN: Summer Research Experience for Health Data Science Training.

Afreen, Sadia; Krohannon, Alexander; Purkayastha, Saptarshi; Janga, Sarath Chandra.

Res Sq ; 2024 Mar 29.

Article in English | MEDLINE | ID: mdl-38585996

ABSTRACT

Background: Good science necessitates diverse perspectives to guide its progress. This study introduces Datawiz-IN, an educational initiative that fosters diversity and inclusion in AI skills training and research. Supported by a National Institutes of Health R25 grant from the National Library of Medicine, Datawiz-IN provided a comprehensive data science and machine learning research experience to students from underrepresented minority groups in medicine and computing. Methods: The program evaluation triangulated quantitative and qualitative data to measure representation, innovation, and experience. Diversity gains were quantified using demographic data analysis. Computational projects were systematically reviewed for research productivity. A mixed-methods survey gauged participant perspectives on skills gained, support quality, challenges faced, and overall sentiments. Results: The first cohort of 14 students in Summer 2023 demonstrated quantifiable increases in representation, with greater participation of women and minorities, evidencing the efficacy of proactive efforts to engage talent typically excluded from these fields. The student interns conducted innovative projects that elucidated disease mechanisms, enhanced clinical decision support systems, and analyzed health disparities. Conclusion: By illustrating how purposeful inclusion catalyzes innovation, Datawiz-IN offers a model for developing AI systems and research that reflect true diversity. Realizing the full societal benefits of AI requires sustaining pathways for historically excluded voices to help shape the field.

3.

Monoallelically expressed noncoding RNAs form nucleolar territories on NOR-containing chromosomes and regulate rRNA expression.

Hao, Qinyu; Liu, Minxue; Daulatabad, Swapna Vidhur; Gaffari, Saba; Song, You Jin; Srivastava, Rajneesh; Bhaskar, Shivang; Moitra, Anurupa; Mangan, Hazel; Tseng, Elizabeth; Gilmore, Rachel B; Frier, Susan M; Chen, Xin; Wang, Chengliang; Huang, Sui; Chamberlain, Stormy; Jin, Hong; Korlach, Jonas; McStay, Brian; Sinha, Saurabh; Janga, Sarath Chandra; Prasanth, Supriya G; Prasanth, Kannanganattu V.

Elife ; 13, 2024 Jan 19.

Article in English | MEDLINE | ID: mdl-38240312

ABSTRACT

Out of the several hundred copies of rRNA genes arranged in the nucleolar organizing regions (NOR) of the five human acrocentric chromosomes, ~50% remain transcriptionally inactive. NOR-associated sequences and epigenetic modifications contribute to the differential expression of rRNAs. However, the mechanism(s) controlling the dosage of active versus inactive rRNA genes within each NOR in mammals is yet to be determined. We have discovered a family of ncRNAs, SNULs (Single NUcleolus Localized RNA), which form constrained sub-nucleolar territories on individual NORs and influence rRNA expression. Individual members of the SNULs monoallelically associate with specific NOR-containing chromosomes. SNULs share sequence similarity to pre-rRNA and localize in the sub-nucleolar compartment with pre-rRNA. Finally, SNULs control rRNA expression by influencing pre-rRNA sorting to the DFC compartment and pre-rRNA processing. Our study discovered a novel class of ncRNAs influencing rRNA expression by forming constrained nucleolar territories on individual NORs.

Subject(s)

Nucleolus Organizer Region , RNA Precursors , Humans , Animals , Nucleolus Organizer Region/genetics , Nucleolus Organizer Region/metabolism , RNA Precursors/genetics , RNA Precursors/metabolism , Cell Nucleolus/genetics , Cell Nucleolus/metabolism , RNA, Ribosomal/genetics , RNA, Ribosomal/metabolism , Chromosomes, Human/metabolism , RNA, Untranslated/genetics , RNA, Untranslated/metabolism , Mammals/genetics

4.

Experimental and computational methods for studying the dynamics of RNA-RNA interactions in SARS-COV2 genomes.

Srivastava, Mansi; Dukeshire, Matthew R; Mir, Quoseena; Omoru, Okiemute Beatrice; Manzourolajdad, Amirhossein; Janga, Sarath Chandra.

Brief Funct Genomics ; 23(1): 46-54, 2024 Jan 18.

Article in English | MEDLINE | ID: mdl-36752040

ABSTRACT

Long-range ribonucleic acid (RNA)-RNA interactions (RRI) are prevalent in positive-strand RNA viruses, including Beta-coronaviruses, and these take part in regulatory roles, including the regulation of sub-genomic RNA production rates. Crosslinking of interacting RNAs and short read-based deep sequencing of resulting RNA-RNA hybrids have shown that these long-range structures exist in severe acute respiratory syndrome coronavirus (SARS-CoV)-2 on both genomic and sub-genomic levels and in dynamic topologies. Furthermore, co-evolution of coronaviruses with their hosts is navigated by genetic variations made possible by its large genome, high recombination frequency and a high mutation rate. SARS-CoV-2's mutations are known to occur spontaneously during replication, and thousands of aggregate mutations have been reported since the emergence of the virus. Although many long-range RRIs have been experimentally identified using high-throughput methods for the wild-type SARS-CoV-2 strain, evolutionary trajectory of these RRIs across variants, impact of mutations on RRIs and interaction of SARS-CoV-2 RNAs with the host have been largely open questions in the field. In this review, we summarize recent computational tools and experimental methods that have been enabling the mapping of RRIs in viral genomes, with a specific focus on SARS-CoV-2. We also present available informatics resources to navigate the RRI maps and shed light on the impact of mutations on the RRI space in viral genomes. Investigating the evolution of long-range RNA interactions and that of virus-host interactions can contribute to the understanding of new and emerging variants as well as aid in developing improved RNA therapeutics critical for combating future outbreaks.

Subject(s)

COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , COVID-19/genetics , RNA, Viral/genetics , Mutation/genetics , Genome, Viral

5.

Inflammation primes the kidney for recovery by activating AZIN1 A-to-I editing.

Heruye, Segewkal; Myslinski, Jered; Zeng, Chao; Zollman, Amy; Makino, Shinichi; Nanamatsu, Azuma; Mir, Quoseena; Janga, Sarath Chandra; Doud, Emma H; Eadon, Michael T; Maier, Bernhard; Hamada, Michiaki; Tran, Tuan M; Dagher, Pierre C; Hato, Takashi.

bioRxiv ; 2023 Nov 09.

Article in English | MEDLINE | ID: mdl-37986799

ABSTRACT

The progression of kidney disease varies among individuals, but a general methodology to quantify disease timelines is lacking. Particularly challenging is the task of determining the potential for recovery from acute kidney injury following various insults. Here, we report that quantitation of post-transcriptional adenosine-to-inosine (A-to-I) RNA editing offers a distinct genome-wide signature, enabling the delineation of disease trajectories in the kidney. A well-defined murine model of endotoxemia permitted the identification of the origin and extent of A-to-I editing, along with temporally discrete signatures of double-stranded RNA stress and Adenosine Deaminase isoform switching. We found that A-to-I editing of Antizyme Inhibitor 1 (AZIN1), a positive regulator of polyamine biosynthesis, serves as a particularly useful temporal landmark during endotoxemia. Our data indicate that AZIN1 A-to-I editing, triggered by preceding inflammation, primes the kidney and activates endogenous recovery mechanisms. By comparing genetically modified human cell lines and mice locked in either A-to-I edited or uneditable states, we uncovered that AZIN1 A-to-I editing not only enhances polyamine biosynthesis but also engages glycolysis and nicotinamide biosynthesis to drive the recovery phenotype. Our findings imply that quantifying AZIN1 A-to-I editing could potentially identify individuals who have transitioned to an endogenous recovery phase. This phase would reflect their past inflammation and indicate their potential for future recovery.

6.

Read-depth based approach on whole genome resequencing data reveals important insights into the copy number variation (CNV) map of major global buffalo breeds.

Ahmad, Sheikh Firdous; Chandrababu Shailaja, Celus; Vaishnav, Sakshi; Kumar, Amit; Gaur, Gyanendra Kumar; Janga, Sarath Chandra; Ahmad, Syed Mudasir; Malla, Waseem Akram; Dutt, Triveni.

BMC Genomics ; 24(1): 616, 2023 Oct 16.

Article in English | MEDLINE | ID: mdl-37845620

ABSTRACT

BACKGROUND: Elucidating genome-wide structural variants including copy number variations (CNVs) has gained increased significance in recent times owing to their contribution to genetic diversity and association with important pathophysiological states. The present study aimed to elucidate the high-resolution CNV map of six different global buffalo breeds using whole genome resequencing data at two coverages (10X and 30X). Post-quality control, the sequence reads were aligned to the latest draft release of the Bubaline genome. The genome-wide CNVs were elucidated using a read-depth approach in CNVnator with different bin sizes. Adjacent CNVs were concatenated into copy number variation regions (CNVRs) in different breeds and their genomic coverage was elucidated. RESULTS: Overall, the average size of CNVR was lower at 30X coverage, providing finer details. Most of the CNVRs were either deletion or duplication type, while mixed events were comparatively fewer in all breeds. The average CNVR size was lower at 30X coverage (0.013 Mb) as compared to 10X (0.201 Mb), with the finest variants in Banni buffaloes. The maximum number of CNVs was observed in Murrah (2627) and Pandharpuri (25,688) at 10X and 30X coverages, respectively, whereas the minimum number of CNVs was scored in Surti at both coverages (2092 and 17,373). On the other hand, the highest and lowest numbers of CNVRs were scored in Jaffarabadi (833 and 10,179 events) and Surti (783 and 7553 events), respectively, at both coverages. Deletion events outnumbered duplications in all breeds at both coverages. Gene profiling of common overlapped genes and longest CNVRs provided important insights into the evolutionary history of these breeds and indicated the genomic regions under selection in respective breeds.
CONCLUSION: The present study is the first of its kind to elucidate the high-resolution CNV map in major buffalo populations using a read-depth approach on whole genome resequencing data. The results revealed important insights into the divergence of major global buffalo breeds along the evolutionary timescale.
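The abstract above states that adjacent CNVs were concatenated into CNVRs but does not give the exact rule. The following is a minimal Python sketch of one common way to do this (sorting calls and merging overlapping or nearby intervals per chromosome). The tuple layout, the `merge_cnvs` name, and the `max_gap` parameter are hypothetical illustrations, not the authors' pipeline or part of CNVnator.

```python
from typing import List, Tuple

# Hypothetical record layout: (chromosome, start, end), coordinates inclusive.
Interval = Tuple[str, int, int]


def merge_cnvs(cnvs: List[Interval], max_gap: int = 0) -> List[Interval]:
    """Merge CNV calls on the same chromosome that overlap or lie within max_gap bases."""
    merged: List[Interval] = []
    # Sorting by (chromosome, start, end) lets a single left-to-right pass suffice.
    for chrom, start, end in sorted(cnvs):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2] + max_gap:
            prev_chrom, prev_start, prev_end = merged[-1]
            # Extend the current region to cover the new call.
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            # Start a new region.
            merged.append((chrom, start, end))
    return merged


# Made-up example calls, for illustration only.
calls = [("chr1", 100, 200), ("chr1", 150, 300), ("chr1", 500, 600), ("chr2", 50, 80)]
regions = merge_cnvs(calls)
```

With these made-up calls, the two overlapping chr1 intervals collapse into one region while the distant chr1 call and the chr2 call remain separate. Real pipelines may also track event type (deletion, duplication, mixed) while merging, which this sketch omits.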

Subject(s)

Buffaloes , DNA Copy Number Variations , Animals , Buffaloes/genetics , Genome , Sequence Analysis, DNA , Genomics/methods

7.

Sequoia: A Framework for Visual Analysis of RNA Modifications from Direct RNA Sequencing Data.

Koonchanok, Ratanond; Daulatabad, Swapna Vidhur; Reda, Khairi; Janga, Sarath Chandra.

Methods Mol Biol ; 2624: 127-138, 2023.

Article in English | MEDLINE | ID: mdl-36723813

ABSTRACT

Oxford Nanopore-based long-read direct RNA sequencing protocols are being increasingly used to study the dynamics of RNA metabolic processes due to improvements in read lengths, increased throughput, decreasing cost, ease of library preparation, and convenience. Long-read sequencing enables single-molecule-based detection of posttranscriptional changes, promising novel insights into the functional roles of RNA. However, fulfilling this potential will necessitate the development of new tools for analyzing and exploring this type of data. Although there are tools that allow users to analyze signal information, such as comparing raw signal traces to a nucleotide sequence, they don't facilitate studying each individual signal instance in each read or perform analysis of signal clusters based on signal similarity. Therefore, we present Sequoia, a visual analytics application that allows users to interactively analyze signals originating from nanopore sequencers and can readily be extended to both RNA and DNA sequencing datasets. Sequoia combines a Python-based backend with a multi-view graphical interface that allows users to ingest raw nanopore sequencing data in Fast5 format, cluster sequences based on electric-current similarities, and drill-down onto signals to find attributes of interest. In this tutorial, we illustrate each individual step involved in running Sequoia and in the process dissect input data characteristics. We show how to generate Nanopore sequencing-based visualizations by leveraging dimensionality reduction and parameter tuning to separate modified RNA sequences from their unmodified counterparts. Sequoia's interactive features enhance nanopore-based computational methodologies. Sequoia enables users to construct rationales and hypotheses and develop insights about the dynamic nature of RNA from the visual analysis. Sequoia is available at https://github.com/dnonatar/Sequoia .

Subject(s)

Nanopores , Sequoia , RNA/genetics , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, RNA , Software

8.

Epitranscriptomics in parasitic protists: Role of RNA chemical modifications in posttranscriptional gene regulation.

Catacalos, Cassandra; Krohannon, Alexander; Somalraju, Sahiti; Meyer, Kate D; Janga, Sarath Chandra; Chakrabarti, Kausik.

PLoS Pathog ; 18(12): e1010972, 2022 12.

Article in English | MEDLINE | ID: mdl-36548245

ABSTRACT

"Epitranscriptomics" is the new RNA code that represents an ensemble of posttranscriptional RNA chemical modifications, which can precisely coordinate gene expression and biological processes. There are several RNA base modifications, such as N6-methyladenosine (m6A), 5-methylcytosine (m5C), and pseudouridine (Ψ), that play pivotal roles in fine-tuning gene expression in almost all eukaryotes, and emerging evidence suggests that parasitic protists are no exception. In this review, we primarily focus on m6A, which is the most abundant epitranscriptomic mark and regulates numerous cellular processes, ranging from nuclear export, mRNA splicing, and polyadenylation to stability and translation. We highlight the universal features of spatiotemporal m6A RNA modifications in eukaryotic phylogeny, their homologs, and unique processes in 3 unicellular parasites (Plasmodium sp., Toxoplasma sp., and Trypanosoma sp.), as well as some technological advances in this rapidly developing research area that can significantly improve our understanding of gene expression regulation in parasites.

Subject(s)

Parasites , RNA , Animals , RNA/metabolism , Parasites/genetics , Parasites/metabolism , Gene Expression Regulation , RNA Processing, Post-Transcriptional , Eukaryota/genetics , Polyadenylation

9.

Combining transfer learning with retinal lesion features for accurate detection of diabetic retinopathy.

Hassan, Doaa; Gill, Hunter Mathias; Happe, Michael; Bhatwadekar, Ashay D; Hajrasouliha, Amir R; Janga, Sarath Chandra.

Front Med (Lausanne) ; 9: 1050436, 2022.

Article in English | MEDLINE | ID: mdl-36425113

ABSTRACT

Diabetic retinopathy (DR) is a late microvascular complication of Diabetes Mellitus (DM) that could lead to permanent blindness in patients without early detection. Although adequate management of DM via regular eye examination can preserve vision in 98% of the DR cases, DR screening and diagnoses based on clinical lesion features devised by expert clinicians are costly, time-consuming and not sufficiently accurate. This creates a need for Artificial Intelligence (AI) systems that can detect DR accurately and automatically, and thus prevent DR from affecting vision. Hence, such systems can help clinician experts in certain cases and aid ophthalmologists in rapid diagnoses. To address such requirements, several approaches have been proposed in the literature that use Machine Learning (ML) and Deep Learning (DL) techniques to develop such systems. However, these approaches ignore the highly valuable clinical lesion features that could contribute significantly to the accurate detection of DR. Therefore, in this study we introduce a framework called DR-detector that employs the Extreme Gradient Boosting (XGBoost) ML model trained via the combination of the features extracted by the pretrained convolutional neural networks commonly known as transfer learning (TL) models and the clinical retinal lesion features for accurate detection of DR. The retinal lesion features are extracted via an image segmentation technique using the UNET DL model and capture exudates (EXs), microaneurysms (MAs), and hemorrhages (HEMs) that are relevant lesions for DR detection. The feature combination approach implemented in DR-detector has been applied to two common TL models in the literature, namely VGG-16 and ResNet-50. We trained the DR-detector model using a training dataset comprising 1,840 color fundus images collected from e-ophtha, retinal lesions and APTOS 2019 Kaggle datasets, of which 920 images are healthy.
To validate the DR-detector model, we test the model on an external dataset that consists of 81 healthy images collected from the High-Resolution Fundus (HRF) and MESSIDOR-2 datasets and 81 images with DR signs collected from the Indian Diabetic Retinopathy Image Dataset (IDRID), annotated for DR by experts. The experimental results show that the DR-detector model achieves a testing accuracy of 100% in detecting DR after training it with the combination of ResNet-50 and lesion features and 99.38% accuracy after training it with the combination of VGG-16 and lesion features. More importantly, the results also show a higher contribution of specific lesion features toward the performance of the DR-detector model. For instance, using only the hemorrhages feature to train the model, our model achieves an accuracy of 99.38% in detecting DR, which is higher than the accuracy when training the model with the combination of all lesion features (89%) and equal to the accuracy when training the model with the combination of all lesions and VGG-16 features together. This highlights the possibility of using only the clinical features, such as lesions that are clinically interpretable, to build the next generation of robust artificial intelligence (AI) systems with great clinical interpretability for DR detection. The code of the DR-detector framework is available on GitHub at https://github.com/Janga-Lab/DR-detector and can be readily employed for detecting DR from retinal image datasets.
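As a quick arithmetic check on the figures above: the external test set has 81 healthy and 81 DR images, 162 in total. The abstract does not report a confusion matrix, so the sketch below only shows that a 99.38% accuracy is numerically consistent with a single misclassified image out of 162. That reading is an inference, not a reported result, and the function name is hypothetical.

```python
def accuracy_pct(correct: int, total: int) -> float:
    """Return classification accuracy as a percentage rounded to two decimals."""
    return round(100.0 * correct / total, 2)


healthy_images = 81   # from the abstract
dr_images = 81        # from the abstract
total_images = healthy_images + dr_images  # 162

# Assumption for illustration: exactly one image is misclassified.
one_error = accuracy_pct(total_images - 1, total_images)
no_error = accuracy_pct(total_images, total_images)
```

Under that assumption, `one_error` evaluates to 99.38 and `no_error` to 100.0, matching the two accuracies quoted for the VGG-16 and ResNet-50 combinations.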

10.

A Putative long-range RNA-RNA interaction between ORF8 and Spike of SARS-CoV-2.

Omoru, Okiemute Beatrice; Pereira, Filipe; Janga, Sarath Chandra; Manzourolajdad, Amirhossein.

PLoS One ; 17(9): e0260331, 2022.

Article in English | MEDLINE | ID: mdl-36048827

ABSTRACT

SARS-CoV-2 has affected people worldwide as the causative agent of COVID-19. The virus is related to the highly lethal SARS-CoV-1 responsible for the 2002-2003 SARS outbreak in Asia. Research is ongoing to understand why both viruses have different spreading capacities and mortality rates. Like other beta coronaviruses, RNA-RNA interactions occur between different parts of the viral genomic RNA, resulting in discontinuous transcription and production of various sub-genomic RNAs. These sub-genomic RNAs are then translated into other viral proteins. In this work, we performed a comparative analysis for novel long-range RNA-RNA interactions that may involve the Spike region. Comparing in-silico fragment-based predictions between reference sequences of SARS-CoV-1 and SARS-CoV-2 revealed several predictions amongst which a thermodynamically stable long-range RNA-RNA interaction between (23660-23703 Spike) and (28025-28060 ORF8) unique to SARS-CoV-2 was observed. The patterns of sequence variation using data gathered worldwide further supported the predicted stability of the sub-interacting region (23679-23690 Spike) and (28031-28042 ORF8). Such RNA-RNA interactions can potentially impact viral life cycle including sub-genomic RNA production rates.

Subject(s)

COVID-19 , SARS-CoV-2 , Spike Glycoprotein, Coronavirus , Viral Proteins , Genome, Viral , Humans , RNA, Viral/genetics , SARS-CoV-2/genetics , Spike Glycoprotein, Coronavirus/genetics , Viral Proteins/genetics

11.

FOXP3 exon 2 controls Treg stability and autoimmunity.

Du, Jianguang; Wang, Qun; Yang, Shuangshuang; Chen, Si; Fu, Yongyao; Spath, Sabine; Domeier, Phillip; Hagin, David; Anover-Sombke, Stephanie; Haouili, Maya; Liu, Sheng; Wan, Jun; Han, Lei; Liu, Juli; Yang, Lei; Sangani, Neel; Li, Yujing; Lu, Xiongbin; Janga, Sarath Chandra; Kaplan, Mark H; Torgerson, Troy R; Ziegler, Steven F; Zhou, Baohua.

Sci Immunol ; 7(72): eabo5407, 2022 06 24.

Article in English | MEDLINE | ID: mdl-35749515

ABSTRACT

Differing from the mouse Foxp3 gene that encodes only one protein product, human FOXP3 encodes two major isoforms through alternative splicing-a longer isoform (FOXP3 FL) containing all the coding exons and a shorter isoform lacking the amino acids encoded by exon 2 (FOXP3 ΔE2). The two isoforms are naturally expressed in humans, yet their differences in controlling regulatory T cell phenotype and functionality remain unclear. In this study, we show that patients expressing only the shorter isoform fail to maintain self-tolerance and develop immunodeficiency, polyendocrinopathy, and enteropathy X-linked (IPEX) syndrome. Mice with Foxp3 exon 2 deletion have excessive follicular helper T (TFH) and germinal center B (GC B) cell responses, and develop systemic autoimmune disease with anti-dsDNA and antinuclear autoantibody production, as well as immune complex glomerulonephritis. Despite having normal suppressive function in in vitro assays, regulatory T cells expressing FOXP3 ΔE2 are unstable and sufficient to induce autoimmunity when transferred into Tcrb-deficient mice. Mechanistically, the FOXP3 ΔE2 isoform allows increased expression of selected cytokines, but decreased expression of a set of positive regulators of Foxp3 without altered binding to these gene loci. These findings uncover indispensable functions of the FOXP3 exon 2 region, highlighting a role in regulating a transcriptional program that maintains Treg stability and immune homeostasis.

Subject(s)

Autoimmunity , T-Lymphocytes, Regulatory , Animals , Autoimmunity/genetics , Exons/genetics , Forkhead Transcription Factors , Humans , Mice , Protein Isoforms/metabolism

12.

CASowary: CRISPR-Cas13 guide RNA predictor for transcript depletion.

Krohannon, Alexander; Srivastava, Mansi; Rauch, Simone; Srivastava, Rajneesh; Dickinson, Bryan C; Janga, Sarath Chandra.

BMC Genomics ; 23(1): 172, 2022 Mar 02.

Article in English | MEDLINE | ID: mdl-35236300

ABSTRACT

BACKGROUND: Recent discovery of the gene editing system - CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) associated proteins (Cas), has resulted in its widespread use for improved understanding of a variety of biological systems. Cas13, a lesser studied Cas protein, has been repurposed to allow for efficient and precise editing of RNA molecules. The Cas13 system utilizes base complementarity between a crRNA/sgRNA (CRISPR RNA or single guide RNA) and a target RNA transcript, to preferentially bind to only the target transcript. Unlike targeting the upstream regulatory regions of protein coding genes on the genome, the transcriptome is significantly more redundant, leading to many transcripts having wide stretches of identical nucleotide sequences. Transcripts also exhibit complex three-dimensional structures and interact with an array of RBPs (RNA Binding Proteins), both of which may impact the effectiveness of transcript depletion of target sequences. However, our understanding of the features and corresponding methods which can predict whether a specific sgRNA will effectively knock down a transcript is very limited. RESULTS: Here we present a novel machine learning and computational tool, CASowary, to predict the efficacy of a sgRNA. We used publicly available RNA knockdown data from Cas13 characterization experiments for 555 sgRNAs targeting the transcriptome in HEK293 cells, in conjunction with transcriptome-wide protein occupancy information. Our model utilizes a Decision Tree architecture with a set of 112 sequence and target availability features, to classify sgRNA efficacy into one of four classes, based upon expected level of target transcript knockdown. After accounting for noise in the training data set, the noise-normalized accuracy exceeds 70%.
Additionally, highly effective sgRNA predictions have been experimentally validated using an independent RNA targeting Cas system - CIRTS, confirming the robustness and reproducibility of our model's sgRNA predictions. Utilizing a transcriptome-wide protein occupancy map generated using POP-seq in HeLa cells against a publicly available protein-RNA interaction map in HEK293 cells, we show that CASowary can predict high quality guides for numerous transcripts in a cell line-specific manner. CONCLUSIONS: Application of CASowary to whole transcriptomes should enable rapid deployment of CRISPR/Cas13 systems, facilitating the development of therapeutic interventions linked with aberrations in RNA regulatory processes.

Subject(s)

CRISPR-Cas Systems , RNA, Guide, Kinetoplastida , Gene Editing/methods , HEK293 Cells , HeLa Cells , Humans , RNA, Guide, Kinetoplastida/genetics , Reproducibility of Results

13.

Penguin: A tool for predicting pseudouridine sites in direct RNA nanopore sequencing data.

Hassan, Doaa; Acevedo, Daniel; Daulatabad, Swapna Vidhur; Mir, Quoseena; Janga, Sarath Chandra.

Methods ; 203: 478-487, 2022 07.

Article in English | MEDLINE | ID: mdl-35182749

ABSTRACT

Pseudouridine is one of the most abundant RNA modifications, arising when uridines are converted by Pseudouridine synthase proteins. It plays an important role in many biological processes and has been reported to have application in drug development. Recently, single-molecule sequencing techniques such as the direct RNA sequencing platform offered by Oxford Nanopore Technologies have enabled direct detection of RNA modifications on the molecule being sequenced. In this study, we introduce a tool called Penguin that integrates several machine learning (ML) models to identify RNA Pseudouridine sites on Nanopore direct RNA sequencing reads. Pseudouridine sites were identified on single molecule sequencing data collected from direct RNA sequencing resulting in 723 K reads in HEK293 and 500 K reads in HeLa cell lines. Penguin extracts a set of features from the raw signal measured by the Oxford Nanopore and the corresponding basecalled k-mer. Those features are used to train the predictors included in Penguin, which in turn, can predict whether the signal is modified by the presence of Pseudouridine sites in the testing phase. We have included various predictors in Penguin, including Support vector machines (SVM), Random Forest (RF), and Neural network (NN). The results on the two benchmark data sets for HEK293 and HeLa cell lines show outstanding performance of Penguin either in random split testing or in independent validation testing. In random split testing, Penguin has been able to identify Pseudouridine sites with a high accuracy of 93.38% by applying SVM to the HEK293 benchmark dataset. In independent validation testing, Penguin achieves an accuracy of 92.61% by training SVM with the HEK293 benchmark dataset and testing it for identifying Pseudouridine sites on the HeLa benchmark dataset. Thus, in independent validation testing, Penguin achieves 16% higher accuracy than the existing Pseudouridine predictors in the literature.
Employing Penguin to predict Pseudouridine sites revealed a significant enrichment of "regulation of mRNA 3'-end processing" in the HEK293 cell line and "positive regulation of transcription from RNA polymerase II promoter involved in cellular response to chemical stimulus" in the HeLa cell line. Penguin software and models are available on GitHub at https://github.com/Janga-Lab/Penguin and can be readily employed for predicting Ψ sites from Nanopore direct RNA-sequencing datasets.
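The abstract says Penguin extracts features from the raw nanopore signal and the corresponding basecalled k-mer, without listing them. The sketch below shows, in Python, what such a per-event feature vector could look like using simple summary statistics and base counts. The choice of features, the `signal_features` name, and the example values are all assumptions for illustration and should not be read as Penguin's actual feature set.

```python
from statistics import mean, median, pstdev
from typing import Dict, List


def signal_features(kmer: str, signal: List[float]) -> Dict[str, float]:
    """Summarize one signal segment and its k-mer as a flat feature dictionary."""
    features = {
        "mean": mean(signal),
        "median": median(signal),
        "std": pstdev(signal),          # population standard deviation
        "min": min(signal),
        "max": max(signal),
        "length": float(len(signal)),   # number of raw samples in the segment
    }
    # Hypothetical sequence features: per-base counts within the k-mer.
    for base in "ACGU":
        features[f"count_{base}"] = float(kmer.count(base))
    return features


# Made-up current values (arbitrary units) for a made-up 5-mer.
example = signal_features("GAUCA", [80.0, 82.0, 84.0, 86.0])
```

A dictionary like `example` could then be turned into a numeric row and passed to a classifier such as an SVM or random forest, which are the model families the abstract names.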

Subject(s)

Nanopore Sequencing , Nanopores , Spheniscidae , Animals , HEK293 Cells , HeLa Cells , High-Throughput Nucleotide Sequencing , Humans , Pseudouridine/chemistry , RNA/genetics , Sequence Analysis, RNA/methods , Spheniscidae/genetics , Spheniscidae/metabolism

14.

Esophageal Microbiome in Healthy Children and Esophageal Eosinophilia.

Parashette, Kalyan Ray; Sarsani, Vishal Kumar; Toh, Evelyn; Janga, Sarath Chandra; Nelson, David E; Gupta, Sandeep K.

J Pediatr Gastroenterol Nutr ; 74(5): e109-e114, 2022 05 01.

Article in English | MEDLINE | ID: mdl-35149653

ABSTRACT

OBJECTIVES: There is limited knowledge about the role of esophageal microbiome in pediatric esophageal eosinophilia (EE). We aimed to characterize the esophageal microbiome in pediatric patients with and without EE. METHODS: In the present prospective study, esophageal mucosal biopsies were obtained from 41 children. Of these, 22 had normal esophageal mucosal biopsies ("healthy"), 6 children had reflux esophagitis (RE), 4 had proton pump inhibitor (PPi)-responsive esophageal eosinophilia (PPi-REE), and 9 had eosinophilic esophagitis (EoE). The microbiome composition was analyzed using 16S rRNA gene sequencing. The median age (range) in years for the healthy, RE, PPi-REE, and EoE groups was 10 (1.5-18), 6 (2-15), 6.5 (5-15), and 9 (1.5-17), respectively. RESULTS: The bacterial phyla Actinobacteria, Bacteroidetes, Firmicutes, Fusobacteria, and Proteobacteria were the most predominant. The Epsilonproteobacteria, Betaproteobacteria, Flavobacteria, Fusobacteria, and Sphingobacteria classes were underrepresented across groups. Vibrionales was predominant in the healthy and EoE groups but lower in the RE and PPi-REE groups. The genera Streptococcus, Rahnella, and Leptotrichia explained 29.65% of the variation in the data, and an additional 10.86% of the variation was explained by the genera Microbacterium, Prevotella, and Vibrio. The healthy group had a higher diversity and richness index compared to other groups, but this was not statistically different. CONCLUSIONS: The pediatric esophagus has an abundant and diverse microbiome, both in the healthy and diseased states. The healthy group had a higher, but not significantly different, diversity and richness index compared to other groups.
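The abstract compares a "diversity and richness index" across groups without naming the specific indices. As an illustration only, the Python sketch below computes two widely used measures from a table of taxon counts: observed richness (number of taxa present) and the Shannon diversity index. Whether the study used these particular indices is an assumption, and the counts are made up.

```python
import math
from typing import Dict


def richness(counts: Dict[str, int]) -> int:
    """Observed richness: number of taxa with at least one read."""
    return sum(1 for c in counts.values() if c > 0)


def shannon_index(counts: Dict[str, int]) -> float:
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with nonzero counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)


# Hypothetical read counts per taxon for two samples.
even_sample = {"taxon_a": 50, "taxon_b": 50}
skewed_sample = {"taxon_a": 99, "taxon_b": 1}

even_h = shannon_index(even_sample)      # equals ln(2) for two equally abundant taxa
skewed_h = shannon_index(skewed_sample)  # lower, because one taxon dominates
```

Both made-up samples have the same richness (2), but the evenly distributed sample has the higher Shannon index, which is why richness and diversity are usually reported side by side.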

Subject(s)

Eosinophilic Esophagitis , Esophagitis, Peptic , Microbiota , Child , Enteritis , Eosinophilia , Eosinophilic Esophagitis/pathology , Gastritis , Humans , Prospective Studies , Proton Pump Inhibitors/therapeutic use , RNA, Ribosomal, 16S/genetics

15.

Comparative Analysis of Alternative Splicing Profiles in Th Cell Subsets Reveals Extensive Cell Type-Specific Effects Modulated by a Network of Transcription Factors and RNA-Binding Proteins.

Mir, Quoseena; Lakshmipati, Deepak K; Ulrich, Benjamin J; Kaplan, Mark H; Janga, Sarath Chandra.

Immunohorizons ; 5(9): 760-771, 2021 Sep 28.

Article in English | MEDLINE | ID: mdl-34583937

ABSTRACT

Alternative splicing (AS) plays an important role in the development of many cell types; however, its contribution to Th subsets has not been clearly defined. In this study, we compared mouse naive CD4+ Th cells with Th1, Th2, Th17, and T regulatory cells and observed that the majority of AS events were retained-intron events, followed by skipped-exon events, with at least 1200 genes across cell types affected by AS events. A significant fraction of the AS events, especially retained-intron events from the 72-h time point, were no longer observed 2 wk postdifferentiation, suggesting a role for AS in early activation and differentiation via preferential expression of specific isoforms required during T cell activation, but not for differentiation or effector function. Examining the protein consequence of the exon-skipping events revealed an abundance of structural proteins encoding for intrinsically unstructured peptide regions, followed by transmembrane helices, β strands, and polypeptide turn motifs. Analyses of expression profiles of RNA-binding proteins (RBPs) and their cognate binding sites flanking the discovered AS events revealed an enrichment for specific RBP recognition sites in each of the Th subsets. Integration with publicly available chromatin immunoprecipitation sequencing datasets for transcription factors supports a model wherein lineage-determining transcription factors impact the RBP profile within the differentiating cells, and this differential expression contributes to AS of the transcriptome via a cascade of cell type-specific posttranscriptional rewiring events.

Subject(s)

Alternative Splicing/immunology , T-Lymphocyte Subsets/immunology , T-Lymphocytes, Helper-Inducer/immunology , Animals , Binding Sites , Cells, Cultured , Datasets as Topic , Lymphocyte Activation/genetics , Mice , Models, Animal , Primary Cell Culture , RNA-Binding Proteins/metabolism , RNA-Seq , T-Lymphocyte Subsets/metabolism , T-Lymphocytes, Helper-Inducer/metabolism , Transcription Factors/metabolism

16.

Mutational Landscape and Interaction of SARS-CoV-2 with Host Cellular Components.

Srivastava, Mansi; Hall, Dwight; Omoru, Okiemute Beatrice; Gill, Hunter Mathias; Smith, Sarah; Janga, Sarath Chandra.

Microorganisms ; 9(9), 2021 Aug 24.

Article in English | MEDLINE | ID: mdl-34576690

ABSTRACT

The emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and its rapid evolution have led to a global health crisis. Increasing mutations across the SARS-CoV-2 genome have severely impacted the development of effective therapeutics and vaccines to combat the virus. However, the new SARS-CoV-2 variants and their evolutionary characteristics are not fully understood. Host cellular components such as the ACE2 receptor, RNA-binding proteins (RBPs), microRNAs, small nuclear RNA (snRNA), 18S rRNA, and the 7SL RNA component of the signal recognition particle (SRP) interact with various structural and non-structural proteins of SARS-CoV-2. Several of these viral proteins are currently being examined for designing antiviral therapeutics. In this review, we discuss current advances in our understanding of various host cellular components targeted by the virus during SARS-CoV-2 infection. We also summarize the mutations across the SARS-CoV-2 genome that direct the evolution of new viral strains. Because coronaviruses are rapidly evolving in humans, they are able to escape therapies and vaccine-induced immunity. In order to understand the virus's evolution, it is essential to study its mutational patterns and their impact on host cellular machinery. Finally, we present a comprehensive survey of currently available databases and tools to study viral-host interactions that stand as crucial resources for developing novel therapeutic strategies for combating SARS-CoV-2 infection.

17.

Sequoia: an interactive visual analytics platform for interpretation and feature extraction from nanopore sequencing datasets.

Koonchanok, Ratanond; Daulatabad, Swapna Vidhur; Mir, Quoseena; Reda, Khairi; Janga, Sarath Chandra.

BMC Genomics ; 22(1): 513, 2021 Jul 07.

Article in English | MEDLINE | ID: mdl-34233619

ABSTRACT

BACKGROUND: Direct-sequencing technologies, such as Oxford Nanopore's, are delivering long RNA reads with great efficacy and convenience. These technologies afford an ability to detect post-transcriptional modifications at a single-molecule resolution, promising new insights into the functional roles of RNA. However, realizing this potential requires new tools to analyze and explore this type of data. RESULT: Here, we present Sequoia, a visual analytics tool that allows users to interactively explore nanopore sequences. Sequoia combines a Python-based backend with a multi-view visualization interface, enabling users to import raw nanopore sequencing data in Fast5 format, cluster sequences based on electric-current similarities, and drill down into signals to identify properties of interest. We demonstrate the application of Sequoia by generating and analyzing ~500k reads from direct RNA sequencing data of the human HeLa cell line. We focus on comparing signal features from m6A and m5C RNA modifications as the first step towards building automated classifiers. We show how, through iterative visual exploration and tuning of dimensionality reduction parameters, we can separate modified RNA sequences from their unmodified counterparts. We also document new, qualitative signal signatures that distinguish these modifications from otherwise normal RNA bases, which we were able to discover from the visualization. CONCLUSIONS: Sequoia's interactive features complement existing computational approaches in nanopore-based RNA workflows. The insights gleaned through visual analysis should help users in developing rationales, hypotheses, and insights into the dynamic nature of RNA. Sequoia is available at https://github.com/dnonatar/Sequoia.

Subject(s)

Nanopore Sequencing , Nanopores , Sequoia , HeLa Cells , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis, DNA , Software

18.

Lantern: an integrative repository of functional annotations for lncRNAs in the human genome.

Daulatabad, Swapna Vidhur; Srivastava, Rajneesh; Janga, Sarath Chandra.

BMC Bioinformatics ; 22(1): 279, 2021 May 26.

Article in English | MEDLINE | ID: mdl-34039271

ABSTRACT

BACKGROUND: With advancements in omics technologies, the range of biological processes in which long non-coding RNAs (lncRNAs) are involved is expanding extensively, thereby generating the need to develop lncRNA annotation resources. Although there is a plethora of resources for annotating genes, resources with lncRNA ontology annotations remain rare despite the extensive corpus of lncRNA literature. RESULTS: We present a lncRNA annotation extractor and repository (Lantern), developed using PubMed's abstract retrieval engine and NCBO's recommender annotation system. Lantern's annotations were benchmarked against lncRNAdb's manually curated free text. Benchmarking analysis suggested that Lantern has a recall of 0.62 against lncRNAdb for 182 lncRNAs and a precision of 0.8. Additionally, we annotated lncRNAs with multiple omics annotations, including predicted cis-regulatory TFs, interactions with RBPs, tissue-specific expression profiles, protein co-expression networks, coding potential, sub-cellular localization, and SNPs for ~11,000 lncRNAs in the human genome, providing a one-stop dynamic visualization platform. CONCLUSIONS: Lantern integrates annotations derived from a novel, accurate semi-automatic ontology annotation engine with a variety of multi-omics annotations for lncRNAs, to provide a central web resource for dissecting the functional dynamics of long non-coding RNAs and to facilitate future hypothesis-driven experiments. The annotation pipeline and a web resource with current annotations for human lncRNAs are freely available at sysbio.lab.iupui.edu/lantern.

Subject(s)

RNA, Long Noncoding , Genome, Human , Humans , Molecular Sequence Annotation , RNA, Long Noncoding/genetics

19.

Transcriptome-wide high-throughput mapping of protein-RNA occupancy profiles using POP-seq.

Srivastava, Mansi; Srivastava, Rajneesh; Janga, Sarath Chandra.

Sci Rep ; 11(1): 1175, 2021 Jan 13.

Article in English | MEDLINE | ID: mdl-33441968

ABSTRACT

Interaction between proteins and RNA is critical for post-transcriptional regulatory processes. Existing high-throughput methods based on crosslinking of the protein-RNA complexes and poly-A pulldown are reported to contribute to biases and are not readily amenable to identifying interaction sites on non-poly-A RNAs. We present Protein Occupancy Profile-Sequencing (POP-seq), a phase-separation-based method in three versions, one of which does not require crosslinking, thus providing unbiased protein occupancy profiles on the whole-cell transcriptome without the requirement of poly-A pulldown. Our study demonstrates that ~68% of the total POP-seq peaks exhibited an overlap with publicly available protein-RNA interaction profiles of 97 RNA-binding proteins (RBPs) in K562 cells. We show that POP-seq variants consistently capture protein-RNA interaction sites across a broad range of genes, including on transcripts encoding transcription factors (TFs), RBPs, and long non-coding RNAs (lncRNAs). POP-seq-identified peaks exhibited a significant enrichment (p value < 2.2e-16) for GWAS SNPs and for phenotypic, clinically relevant germline as well as somatic variants reported in cancer genomes, suggesting the prevalence of uncharacterized genomic variation in protein-occupied sites on RNA. We demonstrate that the abundance of POP-seq peaks increases with an increase in expression of lncRNAs, suggesting that highly expressed lncRNAs are likely to act as sponges for RBPs, contributing to the rewiring of the protein-RNA interaction network in cancer cells. Overall, our data support POP-seq as a robust and cost-effective method that could be applied to primary tissues for mapping global protein occupancies.

Subject(s)

Protein Interaction Maps/genetics , RNA-Binding Proteins/genetics , Transcriptome/genetics , Binding Sites/genetics , Cell Line, Tumor , Gene Expression Regulation/genetics , Genome/genetics , High-Throughput Nucleotide Sequencing/methods , Humans , K562 Cells , RNA, Long Noncoding/genetics , Sequence Analysis, RNA/methods

20.

Geographical Landscape and Transmission Dynamics of SARS-CoV-2 Variants Across India: A Longitudinal Perspective.

Jha, Neha; Hall, Dwight; Kanakan, Akshay; Mehta, Priyanka; Maurya, Ranjeet; Mir, Quoseena; Gill, Hunter Mathias; Janga, Sarath Chandra; Pandey, Rajesh.

Front Genet ; 12: 753648, 2021.

Article in English | MEDLINE | ID: mdl-34976008

ABSTRACT

Globally, SARS-CoV-2 has moved from one tide to another with ebbs in between. Genomic surveillance has greatly aided the detection and tracking of the virus and the identification of the variants of concern (VOCs). The knowledge gained from genomic surveillance is important for advance planning by public health and healthcare officials in a populous country like India. An integrative analysis of the publicly available datasets in GISAID from India reveals the differential distribution of clades, lineages, gender, and age over a year (Apr 2020-Mar 2021). The significant insights include early evidence of the B.1.617 and B.1.1.7 lineages in specific states of India. Pan-India longitudinal data highlighted that B.1.36* was the predominant clade in India until January-February 2021, after which it was gradually replaced by the B.1.617.1 lineage, from December 2020 onward. Regional analysis of the spread of SARS-CoV-2 indicated that B.1.617.3 was first seen in India in the month of October in the state of Maharashtra, while the now most prevalent strain, B.1.617.2, was first seen in Bihar and subsequently spread to the states of Maharashtra, Gujarat, and West Bengal. To enable a real-time understanding of the transmission and evolution of the SARS-CoV-2 genomes, we built a transmission map available at https://covid19-indiana.soic.iupui.edu/India/EmergingLineages/April2020/to/March2021. Based on our analysis, the rate estimate for divergence in our dataset was 9.48e-4 substitutions per site per year for SARS-CoV-2. This would enable pandemic preparedness with the addition of future sequencing data from India available in the public repositories for tracking and monitoring the VOCs and variants of interest (VOIs). This would aid decision making from the public health perspective.
