1.
Genome Res ; 34(3): 426-440, 2024 Apr 25.
Article En | MEDLINE | ID: mdl-38621828

Genome structural variations within species are rare. How selective constraints preserve gene order and chromosome structure is a central question in evolutionary biology that remains unsolved. Our sequencing of several genomes of the appendicularian tunicate Oikopleura dioica from around the globe reveals extreme genome scrambling caused by thousands of chromosomal rearrangements, although these animals show no obvious morphological differences. The breakpoint accumulation rate is an order of magnitude higher than in ascidian tunicates, nematodes, Drosophila, or mammals. Chromosome arms and sex-specific regions appear to be the primary unit of macrosynteny conservation. At the microsyntenic level, scrambling did not preserve operon structures, suggesting an absence of selective pressure to maintain them. The uncoupling of genome scrambling from morphological conservation in O. dioica suggests the presence of previously unnoticed cryptic species and provides a new biological system that challenges our previous vision of speciation, in which similar animals always share similar genome structures.


Genome , Urochordata , Animals , Urochordata/genetics , Urochordata/classification , Evolution, Molecular , Female , Phylogeny , Male , Synteny
2.
Wellcome Open Res ; 8: 403, 2023.
Article En | MEDLINE | ID: mdl-38074197

Background: CD4+ Th1 cells producing IFN-γ are required to eradicate intracellular pathogens; however, if uncontrolled, these cells can cause immunopathology. The cytokine IL-10 is produced by multiple immune cells including Th1 cells during infection and regulates the immune response to minimise collateral host damage. In this study we aimed to elucidate the transcriptional network of genes controlling the expression of Il10 and proinflammatory cytokines, including Ifng, in Th1 cells differentiated from mouse naive CD4+ T cells. Methods: We applied computational analysis of gene regulation derived from temporal profiling of gene expression clusters obtained from bulk RNA sequencing (RNA-seq) of flow cytometry-sorted naive CD4+ T cells from mouse spleens differentiated in vitro into Th1 effector cells with IL-12 and IL-27 to produce Ifng and Il10, compared to IL-27 alone, which express Il10 only, or IL-12 alone, which express Ifng and no Il10, or medium control-driven CD4+ T cells, which do not express effector cytokines. Data were integrated with analysis of active genomic regions from these T cells using an assay for transposase-accessible chromatin with sequencing (ATAC-seq), integrated with literature-derived chromatin immunoprecipitation (ChIP)-seq data and the RNA-seq data, to elucidate the transcriptional network of genes controlling expression of Il10 and pro-inflammatory effector genes in Th1 cells. The co-dominant role for the transcription factors Prdm1 (encoding Blimp-1) and Maf (encoding c-Maf) in cytokine gene regulation in Th1 cells was confirmed using T cells obtained from mice with T-cell specific deletion of these transcription factors.
Results: We show that the transcription factors Blimp-1 and c-Maf each have unique and common effects on cytokine gene regulation and not only co-operate to induce Il10 gene expression in IL-12 plus IL-27 differentiated mouse Th1 cells, but additionally directly negatively regulate key proinflammatory cytokines including Ifng, thus providing mechanisms for reinforcement of regulated Th1 cell responses. Conclusions: These data show that Blimp-1 and c-Maf positively and negatively regulate a network of both unique and common anti-inflammatory and pro-inflammatory genes to reinforce a Th1 response in mice that will eradicate pathogens with minimum immunopathology.

3.
Wellcome Open Res ; 8: 286, 2023.
Article En | MEDLINE | ID: mdl-37829674

Crosslinking and immunoprecipitation (CLIP) technologies have become a central component of the molecular biologists' toolkit to study protein-RNA interactions and thus to uncover core principles of RNA biology. There has been a proliferation of CLIP-based experimental protocols, as well as computational tools, especially for peak-calling. Consequently, there is an urgent need for a well-documented bioinformatic pipeline that enshrines the principles of robustness, reproducibility, scalability, portability and flexibility while embracing the diversity of experimental and computational CLIP tools. To address this, we present nf-core/clipseq - a robust Nextflow pipeline for quality control and analysis of CLIP sequencing data. It is part of the international nf-core community effort to develop and curate a best-practice, gold-standard set of pipelines for data analysis. The standards enabled by Nextflow and nf-core, including workflow management, version control, continuous integration and containerisation, ensure that these key needs are met. Furthermore, multiple tools are implemented (e.g. for peak-calling), alongside visualisation of quality control metrics to empower users to make their own informed decisions based on their data. nf-core/clipseq remains under active development, with plans to incorporate newly released tools to ensure that the pipeline remains up-to-date and relevant for the community. Engagement with users and developers is encouraged through the nf-core GitHub repository and Slack channel to promote collaboration. It is available at https://nf-co.re/clipseq.

4.
Elife ; 12, 2023 08 02.
Article En | MEDLINE | ID: mdl-37530410

The vertebrate 'neural plate border' is a transient territory located at the edge of the neural plate containing precursors for all ectodermal derivatives: the neural plate, neural crest, placodes and epidermis. Elegant functional experiments in a range of vertebrate models have provided an in-depth understanding of gene regulatory interactions within the ectoderm. However, these experiments conducted at tissue level raise seemingly contradictory models for fate allocation of individual cells. Here, we carry out single cell RNA sequencing of chick ectoderm from primitive streak to neurulation stage, to explore cell state diversity and heterogeneity. We characterise the dynamics of gene modules, allowing us to model the order of molecular events which take place as ectodermal fates segregate. Furthermore, we find that genes previously classified as neural plate border 'specifiers' typically exhibit dynamic expression patterns and are enriched in either neural, neural crest or placodal fates, revealing that the neural plate border should be seen as a heterogeneous ectodermal territory and not a discrete transitional transcriptional state. Analysis of neural, neural crest and placodal markers reveals that individual NPB cells co-express competing transcriptional programmes, suggesting that their ultimate identity is not yet fixed. This population of 'border located undecided progenitors' (BLUPs) gradually diminishes as cell fate decisions take place. Considering our findings, we propose a probabilistic model for cell fate choice at the neural plate border. Our data suggest that the probability that a progenitor's daughters contribute to a given ectodermal derivative is related to the balance of competing transcriptional programmes, which in turn are regulated by the spatiotemporal position of a progenitor.


Ectoderm , Neural Plate , Animals , Ectoderm/metabolism , Neural Crest , Chickens , Models, Statistical , Single-Cell Analysis , Gene Expression Regulation, Developmental
5.
Nucleic Acids Res ; 51(8): 3573-3589, 2023 05 08.
Article En | MEDLINE | ID: mdl-37013995

The structure of mRNA molecules plays an important role in their interactions with trans-acting factors, notably RNA binding proteins (RBPs), thus contributing to the functional consequences of this interplay. However, current transcriptome-wide experimental methods to chart these interactions are limited by their poor sensitivity. Here we extend the hiCLIP atlas of duplexes bound by Staufen1 (STAU1) ∼10-fold, through careful consideration of experimental assumptions and the development of bespoke computational methods, which we apply to existing data. We present Tosca, a Nextflow computational pipeline for the processing, analysis and visualisation of proximity ligation sequencing data generally. We use our extended duplex atlas to discover insights into the RNA selectivity of STAU1, revealing the importance of structural symmetry and duplex-span-dependent nucleotide composition. Furthermore, we identify heterogeneity in the relationship between transcripts with STAU1-bound 3' UTR duplexes and metabolism of the associated RNAs that we relate to RNA structure: transcripts with short-range proximal 3' UTR duplexes have high degradation rates, but those with long-range duplexes have low rates. Overall, our work enables the integrative analysis of proximity ligation data, delivering insights into specific features and effects of RBP-RNA structure interactions.


RNA-Binding Proteins , Trans-Activators , 3' Untranslated Regions/genetics , RNA, Messenger/metabolism , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism , Trans-Activators/metabolism , Protein Binding
6.
Elife ; 12, 2023 03 03.
Article En | MEDLINE | ID: mdl-36867045

During early vertebrate development, signals from a special region of the embryo, the organizer, can redirect the fate of non-neural ectoderm cells to form a complete, patterned nervous system. This is called neural induction and has generally been imagined as a single signalling event, causing a switch of fate. Here, we undertake a comprehensive analysis, in very fine time course, of the events following exposure of competent ectoderm of the chick to the organizer (the tip of the primitive streak, Hensen's node). Using transcriptomics and epigenomics we generate a gene regulatory network comprising 175 transcriptional regulators and 5614 predicted interactions between them, with fine temporal dynamics from initial exposure to the signals to expression of mature neural plate markers. Using in situ hybridization, single-cell RNA-sequencing, and reporter assays, we show that the gene regulatory hierarchy of responses to a grafted organizer closely resembles the events of normal neural plate development. The study is accompanied by an extensive resource, including information about conservation of the predicted enhancers in other vertebrates.


Gene Regulatory Networks , Nervous System , Animals , Nervous System/metabolism , Chickens , Embryonic Development , Organizers, Embryonic , Vertebrates
7.
RNA ; 29(6): 715-723, 2023 06.
Article En | MEDLINE | ID: mdl-36894192

CLIP technologies are now widely used to study RNA-protein interactions and many data sets are now publicly available. An important first step in CLIP data exploration is the visual inspection and assessment of processed genomic data on selected genes or regions and performing comparisons: either across conditions within a particular project, or incorporating publicly available data. However, the output files produced by data processing pipelines or preprocessed files available to download from data repositories are often not suitable for direct comparison and usually need further processing. Furthermore, to derive biological insight it is usually necessary to visualize a CLIP signal alongside other data such as annotations, or orthogonal functional genomic data (e.g., RNA-seq). We have developed a simple, but powerful, command-line tool: clipplotr, which facilitates these visual comparative and integrative analyses with normalization and smoothing options for CLIP data and the ability to show these alongside reference annotation tracks and functional genomic data. clipplotr accepts these data as input in a range of file formats and outputs a publication-quality figure. It is written in R and can either run independently on a laptop computer or be integrated into computational workflows on a high-performance cluster. Releases, source code, and documentation are freely available at https://github.com/ulelab/clipplotr.


Genomics , Software , Genome , RNA-Seq
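The clipplotr abstract above mentions normalization and smoothing of CLIP signal without saying which methods are used. As a minimal sketch only, assuming library-size scaling (counts per million) and a centred rolling mean, two common choices for this kind of track, the idea can be illustrated in Python. The function names and the toy numbers are hypothetical and are not taken from clipplotr, which is written in R.

```python
def counts_per_million(counts, library_size):
    """Scale per-position crosslink counts by library size (assumed method)."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return [c * 1_000_000 / library_size for c in counts]


def rolling_mean(values, window):
    """Centred rolling mean; the window shrinks at the ends of the track."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# Toy track: five positions, one crosslink peak, 2 million reads in the library.
crosslinks = [0, 0, 6, 0, 0]
normalised = counts_per_million(crosslinks, library_size=2_000_000)
print(normalised)                    # [0.0, 0.0, 3.0, 0.0, 0.0]
print(rolling_mean(normalised, 3))   # [0.0, 1.0, 1.0, 1.0, 0.0]
```

Normalising before smoothing keeps tracks from libraries of different depth on a comparable scale, which is what makes side-by-side comparison across conditions meaningful.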
8.
Brain ; 146(6): 2547-2556, 2023 06 01.
Article En | MEDLINE | ID: mdl-36789492

Valosin-containing protein (VCP) is a hexameric ATPase associated with diverse cellular activities. Genetic mutations in VCP are associated with several forms of muscular and neuronal degeneration, including amyotrophic lateral sclerosis (ALS). Moreover, VCP mediates UV-induced proteolysis of RNA polymerase II (RNAPII), but little is known about the effects of VCP mutations on the transcriptional machinery. Here, we used silica particle-assisted chromatin enrichment and mass spectrometry to study proteins co-localized with RNAPII in precursor neurons differentiated from VCP-mutant or control induced pluripotent stem cells. Remarkably, we observed diminished RNAPII binding of proteins involved in transcription elongation and mRNA splicing in mutant cells. One of these is SART3, a recycling factor of the splicing machinery, whose knockdown leads to perturbed intron retention in several ALS-associated genes. Additional reduced proteins are RBM45, EIF5A and RNF220, mutations in which are associated with various neurodegenerative disorders and are linked to TDP-43 aggregation. Conversely, we observed increased RNAPII binding of heat shock proteins such as HSPB1. Together, these findings shed light on how transcription and splicing machinery are impaired by VCP mutations, which might contribute to aberrant alternative splicing and proteinopathy in neurodegeneration.


Amyotrophic Lateral Sclerosis , Humans , Valosin Containing Protein/genetics , Valosin Containing Protein/metabolism , Amyotrophic Lateral Sclerosis/genetics , Amyotrophic Lateral Sclerosis/metabolism , RNA Polymerase II/metabolism , Adenosine Triphosphatases/genetics , Adenosine Triphosphatases/metabolism , Mutation/genetics , Antigens, Neoplasm , RNA-Binding Proteins/genetics , Nerve Tissue Proteins/genetics
9.
Nature ; 615(7950): 105-110, 2023 03.
Article En | MEDLINE | ID: mdl-36697830

Indirect development with an intermediate larva exists in all major animal lineages [1], which makes larvae central to most scenarios of animal evolution [2-11]. Yet how larvae evolved remains disputed. Here we show that temporal shifts (that is, heterochronies) in trunk formation underpin the diversification of larvae and bilaterian life cycles. We performed chromosome-scale genome sequencing in the annelid Owenia fusiformis with transcriptomic and epigenomic profiling during the life cycles of this and two other annelids. We found that trunk development is deferred to pre-metamorphic stages in the feeding larva of O. fusiformis but starts after gastrulation in the non-feeding larva with gradual metamorphosis of Capitella teleta and the direct developing embryo of Dimorphilus gyrociliatus. Accordingly, the embryos of O. fusiformis develop first into an enlarged anterior domain that forms larval tissues and the adult head [12]. Notably, this also occurs in the so-called 'head larvae' of other bilaterians [13-17], with which the O. fusiformis larva shows extensive transcriptomic similarities. Together, our findings suggest that the temporal decoupling of head and trunk formation, as maximally observed in head larvae, facilitated larval evolution in Bilateria. This diverges from prevailing scenarios that propose either co-option [9,10] or innovation [11] of gene regulatory programmes to explain larva and adult origins.


Genomics , Life Cycle Stages , Polychaeta , Animals , Larva/anatomy & histology , Larva/growth & development , Polychaeta/anatomy & histology , Polychaeta/embryology , Polychaeta/genetics , Polychaeta/growth & development , Gene Expression Profiling , Epigenomics , Head/anatomy & histology , Head/embryology , Head/growth & development
10.
Elife ; 11, 2022 11 24.
Article En | MEDLINE | ID: mdl-36422864

N6-methyladenosine (m6A) RNA modification impacts mRNA fate primarily via reader proteins, which dictate processes in development, stress, and disease. Yet little is known about m6A function in Saccharomyces cerevisiae, which occurs solely during early meiosis. Here, we perform a multifaceted analysis of the m6A reader protein Pho92/Mrb1. Cross-linking immunoprecipitation analysis reveals that Pho92 associates with the 3' end of meiotic mRNAs in both an m6A-dependent and -independent manner. Within cells, Pho92 transitions from the nucleus to the cytoplasm, and associates with translating ribosomes. In the nucleus, Pho92 associates with target loci through its interaction with transcriptional elongator Paf1C. Functionally, we show that Pho92 promotes and links protein synthesis to mRNA decay. As such, the Pho92-mediated m6A-mRNA decay is contingent on active translation and the CCR4-NOT complex. We propose that the m6A reader Pho92 is loaded co-transcriptionally to facilitate protein synthesis and subsequent decay of m6A modified transcripts, and thereby promotes meiosis.


Exercise , Saccharomyces cerevisiae , Saccharomyces cerevisiae/genetics , RNA, Messenger/genetics , RNA Stability
11.
F1000Res ; 11: 240, 2022.
Article En | MEDLINE | ID: mdl-35350547

Background: Automation has become increasingly commonplace in the research laboratory workspace. The introduction of articulated robotic arms allows the researcher more flexibility in the tasks a single piece of automated machinery can perform. We set out to incorporate automation in processing of genomic DNA organic extractions to increase throughput and limit researchers' exposure to organic solvents. Methods: In order to automate the genome sequencing pipeline in our laboratory, we programmed a dual-arm anthropomorphic robot, the Robotic Biology Institute's Maholo LabDroid, to perform organic solvent-based genomic DNA extraction from cell lysates. To the best of our knowledge, this is the first time that automation of phenol-chloroform extraction has been reported. Results: We achieved routine extraction of high molecular weight genomic DNA (>100 kb) from diverse biological samples including algae cultured in sea water, bacteria, whole insects, and human cell lines. The results of pulsed-field electrophoresis size analysis and the N50 sequencing metrics of reads obtained from Nanopore MinION runs verified the presence of intact DNA suitable for direct sequencing. Conclusions: We present the workflow that can be used to program similar robots and discuss the problems and solutions we encountered in developing the workflow. The protocol can be adapted to analogous methods such as RNA extraction, and there is ongoing work to incorporate further post-extraction steps such as library construction. This work shows the potential for automated robotic workflows to free molecular biological researchers from manual interventions in routine experimental work. A time-lapse movie of the entire automated run is included in this report.


Chloroform , Phenol , DNA/genetics , Genomics , Humans , Molecular Weight , Phenols
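The abstract above uses the N50 of Nanopore reads as evidence of intact DNA. For readers unfamiliar with the metric, a minimal Python sketch of the standard definition follows: the N50 is the length L such that reads of length L or longer together contain at least half of all sequenced bases. The function name and the example read lengths are illustrative and do not come from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of read (or contig) lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest read down until half of all bases are covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Five hypothetical reads totalling 150 kb; the 50 kb and 40 kb reads
# together cover 90 kb, which passes the 75 kb halfway mark.
read_lengths = [10_000, 20_000, 30_000, 40_000, 50_000]
print(n50(read_lengths))  # 40000
```

Because long reads dominate the cumulative sum, N50 is far more sensitive to fragmentation than the mean read length, which is why it is a common check on high molecular weight extractions.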
12.
Genome Res ; 32(1): 71-84, 2022 01.
Article En | MEDLINE | ID: mdl-34963663

Astrocytes contribute to motor neuron death in amyotrophic lateral sclerosis (ALS), but whether they adopt deleterious features consistent with inflammatory reactive states remains incompletely resolved. To identify inflammatory reactive features in ALS human induced pluripotent stem cell (hiPSC)-derived astrocytes, we examined transcriptomics, proteomics, and glutamate uptake in VCP-mutant astrocytes. We complemented this by examining other ALS mutations and models using a systematic meta-analysis of all publicly-available ALS astrocyte sequencing data, which included hiPSC-derived astrocytes carrying SOD1, C9orf72, and FUS gene mutations as well as mouse ALS astrocyte models with SOD1G93A mutation, Tardbp deletion, and Tmem259 (also known as membralin) deletion. ALS astrocytes were characterized by up-regulation of genes involved in the extracellular matrix, endoplasmic reticulum stress, and the immune response and down-regulation of synaptic integrity, glutamate uptake, and other neuronal support processes. We identify activation of the TGFB, Wnt, and hypoxia signaling pathways in both hiPSC and mouse ALS astrocytes. ALS changes positively correlate with TNF, IL1A, and complement pathway component C1q-treated inflammatory reactive astrocytes, with significant overlap of differentially expressed genes. By contrasting ALS changes with models of protective reactive astrocytes, including middle cerebral artery occlusion and spinal cord injury, we uncover a cluster of genes changing in opposing directions, which may represent down-regulated homeostatic genes and up-regulated deleterious genes in ALS astrocytes. These observations indicate that ALS astrocytes augment inflammatory processes while concomitantly suppressing neuronal supporting mechanisms, thus resembling inflammatory reactive states and offering potential therapeutic targets.


Amyotrophic Lateral Sclerosis , Induced Pluripotent Stem Cells , Amyotrophic Lateral Sclerosis/genetics , Amyotrophic Lateral Sclerosis/metabolism , Animals , Astrocytes/metabolism , Disease Models, Animal , Humans , Mice , Mice, Transgenic , Motor Neurons/metabolism , Mutation
13.
Int J Mol Sci ; 22(23), 2021 Dec 01.
Article En | MEDLINE | ID: mdl-34884825

RNA-binding proteins (RBPs) act as posttranscriptional regulators controlling the fate of target mRNAs. Unraveling how RNAs are recognized by RBPs and in turn are assembled into neuronal RNA granules is therefore key to understanding the underlying mechanism. While RNA sequence elements have been extensively characterized, the functional impact of RNA secondary structures is only recently being explored. Here, we show that Staufen2 binds complex, long-ranged RNA hairpins in the 3'-untranslated region (UTR) of its targets. These structures are involved in the assembly of Staufen2 into RNA granules. Furthermore, we provide direct evidence that a defined Rgs4 RNA duplex regulates Staufen2-dependent RNA localization to distal dendrites. Importantly, disrupting the RNA hairpin impairs the observed effects. Finally, we show that these secondary structures differently affect protein expression in neurons. In conclusion, our data reveal the importance of RNA secondary structure in regulating RNA granule assembly, localization and eventually translation. It is therefore tempting to speculate that secondary structures represent an important code for cells to control the intracellular fate of their mRNAs.


Cytoplasmic Ribonucleoprotein Granules/chemistry , Neurons/metabolism , RGS Proteins/genetics , RNA, Messenger/chemistry , RNA-Binding Proteins/metabolism , 3' Untranslated Regions , Animals , Cells, Cultured , Cytoplasmic Ribonucleoprotein Granules/metabolism , Female , Neurons/cytology , Nucleic Acid Conformation , RNA Interference , RNA, Messenger/metabolism , RNA, Small Interfering/metabolism , RNA-Binding Proteins/antagonists & inhibitors , RNA-Binding Proteins/genetics , Rats , Rats, Sprague-Dawley
14.
Nucleic Acids Res ; 49(22): 13092-13107, 2021 12 16.
Article En | MEDLINE | ID: mdl-34871434

RNA-binding proteins (RBPs) play diverse roles in regulating co-transcriptional RNA-processing and chromatin functions, but our knowledge of the repertoire of chromatin-associated RBPs (caRBPs) and their interactions with chromatin remains limited. Here, we developed SPACE (Silica Particle Assisted Chromatin Enrichment) to isolate global and regional chromatin components with high specificity and sensitivity, and SPACEmap to identify the chromatin-contact regions in proteins. Applied to mouse embryonic stem cells, SPACE identified 1459 chromatin-associated proteins, ∼48% of which are annotated as RBPs, indicating their dual roles in chromatin and RNA-binding. Additionally, SPACEmap stringently verified chromatin-binding of 403 RBPs and identified their chromatin-contact regions. Notably, SPACEmap showed that about 40% of the caRBPs bind chromatin by intrinsically disordered regions (IDRs). Studying SPACE and total proteome dynamics from mES cells grown in 2iL and serum medium indicates significant correlation (R = 0.62). One of the most dynamic caRBPs is Dazl, which we find co-localized with PRC2 at transcription start sites of genes that are distinct from Dazl mRNA binding. Dazl and other PRC2-colocalised caRBPs are rich in intrinsically disordered regions (IDRs), which could contribute to the formation and regulation of phase-separated PRC condensates. Together, our approach provides an unprecedented insight into IDR-mediated interactions and caRBPs with moonlighting functions in native chromatin.


Chromatin/metabolism , Intrinsically Disordered Proteins/metabolism , Mouse Embryonic Stem Cells/metabolism , RNA-Binding Proteins/metabolism , Animals , Binding Sites/genetics , Cells, Cultured , Chromatin/genetics , Intrinsically Disordered Proteins/genetics , Mass Spectrometry/methods , Mice , Protein Binding , Protein Interaction Maps/genetics , Proteome/genetics , Proteome/metabolism , Proteomics/methods , RNA-Binding Proteins/genetics , Reproducibility of Results
15.
Nat Commun ; 12(1): 7198, 2021 12 10.
Article En | MEDLINE | ID: mdl-34893601

RNA molecules undergo a vast array of chemical post-transcriptional modifications (PTMs) that can affect their structure and interaction properties. In recent years, a growing number of PTMs have been successfully mapped to the transcriptome using experimental approaches relying on high-throughput sequencing. Oxford Nanopore direct-RNA sequencing has been shown to be sensitive to RNA modifications. We developed and validated Nanocompore, a robust analytical framework that identifies modifications from these data. Our strategy compares an RNA sample of interest against a non-modified control sample, not requiring a training set and allowing the use of replicates. We show that Nanocompore can detect different RNA modifications with position accuracy in vitro, and we apply it to profile m6A in vivo in yeast and human RNAs, as well as in targeted non-coding RNAs. We confirm our results with orthogonal methods and provide novel insights on the co-occurrence of multiple modified residues on individual RNA molecules.


Nanopore Sequencing/methods , Nanopores , RNA/metabolism , Sequence Analysis, RNA/methods , Base Sequence , Computational Biology , Gene Expression Profiling , Genetic Techniques , High-Throughput Nucleotide Sequencing , Humans , RNA/isolation & purification , RNA Processing, Post-Transcriptional , Software , Transcriptome
16.
Cell ; 184(18): 4680-4696.e22, 2021 09 02.
Article En | MEDLINE | ID: mdl-34380047

Mutations causing amyotrophic lateral sclerosis (ALS) often affect the condensation properties of RNA-binding proteins (RBPs). However, the role of RBP condensation in the specificity and function of protein-RNA complexes remains unclear. We created a series of TDP-43 C-terminal domain (CTD) variants that exhibited a gradient of low to high condensation propensity, as observed in vitro and by nuclear mobility and foci formation. Notably, a capacity for condensation was required for efficient TDP-43 assembly on subsets of RNA-binding regions, which contain unusually long clusters of motifs of characteristic types and density. These "binding-region condensates" are promoted by homomeric CTD-driven interactions and required for efficient regulation of a subset of bound transcripts, including autoregulation of TDP-43 mRNA. We establish that RBP condensation can occur in a binding-region-specific manner to selectively modulate transcriptome-wide RNA regulation, which has implications for remodeling RNA networks in the context of signaling, disease, and evolution.


DNA-Binding Proteins/metabolism , RNA-Binding Proteins/metabolism , RNA/metabolism , 3' Untranslated Regions/genetics , Base Sequence , Cell Nucleus/metabolism , HEK293 Cells , HeLa Cells , Homeostasis , Humans , Mutation/genetics , Nucleotide Motifs/genetics , Phase Transition , Point Mutation/genetics , Poly A/metabolism , Protein Binding , Protein Multimerization , RNA, Messenger/genetics , RNA, Messenger/metabolism , Sequence Deletion
17.
Wellcome Open Res ; 6: 141, 2021.
Article En | MEDLINE | ID: mdl-34286104

Background: The first step of virtually all next generation sequencing analysis involves the splitting of the raw sequencing data into separate files using sample-specific barcodes, a process known as "demultiplexing". However, we found that existing software for this purpose was either too inflexible or too computationally intensive for fast, streamlined processing of raw, single-end FASTQ files containing combinatorial barcodes. Results: Here, we introduce a fast and uniquely flexible demultiplexer, named Ultraplex, which splits a raw FASTQ file containing barcodes either at a single end or at both 5' and 3' ends of reads, trims the sequencing adaptors and low-quality bases, and moves unique molecular identifiers (UMIs) into the read header, allowing subsequent removal of PCR duplicates. Ultraplex is able to perform such single or combinatorial demultiplexing on both single- and paired-end sequencing data, and can process an entire Illumina HiSeq lane, consisting of nearly 500 million reads, in less than 20 minutes. Conclusions: Ultraplex greatly reduces computational burden and pipeline complexity for the demultiplexing of complex sequencing libraries, such as those produced by various CLIP and ribosome profiling protocols, and is also very user friendly, enabling streamlined, robust data processing. Ultraplex is available on PyPI and Conda and via GitHub.
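To make the demultiplexing step described above concrete, here is a minimal Python sketch of 5' barcode matching with UMI extraction. It is not Ultraplex's implementation: the pattern syntax (N for UMI positions, A/C/G/T for fixed sample bases), the `_UMI:` header suffix and all function names are assumptions made for illustration, and adaptor trimming, quality trimming, 3' barcodes and mismatch tolerance are left out.

```python
def match_barcode(sequence, pattern):
    """Return the UMI if `sequence` starts with `pattern`, else None.

    Assumed pattern syntax: N marks a UMI position, A/C/G/T a fixed base.
    """
    if len(sequence) < len(pattern):
        return None
    umi = []
    for base, expected in zip(sequence, pattern):
        if expected == "N":
            umi.append(base)
        elif base != expected:
            return None
    return "".join(umi)


def demultiplex(reads, barcodes):
    """Assign (header, sequence, quality) reads to samples by 5' barcode."""
    assigned = {sample: [] for sample in barcodes}
    assigned["unassigned"] = []
    for header, sequence, quality in reads:
        for sample, pattern in barcodes.items():
            umi = match_barcode(sequence, pattern)
            if umi is not None:
                cut = len(pattern)
                # Strip the barcode and keep the UMI in the header
                # (hypothetical "_UMI:" tag) for later PCR deduplication.
                assigned[sample].append(
                    (f"{header}_UMI:{umi}", sequence[cut:], quality[cut:])
                )
                break
        else:
            assigned["unassigned"].append((header, sequence, quality))
    return assigned


barcodes = {"sampleA": "NNACGN", "sampleB": "NNTTGN"}
reads = [
    ("@read1", "GTACGCTTTAGG", "IIIIIIIIIIII"),
    ("@read2", "CATTGAGGGCCC", "IIIIIIIIIIII"),
    ("@read3", "AAAAAAAAAAAA", "IIIIIIIIIIII"),
]
result = demultiplex(reads, barcodes)
print(result["sampleA"])     # [('@read1_UMI:GTC', 'TTTAGG', 'IIIIII')]
print(result["sampleB"])     # [('@read2_UMI:CAA', 'GGGCCC', 'IIIIII')]
print(result["unassigned"])  # [('@read3', 'AAAAAAAAAAAA', 'IIIIIIIIIIII')]
```

Moving the UMI into the header rather than discarding it is what allows reads that map to the same position with the same UMI to be collapsed as PCR duplicates downstream.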

19.
Genome Biol ; 22(1): 136, 2021 05 05.
Article En | MEDLINE | ID: mdl-33952325

BACKGROUND: Eukaryotic genomes undergo pervasive transcription, leading to the production of many types of stable and unstable RNAs. Transcription is not restricted to regions with annotated gene features but includes almost any genomic context. Currently, the source and function of most RNAs originating from intergenic regions in the human genome remain unclear. RESULTS: We hypothesize that many intergenic RNAs can be ascribed to the presence of as-yet unannotated genes or the "fuzzy" transcription of known genes that extends beyond the annotated boundaries. To elucidate the contributions of these two sources, we assemble a dataset of more than 2.5 billion publicly available RNA-seq reads across 5 human cell lines and multiple cellular compartments to annotate transcriptional units in the human genome. About 80% of transcripts from unannotated intergenic regions can be attributed to the fuzzy transcription of existing genes; the remaining transcripts originate mainly from putative long non-coding RNA loci that are rarely spliced. We validate the transcriptional activity of these intergenic RNAs using independent measurements, including transcriptional start sites, chromatin signatures, and genomic occupancies of RNA polymerase II in various phosphorylation states. We also analyze the nuclear localization and sensitivities of intergenic transcripts to nucleases to illustrate that they tend to be rapidly degraded either on-chromatin by XRN2 or off-chromatin by the exosome. CONCLUSIONS: We provide a curated atlas of intergenic RNAs that distinguishes alternative processing of well-annotated genes from independent transcriptional units based on the combined analysis of chromatin signatures, nuclear RNA localization, and degradation pathways.


DNA, Intergenic/genetics , Genes , RNA, Messenger/genetics , Cell Line , Chromatin/genetics , Endonucleases/metabolism , Humans , RNA, Messenger/metabolism , Transcription, Genetic
...