ABSTRACT
Analysis of RNA-seq data often detects numerous 'non-co-linear' (NCL) transcripts, which comprise sequence segments that are topologically inconsistent with their corresponding DNA sequences in the reference genome. However, detection of NCL transcripts involves two major challenges: removal of false positives arising from alignment artifacts and discrimination between different types of NCL transcripts (trans-spliced, circular or fusion transcripts). Here, we developed a new NCL-transcript-detecting method ('NCLscan'), which utilized a stepwise alignment strategy to almost completely eliminate false calls (>98% precision) without sacrificing true positives, enabling NCLscan to outperform 18 other publicly-available tools (including fusion- and circular-RNA-detecting tools) in terms of sensitivity and precision, regardless of the generation strategy of the simulated dataset, type of intragenic or intergenic NCL event, read depth of coverage, read length or expression level of NCL transcript. Given this high accuracy, NCLscan was applied to distinguish between trans-spliced, circular and fusion transcripts on the basis of poly(A)- and nonpoly(A)-selected RNA-seq data. We showed that circular RNAs were expressed more ubiquitously, more abundantly and less cell type-specifically than trans-spliced and fusion transcripts. Our study thus describes a robust pipeline for the discovery of NCL transcripts, and sheds light on the fundamental biology of these non-canonical RNA events in the human transcriptome.
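The two benchmark metrics named in this abstract, precision and sensitivity, can be illustrated with a minimal sketch. The function name and the counts in the usage note are hypothetical and are not taken from the NCLscan evaluation; the formulas are the standard definitions.

```python
def precision_and_sensitivity(true_pos: int, false_pos: int, false_neg: int) -> tuple[float, float]:
    """Return (precision, sensitivity) for a set of NCL junction calls.

    precision   = TP / (TP + FP): fraction of called junctions that are real
    sensitivity = TP / (TP + FN): fraction of real junctions that were called
    """
    called = true_pos + false_pos
    real = true_pos + false_neg
    if called == 0 or real == 0:
        raise ValueError("need at least one called junction and one real junction")
    return true_pos / called, true_pos / real
```

For example, with hypothetical counts of 98 true calls, 2 false calls and 10 missed junctions, `precision_and_sensitivity(98, 2, 10)` gives a precision of 0.98 and a sensitivity of 98/108.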
Subject(s)
RNA Splicing , RNA, Messenger/genetics , RNA/genetics , Limit of Detection , RNA, Circular , Reproducibility of Results
ABSTRACT
Global transcriptome investigations often result in the detection of an enormous number of transcripts composed of non-co-linear sequence fragments. Such 'aberrant' transcript products may arise from post-transcriptional events or genetic rearrangements, or may otherwise be false positives (sequencing/alignment errors or in vitro artifacts). Moreover, post-transcriptionally non-co-linear ('PtNcl') transcripts can arise from trans-splicing or back-splicing in cis (to generate so-called 'circular RNA'). Here, we collected previously-predicted human non-co-linear RNA candidates, and designed a validation procedure integrating in silico filters with multiple experimental validation steps to examine their authenticity. We showed that >50% of the tested candidates were in vitro artifacts, even though some had been previously validated by RT-PCR. After excluding the possibility of genetic rearrangements, we distinguished between trans-spliced and circular RNAs, and confirmed that these two splicing forms can share the same non-co-linear junction. Importantly, the experimentally-confirmed PtNcl RNA events and their corresponding PtNcl splicing types (i.e. trans-splicing, circular RNA, or both sharing the same junction) were all expressed in rhesus macaque, and some were even expressed in mouse. Our study thus describes an essential procedure for confirming PtNcl transcripts, and provides further insight into the evolutionary role of PtNcl RNA events, opening up this important, but understudied, class of post-transcriptional events for comprehensive characterization.
Subject(s)
Artifacts , RNA Splicing , Reverse Transcriptase Polymerase Chain Reaction , Trans-Splicing , Animals , Cells, Cultured , Evolution, Molecular , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Humans , Macaca mulatta , Mice , RNA/chemistry , RNA/isolation & purification , RNA Splice Sites , Sequence Analysis, RNA
ABSTRACT
BACKGROUND: Crop plants such as rice, maize and sorghum play economically-important roles as main sources of food, fuel, and animal feed. However, current genome annotations of crop plants still suffer from false-positive predictions; a more comprehensive registry of alternative splicing (AS) events is also in demand. Comparative genomics of crop plants is largely unexplored. RESULTS: We performed a large-scale comparative analysis (ExonFinder) of the expressed sequence tag (EST) library from nine grass plants against three crop genomes (rice, maize, and sorghum) and identified 2,879 previously-unannotated exons (i.e., novel exons) in the three crops. We validated 81% of the tested exons by RT-PCR-sequencing, supporting the effectiveness of our in silico strategy. Evolutionary analysis reveals that the novel exons, compared with their flanking annotated ones, are generally under weaker selection pressure at the protein level, but under stronger pressure at the RNA level, suggesting that most of the novel exons also represent novel alternatively spliced variants (ASVs). However, we also observed consistent evolutionary rates between certain novel exons and their flanking exons, which provided further evidence of their co-occurrence in the transcripts, suggesting that previously-annotated isoforms might be subject to erroneous predictions. Our validation showed that 54% of the tested genes expressed the newly-identified isoforms that contained the novel exons, rather than the previously-annotated isoforms that excluded them. Consistent results were observed across cultivated (Oryza sativa and O. glaberrima) and wild (O. rufipogon and O. nivara) rice species, supporting the necessity of our curation of the crop genome annotations. Our comparative analyses also inferred the common ancestral transcriptome of grass plants and gain- and loss-of-ASV events.
CONCLUSIONS: We have reannotated the rice, maize, and sorghum genomes, and showed that evolutionary rates might serve as an indicator for determining whether the identified exons were alternatively spliced. This study not only presents an effective in silico strategy for the improvement of plant annotations, but also provides further insights into the role of AS events in the evolution and domestication of crop plants. ExonFinder and the novel exons/ASVs identified are publicly accessible at http://exonfinder.sourceforge.net/ .
Subject(s)
Crops, Agricultural/genetics , Expressed Sequence Tags/chemistry , Genome, Plant , Plant Proteins/genetics , Poaceae/genetics , Alternative Splicing , Exons , Oryza/genetics , Protein Isoforms/genetics , Real-Time Polymerase Chain Reaction , Sorghum/genetics , Zea mays/genetics
ABSTRACT
Vertebrates differ greatly in responses to pro-inflammatory agonists such as bacterial lipopolysaccharide (LPS), complicating use of animal models to study human sepsis or inflammatory disorders. We compared transcriptomes of resting and LPS-exposed blood from six LPS-sensitive species (rabbit, pig, sheep, cow, chimpanzee, human) and four LPS-resilient species (mice, rats, baboon, rhesus), as well as plasma proteomes and lipidomes. Unexpectedly, at baseline, sensitive species already had enhanced expression of LPS-responsive genes relative to resilient species. After LPS stimulation, maximally different genes in resilient species included genes that detoxify LPS, diminish bacterial growth, discriminate sepsis from SIRS, and play roles in autophagy and apoptosis. The findings reveal the molecular landscape of species differences in inflammation, and may inform better selection of species for pre-clinical models.
ABSTRACT
One in four myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) patients are estimated to be severely affected by the disease, and these house-bound or bedbound patients are currently understudied. Here, we report a comprehensive examination of the symptoms and clinical laboratory tests of a cohort of severely ill patients and healthy controls. The greatly reduced quality of life of the patients was negatively correlated with clinical depression. The most troublesome symptoms included fatigue (85%), pain (65%), cognitive impairment (50%), orthostatic intolerance (45%), sleep disturbance (35%), post-exertional malaise (30%), and neurosensory disturbance (30%). Sleep profiles and cognitive tests revealed distinctive impairments. Lower morning cortisol levels and alterations in its diurnal rhythm were observed in the patients, and antibody and antigen measurements showed no evidence of acute infection by common viral or bacterial pathogens. These results highlight the urgent need to develop molecular diagnostic tests for ME/CFS. In addition, there was a striking similarity in symptoms between long COVID and ME/CFS, suggesting that studies on the mechanism and treatment of ME/CFS may help prevent and treat long COVID and vice versa.
ABSTRACT
BACKGROUND: Despite several outbreaks of SARS-CoV-2 amongst healthcare personnel (HCP) exposed to COVID-19 patients globally, risk factors for transmission remain poorly understood. METHODS: We conducted an outbreak investigation and case-control study to evaluate SARS-CoV-2 transmission risk in an outbreak among HCP at an academic medical center in California that was confirmed by whole genome sequencing. RESULTS: A total of 7/9 cases and 93/182 controls completed a voluntary survey about risk factors. Compared to controls, cases reported significantly more patient contact time. Cases were also significantly more likely to have performed airway procedures on the index patient, particularly placing the patient on high flow nasal cannula, continuous positive airway pressure (CPAP), or bilevel positive airway pressure (BiPAP) (OR = 11.6; 95% CI = 1.7-132.1). DISCUSSION: This study highlights the risk of nosocomial infection of SARS-CoV-2 from patients who become infectious midway into their hospitalization. Our findings also reinforce the importance of patient contact time and aerosol-generating procedures as key risk factors for HCP infection with SARS-CoV-2. CONCLUSIONS: Re-testing patients for SARS-CoV-2 after admission in suspicious cases and using N95 masks for all aerosol-generating procedures regardless of initial patient SARS-CoV-2 test results can help reduce the risk of SARS-CoV-2 transmission to HCP.
Subject(s)
COVID-19 , SARS-CoV-2 , Case-Control Studies , Delivery of Health Care , Disease Outbreaks , Health Personnel , Humans , Risk Factors , Tertiary Care Centers
ABSTRACT
Circulating tumor DNA (ctDNA) sequencing is being rapidly adopted in precision oncology, but the accuracy, sensitivity and reproducibility of ctDNA assays are poorly understood. Here we report the findings of a multi-site, cross-platform evaluation of the analytical performance of five industry-leading ctDNA assays. We evaluated each stage of the ctDNA sequencing workflow with simulations, synthetic DNA spike-in experiments and proficiency testing on standardized, cell-line-derived reference samples. Above 0.5% variant allele frequency, ctDNA mutations were detected with high sensitivity, precision and reproducibility by all five assays, whereas, below this limit, detection became unreliable and varied widely between assays, especially when input material was limited. Missed mutations (false negatives) were more common than erroneous candidates (false positives), indicating that the reliable sampling of rare ctDNA fragments is the key challenge for ctDNA assays. This comprehensive evaluation of the analytical performance of ctDNA assays serves to inform best practice guidelines and provides a resource for precision oncology.
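The point about sampling rare ctDNA fragments can be illustrated with a simple binomial model. This is a sketch under stated assumptions, not the study's simulation: it assumes each sequenced fragment independently carries the mutation with probability equal to the variant allele frequency, and the function name, fragment counts and the `min_alt` calling threshold are all hypothetical.

```python
from math import comb


def detection_probability(n_fragments: int, vaf: float, min_alt: int = 2) -> float:
    """Probability of sampling at least `min_alt` mutant fragments.

    Assumes the number of mutant fragments is Binomial(n_fragments, vaf);
    sequencing errors and assay-specific filters are ignored.
    """
    # Probability of seeing fewer than min_alt mutant fragments.
    p_below = sum(
        comb(n_fragments, k) * vaf**k * (1 - vaf) ** (n_fragments - k)
        for k in range(min_alt)
    )
    return 1.0 - p_below


if __name__ == "__main__":
    # Hypothetical inputs: compare a limited and a larger number of unique fragments.
    for n in (500, 5000):
        for vaf in (0.001, 0.005):
            print(n, vaf, round(detection_probability(n, vaf), 3))
```

Under this model, the chance of sampling enough mutant fragments falls as either the allele frequency or the number of unique input fragments decreases, which is consistent with the abstract's observation that detection is least reliable at low frequency with limited input.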
Subject(s)
Circulating Tumor DNA/genetics , Medical Oncology , Neoplasms/genetics , Precision Medicine , Sequence Analysis, DNA/standards , High-Throughput Nucleotide Sequencing/methods , Humans , Limit of Detection , Practice Guidelines as Topic , Reproducibility of Results
ABSTRACT
Adenosine-to-inosine (A-to-I) editing is widespread across the kingdom Metazoa. However, owing to the lack of comprehensive analysis in nonmodel animals, the evolutionary history of A-to-I editing remains largely unexplored. Here, we detect high-confidence editing sites using clustering and conservation strategies based on RNA sequencing data alone, without using single-nucleotide polymorphism information or genome sequencing data from the same sample. We thereby unveil the first evolutionary landscape of A-to-I editing maps across 20 metazoan species (from worm to human), providing unprecedented evidence on how the editing mechanism gradually expands its territory and increases its influence along the history of evolution. Our results revealed that highly clustered and conserved editing sites tended to have a higher editing level and a higher magnitude of the ADAR motif. The ratio of the frequency of nonsynonymous editing to that of synonymous editing increased remarkably with increasing conservation level of A-to-I editing. These results thus suggest a potential functional benefit of highly clustered and conserved editing sites. In addition, spatiotemporal dynamics analyses reveal a conserved enrichment of editing and ADAR expression in the central nervous system throughout more than 300 Myr of divergent evolution in complex animals and the comparability of editing patterns between invertebrates and between vertebrates during development. This study provides evolutionary and dynamic aspects of the A-to-I editome across metazoan species, opening up this important but understudied class of nongenomically encoded events for comprehensive characterization.
Subject(s)
Adenosine/genetics , Inosine/genetics , RNA Editing , RNA/genetics , Animals , Cluster Analysis , Evolution, Molecular , Humans , Sequence Analysis, RNA
ABSTRACT
Genome-wide analysis has observed an excess of coincident single nucleotide polymorphisms (coSNPs) at human-chimpanzee orthologous positions, and suggested that this is due to cryptic variation in the mutation rate. While this phenomenon primarily corresponds with non-coding coSNPs, the situation in coding sequences remains unclear. Here we calculate the observed-to-expected ratio of coSNPs (coSNPO/E) to estimate the prevalence of human-chimpanzee coSNPs, and show that the excess of coSNPs is also present in coding regions. Intriguingly, coSNPO/E is much higher at zero-fold than at nonzero-fold degenerate sites; such a difference is due to an elevation of coSNPO/E at zero-fold degenerate sites, rather than a reduction at nonzero-fold degenerate ones. These trends are independent of chimpanzee subpopulation, population size, or sequencing techniques, and hold in broad generality across primates. We find that this discrepancy cannot be fully explained by sequence contexts, shared ancestral polymorphisms, SNP density, or recombination rate, and that coSNPO/E in coding sequences is significantly influenced by purifying selection. We also show that selection and mutation rate affect coSNPO/E independently, and that coSNPs tend to be less damaging and more correlated with human diseases than non-coSNPs. These results suggest that coSNPs may represent a "signature" during primate protein evolution.
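An observed-to-expected ratio of this kind can be sketched as follows. The abstract does not state how the expected count is derived, so this example assumes the simplest null model, in which human and chimpanzee SNPs fall independently and uniformly across the compared orthologous sites; the function name and the counts in the usage note are hypothetical.

```python
def cosnp_obs_exp(n_sites: int, n_human_snps: int, n_chimp_snps: int, n_cosnps: int) -> float:
    """Observed-to-expected ratio of coincident SNPs (coSNPs).

    Assumed null model: SNPs in the two species occur independently, so the
    expected number of coincident positions is
        n_sites * (n_human_snps / n_sites) * (n_chimp_snps / n_sites).
    """
    if n_sites <= 0 or n_human_snps <= 0 or n_chimp_snps <= 0:
        raise ValueError("site and SNP counts must be positive")
    expected = n_human_snps * n_chimp_snps / n_sites
    return n_cosnps / expected
```

With hypothetical counts of 1,000,000 orthologous sites, 10,000 human SNPs, 8,000 chimpanzee SNPs and 160 coincident positions, the expected count is 80 and the ratio is 2.0, meaning twice as many coSNPs as independence would predict.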