ABSTRACT
Technological advances enabling massively parallel measurement of biological features - such as microarrays, high-throughput sequencing and mass spectrometry - have ushered in the omics era, now in its third decade. The resulting complex landscape of analytical methods has naturally fostered the growth of an omics benchmarking industry. Benchmarking refers to the process of objectively comparing and evaluating the performance of different computational or analytical techniques when processing and analysing large-scale biological data sets, such as transcriptomics, proteomics and metabolomics. With thousands of omics benchmarking studies published over the past 25 years, the field has matured to the point where the foundations of benchmarking have been established and well described. However, generating meaningful benchmarking data and properly evaluating performance in this complex domain remains challenging. In this Review, we highlight some common oversights and pitfalls in omics benchmarking. We also establish a methodology to bring the issues that can be addressed into focus and to be transparent about those that cannot: this takes the form of a spreadsheet template of guidelines for comprehensive reporting, intended to accompany publications. In addition, a survey of recent developments in benchmarking is provided as well as specific guidance for commonly encountered difficulties.
Subject(s)
Benchmarking, Proteomics, Proteomics/methods, Metabolomics/methods, Gene Expression Profiling, Mass Spectrometry
ABSTRACT
Simulation of RNA-seq reads is critical in the assessment, comparison, benchmarking and development of bioinformatics tools. Yet the field of RNA-seq simulators has progressed little in the last decade. To address this need we have developed BEERS2, which combines a flexible and highly configurable design with detailed simulation of the entire library preparation and sequencing pipeline. BEERS2 takes input transcripts (typically full-length messenger RNA transcripts with polyA tails) from either customizable input or from CAMPAREE simulated RNA samples. It produces realistic reads of these transcripts in FASTQ, SAM or BAM format, with the SAM and BAM outputs containing the true alignment to the reference genome. It also produces true transcript-level quantification values. The simulation is designed to include the effects of polyA selection and RiboZero for ribosomal depletion, hexamer priming sequence biases, GC-content biases in polymerase chain reaction (PCR) amplification, barcode read errors and errors during PCR amplification. These characteristics combine to make BEERS2 the most complete simulation of RNA-seq to date. Finally, we demonstrate the use of BEERS2 by measuring the effect of several settings on the popular Salmon pseudoalignment algorithm.
Subject(s)
Genome, RNA, RNA-Seq, RNA Sequence Analysis, Computer Simulation, RNA/genetics, High-Throughput Nucleotide Sequencing
ABSTRACT
BACKGROUND: Full-length isoform quantification from RNA-Seq is a key goal in transcriptomics analyses and has been an area of active development since RNA-Seq was introduced. The fundamental difficulty stems from the fact that RNA transcripts are long, while RNA-Seq reads are short. RESULTS: Here we use simulated benchmarking data that reflect many properties of real data, including polymorphisms, intron signal and non-uniform coverage, allowing for systematic comparative analyses of isoform quantification accuracy and its impact on differential expression (DE) analysis. Genome, transcriptome and pseudoalignment-based methods are included, and a simple approach is included as a baseline control. CONCLUSIONS: Salmon, kallisto, RSEM, and Cufflinks exhibit the highest accuracy on idealized data, while on more realistic data they do not perform dramatically better than the simple approach. We determine the structural parameters with the greatest impact on quantification accuracy to be length and sequence compression complexity, rather than the number of isoforms. The effect of incomplete annotation on performance is also investigated. Overall, the tested methods show sufficient divergence from the truth to suggest that full-length isoform quantification and isoform-level DE should still be employed selectively.
Subject(s)
Gene Expression Profiling, Transcriptome, Protein Isoforms/genetics, RNA-Seq, RNA Sequence Analysis
ABSTRACT
BACKGROUND: The accurate interpretation of RNA-Seq data presents a moving target as scientists continue to introduce new experimental techniques and analysis algorithms. Simulated datasets are an invaluable tool to accurately assess the performance of RNA-Seq analysis methods. However, existing RNA-Seq simulators focus on modeling the technical biases and artifacts of sequencing, rather than on simulating the original RNA samples. A first step in simulating RNA-Seq is to simulate RNA. RESULTS: To fill this need, we developed the Configurable And Modular Program Allowing RNA Expression Emulation (CAMPAREE), a simulator using empirical data to simulate diploid RNA samples at the level of individual molecules. We demonstrated CAMPAREE's use for generating idealized coverage plots from real data, and for adding the ability to generate allele-specific data to existing RNA-Seq simulators that do not natively support this feature. CONCLUSIONS: Separating input sample modeling from library preparation/sequencing offers added flexibility for both users and developers to mix-and-match different sample and sequencing simulators to suit their specific needs. Furthermore, the ability to maintain sample and sequencing simulators independently provides greater agility to incorporate new biological findings about transcriptomics and new developments in sequencing technologies. Additionally, by simulating at the level of individual molecules, CAMPAREE has the potential to model molecules transcribed from the same genes as a heterogeneous population of transcripts with different states of degradation and processing (splicing, editing, etc.). CAMPAREE was developed in Python, is open source, and freely available at https://github.com/itmat/CAMPAREE .
Subject(s)
High-Throughput Nucleotide Sequencing, Software, Algorithms, Gene Expression Profiling, RNA/genetics, RNA Sequence Analysis
ABSTRACT
Immune checkpoint inhibitors (ICIs) that target programmed cell death 1 (PD-1) have revolutionized cancer treatment by enabling the restoration of suppressed T-cell cytotoxic responses. However, resistance to single-agent ICIs limits their clinical utility. Combinatorial strategies enhance their antitumor effects, but may also increase the risk of immune-related adverse events (irAEs). Prostaglandin (PG) E2, formed by the sequential action of the cyclooxygenase (COX) and microsomal PGE synthase (mPGES-1) enzymes and acting via its E prostanoid (EP) receptors, EPr2 and EPr4, promotes lymphocyte exhaustion, revealing an additional target for ICIs. Thus, COX inhibitors and EPr4 antagonists are currently being combined with ICIs in clinical trials, with the aim of enhancing antitumor efficacy. However, given the cardiovascular (CV) toxicity of COX inhibitors, such combinations may increase the risk of adverse events (AEs), particularly CV AEs. Here, we compared the impact of distinct approaches to disruption of the PGE2 synthesis/response pathway - global or myeloid cell-specific depletion of mPges-1 or global depletion of Epr4 - on the accelerated atherogenesis in Pd-1-deficient hyperlipidemic (Ldlr-/-) mice. All strategies restrained atherogenesis. While depletion of mPGES-1 suppressed PGE2 biosynthesis, as reflected by its major urinary metabolite, PGE2 biosynthesis was increased in mice lacking EPr4, consistent with enhanced expression of aortic Cox-1 and mPges-1. Deletions of mPges-1 and Epr4 differed in their effects on immune cell populations in atherosclerotic plaques; the former reduced neutrophil infiltration, while the latter restrained macrophages and increased the infiltration of T-cells. Consistent with these findings, chemotaxis by bone marrow-derived macrophages from Epr4-/- mice was impaired. Epr4 depletion also resulted in extramedullary lymphoid hematopoiesis and inhibition of lipoprotein lipase (LPL) activity, with coincident splenomegaly, leukocytosis and dyslipidemia.
Targeting either mPGES-1 or EPr4 may restrain lymphocyte exhaustion while mitigating CV irAEs consequent to PD-1 blockade.
ABSTRACT
Aging is associated with a number of physiologic changes including perturbed circadian rhythms; however, the mechanisms by which rhythms are altered remain unknown. To test the idea that circulating factors mediate age-dependent changes in peripheral rhythms, we compared the ability of human serum from young and old individuals to synchronize circadian rhythms in culture. We collected blood from apparently healthy young (age 25-30) and old (age 70-76) individuals and used the serum to synchronize cultured fibroblasts. We found that young and old sera are equally competent at driving robust ~24 h oscillations of a luciferase reporter driven by a clock gene promoter. However, cyclic gene expression is affected, such that young and old sera drive cycling of different genes. While genes involved in the cell cycle and transcription/translation remain rhythmic in both conditions, genes identified by STRING and IPA analyses as associated with oxidative phosphorylation and Alzheimer's Disease lose rhythmicity in the aged condition. Also, the expression of cycling genes associated with cholesterol biosynthesis increases in the cells entrained with old serum. We did not observe a global difference in the distribution of phase between groups, but found that peak expression of several clock-controlled genes (PER3, NR1D1, NR1D2, CRY1, CRY2, and TEF) lags in the cells synchronized with old serum. Taken together, these findings demonstrate that age-dependent blood-borne factors affect peripheral circadian rhythms in cells and have the potential to impact health and disease by maintaining or disrupting rhythms, respectively.
ABSTRACT
Simulation of RNA-seq reads is critical in the assessment, comparison, benchmarking, and development of bioinformatics tools. Yet the field of RNA-seq simulators has progressed little in the last decade. To address this need we have developed BEERS2, which combines a flexible and highly configurable design with detailed simulation of the entire library preparation and sequencing pipeline. BEERS2 takes input transcripts (typically full-length mRNA transcripts with polyA tails) from either customizable input or from CAMPAREE simulated RNA samples. It produces realistic reads of these transcripts in FASTQ, SAM, or BAM format, with the SAM and BAM outputs containing the true alignment to the reference genome. It also produces true transcript-level quantification values. The simulation is designed to include the effects of polyA selection and RiboZero for ribosomal depletion, hexamer priming sequence biases, GC-content biases in PCR amplification, barcode read errors, and errors during PCR amplification. These characteristics combine to make BEERS2 the most complete simulation of RNA-seq to date. Finally, we demonstrate the use of BEERS2 by measuring the effect of several settings on the popular Salmon pseudoalignment algorithm.
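As an illustration of how simulated truth values of this kind can be used, the following is a minimal sketch of scoring a quantifier's estimates against true transcript-level counts. It is not taken from BEERS2 or Salmon: the function names, the pseudocount and the toy numbers are all hypothetical, and it assumes the true and estimated values have already been loaded into dictionaries keyed by transcript ID.

```python
from statistics import median


def average_ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def median_relative_error(truth, estimate, pseudocount=1.0):
    """Median of |estimate - truth| / (truth + pseudocount).

    The pseudocount (an assumption of this sketch) avoids division by
    zero for unexpressed transcripts.
    """
    return median(abs(e - t) / (t + pseudocount) for t, e in zip(truth, estimate))


# Hypothetical toy data: true vs. estimated counts per transcript.
true_counts = {"tx1": 100.0, "tx2": 50.0, "tx3": 0.0, "tx4": 10.0}
est_counts = {"tx1": 90.0, "tx2": 60.0, "tx3": 5.0, "tx4": 10.0}

ids = sorted(true_counts)
truth = [true_counts[t] for t in ids]
estimate = [est_counts.get(t, 0.0) for t in ids]

rho = spearman(truth, estimate)
mre = median_relative_error(truth, estimate)
print(f"Spearman rho: {rho:.3f}; median relative error: {mre:.3f}")
```

Rank correlation and a relative-error summary answer different questions (ordering versus magnitude), so a benchmark would typically report both, repeated across simulator settings.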
ABSTRACT
Lipids may influence cellular penetrance by pathogens and the immune response that they evoke. Here we find that a broad-based lipidomic storm, driven predominantly by secretory (s) phospholipase A2 (sPLA2)-dependent eicosanoid production, occurs in patients with sepsis of viral and bacterial origin and relates to disease severity in COVID-19. Elevations in the cyclooxygenase (COX) products of arachidonic acid (AA), PGD2 and PGI2, and the AA lipoxygenase (LOX) product, 12-HETE, and a reduction in the high abundance lipids, ChoE 18:3, LPC-O-16:0 and PC-O-30:0, exhibit relative specificity for COVID-19 amongst such patients, correlate with the inflammatory response and link to disease severity. Linoleic acid (LA) binds directly to SARS-CoV-2 and both LA and its di-HOME products reflect disease severity in COVID-19. AA and LA metabolites and LPC-O-16:0 linked variably to the immune response. These studies yield prognostic biomarkers and therapeutic targets for patients with sepsis, including COVID-19. A purpose-built interactive network analysis tool was developed, allowing the community to interrogate connections across these multiomic data and generate novel hypotheses.
ABSTRACT
BACKGROUND: Lipids may influence cellular penetrance by viral pathogens and the immune response that they evoke. We deeply phenotyped the lipidomic response to SARS-CoV-2 and compared that with infection with other pathogens in patients admitted with acute respiratory distress syndrome (ARDS) to an intensive care unit (ICU). METHODS: Mass spectrometry was used to characterise lipids and relate them to proteins, peripheral cell immunotypes and disease severity. RESULTS: Circulating phospholipases (sPLA2, cPLA2 (PLA2G4A) and PLA2G2D) were elevated on admission in all ICU groups. Cyclooxygenase, lipoxygenase and epoxygenase products of arachidonic acid (AA) were elevated in all ICU groups compared with controls. sPLA2 predicted severity in COVID-19 and correlated with TxA2, LTE4 and the isoprostane, iPF2α-III, while PLA2G2D correlated with LTE4. The elevation in PGD2, like PGI2 and 12-HETE, exhibited relative specificity for COVID-19 and correlated with sPLA2 and the interleukin-13 receptor to drive lymphopenia, a marker of disease severity. Pro-inflammatory eicosanoids remained correlated with severity in COVID-19 28 days after admission. Amongst non-COVID ICU patients, elevations in 5- and 15-HETE and 9- and 13-HODE reflected viral rather than bacterial disease. Linoleic acid (LA) binds directly to SARS-CoV-2 and both LA and its di-HOME products reflected disease severity in COVID-19. In healthy marines, these lipids rose with seroconversion. Eicosanoids linked variably to the peripheral cellular immune response. PGE2, TxA2 and LTE4 correlated with T cell activation, as did PGD2 with non-B non-T cell activation. In COVID-19, LPS-stimulated peripheral blood mononuclear cell PGF2α correlated with memory T cells, dendritic and NK cells while LA and DiHOMEs correlated with exhausted T cells. Three high abundance lipids - ChoE 18:3, LPC-O-16:0 and PC-O-30:0 - were altered specifically in COVID.
LPC-O-16:0 was strongly correlated with T helper follicular cell activation and all three negatively correlated with multi-omic inflammatory pathways and disease severity. CONCLUSIONS: A broad-based lipidomic storm is a predictor of poor prognosis in ARDS. Alterations in sPLA2, PGD2 and 12-HETE and the high abundance lipids, ChoE 18:3, LPC-O-16:0 and PC-O-30:0, exhibit relative specificity for COVID-19 amongst such patients and correlate with the inflammatory response to link to disease severity.
Subject(s)
COVID-19, Secretory Phospholipases A2, Sepsis, Humans, SARS-CoV-2, 12-Hydroxy-5,8,10,14-eicosatetraenoic Acid, Lipidomics, Mononuclear Leukocytes, Leukotriene E4, Prostaglandin D2, Cyclooxygenase 2, Eicosanoids
ABSTRACT
Circadian omics analyses present investigators with large amounts of data to consider and many choices for methods of analysis. Visualization is crucial as rhythmicity can take many forms and p-values offer an incomplete picture. Yet statically viewing the entirety of high-throughput datasets is impractical, and there is often limited ability to assess the impact of choices, such as significance threshold cutoffs. Nitecap provides an intuitive and unified web-based solution to these problems. Through highly responsive visualizations, Nitecap enables investigators to see dataset-wide behavior. It supports deep analyses, including comparisons of two conditions. Moreover, it focuses upon ease-of-use and enables collaboration through dataset sharing. As an application, we investigated cross talk between peripheral clocks in adipose and liver tissues and determined that adipocyte clock disruption does not substantially modulate the transcriptional rhythmicity of liver but does advance the phase of core clock gene Bmal1 (Arntl) expression in the liver. Nitecap is available at nitecap.org and is free-to-use.
Subject(s)
ARNTL Transcription Factors, Circadian Clocks, ARNTL Transcription Factors/genetics, ARNTL Transcription Factors/metabolism, CLOCK Proteins/genetics, Circadian Clocks/genetics, Circadian Rhythm/genetics, Liver/metabolism, Software
ABSTRACT
During the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic, providing safe in-person schooling has been a dynamic process balancing evolving community disease burden, scientific information, and local regulatory requirements with the mandate for education. Considerations include the health risks of SARS-CoV-2 infection and its post-acute sequelae, the impact of remote learning or periods of quarantine on education and well-being of children, and the contribution of schools to viral circulation in the community. The risk for infections that may occur within schools is related to the incidence of SARS-CoV-2 infections within the local community. Thus, persistent suppression of viral circulation in the community through effective public health measures including vaccination is critical to in-person schooling. Evidence suggests that the likelihood of transmission of SARS-CoV-2 within schools can be minimized if mitigation strategies are rationally combined. This article reviews evidence-based approaches and practices for the continual operation of in-person schooling.