ABSTRACT
Technological advances enabling massively parallel measurement of biological features - such as microarrays, high-throughput sequencing and mass spectrometry - have ushered in the omics era, now in its third decade. The resulting complex landscape of analytical methods has naturally fostered the growth of an omics benchmarking industry. Benchmarking refers to the process of objectively comparing and evaluating the performance of different computational or analytical techniques when processing and analysing large-scale biological data sets, such as transcriptomics, proteomics and metabolomics. With thousands of omics benchmarking studies published over the past 25 years, the field has matured to the point where the foundations of benchmarking have been established and well described. However, generating meaningful benchmarking data and properly evaluating performance in this complex domain remains challenging. In this Review, we highlight some common oversights and pitfalls in omics benchmarking. We also establish a methodology to bring the issues that can be addressed into focus and to be transparent about those that cannot: this takes the form of a spreadsheet template of guidelines for comprehensive reporting, intended to accompany publications. In addition, a survey of recent developments in benchmarking is provided as well as specific guidance for commonly encountered difficulties.
Subject(s)
Benchmarking , Proteomics , Proteomics/methods , Metabolomics/methods , Gene Expression Profiling , Mass Spectrometry
ABSTRACT
Simulation of RNA-seq reads is critical in the assessment, comparison, benchmarking and development of bioinformatics tools. Yet the field of RNA-seq simulators has progressed little in the last decade. To address this need we have developed BEERS2, which combines a flexible and highly configurable design with detailed simulation of the entire library preparation and sequencing pipeline. BEERS2 takes input transcripts (typically full-length messenger RNA transcripts with polyA tails) from either customizable input or from CAMPAREE simulated RNA samples. It produces realistic reads of these transcripts in FASTQ, SAM or BAM format, with the SAM and BAM output containing the true alignment to the reference genome. It also produces true transcript-level quantification values. BEERS2 is designed to include the effects of polyA selection and RiboZero for ribosomal depletion, hexamer priming sequence biases, GC-content biases in polymerase chain reaction (PCR) amplification, barcode read errors and errors during PCR amplification. These characteristics combine to make BEERS2 the most complete simulation of RNA-seq to date. Finally, we demonstrate the use of BEERS2 by measuring the effect of several settings on the popular Salmon pseudoalignment algorithm.
Subject(s)
Genome , RNA , RNA-Seq , RNA Sequence Analysis , Computer Simulation , RNA/genetics , High-Throughput Nucleotide Sequencing
ABSTRACT
Evidence is scarce to guide the use of nonsteroidal anti-inflammatory drugs (NSAIDs) to mitigate severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) vaccine-related adverse effects, given the possibility of blunting the desired immune response. In this pilot study, we deeply phenotyped a small number of volunteers who did or did not take NSAIDs concomitant with SARS-CoV-2 immunizations to seek initial information on the immune response. A SARS-CoV-2 vaccine-specific receptor binding domain (RBD) IgG antibody response and efficacy in the evoked neutralization titers were evident irrespective of concomitant NSAID consumption. Given the sample size, only a large and consistent signal of immunomodulation would have been detectable, and this was not apparent. However, the information gathered may inform the design of a definitive clinical trial. Here we report a series of divergent omics signals that invites additional hypothesis testing. SIGNIFICANCE STATEMENT: The impact of nonsteroidal anti-inflammatory drugs (NSAIDs) on the immune response elicited by repeat severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) immunizations was profiled by immunophenotypic, proteomic, and metabolomic approaches in a clinical pilot study of small sample size. A SARS-CoV-2 vaccine-specific immune response was evident irrespective of concomitant NSAID consumption. The information gathered may inform the design of a definitive clinical trial.
Subject(s)
COVID-19 , SARS-CoV-2 , Humans , COVID-19/prevention & control , COVID-19 Vaccines/adverse effects , Pilot Projects , Proteomics , Viral Antibodies , Immunoglobulin G , Vaccination , Immunity , Anti-Inflammatory Agents
ABSTRACT
Circadian disruption has multiple pathological consequences, but the underlying mechanisms are largely unknown. To address such mechanisms, we subjected transformed cultured cells to chronic circadian desynchrony (CCD), mimicking a chronic jet-lag scheme, and assayed a range of cellular functions. The results indicated a specific circadian clock-dependent increase in cell proliferation. Transcriptome analysis revealed up-regulation of G1/S phase transition genes (myelocytomatosis oncogene cellular homolog [Myc], cyclin D1/3, chromatin licensing and DNA replication factor 1 [Cdt1]), concomitant with increased phosphorylation of the retinoblastoma (RB) protein by cyclin-dependent kinase (CDK) 4/6 and increased G1-S progression. Phospho-RB (Ser807/811) was found to oscillate in a circadian fashion and exhibit phase-shifted rhythms in circadian desynchronized cells. Consistent with circadian regulation, a CDK4/6 inhibitor approved for cancer treatment reduced growth of cultured cells and mouse tumors in a time-of-day-specific manner. Our study identifies a mechanism that underlies effects of circadian disruption on tumor growth and underscores the use of treatment timed to endogenous circadian rhythms.
Subject(s)
Chronobiology Disorders/metabolism , Circadian Rhythm/physiology , Neoplasms/metabolism , Animals , Cell Cycle/physiology , Cell Division/physiology , Cell Line , Cyclin-Dependent Kinase 4 , Cyclin-Dependent Kinase 6 , Cyclin-Dependent Kinases/metabolism , G1 Phase/physiology , Humans , Male , Mice , Inbred C57BL Mice , Phosphorylation , Proto-Oncogene Proteins/genetics , Retinoblastoma Protein , S Phase/physiology
ABSTRACT
BACKGROUND: Full-length isoform quantification from RNA-Seq is a key goal in transcriptomics analyses and has been an area of active development since the technology was introduced. The fundamental difficulty stems from the fact that RNA transcripts are long, while RNA-Seq reads are short. RESULTS: Here we use simulated benchmarking data that reflects many properties of real data, including polymorphisms, intron signal and non-uniform coverage, allowing for systematic comparative analyses of isoform quantification accuracy and its impact on differential expression analysis. Genome-, transcriptome- and pseudoalignment-based methods are included, along with a simple approach that serves as a baseline control. CONCLUSIONS: Salmon, kallisto, RSEM, and Cufflinks exhibit the highest accuracy on idealized data, while on more realistic data they do not perform dramatically better than the simple approach. We determine the structural parameters with the greatest impact on quantification accuracy to be length and sequence compression complexity, rather than the number of isoforms. The effect of incomplete annotation on performance is also investigated. Overall, the tested methods show sufficient divergence from the truth to suggest that full-length isoform quantification and isoform-level differential expression (DE) analysis should still be employed selectively.
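As an illustration of how quantification accuracy can be scored when the true values are known from simulation, the following minimal Python sketch computes a median relative error for two hypothetical methods. The metric, the function name `median_relative_error`, the transcript identifiers and the toy counts are all assumptions made for illustration; the abstract does not state which accuracy measures the study used.

```python
from statistics import median


def median_relative_error(truth, estimate):
    """Median of |estimate - truth| / truth over transcripts with nonzero truth.

    `truth` and `estimate` map transcript IDs to counts; a transcript missing
    from `estimate` is treated as estimated at zero (an assumption).
    """
    errors = [
        abs(estimate.get(tx, 0.0) - true_count) / true_count
        for tx, true_count in truth.items()
        if true_count > 0
    ]
    return median(errors)


# Toy data (hypothetical): simulated truth and two made-up sets of estimates.
truth = {"tx1": 100, "tx2": 50, "tx3": 10, "tx4": 0}
method_a = {"tx1": 90, "tx2": 55, "tx3": 12, "tx4": 3}
method_b = {"tx1": 50, "tx2": 50, "tx3": 30}

score_a = median_relative_error(truth, method_a)
score_b = median_relative_error(truth, method_b)
print(f"method_a: {score_a:.2f}, method_b: {score_b:.2f}")
```

A lower score means estimates closer to the simulated truth. Transcripts with a true count of zero are skipped here because a relative error is undefined for them; a real benchmark would need to score false positives separately.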
Subject(s)
Gene Expression Profiling , Transcriptome , Protein Isoforms/genetics , RNA-Seq , RNA Sequence Analysis
ABSTRACT
BACKGROUND: The accurate interpretation of RNA-Seq data presents a moving target as scientists continue to introduce new experimental techniques and analysis algorithms. Simulated datasets are an invaluable tool to accurately assess the performance of RNA-Seq analysis methods. However, existing RNA-Seq simulators focus on modeling the technical biases and artifacts of sequencing, rather than on simulating the original RNA samples. A first step in simulating RNA-Seq is to simulate RNA. RESULTS: To fill this need, we developed the Configurable And Modular Program Allowing RNA Expression Emulation (CAMPAREE), a simulator using empirical data to simulate diploid RNA samples at the level of individual molecules. We demonstrated CAMPAREE's use for generating idealized coverage plots from real data, and for adding the ability to generate allele-specific data to existing RNA-Seq simulators that do not natively support this feature. CONCLUSIONS: Separating input sample modeling from library preparation/sequencing offers added flexibility for both users and developers to mix-and-match different sample and sequencing simulators to suit their specific needs. Furthermore, the ability to maintain sample and sequencing simulators independently provides greater agility to incorporate new biological findings about transcriptomics and new developments in sequencing technologies. Additionally, by simulating at the level of individual molecules, CAMPAREE has the potential to model molecules transcribed from the same genes as a heterogeneous population of transcripts with different states of degradation and processing (splicing, editing, etc.). CAMPAREE was developed in Python, is open source, and freely available at https://github.com/itmat/CAMPAREE .
Subject(s)
High-Throughput Nucleotide Sequencing , Software , Algorithms , Gene Expression Profiling , RNA/genetics , RNA Sequence Analysis
ABSTRACT
BACKGROUND: Large-scale, placebo-controlled trials established that nonsteroidal anti-inflammatory drugs confer a cardiovascular hazard: this has been attributed to depression of cardioprotective products of cyclooxygenase (COX)-2, especially prostacyclin. An alternative mechanism by which nonsteroidal anti-inflammatory drugs might constrain cardioprotection is by enhancing the formation of methylarginines in the kidney that would limit the action of nitric oxide throughout the vasculature. METHODS: Targeted and untargeted metabolomics were used to investigate the effect of COX-2 deletion or inhibition in mice and in osteoarthritis patients exposed to nonsteroidal anti-inflammatory drugs on the l-arginine/nitric oxide pathway. RESULTS: Analysis of the plasma and renal metabolome was performed in postnatal tamoxifen-inducible Cox-2 knockout mice, which exhibit normal renal function and blood pressure. This revealed no changes in arginine and methylarginines compared with their wild-type controls. Moreover, the expression of genes in the l-arginine/nitric oxide pathway was not altered in the renal medulla or cortex of tamoxifen inducible Cox-2 knockout mice. Therapeutic concentrations of the selective COX-2 inhibitors, rofecoxib, celecoxib, and parecoxib, none of which altered basal blood pressure or renal function as reflected by plasma creatinine, failed to elevate plasma arginine and methylarginines in mice. Finally, plasma arginine or methylarginines were not altered in osteoarthritis patients with confirmed exposure to nonsteroidal anti-inflammatory drugs that inhibit COX-1 and COX-2. By contrast, plasma asymmetrical dimethylarginine was increased in mice infused with angiotensin II sufficient to elevate blood pressure and impair renal function. Four weeks later, blood pressure, plasma creatinine, and asymmetrical dimethylarginine were restored to normal levels. 
The increase in asymmetrical dimethylarginine in response to infusion with angiotensin II in celecoxib-treated mice was also related to transient impairment of renal function. CONCLUSIONS: Plasma methylarginines are not altered by COX-2 deletion or inhibition but rather are elevated coincident with renal compromise.
Subject(s)
Non-Steroidal Anti-Inflammatory Agents/adverse effects , Arginine/analogs & derivatives , Cardiovascular Diseases/etiology , Cyclooxygenase 2/metabolism , Animals , Non-Steroidal Anti-Inflammatory Agents/blood , Non-Steroidal Anti-Inflammatory Agents/pharmacology , Non-Steroidal Anti-Inflammatory Agents/therapeutic use , Arginine/blood , Blood Pressure/drug effects , Blood Urea Nitrogen , Celecoxib/pharmacology , Creatinine/blood , Cyclooxygenase 1/metabolism , Cyclooxygenase 2/chemistry , Cyclooxygenase 2/genetics , Cyclooxygenase 2 Inhibitors/pharmacology , Humans , Kidney/metabolism , Metabolome/drug effects , Mice , Inbred C57BL Mice , Knockout Mice , Osteoarthritis/drug therapy , Osteoarthritis/pathology , Placebo Effect
ABSTRACT
Motivation: A key component in many RNA-Seq-based studies is contrasting multiple replicates from different experimental conditions. In this setup, replicates play a key role because they make it possible to capture the biological variability inherent to the compared conditions, as well as experimental variability. However, what constitutes a 'bad' replicate is not necessarily well defined. Consequently, researchers might discard valuable data, or downstream analysis may be hampered by failed experiments. Results: Here we develop a probability model to weigh a given RNA-Seq sample as a representative of an experimental condition when performing alternative splicing analysis. We demonstrate that this model detects outlier samples which are consistently and significantly different compared with other samples from the same condition. Moreover, we show that instead of discarding such samples the proposed weighting scheme can be used to downweight samples and specific splicing variations suspected as outliers, gaining statistical power. These weights can then be used for differential splicing (DS) analysis, where the resulting algorithm offers a generalization of the MAJIQ algorithm. Using both synthetic and real-life data, we perform an extensive evaluation of the improved MAJIQ algorithm in different scenarios involving perturbed samples, mislabeled samples, same condition groups, and different levels of coverage, showing it compares favorably to other tools. Overall, this work offers an outlier detection algorithm that can be combined with any splicing pipeline, a generalized and improved version of MAJIQ for DS detection, and evaluation metrics with matching code and data for DS algorithms. Availability and implementation: Software and data are accessible via majiq.biociphers.org/norton_et_al_2017/. Contact: yosephb@upenn.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Subject(s)
Algorithms , Alternative Splicing , RNA Sequence Analysis/methods , Gene Expression Profiling , Software
ABSTRACT
Physiological and behavioral circadian rhythms are driven by a conserved transcriptional/translational negative feedback loop in mammals. Although most core clock factors are transcription factors, post-transcriptional control introduces delays that are critical for circadian oscillations. Little work has been done on circadian regulation of translation, so to address this deficit we conducted ribosome profiling experiments in a human cell model for an autonomous clock. We found that most rhythmic gene expression occurs with little delay between transcription and translation, suggesting that the lag in the accumulation of some clock proteins relative to their mRNAs does not arise from regulated translation. Nevertheless, we found that translation occurs in a circadian fashion for many genes, sometimes imposing an additional level of control on rhythmically expressed mRNAs and, in other cases, conferring rhythms on noncycling mRNAs. Most cyclically transcribed RNAs are translated at one of two major times in a 24-h day, while rhythmic translation of most noncyclic RNAs is phased to a single time of day. Unexpectedly, we found that the clock also regulates the formation of cytoplasmic processing (P) bodies, which control the fate of mRNAs, suggesting circadian coordination of mRNA metabolism and translation.
Subject(s)
Circadian Clocks/genetics , Circadian Rhythm/genetics , Gene Expression Regulation , Protein Biosynthesis , Ribosomes/genetics , Ribosomes/metabolism , Transcriptome , Tumor Cell Line , Humans , Open Reading Frames , Proto-Oncogene Proteins/genetics , Messenger RNA/genetics , Messenger RNA/metabolism , RNA-Binding Proteins/genetics , Genetic Transcription
ABSTRACT
BACKGROUND: Though Illumina has largely dominated the RNA-Seq field, the simultaneous availability of Ion Torrent has left scientists wondering which platform is most effective for differential gene expression (DGE) analysis. Previous investigations of this question have typically used reference samples derived from cell lines and brain tissue, and do not involve biological variability. While these comparisons might inform studies of tissue-specific expression, marked by large-scale transcriptional differences, this is not the common use case. RESULTS: Here we employ a standard treatment/control experimental design, which enables us to evaluate these platforms in the context of the expression differences common in differential gene expression experiments. Specifically, we assessed the hepatic inflammatory response of mice by assaying liver RNA from control and IL-1β treated animals with both the Illumina HiSeq and the Ion Torrent Proton sequencing platforms. We found the greatest difference between the platforms at the level of read alignment, a moderate level of concordance at the level of DGE analysis, and nearly identical results at the level of differentially affected pathways. Interestingly, we also observed a strong interaction between sequencing platform and choice of aligner. By aligning both real and simulated Illumina and Ion Torrent data with the twelve most commonly-cited aligners in the literature, we observed that different aligner and platform combinations were better suited to probing different genomic features; for example, disentangling the source of expression in gene-pseudogene pairs. CONCLUSIONS: Taken together, our results indicate that while Illumina and Ion Torrent have similar capacities to detect changes in biology from a treatment/control experiment, these platforms may be tailored to interrogate different transcriptional phenomena through careful selection of alignment software.
Subject(s)
Gene Expression Profiling , RNA Sequence Analysis/methods , Algorithms , High-Throughput Nucleotide Sequencing
ABSTRACT
To characterize the role of the circadian clock in mouse physiology and behavior, we used RNA-seq and DNA arrays to quantify the transcriptomes of 12 mouse organs over time. We found 43% of all protein coding genes showed circadian rhythms in transcription somewhere in the body, largely in an organ-specific manner. In most organs, we noticed the expression of many oscillating genes peaked during transcriptional "rush hours" preceding dawn and dusk. Looking at the genomic landscape of rhythmic genes, we saw that they clustered together, were longer, and had more spliceforms than nonoscillating genes. Systems-level analysis revealed intricate rhythmic orchestration of gene pathways throughout the body. We also found oscillations in the expression of more than 1,000 known and novel noncoding RNAs (ncRNAs). Supporting their potential role in mediating clock function, ncRNAs conserved between mouse and human showed rhythmic expression in similar proportions as protein coding genes. Importantly, we also found that the majority of best-selling drugs and World Health Organization essential medicines directly target the products of rhythmic genes. Many of these drugs have short half-lives and may benefit from timed dosage. In sum, this study highlights critical, systemic, and surprising roles of the mammalian circadian clock and provides a blueprint for advancement in chronotherapy.
Subject(s)
Circadian Rhythm/physiology , Nucleic Acid Databases , Gene Expression Regulation/physiology , Transcriptome/physiology , Animals , Chronotherapy/methods , Gene Expression Profiling/methods , Humans , Mice
ABSTRACT
MOTIVATION: Because of the advantages of RNA sequencing (RNA-Seq) over microarrays, it is gaining widespread popularity for highly parallel gene expression analysis. For example, RNA-Seq is expected to be able to provide accurate identification and quantification of full-length splice forms. A number of informatics packages have been developed for this purpose, but short reads make it a difficult problem in principle. Sequencing error and polymorphisms add further complications. It has become necessary to perform studies to determine which algorithms perform best and which if any algorithms perform adequately. However, there is a dearth of independent and unbiased benchmarking studies. Here we take an approach using both simulated and experimental benchmark data to evaluate their accuracy. RESULTS: We conclude that most methods are inaccurate even using idealized data, and that no method is highly accurate once multiple splice forms, polymorphisms, intron signal, sequencing errors, alignment errors, annotation errors and other complicating factors are present. These results point to the pressing need for further algorithm development. AVAILABILITY AND IMPLEMENTATION: Simulated datasets and other supporting information can be found at http://bioinf.itmat.upenn.edu/BEERS/bp2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Algorithms , Alternative Splicing , Gene Expression Profiling/methods , RNA Isoforms/analysis , RNA Sequence Analysis/methods , Animals , Benchmarking , Humans , Mice , Messenger RNA/analysis
ABSTRACT
CircaDB (http://circadb.org) is a new database of circadian transcriptional profiles from time course expression experiments from mice and humans. Each transcript's expression was evaluated by three separate algorithms: JTK_Cycle, Lomb-Scargle and de Lichtenberg. Users can query the gene annotations using simple and powerful full-text search terms, restrict results to specific data sets and provide probability thresholds for each algorithm. The data are visualized as intuitive charts that convey profile information more effectively than a table of probabilities. The CircaDB web application is open source and available at http://github.com/itmat/circadb.
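To make the idea of scoring a time course for rhythmicity concrete, the following minimal Python sketch evaluates the classical Lomb-Scargle periodogram at a few trial periods and picks the strongest one. It is not CircaDB's code and does not reproduce its probability calculation; the function name `lomb_scargle_power`, the sampling scheme, the toy expression values and the list of candidate periods are assumptions made for illustration.

```python
import math


def lomb_scargle_power(times, values, period):
    """Classical (unnormalized) Lomb-Scargle power at one trial period."""
    omega = 2.0 * math.pi / period
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    # Time offset tau that decouples the sine and cosine terms.
    tau = math.atan2(
        sum(math.sin(2.0 * omega * t) for t in times),
        sum(math.cos(2.0 * omega * t) for t in times),
    ) / (2.0 * omega)
    cos_terms = [math.cos(omega * (t - tau)) for t in times]
    sin_terms = [math.sin(omega * (t - tau)) for t in times]
    cos_norm = sum(c * c for c in cos_terms)
    sin_norm = sum(s * s for s in sin_terms)
    cos_proj = sum(y * c for y, c in zip(centered, cos_terms))
    sin_proj = sum(y * s for y, s in zip(centered, sin_terms))
    power = 0.0
    if cos_norm > 0.0:
        power += cos_proj ** 2 / cos_norm
    if sin_norm > 0.0:
        power += sin_proj ** 2 / sin_norm
    return 0.5 * power


# Hypothetical time course: one sample every 2 h for 48 h, peaking at hour 6.
times = [2.0 * i for i in range(24)]
values = [5.0 + 2.0 * math.cos(2.0 * math.pi * (t - 6.0) / 24.0) for t in times]

# Score a handful of trial periods (in hours) and keep the strongest.
candidate_periods = [8.0, 12.0, 16.0, 24.0, 48.0]
powers = {p: lomb_scargle_power(times, values, p) for p in candidate_periods}
best_period = max(powers, key=powers.get)
print(best_period)
```

The periodogram does not require evenly spaced samples, which is one reason it is a common choice for time course data with missing points; the even spacing here is only for simplicity.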
Subject(s)
Circadian Rhythm/genetics , Genetic Databases , Gene Expression Profiling , Algorithms , Animals , Humans , Internet , Mice
ABSTRACT
Background: Non-steroidal anti-inflammatory drugs (NSAIDs) increase the risk of adverse cardiovascular events via suppression of cyclooxygenase (COX)-2-derived prostacyclin (PGI2) formation in heart, vasculature, and kidney. The Prospective Randomized Evaluation of Celecoxib Integrated Safety versus Ibuprofen Or Naproxen (PRECISION) trial and other large clinical studies compared the cardiovascular risk of traditional NSAIDs (e.g. naproxen), which inhibit both COX isozymes, with NSAIDs selective for COX-2 (e.g. celecoxib). However, whether pharmacologically equipotent doses were used - that is, whether a similar degree of COX-2 inhibition was achieved - was not considered. We compared drug target inhibition and blood pressure response to celecoxib at the dose used by most patients in PRECISION with the lowest recommended naproxen dose for osteoarthritis, which is lower than the dose used in PRECISION. Methods: Sixteen healthy participants (19-61 years) were treated with celecoxib (100 mg every 12h), naproxen (250 mg every 12h), or placebo administered twice daily for seven days in a double-blind, crossover design randomized by order. On Day 7 when drug levels had reached steady state, the degree of COX inhibition was assessed ex vivo and in vivo. Ambulatory blood pressure was measured throughout the final 12h dosing interval. Results: Both NSAIDs inhibited COX-2 activity relative to placebo, but naproxen inhibited COX-2 activity to a greater degree (62.9±21.7%) than celecoxib (35.7±25.2%; p<0.05). Similarly, naproxen treatment inhibited PGI2 formation in vivo (48.0±24.9%) to a greater degree than celecoxib (26.7±24.6%; p<0.05). Naproxen significantly increased blood pressure compared to celecoxib (differences in least-square means of mean arterial pressure: 2.5 mm Hg (95% CI: 1.5, 3.5); systolic blood pressure: 4.0 mm Hg (95% CI: 2.9, 5.1); diastolic blood pressure: 1.8 mm Hg (95% CI: 0.8, 2.8); p<0.05 for all). 
The difference in systolic blood pressure relative to placebo was associated with the degree of COX-2 inhibition (p<0.05). Conclusions: Celecoxib 200 mg/day inhibited COX-2 activity to a lesser degree than naproxen 500 mg/day, resulting in a less pronounced blood pressure increase. While the PRECISION trial concluded the non-inferiority of celecoxib regarding cardiovascular risk, this is based on a comparison of doses that are not equipotent. ClinicalTrials.gov identifier: NCT02502006 (https://clinicaltrials.gov/study/NCT02502006).
ABSTRACT
Aging is associated with a number of physiologic changes including perturbed circadian rhythms; however, mechanisms by which rhythms are altered remain unknown. To test the idea that circulating factors mediate age-dependent changes in peripheral rhythms, we compared the ability of human serum from young and old individuals to synchronize circadian rhythms in culture. We collected blood from apparently healthy young (age 25-30) and old (age 70-76) individuals and used the serum to synchronize cultured fibroblasts. We found that young and old sera are equally competent at driving robust ~24 h oscillations of a luciferase reporter driven by a clock gene promoter. However, cyclic gene expression is affected, such that young and old sera drive cycling of different genes. While genes involved in the cell cycle and transcription/translation remain rhythmic in both conditions, genes identified by STRING and IPA analyses as associated with oxidative phosphorylation and Alzheimer's Disease lose rhythmicity in the aged condition. Also, the expression of cycling genes associated with cholesterol biosynthesis increases in the cells entrained with old serum. We did not observe a global difference in the distribution of phase between groups, but found that peak expression of several clock-controlled genes (PER3, NR1D1, NR1D2, CRY1, CRY2, and TEF) lags in the cells synchronized with old serum. Taken together, these findings demonstrate that age-dependent blood-borne factors affect peripheral circadian rhythms in cells and have the potential to impact health and disease by maintaining or disrupting rhythms, respectively.
ABSTRACT
Many chronic disease symptomatologies involve desynchronized sleep-wake cycles, indicative of disrupted biorhythms. This can be interrogated using body temperature rhythms, which have circadian as well as sleep-wake behavior/environmental evoked components. Here, we investigated the association of wrist temperature amplitudes with a future onset of disease in the UK Biobank one year after actigraphy. Among 425 disease conditions (range n = 200-6728) compared to controls (range n = 62,107-91,134), a total of 73 (17%) disease phenotypes were significantly associated with decreased amplitudes of wrist temperature (Benjamini-Hochberg FDR q < 0.05) and 26 (6.1%) PheCODEs passed a more stringent significance level (Bonferroni-corrected α < 0.05). A two-standard deviation (1.8 °C) lower wrist temperature amplitude corresponded to hazard ratios of 1.91 (95% CI 1.58-2.31) for NAFLD, 1.69 (1.53-1.88) for type 2 diabetes, 1.25 (1.14-1.37) for renal failure, 1.23 (1.17-1.3) for hypertension, and 1.22 (1.11-1.33) for pneumonia (phenome-wide atlas available at http://bioinf.itmat.upenn.edu/biorhythm_atlas/ ). This work suggests peripheral thermoregulation as a digital biomarker.
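The two multiple-testing thresholds named above differ in stringency, which is why fewer phenotypes pass the Bonferroni level than the Benjamini-Hochberg level. The following minimal Python sketch applies both standard procedures to a small set of made-up p-values; the function names and the p-values are assumptions made for illustration and are not taken from the study.

```python
def bonferroni_reject(pvalues, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg_reject(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k with p_(k) <= k * alpha / m.
    largest_rank = 0
    for rank, index in enumerate(order, start=1):
        if pvalues[index] <= rank * alpha / m:
            largest_rank = rank
    # Reject the hypotheses with the largest_rank smallest p-values.
    reject = [False] * m
    for index in order[:largest_rank]:
        reject[index] = True
    return reject


# Made-up p-values for ten hypothetical phenotype associations.
pvalues = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

n_bonferroni = sum(bonferroni_reject(pvalues))
n_bh = sum(benjamini_hochberg_reject(pvalues))
print(n_bonferroni, n_bh)
```

Every hypothesis rejected by Bonferroni is also rejected by Benjamini-Hochberg at the same alpha, since the smallest BH threshold equals the Bonferroni threshold and the others are larger.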
Subject(s)
Biological Specimen Banks , Type 2 Diabetes Mellitus , Humans , Type 2 Diabetes Mellitus/epidemiology , Temperature , Wrist , Circadian Rhythm , United Kingdom/epidemiology
ABSTRACT
Simulation of RNA-seq reads is critical in the assessment, comparison, benchmarking, and development of bioinformatics tools. Yet the field of RNA-seq simulators has progressed little in the last decade. To address this need we have developed BEERS2, which combines a flexible and highly configurable design with detailed simulation of the entire library preparation and sequencing pipeline. BEERS2 takes input transcripts (typically full-length mRNA transcripts with polyA tails) from either customizable input or from CAMPAREE simulated RNA samples. It produces realistic reads of these transcripts in FASTQ, SAM, or BAM format, with the SAM and BAM output containing the true alignment to the reference genome. It also produces true transcript-level quantification values. BEERS2 is designed to include the effects of polyA selection and RiboZero for ribosomal depletion, hexamer priming sequence biases, GC-content biases in PCR amplification, barcode read errors, and errors during PCR amplification. These characteristics combine to make BEERS2 the most complete simulation of RNA-seq to date. Finally, we demonstrate the use of BEERS2 by measuring the effect of several settings on the popular Salmon pseudoalignment algorithm.
ABSTRACT
The ubiquity of RNA-seq has led to many methods that use RNA-seq data to analyze variations in RNA splicing. However, available methods are not well suited for handling heterogeneous and large datasets. Such datasets scale to thousands of samples across dozens of experimental conditions, exhibit increased variability compared to biological replicates, and involve thousands of unannotated splice variants resulting in increased transcriptome complexity. We describe here a suite of algorithms and tools implemented in the MAJIQ v2 package to address challenges in detection, quantification, and visualization of splicing variations from such datasets. Using both large-scale synthetic data and GTEx v8 as benchmark datasets, we assess the advantages of MAJIQ v2 compared to existing methods. We then apply the MAJIQ v2 package to analyze differential splicing across 2,335 samples from 13 brain subregions, demonstrating its ability to offer insights into brain subregion-specific splicing regulation.
Subject(s)
Algorithms , RNA Splicing , RNA-Seq , Benchmarking , Brain
ABSTRACT
Longitudinal studies associate shiftwork with cardiometabolic disorders but do not establish causation or elucidate mechanisms of disease. We developed a mouse model based on shiftwork schedules to study circadian misalignment in both sexes. Behavioral and transcriptional rhythmicity were preserved in female mice despite exposure to misalignment. Females were protected from the cardiometabolic impact of circadian misalignment on a high-fat diet seen in males. The liver transcriptome and proteome revealed discordant pathway perturbations between the sexes. Tissue-level changes were accompanied by gut microbiome dysbiosis only in male mice, biasing toward increased potential for diabetogenic branched chain amino acid production. Antibiotic ablation of the gut microbiota diminished the impact of misalignment. In the United Kingdom Biobank, females showed stronger circadian rhythmicity in activity and a lower incidence of metabolic syndrome than males among job-matched shiftworkers. Thus, we show that female mice are more resilient than males to chronic circadian misalignment and that these differences are conserved in humans.
Subject(s)
Cardiovascular Diseases , Gastrointestinal Microbiome , Humans , Male , Female , Animals , Mice , High-Fat Diet , Sex Characteristics , Circadian Rhythm
ABSTRACT
Lipids may influence cellular penetrance by pathogens and the immune response that they evoke. Here we find that a broad-based lipidomic storm, driven predominantly by secretory (s) phospholipase A2 (sPLA2)-dependent eicosanoid production, occurs in patients with sepsis of viral and bacterial origin and relates to disease severity in COVID-19. Elevations in the cyclooxygenase (COX) products of arachidonic acid (AA), PGD2 and PGI2, and the AA lipoxygenase (LOX) product, 12-HETE, and a reduction in the high-abundance lipids ChoE 18:3, LPC-O-16:0 and PC-O-30:0 exhibit relative specificity for COVID-19 amongst such patients, correlate with the inflammatory response and link to disease severity. Linoleic acid (LA) binds directly to SARS-CoV-2, and both LA and its di-HOME products reflect disease severity in COVID-19. AA and LA metabolites and LPC-O-16:0 linked variably to the immune response. These studies yield prognostic biomarkers and therapeutic targets for patients with sepsis, including COVID-19. A purpose-built interactive network analysis tool was developed, allowing the community to interrogate connections across these multiomic data and generate novel hypotheses.