ABSTRACT
The growth of omic data presents evolving challenges in data manipulation, analysis and integration. Addressing these challenges, Bioconductor provides an extensive community-driven biological data analysis platform. Meanwhile, tidy R programming offers a revolutionary data organization and manipulation standard. Here we present the tidyomics software ecosystem, bridging Bioconductor to the tidy R paradigm. This ecosystem aims to streamline omic analysis, ease learning and encourage cross-disciplinary collaborations. We demonstrate the effectiveness of tidyomics by analyzing 7.5 million peripheral blood mononuclear cells from the Human Cell Atlas, spanning six data frameworks and ten analysis tools.
Subjects
Software; Humans; Computational Biology/methods; Leukocytes, Mononuclear/metabolism; Leukocytes, Mononuclear/cytology; Genomics/methods; Data Analysis
ABSTRACT
MOTIVATION: Deriving biological insights from genomic data commonly requires comparing attributes of selected genomic loci to a null set of loci. The selection of this null set is non-trivial, as it requires careful consideration of potential covariates, a problem that is exacerbated by the non-uniform distribution of genomic features including genes, enhancers, and transcription factor binding sites. Propensity score-based covariate matching methods allow the selection of null sets from a pool of possible items while controlling for multiple covariates; however, existing packages do not operate on genomic data classes and can be slow for large data sets, which makes them difficult to integrate into genomic workflows. RESULTS: To address this, we developed matchRanges, a propensity score-based covariate matching method for the efficient and convenient generation of matched null ranges from a set of background ranges within the Bioconductor framework. AVAILABILITY AND IMPLEMENTATION: Package: https://bioconductor.org/packages/nullranges, Code: https://github.com/nullranges, Documentation: https://nullranges.github.io/nullranges.
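To make the matching step concrete, the following is a minimal, hypothetical sketch of greedy nearest-neighbour matching on propensity scores, written in plain Python. It is not the matchRanges implementation. It assumes the propensity scores have already been estimated (for example, from a logistic regression of focal-versus-pool membership on the covariates), and the function name `match_nearest` is illustrative only.

```python
from typing import List, Sequence


def match_nearest(focal_ps: Sequence[float], pool_ps: Sequence[float]) -> List[int]:
    """Hypothetical greedy nearest-neighbour matching without replacement.

    For each focal propensity score (in the given order), return the index of
    the still-unused pool item whose score is closest. Ties go to the lower
    pool index. Scores are assumed to be precomputed.
    """
    if len(pool_ps) < len(focal_ps):
        raise ValueError("pool must be at least as large as the focal set")
    used = set()
    matches: List[int] = []
    for ps in focal_ps:
        best, best_dist = -1, float("inf")
        for j, q in enumerate(pool_ps):
            if j in used:
                continue  # sampling without replacement
            d = abs(ps - q)
            if d < best_dist:
                best, best_dist = j, d
        used.add(best)
        matches.append(best)
    return matches
```

For example, `match_nearest([0.8, 0.2], [0.1, 0.25, 0.5, 0.79, 0.9])` pairs the first focal item with pool item 3 and the second with pool item 1. Greedy matching is order-dependent and runs in O(n·m); it is shown only because it is easy to read, not because it reflects how matchRanges scales to large range sets.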
Subjects
Genomics; Software; Genomics/methods; Genome; Regulatory Sequences, Nucleic Acid; Research Design
ABSTRACT
MOTIVATION: Enrichment analysis is a widely used technique in genomic analysis that aims to determine whether there is a statistically significant association between two sets of genomic features. This type of hypothesis testing typically requires an appropriate null model. However, the commonly used null distribution can be overly simplistic and may lead to inaccurate conclusions. RESULTS: bootRanges provides fast functions for generating block-bootstrapped genomic ranges that represent the null hypothesis in enrichment analysis. As part of a modular workflow, bootRanges offers greater flexibility for computing various test statistics by leveraging other Bioconductor packages. We show that shuffling or permutation schemes may result in overly narrow null distributions of the test statistic and over-estimation of statistical significance, whereas creating new range sets with a block bootstrap preserves local genomic correlation structure and generates more reliable null distributions. bootRanges can also be used in more complex analyses, such as assessing correlations between cis-regulatory elements (CREs) and genes across cell types or providing optimized thresholds, e.g., log fold change (logFC) thresholds from differential analysis. AVAILABILITY AND IMPLEMENTATION: bootRanges is freely available in the R/Bioconductor package nullranges hosted at https://bioconductor.org/packages/nullranges.
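The block-bootstrap idea can be sketched on a single chromosome as follows. This is a simplified illustration under stated assumptions, not the bootRanges algorithm. It assumes half-open integer ranges `(start, end)`, a fixed block length, and a chromosome tiled into `chrom_len // block_len` target blocks. Each target block receives the ranges whose start falls in a randomly drawn source block (drawn with replacement). Those ranges keep their offset within the block and have their ends clipped at the block boundary. The function name `block_bootstrap` is hypothetical.

```python
import random
from typing import List, Tuple

Range = Tuple[int, int]  # half-open [start, end)


def block_bootstrap(ranges: List[Range], chrom_len: int, block_len: int,
                    seed: int = 0) -> List[Range]:
    """Return one hypothetical block-bootstrapped range set on one chromosome.

    Local spacing inside each block is preserved, while block placement is
    randomized, so short-range correlation structure survives resampling.
    Any remainder shorter than block_len at the chromosome end is dropped.
    """
    rng = random.Random(seed)
    n_blocks = chrom_len // block_len
    out: List[Range] = []
    for b in range(n_blocks):
        src = rng.randint(0, chrom_len - block_len)  # source block start (inclusive bounds)
        tgt = b * block_len                          # target block start
        for start, end in ranges:
            if src <= start < src + block_len:
                offset = start - src
                new_start = tgt + offset
                # Clip so the copied range does not spill into the next block.
                new_end = min(tgt + block_len, new_start + (end - start))
                out.append((new_start, new_end))
    out.sort()
    return out
```

To build a null distribution, one would call this with many different seeds and compute the chosen test statistic (for example, an overlap count with a second range set) on each bootstrapped set.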
Subjects
Genome; Genomics; Genomics/methods; Software
ABSTRACT
SUMMARY: Exclusion regions are sections of reference genomes with abnormal pileups of short sequencing reads. Removing reads that overlap them improves biological signal, and these benefits are most pronounced in differential analysis settings. Several labs have created exclusion region sets, available primarily through ENCODE and GitHub. However, the variety of exclusion sets creates uncertainty about which sets to use. Furthermore, gap regions (e.g., centromeres, telomeres, short arms) raise additional considerations when generating exclusion sets. We generated exclusion sets for the latest human T2T-CHM13 and mouse GRCm39 genomes and systematically assembled and annotated these and other sets in the excluderanges R/Bioconductor data package, which is also accessible via the BEDbase.org API. The package provides unified access to 82 GenomicRanges objects covering six organisms, multiple genome assemblies, and multiple types of exclusion regions. For the human hg38 genome assembly, we recommend hg38.Kundaje.GRCh38_unified_blacklist as the most thoroughly curated and annotated set; for other organisms, we recommend the sets generated by the Blacklist tool. AVAILABILITY AND IMPLEMENTATION: https://bioconductor.org/packages/excluderanges/. Package website: https://dozmorovlab.github.io/excluderanges/.
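At its core, removing reads that overlap exclusion regions is an interval-overlap filter. The sketch below assumes in-memory `(chrom, start, end)` tuples with half-open coordinates, and all of its function names are hypothetical. In R/Bioconductor, the same step would typically be done with GenomicRanges overlap functions, for example `subsetByOverlaps()` with `invert = TRUE`, applied to an excluderanges object.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open [start, end)
Index = Dict[str, Tuple[List[int], List[int]]]


def build_index(regions: Iterable[Interval]) -> Index:
    """Merge overlapping or touching exclusion regions per chromosome and
    return sorted start and end lists suitable for binary search."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    index: Index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        merged: List[List[int]] = []
        for s, e in ivs:
            if merged and s <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], e)
            else:
                merged.append([s, e])
        index[chrom] = ([m[0] for m in merged], [m[1] for m in merged])
    return index


def overlaps(index: Index, chrom: str, start: int, end: int) -> bool:
    """True if [start, end) shares at least one base with an exclusion region."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Merged regions are disjoint and sorted, so only the last region that
    # starts before `end` can possibly overlap the query.
    i = bisect.bisect_left(starts, end) - 1
    return i >= 0 and ends[i] > start


def filter_reads(reads: Iterable[Interval], exclusions: Iterable[Interval]) -> List[Interval]:
    """Keep only reads that do not overlap any exclusion region."""
    index = build_index(exclusions)
    return [r for r in reads if not overlaps(index, *r)]
```

Merging the exclusion regions first ensures that a single binary search per read is enough, which keeps filtering at O(log n) per read.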
Subjects
Genome, Human; Software; Animals; Humans; Mice; Uncertainty
ABSTRACT
MOTIVATION: Allelic expression analysis aids in the detection of cis-regulatory mechanisms of genetic variation, which produce allelic imbalance (AI) in heterozygotes. Measuring AI in bulk data that lack temporal or spatial resolution has a limitation: cell-type-specific (CTS), spatially dependent, or time-dependent AI signals may be dampened or go undetected. RESULTS: We introduce airpart, a statistical method for identifying differential CTS AI from single-cell RNA-sequencing data, or dynamic AI from other spatially or temporally resolved datasets. airpart outputs discrete partitions of data, pointing to groups of genes and cells under common mechanisms of cis-genetic regulation. To account for low counts in single-cell data, our method uses a Generalized Fused Lasso with a Binomial likelihood to partition groups of cells by AI signal, and a hierarchical Bayesian model for statistical inference on AI. In simulation, airpart accurately detected partitions of cell types by their AI and had a lower root mean square error (RMSE) of allelic ratio estimates than existing methods. In real data, airpart identified differential allelic imbalance patterns across cell states and could be used to define trends of AI signal along spatial or time axes. AVAILABILITY AND IMPLEMENTATION: The airpart package is available as an R/Bioconductor package at https://bioconductor.org/packages/airpart. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Allelic Imbalance; Models, Statistical; Alleles; Bayes Theorem; Computer Simulation; Software
ABSTRACT
PURPOSE: CALGB (Alliance)/SWOG 80405 was a randomized phase III trial in which patients with metastatic colorectal cancer (mCRC) received first-line bevacizumab or cetuximab combined with chemotherapy. We aimed to discover novel mutated genes associated with prognosis and with differential response to therapy with these biologics. METHODS: Primary tumor DNA from 548 patients was sequenced using FoundationOne. The effect of mutated genes and mutations on overall survival (OS) was tested, adjusting for microsatellite instability status, BRAF V600E, all RAS mutations, treatment arm, sex, and age. RESULTS: The median number (lower-upper quartile) of mutated genes was 5 (3-7) overall, 5 (3-6) in microsatellite-stable tumors, and 12.5 (4.5-32) in microsatellite instability-high tumors. Mutated KRAS and APC were more frequent in Black patients (53% and 85%, respectively) than in White patients (27% and 65%, respectively), whereas BRAF V600E was less frequent in Black (5%) than in White (14%) patients. The median OS in patients with BRAF non-V600E mutations (2.2% of patients) was 31.9 months (95% CI, 15.1 to not applicable [NA]), similar to that of patients with BRAF wild-type (WT) tumors (31.2 months [95% CI, 29.0 to 33.9]). Mutated LRP1B (10.7% of patients) was associated with improved OS compared with WT LRP1B (hazard ratio, 0.57 [95% CI, 0.40 to 0.80]). Mutated RNF43 (5.6% of patients) interacted with treatment arm: in the cetuximab arm, patients with mutated RNF43 had a median OS of 11.5 (95% CI, 10.8 to NA) months compared with 30.1 (95% CI, 24.9 to 35.3) months in patients with WT RNF43, whereas in the bevacizumab arm, patients with mutated RNF43 had a median OS of 25.0 (95% CI, 14.2 to NA) months compared with 31.3 (95% CI, 29.0 to 34.3) months in patients with WT RNF43. CONCLUSION: These results can provide new tools to predict patient outcome and to improve therapeutic decisions and trial participation among minority patients. The molecular alterations identified in this study may guide biomarker-driven studies.
Subjects
Colorectal Neoplasms; Humans; Bevacizumab/therapeutic use; Cetuximab; Colorectal Neoplasms/drug therapy; Colorectal Neoplasms/genetics; Colorectal Neoplasms/pathology; Proto-Oncogene Proteins B-raf/genetics; Microsatellite Instability; Standard of Care; Mutation; Antineoplastic Combined Chemotherapy Protocols/therapeutic use
ABSTRACT
CRISPR epigenomic editing technologies enable functional interrogation of non-coding elements. However, current computational methods for guide RNA (gRNA) design do not effectively predict power potential or molecular and cellular impact, and therefore cannot optimize for the efficient gRNAs that are crucial to successful applications of these technologies. We present "launch-dCas9" (machine LeArning based UNified CompreHensive framework for CRISPR-dCas9), which predicts gRNA impact from multiple perspectives, including cell fitness, wild-type abundance (gauging power potential), and gene expression in single cells. Built and evaluated using experiments involving >1 million gRNAs targeted across the human genome, launch-dCas9 demonstrates relatively high prediction accuracy (AUC up to 0.81) and generalizes across cell lines. The top gRNA(s) prioritized by the method are 4.6-fold more likely to exert effects than other gRNAs in the same cis-regulatory region. Furthermore, launch-dCas9 identifies the most critical sequence-related features and functional annotations among the >40 features considered. Our results establish launch-dCas9 as a promising approach to designing gRNAs for CRISPR epigenomic experiments.
ABSTRACT
The growth of omic data presents evolving challenges in data manipulation, analysis, and integration. Addressing these challenges, Bioconductor [1] provides an extensive community-driven biological data analysis platform. Meanwhile, tidy R programming [2] offers a revolutionary standard for data organisation and manipulation. Here, we present the tidyomics software ecosystem, bridging Bioconductor to the tidy R paradigm. This ecosystem aims to streamline omic analysis, ease learning, and encourage cross-disciplinary collaborations. We demonstrate the effectiveness of tidyomics by analysing 7.5 million peripheral blood mononuclear cells from the Human Cell Atlas [3], spanning six data frameworks and ten analysis tools.
ABSTRACT
Summary: CTCF (CCCTC-binding factor) is an 11-zinc-finger DNA binding protein which regulates much of the eukaryotic genome's 3D structure and function. The diversity of CTCF binding motifs has led to a fragmented landscape of CTCF binding data. We collected position weight matrices of CTCF binding motifs and defined strand-oriented CTCF binding sites in the human and mouse genomes, including the recent Telomere to Telomere and mm39 assemblies. We included selected experimentally determined and predicted CTCF binding sites, such as CTCF-bound cis-regulatory elements from SCREEN ENCODE. We recommend filtering strategies for CTCF binding motifs and demonstrate that liftOver is a viable alternative to convert CTCF coordinates between assemblies. Our comprehensive data resource and usage recommendations can serve to harmonize and strengthen the reproducibility of genomic studies utilizing CTCF binding data. Availability and implementation: https://bioconductor.org/packages/CTCF. Companion website: https://dozmorovlab.github.io/CTCF/; Code to reproduce the analyses: https://github.com/dozmorovlab/CTCF.dev. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
ABSTRACT
Experimental manipulation of gut microbes in animal models alters fear behavior and relevant neurocircuitry. In humans, the first year of life is a key period for brain development, the emergence of fearfulness, and the establishment of the gut microbiome. Variation in the infant gut microbiome has previously been linked to cognitive development, but its relationship with fear behavior and neurocircuitry is unknown. In this pilot study of 34 infants, we find that 1-year gut microbiome composition (weighted UniFrac; lower abundance of Bacteroides and increased abundance of Veillonella, Dialister, and Clostridiales) is significantly associated with increased fear behavior during a non-social fear paradigm. Infants with increased richness and reduced evenness of the 1-month microbiome also display increased non-social fear. Based on a small cohort, this study indicates associations of the human infant gut microbiome with fear behavior and possible relationships with fear-related brain structures. As such, it represents an important step toward understanding the role of the gut microbiome in the development of human fear behaviors, but it requires further validation with a larger number of participants.
Subjects
Bacteroides/genetics; Clostridiales/genetics; Fear/psychology; Gastrointestinal Microbiome/genetics; Veillonella/genetics; Veillonellaceae/genetics; Adult; Bacteroides/classification; Bacteroides/isolation & purification; Brain/physiology; Breast Feeding; Clostridiales/classification; Clostridiales/isolation & purification; Feces/microbiology; Female; Humans; Infant; Infant Formula; Longitudinal Studies; Male; Pilot Projects; RNA, Ribosomal, 16S/genetics; Veillonella/classification; Veillonella/isolation & purification; Veillonellaceae/classification; Veillonellaceae/isolation & purification
ABSTRACT
Probes whose fluorescence and photoactivity are both activatable are particularly valuable for cancer theranostics. They allow sensitive fluorescence diagnosis and on-demand photodynamic therapy (PDT) against targeted cancer cells at the same time, which improves diagnostic accuracy and reduces side effects on normal tissues and cells. Here, we show that enzyme-instructed self-assembly (EISA) is an ideal strategy for developing a probe with an aggregation-induced emission (AIE) signature whose fluorescence and reactive oxygen species (ROS) generation are both activatable. As a proof of concept, we design and synthesize a precursor, TPE-Py-FpYGpYGpY, that consists of an AIE luminogen (TPE-Py) and a short peptide with three phosphotyrosine residues (pY). This precursor permits selective fluorescence visualization and PDT of alkaline phosphatase (ALP)-overexpressing cancer cells. TPE-Py-FpYGpYGpY has good aqueous solubility thanks to its hydrophilic phosphotyrosine residues and hence exhibits weak fluorescence and negligible ROS generation. After ALP-catalysed dephosphorylation of the precursor, however, the products self-assemble, and the resulting nanostructures become highly emissive and efficiently produce ROS. Cellular studies reveal that TPE-Py-FpYGpYGpY can differentiate cancer cells from normal cells, specifically pinpointing and suppressing ALP-overexpressing cancer cells. This study may offer new insights into the design of advanced activatable molecular probes.