Results 1 - 20 of 299
1.
BMC Bioinformatics ; 25(1): 212, 2024 Jun 13.
Article in English | MEDLINE | ID: mdl-38872103

ABSTRACT

BACKGROUND: A vital step in analyzing single-cell data is ascertaining which cell types are present in a dataset, and at what abundance. In many diseases, the proportions of varying cell types can have important implications for health and prognosis. Most approaches for cell type annotation have centered around cell typing for single-cell RNA-sequencing (scRNA-seq) and have had promising success. However, reliable methods are lacking for many other single-cell modalities such as single-cell sequencing assay for transposase-accessible chromatin (scATAC-seq), which quantifies the extent to which genes of interest in each cell are epigenetically "open" for expression. RESULTS: To leverage the informative potential of scATAC-seq data, we developed CAMML with the integration of chromatin accessibility (CAraCAl), a bioinformatic method that performs cell typing on scATAC-seq data. CAraCAl performs cell typing by scoring each cell for its enrichment of cell type-specific gene sets. These gene sets are composed of the most upregulated or downregulated genes present in each cell type according to projected gene activity. CONCLUSIONS: We found that CAraCAl does not improve performance beyond CAMML when scRNA-seq is present, but if only scATAC-seq is available, CAraCAl performs cell typing relatively successfully. As such, we also discuss best practices for cell typing and the strengths and weaknesses of various cell annotation options.


Subject(s)
Chromatin , Computational Biology , Chromatin/metabolism , Chromatin/genetics , Chromatin/chemistry , Computational Biology/methods , Humans , Single-Cell Analysis/methods , Software , Sequence Analysis, RNA/methods , Transposases/metabolism , Transposases/genetics
2.
Appl Netw Sci ; 9(1): 14, 2024.
Article in English | MEDLINE | ID: mdl-38699246

ABSTRACT

We present a novel approach for computing a variant of eigenvector centrality for multilayer networks with inter-layer constraints on node importance. Specifically, we consider a multilayer network defined by multiple edge-weighted, potentially directed, graphs over the same set of nodes with each graph representing one layer of the network and no inter-layer edges. As in the standard eigenvector centrality construction, the importance of each node in a given layer is based on the weighted sum of the importance of adjacent nodes in that same layer. Unlike standard eigenvector centrality, we assume that the adjacency relationship and the importance of adjacent nodes may be based on distinct layers. Importantly, this type of centrality constraint is only partially supported by existing frameworks for multilayer eigenvector centrality that use edges between nodes in different layers to capture inter-layer dependencies. For our model, constrained, layer-specific eigenvector centrality values are defined by a system of independent eigenvalue problems and dependent pseudo-eigenvalue problems, whose solution can be efficiently realized using an interleaved power iteration algorithm. We refer to this model, and the associated algorithm, as the Constrained Multilayer Centrality (CMLC) method. The characteristics of this approach, and of standard techniques based on inter-layer edges, are demonstrated on both a simple multilayer network and on a range of random graph models. An R package implementing the CMLC method along with example vignettes is available at https://hrfrost.host.dartmouth.edu/CMLC/.
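The idea that a layer's adjacency and its neighbors' importance can come from different layers can be illustrated with a small toy example. This is a minimal sketch under stated assumptions, not the CMLC algorithm or the API of its R package. Layer 1 gets standard eigenvector centrality via power iteration. Layer 2 uses its own adjacency matrix but takes neighbor importance from layer 1, so its update is dependent rather than a standalone eigenvalue problem. Both are updated in one interleaved loop. The function names `normalize` and `interleaved_centrality` and the two-layer setup are hypothetical.

```python
import numpy as np


def normalize(v):
    # Scale a vector to unit Euclidean length.
    return v / np.linalg.norm(v)


def interleaved_centrality(A_indep, A_dep, max_iter=1000, tol=1e-12):
    """Hypothetical two-layer toy, not the published CMLC method.

    c1 solves the usual eigenvector problem: c1 proportional to A_indep @ c1.
    c2 uses layer-2 adjacency but layer-1 importance: c2 proportional to A_dep @ c1.
    """
    n = A_indep.shape[0]
    c1 = normalize(np.ones(n))
    c2 = normalize(np.ones(n))
    for _ in range(max_iter):
        new_c1 = normalize(A_indep @ c1)    # power-iteration step for layer 1
        new_c2 = normalize(A_dep @ new_c1)  # dependent update for layer 2
        converged = (np.linalg.norm(new_c1 - c1) < tol
                     and np.linalg.norm(new_c2 - c2) < tol)
        c1, c2 = new_c1, new_c2
        if converged:
            break
    return c1, c2


# Layer 1: triangle 0-1-2 plus a pendant node 3 attached to node 2.
A1 = np.array([[0, 1, 1, 0],
               [1, 0, 1, 0],
               [1, 1, 0, 1],
               [0, 0, 1, 0]], dtype=float)
# Layer 2: a different edge set over the same four nodes (a 4-cycle).
A2 = np.array([[0, 1, 0, 1],
               [1, 0, 1, 0],
               [0, 1, 0, 1],
               [1, 0, 1, 0]], dtype=float)

c1, c2 = interleaved_centrality(A1, A2)
print(np.round(c1, 3), np.round(c2, 3))
```

Layer 1 is deliberately non-bipartite (it contains a triangle), so plain power iteration converges. On a bipartite layer the iterates can oscillate, and a real implementation would need to handle that case.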

3.
PLoS Comput Biol ; 20(4): e1012084, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38683883

ABSTRACT

We have developed a new, and analytically novel, single sample gene set testing method called Reconstruction Set Test (RESET). RESET quantifies gene set importance based on the ability of set genes to reconstruct values for all measured genes. RESET is realized using a computationally efficient randomized reduced rank reconstruction algorithm (available via the RESET R package on CRAN) that can effectively detect patterns of differential abundance and differential correlation for self-contained and competitive scenarios. As demonstrated using real and simulated scRNA-seq data, RESET provides superior performance at a lower computational cost relative to other single sample approaches.


Subject(s)
Algorithms , Computational Biology , Computational Biology/methods , Humans , Gene Expression Profiling/methods , Computer Simulation
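The core idea, scoring a gene set by how well its genes reconstruct all measured genes, can be sketched with a randomized range finder. This is a simplified illustration, not the RESET package's implementation or exact statistic. The function name `reconstruction_score`, the rank `k`, the oversampling amount, and the choice of "fraction of squared Frobenius norm captured" as the score are all hypothetical. The sketch builds a rank-k basis from the set genes only (a Gaussian test matrix followed by QR and a small SVD), then projects every gene onto that basis.

```python
import numpy as np


def reconstruction_score(X, set_idx, k=2, oversample=5, seed=0):
    """Hypothetical reconstruction-based gene set score (not the RESET package).

    X       : samples x genes matrix (assumed already normalized).
    set_idx : column indices of the genes in the set.
    Returns the fraction of the squared Frobenius norm of X captured when
    every gene is projected onto a rank-k basis built from the set genes.
    """
    rng = np.random.default_rng(seed)
    X_s = X[:, set_idx]
    # Randomized range finder: sketch the column space of the set genes.
    omega = rng.standard_normal((X_s.shape[1], k + oversample))
    Q, _ = np.linalg.qr(X_s @ omega)
    # Refine to the top-k left singular directions of the set genes.
    U_b, _, _ = np.linalg.svd(Q.T @ X_s, full_matrices=False)
    U = Q @ U_b[:, :k]
    # Reconstruct ALL genes from that rank-k subspace.
    X_hat = U @ (U.T @ X)
    return float(np.sum(X_hat ** 2) / np.sum(X ** 2))


# Synthetic data: genes 0-19 share one latent sample factor; the rest are noise.
rng = np.random.default_rng(1)
n_samples, n_genes = 100, 200
u = rng.standard_normal(n_samples)
v = np.zeros(n_genes)
v[:20] = 3.0
X = np.outer(u, v) + 0.5 * rng.standard_normal((n_samples, n_genes))

score_signal = reconstruction_score(X, np.arange(10))       # set inside the factor
score_noise = reconstruction_score(X, np.arange(100, 110))  # set of noise genes
print(round(score_signal, 3), round(score_noise, 3))
```

A set whose genes carry the dominant shared structure reconstructs much more of the matrix than a set of unrelated genes. The randomized sketch keeps the cost of each set's basis low when there are many sets.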
4.
PLoS Comput Biol ; 20(1): e1011717, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38206988

ABSTRACT

We describe a novel single sample gene set testing method for cancer transcriptomics data named tissue-adjusted pathway analysis of cancer (TPAC). The TPAC method leverages information about the normal tissue-specificity of human genes to compute a robust multivariate distance score that quantifies gene set dysregulation in each profiled tumor. Because the null distribution of the TPAC scores has an accurate gamma approximation, both population-level and sample-level inference are supported. As we demonstrate through an analysis of gene expression data for 21 solid human cancers from The Cancer Genome Atlas (TCGA) and associated normal tissue expression data from the Human Protein Atlas (HPA), TPAC gene set scores are more strongly associated with patient prognosis than the scores generated by existing single sample gene set testing methods.


Subject(s)
Neoplasms , Humans , Neoplasms/genetics , Gene Expression Profiling/methods
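The practical benefit of a gamma-approximated null is that p-values become a closed-form tail probability. The sketch below shows the general mechanics only. It fits a gamma distribution to null scores by the method of moments and computes upper-tail p-values with SciPy. TPAC's actual distance score and the way its gamma parameters are derived are not reproduced here. The chi-square stand-in for null scores and the function names are assumptions for illustration.

```python
import numpy as np
from scipy import stats


def fit_gamma_moments(null_scores):
    """Method-of-moments gamma fit: shape = mean^2 / var, scale = var / mean."""
    m = float(np.mean(null_scores))
    v = float(np.var(null_scores))
    return m * m / v, v / m


def gamma_upper_pvalues(scores, shape, scale):
    """Upper-tail p-values of observed scores under the fitted gamma null."""
    return stats.gamma.sf(scores, shape, scale=scale)


# Stand-in null: chi-square(10) draws, which are exactly Gamma(shape=5, scale=2).
rng = np.random.default_rng(0)
null_scores = rng.chisquare(df=10, size=200_000)
shape, scale = fit_gamma_moments(null_scores)
pvals = gamma_upper_pvalues(np.array([5.0, 10.0, 25.0]), shape, scale)
print(round(shape, 2), round(scale, 2), np.round(pvals, 4))
```

Because the chi-square distribution is a special case of the gamma, the fitted parameters should land near shape 5 and scale 2, which makes this a convenient sanity check for the fitting step.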
5.
Bioinform Adv ; 3(1): vbad120, 2023.
Article in English | MEDLINE | ID: mdl-37745004

ABSTRACT

Summary: Doublets are usually considered an unwanted artifact of single-cell RNA-sequencing (scRNA-seq) and are only identified in datasets for the sake of removal. However, if cells have a juxtacrine interaction with one another in situ and maintain this association through an scRNA-seq processing pipeline that only partially dissociates the tissue, these doublets can provide meaningful biological information regarding the intercellular signals and processes occurring in the analyzed tissue. This is especially true for cases such as the immune compartment of the tumor microenvironment, where the frequency and the type of immune cell juxtacrine interactions can be a prognostic indicator. We developed Cell type-specific Interaction Analysis using Doublets in scRNA-seq (CIcADA) as a pipeline for identifying and analyzing biologically meaningful doublets in scRNA-seq data. CIcADA identifies putative doublets using multi-label cell type scores and characterizes interaction dynamics through a comparison against synthetic doublets of the same cell type composition. In performing CIcADA on several scRNA-seq tumor datasets, we found that the identified doublets were consistently upregulating expression of immune response genes. Availability and implementation: An R package implementing the CIcADA method is in development and will be released on CRAN, but for now it is available at https://github.com/schiebout/CAMML.
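The comparison against synthetic doublets of the same cell type composition can be sketched as follows. This is a hypothetical illustration, not the CIcADA package. A synthetic doublet is formed here by summing the count vectors of one randomly chosen cell of each type, and observed putative doublets are compared with the synthetic ones through a per-gene log2 ratio of means. The summing rule, the pseudocount, and the function names are assumptions.

```python
import numpy as np


def synthetic_doublets(counts, labels, type_a, type_b, n_doublets, seed=0):
    """Build synthetic doublets by summing the count vectors of a randomly
    chosen type_a cell and a randomly chosen type_b cell (with replacement)."""
    rng = np.random.default_rng(seed)
    idx_a = np.flatnonzero(labels == type_a)
    idx_b = np.flatnonzero(labels == type_b)
    pick_a = rng.choice(idx_a, size=n_doublets)
    pick_b = rng.choice(idx_b, size=n_doublets)
    return counts[pick_a] + counts[pick_b]


def log2_mean_ratio(observed, synthetic, pseudocount=1.0):
    """Per-gene log2 ratio of mean counts, observed putative doublets vs.
    synthetic doublets; positive values suggest up-regulation in the observed."""
    return np.log2((observed.mean(axis=0) + pseudocount)
                   / (synthetic.mean(axis=0) + pseudocount))


# Toy data: 2 genes; T cells express gene 0, B cells express gene 1.
counts = np.array([[1, 0], [3, 0], [0, 2], [0, 4]])
labels = np.array(["T", "T", "B", "B"])
synth = synthetic_doublets(counts, labels, "T", "B", n_doublets=50)
observed = np.array([[6, 9], [8, 7]])  # hypothetical putative T/B doublets
print(log2_mean_ratio(observed, synth))
```

Matching composition matters: comparing a T/B doublet against T/B synthetic doublets, rather than against singlets, separates interaction-associated expression from the simple sum of two cell profiles.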

6.
Bioinform Adv ; 3(1): vbad073, 2023.
Article in English | MEDLINE | ID: mdl-37359727

ABSTRACT

Summary: The rapid development of single-cell transcriptomics has revolutionized the study of complex tissues. Single-cell RNA-sequencing (scRNA-seq) can profile tens of thousands of dissociated cells from a tissue sample, enabling researchers to identify cell types, phenotypes and interactions that control tissue structure and function. A key requirement of these applications is the accurate estimation of cell surface protein abundance. Although technologies to directly quantify surface proteins are available, these data are uncommon and limited to proteins with available antibodies. While supervised methods that are trained on Cellular Indexing of Transcriptomes and Epitopes by Sequencing data can provide the best performance, these training data are limited by available antibodies and may not exist for the tissue under investigation. In the absence of protein measurements, researchers must estimate receptor abundance from scRNA-seq data. Therefore, we developed a new unsupervised method for receptor abundance estimation using scRNA-seq data called SPECK (Surface Protein abundance Estimation using CKmeans-based clustered thresholding) and primarily evaluated its performance against unsupervised approaches for at least 25 human receptors and multiple tissue types. This analysis reveals that techniques based on a thresholded reduced rank reconstruction of scRNA-seq data are effective for receptor abundance estimation, with SPECK providing the best overall performance. Availability and implementation: SPECK is freely available at https://CRAN.R-project.org/package=SPECK. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
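The two ingredients named above, a reduced rank reconstruction followed by clustering-based thresholding, can be sketched in a few lines. This is a hypothetical simplification, not the SPECK package. It uses a truncated SVD for the reconstruction. For the thresholding it uses an exact one-dimensional k-means with two clusters, solved by trying every split of the sorted values; Ckmeans generalizes this exact 1-D clustering to more clusters via dynamic programming. The rank `k`, the two-cluster choice, and the function names are assumptions.

```python
import numpy as np


def low_rank_reconstruct(X, k):
    # Rank-k reconstruction of a cells x genes matrix via truncated SVD.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


def optimal_two_means_threshold(values):
    """Exact 1-D k-means with k = 2: try every split of the sorted values and
    keep the one with the smallest within-cluster sum of squares.
    Returns the smallest value assigned to the upper cluster."""
    x = np.sort(values)
    best_cost, best_split = np.inf, 1
    for i in range(1, len(x)):
        left, right = x[:i], x[i:]
        cost = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if cost < best_cost:
            best_cost, best_split = cost, i
    return x[best_split]


def thresholded_abundance(X, k=2):
    """Hypothetical SPECK-like pipeline: reconstruct, then set each gene's
    values in its lower 1-D cluster to zero."""
    X_hat = low_rank_reconstruct(X, k)
    out = np.zeros_like(X_hat)
    for j in range(X_hat.shape[1]):
        col = X_hat[:, j]
        cut = optimal_two_means_threshold(col)
        out[:, j] = np.where(col >= cut, col, 0.0)
    return out


# Toy data: the first 30 cells express 5 receptor genes, the last 30 do not.
rng = np.random.default_rng(0)
z = np.r_[np.ones(30), np.zeros(30)]
X = np.outer(z, [4.0, 3.0, 5.0, 2.0, 6.0]) + 0.1 * rng.standard_normal((60, 5))
out = thresholded_abundance(X, k=2)
print(np.round(out[[0, 59]], 2))
```

The exhaustive split here is quadratic in the number of cells, which is fine for a toy; prefix sums would make it linear.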

7.
bioRxiv ; 2023 Apr 20.
Article in English | MEDLINE | ID: mdl-37066315

ABSTRACT

We have developed a new, and analytically novel, single sample gene set testing method called Reconstruction Set Test (RESET). RESET quantifies gene set importance at both the sample-level and for the entire dataset based on the ability of set genes to reconstruct values for all measured genes. RESET addresses four important limitations of current techniques: 1) existing single sample methods are designed to detect mean differences and struggle to identify differential correlation patterns, 2) computationally efficient techniques are self-contained methods and cannot directly detect competitive scenarios where set genes differ from non-set genes in the same sample, 3) the scores generated by current methods can only be accurately compared across samples for a single set and not between sets, and 4) the computational cost of even the fastest existing methods can be significant on very large datasets. RESET is realized using a computationally efficient randomized reduced rank reconstruction algorithm (available via the RESET R package on CRAN) that can effectively detect patterns of differential abundance and differential correlation for self-contained and competitive scenarios. As demonstrated using real and simulated scRNA-seq data, RESET provides superior accuracy at a lower computational cost relative to other single sample approaches.

8.
bioRxiv ; 2023 Feb 15.
Article in English | MEDLINE | ID: mdl-36824707

ABSTRACT

Motivation: Doublets are usually considered an unwanted artifact of single-cell RNA-sequencing (scRNA-seq) and are only identified in datasets for the sake of removal. However, if cells have a juxtacrine attachment to one another in situ and maintain this association through an scRNA-seq processing pipeline that only partially dissociates the tissue, these doublets can provide meaningful biological information regarding the interactions and cell processes occurring in the analyzed tissue. This is especially true for cases such as the immune compartment of the tumor microenvironment, where the frequency and type of immune cell juxtacrine interactions can be a prognostic indicator. Results: We developed Cell type-specific Interaction Analysis using Doublets in scRNA-seq (CIcADA) as a pipeline for identifying and analyzing biological doublets in scRNA-seq data. CIcADA identifies putative doublets using multi-label cell type scores and characterizes interaction dynamics through a comparison against synthetic doublets of the same cell type composition. In performing CIcADA on several scRNA-seq tumor datasets, we found that the identified doublets were consistently upregulating expression of immune response genes. Contact: Courtney.T.Schiebout.GR@Dartmouth.edu , Hildreth.R.Frost@Dartmouth.edu.

9.
Physiotherapy ; 117: 81-88, 2022 12.
Article in English | MEDLINE | ID: mdl-36244276

ABSTRACT

OBJECTIVE: To explore the acceptability, barriers and enablers of NICE guidelines for osteoarthritis in the Scottish primary care setting using the Joint Implementation of Guidelines for Osteoarthritis in Western Europe (JIGSAW-E) model and investigate the role of Advanced Physiotherapy Practitioners (APPs) in providing evidence-based care. DESIGN: A qualitative case study comprising semi-structured interviews followed by a workshop with participants. SETTING: 10 Scottish primary care practices. PARTICIPANTS: Six general practitioners (GPs) and eight APPs were interviewed. Twenty-three practitioners attended the workshop, including 22 physiotherapists and one GP. RESULTS: While both GPs and APPs recognised the need to improve and standardise osteoarthritis care delivery, this study found that APPs were better situated to implement the evidence-based model. Barriers to implementation included lack of time for training, limited appointment time for GPs to consult and discuss medication use with patients, limitations of disease-specific guidelines for patients with complex multimorbidity, and system-based barriers such as electronic data collection and high staff turnover. The key enabler was practitioners' motivation to provide optimal, standardised quality care for osteoarthritis. To increase acceptance, ownership and usability for both practitioners and patients, the JIGSAW-E model materials required adaptation to the local context. CONCLUSION: This study provides evidence that the JIGSAW-E model is acceptable in Scottish primary care. Furthermore, the evolving roles of GPs and APPs within multidisciplinary primary care teams provide a platform to implement the JIGSAW-E model, where APPs are well placed to provide leadership and training in the delivery of evidence-based care for osteoarthritis.


Subject(s)
General Practitioners , Osteoarthritis , Physical Therapists , Humans , Qualitative Research , Osteoarthritis/therapy , Primary Health Care , Scotland
10.
Bioinformatics ; 38(23): 5206-5213, 2022 11 30.
Article in English | MEDLINE | ID: mdl-36214642

ABSTRACT

MOTIVATION: Cell typing is a critical task in the analysis of single-cell data, particularly when studying complex diseased tissues. Unfortunately, the sparsity and noise of single-cell data make accurate cell typing of individual cells difficult. To address these challenges, we previously developed the CAMML method for multi-label cell typing of single-cell RNA-sequencing (scRNA-seq) data. CAMML uses weighted gene sets to score each profiled cell for multiple potential cell types. While CAMML outperforms other scRNA-seq cell typing techniques, it leverages only transcriptomic data and therefore cannot take advantage of newer multi-omic single-cell assays that jointly profile gene expression and protein abundance (e.g. joint scRNA-seq/CITE-seq). RESULTS: We developed the CAMML with the Integration of Marker Proteins (ChIMP) method to support multi-label cell typing of individual cells jointly profiled via scRNA-seq and CITE-seq. ChIMP combines cell type scores computed on scRNA-seq data via the CAMML approach with discretized CITE-seq measurements for cell type marker proteins. The multi-omic cell type scores generated by ChIMP allow researchers to more precisely and conservatively cell type joint scRNA-seq/CITE-seq data. AVAILABILITY AND IMPLEMENTATION: An implementation of this work is available on CRAN at https://cran.r-project.org/web/packages/CAMML/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression Profiling , Single-Cell Analysis , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Gene Expression Profiling/methods , Software , Transcriptome
11.
J Comput Graph Stat ; 31(2): 486-501, 2022.
Article in English | MEDLINE | ID: mdl-35693984

ABSTRACT

We present a novel technique for sparse principal component analysis. This method, named Eigenvectors from Eigenvalues Sparse Principal Component Analysis (EESPCA), is based on the formula for computing squared eigenvector loadings of a Hermitian matrix from the eigenvalues of the full matrix and associated sub-matrices. We explore two versions of the EESPCA method: a version that uses a fixed threshold for inducing sparsity and a version that selects the threshold via cross-validation. Relative to the state-of-the-art sparse PCA methods of Witten et al., Yuan & Zhang and Tan et al., the fixed threshold EESPCA technique offers an order-of-magnitude improvement in computational speed, does not require estimation of tuning parameters via cross-validation, and can more accurately identify true zero principal component loadings across a range of data matrix sizes and covariance structures. Importantly, the EESPCA method achieves these benefits while maintaining out-of-sample reconstruction error and PC estimation error close to the lowest error generated by all evaluated approaches. EESPCA is a practical and effective technique for sparse PCA with particular relevance to computationally demanding statistical problems such as the analysis of high-dimensional data sets or application of statistical techniques like resampling that involve the repeated calculation of sparse PCs.
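The formula the abstract builds on is the eigenvector-eigenvalue identity for a symmetric (Hermitian) n x n matrix A with eigenvalues lambda_1, ..., lambda_n and eigenvectors v_1, ..., v_n: |v_i[j]|^2 * prod_{k != i} (lambda_i(A) - lambda_k(A)) = prod_{k=1}^{n-1} (lambda_i(A) - lambda_k(M_j)), where M_j is A with row j and column j removed. The sketch below computes squared first-PC loadings of a covariance matrix from eigenvalues alone and is checked against a direct eigendecomposition. It demonstrates only the identity. The EESPCA thresholding rule and its computational shortcuts are not reproduced, and the function name is hypothetical.

```python
import numpy as np


def squared_loadings_from_eigenvalues(A, i):
    """Squared entries of the i-th eigenvector of a symmetric matrix A, using
    only eigenvalues of A and of its principal sub-matrices (the
    eigenvector-eigenvalue identity). Assumes A has distinct eigenvalues;
    i indexes eigenvalues in ascending order, as returned by eigvalsh."""
    lam = np.linalg.eigvalsh(A)
    n = len(lam)
    denom = np.prod(lam[i] - np.delete(lam, i))
    sq = np.empty(n)
    for j in range(n):
        # Principal sub-matrix: drop row j and column j.
        M_j = np.delete(np.delete(A, j, axis=0), j, axis=1)
        sq[j] = np.prod(lam[i] - np.linalg.eigvalsh(M_j)) / denom
    return sq


rng = np.random.default_rng(0)
data = rng.standard_normal((50, 5))
cov = np.cov(data, rowvar=False)
pc1_sq = squared_loadings_from_eigenvalues(cov, i=cov.shape[0] - 1)  # largest eigenvalue
print(np.round(pc1_sq, 4))
```

The identity yields magnitudes but not signs. A sparse PCA method built on it therefore needs a separate step to decide which loadings to zero and how to sign the rest.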

12.
Cancer Immunol Res ; 10(8): 962-977, 2022 08 03.
Article in English | MEDLINE | ID: mdl-35696724

ABSTRACT

Chimeric-antigen receptor (CAR) T-cell therapy has shown remarkable efficacy against hematologic tumors. Yet, CAR T-cell therapy has had little success against solid tumors due to obstacles presented by the tumor microenvironment (TME) of these cancers. Here, we show that CAR T cells armored with the engineered IL-2 superkine Super2 and IL-33 were able to promote tumor control as a single-agent therapy. IFNγ and perforin were dispensable for the effects of Super2- and IL-33-armored CAR T cells. Super2 and IL-33 synergized to shift leukocyte proportions in the TME and to recruit and activate a broad repertoire of endogenous innate and adaptive immune cells including tumor-specific T cells. However, depletion of CD8+ T cells or NK cells did not disrupt tumor control, suggesting that broad immune activation compensated for loss of individual cell subsets. Thus, we have shown that Super2 and IL-33 CAR T cells can promote antitumor immunity in multiple solid tumor models and can potentially overcome antigen loss, highlighting the potential of this universal CAR T-cell platform for the treatment of solid tumors.


Subject(s)
Neoplasms , Tumor Microenvironment , Humans , Immunotherapy, Adoptive , Interleukin-2 , Interleukin-33
13.
NAR Genom Bioinform ; 4(2): lqac035, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35651651

ABSTRACT

Significant advances in RNA sequencing have been recently made possible by using oligo(dT) primers for simultaneous mRNA enrichment and reverse transcription priming. The associated increase in efficiency has enabled more economical bulk RNA sequencing methods and the advent of high-throughput single-cell RNA sequencing, already one of the most widely adopted methods in transcriptomics. However, the effects of off-target oligo(dT) priming on gene expression quantification have not been appreciated. In the present study, we describe the extent, the possible causes, and the consequences of internal oligo(dT) priming across multiple public datasets obtained from various bulk and single-cell RNA sequencing platforms. To explore and address this issue, we developed a computational algorithm for RNA counting methods, which identifies the sequencing read alignments that likely resulted from internal oligo(dT) priming and removes them from the data. Directly comparing filtered datasets to those obtained by an alternative method reveals significant improvements in gene expression measurement. Finally, we infer a list of human genes whose expression quantification is most likely to be affected by internal oligo(dT) priming and predict that when measured using these methods, the expression of most genes may be inflated by at least 10% whereby some genes are affected more than others.

14.
PLoS Comput Biol ; 18(5): e1010091, 2022 05.
Article in English | MEDLINE | ID: mdl-35584140

ABSTRACT

Research in human-associated microbiomes often involves the analysis of taxonomic count tables generated via high-throughput sequencing. It is difficult to apply statistical tools as the data is high-dimensional, sparse, and compositional. An approachable way to alleviate high-dimensionality and sparsity is to aggregate variables into pre-defined sets. Set-based analysis is ubiquitous in the genomics literature and has demonstrable impact on improving interpretability and power of downstream analysis. Unfortunately, there is a lack of sophisticated set-based analysis methods specific to microbiome taxonomic data, where current practice often employs abundance summation as a technique for aggregation. This approach prevents comparison across sets of different sizes, does not preserve inter-sample distances, and amplifies protocol bias. Here, we attempt to fill this gap with a new single-sample taxon enrichment method that uses a novel log-ratio formulation based on the competitive null hypothesis commonly used in the enrichment analysis literature. Our approach, titled competitive balances for taxonomic enrichment analysis (CBEA), generates sample-specific enrichment scores as the scaled log-ratio of the subcomposition defined by taxa within a set and the subcomposition defined by its complement. We provide sample-level significance testing by estimating an empirical null distribution of our test statistic with valid p-values. Herein, we demonstrate, using both real data applications and simulations, that CBEA controls for type I error, even under high sparsity and high inter-taxa correlation scenarios. Additionally, CBEA provides informative scores that can be inputs to downstream analyses such as prediction tasks.


Subject(s)
Microbiota , Genomics/methods , High-Throughput Nucleotide Sequencing , Humans , Microbiota/genetics
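A scaled log-ratio between the subcomposition of a taxon set and that of its complement has the familiar balance form: sqrt(r*s/(r+s)) * ln(g(x_set) / g(x_complement)), where g is the geometric mean and r and s are the numbers of taxa inside and outside the set. The sketch below computes this per sample. It is an illustration under assumptions, not the CBEA package: the pseudocount used for zero counts is a hypothetical choice, and the empirical null and p-value computation described in the abstract are not reproduced.

```python
import numpy as np


def balance_scores(counts, set_mask, pseudocount=1.0):
    """Balance-style enrichment score per sample (rows = samples, cols = taxa):
    sqrt(r*s/(r+s)) * ln(gmean(set taxa) / gmean(complement taxa)).
    The pseudocount for zero counts is an assumption of this sketch."""
    logx = np.log(counts + pseudocount)
    r = int(set_mask.sum())
    s = counts.shape[1] - r
    mean_in = logx[:, set_mask].mean(axis=1)    # log geometric mean, set taxa
    mean_out = logx[:, ~set_mask].mean(axis=1)  # log geometric mean, complement
    return np.sqrt(r * s / (r + s)) * (mean_in - mean_out)


counts = np.array([[10.0, 10.0, 1.0, 1.0],
                   [1.0, 1.0, 10.0, 10.0]])
set_mask = np.array([True, True, False, False])
print(balance_scores(counts, set_mask))
```

Unlike summed abundances, the log-ratio depends only on relative proportions. With no pseudocount, multiplying a sample's counts by any constant (for example, a different sequencing depth) leaves its score unchanged.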
15.
Environ Sci Process Impacts ; 24(4): 504-524, 2022 Apr 21.
Article in English | MEDLINE | ID: mdl-35348562

ABSTRACT

The laundering of synthetic fabrics has been identified as an important and diffuse source of microplastic (<5 mm) fibre contamination to wastewater systems. Home laundering can release up to 13 million fibres per kg of fabric, which end up in wastewater treatment plants. During treatment, 72-99% of microplastics are retained in the residual sewage sludge, which can contain upwards of 56 000 microplastics per kg. Sewage sludge is commonly disposed of by application to agricultural land as a soil amendment. In some European countries, application rates are up to 91%, representing an important pathway for microplastics to enter the terrestrial environment, which urgently requires quantification. Sewage sludge also often contains elevated concentrations of metals and metalloids, and some studies have quantified metal(loid) sorption onto various microplastics. The sorption of metals and metalloids is strongly influenced by the chemical properties of the sorbate, the solution chemistry, and the physicochemical properties of the microplastics themselves. Plastic-water partition coefficients for the sorption of cadmium, mercury and lead onto microplastics are up to 8, 32, and 217 mL g⁻¹, respectively. Sorptive capacities of microplastics may increase over time, due to environmental degradation processes increasing the specific surface area and surface density of oxygen-containing functional groups. A range of metal(loid)s, including cadmium, chromium, and zinc, have been shown to readily desorb from microplastics under acidic conditions. Sorbed metal(loid)s may therefore become more bioavailable to soil organisms when the microplastics are ingested, due to the acidic gut conditions facilitating desorption. Polyester (polyethylene terephthalate) should be of particular focus for future research, as few quantitative sorption studies currently exist, it is potentially overlooked in density separation studies due to its high density, and it is by far the most widely used fibre in apparel textiles production.


Subject(s)
Metalloids , Water Pollutants, Chemical , Cadmium , Metals , Microplastics , Plastics , Sewage/chemistry , Soil , Water Pollutants, Chemical/analysis
16.
Pac Symp Biocomput ; 27: 199-210, 2022.
Article in English | MEDLINE | ID: mdl-34890149

ABSTRACT

Inferring the cell types in single-cell RNA-sequencing (scRNA-seq) data is of particular importance for understanding the potential cellular mechanisms and phenotypes occurring in complex tissues, such as the tumor-immune microenvironment (TME). The sparsity and noise of scRNA-seq data, combined with the fact that immune cell types often occur on a continuum, make cell typing of TME scRNA-seq data a significant challenge. Several single-label cell typing methods have been put forth to address the limitations of noise and sparsity, but accounting for the often overlapped spectrum of cell types in the immune TME remains an obstacle. To address this, we developed a new scRNA-seq cell-typing method, Cell-typing using variance Adjusted Mahalanobis distances with Multi-Labeling (CAMML). CAMML leverages cell type-specific weighted gene sets to score every cell in a dataset for every potential cell type. This allows cells to be labelled either by their highest scoring cell type as a single label classification or based on a score cut-off to give multi-label classification. For single-label cell typing, CAMML performance is comparable to existing cell typing methods, SingleR and Garnett. For scenarios where cells may exhibit features of multiple cell types (e.g., undifferentiated cells), the multi-label classification supported by CAMML offers important benefits relative to the current state-of-the-art methods. By integrating data across studies, omics platforms, and species, CAMML serves as a robust and adaptable method for overcoming the challenges of scRNA-seq analysis.


Subject(s)
Computational Biology , Single-Cell Analysis , RNA/genetics , Sequence Analysis, RNA , Exome Sequencing
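The difference between single-label and multi-label classification from per-cell, per-type scores can be shown with a toy example. This is a simplified stand-in, not CAMML itself. Each cell type's score is a weighted mean of per-gene z-scores, whereas CAMML uses variance-adjusted Mahalanobis scoring. The gene-set format, the cut-off value, and the function names are hypothetical. The sketch assumes every gene has non-zero variance across cells.

```python
import numpy as np


def cell_type_scores(expr, gene_sets):
    """expr: cells x genes; gene_sets: {cell_type: {gene_index: weight}}.
    Simplified weighted score: weighted mean of per-gene z-scores
    (a stand-in for CAMML's variance-adjusted Mahalanobis scoring)."""
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0)
    types = list(gene_sets)
    scores = np.empty((expr.shape[0], len(types)))
    for t, ct in enumerate(types):
        idx = np.array(list(gene_sets[ct].keys()))
        w = np.array(list(gene_sets[ct].values()), dtype=float)
        scores[:, t] = z[:, idx] @ w / w.sum()
    return types, scores


def single_label(types, scores):
    # Highest-scoring cell type per cell (ties go to the first type).
    return [types[k] for k in np.argmax(scores, axis=1)]


def multi_label(types, scores, cutoff):
    # Every cell type whose score exceeds the cut-off (possibly none).
    return [[types[k] for k in np.flatnonzero(row > cutoff)] for row in scores]


# Toy data: cell 0 looks like T, cell 1 like B, cell 2 like both, cell 3 like neither.
expr = np.array([[5.0, 5.0, 0.0, 0.0],
                 [0.0, 0.0, 5.0, 5.0],
                 [5.0, 5.0, 5.0, 5.0],
                 [0.0, 0.0, 0.0, 0.0]])
gene_sets = {"T": {0: 1.0, 1: 1.0}, "B": {2: 1.0, 3: 1.0}}
types, scores = cell_type_scores(expr, gene_sets)
print(single_label(types, scores), multi_label(types, scores, cutoff=0.0))
```

Cell 2 shows why multi-labeling helps: a single label forces an arbitrary choice between two equally strong signatures, while the cut-off rule reports both.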
17.
BMC Cancer ; 21(1): 1053, 2021 Sep 25.
Article in English | MEDLINE | ID: mdl-34563154

ABSTRACT

BACKGROUND: Over the past decades, approaches for diagnosing and treating cancer have seen significant improvement. However, the variability of patient and tumor characteristics has limited progress on methods for prognosis prediction. The development of high-throughput omics technologies now provides multiple approaches for characterizing tumors. Although a large number of published studies have focused on integration of multi-omics data and use of pathway-level models for cancer prognosis prediction, there is still a knowledge gap regarding the prognostic landscape across multi-omics data for multiple cancer types using both gene-level and pathway-level predictors. METHODS: In this study, we systematically evaluated three commonly available types of omics data (gene expression, copy number variation and somatic point mutation) covering both DNA-level and RNA-level features. We evaluated the landscape of predictive performance of these three omics modalities for 33 cancer types in the TCGA using a Lasso- or Group Lasso-penalized Cox model and either gene-level or pathway-level predictors. RESULTS: We constructed the prognostic landscape using three types of omics data for 33 cancer types on both the gene and pathway levels. Based on this landscape, we found that predictive performance is cancer type dependent, and we also highlighted the cancer types and omics modalities that support the most accurate prognostic models. In general, models estimated on gene expression data provide the best predictive performance at either the gene or pathway level, and adding copy number variation or somatic point mutation data to gene expression data does not improve predictive performance, with some exceptions such as the low-grade glioma and thyroid cancer cohorts. Across multiple cancer types and omics data types, pathway-level models have better interpretative performance, higher stability and smaller model size relative to gene-level models. CONCLUSIONS: Based on this landscape and comprehensive comparison, models estimated on gene expression data provide the best predictive performance at either the gene or pathway level. Pathway-level models have better interpretative performance, higher stability and smaller model size relative to gene-level models.


Subject(s)
DNA Copy Number Variations , Gene Expression Profiling/methods , Gene Expression , Neoplasms/genetics , Point Mutation , Cohort Studies , Databases, Genetic , Humans , Neoplasms/mortality , Neoplasms/pathology , Predictive Value of Tests , Prognosis , Proportional Hazards Models
18.
BMC Microbiol ; 21(1): 238, 2021 08 28.
Article in English | MEDLINE | ID: mdl-34454437

ABSTRACT

BACKGROUND: The infant intestinal microbiome plays an important role in metabolism and immune development with impacts on lifelong health. The linkage between the taxonomic composition of the microbiome and its metabolic phenotype is undefined and complicated by redundancies in the taxon-function relationship within microbial communities. To inform a more mechanistic understanding of the relationship between the microbiome and health, we performed an integrative statistical and machine learning-based analysis of microbe taxonomic structure and metabolic function in order to characterize the taxa-function relationship in early life. RESULTS: Stool samples collected from infants enrolled in the New Hampshire Birth Cohort Study (NHBCS) at approximately 6 weeks (n = 158) and 12 months (n = 282) of age were profiled using targeted and untargeted nuclear magnetic resonance (NMR) spectroscopy as well as DNA sequencing of the V4-V5 hypervariable region from the bacterial 16S rRNA gene. There was significant inter-omic concordance based on Procrustes analysis (6 weeks: p = 0.056; 12 months: p = 0.001); however, this association was no longer significant when accounting for phylogenetic relationships using the generalized UniFrac distance metric (6 weeks: p = 0.376; 12 months: p = 0.069). Sparse canonical correlation analysis showed significant correlation and identified sets of microbes and metabolites driving microbiome-metabolome relatedness. Performance of machine learning models varied across metabolites, with support vector machines (radial basis function kernel) consistently ranking as the top model. However, predictive R² values demonstrated poor predictive performance across all models assessed (average: -5.06% at 6 weeks; -3.7% at 12 months). Conversely, the Spearman correlation metric was higher (average: 0.344 at 6 weeks; 0.265 at 12 months). This demonstrated that taxonomic relative abundance was not predictive of metabolite concentrations. CONCLUSIONS: Our results suggest a degree of overall association between taxonomic profiles and metabolite concentrations. However, the lack of predictive capacity for stool metabolic signatures reflects, in part, the possible role of functional redundancy in defining the taxa-function relationship in early life as well as the bidirectional nature of the microbiome-metabolome association. Our results provide evidence in favor of a multi-omic approach for microbiome studies, especially those focused on health outcomes.


Subject(s)
Bacteria/genetics , Feces/microbiology , Gastrointestinal Microbiome/genetics , Gastrointestinal Microbiome/physiology , Metabolome , Bacteria/classification , Bacteria/isolation & purification , Birth Cohort , Female , Humans , Infant , Machine Learning , Male , Phylogeny , RNA, Ribosomal, 16S/genetics , Sequence Analysis, DNA
19.
PLoS Comput Biol ; 17(6): e1009085, 2021 06.
Article in English | MEDLINE | ID: mdl-34143767

ABSTRACT

The genetic alterations that underlie cancer development are highly tissue-specific, with the majority of driving alterations occurring in only a few cancer types and with alterations common to multiple cancer types often showing a tissue-specific functional impact. This tissue-specificity means that the biology of normal tissues carries important information regarding the pathophysiology of the associated cancers, information that can be leveraged to improve the power and accuracy of cancer genomic analyses. Research exploring the use of normal tissue data for the analysis of cancer genomics has primarily focused on the paired analysis of tumor and adjacent normal samples. Efforts to leverage the general characteristics of normal tissue for cancer analysis have received less attention, with most investigations focusing on understanding the tissue-specific factors that lead to individual genomic alterations or dysregulated pathways within a single cancer type. To address this gap and support scenarios where adjacent normal tissue samples are not available, we explored the genome-wide association between the transcriptomes of 21 solid human cancers and their associated normal tissues as profiled in healthy individuals. While the average gene expression profiles of normal and cancerous tissue may appear distinct, with normal tissues more similar to other normal tissues than to the associated cancer types, transforming them into relative expression values, i.e., the ratio of expression in one tissue or cancer relative to the mean in other tissues or cancers, reveals the close association between gene activity in normal tissues and related cancers. As we demonstrate through an analysis of tumor data from The Cancer Genome Atlas and normal tissue data from the Human Protein Atlas, this association between tissue-specific and cancer-specific expression values can be leveraged to improve the prognostic modeling of cancer, the comparative analysis of different cancer types, and the analysis of cancer and normal tissue pairs.


Subject(s)
Neoplasms/genetics , Computational Biology , Databases, Genetic/statistics & numerical data , Female , Gene Expression , Gene Expression Profiling/statistics & numerical data , Humans , Male , Organ Specificity/genetics , Principal Component Analysis , RNA-Seq , Reference Values , Survival Analysis
20.
Oncoimmunology ; 10(1): 1862529, 2021 03 09.
Article in English | MEDLINE | ID: mdl-33763292

ABSTRACT

A substantial fraction of patients with stage I-III colorectal adenocarcinoma (CRC) experience disease relapse after surgery with curative intent. However, biomarkers for predicting the likelihood of CRC relapse have not been fully explored. Therefore, we assessed the association between tumor infiltration by a broad array of innate and adaptive immune cell types and CRC relapse risk. We implemented a discovery-validation design including a discovery dataset from Moffitt Cancer Center (MCC; Tampa, FL) and three independent validation datasets: (1) GSE41258, (2) the Molecular Epidemiology of Colorectal Cancer (MECC) study, and (3) GSE39582. Infiltration by 22 immune cell types was inferred from tumor gene expression data, and the association between infiltration by each cell type and relapse-free survival was assessed using Cox proportional hazards regression. Within each of the four independent cohorts, infiltration by CD4+ memory activated T cells (HR: 0.93, 95% CI: 0.90-0.96; FDR = 0.0001) was associated with longer time to disease relapse, independent of stage, microsatellite instability, and adjuvant therapy. Based on our meta-analysis across the four datasets, 10 innate and adaptive immune cell types were associated with disease relapse, of which 2 were internally validated using multiplex immunofluorescence. Moreover, immune cell type infiltration was a better predictor of disease relapse than Consensus Molecular Subtype (CMS) and other expression-based biomarkers (Immune AIC in MCC: 238.1-238.9; CMS AIC in MCC: 241.0). These data suggest that transcriptome-derived immune profiles are prognostic indicators of CRC relapse and that quantification of both innate and adaptive immune cell types may serve as candidate biomarkers for predicting prognosis and guiding the frequency and modality of disease surveillance.


Subject(s)
Colorectal Neoplasms , Transcriptome , Colorectal Neoplasms/genetics , Humans , Microsatellite Instability , Prognosis , Recurrence
...