ABSTRACT
The myriad microorganisms that live in close association with humans have diverse effects on physiology, yet the molecular bases for these impacts remain mostly unknown [1-3]. Classical pathogens often invade host tissues and modulate immune responses through interactions with human extracellular and secreted proteins (the 'exoproteome'). Commensal microorganisms may also facilitate niche colonization and shape host biology by engaging host exoproteins; however, direct exoproteome-microbiota interactions remain largely unexplored. Here we developed and validated a novel technology, BASEHIT, that enables proteome-scale assessment of human exoproteome-microbiome interactions. Using BASEHIT, we interrogated more than 1.7 million potential interactions between 519 human-associated bacterial strains from diverse phylogenies and tissues of origin and 3,324 human exoproteins. The resulting interactome revealed an extensive network of transkingdom connectivity consisting of thousands of previously undescribed host-microorganism interactions involving 383 strains and 651 host proteins. Specific binding patterns within this network implied underlying biological logic; for example, conspecific strains exhibited shared exoprotein-binding patterns, and individual tissue isolates uniquely bound tissue-specific exoproteins. Furthermore, we observed dozens of unique and often strain-specific interactions with potential roles in niche colonization, tissue remodelling and immunomodulation, and found that strains with differing host interaction profiles had divergent interactions with host cells in vitro and effects on the host immune system in vivo. Overall, these studies expose a previously unexplored landscape of molecular-level host-microbiota interactions that may underlie causal effects of indigenous microorganisms on human health and disease.
Subject(s)
Bacteria , Host Microbial Interactions , Microbiota , Phylogeny , Proteome , Symbiosis , Animals , Female , Humans , Mice , Bacteria/classification , Bacteria/immunology , Bacteria/metabolism , Bacteria/pathogenicity , Host Microbial Interactions/immunology , Host Microbial Interactions/physiology , Host Tropism , Microbiota/immunology , Microbiota/physiology , Organ Specificity , Protein Binding , Proteome/immunology , Proteome/metabolism , Reproducibility of Results
ABSTRACT
MOTIVATION: Modern biological screens yield enormous numbers of measurements, and identifying and interpreting statistically significant associations among features are essential. In experiments featuring multiple high-dimensional datasets collected from the same set of samples, it is useful to identify groups of associated features between the datasets in a way that provides high statistical power and false discovery rate (FDR) control. RESULTS: Here, we present a novel hierarchical framework, HAllA (Hierarchical All-against-All association testing), for structured association discovery between paired high-dimensional datasets. HAllA efficiently integrates hierarchical hypothesis testing with FDR correction to reveal significant linear and non-linear block-wise relationships among continuous and/or categorical data. We optimized and evaluated HAllA using heterogeneous synthetic datasets of known association structure, where HAllA outperformed all-against-all and other block-testing approaches across a range of common similarity measures. We then applied HAllA to a series of real-world multiomics datasets, revealing new associations between gene expression and host immune activity, the microbiome and host transcriptome, metabolomic profiling and human health phenotypes. AVAILABILITY AND IMPLEMENTATION: An open-source implementation of HAllA is freely available at http://huttenhower.sph.harvard.edu/halla along with documentation, demo datasets and a user group. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
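For orientation, the all-against-all baseline that HAllA is compared with can be pictured as testing every feature pair and then applying a false discovery rate correction to the resulting p-values. The sketch below shows only the standard Benjamini-Hochberg step-up correction on a flat list of p-values. It is not HAllA's hierarchical procedure, and the function name and example p-values are hypothetical, chosen purely for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, in the input order, that is True where the
    corresponding hypothesis is rejected at FDR level `alpha`.
    (Hypothetical helper for illustration; not part of HAllA.)
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank

    # Reject every hypothesis whose rank is at most max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Made-up p-values from four pairwise association tests.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

Because the procedure is step-up, a p-value that fails its own rank threshold can still be rejected when a larger p-value passes a later threshold.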
Subject(s)
Microbiota , Transcriptome
ABSTRACT
CD36 is a platelet membrane glycoprotein whose engagement with oxidized low-density lipoprotein (oxLDL) results in platelet activation. The CD36 gene has been associated with platelet count, platelet volume, lipid levels and CVD risk by genome-wide association studies. Platelet CD36 expression levels have been shown to be associated with both the platelet oxLDL response and an elevated risk of thrombo-embolism. Several genomic variants have been identified as associated with platelet CD36 levels; however, none have been conclusively demonstrated to be causative. We screened 81 expression quantitative trait loci (eQTL) single nucleotide polymorphisms (SNPs) associated with platelet CD36 expression by a Massively Parallel Reporter Assay (MPRA) and analyzed the results with a novel Bayesian statistical method. Ten eQTLs, located 13 kb to 55 kb upstream of the CD36 transcriptional start site of transcript ENST00000309881 and 49 kb to 92 kb upstream of transcript ENST00000447544, demonstrated significant transcription shifts between their minor and major alleles in the MPRA. Of these, rs2366739 and rs1194196, separated by only 20 bp, were confirmed by luciferase assay to alter transcriptional regulation. In addition, electromobility shift assays demonstrated differential DNA:protein complex formation between the two alleles of this locus. Furthermore, deletion of the genomic locus by CRISPR/Cas9 in K562 and Meg-01 cells results in upregulation of CD36 transcription. These data indicate that we have identified a variant that regulates expression of CD36, which in turn affects platelet function. To assess the clinical relevance of our findings we used the PhenoScanner tool, which aggregates large-scale GWAS findings; the results reinforce the clinical relevance of our variants and the utility of the MPRA. The study demonstrates a generalizable paradigm for functional testing of genetic variants to inform mechanistic studies, support patient management and develop precision therapies.
Subject(s)
CD36 Antigens/genetics , Cardiovascular Diseases/genetics , Single Nucleotide Polymorphism , Bayes Theorem , Cardiovascular Diseases/metabolism , Cell Line , Gene Expression Regulation , Genome-Wide Association Study , Humans , K562 Cells , LDL Lipoproteins/metabolism , Platelet Count , Quantitative Trait Loci
ABSTRACT
NGS studies have uncovered an ever-growing catalog of human variation while leaving an enormous gap between observed variation and experimental characterization of variant function. High-throughput screens powered by NGS have greatly increased the rate of variant functionalization, but the development of comprehensive statistical methods to analyze screen data has lagged. In the massively parallel reporter assay (MPRA), short barcodes are counted by sequencing DNA libraries transfected into cells and the cell's output RNA in order to simultaneously measure the shifts in transcription induced by thousands of genetic variants. These counts present many statistical challenges, including overdispersion, depth dependence, and uncertain DNA concentrations. So far, the statistical methods used have been rudimentary, employing transformations on count-level data and disregarding experimental and technical structure while failing to quantify uncertainty in the statistical model. We have developed an extensive framework for the analysis of NGS functionalization screens available as an R package called malacoda (available from github.com/andrewGhazi/malacoda). Our software implements a probabilistic, fully Bayesian model of screen data. The model uses the negative binomial distribution with gamma priors to model sequencing counts while accounting for effects from input library preparation and sequencing depth. The method leverages the high-throughput nature of the assay to estimate the priors empirically. External annotations such as ENCODE data or DeepSea predictions can also be incorporated to obtain more informative priors, a transformative capability for data integration. The package also includes quality control and utility functions, including automated barcode counting and visualization methods. To validate our method, we analyzed several datasets using malacoda and alternative MPRA analysis methods. These data include experiments from the literature, simulated assays, and primary MPRA data. We also used luciferase assays to experimentally validate several hits from our primary data, as well as variants for which the various methods disagree and variants detectable only with the aid of external annotations.
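To make the overdispersion point concrete, the sketch below evaluates a negative binomial count model in a mean/dispersion parameterization, where the variance is mu + mu^2/phi and therefore exceeds the Poisson variance mu. This is only an illustration of the distribution named in the abstract. The parameterization, the function names and the example values are assumptions, and none of it reproduces malacoda's actual model or code.

```python
import math


def nb_log_pmf(k, mu, phi):
    """Log-probability of observing count k under a negative binomial
    with mean mu > 0 and dispersion (size) phi > 0.

    Assumed parameterization for illustration only:
    success probability p = phi / (phi + mu).
    """
    return (
        math.lgamma(k + phi) - math.lgamma(phi) - math.lgamma(k + 1)
        + phi * math.log(phi / (phi + mu))
        + k * math.log(mu / (phi + mu))
    )


def nb_variance(mu, phi):
    """Variance of the same distribution: mu + mu^2 / phi."""
    return mu + mu * mu / phi


if __name__ == "__main__":
    mu, phi = 5.0, 2.0  # hypothetical barcode-count mean and dispersion
    total = sum(math.exp(nb_log_pmf(k, mu, phi)) for k in range(500))
    print(f"probabilities sum to about {total:.6f}")
    print(f"variance {nb_variance(mu, phi)} versus Poisson variance {mu}")
```

Smaller values of phi give heavier overdispersion, and as phi grows the distribution approaches a Poisson with mean mu.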
Subject(s)
Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Statistical Models , DNA Sequence Analysis/methods , Software , Bayes Theorem , Genetic Variation/genetics , Humans
ABSTRACT
Motivation: Genetic reporter assays are a convenient, relatively inexpensive method for studying the regulation of gene expression. Massively Parallel Reporter Assays (MPRA) are high-throughput functionalization assays that interrogate the transcriptional activity of many genetic variants at once using a library of synthetic barcoded constructs. Despite growing interest in this area, there are few computational tools to design and execute MPRA studies. Results: We designed an online web tool and R package that allow for interactive MPRA experimental design encompassing both power analysis and design of constructs. Our tool is tuned using data from real MPRA studies. Users can adjust experimental parameters to examine the predicted effect on assay power as well as upload VCFs for automated construct sequence generation. Availability and implementation: The MPRA Design Tools web application is available here: https://andrewghazi.shinyapps.io/designmpra/, https://github.com/andrewGhazi/designMPRA and https://github.com/andrewGhazi/mpradesigntools. Supplementary information: Supplementary data are available at Bioinformatics online.
Subject(s)
Computational Biology/methods , Gene Expression Regulation , Reporter Genes , Genetic Techniques , Software , Biological Assay/methods
ABSTRACT
The association of gut microbial features with type 2 diabetes (T2D) has been inconsistent due in part to the complexity of this disease and variation in study design. Even in cases in which individual microbial species have been associated with T2D, it has not been possible to attribute mechanisms to these associations at the level of specific microbial strains. We conducted a comprehensive study of the T2D microbiome, analyzing 8,117 shotgun metagenomes from 10 cohorts of individuals with T2D, prediabetes, and normoglycemic status in the United States, Europe, Israel and China. Dysbiosis in 19 phylogenetically diverse species was associated with T2D (false discovery rate < 0.10), for example, enriched Clostridium bolteae and depleted Butyrivibrio crossotus. These microorganisms also contributed to community-level functional changes potentially underlying T2D pathogenesis, for example, perturbations in glucose metabolism. Our study identifies within-species phylogenetic diversity for strains of 27 species that explain inter-individual differences in T2D risk, such as Eubacterium rectale. In some cases, these were explained by strain-specific gene carriage, including loci involved in various mechanisms of horizontal gene transfer and novel biological processes underlying metabolic risk, for example, quorum sensing. In summary, our study provides robust cross-cohort microbial signatures in a strain-resolved manner and offers new mechanistic insights into T2D.
Subject(s)
Type 2 Diabetes Mellitus , Gastrointestinal Microbiome , Metagenome , Phylogeny , Type 2 Diabetes Mellitus/microbiology , Type 2 Diabetes Mellitus/genetics , Humans , Gastrointestinal Microbiome/genetics , Metagenome/genetics , Cohort Studies , Male , Middle Aged , Female , China/epidemiology , Dysbiosis/microbiology , United States/epidemiology , Israel/epidemiology , Europe/epidemiology
ABSTRACT
BACKGROUND: The gut microbiome regulates host energy balance and adiposity-related metabolic consequences, but it remains unknown how the gut microbiome modulates body weight response to physical activity (PA). METHODS: Nested in the Health Professionals Follow-up Study, a subcohort of 307 healthy men (mean [SD] age, 70 [4] years) provided stool and blood samples in 2012-2013. Data from cohort long-term follow-ups and from the accelerometer, doubly labeled water, and plasma biomarker measurements during the time of stool collection were used to assess long-term and short-term associations of PA with adiposity. The gut microbiome was profiled by shotgun metagenomics and metatranscriptomics. A subcohort of 209 healthy women from the Nurses' Health Study II was used for validation. RESULTS: The microbial species Alistipes putredinis was found to modify the association between PA and body weight. Specifically, in individuals with higher abundance of A. putredinis, each 15-MET-hour/week increment in long-term PA was associated with 2.26 kg (95% CI, 1.53-2.98 kg) less weight gain from age 21 to the time of stool collection, whereas those with lower abundance of A. putredinis only had 1.01 kg (95% CI, 0.41-1.61 kg) less weight gain (P for interaction = 0.019). Consistent modification associated with A. putredinis was observed for short-term PA in relation to BMI, fat mass %, plasma HbA1c, and 6-month weight change. This modification effect might be partly attributable to four metabolic pathways encoded by A. putredinis, including folate transformation, fatty acid β-oxidation, gluconeogenesis, and stearate biosynthesis. CONCLUSIONS: A greater abundance of A. putredinis may strengthen the beneficial association of PA with body weight change, suggesting the potential of gut microbial intervention to improve the efficacy of PA in body weight management.
Subject(s)
Gastrointestinal Microbiome , Female , Humans , Male , Young Adult , Body Weight , Exercise/physiology , Follow-Up Studies , Gastrointestinal Microbiome/genetics , Obesity/metabolism , Weight Gain , Aged
ABSTRACT
Microbiology has long studied the ways in which subtle genetic differences between closely related microbial strains can have profound impacts on their phenotypes and those of their surrounding environments and communities. Despite the growth in high-throughput microbial community profiling, however, such strain-level differences remain challenging to detect. Once detected, few quantitative approaches have been well-validated for associating strain variants from microbial communities with phenotypes of interest, such as medication usage, treatment efficacy, host environment, or health. First, the term "strain" itself is not used consistently when defining a highly resolved taxonomic or genomic unit from within a microbial community. Second, computational methods for identifying such strains directly from shotgun metagenomics are difficult, with several possible reference- and assembly-based approaches available, each with different sensitivity/specificity tradeoffs. Finally, statistical challenges exist in using any of the resulting strain profiles for downstream analyses, which can include strain tracking, phylogenetic analysis, or genetic association studies. We provide an in-depth discussion of recently available computational tools that can be applied for this task, as well as statistical models and gaps in performing and interpreting any of these three main types of studies using strain-resolved shotgun metagenomic profiling of microbial communities.
Subject(s)
Metagenomics , Microbiota , Metagenome , Metagenomics/methods , Microbiota/genetics , Phylogeny
ABSTRACT
Importance: While congenital malformations and genetic diseases are a leading cause of early infant death, to our knowledge, the contribution of single-gene disorders in this group is undetermined. Objective: To determine the diagnostic yield and use of clinical exome sequencing in critically ill infants. Design, Setting, and Participants: Clinical exome sequencing was performed for 278 unrelated infants within the first 100 days of life who were admitted to Texas Children's Hospital in Houston, Texas, during a 5-year period between December 2011 and January 2017. Exome sequencing types included proband exome, trio exome, and critical trio exome, a rapid genomic assay for seriously ill infants. Main Outcomes and Measures: Indications for testing, diagnostic yield of clinical exome sequencing, turnaround time, molecular findings, patient age at diagnosis, and effect on medical management among a group of critically ill infants who were suspected to have genetic disorders. Results: The mean (SEM) age for infants participating in the study was 28.5 (1.7) days; of these, the mean (SEM) age was 29.0 (2.2) days for infants undergoing proband exome sequencing, 31.5 (3.9) days for trio exome, and 22.7 (3.9) days for critical trio exome. Clinical indications for exome sequencing included a range of medical concerns. Overall, a molecular diagnosis was achieved in 102 infants (36.7%) by clinical exome sequencing, with relatively low yield for cardiovascular abnormalities. The diagnosis affected medical management for 53 infants (52.0%) and had a substantial effect on informed redirection of care, initiation of new subspecialist care, medication/dietary modifications, and furthering life-saving procedures in select patients. Critical trio exome sequencing revealed a molecular diagnosis in 32 of 63 infants (50.8%) at a mean (SEM) of 33.1 (5.6) days of life with a mean (SEM) turnaround time of 13.0 (0.4) days. Clinical care was altered by the diagnosis in 23 of 32 patients (71.9%). The diagnostic yield, patient age at diagnosis, and medical effect in the group that underwent critical trio exome sequencing were significantly different compared with the group who underwent regular exome testing. For deceased infants (n = 81), genetic disorders were molecularly diagnosed in 39 (48.1%) by exome sequencing, with implications for recurrence risk counseling. Conclusions and Relevance: Exome sequencing is a powerful tool for the diagnostic evaluation of critically ill infants with suspected monogenic disorders in the neonatal and pediatric intensive care units and its use has a notable effect on clinical decision making.