Search | BVS Integralidade em Saúde

Leveraging gene co-regulation to identify gene sets enriched for disease heritability.

Siewert-Rocks, Katherine M; Kim, Samuel S; Yao, Douglas W; Shi, Huwenbo; Price, Alkes L.

Am J Hum Genet ; 109(3): 393-404, 2022 03 03.

Article in English | MEDLINE | ID: mdl-35108496

ABSTRACT

Identifying gene sets that are associated with disease can provide valuable biological knowledge, but a fundamental challenge of gene set analyses of GWAS data is linking disease-associated SNPs to genes. Transcriptome-wide association studies (TWASs) detect associations between the genetically predicted expression of a gene and disease risk, thus implicating candidate disease genes. However, causal disease genes at TWAS-associated loci generally remain unknown due to gene co-regulation, which leads to correlations across genes in predicted expression. We developed a method, gene co-regulation score (GCSC) regression, to identify gene sets that are enriched for disease heritability explained by predicted expression. GCSC regresses TWAS chi-square statistics on gene co-regulation scores reflecting correlations in predicted gene expression; a gene set is enriched for heritability if genes with high co-regulation to the set have higher TWAS chi-square statistics than genes with low co-regulation to the set, beyond what is expected based on co-regulation to all genes. We verified via simulations that GCSC is well calibrated and well powered. We applied GCSC to gene expression data from GTEx (48 tissues) and GWAS summary statistics for 43 independent diseases and complex traits, analyzing a broad set of biological pathways and specifically expressed gene sets. We identified many enriched sets, recapitulating known biology. For Alzheimer disease, we detected evidence of an immune basis, and specifically a role for antigen presentation, in analyses of both biological pathways and specifically expressed gene sets. Our results highlight the advantages of leveraging gene co-regulation within the TWAS framework to identify enriched gene sets.

Subjects

Genome-Wide Association Study; Quantitative Trait Loci; Genetic Predisposition to Disease; Humans; Multifactorial Inheritance; Polymorphism, Single Nucleotide/genetics; Quantitative Trait Loci/genetics; Transcriptome
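
The regression idea in the GCSC abstract above can be illustrated with a toy sketch. This is not the authors' implementation: the function names `coregulation_scores` and `gcsc_like_regression` are hypothetical, the co-regulation score is assumed here to be a sum of squared predicted-expression correlations, and the fit is plain unweighted least squares with no bias correction or standard errors. It only shows the shape of the test: a positive coefficient on co-regulation to the set, after accounting for co-regulation to all genes, is what would suggest enrichment.

```python
import numpy as np

def coregulation_scores(pred_corr, set_mask):
    """Assumed score: for each gene, the sum of squared predicted-expression
    correlations with the genes selected by set_mask."""
    r2 = np.asarray(pred_corr, dtype=float) ** 2
    return r2[:, set_mask].sum(axis=1)

def gcsc_like_regression(chi2, pred_corr, set_mask):
    """Regress TWAS chi-square statistics on co-regulation to all genes
    and co-regulation to one gene set (toy, unweighted least squares)."""
    chi2 = np.asarray(chi2, dtype=float)
    all_mask = np.ones(len(chi2), dtype=bool)
    score_all = coregulation_scores(pred_corr, all_mask)
    score_set = coregulation_scores(pred_corr, set_mask)
    # Columns: intercept, co-regulation to all genes, co-regulation to the set.
    design = np.column_stack([np.ones_like(chi2), score_all, score_set])
    coef, *_ = np.linalg.lstsq(design, chi2, rcond=None)
    return {"intercept": coef[0], "all_genes": coef[1], "gene_set": coef[2]}
```

Under these assumptions, calling `gcsc_like_regression(chi2, pred_corr, set_mask)` with a gene-by-gene correlation matrix and a boolean mask for the gene set returns the three fitted coefficients; the `gene_set` entry is the quantity of interest.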

Compressed Perturb-seq: highly efficient screens for regulatory circuits using random composite perturbations.

Yao, Douglas; Binan, Loic; Bezney, Jon; Simonton, Brooke; Freedman, Jahanara; Frangieh, Chris J; Dey, Kushal; Geiger-Schuller, Kathryn; Eraslan, Basak; Gusev, Alexander; Regev, Aviv; Cleary, Brian.

bioRxiv ; 2023 Jan 23.

Article in English | MEDLINE | ID: mdl-36747806

ABSTRACT

Pooled CRISPR screens with single-cell RNA-seq readout (Perturb-seq) have emerged as a key technique in functional genomics, but are limited in scale by cost and combinatorial complexity. Here, we reimagine Perturb-seq's design through the lens of algorithms applied to random, low-dimensional observations. We present compressed Perturb-seq, which measures multiple random perturbations per cell or multiple cells per droplet and computationally decompresses these measurements by leveraging the sparse structure of regulatory circuits. Applied to 598 genes in the immune response to bacterial lipopolysaccharide, compressed Perturb-seq achieves the same accuracy as conventional Perturb-seq at 4 to 20-fold reduced cost, with greater power to learn genetic interactions. We identify known and novel regulators of immune responses and uncover evolutionarily constrained genes with downstream targets enriched for immune disease heritability, including many missed by existing GWAS or trans-eQTL studies. Our framework enables new scales of interrogation for a foundational method in functional genomics.

Scalable genetic screening for regulatory circuits using compressed Perturb-seq.

Yao, Douglas; Binan, Loic; Bezney, Jon; Simonton, Brooke; Freedman, Jahanara; Frangieh, Chris J; Dey, Kushal; Geiger-Schuller, Kathryn; Eraslan, Basak; Gusev, Alexander; Regev, Aviv; Cleary, Brian.

Nat Biotechnol ; 2023 Oct 23.

Article in English | MEDLINE | ID: mdl-37872410

ABSTRACT

Pooled CRISPR screens with single-cell RNA sequencing readout (Perturb-seq) have emerged as a key technique in functional genomics, but they are limited in scale by cost and combinatorial complexity. In this study, we modified the design of Perturb-seq by incorporating algorithms applied to random, low-dimensional observations. Compressed Perturb-seq measures multiple random perturbations per cell or multiple cells per droplet and computationally decompresses these measurements by leveraging the sparse structure of regulatory circuits. Applied to 598 genes in the immune response to bacterial lipopolysaccharide, compressed Perturb-seq achieves the same accuracy as conventional Perturb-seq with an order of magnitude cost reduction and greater power to learn genetic interactions. We identified known and novel regulators of immune responses and uncovered evolutionarily constrained genes with downstream targets enriched for immune disease heritability, including many missed by existing genome-wide association studies. Our framework enables new scales of interrogation for a foundational method in functional genomics.
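
The "decompression" step described in the abstract above rests on a general idea from compressed sensing: if each measurement is a sum over a random subset of perturbations and the true effects are sparse, the individual effects can be recovered from fewer measurements than perturbations. The sketch below illustrates that idea with orthogonal matching pursuit, a generic greedy sparse solver swapped in for illustration; it is not the inference method used in the paper, and the names `omp_decode` and `decode_all_genes` are hypothetical. It assumes a noiseless, additive model, readout = design @ effects.

```python
import numpy as np

def omp_decode(design, readout, max_nonzero):
    """Greedy sparse recovery (orthogonal matching pursuit) for one gene.

    design:  (samples x perturbations) matrix, 1 if a perturbation is
             present in a composite sample, else 0.
    readout: expression change of one gene in each composite sample.
    """
    design = np.asarray(design, dtype=float)
    readout = np.asarray(readout, dtype=float)
    norms = np.linalg.norm(design, axis=0)
    norms[norms == 0] = 1.0  # avoid dividing by zero for unused perturbations
    residual = readout.copy()
    support, coef = [], np.zeros(0)
    for _ in range(max_nonzero):
        scores = np.abs(design.T @ residual) / norms
        if support:
            scores[support] = 0.0  # never pick the same perturbation twice
        best = int(np.argmax(scores))
        if scores[best] <= 1e-12:
            break  # residual is already explained
        support.append(best)
        # Refit all selected perturbations jointly, then update the residual.
        coef, *_ = np.linalg.lstsq(design[:, support], readout, rcond=None)
        residual = readout - design[:, support] @ coef
    effects = np.zeros(design.shape[1])
    if support:
        effects[support] = coef
    return effects

def decode_all_genes(design, readouts, max_nonzero):
    """Apply omp_decode independently to each gene (column of readouts)."""
    readouts = np.asarray(readouts, dtype=float)
    return np.column_stack(
        [omp_decode(design, readouts[:, g], max_nonzero)
         for g in range(readouts.shape[1])]
    )
```

Decoding each gene independently keeps the sketch short; a method that shares structure across genes would be expected to do better on real data, where measurements are noisy and effects are only approximately sparse.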

Quantifying genetic effects on disease mediated by assayed gene expression levels.

Yao, Douglas W; O'Connor, Luke J; Price, Alkes L; Gusev, Alexander.

Nat Genet ; 52(6): 626-633, 2020 06.

Article in English | MEDLINE | ID: mdl-32424349

ABSTRACT

Disease variants identified by genome-wide association studies (GWAS) tend to overlap with expression quantitative trait loci (eQTLs), but it remains unclear whether this overlap is driven by gene expression levels 'mediating' genetic effects on disease. Here, we introduce a new method, mediated expression score regression (MESC), to estimate disease heritability mediated by the cis genetic component of gene expression levels. We applied MESC to GWAS summary statistics for 42 traits (average N = 323,000) and cis-eQTL summary statistics for 48 tissues from the Genotype-Tissue Expression (GTEx) consortium. Averaging across traits, only 11 ± 2% of heritability was mediated by assayed gene expression levels. Expression-mediated heritability was enriched in genes with evidence of selective constraint and genes with disease-appropriate annotations. Our results demonstrate that assayed bulk tissue eQTLs, although disease relevant, cannot explain the majority of disease heritability.

Subjects

Gene Expression; Genetic Predisposition to Disease; Genome-Wide Association Study/statistics & numerical data; Quantitative Trait Loci; Calibration; Genome-Wide Association Study/methods; Humans; Linkage Disequilibrium; Polymorphism, Single Nucleotide; Regression Analysis

Profiling immunoglobulin repertoires across multiple human tissues using RNA sequencing.

Mandric, Igor; Rotman, Jeremy; Yang, Harry Taegyun; Strauli, Nicolas; Montoya, Dennis J; Van Der Wey, William; Ronas, Jiem R; Statz, Benjamin; Yao, Douglas; Petrova, Velislava; Zelikovsky, Alex; Spreafico, Roberto; Shifman, Sagiv; Zaitlen, Noah; Rossetti, Maura; Ansel, K Mark; Eskin, Eleazar; Mangul, Serghei.

Nat Commun ; 11(1): 3126, 2020 06 19.

Article in English | MEDLINE | ID: mdl-32561710

ABSTRACT

Profiling immunoglobulin (Ig) receptor repertoires with specialized assays can be cost-ineffective and time-consuming. Here we report ImReP, a computational method for rapid and accurate profiling of the Ig repertoire, including the complementarity-determining region 3 (CDR3), using regular RNA sequencing data such as those from 8,555 samples across 53 tissue types from 544 individuals in the Genotype-Tissue Expression (GTEx v6) project. Using ImReP and GTEx v6 data, we generate a collection of 3.6 million Ig sequences, termed the atlas of immunoglobulin repertoires (TAIR), across a broad range of tissue types that often do not have reported Ig repertoire information. Moreover, the flow of Ig clonotypes and inter-tissue repertoire similarities across immune-related tissues are also evaluated. In summary, TAIR is one of the largest collections of CDR3 sequences and tissue types, and should serve as an important resource for studying immunological diseases.

Subjects

Complementarity Determining Regions/genetics; Computational Biology/methods; RNA-Seq; Datasets as Topic; Feasibility Studies; Humans; Receptors, Antigen, B-Cell/genetics

Author Correction: Profiling immunoglobulin repertoires across multiple human tissues using RNA sequencing.

Nat Commun ; 11(1): 4499, 2020 09 04.

Article in English | MEDLINE | ID: mdl-32887888

ABSTRACT

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

Benchmarking of computational error-correction methods for next-generation sequencing data.

Mitchell, Keith; Brito, Jaqueline J; Mandric, Igor; Wu, Qiaozhen; Knyazev, Sergey; Chang, Sei; Martin, Lana S; Karlsberg, Aaron; Gerasimov, Ekaterina; Littman, Russell; Hill, Brian L; Wu, Nicholas C; Yang, Harry Taegyun; Hsieh, Kevin; Chen, Linus; Littman, Eli; Shabani, Taylor; Enik, German; Yao, Douglas; Sun, Ren; Schroeder, Jan; Eskin, Eleazar; Zelikovsky, Alex; Skums, Pavel; Pop, Mihai; Mangul, Serghei.

Genome Biol ; 21(1): 71, 2020 03 17.

Article in English | MEDLINE | ID: mdl-32183840

ABSTRACT

BACKGROUND: Recent advancements in next-generation sequencing have rapidly improved our ability to study genomic material at an unprecedented scale. Despite substantial improvements in sequencing technologies, errors present in the data still risk confounding downstream analysis and limiting the applicability of sequencing technologies in clinical tools. Computational error correction promises to eliminate sequencing errors, but the relative accuracy of error correction algorithms remains unknown. RESULTS: In this paper, we evaluate the ability of error correction algorithms to fix errors across different types of datasets that contain various levels of heterogeneity. We highlight the advantages and limitations of computational error correction techniques across different domains of biology, including immunogenomics and virology. To demonstrate the efficacy of our technique, we apply the UMI-based high-fidelity sequencing protocol to eliminate sequencing errors from both simulated data and the raw reads. We then perform a realistic evaluation of error-correction methods. CONCLUSIONS: In terms of accuracy, we find that method performance varies substantially across different types of datasets with no single method performing best on all types of examined data. Finally, we also identify the techniques that offer a good balance between precision and sensitivity.

Subjects

Algorithms; High-Throughput Nucleotide Sequencing; Benchmarking; Computational Biology/methods; Humans; Receptors, Antigen, T-Cell/genetics; Viruses/genetics; Whole Genome Sequencing

A linear mixed model approach to gene expression-tumor aneuploidy association studies.

Yao, Douglas W; Balanis, Nikolas G; Eskin, Eleazar; Graeber, Thomas G.

Sci Rep ; 9(1): 11944, 2019 08 16.

Article in English | MEDLINE | ID: mdl-31420589

ABSTRACT

Aneuploidy, defined as abnormal chromosome number or somatic DNA copy number, is a characteristic of many aggressive tumors and is thought to drive tumorigenesis. Gene expression-aneuploidy association studies have previously been conducted to explore cellular mechanisms associated with aneuploidy. However, in an observational setting, gene expression is influenced by many factors that can act as confounders between gene expression and aneuploidy, leading to spurious correlations between the two variables. These factors include known confounders such as sample purity or batch effect, as well as gene co-regulation which induces correlations between the expression of causal genes and non-causal genes. We use a linear mixed-effects model (LMM) to account for confounding effects of tumor purity and gene co-regulation on gene expression-aneuploidy associations. When applied to patient tumor data across diverse tumor types, we observe that the LMM both accounts for the impact of purity on aneuploidy measurements and identifies a new association between histone gene expression and aneuploidy.

Subjects

Aneuploidy; Gene Expression Regulation, Neoplastic; Histones/genetics; Neoplasm Proteins/genetics; Neoplasms/diagnosis; Neoplasms/genetics; Carcinogenesis/genetics; Carcinogenesis/metabolism; Carcinogenesis/pathology; DNA Copy Number Variations; Datasets as Topic; Genome-Wide Association Study; Genomic Instability; Histones/metabolism; Humans; Linear Models; Neoplasm Proteins/metabolism; Neoplasms/metabolism; Neoplasms/pathology
