Results 1 - 13 of 13
1.
Biostatistics ; 25(4): 1254-1272, 2024 Oct 01.
Article in English | MEDLINE | ID: mdl-38649751

ABSTRACT

CRISPR genome engineering and single-cell RNA sequencing have accelerated biological discovery. Single-cell CRISPR screens unite these two technologies, linking genetic perturbations in individual cells to changes in gene expression and illuminating regulatory networks underlying diseases. Despite their promise, single-cell CRISPR screens present considerable statistical challenges. We demonstrate through theoretical and real data analyses that a standard method for estimation and inference in single-cell CRISPR screens, "thresholded regression," exhibits attenuation bias and a bias-variance tradeoff as a function of an intrinsic, challenging-to-select tuning parameter. To overcome these difficulties, we introduce GLM-EIV ("GLM-based errors-in-variables"), a new method for single-cell CRISPR screen analysis. GLM-EIV extends the classical errors-in-variables model to responses and noisy predictors that are exponential family-distributed and potentially impacted by the same set of confounding variables. We develop a computational infrastructure to deploy GLM-EIV across hundreds of processors on clouds (e.g., Microsoft Azure) and high-performance clusters. Leveraging this infrastructure, we apply GLM-EIV to analyze two recent, large-scale, single-cell CRISPR screen datasets, yielding several new insights.
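
The attenuation bias mentioned in the abstract above can be illustrated with a toy simulation. The sketch below is not GLM-EIV and not the authors' code. It assumes a hypothetical setup in which 10% of cells are truly perturbed, gRNA counts are Poisson with mean 5 in perturbed cells and mean 1 otherwise, and expression is Gaussian with a true effect of -1.0. A cell is called perturbed when its gRNA count reaches a threshold, and the effect is estimated as the difference in mean expression between the two called groups (which equals the slope of a linear regression on the thresholded indicator). All names and parameter values are assumptions chosen for illustration.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, prod = 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return k
        k += 1


def thresholded_estimate(n=20000, beta1=-1.0, threshold=3, seed=0):
    """Difference in mean expression between cells called perturbed and not."""
    rng = random.Random(seed)
    called, not_called = [], []
    for _ in range(n):
        perturbed = rng.random() < 0.1                  # latent true perturbation
        grna = poisson(rng, 5.0 if perturbed else 1.0)  # noisy gRNA count
        expr = 2.0 + beta1 * perturbed + rng.gauss(0.0, 1.0)
        # Thresholding the noisy count misclassifies some cells in both directions.
        (called if grna >= threshold else not_called).append(expr)
    return sum(called) / len(called) - sum(not_called) / len(not_called)


estimate = thresholded_estimate()
print(f"true effect: -1.0, thresholded estimate: {estimate:.2f}")
```

Because background reads push some unperturbed cells over the threshold, the called-perturbed group is a mixture, and the estimate is pulled toward zero. Changing `threshold` trades this bias against the number of cells in the called group, which is the bias-variance tradeoff the abstract describes.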


Subjects
Single-Cell Analysis; Single-Cell Analysis/methods; Humans; Clustered Regularly Interspaced Short Palindromic Repeats/genetics; CRISPR-Cas Systems/genetics; Models, Statistical
2.
bioRxiv ; 2024 Apr 18.
Article in English | MEDLINE | ID: mdl-38659821

ABSTRACT

Single-cell CRISPR screens (perturb-seq) link genetic perturbations to phenotypic changes in individual cells. The most fundamental task in perturb-seq analysis is to test for association between a perturbation and a count outcome, such as gene expression. We conduct the first-ever comprehensive benchmarking study of association testing methods for low multiplicity-of-infection (MOI) perturb-seq data, finding that existing methods produce excess false positives. We conduct an extensive empirical investigation of the data, identifying three core analysis challenges: sparsity, confounding, and model misspecification. Finally, we develop an association testing method, SCEPTRE low-MOI, that resolves these analysis challenges and demonstrates improved calibration and power.

3.
Genome Biol ; 25(1): 124, 2024 May 17.
Article in English | MEDLINE | ID: mdl-38760839

ABSTRACT

Single-cell CRISPR screens (perturb-seq) link genetic perturbations to phenotypic changes in individual cells. The most fundamental task in perturb-seq analysis is to test for association between a perturbation and a count outcome, such as gene expression. We conduct the first-ever comprehensive benchmarking study of association testing methods for low multiplicity-of-infection (MOI) perturb-seq data, finding that existing methods produce excess false positives. We conduct an extensive empirical investigation of the data, identifying three core analysis challenges: sparsity, confounding, and model misspecification. Finally, we develop an association testing method, SCEPTRE low-MOI, that resolves these analysis challenges and demonstrates improved calibration and power.


Subjects
Single-Cell Analysis; Single-Cell Analysis/methods; Humans; CRISPR-Cas Systems; Clustered Regularly Interspaced Short Palindromic Repeats
4.
bioRxiv ; 2024 Mar 20.
Article in English | MEDLINE | ID: mdl-38562830

ABSTRACT

Over 1,100 independent signals have been identified with genome-wide association studies (GWAS) for bone mineral density (BMD), a key risk factor for mortality-increasing fragility fractures; however, the effector gene(s) for most remain unknown. Informed by a variant-to-gene mapping strategy implicating 89 non-coding elements predicted to regulate osteoblast gene expression at BMD GWAS loci, we executed a single-cell CRISPRi screen in human fetal osteoblast 1.19 cells (hFOBs). The BMD relevance of hFOBs was supported by heritability enrichment from cross-cell type stratified LD-score regression involving 98 cell types grouped into 15 tissues. Twenty-four genes showed perturbation in the screen, with four (ARID5B, CC2D1B, EIF4G2, and NCOA3) exhibiting consistent effects upon siRNA knockdown on three measures of osteoblast maturation and mineralization. Lastly, additional heritability enrichments, genetic correlations, and multi-trait fine-mapping revealed that many BMD GWAS signals are pleiotropic and likely mediate their effects via non-bone tissues that warrant attention in future screens.

5.
J Am Stat Assoc ; 118(541): 165-176, 2023.
Article in English | MEDLINE | ID: mdl-37346227

ABSTRACT

Scientific hypotheses in a variety of applications have domain-specific structures, such as the tree structure of the International Classification of Diseases (ICD), the directed acyclic graph structure of the Gene Ontology (GO), or the spatial structure in genome-wide association studies. In the context of multiple testing, the resulting relationships among hypotheses can create redundancies among rejections that hinder interpretability. This leads to the practice of filtering rejection sets obtained from multiple testing procedures, which may in turn invalidate their inferential guarantees. We propose Focused BH, a simple, flexible, and principled methodology to adjust for the application of any pre-specified filter. We prove that Focused BH controls the false discovery rate under various conditions, including when the filter satisfies an intuitive monotonicity property and the p-values are positively dependent. We demonstrate in simulations that Focused BH performs well across a variety of settings, and illustrate this method's practical utility via analyses of real datasets based on ICD and GO.
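
For orientation, the baseline that the abstract above builds on is the standard Benjamini-Hochberg (BH) step-up procedure. The sketch below implements plain BH only. It does not reproduce the filter adjustment that defines Focused BH, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, q):
    """Return the indices rejected by the BH step-up procedure at level q."""
    m = len(pvalues)
    # Sort hypothesis indices by increasing p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    n_reject = 0
    for rank, i in enumerate(order, start=1):
        # Step-up rule: keep the largest rank whose p-value is below q * rank / m.
        if pvalues[i] <= q * rank / m:
            n_reject = rank
    return sorted(order[:n_reject])


example_pvalues = [0.001, 0.008, 0.039, 0.041, 0.042,
                   0.060, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(example_pvalues, q=0.05)
print(rejected)
```

In a structured setting such as GO or ICD, an analyst would typically prune `rejected` afterwards (for example, keeping only the most specific terms). The abstract's point is that this post hoc filtering can break the FDR guarantee of plain BH unless the procedure accounts for the filter.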

6.
Science ; 380(6646): eadh7699, 2023 May 19.
Article in English | MEDLINE | ID: mdl-37141313

ABSTRACT

Most variants associated with complex traits and diseases identified by genome-wide association studies (GWAS) map to noncoding regions of the genome with unknown effects. Using ancestrally diverse, biobank-scale GWAS data, massively parallel CRISPR screens, and single-cell transcriptomic and proteomic sequencing, we discovered 124 cis-target genes of 91 noncoding blood trait GWAS loci. Using precise variant insertion through base editing, we connected specific variants with gene expression changes. We also identified trans-effect networks of noncoding loci when cis target genes encoded transcription factors or microRNAs. Networks were themselves enriched for GWAS variants and demonstrated polygenic contributions to complex traits. This platform enables massively parallel characterization of the target genes and mechanisms of human noncoding variants in both cis and trans.


Subjects
Disease; Genome-Wide Association Study; Multifactorial Inheritance; Quantitative Trait Loci; Single-Cell Analysis; Humans; Clustered Regularly Interspaced Short Palindromic Repeats; Genetic Predisposition to Disease; Polymorphism, Single Nucleotide; Proteomics; Blood Cells; RNA-Seq; Disease/genetics
7.
Biometrika ; 109(2): 277-293, 2022 Jun.
Article in English | MEDLINE | ID: mdl-37416628

ABSTRACT

We consider the problem of conditional independence testing: given a response Y and covariates (X,Z), we test the null hypothesis that Y⫫X∣Z. The conditional randomization test was recently proposed as a way to use distributional information about X∣Z to exactly and nonasymptotically control Type-I error using any test statistic in any dimensionality without assuming anything about Y∣(X,Z). This flexibility, in principle, allows one to derive powerful test statistics from complex prediction algorithms while maintaining statistical validity. Yet the direct use of such advanced test statistics in the conditional randomization test is prohibitively computationally expensive, especially with multiple testing, due to the requirement to recompute the test statistic many times on resampled data. We propose the distilled conditional randomization test, a novel approach to using state-of-the-art machine learning algorithms in the conditional randomization test while drastically reducing the number of times those algorithms need to be run, thereby taking advantage of their power and the conditional randomization test's statistical guarantees without suffering the usual computational expense. In addition to distillation, we propose a number of other tricks, like screening and recycling computations, to further speed up the conditional randomization test without sacrificing its high power and exact validity. Indeed, we show in simulations that all our proposals combined lead to a test that has similar power to the most powerful existing conditional randomization test implementations, but requires orders of magnitude less computation, making it a practical tool even for large datasets. We demonstrate these benefits on a breast cancer dataset by identifying biomarkers related to cancer stage.
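
The resampling cost described in the abstract above is easiest to see in the plain conditional randomization test, sketched below. This is the basic test, not the distilled version the paper proposes. The sketch assumes a hypothetical known model X | Z ~ N(0.5 Z, 1), and the test statistic, sample size, and effect sizes are illustrative choices.

```python
import random


def crt_pvalue(x, y, z, sample_x_given_z, statistic, n_resamples=500, seed=0):
    """Plain CRT: compare the observed statistic with statistics on resampled X."""
    rng = random.Random(seed)
    t_obs = statistic(x, y, z)
    exceed = 0
    for _ in range(n_resamples):
        # Redraw X from its (assumed known) conditional law given Z; Y is untouched.
        x_tilde = [sample_x_given_z(zi, rng) for zi in z]
        # The statistic is recomputed on every resample, which is the costly step.
        if statistic(x_tilde, y, z) >= t_obs:
            exceed += 1
    return (1 + exceed) / (1 + n_resamples)


def sample_x_given_z(zi, rng):
    """Assumed conditional model: X | Z = z is N(0.5 * z, 1)."""
    return 0.5 * zi + rng.gauss(0.0, 1.0)


def statistic(x, y, z):
    """Absolute inner product between Y and the part of X not explained by Z."""
    return abs(sum((xi - 0.5 * zi) * yi for xi, yi, zi in zip(x, y, z)))


def make_data(effect, n=300, seed=1):
    """Toy data in which Y depends on Z and, if effect != 0, also on X."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [sample_x_given_z(zi, rng) for zi in z]
    y = [zi + effect * xi + rng.gauss(0.0, 1.0) for xi, zi in zip(x, z)]
    return x, y, z


p_alt = crt_pvalue(*make_data(effect=0.8), sample_x_given_z, statistic)
p_null = crt_pvalue(*make_data(effect=0.0), sample_x_given_z, statistic)
print(f"p-value with an effect: {p_alt:.3f}; without an effect: {p_null:.3f}")
```

Here the statistic is cheap, so 500 recomputations are harmless. If `statistic` instead refit a machine learning model on each resample, the loop would dominate the runtime, which is the expense the distilled test is designed to avoid.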

8.
Genome Biol ; 22(1): 344, 2021 Dec 20.
Article in English | MEDLINE | ID: mdl-34930414

ABSTRACT

Single-cell CRISPR screens are a promising biotechnology for mapping regulatory elements to target genes at genome-wide scale. However, technical factors like sequencing depth impact not only expression measurement but also perturbation detection, creating a confounding effect. We demonstrate on two single-cell CRISPR screens how these challenges cause calibration issues. We propose SCEPTRE: analysis of single-cell perturbation screens via conditional resampling, which infers associations between perturbations and expression by resampling the former according to a working model for perturbation detection probability in each cell. SCEPTRE demonstrates very good calibration and sensitivity on CRISPR screen data, yielding hundreds of new regulatory relationships supported by orthogonal biological evidence.


Subjects
CRISPR-Cas Systems; Genome, Human; Single-Cell Analysis/methods; Calibration; Chromatin Immunoprecipitation Sequencing/methods; Clustered Regularly Interspaced Short Palindromic Repeats; Gene Editing; Gene Expression; Humans
9.
Nat Commun ; 11(1): 1799, 2020 Apr 07.
Article in English | MEDLINE | ID: mdl-32265451

ABSTRACT

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

10.
Nat Commun ; 11(1): 1093, 2020 Feb 27.
Article in English | MEDLINE | ID: mdl-32107378

ABSTRACT

In the statistical analysis of genome-wide association data, it is challenging to precisely localize the variants that affect complex traits, due to linkage disequilibrium, and to maximize power while limiting spurious findings. Here we report on KnockoffZoom: a flexible method that localizes causal variants at multiple resolutions by testing the conditional associations of genetic segments of decreasing width, while provably controlling the false discovery rate. Our method utilizes artificial genotypes as negative controls and is equally valid for quantitative and binary phenotypes, without requiring any assumptions about their genetic architectures. Instead, we rely on well-established genetic models of linkage disequilibrium. We demonstrate that our method can detect more associations than mixed effects models and achieve fine-mapping precision, at comparable computational cost. Lastly, we apply KnockoffZoom to data from 350k subjects in the UK Biobank and report many new findings.


Subjects
Genome, Human/genetics; Genome-Wide Association Study/methods; Genomics/methods; Linkage Disequilibrium; Models, Genetic; Algorithms; Chromosome Mapping/methods; Datasets as Topic; Feasibility Studies; Humans; Multifactorial Inheritance/genetics; Polymorphism, Single Nucleotide/genetics; Quantitative Trait Loci/genetics; Software
11.
Ann Appl Stat ; 13(1): 1-33, 2019 Mar.
Article in English | MEDLINE | ID: mdl-31687060

ABSTRACT

We tackle the problem of selecting from among a large number of variables those that are "important" for an outcome. We consider situations where groups of variables are also of interest. For example, each variable might be a genetic polymorphism, and we might want to study how a trait depends on variability in genes, segments of DNA that typically contain multiple such polymorphisms. In this context, to discover that a variable is relevant for the outcome implies discovering that the larger entity it represents is also important. To guarantee meaningful results with high chance of replicability, we suggest controlling the rate of false discoveries for findings at the level of individual variables and at the level of groups. Building on the knockoff construction of Barber and Candès [Ann. Statist. 43 (2015) 2055-2085] and the multilayer testing framework of Barber and Ramdas [J. Roy. Statist. Soc. Ser. B 79 (2017) 1247-1268], we introduce the multilayer knockoff filter (MKF). We prove that MKF simultaneously controls the FDR at each resolution and use simulations to show that it incurs little power loss compared to methods that provide guarantees only for the discoveries of individual variables. We apply MKF to analyze a genetic dataset and find that it successfully reduces the number of false gene discoveries without a significant reduction in power.
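
The single-resolution building block behind the abstract above is the knockoff filter's data-dependent threshold. The sketch below shows the knockoff+ selection rule of Barber and Candès for individual variables, given feature statistics W in which large positive values suggest a real signal. It is not the multilayer knockoff filter itself, and the example statistics are hypothetical.

```python
import math


def knockoff_plus_threshold(w, q):
    """Smallest t with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q."""
    candidates = sorted({abs(v) for v in w if v != 0})
    for t in candidates:
        negatives = sum(1 for v in w if v <= -t)  # stand-in for false discoveries
        positives = sum(1 for v in w if v >= t)   # variables that would be selected
        if (1 + negatives) / max(positives, 1) <= q:
            return t
    return math.inf  # no threshold meets the target level, so nothing is selected


def knockoff_select(w, q):
    """Indices of variables whose statistic is at or above the threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, v in enumerate(w) if v >= t]


example_w = [5.0, 4.0, 3.5, 3.0, -0.5, 2.5, 0.2, -0.1, 2.0, 1.5]
selected = knockoff_select(example_w, q=0.2)
print(selected)
```

The count of large negative statistics serves as an estimate of the number of false positives among the large positive ones. According to the abstract, the multilayer method controls the FDR simultaneously at the level of individual variables and at the level of groups; that coordination across layers is not shown here.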

12.
Sci Rep ; 9(1): 7793, 2019 May 24.
Article in English | MEDLINE | ID: mdl-31127124

ABSTRACT

The Gene Ontology (GO) is a central resource for functional-genomics research. Scientists rely on the functional annotations in the GO for hypothesis generation and couple them with high-throughput biological data to enhance interpretation of results. At the same time, the sheer number of concepts (>30,000) and relationships (>70,000) presents a challenge: it can be difficult to draw a comprehensive picture of how certain concepts of interest might relate to the rest of the ontology structure. Here we present new visualization strategies to facilitate the exploration and use of the information in the GO. We rely on novel graphical display and software architecture that allow significant interaction. To illustrate the potential of our strategies, we provide examples from high-throughput genomic analyses, including chromatin immunoprecipitation experiments and genome-wide association studies. The scientist can also use our visualizations to identify gene sets that likely experience coordinated changes in their expression and use them to simulate biologically grounded single-cell RNA sequencing data, or conduct power studies for differential gene expression studies using our built-in pipeline. Our software and documentation are available at http://aegis.stanford.edu.


Subjects
Gene Ontology; Genomics; Software; Animals; Chromatin Immunoprecipitation; Genome-Wide Association Study; Genomics/methods; Humans; Sequence Analysis, RNA; Software Design; User-Computer Interface
13.
Proc IEEE Int Symp Biomed Imaging ; 2015: 200-204, 2015 Apr.
Article in English | MEDLINE | ID: mdl-26682015

ABSTRACT

Classifying structural variability in noisy projections of biological macromolecules is a central problem in Cryo-EM. In this work, we build on a previous method for estimating the covariance matrix of the three-dimensional structure present in the molecules being imaged. Our proposed method allows for incorporation of contrast transfer function and non-uniform distribution of viewing angles, making it more suitable for real-world data. We evaluate its performance on a synthetic dataset and an experimental dataset obtained by imaging a 70S ribosome complex.
