ABSTRACT
CRISPR base editing screens enable analysis of disease-associated variants at scale; however, variable efficiency and precision confound the assessment of variant-induced phenotypes. Here, we provide an integrated experimental and computational pipeline that improves estimation of variant effects in base editing screens. We use a reporter construct to measure guide RNA (gRNA) editing outcomes alongside their phenotypic consequences and introduce base editor screen analysis with activity normalization (BEAN), a Bayesian network that uses per-guide editing outcomes provided by the reporter and target site chromatin accessibility to estimate variant impacts. BEAN outperforms existing tools in variant effect quantification. We use BEAN to pinpoint common regulatory variants that alter low-density lipoprotein (LDL) uptake, implicating previously unreported genes. Additionally, through saturation base editing of LDLR, we accurately quantify missense variant pathogenicity that is consistent with measurements in UK Biobank patients and identify underlying structural mechanisms. This work provides a widely applicable approach to improve the power of base editing screens for disease-associated variant characterization.
Subjects
CRISPR-Cas Systems , Gene Editing , Genotype , Phenotype , CRISPR-Cas Systems Guide RNA , Humans , Gene Editing/methods , CRISPR-Cas Systems Guide RNA/genetics , Bayes Theorem , LDL Receptors/genetics , HEK293 Cells
ABSTRACT
CRISPR base editing screens are powerful tools for studying disease-associated variants at scale. However, the efficiency and precision of base editing perturbations vary, confounding the assessment of variant-induced phenotypic effects. Here, we provide an integrated pipeline that improves the estimation of variant impact in base editing screens. We perform high-throughput ABE8e-SpRY base editing screens with an integrated reporter construct to measure the editing efficiency and outcomes of each gRNA alongside their phenotypic consequences. We introduce BEAN, a Bayesian network that accounts for per-guide editing outcomes and target site chromatin accessibility to estimate variant impacts. We show this pipeline attains superior performance compared to existing tools in variant classification and effect size quantification. We use BEAN to pinpoint common variants that alter LDL uptake, implicating novel genes. Additionally, through saturation base editing of LDLR, we enable accurate quantitative prediction of the effects of missense variants on LDL-C levels, which aligns with measurements in UK Biobank individuals, and identify structural mechanisms underlying variant pathogenicity. This work provides a widely applicable approach to improve the power of base editor screens for disease-associated variant characterization.
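The idea of normalizing a guide's phenotype by its measured editing activity can be illustrated with a toy calculation. This is not BEAN's Bayesian network: it is a minimal sketch that assumes each guide's observed phenotype shift is roughly its editing rate times the true variant effect, and fits that single effect by least squares through the origin. The function names and the numbers are hypothetical.

```python
from typing import Sequence


def naive_effect(shifts: Sequence[float]) -> float:
    """Average phenotype shift across guides, ignoring editing efficiency."""
    return sum(shifts) / len(shifts)


def activity_normalized_effect(
    shifts: Sequence[float], editing_rates: Sequence[float]
) -> float:
    """Least-squares slope through the origin of shift versus editing rate.

    Toy assumption (not taken from the abstract): shift_i ~= rate_i * effect,
    so guides that edit more efficiently carry more weight in the estimate.
    """
    if len(shifts) != len(editing_rates):
        raise ValueError("shifts and editing_rates must have the same length")
    denom = sum(r * r for r in editing_rates)
    if denom == 0:
        raise ValueError("at least one guide must have a non-zero editing rate")
    return sum(r * s for r, s in zip(editing_rates, shifts)) / denom


# Hypothetical data: three guides targeting the same variant.
shifts = [0.1, 0.4, 0.2]         # observed phenotype shift per guide
editing_rates = [0.2, 0.8, 0.4]  # reporter-measured fraction of edited alleles

print("naive mean shift:     ", naive_effect(shifts))
print("activity-normalized:  ", activity_normalized_effect(shifts, editing_rates))
```

In this made-up example the naive average understates the effect because the weakly edited guides dilute it, while the normalized estimate recovers the value used to generate the data.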
ABSTRACT
Transcription factors (TFs) are key regulatory proteins that control the transcriptional rate of cells by binding short DNA sequences called transcription factor binding sites (TFBS) or motifs. Identifying and characterizing TFBS is fundamental to understanding the regulatory mechanisms governing the transcriptional state of cells. During the last decades, several experimental methods have been developed to recover DNA sequences containing TFBS. In parallel, computational methods have been proposed to discover and identify TFBS motifs based on these DNA sequences. This is one of the most widely investigated problems in bioinformatics and is referred to as the motif discovery problem. In this manuscript, we review classical and novel experimental and computational methods developed to discover and characterize TFBS motifs in DNA sequences, highlighting their advantages and drawbacks. We also discuss open challenges and future perspectives that could fill the remaining gaps in the field.
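A common way to represent a TFBS motif once binding sequences have been recovered is a log-odds position weight matrix. The sketch below assumes the sites are already aligned and of equal length, uses a uniform background and a pseudocount of 1, and works on a made-up set of sites; it is one standard construction, not a method attributed to the review.

```python
import math
from typing import Dict, List, Sequence

BASES = "ACGT"


def build_pwm(
    sites: Sequence[str], pseudocount: float = 1.0, background: float = 0.25
) -> List[Dict[str, float]]:
    """Build a log2-odds PWM from equal-length, pre-aligned binding sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(length):
        column = [s[pos] for s in sites]
        pwm.append(
            {
                b: math.log2(((column.count(b) + pseudocount) / total) / background)
                for b in BASES
            }
        )
    return pwm


def score(pwm: List[Dict[str, float]], seq: str) -> float:
    """Sum the per-position log-odds of a sequence as long as the motif."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must equal motif length")
    return sum(col[b] for col, b in zip(pwm, seq))


# Hypothetical aligned sites (illustrative only).
sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]
pwm = build_pwm(sites)
print(score(pwm, "TATAAT"), score(pwm, "GCGCGC"))
```

Sequences resembling the aligned sites score above zero, and unrelated sequences score below it.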
Subjects
Algorithms , Transcription Factors , Protein Binding , Transcription Factors/metabolism , Binding Sites , Base Sequence , Computational Biology
ABSTRACT
CRISPR gene editing holds great promise to modify DNA sequences in somatic cells to treat disease. However, standard computational and biochemical methods to predict off-target potential focus on reference genomes. We developed an efficient tool called CRISPRme that considers single-nucleotide polymorphism (SNP) and indel genetic variants to nominate and prioritize off-target sites. We tested the software with a BCL11A enhancer targeting guide RNA (gRNA) showing promise in clinical trials for sickle cell disease and β-thalassemia and found that the top candidate off-target is produced by an allele common in African-ancestry populations (MAF 4.5%) that introduces a protospacer adjacent motif (PAM) sequence. We validated that SpCas9 generates strictly allele-specific indels and pericentric inversions in CD34+ hematopoietic stem and progenitor cells (HSPCs), although high-fidelity Cas9 mitigates this off-target. This report illustrates how genetic variants should be considered as modifiers of gene editing outcomes. We expect that variant-aware off-target assessment will become integral to therapeutic genome editing evaluation and provide a powerful approach for comprehensive off-target nomination.
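How a single variant can create an off-target site by introducing a PAM can be shown with a small forward-strand scan. This is a toy sketch, not CRISPRme's search algorithm: it assumes the SpCas9 NGG PAM, checks only one strand, and uses an invented 20-nt spacer and invented reference sequence (neither is the BCL11A guide nor a real locus).

```python
from typing import List, Tuple


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def candidate_sites(
    seq: str, spacer: str, max_mismatches: int = 3
) -> List[Tuple[int, int]]:
    """Return (start, mismatches) for spacer-like windows followed by NGG.

    Forward strand only; a real search would also scan the reverse complement.
    """
    n = len(spacer)
    hits = []
    for i in range(len(seq) - n - 2):
        pam = seq[i + n : i + n + 3]
        mm = mismatches(seq[i : i + n], spacer)
        if pam[1:] == "GG" and mm <= max_mismatches:
            hits.append((i, mm))
    return hits


def apply_snp(seq: str, pos: int, alt: str) -> str:
    """Return seq with the base at 0-based position pos replaced by alt."""
    return seq[:pos] + alt + seq[pos + 1 :]


# Hypothetical spacer and locus (invented for illustration).
spacer = "GACCTTGAAGCTAGGCATCA"
reference = "AAAC" + "GACCTTGTAGCTAGGCTTCA" + "TGA" + "CCTT"
alternative = apply_snp(reference, 26, "G")  # TGA -> TGG creates an NGG PAM

print("reference hits:  ", candidate_sites(reference, spacer))
print("alternative hits:", candidate_sites(alternative, spacer))
```

The reference sequence yields no candidate because the protospacer-like window lacks a PAM; the alternative allele yields one candidate with two mismatches, which a reference-only scan would never report.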
Subjects
CRISPR-Cas Systems , Gene Editing , Humans , Gene Editing/methods , CRISPR-Cas Systems/genetics , Hematopoietic Stem Cells , INDEL Mutation , CRISPR-Cas Systems Guide RNA
ABSTRACT
Transcription factors (TFs) are proteins that promote or reduce the expression of genes by binding short genomic DNA sequences known as transcription factor binding sites (TFBS). While several tools have been developed to scan for potential occurrences of TFBS in linear DNA sequences or reference genomes, no tool exists to find them in pangenome variation graphs (VGs). VGs are sequence-labelled graphs that can efficiently encode collections of genomes and their variants in a single, compact data structure. Because VGs can losslessly compress large pangenomes, TFBS scanning in VGs can efficiently capture how genomic variation affects the potential binding landscape of TFs in a population of individuals. Here we present GRAFIMO (GRAph-based Finding of Individual Motif Occurrences), a command-line tool for the scanning of known TF DNA motifs represented as Position Weight Matrices (PWMs) in VGs. GRAFIMO extends the standard PWM scanning procedure by considering variations and alternative haplotypes encoded in a VG. Using GRAFIMO on a VG based on individuals from the 1000 Genomes project we recover several potential binding sites that are enhanced, weakened or missed when scanning only the reference genome, and which could constitute individual-specific binding events. GRAFIMO is available as an open-source tool, under the MIT license, at https://github.com/pinellolab/GRAFIMO and https://github.com/InfOmics/GRAFIMO.
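The effect of scanning alternative haplotypes rather than only the reference can be sketched on two linear sequences. This is not GRAFIMO's implementation, which operates on variation graphs: the example below enumerates a reference and one alternative haplotype by hand, and the PWM values, motif, sequences, and score threshold are all invented for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical log-odds PWM favouring the 4-bp motif GATA (values invented).
CONSENSUS = "GATA"
PWM: List[Dict[str, float]] = [
    {b: (2.0 if b == c else -1.0) for b in "ACGT"} for c in CONSENSUS
]


def scan(seq: str, pwm: List[Dict[str, float]]) -> List[Tuple[int, float]]:
    """Score every window of motif width along seq; return (start, score)."""
    w = len(pwm)
    return [
        (i, sum(col[b] for col, b in zip(pwm, seq[i : i + w])))
        for i in range(len(seq) - w + 1)
    ]


def hits(seq: str, pwm: List[Dict[str, float]], threshold: float) -> List[Tuple[int, float]]:
    """Keep only windows whose score reaches the threshold."""
    return [(i, s) for i, s in scan(seq, pwm) if s >= threshold]


# Hypothetical haplotypes differing by one SNP (C -> T at 0-based index 4).
reference = "CCGACACC"
alternative = "CCGATACC"

print("reference hits:  ", hits(reference, PWM, threshold=6.0))
print("alternative hits:", hits(alternative, PWM, threshold=6.0))
```

With these made-up inputs the motif occurrence passes the threshold only on the alternative haplotype, which is the kind of site a reference-only scan misses.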