ABSTRACT
The lack of population-scale databases hampers research and diagnostics for medically relevant tandem repeats and repeat expansions. We attempt to fill this gap using our pathSTR web tool, which leverages long-read sequencing of large cohorts to determine repeat length and sequence composition in a healthy population. The current version includes 1040 individuals of the 1000 Genomes Project cohort sequenced on the Oxford Nanopore Technologies PromethION. A comprehensive set of medically relevant tandem repeats was genotyped using STRdust and LongTR to determine the tandem repeat length and sequence composition. PathSTR provides rich visualizations of this dataset and the option to upload one's own data for comparison against the control cohort. We demonstrate the implementation of this application using data from targeted nanopore sequencing of a patient with myotonic dystrophy type 1. This resource will empower the genetics community to get a more complete overview of normal variation in tandem repeat length and sequence composition and, as such, enable a better assessment of rare tandem repeat alleles observed in patients.
ABSTRACT
Long-read sequencing can resolve regions of the genome that are inaccessible to short reads, and is therefore ideal for genome-gap closure, solving structural rearrangements and sequencing through repetitive elements. Here we introduce the Xdrop technology: a novel microfluidic-based system that allows for targeted enrichment of long DNA molecules starting from only a few nanograms of DNA. Xdrop is based on the isolation of long DNA fragments in millions of droplets, where the droplets containing a target sequence of interest are fluorescently labeled and sorted using flow cytometry. The final product from the Xdrop procedure is an enriched population of long DNA molecules that can be investigated by sequencing. To demonstrate the capability of Xdrop, we performed enrichment of the human papillomavirus 18 integrated into the genome of human HeLa cells. Analysis of the sequencing reads resolved three HPV18-chr8 integrations at base-pair resolution, and the captured fragments extended up to 30 kb into the human genome at the integration sites. Further, we enriched the complete TP53 locus in a leukemia cell line and could successfully phase coexisting mutations using PacBio sequencing. In summary, our results show that Xdrop is an efficient enrichment technology for studying complex genomic regions.
Subjects
Microfluidic Analytical Techniques , Nucleic Acid Repetitive Sequences , DNA Sequence Analysis , HeLa Cells , Human papillomavirus 18/genetics , Humans , Jurkat Cells , Nucleic Acid Amplification Techniques , DNA Sequence Analysis/methods , Tumor Suppressor Protein p53/genetics , Virus Integration
ABSTRACT
Accurate and contiguous genome assembly is key to a comprehensive understanding of the processes shaping genomic diversity and evolution. Yet, it is frequently constrained by constitutive heterochromatin, usually characterized by highly repetitive DNA. As a key feature of genome architecture associated with centromeric and subtelomeric regions, it locally influences meiotic recombination. In this study, we assess the impact of large tandem repeat arrays on the recombination rate landscape in an avian speciation model, the Eurasian crow. We assembled two high-quality genome references using single-molecule real-time sequencing (long-read assembly [LR]) and single-molecule optical maps (optical map assembly [OM]). A three-way comparison including the published short-read assembly (SR) constructed for the same individual allowed assessing assembly properties and pinpointing misassemblies. By combining information from all three assemblies, we characterized 36 previously unidentified large repetitive regions in the proximity of sequence assembly breakpoints, the majority of which contained complex arrays of a 14-kb satellite repeat or its 1.2-kb subunit. Using whole-genome population resequencing data, we estimated the population-scaled recombination rate (ρ) and found it to be significantly reduced in these regions. These findings are consistent with an effect of low recombination in regions adjacent to centromeric or subtelomeric heterochromatin and add to our understanding of the processes generating widespread heterogeneity in genetic diversity and differentiation along the genome. By combining three different technologies, our results highlight the importance of adding a layer of information on genome structure that is inaccessible to each approach independently.
Subjects
Contig Mapping/standards , Genome , Tandem Repeat Sequences , Animals , Chromatin/genetics , Chromatin/metabolism , Contig Mapping/methods , Crows/genetics , Homologous Recombination , DNA Sequence Analysis/methods , DNA Sequence Analysis/standards
ABSTRACT
Amplification of DNA is required as a mandatory step during library preparation in most targeted sequencing protocols. This can be a critical limitation when targeting regions that are highly repetitive or with extreme guanine-cytosine (GC) content, including repeat expansions associated with human disease. Here, we used an amplification-free protocol for targeted enrichment utilizing the CRISPR/Cas9 system (No-Amp Targeted sequencing) in combination with single molecule, real-time (SMRT) sequencing for studying repeat elements in the huntingtin (HTT) gene, where an expanded CAG repeat is causative for Huntington disease. We also developed a robust data analysis pipeline for repeat element analysis that is independent of alignment of reads to a reference genome. The method was applied to 11 diagnostic blood samples, and for all 22 alleles the resulting CAG repeat count agreed with previous results based on fragment analysis. The amplification-free protocol also allowed for studying somatic variability of repeat elements in our samples, without the interference of PCR stutter. In summary, with No-Amp Targeted sequencing in combination with our analysis pipeline, we could accurately study repeat elements that are difficult to investigate using PCR-based methods.
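As a rough illustration of what alignment-free repeat counting can look like, the sketch below scans each read for its longest uninterrupted run of a repeat unit and reports the most frequent counts as candidate alleles. This is not the authors' pipeline: the function names `longest_repeat_run` and `allele_counts` are hypothetical, and the code assumes error-free reads and a unit such as CAG that cannot overlap itself.

```python
import re
from collections import Counter


def longest_repeat_run(read: str, unit: str = "CAG") -> int:
    """Number of units in the longest uninterrupted run of `unit` in `read`.

    No reference alignment is used; the read is scanned directly.
    """
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    run_lengths = [m.end() - m.start() for m in pattern.finditer(read.upper())]
    return max(run_lengths, default=0) // len(unit)


def allele_counts(reads, unit: str = "CAG", n_alleles: int = 2):
    """Return the `n_alleles` most frequently observed repeat counts, sorted.

    Assumption: each allele is supported by more reads than any noise count.
    """
    observed = Counter(longest_repeat_run(r, unit) for r in reads)
    return sorted(count for count, _ in observed.most_common(n_alleles))


if __name__ == "__main__":
    # Toy reads: three with 17 CAG units, two with 42, one shorter outlier.
    flank5, flank3 = "GTCCTTC", "CAACAGCCGCC"
    reads = (
        [flank5 + "CAG" * 17 + flank3] * 3
        + [flank5 + "CAG" * 42 + flank3] * 2
        + [flank5 + "CAG" * 16 + flank3]
    )
    print(allele_counts(reads))
```

Because every read is counted separately, the same per-read counts could also be plotted as a distribution to look at read-to-read variability, which is the kind of signal that PCR stutter would otherwise obscure.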
Subjects
Human Genome/genetics , Huntingtin Protein/genetics , Huntington Disease/genetics , Trinucleotide Repeat Expansion/genetics , Alleles , Ataxin-10/genetics , C9orf72 Protein/genetics , CRISPR-Cas Systems/genetics , Fragile X Mental Retardation Protein/genetics , High-Throughput Nucleotide Sequencing , Humans , Huntington Disease/pathology , Kinetoplastida Guide RNA/genetics , DNA Sequence Analysis
ABSTRACT
BACKGROUND: The evolution of mutations in the BCR-ABL1 fusion gene transcript renders CML patients resistant to tyrosine kinase inhibitor (TKI) based therapy. Thus, screening for BCR-ABL1 mutations is recommended, particularly in patients experiencing poor response to treatment. Herein we describe a novel approach for the detection and surveillance of BCR-ABL1 mutations in CML patients. METHODS: To detect mutations in the BCR-ABL1 transcript we developed an assay based on the Pacific Biosciences (PacBio) sequencing technology, which allows for single-molecule long-read sequencing of BCR-ABL1 fusion transcript molecules. Samples from six patients with poor response to therapy were analyzed both at diagnosis and follow-up. cDNA was generated from total RNA and a 1.6 kb fragment encompassing the BCR-ABL1 transcript was amplified using long-range PCR. To estimate the sensitivity of the assay, a serial dilution experiment was performed. RESULTS: Over 10,000 full-length BCR-ABL1 sequences were obtained for all samples studied. Through the serial dilution analysis, mutations in CML patient samples could be detected down to a level of at least 1%. Notably, the assay was determined to be sufficiently sensitive even in patients with low BCR-ABL1 levels. The PacBio sequencing successfully identified all mutations seen by standard methods. Importantly, we identified several mutations that escaped detection by the clinical routine analysis. Resistance mutations were found in all but one of the patients. Due to the long reads afforded by PacBio sequencing, compound mutations present in the same molecule were readily distinguished from independent alterations arising in different molecules. Moreover, several isoforms of the BCR-ABL1 transcript were identified in two of the CML patients. Finally, our assay allowed for a quick turnaround time, with samples reported within 2 days.
CONCLUSIONS: In summary, the PacBio sequencing assay can be applied to detect BCR-ABL1 resistance mutations in both diagnostic and follow-up CML patient samples using a simple protocol applicable to routine diagnosis. Besides its sensitivity, the method gives a complete view of the clonal distribution of mutations, which is of importance when making therapy decisions.
Subjects
Alternative Splicing , Clonal Evolution/genetics , bcr-abl Fusion Proteins/genetics , BCR-ABL Positive Chronic Myelogenous Leukemia/genetics , Mutation , RNA Sequence Analysis , Adult , Aged , Alleles , Antineoplastic Drug Resistance/genetics , Female , Leukemic Gene Expression Regulation/drug effects , High-Throughput Nucleotide Sequencing , Humans , BCR-ABL Positive Chronic Myelogenous Leukemia/drug therapy , Male , Middle Aged , Protein Kinase Inhibitors/pharmacology , Protein Kinase Inhibitors/therapeutic use , RNA Isoforms , Sensitivity and Specificity
ABSTRACT
X-chromosome inactivation (XCI) analyses often assist in diagnostics of X-linked traits; however, accurate assessment remains challenging with current methods. We developed a novel strategy, called XCI-ONT, that uses amplification-free Cas9 enrichment and Oxford Nanopore Technologies sequencing to investigate and rigorously quantify XCI in the human androgen receptor gene (AR) and the human X-linked retinitis pigmentosa 2 gene (RP2). XCI-ONT measures methylation over 116 CpGs in AR and 58 CpGs in RP2, and separates parental X-chromosomes without PCR bias. We show the usefulness of the XCI-ONT strategy over the PCR-based gold standard XCI technique that only investigates one or two CpGs per gene. The results highlight the limitations of using the gold standard technique when the XCI pattern is partially skewed and the advantages of XCI-ONT to rigorously quantify XCI. This study provides a universal DNA-based XCI method, which is highly valuable in clinical and research settings for X-linked traits.
Subjects
Nanopore Sequencing , Humans , DNA , X-Linked Genes , X Chromosome Inactivation/genetics , Human X Chromosomes/genetics
ABSTRACT
CRISPR-Cas9 genome editing has the potential to cure diseases without current treatments, but therapies must be safe. Here we show that CRISPR-Cas9 editing can introduce unintended mutations in vivo, which are passed on to the next generation. By editing fertilized zebrafish eggs using four guide RNAs selected for off-target activity in vitro, followed by long-read sequencing of DNA from >1100 larvae, juvenile and adult fish across two generations, we find that structural variants (SVs), i.e., insertions and deletions ≥50 bp, represent 6% of editing outcomes in founder larvae. These SVs occur both at on-target and off-target sites. Our results also illustrate that adult founder zebrafish are mosaic in their germ cells, and that 26% of their offspring carry an off-target mutation and 9% an SV. Hence, pre-testing for off-target activity and SVs using patient material is advisable in clinical applications, to reduce the risk of unanticipated effects with potentially large implications.
Subjects
CRISPR-Cas Systems , Gene Editing/methods , Zebrafish/genetics , Animals , DNA , Genetic Therapy , Germ Cells , Humans , Mutation , Kinetoplastida Guide RNA/genetics
ABSTRACT
BACKGROUND: One ongoing concern about CRISPR-Cas9 genome editing is that unspecific guide RNA (gRNA) binding may induce off-target mutations. However, accurate prediction of CRISPR-Cas9 off-target activity is challenging. Here, we present SMRT-OTS and Nano-OTS, two novel, amplification-free, long-read sequencing protocols for detection of gRNA-driven digestion of genomic DNA by Cas9 in vitro. RESULTS: The methods are assessed using the human cell line HEK293, re-sequenced at 18x coverage using highly accurate HiFi SMRT reads. SMRT-OTS and Nano-OTS are first applied to three different gRNAs targeting HEK293 genomic DNA, resulting in a set of 55 high-confidence gRNA cleavage sites identified by both methods. Twenty-five of these sites are not reported by off-target prediction software, either because they contain four or more single nucleotide mismatches or insertion/deletion mismatches, as compared with the human reference. Additional experiments reveal that 85% of Cas9 cleavage sites are also found by other in vitro-based methods and that on- and off-target sites are detectable in gene bodies where short reads fail to uniquely align. Even though SMRT-OTS and Nano-OTS identify several sites with previously validated off-target editing activity in cells, our own CRISPR-Cas9 editing experiments in human fibroblasts do not give rise to detectable off-target mutations at the in vitro-predicted sites. However, indel and structural variation events are enriched at the on-target sites. CONCLUSIONS: Amplification-free long-read sequencing reveals Cas9 cleavage sites in vitro that would have been difficult to predict using computational tools, including in dark genomic regions inaccessible by short-read sequencing.
Subjects
Base Sequence , CRISPR-Cas Systems , Computational Biology/methods , Gene Editing/methods , DNA , Genetic Variation , Genomics , HEK293 Cells , Humans , Mutation , Nanopore Sequencing , Kinetoplastida Guide RNA , DNA Sequence Analysis , Software
ABSTRACT
A meta-analysis of genome-wide association studies (GWAS) identified eight loci that are associated with heart rate variability (HRV), but candidate genes in these loci remain uncharacterized. We developed an image- and CRISPR/Cas9-based pipeline to systematically characterize candidate genes for HRV in live zebrafish embryos. Nine zebrafish orthologues of six human candidate genes were targeted simultaneously in eggs from fish that transgenically express GFP on smooth muscle cells (Tg[acta2:GFP]), to visualize the beating heart. An automated analysis of repeated 30 s recordings of beating atria in 381 live, intact zebrafish embryos at 2 and 5 days post-fertilization highlighted genes that influence HRV (hcn4 and si:dkey-65j6.2 [KIAA1755]); heart rate (rgs6 and hcn4); and the risk of sinoatrial pauses and arrests (hcn4). Exposure to 10 or 25 µM ivabradine (an open channel blocker of HCNs) for 24 h resulted in a dose-dependent higher HRV and lower heart rate at 5 days post-fertilization. Hence, our screen confirmed the role of established genes for heart rate and rhythm (RGS6 and HCN4); showed that ivabradine reduces heart rate and increases HRV in zebrafish embryos, as it does in humans; and highlighted a novel gene that plays a role in HRV (KIAA1755).
Subjects
Bradycardia/genetics , Heart Rate/physiology , Hyperpolarization-Activated Cyclic Nucleotide-Gated Channels/genetics , Myocardial Contraction/physiology , RGS Proteins/genetics , Animals , Genetically Modified Animals , Bradycardia/diagnostic imaging , Bradycardia/metabolism , Bradycardia/physiopathology , CRISPR-Cas Systems , Cardiovascular Agents/pharmacology , Nonmammalian Embryo , Reporter Genes , Genome-Wide Association Study , Green Fluorescent Proteins/genetics , Green Fluorescent Proteins/metabolism , Heart Rate/drug effects , Humans , Hyperpolarization-Activated Cyclic Nucleotide-Gated Channels/antagonists & inhibitors , Hyperpolarization-Activated Cyclic Nucleotide-Gated Channels/metabolism , Ivabradine/pharmacology , Meta-Analysis as Topic , Myocardial Contraction/drug effects , Smooth Muscle Myocytes/cytology , Smooth Muscle Myocytes/drug effects , Smooth Muscle Myocytes/metabolism , Optical Imaging/methods , Pleckstrin Homology Domains/genetics , RGS Proteins/metabolism , Zebrafish
ABSTRACT
The current human reference sequence (GRCh38) is a foundation for large-scale sequencing projects. However, recent studies have suggested that GRCh38 may be incomplete and give a suboptimal representation of specific population groups. Here, we performed a de novo assembly of two Swedish genomes that revealed over 10 Mb of sequences absent from the human GRCh38 reference in each individual. Around 6 Mb of these novel sequences (NS) are shared with a Chinese personal genome. The NS are highly repetitive, have an elevated GC-content, and are primarily located in centromeric or telomeric regions. Up to 1 Mb of NS can be assigned to chromosome Y, and large segments are also missing from GRCh38 at chromosomes 14, 17, and 21. Inclusion of NS into the GRCh38 reference radically improves the alignment and variant calling from short-read whole-genome sequencing data at several genomic loci. A re-analysis of a Swedish population-scale sequencing project yields > 75,000 putative novel single nucleotide variants (SNVs) and removes > 10,000 false positive SNV calls per individual, some of which are located in protein coding regions. Our results highlight that the GRCh38 reference is not yet complete and demonstrate that personal genome assemblies from local populations can improve the analysis of short-read whole-genome sequencing data.