ABSTRACT
Long-read sequencing technologies substantially overcome the limitations of short reads but have not been considered a feasible replacement for population-scale projects, owing to some combination of being too expensive, not scalable enough, or too error-prone. Here we develop an efficient and scalable wet lab and computational protocol, Napu, for Oxford Nanopore Technologies long-read sequencing that seeks to address those limitations. We applied our protocol to cell lines and brain tissue samples as part of a pilot project for the National Institutes of Health Center for Alzheimer's and Related Dementias. Using a single PromethION flow cell, we can detect single nucleotide polymorphisms with an F1-score comparable to that of Illumina short-read sequencing. Small indel calling remains difficult within homopolymers and tandem repeats, but achieves good concordance with Illumina indel calls elsewhere. Further, we can discover structural variants with an F1-score on par with that of state-of-the-art de novo assembly methods. Our protocol phases small and structural variants at megabase scales and produces highly accurate, haplotype-specific methylation calls.
Subject(s)
Human Genome, Nanopore Sequencing, Humans, DNA Sequence Analysis/methods, Haplotypes, Methylation, Pilot Projects, High-Throughput Nucleotide Sequencing/methods
ABSTRACT
DNA methylation most commonly occurs as 5-methylcytosine (5mC) in the human genome and has been associated with human diseases. Recent developments in single-molecule sequencing technologies (Oxford Nanopore Technologies (ONT) and Pacific Biosciences) have enabled readouts of long, native DNA molecules, including cytosine methylation. ONT recently upgraded its Nanopore sequencing chemistry and kits from the R9 to the R10 version, which yielded increased accuracy and sequencing throughput. However, the effects on methylation detection have not yet been documented. Here we performed a series of computational analyses to characterize differences in Nanopore-based 5mC detection between the ONT R9 and R10 chemistries. We compared 5mC calls in R9 and R10 for three human genome datasets: a cell line, a frontal cortex brain sample, and a blood sample. We performed an in-depth analysis of CpG islands and homopolymer regions, and documented high concordance for methylation detection among sequencing technologies. The strongest correlation was observed between Nanopore R10 and Illumina bisulfite technologies for cell line-derived datasets. Subtle differences between methylation datasets from different technologies can affect analysis tools such as differential methylation calling software. Our findings show that comparisons can be drawn between methylation data from different Nanopore chemistries using guided hypotheses. This work will facilitate comparison among Nanopore data cohorts derived using different chemistries from large-scale sequencing efforts, such as the NIH CARD Long Read Initiative.
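As a rough illustration of the kind of cross-technology comparison described above, the sketch below correlates per-CpG methylation fractions at sites covered by two call sets. The abstract does not state which correlation statistic was used, so Pearson's r is an assumption here; the function names, the `(chromosome, position)` keys, and the example values are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def shared_site_correlation(calls_a, calls_b):
    """Correlate methylation fractions at CpG sites present in both call sets.

    Each argument maps a (chromosome, position) key to a methylation
    fraction in [0, 1]. Sites missing from either set are ignored.
    """
    shared = sorted(set(calls_a) & set(calls_b))
    return pearson([calls_a[s] for s in shared], [calls_b[s] for s in shared])


# Hypothetical per-CpG methylation fractions, for illustration only.
r10_calls = {("chr1", 100): 0.90, ("chr1", 200): 0.10, ("chr1", 300): 0.55}
bisulfite_calls = {("chr1", 100): 0.85, ("chr1", 200): 0.15, ("chr1", 300): 0.50,
                   ("chr1", 400): 0.70}

print(shared_site_correlation(r10_calls, bisulfite_calls))
```

Restricting the comparison to shared sites keeps differences in coverage between technologies from being mistaken for differences in methylation.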
ABSTRACT
Background: Mutations within the genes PRKN and PINK1 are the leading cause of early-onset autosomal recessive Parkinson's disease (PD). However, the genetic cause of most early-onset PD (EOPD) cases remains unresolved. Long-read sequencing has successfully identified many pathogenic structural variants that cause disease, but this technology has not been widely applied to PD. We recently identified the genetic cause of EOPD in a pair of monozygotic twins by uncovering a complex structural variant that spans over 7 Mb, using Oxford Nanopore Technologies (ONT) long-read sequencing. In this study, we aimed to expand on this finding and assess whether a second variant could be detected with ONT long-read sequencing in other unresolved EOPD cases reported to carry one heterozygous variant in PRKN or PINK1.
Methods: ONT long-read sequencing was performed on patients with one reported PRKN/PINK1 pathogenic variant. EOPD patients with an age at onset younger than 50 years were included in this study. As a positive control, we also included EOPD patients who had already been identified to carry two known PRKN pathogenic variants. Initial genetic testing was performed using short-read targeted panel sequencing for single nucleotide variants and multiplex ligation-dependent probe amplification (MLPA) for copy number variants.
Results: 48 patients were included in this study (PRKN "one-variant" n = 24, PINK1 "one-variant" n = 12, PRKN "two-variants" n = 12). Using ONT long-read sequencing, we detected a second pathogenic variant in six PRKN "one-variant" patients (26%, 6/23) but none in the PINK1 "one-variant" patients (0%, 0/12). Long-read sequencing identified one case with a complex inversion, two instances of structural variant overlap, and three cases of duplication. In addition, in the positive control PRKN "two-variants" group, we were able to identify both pathogenic variants in PRKN in all the patients (100%, 12/12).
Conclusions: These data highlight that ONT long-read sequencing is a powerful tool for identifying pathogenic structural variants at the PRKN locus that are often missed by conventional methods. Therefore, for EOPD cases where conventional methods fail to detect a second variant, long-read sequencing should be considered as an alternative and complementary approach.
ABSTRACT
Long-read sequencing technologies substantially overcome the limitations of short reads but to date have not been considered a feasible replacement at scale, owing to some combination of being too expensive, not scalable enough, or too error-prone. Here, we develop an efficient and scalable wet lab and computational protocol for Oxford Nanopore Technologies (ONT) long-read sequencing that seeks to provide a genuine alternative to short reads for large-scale genomics projects. We applied our protocol to cell lines and brain tissue samples as part of a pilot project for the NIH Center for Alzheimer's and Related Dementias (CARD). Using a single PromethION flow cell, we can detect SNPs with an F1-score better than that of Illumina short-read sequencing. Small indel calling remains difficult inside homopolymers and tandem repeats, but is comparable to Illumina calls elsewhere. Further, we can discover structural variants with an F1-score comparable to that of state-of-the-art methods involving Pacific Biosciences HiFi sequencing and trio information (but at a lower cost and greater throughput). Using ONT-based phasing, we can then combine and phase small and structural variants at megabase scales. Our protocol also produces highly accurate, haplotype-specific methylation calls. Overall, this makes large-scale long-read sequencing projects feasible; the protocol is currently being used to sequence thousands of brain-based genomes as part of the NIH CARD initiative. We provide the protocol and software as open-source integrated pipelines for generating phased variant calls and assemblies.
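For readers unfamiliar with the metric, the F1-score quoted for variant calling is the harmonic mean of precision and recall against a truth set. The following is a minimal sketch of that standard definition, not the authors' benchmarking pipeline; the function name and the counts in the usage line are hypothetical.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall for a variant call set.

    true_pos:  called variants that match the truth set
    false_pos: called variants absent from the truth set
    false_neg: truth-set variants that were not called
    """
    if true_pos == 0:
        # Precision and recall are both zero (or undefined) in this case.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the calculation.
print(f1_score(true_pos=90, false_pos=10, false_neg=10))
```

Because it penalizes both missed variants and spurious calls, a single F1 value gives a compact way to compare call sets from different sequencing technologies.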
ABSTRACT
Determining the mechanisms by which the sex-chromosome complement (SCC) affects learning, attention, and impulsivity has implications for observed sex differences in the prevalence, severity, and prognosis of psychiatric/neurodevelopmental disorders and syndromes associated with sex-chromosome aneuploidy. Here, Four Core Genotypes (FCG) mice were evaluated to assess the separable and/or interacting effects on behavior of gonads (testes vs. ovaries) and their secretions and/or SCC (XX vs. XY) acting via non-gonadal mechanisms. We tested FCG mice on a reversal-learning task that enables the quantification of aspects of learning, attention, and impulsivity. Across testing phases (involving the initial acquisition of a spatial discrimination and subsequent reversal learning), the overall error rate was higher in XY than in XX mice. Although XX and XY groups did not differ in the total number of trials required to reach a preset performance criterion, analyses of reversal error types showed more perseverative errors in XY than in XX mice, with no difference in regressive errors. Additionally, prepotent-response latencies during the reversal phase were shorter in XY males than in both XX gonadal males and females of either SCC, and failures to sustain the observing response were more frequent in XY than in XX mice during the acquisition phase. These results indicate that SCC affects the characteristic pattern of response selection during acquisition and reversal performance without affecting the overall learning rate. More broadly, these results show direct effects of the SCC on cognitive processes that are relevant to psychiatric/neurodevelopmental disorders and syndromes associated with sex-chromosome aneuploidies.