ABSTRACT
Indigenous Australians harbour rich and unique genomic diversity. However, Aboriginal and Torres Strait Islander ancestries are historically under-represented in genomics research and almost completely missing from reference datasets [1-3]. Addressing this representation gap is critical, both to advance our understanding of global human genomic diversity and as a prerequisite for ensuring equitable outcomes in genomic medicine. Here we apply population-scale whole-genome long-read sequencing [4] to profile genomic structural variation across four remote Indigenous communities. We uncover an abundance of large insertion-deletion variants (20-49 bp; n = 136,797), structural variants (50 bp-50 kb; n = 159,912) and regions of variable copy number (>50 kb; n = 156). The majority of variants are composed of tandem repeat or interspersed mobile element sequences (up to 90%) and have not been previously annotated (up to 62%). A large fraction of structural variants appear to be exclusive to Indigenous Australians (12% lower-bound estimate) and most of these are found in only a single community, underscoring the need for broad and deep sampling to achieve a comprehensive catalogue of genomic structural variation across the Australian continent. Finally, we explore short tandem repeats throughout the genome to characterize allelic diversity at 50 known disease loci [5], uncover hundreds of novel repeat expansion sites within protein-coding genes, and identify unique patterns of diversity and constraint among short tandem repeat sequences. Our study sheds new light on the dimensions and dynamics of genomic structural variation within and beyond Australia.
Subject(s)
Australian Aboriginal and Torres Strait Islander Peoples, Human Genome, Genomic Structural Variation, Humans, Alleles, Australia/ethnology, Australian Aboriginal and Torres Strait Islander Peoples/genetics, Datasets as Topic, DNA Copy Number Variations/genetics, Genetic Loci/genetics, Medical Genetics, Genomic Structural Variation/genetics, Genomics, INDEL Mutation/genetics, Interspersed Repetitive Sequences/genetics, Microsatellite Repeats/genetics, Human Genome/genetics
ABSTRACT
PURPOSE: Genome sequencing (GS)-specific diagnostic rates in prospective, tightly ascertained exome sequencing (ES)-negative intellectual disability (ID) cohorts have not been reported extensively. METHODS: ES, GS, epigenetic signatures, and long-read sequencing diagnoses were assessed in 74 trios with at least moderate ID. RESULTS: The ES diagnostic yield was 42 of 74 (57%). GS diagnoses were made in 9 of 32 (28%) ES-unresolved families. Repeated ES with a contemporary pipeline on the GS-diagnosed families identified 8 of 9 single-nucleotide variations/copy-number variations undetected in older ES, confirming a GS-unique diagnostic rate of 1 in 32 (3%). Episignatures contributed diagnostic information in 9%, with GS corroboration in 1 of 32 (3%) and diagnostic clues in 2 of 32 (6%). A genetic etiology for ID was detected in 51 of 74 (69%) families. Twelve candidate disease genes were identified. Contemporary ES followed by GS cost US$4976 (95% CI: $3704; $6969) per diagnosis, and first-line GS cost $7062 (95% CI: $6210; $8475) per diagnosis. CONCLUSION: Performing GS only in ID trios would be cost equivalent to ES if GS were available at $2435, about a 60% reduction from current prices. This study demonstrates that first-line GS achieves a higher diagnostic rate than contemporary ES but at a higher cost.
Subject(s)
Exome Sequencing, Exome, Intellectual Disability, Humans, Intellectual Disability/genetics, Intellectual Disability/diagnosis, Male, Female, Exome/genetics, Exome Sequencing/economics, Cohort Studies, Genetic Testing/economics, Genetic Testing/methods, Whole Genome Sequencing/economics, Child, Human Genome/genetics, DNA Copy Number Variations/genetics, Single Nucleotide Polymorphism/genetics, Preschool Child
ABSTRACT
Primary liver cancer is an increasing problem worldwide and is associated with significant mortality. A popular method of modeling liver cancer in mice is plasmid hydrodynamic tail vein injection (HTVI). However, plasmid-HTVI models rarely recapitulate the chronic liver injury which precedes the development of most human liver cancer. We sought to investigate how liver injury using thioacetamide contributes to the pathogenesis and progression of liver cancer in two oncogenic plasmid-HTVI-induced mouse liver cancer models. Fourteen-week-old male mice received double-oncogene plasmid-HTVI (SB/AKT/c-Met and SB/AKT/NRas) and then twice-weekly intraperitoneal injections of thioacetamide for 6 weeks. Liver tissue was examined for histopathological changes, including fibrosis and steatosis. Further characterization of fibrosis and inflammation was performed with immunostaining and real-time quantitative PCR. RNA sequencing with pathway analysis was used to explore novel pathways altered in the cancer models. Hepatocellular and cholangiocellular tumors were observed in mice injected with double-oncogene plasmid-HTVI models (SB/AKT/c-Met and SB/AKT/NRas). Thioacetamide induced mild fibrosis and increased alpha smooth muscle actin-expressing cells. However, the combination of plasmids and thioacetamide did not significantly increase tumor size, but increased multiplicity of small neoplastic lesions. Cancer and/or liver injury up-regulated profibrotic and proinflammatory genes while metabolic pathway genes were mostly down-regulated. We conclude that the liver injury microenvironment can interact with liver cancer and alter its presentation. However, the effects on cancer development vary depending on the genetic drivers with differing active oncogenic pathways. Therefore, the choice of plasmid-HTVI model and injury agent may influence the extent to which injury promotes liver cancer development.
Subject(s)
Plasmids, Thioacetamide, Animals, Plasmids/genetics, Thioacetamide/toxicity, Male, Liver Neoplasms/genetics, Liver Neoplasms/pathology, Liver Neoplasms/chemically induced, Liver Neoplasms/metabolism, Animal Disease Models, Liver/metabolism, Liver/pathology, Mice, Inbred C57BL Mice, Experimental Liver Neoplasms/genetics, Experimental Liver Neoplasms/pathology, Experimental Liver Neoplasms/chemically induced, Experimental Liver Neoplasms/metabolism, Liver Cirrhosis/genetics, Liver Cirrhosis/metabolism, Liver Cirrhosis/pathology, Liver Cirrhosis/chemically induced, DNA/genetics, DNA/metabolism
ABSTRACT
BACKGROUND: Loss-of-function variants in MME (membrane metalloendopeptidase) are a known cause of recessive Charcot-Marie-Tooth Neuropathy (CMT). A deep intronic variant, MME c.1188+428A>G (NM_000902.5), was identified through whole genome sequencing (WGS) of two Australian families with recessive inheritance of axonal CMT using the seqr platform. MME c.1188+428A>G was detected in a homozygous state in Family 1, and in a compound heterozygous state with a known pathogenic MME variant (c.467del; p.Pro156Leufs*14) in Family 2. AIMS: We aimed to determine the pathogenicity of the MME c.1188+428A>G variant through segregation and splicing analysis. METHODS: The splicing impact of the deep intronic MME variant c.1188+428A>G was assessed using an in vitro exon-trapping assay. RESULTS: The exon-trapping assay demonstrated that the MME c.1188+428A>G variant created a novel splice donor site resulting in the inclusion of an 83 bp pseudoexon between MME exons 12 and 13. The incorporation of the pseudoexon into MME transcript is predicted to lead to a coding frameshift and premature termination codon (PTC) in MME exon 14 (p.Ala397ProfsTer47). This PTC is likely to result in nonsense mediated decay (NMD) of MME transcript leading to a pathogenic loss-of-function. INTERPRETATION: To our knowledge, this is the first report of a pathogenic deep intronic MME variant causing CMT. This is of significance as deep intronic variants are missed using whole exome sequencing screening methods. Individuals with CMT should be reassessed for deep intronic variants, with splicing impacts being considered in relation to the potential pathogenicity of variants.
Subject(s)
Charcot-Marie-Tooth Disease, Metalloendopeptidases, RNA Splicing, Adult, Female, Humans, Male, Charcot-Marie-Tooth Disease/genetics, Introns, Metalloendopeptidases/genetics, Mutation, Pedigree
ABSTRACT
Cerebellar ataxia, neuropathy and vestibular areflexia syndrome (CANVAS) is an autosomal recessive neurodegenerative disease, usually caused by biallelic AAGGG repeat expansions in RFC1. In this study, we leveraged whole genome sequencing data from nearly 10 000 individuals recruited within the Genomics England sequencing project to investigate the normal and pathogenic variation of the RFC1 repeat. We identified three novel repeat motifs, AGGGC (n = 6 from five families), AAGGC (n = 2 from one family) and AGAGG (n = 1), associated with CANVAS in the homozygous or compound heterozygous state with the common pathogenic AAGGG expansion. While AAAAG, AAAGGG and AAGAG expansions appear to be benign, we revealed a pathogenic role for large AAAGG repeat configuration expansions (n = 5). Long-read sequencing was used to characterize the entire repeat sequence, and six patients exhibited a pure AGGGC expansion, while the other patients presented complex motifs with AAGGG or AAAGG interruptions. All pathogenic motifs appeared to have arisen from a common haplotype and were predicted to form highly stable G quadruplexes, which have previously been demonstrated to affect gene transcription in other conditions. The assessment of these novel configurations is warranted in CANVAS patients with negative or inconclusive genetic testing. Particular attention should be paid to carriers of compound AAGGG/AAAGG expansions when the AAAGG motif is very large (>500 repeats) or the AAGGG motif is interrupted. Accurate sizing and full sequencing of the satellite repeat with long-read sequencing is recommended in clinically selected cases to enable accurate molecular diagnosis and counsel patients and their families.
Subject(s)
Cerebellar Ataxia, Peripheral Nervous System Diseases, Syndrome, Vestibular Diseases, Humans, Bilateral Vestibulopathy, Cerebellar Ataxia/genetics, Cerebellar Ataxia/diagnosis, Neurodegenerative Diseases, Peripheral Nervous System Diseases/diagnosis, Peripheral Nervous System Diseases/genetics, Vestibular Diseases/diagnosis, Vestibular Diseases/genetics
ABSTRACT
Biallelic mutations in sorbitol dehydrogenase (SORD) have been recently identified as a common cause of recessive axonal Charcot-Marie-Tooth neuropathy (CMT2). We aimed to assess a novel long-read sequencing approach to overcome current limitations in SORD neuropathy diagnostics due to the SORD2P pseudogene and the phasing of biallelic mutations in recessive disease. We conducted a screen of our Australian whole exome sequencing (WES) CMT cohort to identify individuals with homozygous or compound heterozygous SORD variants. Individuals detected with SORD mutations then underwent long-read sequencing, clinical assessment, and serum sorbitol analysis. An individual was detected with compound heterozygous truncating mutations in SORD exon 7, NM_003104.5:c.625C>T (p.Arg209Ter) and NM_003104.5:c.757del (p.Ala253GlnfsTer27). Subsequent Oxford Nanopore Technologies (ONT) long-read sequencing was used to successfully differentiate SORD from the highly homologous non-functional SORD2P pseudogene and confirmed that the mutations were biallelic through haplotype-resolved analysis. The patient presented with axonal sensorimotor polyneuropathy (CMT2) and ulnar neuropathy without compression at the elbow. Burning neuropathic pain in the forearms and feet was also reported and was exacerbated by alcohol consumption and improved with alcohol cessation. UPLC-tandem mass spectrometry confirmed that the patient had elevated serum sorbitol levels (12.0 mg/L) consistent with levels previously observed in patients with biallelic SORD mutations. This represents a novel clinical presentation and expands the phenotype associated with biallelic SORD mutations causing CMT2. Our study is the first report of long-read sequencing for an individual with CMT and demonstrates the utility of this approach for clinical genomics.
Subject(s)
Charcot-Marie-Tooth Disease, L-Iditol 2-Dehydrogenase, Australia, Charcot-Marie-Tooth Disease/diagnosis, Charcot-Marie-Tooth Disease/genetics, Humans, L-Iditol 2-Dehydrogenase/genetics, Mutation, Pedigree, Phenotype, Sorbitol, Exome Sequencing
ABSTRACT
Activation-induced deaminase (AID) initiates hypermutation of Ig genes in activated B cells by converting C:G into U:G base pairs. G1-phase variants of uracil base excision repair (BER) and mismatch repair (MMR) then deploy translesion polymerases including REV1 and Pol η, which exacerbates mutation. dNTP paucity may contribute to hypermutation, because dNTP levels are reduced in G1 phase to inhibit viral replication. To derestrict G1-phase dNTP supply, we CRISPR-inactivated SAMHD1 (which degrades dNTPs) in germinal center B cells. Samhd1 inactivation increased B cell virus susceptibility, increased transition mutations at C:G base pairs, and substantially decreased transversion mutations at A:T and C:G base pairs in both strands. We conclude that SAMHD1's restriction of dNTP supply enhances AID's mutagenicity and that the evolution of Ig hypermutation included the repurposing of antiviral mechanisms based on dNTP starvation.
Subject(s)
B-Lymphocytes/immunology, G1 Phase/immunology, Lymphocyte Activation, Mutation, SAM Domain and HD Domain-Containing Protein 1, Immunoglobulin Somatic Hypermutation/immunology, Animals, B-Lymphocytes/cytology, Cytidine Deaminase/immunology, DNA-Directed DNA Polymerase, G1 Phase/genetics, Male, Mice, Transgenic Mice, Nucleotidyltransferases/genetics, Nucleotidyltransferases/immunology, SAM Domain and HD Domain-Containing Protein 1/genetics, SAM Domain and HD Domain-Containing Protein 1/immunology
ABSTRACT
AID deaminates C to U in either strand of Ig genes, exclusively producing C:G/G:C to T:A/A:T transition mutations if U is left unrepaired. Error-prone processing by UNG2 or mismatch repair diversifies mutation, predominantly at C:G or A:T base pairs, respectively. Here, we show that transversions at C:G base pairs occur by two distinct processing pathways that are dictated by sequence context. Within and near AGCT mutation hotspots, transversion mutation at C:G was driven by UNG2 without requirement for mismatch repair. Deaminations in AGCT were refractory both to processing by UNG2 and to high-fidelity base excision repair (BER) downstream of UNG2, regardless of mismatch repair activity. We propose that AGCT sequences resist faithful BER because they bind BER-inhibitory protein(s) and/or because hemi-deaminated AGCT motifs innately form a BER-resistant DNA structure. Distal to AGCT sequences, transversions at G were largely co-dependent on UNG2 and mismatch repair. We propose that AGCT-distal transversions are produced when apyrimidinic sites are exposed in mismatch excision patches, because completion of mismatch repair would require bypass of these sites.
Subject(s)
Cytidine Deaminase/metabolism, DNA Mismatch Repair, DNA Repair, Mutation, Uracil-DNA Glycosidase/metabolism, Adoptive Transfer, Animals, Base Pairing, Base Sequence, Male, Inbred C57BL Mice, Uracil/metabolism, Uracil-DNA Glycosidase/genetics
ABSTRACT
The expression of genes encompasses their transcription into mRNA followed by translation into protein. In recent years, next-generation sequencing and mass spectrometry methods have profiled DNA, RNA and protein abundance in cells. However, there are currently no reference standards that are compatible across these genomic, transcriptomic and proteomic methods, and provide an integrated measure of gene expression. Here, we use synthetic biology principles to engineer a multi-omics control, termed pREF, that can act as a universal molecular standard for next-generation sequencing and mass spectrometry methods. The pREF sequence encodes 21 synthetic genes that can be in vitro transcribed into spike-in mRNA controls, and in vitro translated to generate matched protein controls. The synthetic genes provide qualitative controls that can measure sensitivity and quantitative accuracy of DNA, RNA and peptide detection. We demonstrate the use of pREF in metagenome DNA sequencing and RNA sequencing experiments and evaluate the quantification of proteins using mass spectrometry. Unlike previous spike-in controls, pREF can be independently propagated and the synthetic mRNA and protein controls can be sustainably prepared by recipient laboratories using common molecular biology techniques. Together, this provides a universal synthetic standard able to integrate genomic, transcriptomic and proteomic methods.
Subject(s)
DNA, Proteomics, Messenger RNA/genetics, Messenger RNA/metabolism, DNA/genetics, Genomics, RNA
ABSTRACT
The factors driving or preventing pathological expansion of tandem repeats remain largely unknown. Here, we assessed the FGF14 (GAA)·(TTC) repeat locus in 2,530 individuals by long-read and Sanger sequencing and identified a common 5'-flanking variant in 70.34% of alleles analyzed (3,463/4,923) that represents the phylogenetically ancestral allele and is present on all major haplotypes. This common sequence variation is present nearly exclusively on nonpathogenic alleles with fewer than 30 GAA-pure triplets and is associated with enhanced stability of the repeat locus upon intergenerational transmission and increased Fiber-seq chromatin accessibility.
Subject(s)
Alleles, Fibroblast Growth Factors, Fibroblast Growth Factors/genetics, Fibroblast Growth Factors/metabolism, Humans, Haplotypes, Genetic Variation, Genetic Loci
ABSTRACT
Oculopharyngodistal myopathy (OPDM) is an inherited myopathy manifesting with ptosis, dysphagia and distal weakness. Pathologically it is characterised by rimmed vacuoles and intranuclear inclusions on muscle biopsy. In recent years, CGG·CCG repeat expansions in four different genes were identified in OPDM individuals in Asian populations. None of these have been found in affected individuals of non-Asian ancestry. In this study we describe the identification of CCG expansions in ABCD3, ranging from 118 to 694 repeats, in 35 affected individuals across eight unrelated OPDM families of European ancestry. ABCD3 transcript appears upregulated in fibroblasts and skeletal muscle from OPDM individuals, suggesting a potential role of over-expression of CCG repeat-containing ABCD3 transcript in progressive skeletal muscle degeneration. The study provides further evidence of the role of non-coding repeat expansions in unsolved neuromuscular diseases and strengthens the association between the CGG·CCG repeat motif and a specific pattern of muscle weakness.
Subject(s)
Skeletal Muscle, Trinucleotide Repeat Expansion, White People, Humans, Male, Female, Adult, Trinucleotide Repeat Expansion/genetics, Middle Aged, White People/genetics, Skeletal Muscle/pathology, ATP-Binding Cassette Transporters/genetics, Structural Congenital Myopathies/genetics, Structural Congenital Myopathies/pathology, Pedigree, Aged, Young Adult, Fibroblasts/metabolism, Fibroblasts/pathology, Muscle Weakness/genetics, Muscle Weakness/pathology, Adolescent, Muscular Dystrophies
ABSTRACT
Autosomal dominant polycystic kidney disease (ADPKD) is the most common monogenic cause of kidney failure and is primarily associated with PKD1 or PKD2. Approximately 10% of patients remain undiagnosed after standard genetic testing. We aimed to utilise short and long-read genome sequencing and RNA studies to investigate undiagnosed families. Patients with typical ADPKD phenotype and undiagnosed after genetic diagnostics were recruited. Probands underwent short-read genome sequencing, PKD1 and PKD2 coding and non-coding analyses and then genome-wide analysis. Targeted RNA studies investigated variants suspected to impact splicing. Those undiagnosed then underwent Oxford Nanopore Technologies long-read genome sequencing. From over 172 probands, 9 met inclusion criteria and consented. A genetic diagnosis was made in 8 of 9 (89%) families undiagnosed on prior genetic testing. Six had variants impacting splicing, five in non-coding regions of PKD1. Short-read genome sequencing identified novel branchpoint, AG-exclusion zone and missense variants generating cryptic splice sites and a deletion causing critical intron shortening. Long-read sequencing confirmed the diagnosis in one family. Most undiagnosed families with typical ADPKD have splice-impacting variants in PKD1. We describe a pragmatic method for diagnostic laboratories to assess PKD1 and PKD2 non-coding regions and validate suspected splicing variants through targeted RNA studies.
ABSTRACT
Objective: Duchenne muscular dystrophy (DMD) is caused by pathogenic variants in the dystrophin gene (DMD). Hypermethylated CGG expansions within DIP2B 5' UTR are associated with an intellectual development disorder. Here, we demonstrate the diagnostic utility of genomic short-read sequencing (SRS) and transcriptome sequencing to identify a novel DMD structural variant (SV) and a DIP2B CGG expansion in a patient with DMD for whom conventional diagnostic testing failed to yield a genetic diagnosis. Methods: We performed genomic SRS, skeletal muscle transcriptome sequencing, and targeted programmable long-read sequencing (LRS). Results: The proband had a typical DMD clinical presentation, autism spectrum disorder (ASD), and dystrophinopathy on muscle biopsy. Transcriptome analysis identified 6 aberrantly expressed genes; DMD and DIP2B were the strongest underexpression and overexpression outliers, respectively. Genomic SRS identified a 216 kb paracentric inversion (NC_000023.11: g.33162217-33378800) overlapping 2 DMD promoters. ExpansionHunter indicated an expansion of 109 CGG repeats within the 5' UTR of DIP2B. Targeted genomic LRS confirmed the SV and genotyped the DIP2B repeat expansion as 270 CGG repeats. Discussion: Here, transcriptome data heavily guided genomic analysis to resolve a complex DMD inversion and a DIP2B repeat expansion. Longitudinal follow-up will be important for clarifying the clinical significance of the DIP2B genotype.
ABSTRACT
Cerebellar ataxia, neuropathy and vestibular areflexia syndrome is a progressive, generally late-onset, neurological disorder associated with biallelic pentanucleotide expansions in Intron 2 of the RFC1 gene. The locus exhibits substantial genetic variability, with multiple pathogenic and benign pentanucleotide repeat alleles previously identified. To determine the contribution of pathogenic RFC1 expansions to neurological disease within an Australasian cohort and further investigate the heterogeneity exhibited at the locus, a combination of flanking and repeat-primed PCR was used to screen a cohort of 242 Australasian patients with neurological disease. Patients whose data indicated large gaps within expanded alleles following repeat-primed PCR, underwent targeted long-read sequencing to identify novel repeat motifs at the locus. To increase diagnostic yield, additional probes at the RFC1 repeat region were incorporated into the PathWest diagnostic laboratory targeted neurological disease gene panel to enable first-pass screening of the locus for all samples tested on the panel. Within the Australasian cohort, we detected known pathogenic biallelic expansions in 15.3% (n = 37) of patients. Thirty indicated biallelic AAGGG expansions, two had biallelic 'Maori alleles' [(AAAGG)exp(AAGGG)exp], two samples were compound heterozygous for the Maori allele and an AAGGG expansion, two samples had biallelic ACAGG expansions and one sample was compound heterozygous for the ACAGG and AAGGG expansions. Forty-five samples tested indicated the presence of biallelic expansions not known to be pathogenic. A large proportion (84%) showed complex interrupted patterns following repeat-primed PCR, suggesting that these expansions are likely to be comprised of more than one repeat motif, including previously unknown repeats. Using targeted long-read sequencing, we identified three novel repeat motifs in expanded alleles. 
Here, we also show that short-read sequencing can be used to reliably screen for the presence or absence of biallelic RFC1 expansions in all samples tested using the PathWest targeted neurological disease gene panel. Our results show that RFC1 pathogenic expansions make a substantial contribution to neurological disease in the Australasian population and further extend the heterogeneity of the locus. To accommodate the increased complexity, we outline a multi-step workflow utilizing both targeted short- and long-read sequencing to achieve a definitive genotype and provide accurate diagnoses for patients.
ABSTRACT
Our understanding of the molecular pathology of posttraumatic stress disorder (PTSD) is evolving due to advances in sequencing technologies. With the recent emergence of Oxford Nanopore direct RNA-seq (dRNA-seq), it is now also possible to interrogate diverse RNA modifications, collectively known as the "epitranscriptome." Here, we present our analyses of the male and female mouse amygdala transcriptome and epitranscriptome, obtained using parallel Illumina RNA-seq and Oxford Nanopore dRNA-seq, associated with the acquisition of PTSD-like fear induced by Pavlovian cued-fear conditioning. We report significant sex-specific differences in the amygdala transcriptional response during fear acquisition and a range of shared and dimorphic epitranscriptomic signatures. Differential RNA modifications are enriched among mRNA transcripts associated with neurotransmitter regulation and mitochondrial function, many of which have been previously implicated in PTSD. Very few differentially modified transcripts are also differentially expressed, suggesting an influential, expression-independent role for epitranscriptional regulation in PTSD-like fear acquisition.
ABSTRACT
Library adaptors are short oligonucleotides that are attached to RNA and DNA samples in preparation for next-generation sequencing (NGS). Adaptors can also include additional functional elements, such as sample indexes and unique molecular identifiers, to improve library analysis. Here, we describe Control Library Adaptors, termed CAPTORs, that measure the accuracy and reliability of NGS. CAPTORs can be integrated within the library preparation of RNA and DNA samples, and their encoded information is retrieved during sequencing. We show how CAPTORs can measure the accuracy of nanopore sequencing, evaluate the quantitative performance of metagenomic and RNA sequencing, and improve normalisation between samples. CAPTORs can also be customised for clinical diagnoses, correcting systematic sequencing errors and improving the diagnosis of pathogenic BRCA1/2 variants in breast cancer. CAPTORs are a simple and effective method to increase the accuracy and reliability of NGS, enabling comparisons between samples, reagents and laboratories, and supporting the use of nanopore sequencing for clinical diagnosis.
Subject(s)
Nanopore Sequencing, Reproducibility of Results, Gene Library, High-Throughput Nucleotide Sequencing/methods, RNA
ABSTRACT
More than 50 neurological and neuromuscular diseases are caused by short tandem repeat (STR) expansions, with 37 different genes implicated to date. We describe the use of programmable targeted long-read sequencing with Oxford Nanopore's ReadUntil function for parallel genotyping of all known neuropathogenic STRs in a single assay. Our approach enables accurate, haplotype-resolved assembly and DNA methylation profiling of STR sites, from a list of predetermined candidates. This correctly diagnoses all individuals in a small cohort (n = 37) including patients with various neurogenetic diseases (n = 25). Targeted long-read sequencing solves large and complex STR expansions that confound established molecular tests and short-read sequencing and identifies noncanonical STR motif conformations and internal sequence interruptions. We observe a diversity of STR alleles of known and unknown pathogenicity, suggesting that long-read sequencing will redefine the genetic landscape of repeat disorders. Last, we show how the inclusion of pharmacogenomic genes as secondary ReadUntil targets can further inform patient care.
Subject(s)
Nanopore Sequencing, Alleles, High-Throughput Nucleotide Sequencing, Humans, Microsatellite Repeats/genetics, DNA Sequence Analysis
ABSTRACT
Accumulating evidence supports the high prevalence of co-infections among Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) patients, and their potential to worsen the clinical outcome of COVID-19. However, there are few data on Southern Hemisphere populations, and most studies to date have investigated a narrow spectrum of viruses using targeted qRT-PCR. Here we assessed respiratory viral co-infections among SARS-CoV-2 patients in Australia, through respiratory virome characterization. Nasopharyngeal swabs of 92 SARS-CoV-2-positive cases were sequenced using pan-viral hybrid-capture and the Twist Respiratory Virus Panel. In total, 8% of cases were co-infected, with rhinovirus (6%) or influenzavirus (2%). Twist capture also achieved near-complete sequencing (> 90% coverage, > tenfold depth) of the SARS-CoV-2 genome in 95% of specimens with Ct < 30. Our results highlight the importance of assessing all pathogens in symptomatic patients, and the dual-functionality of Twist hybrid-capture, for SARS-CoV-2 whole-genome sequencing without amplicon generation and the simultaneous identification of viral co-infections with ease.
Subject(s)
COVID-19/diagnosis, COVID-19/virology, Coinfection/diagnosis, Coinfection/virology, SARS-CoV-2/genetics, DNA Sequence Analysis, Virome/genetics, Australia/epidemiology, Coinfection/epidemiology, Computational Biology, Viral Genome, Humans, Open Reading Frames/genetics, Reproducibility of Results, Whole Genome Sequencing
ABSTRACT
Circulating tumor DNA (ctDNA) sequencing is being rapidly adopted in precision oncology, but the accuracy, sensitivity and reproducibility of ctDNA assays are poorly understood. Here we report the findings of a multi-site, cross-platform evaluation of the analytical performance of five industry-leading ctDNA assays. We evaluated each stage of the ctDNA sequencing workflow with simulations, synthetic DNA spike-in experiments and proficiency testing on standardized, cell-line-derived reference samples. Above 0.5% variant allele frequency, ctDNA mutations were detected with high sensitivity, precision and reproducibility by all five assays, whereas, below this limit, detection became unreliable and varied widely between assays, especially when input material was limited. Missed mutations (false negatives) were more common than erroneous candidates (false positives), indicating that the reliable sampling of rare ctDNA fragments is the key challenge for ctDNA assays. This comprehensive evaluation of the analytical performance of ctDNA assays serves to inform best practice guidelines and provides a resource for precision oncology.
Subject(s)
Circulating Tumor DNA/genetics, Medical Oncology, Neoplasms/genetics, Precision Medicine, DNA Sequence Analysis/standards, High-Throughput Nucleotide Sequencing/methods, Humans, Limit of Detection, Practice Guidelines as Topic, Reproducibility of Results
ABSTRACT
The advent of portable nanopore sequencing devices has enabled DNA and RNA sequencing to be performed in the field or the clinic. However, advances in in situ genomics require parallel development of portable, offline solutions for the computational analysis of sequencing data. Here we introduce Genopo, a mobile toolkit for nanopore sequencing analysis. Genopo compacts popular bioinformatics tools to an Android application, enabling fully portable computation. To demonstrate its utility for in situ genome analysis, we use Genopo to determine the complete genome sequence of the human coronavirus SARS-CoV-2 in nine patient isolates sequenced on a nanopore device, with Genopo executing this workflow in less than 30 min per sample on a range of popular smartphones. We further show how Genopo can be used to profile DNA methylation in a human genome sample, illustrating a flexible, efficient architecture that is suitable to run many popular bioinformatics tools and accommodate small or large genomes. As the first ever smartphone application for nanopore sequencing analysis, Genopo enables the genomics community to harness this cheap, ubiquitous computational resource.