1.
Blood ; 142(8): 711-723, 2023 08 24.
Article in English | MEDLINE | ID: mdl-37216686

ABSTRACT

Intrachromosomal amplification of chromosome 21 defines a subtype of high-risk childhood acute lymphoblastic leukemia (iAMP21-ALL) characterized by copy number changes and complex rearrangements of chromosome 21. The genomic basis of iAMP21-ALL and the pathogenic role of the region of amplification of chromosome 21 in leukemogenesis remain incompletely understood. In this study, using integrated whole genome and transcriptome sequencing of 124 patients with iAMP21-ALL, including rare cases arising in the context of constitutional chromosomal aberrations, we identified subgroups of iAMP21-ALL based on the patterns of copy number alteration and structural variation. This large data set enabled formal delineation of a 7.8 Mb common region of amplification harboring 71 genes, 43 of which were differentially expressed compared with non-iAMP21-ALL ones, including multiple genes implicated in the pathogenesis of acute leukemia (CHAF1B, DYRK1A, ERG, HMGN1, and RUNX1). Using multimodal single-cell genomic profiling, including single-cell whole genome sequencing of 2 cases, we documented clonal heterogeneity and genomic evolution, demonstrating that the acquisition of the iAMP21 chromosome is an early event that may undergo progressive amplification during disease ontogeny. We show that UV-mutational signatures and high mutation load are characteristic secondary genetic features. Although the genomic alterations of chromosome 21 are variable, these integrated genomic analyses and demonstration of an extended common minimal region of amplification broaden the definition of iAMP21-ALL for more precise diagnosis using cytogenetic or genomic methods to inform clinical management.


Subject(s)
Chromosomes, Human, Pair 21 , Precursor Cell Lymphoblastic Leukemia-Lymphoma , Humans , Child , Chromosomes, Human, Pair 21/genetics , Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics , Chromosome Aberrations , Cytogenetics , Genomics , Chromatin Assembly Factor-1/genetics
2.
Pharmacogenomics J ; 21(2): 251-261, 2021 04.
Article in English | MEDLINE | ID: mdl-33462347

ABSTRACT

Responsible for the metabolism of ~21% of clinically used drugs, CYP2D6 is a critical component of personalized medicine initiatives. Genotyping CYP2D6 is challenging due to sequence similarity with its pseudogene paralog CYP2D7 and a high number and variety of common structural variants (SVs). Here we describe a novel bioinformatics method, Cyrius, that accurately genotypes CYP2D6 using whole-genome sequencing (WGS) data. We show that Cyrius has superior performance (96.5% concordance with truth genotypes) compared to existing methods (84-86.8%). After implementing the improvements identified from the comparison against the truth data, Cyrius's accuracy has since been improved to 99.3%. Using Cyrius, we built a haplotype frequency database from 2504 ethnically diverse samples and estimate that SV-containing star alleles are more frequent than previously reported. Cyrius will be an important tool to incorporate pharmacogenomics in WGS-based precision medicine initiatives.


Subject(s)
Cytochrome P-450 CYP2D6/genetics , Genotyping Techniques/methods , Alleles , Computational Biology/methods , Ethnicity/genetics , Genotype , Haplotypes/genetics , Humans , Polymorphism, Genetic/genetics , Whole Genome Sequencing/methods
3.
Genome Res ; 27(1): 157-164, 2017 01.
Article in English | MEDLINE | ID: mdl-27903644

ABSTRACT

Improvement of variant calling in next-generation sequence data requires a comprehensive, genome-wide catalog of high-confidence variants called in a set of genomes for use as a benchmark. We generated deep, whole-genome sequence data of 17 individuals in a three-generation pedigree and called variants in each genome using a range of currently available algorithms. We used haplotype transmission information to create a phased "Platinum" variant catalog of 4.7 million single-nucleotide variants (SNVs) plus 0.7 million small (1-50 bp) insertions and deletions (indels) that are consistent with the pattern of inheritance in the parents and 11 children of this pedigree. Platinum genotypes are highly concordant with the current catalog of the National Institute of Standards and Technology for both SNVs (>99.99%) and indels (99.92%) and add a validated truth catalog that has 26% more SNVs and 45% more indels. Analysis of 334,652 SNVs that were consistent between informatics pipelines yet inconsistent with haplotype transmission ("nonplatinum") revealed that the majority of these variants are de novo and cell-line mutations or reside within previously unidentified duplications and deletions. The reference materials from this study are a resource for objective assessment of the accuracy of variant calls throughout genomes.


Subject(s)
Genome, Human/genetics , Genomics , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Algorithms , Databases, Genetic , Exome/genetics , Genotype , Humans , INDEL Mutation/genetics , Pedigree , Polymorphism, Single Nucleotide , Software
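The abstract above keeps only variants whose genotypes are consistent with inheritance in the pedigree. As a rough illustration of the underlying idea, the sketch below checks a single parent-child trio for Mendelian consistency. It is a simplification under stated assumptions: the function name and the genotype encoding (a pair of allele indices per individual) are hypothetical, and the study itself relies on haplotype transmission across 11 children, which is a stricter test than this per-trio check.

```python
from itertools import product


def is_mendelian_consistent(child, mother, father):
    """Return True if the child's unphased genotype can be formed by
    taking one allele from the mother and one from the father.

    Genotypes are pairs of allele indices, e.g. (0, 1) for a heterozygote.
    This is a hypothetical helper for illustration, not the paper's method.
    """
    target = sorted(child)
    return any(sorted((m, f)) == target for m, f in product(mother, father))


# A heterozygous child of two opposite homozygotes is consistent.
print(is_mendelian_consistent((0, 1), (0, 0), (1, 1)))
# A homozygous-alternate child cannot arise if the mother carries no alternate allele.
print(is_mendelian_consistent((1, 1), (0, 0), (0, 1)))
```

A per-trio check like this cannot detect errors that happen to be Mendelian-consistent, which is one reason a multi-child pedigree with phased haplotypes gives a stronger filter.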
4.
Genome Res ; 27(11): 1895-1903, 2017 11.
Article in English | MEDLINE | ID: mdl-28887402

ABSTRACT

Identifying large expansions of short tandem repeats (STRs), such as those that cause amyotrophic lateral sclerosis (ALS) and fragile X syndrome, is challenging for short-read whole-genome sequencing (WGS) data. A solution to this problem is an important step toward integrating WGS into precision medicine. We developed a software tool called ExpansionHunter that, using PCR-free WGS short-read data, can genotype repeats at the locus of interest, even if the expanded repeat is larger than the read length. We applied our algorithm to WGS data from 3001 ALS patients who have been tested for the presence of the C9orf72 repeat expansion with repeat-primed PCR (RP-PCR). Compared against this truth data, ExpansionHunter correctly classified all (212/212, 95% CI [0.98, 1.00]) of the expanded samples as either expansions (208) or potential expansions (4). Additionally, 99.9% (2786/2789, 95% CI [0.997, 1.00]) of the wild-type samples were correctly classified as wild type by this method with the remaining three samples identified as possible expansions. We further applied our algorithm to a set of 152 samples in which every sample had one of eight different pathogenic repeat expansions, including those associated with fragile X syndrome, Friedreich's ataxia, and Huntington's disease, and correctly flagged all but one of the known repeat expansions. Thus, ExpansionHunter can be used to accurately detect known pathogenic repeat expansions and provides researchers with a tool that can be used to identify new pathogenic repeat expansions.


Subject(s)
Amyotrophic Lateral Sclerosis/genetics , DNA Repeat Expansion , Whole Genome Sequencing/methods , Algorithms , C9orf72 Protein/genetics , Databases, Genetic , Humans , Precision Medicine , Sensitivity and Specificity , Software
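The abstract above reports 95% confidence intervals for two proportions (212/212 and 2786/2789) without naming the method used. The sketch below computes a Wilson score interval, one common choice for binomial proportions near 0 or 1. Treating the reported intervals as Wilson intervals is an assumption; the function name is illustrative only.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 corresponds to an approximate 95% confidence level.
    Assumption: the abstract does not state which interval method was used.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half), min(1.0, center + half)


for successes, total in [(212, 212), (2786, 2789)]:
    low, high = wilson_interval(successes, total)
    print(f"{successes}/{total}: [{low:.3f}, {high:.3f}]")
```

Under this assumption the lower bounds round to 0.98 and 0.997, in line with the intervals quoted in the abstract.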
5.
Bioinformatics ; 35(22): 4754-4756, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31134279

ABSTRACT

SUMMARY: We describe a novel computational method for genotyping repeats using sequence graphs. This method addresses the long-standing need to accurately genotype medically important loci containing repeats adjacent to other variants or imperfect DNA repeats such as polyalanine repeats. Here we introduce a new version of our repeat genotyping software, ExpansionHunter, that uses this method to perform targeted genotyping of a broad class of such loci. AVAILABILITY AND IMPLEMENTATION: ExpansionHunter is implemented in C++ and is available under the Apache License Version 2.0. The source code, documentation, and Linux/macOS binaries are available at https://github.com/Illumina/ExpansionHunter/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Microsatellite Repeats , Software , Genotype
6.
Genet Med ; 22(5): 945-953, 2020 05.
Article in English | MEDLINE | ID: mdl-32066871

ABSTRACT

PURPOSE: Spinal muscular atrophy (SMA), caused by loss of the SMN1 gene, is a leading cause of early childhood death. Due to the near identical sequences of SMN1 and SMN2, analysis of this region is challenging. Population-wide SMA screening to quantify the SMN1 copy number (CN) is recommended by the American College of Medical Genetics and Genomics. METHODS: We developed a method that accurately identifies the CN of SMN1 and SMN2 using genome sequencing (GS) data by analyzing read depth and eight informative reference genome differences between SMN1/2. RESULTS: We characterized SMN1/2 in 12,747 genomes, identified 1568 samples with SMN1 gains or losses and 6615 samples with SMN2 gains or losses, and calculated a pan-ethnic carrier frequency of 2%, consistent with previous studies. Additionally, 99.8% of our SMN1 and 99.7% of SMN2 CN calls agreed with orthogonal methods, with a recall of 100% for SMA and 97.8% for carriers, and a precision of 100% for both SMA and carriers. CONCLUSION: This SMN copy-number caller can be used to identify both carrier and affected status of SMA, enabling SMA testing to be offered as a comprehensive test in neonatal care and an accurate carrier screening tool in GS sequencing projects.


Subject(s)
Muscular Atrophy, Spinal , Base Sequence , Child , Child, Preschool , Humans , Muscular Atrophy, Spinal/diagnosis , Muscular Atrophy, Spinal/genetics , Survival of Motor Neuron 1 Protein/genetics
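The method in the abstract above combines read depth with informative base differences between SMN1 and SMN2. The toy sketch below shows that two-step idea in its simplest form: estimate the combined copy number from depth, then split it using the fraction of reads supporting the SMN1 base. All names and inputs are hypothetical, and the published caller uses eight informative sites and a more careful statistical model than the plain rounding used here.

```python
def estimate_smn_copy_numbers(combined_depth, genome_depth, smn1_reads, smn2_reads):
    """Toy estimate of (SMN1, SMN2) copy numbers.

    combined_depth: mean depth of reads aligned to SMN1 or SMN2, summed.
    genome_depth:   mean depth over the diploid genome (two copies).
    smn1_reads / smn2_reads: reads supporting the SMN1 or SMN2 base at
    informative sites. Hypothetical inputs for illustration only.
    """
    if genome_depth <= 0:
        raise ValueError("genome_depth must be positive")
    supporting = smn1_reads + smn2_reads
    if supporting == 0:
        raise ValueError("no reads at informative sites")
    # Depth per single copy is half the diploid genome depth.
    total_cn = round(combined_depth / (genome_depth / 2))
    # Split the total between the paralogs by allele-specific read support.
    smn1_cn = round(total_cn * smn1_reads / supporting)
    return smn1_cn, total_cn - smn1_cn


# Two copies of each gene: combined depth is twice the genome depth.
print(estimate_smn_copy_numbers(60, 30, 40, 40))
# A carrier-like profile: three copies in total, one of them SMN1.
print(estimate_smn_copy_numbers(45, 30, 15, 30))
```

Plain rounding is fragile when the read fraction falls midway between two integer states, which is why a production caller would assign copy numbers with a probabilistic model and report low-confidence calls separately.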
7.
Genet Med ; 21(5): 1121-1130, 2019 05.
Article in English | MEDLINE | ID: mdl-30293986

ABSTRACT

PURPOSE: Current diagnostic testing for genetic disorders involves serial use of specialized assays spanning multiple technologies. In principle, genome sequencing (GS) can detect all genomic pathogenic variant types on a single platform. Here we evaluate copy-number variant (CNV) calling as part of a clinically accredited GS test. METHODS: We performed analytical validation of CNV calling on 17 reference samples, compared the sensitivity of GS-based variants with those from a clinical microarray, and set a bound on precision using orthogonal technologies. We developed a protocol for family-based analysis of GS-based CNV calls, and deployed this across a clinical cohort of 79 rare and undiagnosed cases. RESULTS: We found that CNV calls from GS are at least as sensitive as those from microarrays, while only creating a modest increase in the number of variants interpreted (~10 CNVs per case). We identified clinically significant CNVs in 15% of the first 79 cases analyzed, all of which were confirmed by an orthogonal approach. The pipeline also enabled discovery of a uniparental disomy (UPD) and a 50% mosaic trisomy 14. Directed analysis of select CNVs enabled breakpoint level resolution of genomic rearrangements and phasing of de novo CNVs. CONCLUSION: Robust identification of CNVs by GS is possible within a clinical testing environment.


Subject(s)
DNA Copy Number Variations/genetics , Rare Diseases/genetics , Undiagnosed Diseases/genetics , Adolescent , Child , Child, Preschool , Chromosome Mapping/methods , Cohort Studies , Female , Genetic Testing/methods , Genome, Human , Genomics/methods , Humans , Infant , Male , Rare Diseases/diagnosis , Undiagnosed Diseases/diagnosis , Whole Genome Sequencing/methods , Young Adult
8.
Genet Med ; 20(10): 1196-1205, 2018 10.
Article in English | MEDLINE | ID: mdl-29388947

ABSTRACT

PURPOSE: Fresh-frozen (FF) tissue is the optimal source of DNA for whole-genome sequencing (WGS) of cancer patients. However, it is not always available, limiting the widespread application of WGS in clinical practice. We explored the viability of using formalin-fixed, paraffin-embedded (FFPE) tissues, available routinely for cancer patients, as a source of DNA for clinical WGS. METHODS: We conducted a prospective study using DNAs from matched FF, FFPE, and peripheral blood germ-line specimens collected from 52 cancer patients (156 samples) following routine diagnostic protocols. We compared somatic variants detected in FFPE and matching FF samples. RESULTS: We found the single-nucleotide variant agreement reached 71% across the genome and somatic copy-number alterations (CNAs) detection from FFPE samples was suboptimal (0.44 median correlation with FF) due to nonuniform coverage. CNA detection was improved significantly with lower reverse crosslinking temperature in FFPE DNA extraction (80 °C or 65 °C depending on the methods). Our final data showed somatic variant detection from FFPE for clinical decision making is possible. We detected 98% of clinically actionable variants (including 30/31 CNAs). CONCLUSION: We present the first prospective WGS study of cancer patients using FFPE specimens collected in a routine clinical environment proving WGS can be applied in the clinic.


Subject(s)
DNA Copy Number Variations/genetics , Genome, Human/genetics , Neoplasms/genetics , Whole Genome Sequencing/methods , Decision Making , Female , Humans , Male , Neoplasms/blood , Neoplasms/pathology , Paraffin Embedding , Polymorphism, Single Nucleotide/genetics
10.
Nature ; 463(7278): 191-6, 2010 Jan 14.
Article in English | MEDLINE | ID: mdl-20016485

ABSTRACT

All cancers carry somatic mutations. A subset of these somatic alterations, termed driver mutations, confer selective growth advantage and are implicated in cancer development, whereas the remainder are passengers. Here we have sequenced the genomes of a malignant melanoma and a lymphoblastoid cell line from the same person, providing the first comprehensive catalogue of somatic mutations from an individual cancer. The catalogue provides remarkable insights into the forces that have shaped this cancer genome. The dominant mutational signature reflects DNA damage due to ultraviolet light exposure, a known risk factor for malignant melanoma, whereas the uneven distribution of mutations across the genome, with a lower prevalence in gene footprints, indicates that DNA repair has been preferentially deployed towards transcribed regions. The results illustrate the power of a cancer genome sequence to reveal traces of the DNA damage, repair, mutation and selection processes that were operative years before the cancer became symptomatic.


Subject(s)
Genes, Neoplasm/genetics , Genome, Human/genetics , Mutation/genetics , Neoplasms/genetics , Adult , Cell Line, Tumor , DNA Damage/genetics , DNA Mutational Analysis , DNA Repair/genetics , Gene Dosage/genetics , Humans , Loss of Heterozygosity/genetics , Male , Melanoma/etiology , Melanoma/genetics , MicroRNAs/genetics , Mutagenesis, Insertional/genetics , Neoplasms/etiology , Polymorphism, Single Nucleotide/genetics , Precision Medicine , Sequence Deletion/genetics , Ultraviolet Rays
11.
Nature ; 456(7218): 53-9, 2008 Nov 06.
Article in English | MEDLINE | ID: mdl-18987734

ABSTRACT

DNA sequence information underpins genetic research, enabling discoveries of important biological or medical benefit. Sequencing projects have traditionally used long (400-800 base pair) reads, but the existence of reference sequences for the human and many other genomes makes it possible to develop new, fast approaches to re-sequencing, whereby shorter reads are compared to a reference to identify intraspecies genetic variation. Here we report an approach that generates several billion bases of accurate nucleotide sequence per experiment at low cost. Single molecules of DNA are attached to a flat surface, amplified in situ and used as templates for synthetic sequencing with fluorescent reversible terminator deoxyribonucleotides. Images of the surface are analysed to generate high-quality sequence. We demonstrate application of this approach to human genome sequencing on flow-sorted X chromosomes and then scale the approach to determine the genome sequence of a male Yoruba from Ibadan, Nigeria. We build an accurate consensus sequence from >30x average depth of paired 35-base reads. We characterize four million single-nucleotide polymorphisms and four hundred thousand structural variants, many of which were previously unknown. Our approach is effective for accurate, rapid and economical whole-genome re-sequencing and many other biomedical applications.


Subject(s)
Genome, Human/genetics , Genomics/methods , Sequence Analysis, DNA/methods , Chromosomes, Human, X/genetics , Consensus Sequence/genetics , Genomics/economics , Genotype , Humans , Male , Nigeria , Polymorphism, Single Nucleotide/genetics , Sensitivity and Specificity , Sequence Analysis, DNA/economics
12.
Haematologica ; 98(9): 1383-7, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23716552

ABSTRACT

The congenital dyserythropoietic anemias are a heterogeneous group of rare disorders primarily affecting erythropoiesis with characteristic morphological abnormalities and a block in erythroid maturation. Mutations in the CDAN1 gene, which encodes Codanin-1, underlie the majority of congenital dyserythropoietic anemia type I cases. However, no likely pathogenic CDAN1 mutation has been detected in approximately 20% of cases, suggesting the presence of at least one other locus. We used whole genome sequencing and segregation analysis to identify a homozygous T to A transversion (c.533T>A), predicted to lead to a p.L178Q missense substitution in C15ORF41, a gene of unknown function, in a consanguineous pedigree of Middle-Eastern origin. Sequencing C15ORF41 in other CDAN1 mutation-negative congenital dyserythropoietic anemia type I pedigrees identified a homozygous transition (c.281A>G), predicted to lead to a p.Y94C substitution, in two further pedigrees of Southeast Asian origin. The haplotype surrounding the c.281A>G change suggests a founder effect for this mutation in Pakistan. Detailed sequence similarity searches indicate that C15ORF41 encodes a novel restriction endonuclease that is a member of the Holliday junction resolvase family of proteins.


Subject(s)
Anemia, Dyserythropoietic, Congenital/diagnosis , Anemia, Dyserythropoietic, Congenital/genetics , Glycoproteins/genetics , Homozygote , Mutation, Missense/genetics , Endonucleases/chemistry , Endonucleases/genetics , Female , Glycoproteins/chemistry , Humans , Male , Nuclear Proteins , Pedigree , Predictive Value of Tests , Protein Structure, Secondary , Protein Structure, Tertiary
13.
Cell Genom ; 3(2): 100258, 2023 Feb 08.
Article in English | MEDLINE | ID: mdl-36819666

ABSTRACT

Current standards in clinical genetics recognize the need to establish the validity of gene-disease relationships as a first step in the interpretation of sequence variants. We describe our experience incorporating the ClinGen Gene-Disease Clinical Validity framework in our interpretation and reporting workflow for a clinical genome sequencing (cGS) test for individuals with rare and undiagnosed genetic diseases. This "reactive" gene curation is completed upon identification of candidate variants during active case analysis and within the test turn-around time by focusing on the most impactful evidence and taking advantage of the broad applicability of the framework to cover a wide range of disease areas. We demonstrate that reactive gene curation can be successfully implemented in support of cGS in a clinical laboratory environment, enabling robust clinical decision making and allowing all variants to be fully and appropriately considered and their clinical significance confidently interpreted.

14.
Leukemia ; 37(3): 518-528, 2023 03.
Article in English | MEDLINE | ID: mdl-36658389

ABSTRACT

Childhood B-cell acute lymphoblastic leukaemia (B-ALL) is characterised by recurrent genetic abnormalities that drive risk-directed treatment strategies. Using current techniques, accurate detection of such aberrations can be challenging, due to the rapidly expanding list of key genetic abnormalities. Whole genome sequencing (WGS) has the potential to improve genetic testing, but requires comprehensive validation. We performed WGS on 210 childhood B-ALL samples annotated with clinical and genetic data. We devised a molecular classification system to subtype these patients based on identification of key genetic changes in tumour-normal and tumour-only analyses. This approach detected 294 subtype-defining genetic abnormalities in 96% (202/210) patients. Novel genetic variants, including fusions involving genes in the MAP kinase pathway, were identified. WGS results were concordant with standard-of-care methods and whole transcriptome sequencing (WTS). We expanded the catalogue of genetic profiles that reliably classify PAX5alt and ETV6::RUNX1-like subtypes. Our novel bioinformatic pipeline improved detection of DUX4 rearrangements (DUX4-r): a good-risk B-ALL subtype with high survival rates. Overall, we have validated that WGS provides a standalone, reliable genetic test to detect all subtype-defining genetic abnormalities in B-ALL, accurately classifying patients for the risk-directed treatment stratification, while simultaneously performing as a research tool to identify novel disease biomarkers.


Subject(s)
Precursor B-Cell Lymphoblastic Leukemia-Lymphoma , Precursor Cell Lymphoblastic Leukemia-Lymphoma , Humans , Precursor Cell Lymphoblastic Leukemia-Lymphoma/drug therapy , Precursor B-Cell Lymphoblastic Leukemia-Lymphoma/diagnosis , Precursor B-Cell Lymphoblastic Leukemia-Lymphoma/genetics , Computational Biology , Genetic Testing , Whole Genome Sequencing
15.
Leukemia ; 37(3): 529-538, 2023 03.
Article in English | MEDLINE | ID: mdl-36550215

ABSTRACT

Incorporating genetics into risk-stratification for treatment of childhood B-progenitor acute lymphoblastic leukaemia (B-ALL) has contributed significantly to improved survival. In about 30% B-ALL (B-other-ALL) without well-established chromosomal changes, new genetic subtypes have recently emerged, yet their true prognostic relevance largely remains unclear. We integrated next generation sequencing (NGS): whole genome sequencing (WGS) (n = 157) and bespoke targeted NGS (t-NGS) (n = 175) (overlap n = 36), with existing genetic annotation in a representative cohort of 351 B-other-ALL patients from the childhood ALL trial, UKALL2003. PAX5alt was most frequently observed (n = 91), whereas PAX5 P80R mutations (n = 11) defined a distinct PAX5 subtype. DUX4-r subtype (n = 80) was defined by DUX4 rearrangements and/or ERG deletions. These patients had a low relapse rate and excellent survival. ETV6::RUNX1-like subtype (n = 21) was characterised by multiple abnormalities of ETV6 and IKZF1, with no reported relapses or deaths, indicating their excellent prognosis in this trial. An inferior outcome for patients with ABL-class fusions (n = 25) was confirmed. Integration of NGS into genomic profiling of B-other-ALL within a single childhood ALL trial, UKALL2003, has shown the added clinical value of NGS-based approaches, through improved accuracy in detection and classification into the range of risk stratifying genetic subtypes, while validating their prognostic significance.


Subject(s)
Precursor B-Cell Lymphoblastic Leukemia-Lymphoma , Precursor Cell Lymphoblastic Leukemia-Lymphoma , Humans , Clinical Trials as Topic , Genetic Markers , Genomics , Neoplasm Recurrence, Local , Precursor B-Cell Lymphoblastic Leukemia-Lymphoma/genetics , Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics , Prognosis , Child
16.
Curr Opin Genet Dev ; 16(6): 545-52, 2006 Dec.
Article in English | MEDLINE | ID: mdl-17055251

ABSTRACT

DNA sequencing can be used to gain important information on genes, genetic variation and gene function for biological and medical studies. The growing collection of publicly available reference genome sequences will underpin a new era of whole genome re-sequencing, but sequencing costs need to fall and throughput needs to rise by several orders of magnitude. Novel technologies are being developed to meet this need by generating massive amounts of sequence that can be aligned to the reference sequence. The challenge is to maintain the high standards of accuracy and completeness that are hallmarks of the previous genome projects. One or more new sequencing technologies are expected to become the mainstay of future research, and to make DNA sequencing centre stage as a routine tool in genetic research in the coming years.


Subject(s)
Genomics/methods , Sequence Analysis, DNA/methods , Genome, Human , Humans , Sequence Analysis, DNA/economics
17.
Curr Opin Genet Dev ; 16(3): 213-8, 2006 Jun.
Article in English | MEDLINE | ID: mdl-16650760

ABSTRACT

The sequences of both of the human sex chromosomes and of a substantial part of the chimpanzee Y chromosome have now been determined, and most of the protein-coding genes have been identified. The X chromosome codes for more than 800 proteins but the Y chromosome for only approximately 60, illustrating their very different evolutionary histories since their origin from an autosomal pair approximately 300 million years ago and explaining their differential importance in disease. These sequences have provided the basis for understanding normal patterns of variation, such as the distribution of SNPs, and patterns of linkage disequilibrium. In addition, they have been useful for identifying variants associated with simple Mendelian disorders such as microphthalmia or mental retardation, and more complex disorders such as osteoporosis.


Subject(s)
Chromosomes, Human, X/genetics , Chromosomes, Human, Y/genetics , Animals , Base Sequence , Genetic Diseases, Inborn/genetics , Genetic Variation/genetics , Genome, Human/genetics , Humans
18.
Nature ; 429(6990): 440-5, 2004 May 27.
Article in English | MEDLINE | ID: mdl-15164068

ABSTRACT

We have the human genome sequence. It is freely available, accurate and nearly complete. But is the genome ready for medicine? The new resource is already changing genetic research strategies to find information of medical value. Now we need high-quality annotation of all the functionally important sequences and the variations within them that contribute to health and disease. To achieve this, we need more genome sequences, systematic experimental analyses, and extensive information on human phenotypes. Flexible and user-friendly access to well-annotated genomes will create an environment for innovation, and the potential for unlimited use of sequencing in biomedical research and practice.


Subject(s)
Genetics, Medical/trends , Genome, Human , Genomics/trends , Medicine/trends , Genetic Variation , Humans
19.
Genome Biol ; 21(1): 102, 2020 04 28.
Article in English | MEDLINE | ID: mdl-32345345

ABSTRACT

Repeat expansions are responsible for over 40 monogenic disorders, and undoubtedly more pathogenic repeat expansions remain to be discovered. Existing methods for detecting repeat expansions in short-read sequencing data require predefined repeat catalogs. Recent discoveries emphasize the need for methods that do not require pre-specified candidate repeats. To address this need, we introduce ExpansionHunter Denovo, an efficient catalog-free method for genome-wide repeat expansion detection. Analysis of real and simulated data shows that our method can identify large expansions of 41 out of 44 pathogenic repeats, including nine recently reported non-reference repeat expansions not discoverable via existing methods.


Subject(s)
DNA Repeat Expansion , Software , Case-Control Studies , Fragile X Syndrome/genetics , Friedreich Ataxia/genetics , High-Throughput Nucleotide Sequencing , Humans , Huntington Disease/genetics , Microsatellite Repeats , Myotonic Dystrophy/genetics , Whole Genome Sequencing