Results 1 - 20 of 113
1.
Am J Hum Genet ; 111(5): 990-995, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38636510

ABSTRACT

Since genotype imputation was introduced, researchers have been relying on the estimated imputation quality from imputation software to perform post-imputation quality control (QC). However, this quality estimate (denoted as Rsq) performs less well for lower-frequency variants. We recently published MagicalRsq, a machine-learning-based imputation quality calibration, which leverages additional typed markers from the same cohort and outperforms Rsq as a QC metric. In this work, we extended the original MagicalRsq to allow cross-cohort model training and named the new model MagicalRsq-X. We removed the cohort-specific estimated minor allele frequency and included linkage disequilibrium scores and recombination rates as additional features. Leveraging whole-genome sequencing data from TOPMed, specifically participants in the BioMe, JHS, WHI, and MESA studies, we performed comprehensive cross-cohort evaluations for predominantly European and African ancestral individuals based on their inferred global ancestry with the 1000 Genomes and Human Genome Diversity Project data as reference. Our results suggest MagicalRsq-X outperforms Rsq in almost every setting, with 7.3%-14.4% improvement in squared Pearson correlation with true R2, corresponding to 85-218 K variant gains. We further developed a metric to quantify the genetic distances of a target cohort relative to a reference cohort and showed that such metric largely explained the performance of MagicalRsq-X models. Finally, we found MagicalRsq-X saved up to 53 known genome-wide significant variants in one of the largest blood cell trait GWASs that would be missed using the original Rsq for QC. In conclusion, MagicalRsq-X shows superiority for post-imputation QC and benefits genetic studies by distinguishing well and poorly imputed lower-frequency variants.
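
The evaluation metric named in this abstract, the squared Pearson correlation between a quality estimate and the true R2, can be illustrated with a minimal Python sketch. The function name `squared_pearson` and the toy per-variant values are assumptions made for illustration only; they are not taken from the MagicalRsq-X software or from the study data.

```python
def squared_pearson(x, y):
    """Return the squared Pearson correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov * cov / (var_x * var_y)


# Hypothetical per-variant values (illustrative only, not study data).
true_r2 = [0.20, 0.45, 0.60, 0.80, 0.95]
estimated_quality = [0.35, 0.40, 0.70, 0.75, 0.90]
print(round(squared_pearson(estimated_quality, true_r2), 3))
```

A higher value means the quality estimate tracks the true imputation accuracy more closely, which is the sense in which the abstract compares the two metrics.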


Subject(s)
Gene Frequency , Genotype , Polymorphism, Single Nucleotide , Software , Humans , Cohort Studies , Linkage Disequilibrium , Genome-Wide Association Study/methods , Genome, Human , Quality Control , Machine Learning , Whole Genome Sequencing/standards , Whole Genome Sequencing/methods
2.
Prenat Diagn ; 43(9): 1132-1141, 2023 08.
Article in English | MEDLINE | ID: mdl-37355983

ABSTRACT

OBJECTIVE: This study aimed to assess the diagnostic yield of prenatal genetic testing using trio whole exome sequencing (WES) and trio whole genome sequencing (WGS) in pregnancies with fetal anomalies by comparing the results with conventional chromosomal microarray (CMA) analysis. METHODS: A total of 40 pregnancies with fetal anomalies or increased nuchal translucency (NT ≥ 5 mm) were included between the 12th and 21st week of gestation. Trio WES/WGS and CMA were performed in all cases. RESULTS: The trio WES/WGS analysis increased the diagnostic yield by 25% in cases with negative CMA results. Furthermore, all six chromosomal aberrations identified by CMA were independently detected by WES/WGS analysis. In total, 16 out of 40 cases obtained a genetic sequence variant, copy number variant, or aneuploidy explaining the phenotype, resulting in an overall WES/WGS diagnostic yield of 40%. WES analysis provided a more reliable identification of mosaic sequence variants than WGS because of its higher sequencing depth. CONCLUSIONS: Prenatal WES/WGS proved to be powerful diagnostic tools for fetal anomalies, surpassing the diagnostic yield of CMA. They have the potential to serve as standalone methods for prenatal diagnosis. The study highlighted the limitations of WGS in accurately detecting mosaic variants, which is particularly relevant when analyzing chorionic villus samples.


Subject(s)
Exome Sequencing , Prenatal Diagnosis , Whole Genome Sequencing , Female , Humans , Pregnancy , Prenatal Diagnosis/methods , Whole Genome Sequencing/standards , Exome Sequencing/standards , Microarray Analysis/standards , Congenital Abnormalities/genetics , Genetic Variation/genetics
3.
Article in English | MEDLINE | ID: mdl-34964003

ABSTRACT

PURPOSE: Molecular tumor profiling is becoming a routine part of clinical cancer care, typically involving tumor-only panel testing without matched germline. We hypothesized that integrated germline sequencing could improve clinical interpretation and enhance the identification of germline variants with significant hereditary risks. MATERIALS AND METHODS: Tumors from pediatric patients with high-risk, extracranial solid malignancies were sequenced with a targeted panel of cancer-associated genes. Later, germline DNA was analyzed for a subset of these genes. We performed a post hoc analysis to identify how an integrated analysis of tumor and germline data would improve clinical interpretation. RESULTS: One hundred sixty participants with both tumor-only and germline sequencing reports were eligible for this analysis. Germline sequencing identified 38 pathogenic or likely pathogenic variants among 35 (22%) patients. Twenty-five (66%) of these were included in the tumor sequencing report. The remaining germline pathogenic or likely pathogenic variants were single-nucleotide variants filtered out of tumor-only analysis because of population frequency or copy-number variation masked by additional copy-number changes in the tumor. In tumor-only sequencing, 308 of 434 (71%) single-nucleotide variants reported were present in the germline, including 31% with suggested clinical utility. Finally, we provide further evidence that the variant allele fraction from tumor-only sequencing is insufficient to differentiate somatic from germline events. CONCLUSION: A paired approach to analyzing tumor and germline sequencing data would be expected to improve the efficiency and accuracy of distinguishing somatic mutations and germline variants, thereby facilitating the process of variant curation and therapeutic interpretation for somatic reports, as well as the identification of variants associated with germline cancer predisposition.


Subject(s)
Neoplasms/genetics , Whole Genome Sequencing/standards , Adolescent , Adult , Child , Child, Preschool , Female , Genetic Predisposition to Disease/genetics , Humans , Infant , Male , Precision Medicine/methods , Precision Medicine/standards , Precision Medicine/trends , Whole Genome Sequencing/methods , Whole Genome Sequencing/statistics & numerical data
4.
Cell Rep ; 37(7): 110017, 2021 11 16.
Article in English | MEDLINE | ID: mdl-34788621

ABSTRACT

The lack of haplotype reference panels and whole-genome sequencing resources specific to the Chinese population has greatly hindered genetic studies in the world's largest population. Here, we present the NyuWa genome resource, based on deep (26.2×) sequencing of 2,999 Chinese individuals, and construct a NyuWa reference panel of 5,804 haplotypes and 19.3 million variants, which is a high-quality publicly available Chinese population-specific reference panel with thousands of samples. Compared with other panels, the NyuWa reference panel reduces the Han Chinese imputation error rate by a margin ranging from 30% to 51%. Population structure and imputation simulation tests support the applicability of one integrated reference panel for northern and southern Chinese. In addition, a total of 22,504 loss-of-function variants in coding and noncoding genes are identified, including 11,493 novel variants. These results highlight the value of the NyuWa genome resource in facilitating genetic research in Chinese and Asian populations.


Subject(s)
Asian People/genetics , Genome/genetics , Genomics/methods , Alleles , China , Databases, Genetic , Gene Frequency/genetics , Genome, Human/genetics , Genome-Wide Association Study/methods , Genotype , Haplotypes/genetics , High-Throughput Nucleotide Sequencing/methods , Humans , Polymorphism, Single Nucleotide , Reference Standards , Whole Genome Sequencing/standards
6.
Nat Biotechnol ; 39(9): 1141-1150, 2021 09.
Article in English | MEDLINE | ID: mdl-34504346

ABSTRACT

Clinical applications of precision oncology require accurate tests that can distinguish true cancer-specific mutations from errors introduced at each step of next-generation sequencing (NGS). To date, no bulk sequencing study has addressed the effects of cross-site reproducibility, nor the biological, technical and computational factors that influence variant identification. Here we report a systematic interrogation of somatic mutations in paired tumor-normal cell lines to identify factors affecting detection reproducibility and accuracy at six different centers. Using whole-genome sequencing (WGS) and whole-exome sequencing (WES), we evaluated the reproducibility of different sample types with varying input amount and tumor purity, and multiple library construction protocols, followed by processing with nine bioinformatics pipelines. We found that read coverage and callers affected both WGS and WES reproducibility, but WES performance was influenced by insert fragment size, genomic copy content and the global imbalance value (GIV; G > T/C > A). Finally, taking into account library preparation protocol, tumor content, read coverage and bioinformatics processes concomitantly, we recommend actionable practices to improve the reproducibility and accuracy of NGS experiments for cancer mutation detection.


Subject(s)
Benchmarking , Exome Sequencing/standards , Neoplasms/genetics , Sequence Analysis, DNA/standards , Whole Genome Sequencing/standards , Cell Line , Cell Line, Tumor , High-Throughput Nucleotide Sequencing/methods , Humans , Mutation , Neoplasms/pathology , Reproducibility of Results
7.
Nat Biotechnol ; 39(9): 1151-1160, 2021 09.
Article in English | MEDLINE | ID: mdl-34504347

ABSTRACT

The lack of samples for generating standardized DNA datasets for setting up a sequencing pipeline or benchmarking the performance of different algorithms limits the implementation and uptake of cancer genomics. Here, we describe reference call sets obtained from paired tumor-normal genomic DNA (gDNA) samples derived from a breast cancer cell line (which is highly heterogeneous, with an aneuploid genome, and enriched in somatic alterations) and a matched lymphoblastoid cell line. We partially validated both somatic mutations and germline variants in these call sets via whole-exome sequencing (WES) with different sequencing platforms and targeted sequencing with >2,000-fold coverage, spanning 82% of genomic regions with high confidence. Although the gDNA reference samples are not representative of primary cancer cells from a clinical sample, when setting up a sequencing pipeline, they not only minimize potential biases from technologies, assays and informatics but also provide a unique resource for benchmarking 'tumor-only' or 'matched tumor-normal' analyses.


Subject(s)
Benchmarking , Breast Neoplasms/genetics , DNA Mutational Analysis/standards , High-Throughput Nucleotide Sequencing/standards , Whole Genome Sequencing/standards , Cell Line, Tumor , Datasets as Topic , Germ Cells , Humans , Mutation , Reference Standards , Reproducibility of Results
8.
Viruses ; 13(9)2021 08 30.
Article in English | MEDLINE | ID: mdl-34578305

ABSTRACT

Despite the effectiveness of direct-acting antiviral agents in treating hepatitis C virus (HCV), cases of treatment failure have been associated with the emergence of resistance-associated substitutions. To better guide clinical decision-making, we developed and validated a near-whole-genome HCV genotype-independent next-generation sequencing strategy. HCV genotype 1-6 samples from direct-acting antiviral agent treatment-naïve and -treated HCV-infected individuals were included. Viral RNA was extracted using a NucliSens easyMAG and amplified using nested reverse transcription-polymerase chain reaction. Libraries were prepared using Nextera XT and sequenced on the Illumina MiSeq sequencing platform. Data were processed by an in-house pipeline (MiCall). Nucleotide consensus sequences were aligned to reference strain sequences for resistance-associated substitution identification and compared to NS3, NS5a, and NS5b sequence data obtained from a validated in-house assay optimized for HCV genotype 1. Sequencing success rates (defined as achieving >100-fold read coverage) approaching 90% were observed for most genotypes in samples with a viral load >5 log10 IU/mL. This genotype-independent sequencing method resulted in >99.8% nucleotide concordance with the genotype 1-optimized method, and 100% agreement in genotype assignment with paired line probe assay-based genotypes. The assay demonstrated high intra-run repeatability and inter-run reproducibility at detecting substitutions above 2% prevalence. This study highlights the performance of a freely available laboratory and bioinformatic approach for reliable HCV genotyping and resistance-associated substitution detection regardless of genotype.


Subject(s)
Genotype , Hepacivirus/genetics , Hepatitis C/virology , RNA, Viral/genetics , Whole Genome Sequencing/methods , Whole Genome Sequencing/standards , Genotyping Techniques , Hepacivirus/classification , Hepatitis C/diagnosis , Humans , Reproducibility of Results , Sensitivity and Specificity , Viral Load
9.
Viruses ; 13(7)2021 07 08.
Article in English | MEDLINE | ID: mdl-34372528

ABSTRACT

Next-generation sequencing (NGS) yields powerful opportunities for studying human papillomavirus (HPV) genomics for applications in epidemiology, public health, and clinical diagnostics. HPV genotypes, variants, and point mutations can be investigated in clinical materials and described in unprecedented detail. However, both the NGS laboratory analysis and bioinformatical approach require numerous steps and checks to ensure robust interpretation of results. Here, we provide a step-by-step review of recommendations for validation and quality assurance procedures of each step in the typical NGS workflow, with a focus on whole-genome sequencing approaches. The use of directed pilots and protocols to ensure optimization of sequencing data yield, followed by curated bioinformatical procedures, is particularly emphasized. Finally, the storage and sharing of data sets are discussed. The development of international standards for quality assurance should be a goal for the HPV NGS community, similar to what has been developed for other areas of sequencing efforts including microbiology and molecular pathology. We thus propose that it is time for NGS to be included in the global efforts on quality assurance and improvement of HPV-based testing and diagnostics.


Subject(s)
Genome, Viral , Genomics/standards , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Papillomaviridae/genetics , Whole Genome Sequencing/standards , Genomics/methods , Humans , Quality Control , Specimen Handling/methods , Specimen Handling/standards , Validation Studies as Topic , Workflow
10.
Sci Rep ; 11(1): 17171, 2021 08 25.
Article in English | MEDLINE | ID: mdl-34433869

ABSTRACT

Advances in whole genome amplification (WGA) techniques enable understanding of the genomic sequence at a single cell level. Demand for single cell dedicated WGA kits (scWGA) has led to the development of several commercial kits. To this point, no robust comparison of all available kits has been performed. Here, we benchmark an economical assay, comparing all commercially available scWGA kits. Our comparison is based on targeted sequencing of thousands of genomic loci, including highly mutable regions, from a large cohort of human single cells. Using this approach we have demonstrated the superiority of Ampli1 in genome coverage and of RepliG in reduced error rate. In summary, we show that no single kit is optimal across all categories, highlighting the need for a dedicated kit selection in accordance with experimental requirements.


Subject(s)
Single-Cell Analysis/methods , Whole Genome Sequencing/methods , Cells, Cultured , Humans , Polymerase Chain Reaction/methods , Polymerase Chain Reaction/standards , Sensitivity and Specificity , Single-Cell Analysis/standards , Whole Genome Sequencing/standards
11.
Pathology ; 53(7): 902-911, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34274166

ABSTRACT

The adoption of whole genome sequencing (WGS) data over the past decade for pathogen surveillance and decision-making for infectious diseases has rapidly transformed the landscape of clinical microbiology and public health. However, for successful transition to routine use of these techniques, it is crucial to ensure the WGS data generated meet defined quality standards for pathogen identification, typing, antimicrobial resistance detection and surveillance. Further, the ongoing development of these standards will ensure that the bioinformatic processes are capable of accurately identifying and characterising organisms of interest, and thereby facilitate the integration of WGS into routine clinical and public health laboratory settings. A pilot proficiency testing (PT) program for WGS of infectious agents was developed to facilitate widely applicable standardisation and benchmarking standards for WGS across a range of laboratories. The PT participating laboratories were required to generate WGS data from two bacterial isolates, and submit the raw data for independent bioinformatics analysis, as well as analyse the data with their own processes and answer relevant questions about the data. Overall, laboratories used a diverse range of bioinformatics tools and could generate and analyse high-quality data, either meeting or exceeding the minimum requirements. This pilot has provided valuable insight into the current state of genomics in clinical microbiology and public health laboratories across Australia. It will provide a baseline guide for the standardisation of WGS and enable the development of a PT program that allows an ongoing performance benchmark for accreditation of WGS-based test processes.


Subject(s)
Bacteria/genetics , Benchmarking/standards , Genome, Bacterial/genetics , Laboratories/standards , Whole Genome Sequencing/standards , Accreditation , Australia/epidemiology , Genomics , Humans , Laboratories, Clinical/standards , Laboratory Proficiency Testing , Public Health
12.
Genes (Basel) ; 12(6)2021 05 27.
Article in English | MEDLINE | ID: mdl-34071827

ABSTRACT

With limited access to trained clinical geneticists and/or genetic counselors in the majority of healthcare systems globally, and the expanding use of genetic testing in all specialties of medicine, many healthcare providers do not receive the relevant support to order the most appropriate genetic test for their patients. Therefore, it is essential to educate all healthcare providers about the basic concepts of genetic testing and how to properly utilize this testing for each patient. Here, we review the various genetic testing strategies and their utilization based on different clinical scenarios, and test characteristics, such as the types of genetic variation identified by each test, turnaround time, and diagnostic yield for different clinical indications. Additional considerations such as test cost, insurance reimbursement, and interpretation of variants of uncertain significance are also discussed. The goal of this review is to aid healthcare providers in utilizing the most appropriate, fastest, and most cost-effective genetic test for their patients, thereby increasing the likelihood of a timely diagnosis and reducing the financial burden on the healthcare system by eliminating unnecessary and redundant testing.


Subject(s)
Genetic Testing/methods , Pediatrics/methods , Practice Guidelines as Topic , Whole Genome Sequencing/methods , Genetic Testing/standards , Humans , Pediatrics/standards , Precision Medicine/methods , Precision Medicine/standards , Whole Genome Sequencing/standards
14.
Genes (Basel) ; 12(5)2021 04 26.
Article in English | MEDLINE | ID: mdl-33926025

ABSTRACT

Sequencing of whole microbial genomes has become a standard procedure for cluster detection, source tracking, outbreak investigation and surveillance of many microorganisms. An increasing number of laboratories are currently in a transition phase from classical methods towards next generation sequencing, generating unprecedented amounts of data. Since the precision of downstream analyses depends significantly on the quality of raw data generated on the sequencing instrument, a comprehensive, meaningful primary quality control is indispensable. Here, we present AQUAMIS, a Snakemake workflow for an extensive quality control and assembly of raw Illumina sequencing data, allowing laboratories to automatize the initial analysis of their microbial whole-genome sequencing data. AQUAMIS performs all steps of primary sequence analysis, consisting of read trimming, read quality control (QC), taxonomic classification, de-novo assembly, reference identification, assembly QC and contamination detection, both on the read and assembly level. The results are visualized in an interactive HTML report including species-specific QC thresholds, allowing non-bioinformaticians to assess the quality of sequencing experiments at a glance. All results are also available as a standard-compliant JSON file, facilitating easy downstream analyses and data exchange. We have applied AQUAMIS to analyze ~13,000 microbial isolates as well as ~1000 in-silico contaminated datasets, proving the workflow's ability to perform in high throughput routine sequencing environments and reliably predict contaminations. We found that intergenus and intragenus contaminations can be detected most accurately using a combination of different QC metrics available within AQUAMIS.


Subject(s)
Genome, Bacterial , Quality Control , Whole Genome Sequencing/methods , Contig Mapping/methods , Contig Mapping/standards , DNA Contamination , Escherichia coli , Listeria monocytogenes , Salmonella enterica , Sensitivity and Specificity , Software , Species Specificity , Whole Genome Sequencing/standards , Workflow
15.
Genes Genomics ; 43(7): 713-724, 2021 07.
Article in English | MEDLINE | ID: mdl-33864614

ABSTRACT

BACKGROUND: Illumina next-generation sequencing (NGS) systems are the major sequencing platform in the worldwide NGS market. Meanwhile, MGI Tech has launched a series of new NGS instruments that promise to deliver high-quality sequencing data faster and at lower prices than Illumina's sequencing instruments. OBJECTIVE: In this study, we compared the performance of the two platforms' major sequencing instruments (Illumina's NovaSeq 6000 and MGI's MGISEQ-2000 and DNBSEQ-T7) to test whether the MGISEQ-2000 and DNBSEQ-T7 sequencing instruments are also suitable for whole genome sequencing. METHODS: We sequenced two pairs of normal and tumor tissues from Korean lung cancer patients using the three platforms. Then, we called single nucleotide variants (SNVs) and insertions and deletions (indels) for somatic and germline variants to compare the performance among the three platforms. RESULTS: In quality control analysis, all three platforms showed high quality scores and deep coverage. Comparison among the three platforms revealed that MGISEQ-2000 is most concordant with NovaSeq 6000 for germline SNVs and indels, and DNBSEQ-T7 is most concordant with NovaSeq 6000 for somatic SNVs and indels. CONCLUSIONS: These results suggest that the performances of the MGISEQ-2000 and DNBSEQ-T7 platforms are comparable to that of the Illumina NovaSeq 6000 platform and support the potential applicability of the MGISEQ-2000 and DNBSEQ-T7 platforms in actual genome analysis fields.


Subject(s)
High-Throughput Nucleotide Sequencing , Whole Genome Sequencing/methods , Genetic Variation , High-Throughput Nucleotide Sequencing/standards , Humans , Lung Neoplasms/genetics , Reference Values , Whole Genome Sequencing/standards
16.
Clin Microbiol Infect ; 27(7): 1036.e1-1036.e8, 2021 Jul.
Article in English | MEDLINE | ID: mdl-33813118

ABSTRACT

OBJECTIVES: Genotyping of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been instrumental in monitoring viral evolution and transmission during the pandemic. The quality of the sequence data obtained from these genotyping efforts depends on several factors, including the quantity/integrity of the input material, the technology, and laboratory-specific implementation. The current lack of guidelines for SARS-CoV-2 genotyping leads to inclusion of error-containing genome sequences in genomic epidemiology studies. We aimed to establish clear and broadly applicable recommendations for reliable virus genotyping. METHODS: We established and used a sequencing data analysis workflow that reliably identifies and removes technical artefacts; such artefacts can result in miscalls when using alternative pipelines to process clinical samples and synthetic viral genomes with an amplicon-based genotyping approach. We evaluated the impact of experimental factors, including viral load and sequencing depth, on correct sequence determination. RESULTS: We found that at least 1000 viral genomes are necessary to confidently detect variants in the SARS-CoV-2 genome at frequencies of ≥10%. The broad applicability of our recommendations was validated in over 200 clinical samples from six independent laboratories. The genotypes we determined for clinical isolates with sufficient quality cluster by sampling location and period. Our analysis also supports the rise in frequencies of 20A.EU1 and 20A.EU2, two recently reported European strains whose dissemination was facilitated by travel during the summer of 2020. CONCLUSIONS: We present much-needed recommendations for the reliable determination of SARS-CoV-2 genome sequences and demonstrate their broad applicability in a large cohort of clinical samples.


Subject(s)
COVID-19/diagnosis , Genotyping Techniques/standards , High-Throughput Nucleotide Sequencing/standards , SARS-CoV-2/genetics , Whole Genome Sequencing/standards , Artifacts , COVID-19/virology , Genome, Viral , Genotyping Techniques/methods , Guidelines as Topic , High-Throughput Nucleotide Sequencing/methods , Humans , RNA, Viral , Reproducibility of Results , SARS-CoV-2/isolation & purification , Sensitivity and Specificity , Whole Genome Sequencing/methods , Workflow
17.
Mol Genet Genomic Med ; 9(4): e1653, 2021 04.
Article in English | MEDLINE | ID: mdl-33687149

ABSTRACT

BACKGROUND: Sufficient fetal fraction (FF) is crucial for quality control of NIPT (Non-Invasive Prenatal Test) results. Different factors influencing bioinformatic estimation of FF should be considered when implementing NIPT. To what extent the total number of sequencing reads influences FF estimate has been unexplored. In this study, to test the robustness of SeqFF FF estimation and provide additional recommendations for NIPT analysis quality control, we compared the SeqFF FF estimates with two other methods and investigated how the number of sequencing reads and FF level affects the accuracy and precision of FF estimates. METHODS: WGS data of 516 NIPT samples from a prenatal screening program was obtained. Sample data were randomly downsampled by the read count, and FF was calculated by SeqFF software. Then, the outcome was compared with FF estimates from SNP- and chrY-based methods. FF estimated with different read counts and FF levels were compared with FF at 30 M reads as a reference. RESULTS: SeqFF FF highly correlates with SNP- and chrY-based FF estimates. Raising read count from 2 M to 10 M drastically increased the accuracy of FF estimates. After adding more reads, we saw a further improvement in FF accuracy, reaching a plateau at 20 M reads. Precision of SeqFF FF estimate is independent of FF level in the sample. CONCLUSION: SeqFF is a robust method for FF estimation for both genders and for any FF level in range 2-13%. Accuracy of FF estimates highly depends on the read count. We recommend using no less than 10 M reads to achieve accurate FF estimates for NIPT analysis in clinical settings.
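
The random downsampling by read count described in the methods above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `downsample_reads`, the fixed seed, and the toy read identifiers are hypothetical, and the sketch does not reproduce the study's actual pipeline or the SeqFF software.

```python
import random


def downsample_reads(reads, target_count, seed=0):
    """Randomly keep `target_count` reads, sampling without replacement."""
    if target_count > len(reads):
        raise ValueError("target_count exceeds the number of available reads")
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return rng.sample(reads, target_count)


# Hypothetical toy data: 30 read identifiers standing in for a 30 M-read sample.
reads = [f"read_{i}" for i in range(30)]
for target in (2, 10, 20):
    subset = downsample_reads(reads, target)
    print(target, len(subset))
```

In a setting like the one described, the quantity of interest would be re-estimated on each subset and compared against the estimate from the full read set.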


Subject(s)
Noninvasive Prenatal Testing/methods , Whole Genome Sequencing/methods , Cell-Free Nucleic Acids/genetics , Chromosomes, Human, Y/genetics , Data Accuracy , Female , Humans , Noninvasive Prenatal Testing/standards , Polymorphism, Single Nucleotide , Pregnancy , Reproducibility of Results , Whole Genome Sequencing/standards
18.
Am J Hum Genet ; 108(4): 656-668, 2021 04 01.
Article in English | MEDLINE | ID: mdl-33770507

ABSTRACT

Genetic studies in underrepresented populations identify disproportionate numbers of novel associations. However, most genetic studies use genotyping arrays and sequenced reference panels that best capture variation most common in European ancestry populations. To compare data generation strategies best suited for underrepresented populations, we sequenced the whole genomes of 91 individuals to high coverage as part of the Neuropsychiatric Genetics of African Population-Psychosis (NeuroGAP-Psychosis) study with participants from Ethiopia, Kenya, South Africa, and Uganda. We used a downsampling approach to evaluate the quality of two cost-effective data generation strategies, GWAS arrays versus low-coverage sequencing, by calculating the concordance of imputed variants from these technologies with those from deep whole-genome sequencing data. We show that low-coverage sequencing at a depth of ≥4× captures variants of all frequencies more accurately than all commonly used GWAS arrays investigated and at a comparable cost. Lower depths of sequencing (0.5-1×) performed comparably to commonly used low-density GWAS arrays. Low-coverage sequencing is also sensitive to novel variation; 4× sequencing detects 45% of singletons and 95% of common variants identified in high-coverage African whole genomes. Low-coverage sequencing approaches surmount the problems induced by the ascertainment of common genotyping arrays, effectively identify novel variation particularly in underrepresented populations, and present opportunities to enhance variant discovery at a cost similar to traditional approaches.
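
The idea of scoring a cheaper data generation strategy by its concordance with deep sequencing can be illustrated with a minimal Python sketch. The function `genotype_concordance`, the 0/1/2 alternate-allele coding, and the toy genotypes are assumptions for illustration; the study's own concordance definition may differ (for example, it may restrict to non-reference sites).

```python
def genotype_concordance(called, truth):
    """Fraction of sites where the called genotype matches the truth genotype.

    Genotypes are coded as alternate-allele counts (0, 1 or 2).
    Sites where either value is None (missing) are skipped.
    """
    if len(called) != len(truth):
        raise ValueError("genotype lists must have the same length")
    pairs = [(c, t) for c, t in zip(called, truth)
             if c is not None and t is not None]
    if not pairs:
        raise ValueError("no sites with both genotypes present")
    matches = sum(1 for c, t in pairs if c == t)
    return matches / len(pairs)


# Hypothetical genotypes at six sites (illustrative only, not study data).
deep_wgs = [0, 1, 2, 1, 0, 2]        # treated as truth
imputed = [0, 1, 1, 1, None, 2]      # one mismatch, one missing call
print(genotype_concordance(imputed, deep_wgs))
```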


Subject(s)
DNA Mutational Analysis/economics , DNA Mutational Analysis/standards , Genetic Variation/genetics , Genetics, Population/economics , Africa , DNA Mutational Analysis/methods , Genetics, Population/methods , Genome, Human/genetics , Genome-Wide Association Study , Health Equity , Humans , Microbiota , Whole Genome Sequencing/economics , Whole Genome Sequencing/standards
19.
Am J Hum Genet ; 108(5): 919-928, 2021 05 06.
Article in English | MEDLINE | ID: mdl-33789087

ABSTRACT

Virtually all genome sequencing efforts in national biobanks, complex and Mendelian disease programs, and medical genetic initiatives are reliant upon short-read whole-genome sequencing (srWGS), which presents challenges for the detection of structural variants (SVs) relative to emerging long-read WGS (lrWGS) technologies. Given this ubiquity of srWGS in large-scale genomics initiatives, we sought to establish expectations for routine SV detection from this data type by comparison with lrWGS assembly, as well as to quantify the genomic properties and added value of SVs uniquely accessible to each technology. Analyses from the Human Genome Structural Variation Consortium (HGSVC) of three families captured ~11,000 SVs per genome from srWGS and ~25,000 SVs per genome from lrWGS assembly. Detection power and precision for SV discovery varied dramatically by genomic context and variant class: 9.7% of the current GRCh38 reference is defined by segmental duplication (SD) and simple repeat (SR), yet 91.4% of deletions that were specifically discovered by lrWGS localized to these regions. Across the remaining 90.3% of reference sequence, we observed extremely high (93.8%) concordance between technologies for deletions in these datasets. In contrast, lrWGS was superior for detection of insertions across all genomic contexts. Given that non-SD/SR sequences encompass 95.9% of currently annotated disease-associated exons, improved sensitivity from lrWGS to discover novel pathogenic deletions in these currently interpretable genomic regions is likely to be incremental. However, these analyses highlight the considerable added value of assembly-based lrWGS to create new catalogs of insertions and transposable elements, as well as disease-associated repeat expansions in genomic sequences that were previously recalcitrant to routine assessment.


Subject(s)
Genome, Human/genetics , Genomic Structural Variation , Genomics/methods , Goals , Whole Genome Sequencing/methods , Whole Genome Sequencing/standards , DNA Copy Number Variations , Exons/genetics , Humans , Research Design , Segmental Duplications, Genomic , Sequence Alignment