Results 1 - 20 of 113
1.
Am J Hum Genet ; 111(5): 990-995, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38636510

ABSTRACT

Since genotype imputation was introduced, researchers have been relying on the estimated imputation quality from imputation software to perform post-imputation quality control (QC). However, this quality estimate (denoted as Rsq) performs less well for lower-frequency variants. We recently published MagicalRsq, a machine-learning-based imputation quality calibration, which leverages additional typed markers from the same cohort and outperforms Rsq as a QC metric. In this work, we extended the original MagicalRsq to allow cross-cohort model training and named the new model MagicalRsq-X. We removed the cohort-specific estimated minor allele frequency and included linkage disequilibrium scores and recombination rates as additional features. Leveraging whole-genome sequencing data from TOPMed, specifically participants in the BioMe, JHS, WHI, and MESA studies, we performed comprehensive cross-cohort evaluations for predominantly European and African ancestral individuals based on their inferred global ancestry with the 1000 Genomes and Human Genome Diversity Project data as reference. Our results suggest MagicalRsq-X outperforms Rsq in almost every setting, with 7.3%-14.4% improvement in squared Pearson correlation with true R2, corresponding to 85-218 K variant gains. We further developed a metric to quantify the genetic distances of a target cohort relative to a reference cohort and showed that such metric largely explained the performance of MagicalRsq-X models. Finally, we found MagicalRsq-X saved up to 53 known genome-wide significant variants in one of the largest blood cell trait GWASs that would be missed using the original Rsq for QC. In conclusion, MagicalRsq-X shows superiority for post-imputation QC and benefits genetic studies by distinguishing well and poorly imputed lower-frequency variants.
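
The evaluation criterion named in this abstract, the squared Pearson correlation between a quality metric and the true R2, can be sketched in a few lines. This is a minimal illustration only: the function name and the per-variant values are invented for the example and are not taken from MagicalRsq-X or its data.

```python
def squared_pearson(xs, ys):
    """Return the squared Pearson correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return (cov * cov) / (var_x * var_y)


# Hypothetical per-variant values, invented purely for illustration.
true_r2 = [0.10, 0.35, 0.60, 0.80, 0.95]
estimated_rsq = [0.30, 0.40, 0.55, 0.70, 0.90]   # stand-in for a raw quality estimate
calibrated = [0.12, 0.33, 0.62, 0.78, 0.93]      # stand-in for a calibrated estimate

print("raw estimate vs truth:       ", squared_pearson(estimated_rsq, true_r2))
print("calibrated estimate vs truth:", squared_pearson(calibrated, true_r2))
```

A metric that tracks the true R2 more closely yields a value nearer to 1, which is the sense in which one QC metric is said to outperform another here.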


Subject(s)
Gene Frequency , Genotype , Polymorphism, Single Nucleotide , Software , Humans , Cohort Studies , Linkage Disequilibrium , Genome-Wide Association Study/methods , Genome, Human , Quality Control , Machine Learning , Whole Genome Sequencing/standards , Whole Genome Sequencing/methods
2.
Prenat Diagn ; 43(9): 1132-1141, 2023 08.
Article in English | MEDLINE | ID: mdl-37355983

ABSTRACT

OBJECTIVE: This study aimed to assess the diagnostic yield of prenatal genetic testing using trio whole exome sequencing (WES) and trio whole genome sequencing (WGS) in pregnancies with fetal anomalies by comparing the results with conventional chromosomal microarray (CMA) analysis. METHODS: A total of 40 pregnancies with fetal anomalies or increased nuchal translucency (NT ≥ 5 mm) were included between the 12th and 21st week of gestation. Trio WES/WGS and CMA were performed in all cases. RESULTS: The trio WES/WGS analysis increased the diagnostic yield by 25% in cases with negative CMA results. Furthermore, all six chromosomal aberrations identified by CMA were independently detected by WES/WGS analysis. In total, 16 out of 40 cases obtained a genetic sequence variant, copy number variant, or aneuploidy explaining the phenotype, resulting in an overall WES/WGS diagnostic yield of 40%. WES analysis provided a more reliable identification of mosaic sequence variants than WGS because of its higher sequencing depth. CONCLUSIONS: Prenatal WES/WGS proved to be powerful diagnostic tools for fetal anomalies, surpassing the diagnostic yield of CMA. They have the potential to serve as standalone methods for prenatal diagnosis. The study highlighted the limitations of WGS in accurately detecting mosaic variants, which is particularly relevant when analyzing chorionic villus samples.


Subject(s)
Exome Sequencing , Prenatal Diagnosis , Whole Genome Sequencing , Female , Humans , Pregnancy , Prenatal Diagnosis/methods , Whole Genome Sequencing/standards , Exome Sequencing/standards , Microarray Analysis/standards , Congenital Abnormalities/genetics , Genetic Variation/genetics
3.
Article in English | MEDLINE | ID: mdl-34964003

ABSTRACT

PURPOSE: Molecular tumor profiling is becoming a routine part of clinical cancer care, typically involving tumor-only panel testing without matched germline. We hypothesized that integrated germline sequencing could improve clinical interpretation and enhance the identification of germline variants with significant hereditary risks. MATERIALS AND METHODS: Tumors from pediatric patients with high-risk, extracranial solid malignancies were sequenced with a targeted panel of cancer-associated genes. Later, germline DNA was analyzed for a subset of these genes. We performed a post hoc analysis to identify how an integrated analysis of tumor and germline data would improve clinical interpretation. RESULTS: One hundred sixty participants with both tumor-only and germline sequencing reports were eligible for this analysis. Germline sequencing identified 38 pathogenic or likely pathogenic variants among 35 (22%) patients. Twenty-five (66%) of these were included in the tumor sequencing report. The remaining germline pathogenic or likely pathogenic variants were single-nucleotide variants filtered out of tumor-only analysis because of population frequency or copy-number variation masked by additional copy-number changes in the tumor. In tumor-only sequencing, 308 of 434 (71%) single-nucleotide variants reported were present in the germline, including 31% with suggested clinical utility. Finally, we provide further evidence that the variant allele fraction from tumor-only sequencing is insufficient to differentiate somatic from germline events. CONCLUSION: A paired approach to analyzing tumor and germline sequencing data would be expected to improve the efficiency and accuracy of distinguishing somatic mutations and germline variants, thereby facilitating the process of variant curation and therapeutic interpretation for somatic reports, as well as the identification of variants associated with germline cancer predisposition.


Subject(s)
Neoplasms/genetics , Whole Genome Sequencing/standards , Adolescent , Adult , Child , Child, Preschool , Female , Genetic Predisposition to Disease/genetics , Humans , Infant , Male , Precision Medicine/methods , Precision Medicine/standards , Precision Medicine/trends , Whole Genome Sequencing/methods , Whole Genome Sequencing/statistics & numerical data
4.
Cell Rep ; 37(7): 110017, 2021 11 16.
Article in English | MEDLINE | ID: mdl-34788621

ABSTRACT

The lack of haplotype reference panels and whole-genome sequencing resources specific to the Chinese population has greatly hindered genetic studies in the world's largest population. Here, we present the NyuWa genome resource, based on deep (26.2×) sequencing of 2,999 Chinese individuals, and construct a NyuWa reference panel of 5,804 haplotypes and 19.3 million variants, which is a high-quality publicly available Chinese population-specific reference panel with thousands of samples. Compared with other panels, the NyuWa reference panel reduces the Han Chinese imputation error rate by a margin ranging from 30% to 51%. Population structure and imputation simulation tests support the applicability of one integrated reference panel for northern and southern Chinese. In addition, a total of 22,504 loss-of-function variants in coding and noncoding genes are identified, including 11,493 novel variants. These results highlight the value of the NyuWa genome resource in facilitating genetic research in Chinese and Asian populations.


Subject(s)
Asian People/genetics , Genome/genetics , Genomics/methods , Alleles , China , Databases, Genetic , Gene Frequency/genetics , Genome, Human/genetics , Genome-Wide Association Study/methods , Genotype , Haplotypes/genetics , High-Throughput Nucleotide Sequencing/methods , Humans , Polymorphism, Single Nucleotide , Reference Standards , Whole Genome Sequencing/standards
6.
Nat Biotechnol ; 39(9): 1141-1150, 2021 09.
Article in English | MEDLINE | ID: mdl-34504346

ABSTRACT

Clinical applications of precision oncology require accurate tests that can distinguish true cancer-specific mutations from errors introduced at each step of next-generation sequencing (NGS). To date, no bulk sequencing study has addressed the effects of cross-site reproducibility, nor the biological, technical and computational factors that influence variant identification. Here we report a systematic interrogation of somatic mutations in paired tumor-normal cell lines to identify factors affecting detection reproducibility and accuracy at six different centers. Using whole-genome sequencing (WGS) and whole-exome sequencing (WES), we evaluated the reproducibility of different sample types with varying input amount and tumor purity, and multiple library construction protocols, followed by processing with nine bioinformatics pipelines. We found that read coverage and callers affected both WGS and WES reproducibility, but WES performance was influenced by insert fragment size, genomic copy content and the global imbalance value (GIV; G > T/C > A). Finally, taking into account library preparation protocol, tumor content, read coverage and bioinformatics processes concomitantly, we recommend actionable practices to improve the reproducibility and accuracy of NGS experiments for cancer mutation detection.


Subject(s)
Benchmarking , Exome Sequencing/standards , Neoplasms/genetics , Sequence Analysis, DNA/standards , Whole Genome Sequencing/standards , Cell Line , Cell Line, Tumor , High-Throughput Nucleotide Sequencing/methods , Humans , Mutation , Neoplasms/pathology , Reproducibility of Results
7.
Nat Biotechnol ; 39(9): 1151-1160, 2021 09.
Article in English | MEDLINE | ID: mdl-34504347

ABSTRACT

The lack of samples for generating standardized DNA datasets for setting up a sequencing pipeline or benchmarking the performance of different algorithms limits the implementation and uptake of cancer genomics. Here, we describe reference call sets obtained from paired tumor-normal genomic DNA (gDNA) samples derived from a breast cancer cell line (which is highly heterogeneous, with an aneuploid genome, and enriched in somatic alterations) and a matched lymphoblastoid cell line. We partially validated both somatic mutations and germline variants in these call sets via whole-exome sequencing (WES) with different sequencing platforms and targeted sequencing with >2,000-fold coverage, spanning 82% of genomic regions with high confidence. Although the gDNA reference samples are not representative of primary cancer cells from a clinical sample, when setting up a sequencing pipeline, they not only minimize potential biases from technologies, assays and informatics but also provide a unique resource for benchmarking 'tumor-only' or 'matched tumor-normal' analyses.


Subject(s)
Benchmarking , Breast Neoplasms/genetics , DNA Mutational Analysis/standards , High-Throughput Nucleotide Sequencing/standards , Whole Genome Sequencing/standards , Cell Line, Tumor , Datasets as Topic , Germ Cells , Humans , Mutation , Reference Standards , Reproducibility of Results
8.
Viruses ; 13(9)2021 08 30.
Article in English | MEDLINE | ID: mdl-34578305

ABSTRACT

Despite the effectiveness of direct-acting antiviral agents in treating hepatitis C virus (HCV), cases of treatment failure have been associated with the emergence of resistance-associated substitutions. To better guide clinical decision-making, we developed and validated a near-whole-genome HCV genotype-independent next-generation sequencing strategy. HCV genotype 1-6 samples from direct-acting antiviral agent treatment-naïve and -treated HCV-infected individuals were included. Viral RNA was extracted using a NucliSens easyMAG and amplified using nested reverse transcription-polymerase chain reaction. Libraries were prepared using Nextera XT and sequenced on the Illumina MiSeq sequencing platform. Data were processed by an in-house pipeline (MiCall). Nucleotide consensus sequences were aligned to reference strain sequences for resistance-associated substitution identification and compared to NS3, NS5a, and NS5b sequence data obtained from a validated in-house assay optimized for HCV genotype 1. Sequencing success rates (defined as achieving >100-fold read coverage) approaching 90% were observed for most genotypes in samples with a viral load >5 log10 IU/mL. This genotype-independent sequencing method resulted in >99.8% nucleotide concordance with the genotype 1-optimized method, and 100% agreement in genotype assignment with paired line probe assay-based genotypes. The assay demonstrated high intra-run repeatability and inter-run reproducibility at detecting substitutions above 2% prevalence. This study highlights the performance of a freely available laboratory and bioinformatic approach for reliable HCV genotyping and resistance-associated substitution detection regardless of genotype.
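
The two thresholds quoted in this abstract (more than 100-fold read coverage, and substitutions above 2% prevalence) can be read as simple filters. The sketch below is an assumption about how such filters might be applied to per-position read counts; the function, its parameters and the toy counts are hypothetical and do not reproduce the MiCall pipeline.

```python
def substitutions_above_threshold(counts_by_position, reference,
                                  min_depth=100, min_prevalence=0.02):
    """List (position, residue, prevalence) for non-reference residues.

    A position is considered only if its total depth exceeds `min_depth`,
    and a residue is reported only if its share of reads exceeds
    `min_prevalence`. Both defaults mirror the figures quoted in the abstract.
    """
    calls = []
    for pos in sorted(counts_by_position):
        counts = counts_by_position[pos]
        depth = sum(counts.values())
        if depth <= min_depth:
            continue  # insufficient coverage at this position
        for residue, n in sorted(counts.items()):
            if residue == reference[pos]:
                continue  # matches the reference, not a substitution
            prevalence = n / depth
            if prevalence > min_prevalence:
                calls.append((pos, residue, prevalence))
    return calls


# Hypothetical read counts per position, invented for illustration.
toy_counts = {
    1: {"A": 950, "G": 50},   # 5% minority residue at 1000x
    2: {"C": 995, "T": 5},    # 0.5% minority residue, below threshold
    3: {"A": 40, "G": 40},    # only 80x coverage, skipped
}
toy_reference = {1: "A", 2: "C", 3: "A"}

print(substitutions_above_threshold(toy_counts, toy_reference))
```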


Subject(s)
Genotype , Hepacivirus/genetics , Hepatitis C/virology , RNA, Viral/genetics , Whole Genome Sequencing/methods , Whole Genome Sequencing/standards , Genotyping Techniques , Hepacivirus/classification , Hepatitis C/diagnosis , Humans , Reproducibility of Results , Sensitivity and Specificity , Viral Load
9.
Viruses ; 13(7)2021 07 08.
Article in English | MEDLINE | ID: mdl-34372528

ABSTRACT

Next-generation sequencing (NGS) yields powerful opportunities for studying human papillomavirus (HPV) genomics for applications in epidemiology, public health, and clinical diagnostics. HPV genotypes, variants, and point mutations can be investigated in clinical materials and described in unprecedented detail. However, both the NGS laboratory analysis and bioinformatical approach require numerous steps and checks to ensure robust interpretation of results. Here, we provide a step-by-step review of recommendations for validation and quality assurance procedures of each step in the typical NGS workflow, with a focus on whole-genome sequencing approaches. The use of directed pilots and protocols to ensure optimization of sequencing data yield, followed by curated bioinformatical procedures, is particularly emphasized. Finally, the storage and sharing of data sets are discussed. The development of international standards for quality assurance should be a goal for the HPV NGS community, similar to what has been developed for other areas of sequencing efforts including microbiology and molecular pathology. We thus propose that it is time for NGS to be included in the global efforts on quality assurance and improvement of HPV-based testing and diagnostics.


Subject(s)
Genome, Viral , Genomics/standards , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Papillomaviridae/genetics , Whole Genome Sequencing/standards , Genomics/methods , Humans , Quality Control , Specimen Handling/methods , Specimen Handling/standards , Validation Studies as Topic , Workflow
10.
Sci Rep ; 11(1): 17171, 2021 08 25.
Article in English | MEDLINE | ID: mdl-34433869

ABSTRACT

Advances in whole genome amplification (WGA) techniques enable understanding of the genomic sequence at a single cell level. Demand for single cell dedicated WGA kits (scWGA) has led to the development of several commercial kits. To this point, no robust comparison of all available kits has been performed. Here, we benchmark an economical assay, comparing all commercially available scWGA kits. Our comparison is based on targeted sequencing of thousands of genomic loci, including highly mutable regions, from a large cohort of human single cells. Using this approach, we demonstrate the superiority of Ampli1 in genome coverage and of RepliG in reduced error rate. In summary, we show that no single kit is optimal across all categories, highlighting the need for a dedicated kit selection in accordance with experimental requirements.


Subject(s)
Single-Cell Analysis/methods , Whole Genome Sequencing/methods , Cells, Cultured , Humans , Polymerase Chain Reaction/methods , Polymerase Chain Reaction/standards , Sensitivity and Specificity , Single-Cell Analysis/standards , Whole Genome Sequencing/standards
11.
Pathology ; 53(7): 902-911, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34274166

ABSTRACT

The adoption of whole genome sequencing (WGS) data over the past decade for pathogen surveillance and decision-making for infectious diseases has rapidly transformed the landscape of clinical microbiology and public health. However, for successful transition to routine use of these techniques, it is crucial to ensure the WGS data generated meet defined quality standards for pathogen identification, typing, antimicrobial resistance detection and surveillance. Further, the ongoing development of these standards will ensure that the bioinformatic processes are capable of accurately identifying and characterising organisms of interest, and thereby facilitate the integration of WGS into routine clinical and public health laboratory settings. A pilot proficiency testing (PT) program for WGS of infectious agents was developed to facilitate widely applicable standardisation and benchmarking standards for WGS across a range of laboratories. The PT participating laboratories were required to generate WGS data from two bacterial isolates, and submit the raw data for independent bioinformatics analysis, as well as analyse the data with their own processes and answer relevant questions about the data. Overall, laboratories used a diverse range of bioinformatics tools and could generate and analyse high-quality data, either meeting or exceeding the minimum requirements. This pilot has provided valuable insight into the current state of genomics in clinical microbiology and public health laboratories across Australia. It will provide a baseline guide for the standardisation of WGS and enable the development of a PT program that allows an ongoing performance benchmark for accreditation of WGS-based test processes.


Subject(s)
Bacteria/genetics , Benchmarking/standards , Genome, Bacterial/genetics , Laboratories/standards , Whole Genome Sequencing/standards , Accreditation , Australia/epidemiology , Genomics , Humans , Laboratories, Clinical/standards , Laboratory Proficiency Testing , Public Health
12.
Genes (Basel) ; 12(6)2021 05 27.
Article in English | MEDLINE | ID: mdl-34071827

ABSTRACT

With limited access to trained clinical geneticists and/or genetic counselors in the majority of healthcare systems globally, and the expanding use of genetic testing in all specialties of medicine, many healthcare providers do not receive the relevant support to order the most appropriate genetic test for their patients. Therefore, it is essential to educate all healthcare providers about the basic concepts of genetic testing and how to properly utilize this testing for each patient. Here, we review the various genetic testing strategies and their utilization based on different clinical scenarios, and test characteristics, such as the types of genetic variation identified by each test, turnaround time, and diagnostic yield for different clinical indications. Additional considerations such as test cost, insurance reimbursement, and interpretation of variants of uncertain significance are also discussed. The goal of this review is to aid healthcare providers in utilizing the most appropriate, fastest, and most cost-effective genetic test for their patients, thereby increasing the likelihood of a timely diagnosis and reducing the financial burden on the healthcare system by eliminating unnecessary and redundant testing.


Subject(s)
Genetic Testing/methods , Pediatrics/methods , Practice Guidelines as Topic , Whole Genome Sequencing/methods , Genetic Testing/standards , Humans , Pediatrics/standards , Precision Medicine/methods , Precision Medicine/standards , Whole Genome Sequencing/standards
14.
Genes (Basel) ; 12(5)2021 04 26.
Article in English | MEDLINE | ID: mdl-33926025

ABSTRACT

Sequencing of whole microbial genomes has become a standard procedure for cluster detection, source tracking, outbreak investigation and surveillance of many microorganisms. An increasing number of laboratories are currently in a transition phase from classical methods towards next generation sequencing, generating unprecedented amounts of data. Since the precision of downstream analyses depends significantly on the quality of raw data generated on the sequencing instrument, a comprehensive, meaningful primary quality control is indispensable. Here, we present AQUAMIS, a Snakemake workflow for extensive quality control and assembly of raw Illumina sequencing data, allowing laboratories to automate the initial analysis of their microbial whole-genome sequencing data. AQUAMIS performs all steps of primary sequence analysis, consisting of read trimming, read quality control (QC), taxonomic classification, de-novo assembly, reference identification, assembly QC and contamination detection, both on the read and assembly level. The results are visualized in an interactive HTML report including species-specific QC thresholds, allowing non-bioinformaticians to assess the quality of sequencing experiments at a glance. All results are also available as a standard-compliant JSON file, facilitating easy downstream analyses and data exchange. We have applied AQUAMIS to analyze ~13,000 microbial isolates as well as ~1000 in-silico contaminated datasets, proving the workflow's ability to perform in high throughput routine sequencing environments and reliably predict contaminations. We found that intergenus and intragenus contaminations can be detected most accurately using a combination of different QC metrics available within AQUAMIS.


Subject(s)
Genome, Bacterial , Quality Control , Whole Genome Sequencing/methods , Contig Mapping/methods , Contig Mapping/standards , DNA Contamination , Escherichia coli , Listeria monocytogenes , Salmonella enterica , Sensitivity and Specificity , Software , Species Specificity , Whole Genome Sequencing/standards , Workflow
15.
Genes Genomics ; 43(7): 713-724, 2021 07.
Article in English | MEDLINE | ID: mdl-33864614

ABSTRACT

BACKGROUND: Illumina next generation sequencing (NGS) systems are the major sequencing platform in the worldwide next-generation sequencing market. On the other hand, MGI Tech launched a series of new NGS equipment that promises to deliver high-quality sequencing data faster and at lower prices than Illumina's sequencing instruments. OBJECTIVE: In this study, we compared the performance of the two platforms' major sequencing instruments (Illumina's NovaSeq 6000 and MGI's MGISEQ-2000 and DNBSEQ-T7) to test whether the MGISEQ-2000 and DNBSEQ-T7 sequencing instruments are also suitable for whole genome sequencing. METHODS: We sequenced two pairs of normal and tumor tissues from Korean lung cancer patients using the three platforms. Then, we called single nucleotide variants (SNVs) and insertions and deletions (indels) for somatic and germline variants to compare the performance among the three platforms. RESULTS: In quality control analysis, all three platforms showed high quality scores and deep coverage. Comparison among the three platforms revealed that MGISEQ-2000 is most concordant with NovaSeq 6000 for germline SNVs and indels, and DNBSEQ-T7 is most concordant with NovaSeq 6000 for somatic SNVs and indels. CONCLUSIONS: These results suggest that the performances of the MGISEQ-2000 and DNBSEQ-T7 platforms are comparable to that of the Illumina NovaSeq 6000 platform and support the potential applicability of the MGISEQ-2000 and DNBSEQ-T7 platforms in actual genome analysis fields.


Subject(s)
High-Throughput Nucleotide Sequencing , Whole Genome Sequencing/methods , Genetic Variation , High-Throughput Nucleotide Sequencing/standards , Humans , Lung Neoplasms/genetics , Reference Values , Whole Genome Sequencing/standards
16.
Clin Microbiol Infect ; 27(7): 1036.e1-1036.e8, 2021 Jul.
Article in English | MEDLINE | ID: mdl-33813118

ABSTRACT

OBJECTIVES: Genotyping of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been instrumental in monitoring viral evolution and transmission during the pandemic. The quality of the sequence data obtained from these genotyping efforts depends on several factors, including the quantity/integrity of the input material, the technology, and laboratory-specific implementation. The current lack of guidelines for SARS-CoV-2 genotyping leads to inclusion of error-containing genome sequences in genomic epidemiology studies. We aimed to establish clear and broadly applicable recommendations for reliable virus genotyping. METHODS: We established and used a sequencing data analysis workflow that reliably identifies and removes technical artefacts; such artefacts can result in miscalls when using alternative pipelines to process clinical samples and synthetic viral genomes with an amplicon-based genotyping approach. We evaluated the impact of experimental factors, including viral load and sequencing depth, on correct sequence determination. RESULTS: We found that at least 1000 viral genomes are necessary to confidently detect variants in the SARS-CoV-2 genome at frequencies of ≥10%. The broad applicability of our recommendations was validated in over 200 clinical samples from six independent laboratories. The genotypes we determined for clinical isolates with sufficient quality cluster by sampling location and period. Our analysis also supports the rise in frequencies of 20A.EU1 and 20A.EU2, two recently reported European strains whose dissemination was facilitated by travel during the summer of 2020. CONCLUSIONS: We present much-needed recommendations for the reliable determination of SARS-CoV-2 genome sequences and demonstrate their broad applicability in a large cohort of clinical samples.


Subject(s)
COVID-19/diagnosis , Genotyping Techniques/standards , High-Throughput Nucleotide Sequencing/standards , SARS-CoV-2/genetics , Whole Genome Sequencing/standards , Artifacts , COVID-19/virology , Genome, Viral , Genotyping Techniques/methods , Guidelines as Topic , High-Throughput Nucleotide Sequencing/methods , Humans , RNA, Viral , Reproducibility of Results , SARS-CoV-2/isolation & purification , Sensitivity and Specificity , Whole Genome Sequencing/methods , Workflow
17.
Mol Genet Genomic Med ; 9(4): e1653, 2021 04.
Article in English | MEDLINE | ID: mdl-33687149

ABSTRACT

BACKGROUND: Sufficient fetal fraction (FF) is crucial for quality control of NIPT (Non-Invasive Prenatal Test) results. Different factors influencing bioinformatic estimation of FF should be considered when implementing NIPT. To what extent the total number of sequencing reads influences FF estimate has been unexplored. In this study, to test the robustness of SeqFF FF estimation and provide additional recommendations for NIPT analysis quality control, we compared the SeqFF FF estimates with two other methods and investigated how the number of sequencing reads and FF level affects the accuracy and precision of FF estimates. METHODS: WGS data of 516 NIPT samples from a prenatal screening program was obtained. Sample data were randomly downsampled by the read count, and FF was calculated by SeqFF software. Then, the outcome was compared with FF estimates from SNP- and chrY-based methods. FF estimated with different read counts and FF levels were compared with FF at 30 M reads as a reference. RESULTS: SeqFF FF highly correlates with SNP- and chrY-based FF estimates. Raising read count from 2 M to 10 M drastically increased the accuracy of FF estimates. After adding more reads, we saw a further improvement in FF accuracy, reaching a plateau at 20 M reads. Precision of SeqFF FF estimate is independent of FF level in the sample. CONCLUSION: SeqFF is a robust method for FF estimation for both genders and for any FF level in range 2-13%. Accuracy of FF estimates highly depends on the read count. We recommend using no less than 10 M reads to achieve accurate FF estimates for NIPT analysis in clinical settings.
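
The downsampling design described in the methods (randomly reducing the read count, then comparing each estimate against a full-depth reference) can be sketched as follows. This is an illustration under stated assumptions: `chr_y_fraction` is a toy placeholder statistic, not SeqFF and not a fetal fraction formula, and the read labels are invented.

```python
import random


def downsample_reads(reads, target_count, seed=0):
    """Return `target_count` reads drawn at random without replacement."""
    if target_count > len(reads):
        raise ValueError("target_count exceeds the number of available reads")
    rng = random.Random(seed)  # fixed seed keeps the subsample reproducible
    return rng.sample(reads, target_count)


def chr_y_fraction(reads):
    """Toy statistic: share of reads labelled 'chrY' (placeholder only)."""
    return sum(1 for chrom in reads if chrom == "chrY") / len(reads)


# Hypothetical toy data: 10,000 read labels, 1% of them on chrY.
all_reads = ["chrY"] * 100 + ["chr1"] * 9900
reference_value = chr_y_fraction(all_reads)

for depth in (100, 1000, 5000):
    subset = downsample_reads(all_reads, depth, seed=42)
    deviation = abs(chr_y_fraction(subset) - reference_value)
    print(f"{depth} reads: absolute deviation from full-depth value = {deviation:.4f}")
```

Repeating the loop over many seeds would give a spread of deviations at each depth, which is one way to separate accuracy from precision.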


Subject(s)
Noninvasive Prenatal Testing/methods , Whole Genome Sequencing/methods , Cell-Free Nucleic Acids/genetics , Chromosomes, Human, Y/genetics , Data Accuracy , Female , Humans , Noninvasive Prenatal Testing/standards , Polymorphism, Single Nucleotide , Pregnancy , Reproducibility of Results , Whole Genome Sequencing/standards
18.
Am J Hum Genet ; 108(4): 656-668, 2021 04 01.
Article in English | MEDLINE | ID: mdl-33770507

ABSTRACT

Genetic studies in underrepresented populations identify disproportionate numbers of novel associations. However, most genetic studies use genotyping arrays and sequenced reference panels that best capture variation most common in European ancestry populations. To compare data generation strategies best suited for underrepresented populations, we sequenced the whole genomes of 91 individuals to high coverage as part of the Neuropsychiatric Genetics of African Populations-Psychosis (NeuroGAP-Psychosis) study with participants from Ethiopia, Kenya, South Africa, and Uganda. We used a downsampling approach to evaluate the quality of two cost-effective data generation strategies, GWAS arrays versus low-coverage sequencing, by calculating the concordance of imputed variants from these technologies with those from deep whole-genome sequencing data. We show that low-coverage sequencing at a depth of ≥4× captures variants of all frequencies more accurately than all commonly used GWAS arrays investigated and at a comparable cost. Lower depths of sequencing (0.5-1×) performed comparably to commonly used low-density GWAS arrays. Low-coverage sequencing is also sensitive to novel variation; 4× sequencing detects 45% of singletons and 95% of common variants identified in high-coverage African whole genomes. Low-coverage sequencing approaches surmount the problems induced by the ascertainment of common genotyping arrays, effectively identify novel variation particularly in underrepresented populations, and present opportunities to enhance variant discovery at a cost similar to traditional approaches.
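
Concordance between imputed genotypes and deep-sequencing genotypes, the comparison this abstract relies on, can be illustrated with a small function. The definition used here (non-reference concordance over sites where either call set carries an alternate allele) is an assumption chosen for the example; the abstract does not state which concordance measure the authors used, and the genotype vectors are invented.

```python
def non_reference_concordance(truth, imputed):
    """Share of matching genotypes among sites with any non-reference call.

    Genotypes are alternate-allele counts (0, 1 or 2). Sites where both
    call sets are homozygous reference are ignored, so abundant 0/0 matches
    do not inflate the score. Returns None if no site is informative.
    """
    if len(truth) != len(imputed):
        raise ValueError("genotype vectors must have the same length")
    informative = [(t, i) for t, i in zip(truth, imputed) if t != 0 or i != 0]
    if not informative:
        return None
    matches = sum(1 for t, i in informative if t == i)
    return matches / len(informative)


# Hypothetical genotypes for one sample at five sites, invented for illustration.
deep_wgs_calls = [0, 1, 2, 0, 1]
imputed_calls = [0, 1, 1, 0, 0]

print(non_reference_concordance(deep_wgs_calls, imputed_calls))
```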


Subject(s)
DNA Mutational Analysis/economics , DNA Mutational Analysis/standards , Genetic Variation/genetics , Genetics, Population/economics , Africa , DNA Mutational Analysis/methods , Genetics, Population/methods , Genome, Human/genetics , Genome-Wide Association Study , Health Equity , Humans , Microbiota , Whole Genome Sequencing/economics , Whole Genome Sequencing/standards
19.
Am J Hum Genet ; 108(5): 919-928, 2021 05 06.
Article in English | MEDLINE | ID: mdl-33789087

ABSTRACT

Virtually all genome sequencing efforts in national biobanks, complex and Mendelian disease programs, and medical genetic initiatives are reliant upon short-read whole-genome sequencing (srWGS), which presents challenges for the detection of structural variants (SVs) relative to emerging long-read WGS (lrWGS) technologies. Given this ubiquity of srWGS in large-scale genomics initiatives, we sought to establish expectations for routine SV detection from this data type by comparison with lrWGS assembly, as well as to quantify the genomic properties and added value of SVs uniquely accessible to each technology. Analyses from the Human Genome Structural Variation Consortium (HGSVC) of three families captured ~11,000 SVs per genome from srWGS and ~25,000 SVs per genome from lrWGS assembly. Detection power and precision for SV discovery varied dramatically by genomic context and variant class: 9.7% of the current GRCh38 reference is defined by segmental duplication (SD) and simple repeat (SR), yet 91.4% of deletions that were specifically discovered by lrWGS localized to these regions. Across the remaining 90.3% of reference sequence, we observed extremely high (93.8%) concordance between technologies for deletions in these datasets. In contrast, lrWGS was superior for detection of insertions across all genomic contexts. Given that non-SD/SR sequences encompass 95.9% of currently annotated disease-associated exons, improved sensitivity from lrWGS to discover novel pathogenic deletions in these currently interpretable genomic regions is likely to be incremental. However, these analyses highlight the considerable added value of assembly-based lrWGS to create new catalogs of insertions and transposable elements, as well as disease-associated repeat expansions in genomic sequences that were previously recalcitrant to routine assessment.


Subject(s)
Genome, Human/genetics , Genomic Structural Variation , Genomics/methods , Goals , Whole Genome Sequencing/methods , Whole Genome Sequencing/standards , DNA Copy Number Variations , Exons/genetics , Humans , Research Design , Segmental Duplications, Genomic , Sequence Alignment