ABSTRACT
DNA variants that arise after conception can show mosaicism, varying in presence and extent among tissues. Mosaic variants have been reported in Mendelian diseases, but further investigation is necessary to broadly understand their incidence, transmission, and clinical impact. A mosaic pathogenic variant in a disease-related gene may cause an atypical phenotype in terms of severity, clinical features, or timing of disease onset. Using high-depth sequencing, we studied results from one million unrelated individuals referred for genetic testing for almost 1,900 disease-related genes. We observed 5,939 mosaic sequence or intragenic copy number variants distributed across 509 genes in nearly 5,700 individuals, constituting approximately 2% of molecular diagnoses in the cohort. Cancer-related genes had the most mosaic variants and showed age-specific enrichment, in part reflecting clonal hematopoiesis in older individuals. We also observed many mosaic variants in genes related to early-onset conditions. Additional mosaic variants were observed in genes analyzed for reproductive carrier screening or associated with dominant disorders with low penetrance, posing challenges for interpreting their clinical significance. When we controlled for the potential involvement of clonal hematopoiesis, most mosaic variants were enriched in younger individuals and were present at higher levels than in older individuals. Furthermore, individuals with mosaicism showed later disease onset or milder phenotypes than individuals with non-mosaic variants in the same genes. Collectively, the large compendium of variants, disease correlations, and age-specific results identified in this study expands our understanding of the implications of mosaic DNA variation for diagnosis and genetic counseling.
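The abstract does not describe how mosaic calls were made. One simple idea behind mosaic screening at high depth is that a germline heterozygous variant should have a variant allele fraction (VAF) near 50%, so an alternate-read count that is implausibly low under Binomial(depth, 0.5) suggests a post-zygotic origin. The sketch below encodes only that idea. The function names and the `alpha` cutoff are hypothetical, and it is not the study's pipeline, which would also have to account for sequencing error, allelic bias, and clonal hematopoiesis.

```python
import math


def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); terms are computed in log space so
    that large sequencing depths do not overflow."""
    log_p, log_q = math.log(p), math.log(1 - p)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def classify_heterozygous_call(alt_reads: int, depth: int, alpha: float = 1e-3) -> str:
    """Hypothetical screen: a germline heterozygote is expected near 50% VAF, so an
    alt-read count far below depth / 2 (one-sided binomial test) is flagged."""
    if depth <= 0 or not 0 <= alt_reads <= depth:
        raise ValueError("need depth > 0 and 0 <= alt_reads <= depth")
    p_value = binomial_cdf(alt_reads, depth, 0.5)
    return "possible mosaic" if p_value < alpha else "consistent with germline"
```

At 200× depth, for example, 20 alternate reads (VAF 10%) are flagged, whereas 100 alternate reads are not.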
Subjects
DNA Copy Number Variations; Mosaicism; DNA Copy Number Variations/genetics; Genetic Testing; Phenotype; High-Throughput Nucleotide Sequencing/methods; Mutation
ABSTRACT
BACKGROUND: Sex determination occurs across animal species, but most of our knowledge about its mechanisms comes from only a handful of bilaterian taxa. This limits our ability to infer the evolutionary history of sex determination within animals. RESULTS: In this study, we generated a linkage map of the genome of the colonial cnidarian Hydractinia symbiolongicarpus and used it to demonstrate that this species has an XX/XY sex determination system. We further showed that the X and Y chromosomes have pseudoautosomal and non-recombining regions. We then used the linkage map and a method based on the depth of sequencing coverage to identify genes encoded in the non-recombining region, and showed that many of them have male gonad-specific expression. In addition, we showed that recombination rates are enhanced in the female genome and that the haploid chromosome number in Hydractinia is n = 15. CONCLUSIONS: These findings establish Hydractinia as a tractable non-bilaterian model system for the study of sex determination and the evolution of sex chromosomes.
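The abstract mentions a coverage-based method for finding genes in the non-recombining region but does not give its details. A common version of the idea compares per-gene read counts from male and female genomic DNA. In an XY system, sequence present only on the Y should have coverage in males and little or none in females. The sketch assumes per-gene read counts are already available. The median-centring, the pseudocount, and the `min_log2` cutoff are illustrative choices, not the authors' parameters.

```python
import math
from statistics import median


def male_female_log2_ratios(male_counts: dict[str, int], female_counts: dict[str, int],
                            pseudocount: float = 1.0) -> dict[str, float]:
    """Per-gene log2(male / female) read-count ratio, centred on the median gene so
    that library-size differences cancel (assumes most genes are not sex-linked)."""
    raw = {
        gene: math.log2((count + pseudocount) / (female_counts.get(gene, 0) + pseudocount))
        for gene, count in male_counts.items()
    }
    centre = median(raw.values())
    return {gene: value - centre for gene, value in raw.items()}


def candidate_male_specific(ratios: dict[str, float], min_log2: float = 2.0) -> list[str]:
    """Genes with far more coverage in males: candidates for the non-recombining Y region."""
    return sorted(gene for gene, value in ratios.items() if value >= min_log2)
```

Centring on the median gene lets library-size differences cancel, provided most genes are autosomal or pseudoautosomal. X-specific sequence would be expected near -1 (one copy in XY males versus two in XX females), which is why only strongly positive ratios are treated as Y candidates here.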
Subjects
Hydrozoa; Sex Chromosomes; Male; Female; Animals; Sex Chromosomes/genetics; Chromosome Mapping; Y Chromosome/genetics; Hydrozoa/genetics; Evolution, Molecular
ABSTRACT
PURPOSE: To evaluate the coverage and accuracy of whole-exome sequencing (WES) across vendors. METHODS: Blood samples from three trios underwent WES at three vendors. Relative performance of the three WES services was measured for breadth and depth of coverage. The false-negative rates (FNRs) were estimated using the segregation pattern within each trio. RESULTS: Mean depth of coverage for all genes was 189.0, 124.9, and 38.3 for the three vendor services. Fifty-five of the 56 American College of Medical Genetics and Genomics (ACMG) genes, but only 56 of 63 pharmacogenes, were 100% covered at 10× in at least one of the nine individuals for all vendors; however, there was substantial interindividual variability. For the two vendors with mean depth of coverage >120×, analytic positive predictive values (aPPVs) exceeded 99.1% for single-nucleotide variants and homozygous indels, and sensitivities were 98.9-99.9%; however, heterozygous indels showed lower accuracy and sensitivity. Among the trios, FNRs in the offspring were 0.07-0.62% at well-covered variants concordantly called in both parents. CONCLUSION: The current standard of 120× coverage for clinical WES may be insufficient for consistent breadth of coverage across the exome. Ordering clinicians and researchers would benefit from vendor reports that estimate sensitivity and aPPV, including depth of coverage across the exome.
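The abstract states that FNRs were estimated from the segregation pattern within each trio, but it does not give the exact rule. One plausible rule, sketched below as an assumption, uses sites where a parent is confidently homozygous for the alternate allele. The child must then carry that allele, so a reference or missing call in the child counts as a false negative. The function name and genotype encoding are hypothetical.

```python
from typing import Iterable, Optional, Tuple

# Genotype as an alt-allele count (0, 1 or 2); None means "no call".
Genotype = Optional[int]


def trio_false_negative_rate(sites: Iterable[Tuple[Genotype, Genotype, Genotype]]) -> float:
    """Estimate an offspring FNR from Mendelian segregation (hypothetical rule).

    At a site where either parent is homozygous for the alt allele, the child
    must inherit at least one alt allele (ignoring de novo events and uniparental
    disomy), so a child call of 0 or a missing call counts as a miss.
    """
    expected = missed = 0
    for mother, father, child in sites:
        if mother is None or father is None:
            continue  # only use sites called in both parents
        if mother == 2 or father == 2:
            expected += 1
            if child is None or child == 0:
                missed += 1
    return missed / expected if expected else float("nan")
```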
Subjects
Exome Sequencing/methods; Exome/genetics; Genome, Human/genetics; Female; Genomics; Heterozygote; Homozygote; Humans; INDEL Mutation/genetics; Male; Molecular Sequence Annotation
ABSTRACT
Copy number variants (CNVs) play important roles in a number of human diseases and in pharmacogenetics. Powerful methods exist for CNV detection in whole genome sequencing (WGS) data, but such data are costly to obtain. Many disease-causing CNVs span or lie within coding regions (exons), which makes CNV detection from whole exome sequencing (WES) data attractive. If reliably validated against WGS-based CNVs, exome-derived CNVs have potential applications in a clinical setting. Several algorithms have been developed to exploit exome data for CNV detection, and comparisons have been made to find the most suitable methods for particular data samples, but the results are not consistent across studies. Here, we review some of the exome CNV detection methods based on depth of coverage profiles and examine their performance to identify problems contributing to discrepancies in published results. We also present a streamlined strategy that uses a single metric, the likelihood ratio, to compare exome methods, and we demonstrate its utility with the VarScan 2 and eXome Hidden Markov Model (XHMM) programs on paired normal and tumour exome data from chronic lymphocytic leukaemia patients. We use array-based somatic CNV (SCNV) calls as a reference standard to compute prevalence-independent statistics, such as sensitivity, specificity and likelihood ratio, for validation of the exome-derived SCNVs. We also account for factors known to influence the performance of exome read depth methods, such as CNV size and frequency, while comparing our findings with published results.
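Sensitivity, specificity, and likelihood ratios all condition on the true state of each call, which is why they do not depend on how common CNVs are in the sample. The sketch below computes them from a 2×2 table of exome calls scored against a reference call set. What counts as one unit (a CNV, an exon, or a bin) is an assumption left to the caller, and the function names are illustrative, not taken from VarScan 2 or XHMM. The second function shows how an LR is used: it converts a pre-test probability into a post-test probability via the odds form of Bayes' rule.

```python
import math


def validation_stats(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Prevalence-independent statistics for exome CNV calls scored against a
    reference call set (e.g. array-based SCNVs)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_positive = sensitivity / (1 - specificity) if specificity < 1 else math.inf
    lr_negative = (1 - sensitivity) / specificity if specificity > 0 else math.inf
    return {"sensitivity": sensitivity, "specificity": specificity,
            "LR+": lr_positive, "LR-": lr_negative}


def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: post-test odds = pre-test odds x LR."""
    odds = pre_test_probability / (1 - pre_test_probability) * likelihood_ratio
    if math.isinf(odds):
        return 1.0
    return odds / (1 + odds)
```

For example, with 80 true positives, 20 false negatives, 900 true negatives and 100 false positives, sensitivity is 0.8, specificity 0.9, and LR+ is 8. A pre-test probability of 0.1 then becomes a post-test probability of 8/17 (about 0.47).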
Subjects
Chromosome Mapping/methods; DNA Copy Number Variations/genetics; DNA, Neoplasm/genetics; Exome/genetics; Leukemia, Lymphocytic, Chronic, B-Cell/genetics; Sequence Analysis, DNA/methods; Algorithms; Base Sequence; Data Interpretation, Statistical; Humans; Molecular Sequence Data; Pattern Recognition, Automated/methods; Reproducibility of Results; Sensitivity and Specificity
ABSTRACT
Self-compatibility is a highly desirable trait for pear breeding programs. Our breeding program previously developed a novel self-compatible pollen-part Japanese pear mutant (Pyrus pyrifolia Nakai), '415-1', by using γ-irradiated pollen. '415-1' carries the S-genotype S4dS5S5, with "d" indicating a duplication of S5 responsible for breakdown of self-incompatibility. Until now, the size and inheritance of the duplicated segment were undetermined, and a reliable detection method was lacking. Here, we examined genome duplications and their inheritance in 140 F1 seedlings resulting from a cross between '515-20' (S1S3) and '415-1'. Amplicon sequencing of S-RNase and SFBB18 clearly detected S-haplotype duplications in the seedlings. Intriguingly, 30 partially triploid seedlings, including the genotypes S1S4dS5, S3S4dS5, S1S5dS5, S3S5dS5, and S3S4dS4, were detected among the 140 seedlings. Depth-of-coverage analysis using ddRAD-seq showed that the duplications in those individuals were limited to chromosome 17. Further analysis through resequencing confirmed an 11-Mb duplication extending from the middle to the end of chromosome 17. The duplicated segment remained consistent in size across generations. The presence of an S3S4dS4 seedling provided evidence for recombination between the duplicated S5 segment and the original S4 haplotype, suggesting that the duplicated segment can pair with other parts of chromosome 17. This research provides valuable insights for improving pear breeding programs using partially triploid individuals.
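The depth-of-coverage idea behind detecting a partial triploid is that a segment present in three copies should yield about 1.5 times the depth of the same loci in a diploid. The sketch below converts library-normalised depths into integer copy-number estimates per locus and reports runs of gained loci. Several elements are hypothetical: comparing each locus with its own median depth in diploid siblings (to cancel locus-specific ddRAD-seq biases), the simple rounding rule, and the function names.

```python
def copy_number_per_locus(sample_depth: list[float], reference_depth: list[float],
                          ploidy: int = 2) -> list[int]:
    """Round ploidy * (sample / reference) depth ratio to an integer copy number.

    Assumption: both vectors are library-size normalised, and reference_depth is
    the median depth of the same locus in seedlings known to be diploid."""
    if len(sample_depth) != len(reference_depth):
        raise ValueError("depth vectors must have equal length")
    return [round(ploidy * s / r) for s, r in zip(sample_depth, reference_depth)]


def gained_segments(copy_numbers: list[int], baseline: int = 2) -> list[tuple[int, int]]:
    """(first, last) locus indices of runs whose copy number exceeds the baseline."""
    runs: list[tuple[int, int]] = []
    start = None
    for i, cn in enumerate(copy_numbers):
        if cn > baseline and start is None:
            start = i  # a gained run begins
        elif cn <= baseline and start is not None:
            runs.append((start, i - 1))  # the run ended at the previous locus
            start = None
    if start is not None:
        runs.append((start, len(copy_numbers) - 1))  # run reaches the chromosome end
    return runs
```

Ordering loci along chromosome 17 before calling `gained_segments` would turn the per-locus estimates into a segment whose endpoints can then be refined, for example by resequencing as described above.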
ABSTRACT
Knowledge of the expected accuracy of HLA typing algorithms is important when choosing between algorithms and when evaluating the HLA typing predictions of an algorithm. This chapter guides the reader through an example benchmarking study that evaluates the performance of four NGS-based HLA typing algorithms, and outlines factors to consider when designing and running such a benchmarking study. The code related to this benchmarking workflow can be found at https://github.com/nikolasthuesen/springers-hla-benchmark/ .
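A benchmarking study needs a precise definition of "accuracy". The sketch below shows one common choice under stated assumptions. Alleles are truncated to a given number of colon-separated fields (e.g. two fields gives `A*02:01`), and for each sample the true and predicted alleles are matched as unordered multisets. It is not taken from the linked repository. Resolutions such as P groups or pseudo-sequences require external lookup tables and are not handled, nor are expression suffixes such as `N`.

```python
from collections import Counter


def truncate_allele(allele: str, fields: int) -> str:
    """Keep the first `fields` colon-separated fields, e.g. A*02:01:01:01 -> A*02:01."""
    gene, _, rest = allele.partition("*")
    return f"{gene}*{':'.join(rest.split(':')[:fields])}"


def typing_accuracy(truth: dict[str, tuple[str, ...]],
                    predicted: dict[str, tuple[str, ...]],
                    fields: int = 2) -> float:
    """Fraction of true alleles recovered at the given field resolution.

    Alleles are compared as unordered multisets per sample, so a homozygous
    truth needs the allele predicted twice; missing predictions count as errors."""
    matched = total = 0
    for sample, true_alleles in truth.items():
        t = Counter(truncate_allele(a, fields) for a in true_alleles)
        p = Counter(truncate_allele(a, fields) for a in predicted.get(sample, ()))
        matched += sum((t & p).values())  # multiset intersection size
        total += len(true_alleles)
    return matched / total
```

Scoring at one and at two fields on the same predictions typically gives different accuracies, which is why a benchmark should state the resolution it reports.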
Subjects
Algorithms; Benchmarking; High-Throughput Nucleotide Sequencing; Histocompatibility Testing; Histocompatibility Testing/methods; Histocompatibility Testing/standards; Benchmarking/methods; Humans; High-Throughput Nucleotide Sequencing/methods; High-Throughput Nucleotide Sequencing/standards; Software; HLA Antigens/genetics
ABSTRACT
There are many copy number variation (CNV) detection tools based on depth of coverage. A characteristic feature of all such tools is their first stage of data processing: counting the depth of coverage in the investigated sequencing regions. However, each tool implements this stage in a slightly different way. Herein, we used data from the 1000 Genomes Project to show the impact of different depth-of-coverage counting strategies on the results of the CNV detection process. We used 7 CNV calling tools: CODEX, CANOES, exomeCopy, ExomeDepth, CLAMMS, CNVkit, and CNVind. From each of these applications, we separated the process of counting the depth of coverage into an independent module. We then counted the depth of coverage with each of these modules and, finally, used the resulting depth-of-coverage tables as input to the other CNV calling tools. The experiments showed that the best methods of counting the depth of coverage are the algorithms implemented in CLAMMS and CNVkit. Both yield much better sets of detected CNVs than the depth-of-coverage counting implemented in the other tools. Moreover, some CNV detection tools are reasonably robust to changes in the input depth-of-coverage table. In this study, we showed that exomeCopy gives a broadly similar set of rare CNVs regardless of the method used to compute the depth-of-coverage table.
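To make "slightly different" concrete, the sketch below implements three plausible ways of turning aligned reads into a per-region value. These are illustrative strategies, not the code of any of the seven tools named above. Reads and regions are assumed to be 0-based, half-open intervals on one contig.

```python
Interval = tuple[int, int]  # 0-based, half-open [start, end)


def overlap(a: Interval, b: Interval) -> int:
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def count_overlapping_reads(reads: list[Interval], region: Interval) -> int:
    """Strategy A: every read touching the region counts once (reads spanning
    two adjacent regions are counted in both)."""
    return sum(1 for r in reads if overlap(r, region) > 0)


def count_reads_by_start(reads: list[Interval], region: Interval) -> int:
    """Strategy B: a read counts only for the region containing its start position."""
    return sum(1 for start, _ in reads if region[0] <= start < region[1])


def mean_base_depth(reads: list[Interval], region: Interval) -> float:
    """Strategy C: aligned bases inside the region divided by the region length."""
    return sum(overlap(r, region) for r in reads) / (region[1] - region[0])
```

For reads (0, 100), (50, 150) and (90, 190) and the region (100, 200), the three strategies give 2, 0 and 1.4, respectively. Differences of this kind are why swapping counting modules between tools can change downstream CNV calls.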
ABSTRACT
Identifying the specific human leukocyte antigen (HLA) allele combination of an individual is crucial in organ donation, risk assessment of autoimmune and infectious diseases, and cancer immunotherapy. However, due to the high genetic polymorphism in this region, HLA typing requires specialized methods. We investigated the performance of five next-generation sequencing (NGS) based HLA typing tools with non-restrictive licenses, namely HLA*LA, Optitype, HISAT-genotype, Kourami and STC-Seq. This evaluation was done for five HLA loci, HLA-A, -B, -C, -DRB1 and -DQB1, using whole-exome sequencing (WES) samples from 829 individuals. The robustness of the tools to lower depth of coverage (DOC) was evaluated by subsampling and HLA typing 230 WES samples at DOC ranging from 1X to 100X. HLA typing accuracy was measured at four typing resolutions. Among these, we present two clinically relevant typing resolutions (P group and pseudo-sequence), which specifically focus on the peptide-binding region. On average, across the five HLA loci examined, HLA*LA had the highest typing accuracy. For individual loci, Optitype had the highest typing accuracy for HLA-A, -B and -C, and HLA*LA for HLA-DRB1 and -DQB1. The tools' robustness to lower-DOC data varied widely and also depended on the specific HLA locus. For all Class I loci, Optitype had a typing accuracy above 95% at 50X (according to the amino acid differences in the functionally relevant portion of the HLA molecule). By contrast, for HISAT-genotype, Kourami and STC-Seq at all five loci, and for HLA*LA at HLA-DQB1, increasing the DOC even beyond 100X could still improve typing accuracy. HLA typing is also used in studies of ancient DNA (aDNA), which are often based on sequencing data with lower quality and DOC. Interestingly, we found that Optitype's typing accuracy is not notably impaired by short read length or by DNA damage, both typical of aDNA, as long as the DOC is sufficiently high.
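Subsampling to a lower DOC amounts to keeping each read with probability target / current, because expected depth scales linearly with the number of reads kept. The sketch below hashes the read name so that both mates of a pair share one keep/drop decision and runs are reproducible for a given seed. The function names and the SHA-256 scheme are illustrative assumptions; the abstract does not say which tool the authors used for subsampling.

```python
import hashlib


def subsampling_fraction(current_doc: float, target_doc: float) -> float:
    """Fraction of reads to keep to go from the current to the target mean DOC."""
    if current_doc <= 0 or target_doc <= 0:
        raise ValueError("depths must be positive")
    return min(1.0, target_doc / current_doc)


def keep_read(read_name: str, fraction: float, seed: int = 0) -> bool:
    """Deterministically keep a read with probability `fraction`.

    Hashing the read name (rather than drawing a fresh random number per record)
    keeps or drops both mates of a pair together and makes runs reproducible."""
    if fraction >= 1.0:
        return True
    digest = hashlib.sha256(f"{seed}:{read_name}".encode()).digest()
    # Top 53 bits of the hash -> uniform float in [0, 1) that never rounds to 1.0.
    u = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return u < fraction
```

Going from 100X to 10X, for instance, keeps a fraction of 0.1 of the read names. Changing `seed` gives an independent replicate at the same target depth.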
Subjects
HLA-A Antigens; High-Throughput Nucleotide Sequencing; Humans; Sequence Analysis, DNA/methods; Histocompatibility Testing/methods; High-Throughput Nucleotide Sequencing/methods; HLA-A Antigens/genetics; Algorithms
ABSTRACT
Whole genome sequences (WGS) greatly increase our ability to precisely infer population genetic parameters, demographic processes, and selection signatures. However, WGS may still not be affordable for a representative number of individuals/populations. In this context, our goal was to assess the efficiency of several SNP genotyping strategies by testing their ability to accurately estimate parameters describing neutral diversity and to detect signatures of selection. We analysed 110 WGS at 12× coverage for four different species, i.e., sheep, goats and their wild counterparts. From these data we generated 946 data sets corresponding to random panels of 1K to 5M variants, commercial SNP chips and exome capture, for sample sizes of five to 48 individuals. We also generated low-coverage resequencing data sets at 1×, 2× and 5× by randomly subsampling reads from the 12× resequencing data. Overall, 5K to 10K random variants were enough for an accurate estimation of genome diversity. Conversely, commercial panels and exome capture displayed strong ascertainment biases. Beyond the characterization of neutral diversity, the detection of selection signatures and the accurate estimation of linkage disequilibrium (LD) required high-density panels of at least 1M variants. Finally, genotype likelihoods increased the quality of variant calling from low-coverage resequencing, but the proportions of incorrect genotypes remained substantial, especially at heterozygous sites. Whole genome resequencing coverage of at least 5× appeared to be necessary for accurate assessment of genomic variation. These results have implications for studies seeking to deploy low-density SNP collections or genome scans across genetically diverse populations/species showing similar genetic characteristics and patterns of LD decay, for a wide variety of purposes.
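The difficulty with heterozygous sites at low coverage follows from a simple genotype-likelihood model. In the sketch below, each read independently shows the alternate allele with probability equal to the error rate (homozygous reference), 0.5 (heterozygous), or 1 minus the error rate (homozygous alternate). This flat model, with no genotype prior and a hypothetical default error rate of 1%, is an illustration only, not the likelihood model of the callers used in the study.

```python
def genotype_likelihoods(ref_reads: int, alt_reads: int, error: float = 0.01) -> dict[int, float]:
    """Likelihood of the read counts under genotypes with 0, 1 or 2 alt alleles.

    Each read independently shows the alt allele with probability `error`,
    0.5 or 1 - error; the binomial coefficient is omitted because it is the
    same for every genotype and does not change which one is most likely."""
    p_alt = {0: error, 1: 0.5, 2: 1 - error}
    return {g: p ** alt_reads * (1 - p) ** ref_reads for g, p in p_alt.items()}


def most_likely_genotype(ref_reads: int, alt_reads: int, error: float = 0.01) -> int:
    """Alt-allele count of the genotype with the highest likelihood."""
    likelihoods = genotype_likelihoods(ref_reads, alt_reads, error)
    return max(likelihoods, key=likelihoods.get)


def prob_heterozygote_looks_homozygous(depth: int) -> float:
    """Chance that all `depth` reads from a true heterozygote sample the same allele."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return 2 * 0.5 ** depth
```

With a single alternate read, the homozygous-alternate genotype is the most likely one. At a fixed depth d, a true heterozygote shows only one allele with probability 2 × 0.5^d: 1.0 at 1×, 0.5 at 2×, and 0.0625 at 5×.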
Subjects
Genome/genetics; Polymorphism, Single Nucleotide/genetics; Animals; Exome/genetics; Gene Frequency/genetics; Genetics, Population/methods; Genomics/methods; Genotype; Genotyping Techniques/methods; Goats/genetics; Heterozygote; High-Throughput Nucleotide Sequencing/methods; Linkage Disequilibrium/genetics; Sequence Analysis, DNA/methods; Sheep/genetics; Whole Genome Sequencing/methods
ABSTRACT
BACKGROUND: Depth of coverage calculation is an important and computationally intensive preprocessing step in a variety of next-generation sequencing pipelines, including the analysis of RNA-sequencing data, detection of copy number variants, and quality control procedures. RESULTS: Building upon big data technologies, we have developed SeQuiLa-cov, an extension to the recently released SeQuiLa platform, which provides efficient depth of coverage calculations, reaching a >100× speedup over state-of-the-art tools. The performance and scalability of our solution allow for exome and genome-wide calculations running locally or on a cluster, while hiding the complexity of distributed computing behind a Structured Query Language (SQL) application programming interface. CONCLUSIONS: SeQuiLa-cov provides a significant performance gain in depth of coverage calculations, streamlining widely used bioinformatic processing pipelines.
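Per-base depth can be computed in one pass with a difference array: add 1 where a read starts, subtract 1 just past where it ends, and take a running sum. The sketch below shows this event-based idea on a single contig, plus a bedGraph-style run-length output. It is a generic single-machine illustration that assumes 0-based, half-open read intervals. It does not reproduce SeQuiLa-cov's distributed implementation or its SQL interface.

```python
from itertools import accumulate


def per_base_depth(reads: list[tuple[int, int]], contig_length: int) -> list[int]:
    """Per-base depth from 0-based, half-open read intervals via a difference array."""
    events = [0] * (contig_length + 1)
    for start, end in reads:
        if not 0 <= start < end <= contig_length:
            raise ValueError(f"read ({start}, {end}) lies outside the contig")
        events[start] += 1  # coverage rises where a read starts
        events[end] -= 1    # and falls just past where it ends
    return list(accumulate(events[:contig_length]))


def to_blocks(depths: list[int]) -> list[tuple[int, int, int]]:
    """Collapse per-base depths into (start, end, depth) runs, bedGraph-style."""
    blocks: list[tuple[int, int, int]] = []
    start = 0
    for i in range(1, len(depths) + 1):
        if i == len(depths) or depths[i] != depths[start]:
            blocks.append((start, i, depths[start]))
            start = i
    return blocks
```

The work is proportional to the number of reads plus the contig length, and each read touches only two array cells. This property makes the approach easy to partition by genomic range.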
Subjects
Computational Biology/methods; DNA Copy Number Variations; Genomics/methods; Software; Algorithms; Computational Biology/standards; Genomics/standards; High-Throughput Nucleotide Sequencing; Humans; Quality Control; Sequence Analysis, DNA
ABSTRACT
Next generation sequencing (NGS) is routinely used in clinical genetic testing. Quality management of NGS testing is essential to ensure performance is consistently and rigorously evaluated. Three primary metrics are used in NGS quality evaluation: depth of coverage, base quality and mapping quality. To provide consistency and transparency in the utilisation of these metrics we present the Quality Sequencing Minimum (QSM). The QSM defines the minimum quality requirement a laboratory has selected for depth of coverage (C), base quality (B) and mapping quality (M) and can be applied per base, exon, gene or other genomic region, as appropriate. The QSM format is CX_BY(PY)_MZ(PZ), where X is the parameter threshold for C, Y the parameter threshold for B, PY the percentage of reads that must reach Y, Z the parameter threshold for M, and PZ the percentage of reads that must reach Z. The data underlying the QSM are in the BAM file, so a QSM can be easily and automatically calculated in any NGS pipeline. We used the QSM to optimise cancer predisposition gene testing using the TruSight Cancer Panel (TSCP). We set the QSM as C50_B10(85)_M20(95). Test regions falling below the QSM were automatically flagged for review, with 100/1471 test regions QSM-flagged in multiple individuals. Supplementing these regions with 132 additional probes improved performance in 85 of these 100 regions. We also used the QSM to optimise testing of genes with pseudogenes, such as PTEN and PMS2. In TSCP data from 960 individuals, the median number of regions per sample that passed the QSM was 1429 (97%). Importantly, the QSM can be used at an individual report level to provide succinct, comprehensive quality assurance information about individual test performance. We believe many laboratories would find the QSM useful. Furthermore, widespread adoption of the QSM would facilitate consistent, transparent reporting of genetic test performance by different laboratories.
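Because the QSM string encodes all five thresholds, a pipeline can parse it and test each base directly. The sketch below makes several assumptions the abstract does not settle: C is the raw number of reads covering a base, every threshold is inclusive (≥), and a region passes only if all its bases pass. The class and method names are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumed grammar for strings such as "C50_B10(85)_M20(95)".
QSM_PATTERN = re.compile(r"C(\d+)_B(\d+)\((\d+(?:\.\d+)?)\)_M(\d+)\((\d+(?:\.\d+)?)\)")

# One entry per read covering a base: (base quality, mapping quality).
ReadQualities = list[tuple[int, int]]


@dataclass(frozen=True)
class QSM:
    min_depth: int              # X
    min_base_quality: int       # Y
    pct_base_quality: float     # PY
    min_mapping_quality: int    # Z
    pct_mapping_quality: float  # PZ

    @classmethod
    def parse(cls, text: str) -> "QSM":
        match = QSM_PATTERN.fullmatch(text)
        if match is None:
            raise ValueError(f"not a QSM string: {text!r}")
        c, b, pb, m, pm = match.groups()
        return cls(int(c), int(b), float(pb), int(m), float(pm))

    def __str__(self) -> str:
        return (f"C{self.min_depth}_B{self.min_base_quality}({self.pct_base_quality:g})"
                f"_M{self.min_mapping_quality}({self.pct_mapping_quality:g})")

    def base_passes(self, reads: ReadQualities) -> bool:
        """Assumed semantics: depth >= X, and at least PY% / PZ% of the reads
        covering the base reach base quality Y / mapping quality Z."""
        depth = len(reads)
        if depth == 0 or depth < self.min_depth:
            return False
        pct_bq = 100 * sum(bq >= self.min_base_quality for bq, _ in reads) / depth
        pct_mq = 100 * sum(mq >= self.min_mapping_quality for _, mq in reads) / depth
        return pct_bq >= self.pct_base_quality and pct_mq >= self.pct_mapping_quality

    def region_passes(self, bases: list[ReadQualities]) -> bool:
        """A region passes only if it is non-empty and every base passes."""
        return bool(bases) and all(self.base_passes(b) for b in bases)
```

With C50_B10(85)_M20(95), a base covered by 50 reads that all have base quality 30 and mapping quality 60 passes. The same base fails with only 49 reads, or when 10 of 100 reads have mapping quality 0.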
ABSTRACT
Quality assurance and quality control are essential for robust next generation sequencing (NGS). Here we present CoverView, a fast, flexible, user-friendly quality evaluation tool for NGS data. CoverView processes mapped sequencing reads and user-specified regions to report depth of coverage, base and mapping quality metrics with increasing levels of detail from a chromosome-level summary to per-base profiles. CoverView can flag regions that do not fulfil user-specified quality requirements, allowing suboptimal data to be systematically and automatically presented for review. It also provides an interactive graphical user interface (GUI) that can be opened in a web browser and allows intuitive exploration of results. We have integrated CoverView into our accredited clinical cancer predisposition gene testing laboratory that uses the TruSight Cancer Panel (TSCP). CoverView has been invaluable for optimisation and quality control of our testing pipeline, providing transparent, consistent quality metric information and automatic flagging of regions that fall below quality thresholds. We demonstrate this utility with TSCP data from the Genome in a Bottle reference sample, which CoverView analysed in 13 seconds. CoverView uses data routinely generated by NGS pipelines, reads standard input formats, and rapidly creates easy-to-parse output text (.txt) files that are customised by a simple configuration file. CoverView can therefore be easily integrated into any NGS pipeline. CoverView and detailed documentation for its use are freely available at github.com/RahmanTeamDevelopment/CoverView/releases and www.icr.ac.uk/CoverView.
ABSTRACT
The RADseq technology allows researchers to efficiently develop thousands of polymorphic loci across multiple individuals with little or no prior information on the genome. However, many questions remain about the biases inherent to this technology. Notably, sequence misalignments arising from paralogy may affect the development of single nucleotide polymorphism (SNP) markers and the estimation of genetic diversity. We evaluated the impact of putative paralogous loci on genetic diversity estimation during the development of SNPs from a RADseq dataset for the nonmodel tree species Robinia pseudoacacia L. We sequenced nine genotypes and analyzed the frequency of putative paralogous RAD loci as a function of both the depth of coverage and the mismatch threshold allowed between loci. Putative paralogy was detected in a highly variable proportion of loci, from 1% to more than 20%, with the depth of coverage having a major influence on the result. Putative paralogy artificially increased the observed degree of polymorphism and the resulting estimates of diversity. The choice of the depth of coverage also affected diversity estimation and SNP validation: a low threshold decreased the chances of detecting minor alleles, while a high threshold increased allelic dropout. SNP validation was better for the low threshold (4×) than for the high threshold (18×) we tested. Using the strategy developed here, we were able to validate more than 80% of the SNPs tested by means of individual genotyping, resulting in a readily usable set of 330 SNPs suitable for population genetics applications.
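The abstract does not list the criteria used to call a RAD locus putatively paralogous. Two widely used heuristics are sketched below as assumptions: a diploid individual showing more than two alleles, and heterozygosity fixed across all genotyped individuals. The per-individual read-depth threshold (`min_depth`) decides which haplotypes count as real alleles, which illustrates how the chosen depth of coverage changes how many loci are flagged. Function names and the `min_individuals` default are hypothetical.

```python
from collections import Counter


def alleles_in_individual(read_haplotypes: list[str], min_depth: int) -> set[str]:
    """Haplotypes supported by at least `min_depth` reads in one individual."""
    return {h for h, n in Counter(read_haplotypes).items() if n >= min_depth}


def is_putative_paralog(locus: dict[str, list[str]], min_depth: int,
                        min_individuals: int = 3) -> bool:
    """Flag a RAD locus as a putative merged paralog (hypothetical rules).

    Rule 1: a diploid individual cannot carry more than two alleles at a
    single-copy locus. Rule 2: heterozygosity fixed across every genotyped
    individual (with enough individuals) is also suspicious."""
    allele_sets = [alleles_in_individual(reads, min_depth) for reads in locus.values()]
    genotyped = [s for s in allele_sets if s]
    if any(len(s) > 2 for s in genotyped):
        return True
    return len(genotyped) >= min_individuals and all(len(s) == 2 for s in genotyped)
```

For instance, an individual carrying 5, 5, and 4 reads of three haplotypes triggers the first rule at `min_depth=4`. It does not trigger the rule at `min_depth=5`, where the third haplotype is treated as noise.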
ABSTRACT
Noninvasive prenatal testing (NIPT) for fetal aneuploidies using cell-free fetal DNA in maternal plasma has revolutionized the field of prenatal care and methods using massively parallel sequencing are now being implemented almost worldwide. Substantial progress has been made from initially testing for (an)euploidies of chromosomes 13, 18 and 21, to testing for sex chromosome (an)euploidies, additional autosomal aneuploidies as well as partial deletions and duplications genome-wide. Although NIPT is associated with significantly reduced risks for the fetus in comparison to existing invasive prenatal diagnostic methods, it presents several implementation challenges. Here, we review key issues potentially influencing NIPT and illustrate them using both data from literature and in-house data.
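Many sequencing-based NIPT methods compare the share of reads mapping to a chromosome of interest with the same share in euploid reference pregnancies, and they express the difference as a z-score. The sketch below shows only that core step, assuming per-chromosome counts of uniquely mapped reads. Real pipelines also correct for GC content and mappability. The cutoff (z > 3 is often cited) is a laboratory choice, not something this review specifies, and the function names are hypothetical.

```python
from statistics import mean, stdev


def chromosome_fraction(read_counts: dict[str, int], chromosome: str) -> float:
    """Share of all (assumed uniquely mapped) reads that fall on one chromosome."""
    return read_counts[chromosome] / sum(read_counts.values())


def aneuploidy_z_score(sample: dict[str, int], references: list[dict[str, int]],
                       chromosome: str = "chr21") -> float:
    """z-score of the sample's chromosome fraction against euploid reference samples."""
    if len(references) < 2:
        raise ValueError("need at least two reference samples")
    fractions = [chromosome_fraction(r, chromosome) for r in references]
    return (chromosome_fraction(sample, chromosome) - mean(fractions)) / stdev(fractions)
```

Because the fetal contribution to cell-free DNA is only a fraction of the total, a fetal trisomy raises the chromosome's share only modestly. The method therefore depends on the reference fractions being tightly distributed.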
Subjects
High-Throughput Nucleotide Sequencing/standards; Prenatal Diagnosis/standards; Chromosome Aberrations; Chromosome Mapping; DNA Mutational Analysis; Female; Genetic Testing; Humans; Molecular Diagnostic Techniques; Polymorphism, Single Nucleotide; Pregnancy; Reproducibility of Results; Sensitivity and Specificity
ABSTRACT
Glycomics provides an increasingly useful research tool as the genomes and proteomes of more and more animal species are elucidated. In view of the general complexity and heterogeneity of glycans, improved depth-of-coverage and sensitivity are required for glycosylation analysis. In this study, we established a lectin-based isolation/enrichment strategy for obtaining total glycomic information. Specific lectins are added onto the filter to capture the corresponding glycans prior to release of N-glycans by peptide N-glycosidase F (PNGase F). Non-bound and bound glycans are released and analyzed separately by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS). Application of the strategy to chicken ovalbumin, normal mouse mammary epithelial cells (NMuMG), and human serum resulted in detection of 5, 6, and 11 additional N-glycan structures, respectively. The strategy facilitates identification of intact N-glycans in biological samples and can be extended to detailed analysis of the O-glycome or glycoproteome.
Subjects
Chemical Fractionation/methods; Glycomics/methods; Lectins/chemistry; Polysaccharides/chemistry; Polysaccharides/isolation & purification; Animals; Cell Line; Mice; Polysaccharides/analysis
ABSTRACT
The identification of DNA copy numbers from short-read sequencing data remains a challenge for both technical and algorithmic reasons. The raw data for these analyses are measured in tens to hundreds of gigabytes per genome; transmitting, storing, and analyzing such large files is cumbersome, particularly for methods that analyze several samples simultaneously. We developed a very efficient representation of depth of coverage (150-1000× compression) that enables such analyses. Current methods for analyzing variants in whole-genome sequencing (WGS) data frequently miss copy number variants (CNVs), particularly hemizygous deletions in the 1-100 kb range. To fill this gap, we developed a method to identify CNVs in individual genomes, based on comparison to joint profiles pre-computed from a large set of genomes. We analyzed depth of coverage in over 6000 high-quality (>40×) genomes. The depth of coverage has strong sequence-specific fluctuations only partially explained by global parameters such as %GC. To account for these fluctuations, we constructed multi-genome profiles representing the observed or inferred diploid depth of coverage at each position along the genome. These Reference Coverage Profiles (RCPs) take into account the diverse technologies and pipeline versions used. Normalization of the scaled coverage to the RCP followed by hidden Markov model (HMM) segmentation enables efficient detection of CNVs and large deletions in individual genomes. Use of pre-computed multi-genome coverage profiles improves our ability to analyze each individual genome. We provide RCPs and tools for performing these analyses on personal genomes. We expect the increased sensitivity and specificity for individual genome analysis to be critical for achieving clinical-grade genome interpretation.
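The abstract names two steps: normalising a genome's coverage to a reference profile, then segmenting the result with an HMM. Both can be sketched as a toy version under explicit assumptions. The RCP is given as expected diploid depth per bin, and the genome is scaled so its median bin has ratio 1. A three-state HMM (copy numbers 1, 2, 3) with Gaussian emissions around copy_number / 2 is then decoded with the Viterbi algorithm. `sigma` and `switch_prob` are illustrative values, not the authors' parameters, and the function names are hypothetical.

```python
import math
from statistics import median

# Hypothetical copy-number states: hemizygous loss, diploid, single-copy gain.
STATES = (1, 2, 3)


def normalize_to_profile(observed: list[float], profile: list[float]) -> list[float]:
    """Observed depth divided by the expected diploid depth in each bin, rescaled
    so that the median bin has ratio 1 (assumes most bins are diploid)."""
    if len(observed) != len(profile) or any(p <= 0 for p in profile):
        raise ValueError("profile must be positive and as long as observed")
    raw = [o / p for o, p in zip(observed, profile)]
    scale = median(raw)
    if scale <= 0:
        raise ValueError("median ratio must be positive")
    return [r / scale for r in raw]


def viterbi_copy_number(ratios: list[float], sigma: float = 0.15,
                        switch_prob: float = 1e-3) -> list[int]:
    """Most likely copy-number path under a toy HMM with Gaussian emissions
    centred on copy_number / 2 and a small, uniform state-switch probability."""
    if not ratios:
        return []
    n = len(STATES)
    log_stay = math.log(1 - switch_prob)
    log_move = math.log(switch_prob / (n - 1))

    def log_trans(i: int, j: int) -> float:
        return log_stay if i == j else log_move

    def log_emit(ratio: float, cn: int) -> float:
        # Shared sigma, so the Gaussian normalising constant is omitted.
        return -0.5 * ((ratio - cn / 2) / sigma) ** 2

    scores = [math.log(1 / n) + log_emit(ratios[0], cn) for cn in STATES]
    backpointers = []
    for ratio in ratios[1:]:
        pointers, new_scores = [], []
        for j, cn in enumerate(STATES):
            best = max(range(n), key=lambda i: scores[i] + log_trans(i, j))
            pointers.append(best)
            new_scores.append(scores[best] + log_trans(best, j) + log_emit(ratio, cn))
        backpointers.append(pointers)
        scores = new_scores
    state = max(range(n), key=lambda i: scores[i])  # best final state
    path = [state]
    for pointers in reversed(backpointers):  # trace back to the first bin
        state = pointers[state]
        path.append(state)
    return [STATES[i] for i in reversed(path)]
```

Consider a sample sequenced at twice the profile's depth, with one run of bins at half that level. It decodes to copy number 2, then 1, then 2. The median-based scaling is what makes the overall depth difference irrelevant.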