1.
Nucleic Acids Res ; 50(17): e97, 2022 09 23.
Article in English | MEDLINE | ID: mdl-35713566

ABSTRACT

De novo mutations (DNMs) are an important cause of genetic disorders. The accurate identification of DNMs from sequencing data is therefore fundamental to rare disease research and diagnostics. Unfortunately, identifying reliable DNMs remains a major challenge due to sequence errors, uneven coverage, and mapping artifacts. Here, we developed a deep convolutional neural network (CNN) DNM caller (DeNovoCNN) that encodes the alignment of sequence reads for a trio as 160 × 164 resolution images. DeNovoCNN was trained on DNMs of 5616 whole exome sequencing (WES) trios, achieving an overall 96.74% recall and 96.55% precision on the test dataset. We find that DeNovoCNN has increased recall/sensitivity and precision compared to existing DNM calling approaches (GATK, DeNovoGear, DeepTrio, Samtools) based on the Genome in a Bottle reference dataset and independent WES and WGS trios. Validations of DNMs based on Sanger and PacBio HiFi sequencing confirm that DeNovoCNN outperforms existing methods. Most importantly, our results suggest that DeNovoCNN is likely robust against different exome sequencing and analysis approaches, thereby allowing its application to other datasets. DeNovoCNN is freely available as a Docker container and can be run on existing alignment (BAM/CRAM) and variant calling (VCF) files from WES and WGS without a need for variant recalling.
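Note: the recall and precision figures in this abstract follow the standard definitions over true positives (TP), false positives (FP), and false negatives (FN). The sketch below shows only those definitions; the function name and the counts are hypothetical and are not taken from the DeNovoCNN test set, whose underlying counts the abstract does not report.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) for a set of variant calls.

    precision = TP / (TP + FP): share of reported calls that are real.
    recall    = TP / (TP + FN): share of real events that were reported.
    """
    if tp + fp == 0 or tp + fn == 0:
        raise ValueError("precision/recall undefined without calls or true events")
    return tp / (tp + fp), tp / (tp + fn)


# Hypothetical counts, for illustration only (not from the paper).
precision, recall = precision_recall(tp=90, fp=10, fn=30)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.90 recall=0.75
```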


Subject(s)
Deep Learning , High-Throughput Nucleotide Sequencing , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA , Exome Sequencing/methods
2.
Genet Med ; 23(8): 1569-1573, 2021 08.
Article in English | MEDLINE | ID: mdl-33846582

ABSTRACT

PURPOSE: Expansions of a subset of short tandem repeats (STRs) have been implicated in approximately 30 different human genetic disorders. Despite extensive application of exome sequencing (ES) in routine diagnostic genetic testing, STRs are not routinely identified from these data. METHODS: We assessed diagnostic utility of STR analysis in exome sequencing by applying ExpansionHunter to 2,867 exomes from movement disorder patients and 35,228 other clinical exomes. RESULTS: We identified 38 movement disorder patients with a possible aberrant STR length. Validation by polymerase chain reaction (PCR) and/or repeat-primed PCR technologies confirmed the presence of aberrant expansion alleles for 13 (34%). For seven of these patients the genotype was compatible with the phenotypic description, resulting in a molecular diagnosis. We subsequently tested the remainder of our diagnostic ES cohort, including over 30 clinically and genetically heterogeneous disorders. Optimized manual curation yielded 167 samples with a likely aberrant STR length. Validations confirmed 93/167 (56%) aberrant expansion alleles, of which 48 were in the pathogenic range and 45 in the premutation range. CONCLUSION: Our work provides guidance for the implementation of STR analysis in clinical ES. Our results show that systematic STR evaluation may increase diagnostic ES yield by 0.2%, and recommend making STR evaluation a routine part of ES interpretation in genetic testing laboratories.


Subject(s)
Exome , Microsatellite Repeats , Alleles , Exome/genetics , Genotype , Humans , Microsatellite Repeats/genetics , Polymerase Chain Reaction
3.
Heliyon ; 10(1): e23611, 2024 Jan 15.
Article in English | MEDLINE | ID: mdl-38173518

ABSTRACT

Background: Machine learning is becoming a common tool in monitoring emotion. However, methodological studies of the processing pipeline are scarce, especially ones using subjective appraisals as ground truth. New method: A novel protocol was used to induce cognitive load and physical discomfort, and emotional dimensions (arousal, valence, and dominance) were reported after each task. The performance of five common ML models with a versatile set of features (physiological features, task performance data, and personality traits) was compared in binary classification of subjectively assessed emotions. Results: The psychophysiological responses showed that the protocol was successful in changing the mental state from baseline, and that the cognitive and physical tasks differed from each other. The optimization and performance of ML models used for emotion detection were evaluated. Additionally, methods to account for imbalanced classes were applied and shown to improve the classification performance. Comparison with existing methods: Classification of human emotional states often assumes the states are determined by the stimuli. However, individual appraisals vary. None of the past studies have classified subjective emotional dimensions with a set of features including biosignals, personality and behavior. Conclusion: Our data represent a typical setup in affective computing utilizing psychophysiological monitoring: N is low compared to the number of features, inter-individual variability is high, and class imbalance cannot be avoided. Our observations are: a) if possible, include features representing physiology, behavior and personality; b) use simple models and a limited number of features to improve interpretability; c) address the possible imbalance; d) if the data size allows, use nested cross-validation.
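Note: recommendation (d) refers to nested cross-validation, in which hyperparameters are tuned on inner folds drawn only from the outer training set, so the outer test fold never influences model selection. The following is a minimal sketch of the index bookkeeping only, assuming contiguous, unshuffled folds; the function names are hypothetical, it is not the authors' pipeline, and stratification for imbalanced classes is deliberately left out.

```python
def k_fold_indices(n_samples: int, k: int) -> list[list[int]]:
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def nested_cv_splits(n_samples: int, outer_k: int = 5, inner_k: int = 3):
    """Yield (outer_test, outer_train, inner_splits) for each outer fold.

    inner_splits is a list of (inner_train, inner_val) index lists built
    only from outer_train, so the outer test fold stays unseen during tuning.
    """
    for outer_test in k_fold_indices(n_samples, outer_k):
        held_out = set(outer_test)
        outer_train = [i for i in range(n_samples) if i not in held_out]
        inner_splits = []
        for fold in k_fold_indices(len(outer_train), inner_k):
            inner_val = [outer_train[j] for j in fold]
            val_set = set(inner_val)
            inner_train = [i for i in outer_train if i not in val_set]
            inner_splits.append((inner_train, inner_val))
        yield outer_test, outer_train, inner_splits


# Example: 20 samples, 5 outer folds, 3 inner folds per outer training set.
for outer_test, outer_train, inner_splits in nested_cv_splits(20):
    print(len(outer_test), len(outer_train), [len(v) for _, v in inner_splits])
```

In practice the inner splits would drive hyperparameter selection and the outer test fold would be scored once with the selected model.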

4.
Curr Protoc ; 4(7): e1094, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38966883

ABSTRACT

Short tandem repeat (STR) expansions are associated with more than 60 genetic disorders. The size and stability of these expansions correlate with the severity and age of onset of the disease. Therefore, being able to accurately detect the absolute length of STRs is important. Current diagnostic assays include laborious lab experiments, including repeat-primed PCR and Southern blotting, that still cannot precisely determine the exact length of very long repeat expansions. Optical genome mapping (OGM) is a cost-effective and easy-to-use alternative to traditional cytogenetic techniques and allows the comprehensive detection of chromosomal aberrations and structural variants >500 bp in length, including insertions, deletions, duplications, inversions, translocations, and copy number variants. Here, we provide methodological guidance for preparing samples and performing OGM as well as running the analysis pipelines and using the specific repeat expansion workflows to determine the exact repeat length of repeat expansions expanded beyond 500 bp. Together these protocols provide all details needed to analyze the length and stability of any repeat expansion with an expected repeat size difference from the expected wild-type allele of >500 bp. © 2024 The Authors. Current Protocols published by Wiley Periodicals LLC.
Basic Protocol 1: Genomic ultra-high-molecular-weight DNA isolation, labeling, and staining
Basic Protocol 2: Data generation and genome mapping using the Bionano Saphyr® System
Basic Protocol 3: Manual De Novo Assembly workflow
Basic Protocol 4: Local guided assembly workflow
Basic Protocol 5: EnFocus Fragile X workflow
Basic Protocol 6: Molecule distance script workflow


Subject(s)
Chromosome Mapping , Humans , Chromosome Mapping/methods , DNA Repeat Expansion/genetics , Microsatellite Repeats/genetics , DNA/genetics
5.
Genome Med ; 15(1): 34, 2023 05 08.
Article in English | MEDLINE | ID: mdl-37158973

ABSTRACT

BACKGROUND: Long-read sequencing (LRS) techniques have been very successful in identifying structural variants (SVs). However, the high error rate of LRS made the detection of small variants (substitutions and short indels < 20 bp) more challenging. The introduction of PacBio HiFi sequencing makes LRS also suited for detecting small variation. Here we evaluate the ability of HiFi reads to detect de novo mutations (DNMs) of all types, which are technically challenging variant types and a major cause of sporadic, severe, early-onset disease. METHODS: We sequenced the genomes of eight parent-child trios using high coverage PacBio HiFi LRS (~30-fold coverage) and Illumina short-read sequencing (SRS) (~50-fold coverage). De novo substitutions, small indels, short tandem repeats (STRs) and SVs were called in both datasets and compared to each other to assess the accuracy of HiFi LRS. In addition, we determined the parent-of-origin of the small DNMs using phasing. RESULTS: We identified a total of 672 and 859 de novo substitutions/indels, 28 and 126 de novo STRs, and 24 and 1 de novo SVs in LRS and SRS respectively. For the small variants, there was a 92 and 85% concordance between the platforms. For the STRs and SVs, the concordance was 3.6 and 0.8%, and 4 and 100% respectively. We successfully validated 27/54 LRS-unique small variants, of which 11 (41%) were confirmed as true de novo events. For the SRS-unique small variants, we validated 42/133 DNMs and 8 (19%) were confirmed as true de novo events. Validation of 18 LRS-unique de novo STR calls confirmed none of the repeat expansions as true DNMs. Confirmation of the 23 LRS-unique SVs was possible for 19 candidate SVs, of which 10 (52.6%) were true de novo events. Furthermore, we were able to assign 96% of DNMs to their parental allele with LRS data, as opposed to just 20% with SRS data. CONCLUSIONS: HiFi LRS can now produce the most comprehensive variant dataset obtainable by a single technology in a single laboratory, allowing accurate calling of substitutions, indels, STRs and SVs. The accuracy even allows sensitive calling of DNMs on all variant levels, and also allows for phasing, which helps to distinguish true positive from false positive DNMs.
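Note: the confirmation rates quoted in the RESULTS can be recomputed from the counts given in the abstract (confirmed true de novo events divided by successfully validated candidates). The short sketch below does only that arithmetic; the dictionary keys and the helper name are illustrative labels, not terms from the paper.

```python
# (confirmed as true de novo, successfully validated), as stated in the abstract
validation = {
    "LRS-unique small variants": (11, 27),
    "SRS-unique small variants": (8, 42),
    "LRS-unique SVs": (10, 19),
}


def percent(confirmed: int, validated: int, digits: int = 1) -> float:
    """Confirmation rate as a percentage, rounded to `digits` decimals."""
    return round(100 * confirmed / validated, digits)


rates = {name: percent(c, v) for name, (c, v) in validation.items()}
for name, rate in rates.items():
    print(f"{name}: {rate}%")
```

The results (40.7%, 19.0%, and 52.6%) agree with the rounded values reported in the abstract (41%, 19%, and 52.6%).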


Subject(s)
High-Throughput Nucleotide Sequencing , INDEL Mutation , Humans , Alleles , Microsatellite Repeats
6.
Eur J Hum Genet ; 31(1): 81-88, 2023 01.
Article in English | MEDLINE | ID: mdl-36114283

ABSTRACT

Genome sequencing (GS) can identify novel diagnoses for patients who remain undiagnosed after routine diagnostic procedures. We tested whether GS is a better first-tier genetic diagnostic test than current standard of care (SOC) by assessing the technical and clinical validity of GS for patients with neurodevelopmental disorders (NDD). We performed both GS and exome sequencing in 150 consecutive NDD patient-parent trios. The primary outcome was diagnostic yield, calculated from disease-causing variants affecting exonic sequence of known NDD genes. GS (30%, n = 45) and SOC (28.7%, n = 43) had similar diagnostic yield. All 43 conclusive diagnoses obtained with SOC testing were also identified by GS. SOC, however, required integration of multiple test results to obtain these diagnoses. GS yielded two more conclusive diagnoses, and four more possible diagnoses than ES-based SOC (35 vs. 31). Interestingly, these six variants detected only by GS were copy number variants (CNVs). Our data demonstrate the technical and clinical validity of GS to serve as routine first-tier genetic test for patients with NDD. Although the additional diagnostic yield from GS is limited, GS comprehensively identified all variants in a single experiment, suggesting that GS constitutes a more efficient genetic diagnostic workflow.


Subject(s)
Neurodevelopmental Disorders , Humans , Neurodevelopmental Disorders/diagnosis , Neurodevelopmental Disorders/genetics , Genetic Testing/methods , Base Sequence , Chromosome Mapping , Exome Sequencing
7.
Eur J Hum Genet ; 29(4): 637-648, 2021 04.
Article in English | MEDLINE | ID: mdl-33257779

ABSTRACT

Long-read sequencing (LRS) has the potential to comprehensively identify all medically relevant genome variation, including variation commonly missed by short-read sequencing (SRS) approaches. To determine this potential, we performed LRS at around 15×-40× genome coverage using the Pacific Biosciences Sequel I System for five trios. The respective probands were diagnosed with intellectual disability (ID) whose etiology remained unresolved after SRS exomes and genomes. Systematic assessment of LRS coverage showed that ~35 Mb of the human reference genome was only accessible by LRS and not SRS. Genome-wide structural variant (SV) calling yielded on average 28,292 SV calls per individual, totaling 12.9 Mb of sequence. Trio-based analyses, which allowed us to study segregation, showed concordance for up to 95% of these SV calls across the genome, and 80% of the LRS SV calls were not identified by SRS. De novo mutation analysis did not identify any de novo SVs, confirming that these are rare events. Because of high sequence coverage, we were also able to call single nucleotide substitutions. On average, we identified 3 million substitutions per genome, with a Mendelian inheritance concordance of up to 97%. Of these, ~100,000 were located in the ~35 Mb of the genome that was only captured by LRS. Moreover, these variants affected the coding sequence of 64 genes, including 32 known Mendelian disease genes. Our data show the potential added value of LRS compared to SRS for identifying medically relevant genome variation.


Subject(s)
Genetic Testing/methods , Intellectual Disability/genetics , Sequence Analysis, DNA/methods , Humans , Intellectual Disability/diagnosis , Mutation , Pedigree , Polymorphism, Genetic