Results 1 - 8 of 8
1.
Am J Hum Genet ; 111(8): 1626-1642, 2024 Aug 08.
Article in English | MEDLINE | ID: mdl-39013459

ABSTRACT

Trithorax-related H3K4 methyltransferases, KMT2C and KMT2D, are critical epigenetic modifiers. Haploinsufficiency of KMT2C was only recently recognized as a cause of a neurodevelopmental disorder (NDD), so the clinical and molecular spectra of the KMT2C-related NDD (now designated as Kleefstra syndrome 2) are largely unknown. We ascertained 98 individuals with rare KMT2C variants, including 75 with protein-truncating variants (PTVs). Notably, ∼15% of KMT2C PTVs were inherited. Although the most highly expressed KMT2C transcript consists of only the last four exons, pathogenic PTVs were found in almost all the exons of this large gene. KMT2C variant interpretation can be challenging due to segmental duplications and clonal hematopoiesis-induced artifacts. Using samples from 27 affected individuals, divided into discovery and validation cohorts, we generated a moderate-strength disorder-specific KMT2C DNA methylation (DNAm) signature and demonstrated its utility in classifying non-truncating variants. Based on 81 individuals with pathogenic/likely pathogenic variants, we demonstrate that the KMT2C-related NDD is characterized by developmental delay, intellectual disability, behavioral and psychiatric problems, hypotonia, seizures, short stature, and other comorbidities. The facial module of PhenoScore, applied to photographs of 34 affected individuals, reveals that the KMT2C-related facial gestalt is significantly different from the general NDD population. Finally, using PhenoScore and DNAm signatures, we demonstrate that the KMT2C-related NDD is clinically and epigenetically distinct from Kleefstra and Kabuki syndromes. Overall, we define the clinical features, molecular spectrum, and DNAm signature of the KMT2C-related NDD and demonstrate that they are distinct from Kleefstra and Kabuki syndromes, highlighting the need to rename this condition.


Subject(s)
Abnormalities, Multiple , Chromosome Deletion , Chromosomes, Human, Pair 9 , Craniofacial Abnormalities , DNA Methylation , DNA-Binding Proteins , Face , Hematologic Diseases , Intellectual Disability , Neurodevelopmental Disorders , Vestibular Diseases , Humans , Abnormalities, Multiple/genetics , Vestibular Diseases/genetics , Intellectual Disability/genetics , Face/abnormalities , Face/pathology , DNA-Binding Proteins/genetics , Male , Female , Hematologic Diseases/genetics , Neurodevelopmental Disorders/genetics , Craniofacial Abnormalities/genetics , Chromosomes, Human, Pair 9/genetics , Child , DNA Methylation/genetics , Child, Preschool , Neoplasm Proteins/genetics , Adolescent , Hypertrichosis/genetics , Mutation , Failure to Thrive/genetics , Histone-Lysine N-Methyltransferase/genetics , Heart Defects, Congenital
2.
Nat Methods ; 19(4): 429-440, 2022 04.
Article in English | MEDLINE | ID: mdl-35396482

ABSTRACT

Evaluating metagenomic software is key for optimizing metagenome interpretation and is the focus of the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI). The CAMI II challenge engaged the community to assess methods on realistic and complex datasets with long- and short-read sequences, created computationally from around 1,700 new and known genomes, as well as 600 new plasmids and viruses. Here we analyze 5,002 results by 76 program versions. Substantial improvements were seen in assembly, some due to long-read data. Related strains were still challenging for assembly and for genome recovery through binning, as was assembly quality for the latter. Profilers markedly matured, with taxon profilers and binners excelling at higher bacterial ranks but underperforming for viruses and Archaea. Clinical pathogen detection results revealed a need to improve reproducibility. Runtime and memory usage analyses identified efficient programs, including some that were also top performers on other metrics. The results identify challenges and guide researchers in selecting methods for analyses.


Subject(s)
Metagenome , Metagenomics , Archaea/genetics , Metagenomics/methods , Reproducibility of Results , Sequence Analysis, DNA , Software
3.
Bioinformatics ; 38(18): 4423-4425, 2022 09 15.
Article in English | MEDLINE | ID: mdl-35904548

ABSTRACT

SUMMARY: Bioinformatics applications increasingly rely on ad hoc disk storage of k-mer sets, e.g. for de Bruijn graphs or alignment indexes. Here, we introduce the K-mer File Format as a general lossless framework for storing and manipulating k-mer sets, realizing space savings of 3-5× compared to other formats, and bringing interoperability across tools. AVAILABILITY AND IMPLEMENTATION: Format specification, C++/Rust API, tools: https://github.com/Kmer-File-Format/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Software , Sequence Analysis, DNA , Compact Disks
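The abstract does not describe the KFF binary layout, so the sketch below is only a generic illustration of why a binary k-mer encoding saves space relative to plain text: each nucleotide is packed into 2 bits instead of one 8-bit character. The function names are hypothetical, and this is neither the KFF specification nor its C++/Rust API.

```python
# Hypothetical illustration: 2-bit packing of DNA k-mers.
# This is a generic technique, NOT the actual KFF on-disk layout.
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}
DECODE = "ACGT"


def pack_kmer(kmer: str) -> int:
    """Pack a k-mer into an integer using 2 bits per nucleotide."""
    value = 0
    for base in kmer:
        value = (value << 2) | ENCODE[base]
    return value


def unpack_kmer(value: int, k: int) -> str:
    """Recover the k-mer string from its packed integer (lossless)."""
    bases = []
    for _ in range(k):
        bases.append(DECODE[value & 0b11])  # lowest 2 bits = last base
        value >>= 2
    return "".join(reversed(bases))


def packed_size_bytes(k: int) -> int:
    """Bytes needed for one packed k-mer (versus k bytes as ASCII text)."""
    return (2 * k + 7) // 8


if __name__ == "__main__":
    kmer = "ACGTTGCA"
    packed = pack_kmer(kmer)
    assert unpack_kmer(packed, len(kmer)) == kmer
    print(packed, packed_size_bytes(31))
```

Under this encoding a 31-mer fits in 8 bytes rather than 31 ASCII bytes; the sketch does not model how KFF itself achieves the reported 3-5× savings.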
4.
Bioinformatics ; 36(12): 3894-3896, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32315402

ABSTRACT

MOTIVATION: Genome assembly is increasingly performed on long, uncorrected reads. Assembly quality may be degraded due to unfiltered chimeric reads; also, the storage of all read overlaps can take up to terabytes of disk space. RESULTS: We introduce two tools: yacrd for chimera removal and read scrubbing, and fpa for filtering out spurious overlaps. We show that yacrd results in higher-quality assemblies and is one hundred times faster than the best available alternative. AVAILABILITY AND IMPLEMENTATION: https://github.com/natir/yacrd and https://github.com/natir/fpa. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
High-Throughput Nucleotide Sequencing , Software , Sequence Analysis, DNA
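The abstract does not explain how yacrd detects chimeras. A common heuristic, assumed here for illustration, is to compute each read's coverage from its overlaps with other reads and treat an internal low-coverage gap as a likely chimeric junction; scrubbing then keeps only the well-covered segments. The function names and the `min_coverage` threshold are hypothetical and are not yacrd's interface.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates on the read


def low_coverage_regions(read_len: int, overlaps: List[Interval],
                         min_coverage: int) -> List[Interval]:
    """Return maximal regions whose overlap coverage is below min_coverage."""
    # Difference array: +1 where an overlap starts, -1 where it ends.
    diff = [0] * (read_len + 1)
    for start, end in overlaps:
        start, end = max(0, start), min(read_len, end)
        if start >= end:
            continue  # ignore empty or out-of-range overlaps
        diff[start] += 1
        diff[end] -= 1
    regions: List[Interval] = []
    coverage, gap_start = 0, None
    for pos in range(read_len):
        coverage += diff[pos]
        if coverage < min_coverage and gap_start is None:
            gap_start = pos
        elif coverage >= min_coverage and gap_start is not None:
            regions.append((gap_start, pos))
            gap_start = None
    if gap_start is not None:
        regions.append((gap_start, read_len))
    return regions


def is_chimeric(read_len: int, gaps: List[Interval]) -> bool:
    """Flag a read that has a low-coverage gap strictly inside it."""
    return any(start > 0 and end < read_len for start, end in gaps)


def scrub(read_len: int, gaps: List[Interval]) -> List[Interval]:
    """Keep only the well-covered segments between low-coverage gaps."""
    kept, cursor = [], 0
    for start, end in gaps:
        if start > cursor:
            kept.append((cursor, start))
        cursor = end
    if cursor < read_len:
        kept.append((cursor, read_len))
    return kept


if __name__ == "__main__":
    gaps = low_coverage_regions(100, [(0, 40), (0, 45), (55, 100), (60, 100)], 2)
    print(gaps, is_chimeric(100, gaps), scrub(100, gaps))
```

Here two groups of reads support the two halves of a 100-base read but almost nothing spans positions 40 to 60, so the read is flagged and split into its two supported segments.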
5.
Bioinformatics ; 35(21): 4239-4246, 2019 11 01.
Article in English | MEDLINE | ID: mdl-30918948

ABSTRACT

MOTIVATION: Long-read genome assembly tools are expected to reconstruct bacterial genomes nearly perfectly; however, they still produce fragmented assemblies in some cases. It would be beneficial to understand whether these cases are intrinsically impossible to resolve, or if assemblers are at fault, implying that genomes could be refined or even finished with little to no additional experimental cost. RESULTS: We propose a set of computational techniques to assist inspection of fragmented bacterial genome assemblies, through careful analysis of assembly graphs. By finding paths of overlapping raw reads between pairs of contigs, we recover potential short-range connections between contigs that were lost during the assembly process. We show that our procedure recovers 45% of missing contig adjacencies in fragmented Canu assemblies, on samples from the NCTC bacterial sequencing project. We also observe that a simple procedure based on enumerating weighted Hamiltonian cycles can suggest likely contig orderings. In our tests, the correct contig order is ranked first in half of the cases and within the top-three predictions in nearly all evaluated cases, providing a direction for finishing fragmented long-read assemblies. AVAILABILITY AND IMPLEMENTATION: https://gitlab.inria.fr/pmarijon/knot . SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome, Bacterial , High-Throughput Nucleotide Sequencing , Algorithms , Bacteria , Sequence Analysis, DNA , Software
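The contig-ordering step can be sketched as a brute-force enumeration of Hamiltonian cycles over the contigs (assuming a circular chromosome in which every contig is visited once), ranked by total edge weight. The pairwise scores are hypothetical placeholders for whatever connection evidence is used; the abstract does not specify knot's weighting, and this sketch ignores contig orientation. Enumeration is factorial in the number of contigs, so it is only practical for the handful of contigs left in a fragmented assembly.

```python
from itertools import permutations
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical support scores between contig pairs (higher = stronger link).
Weights = Dict[FrozenSet[str], float]


def rank_cycles(contigs: List[str], weights: Weights,
                top: int = 3) -> List[Tuple[float, Tuple[str, ...]]]:
    """Enumerate Hamiltonian cycles over contigs and rank them by total weight.

    The first contig is fixed to remove rotations, and each cycle's mirror
    image is skipped so every undirected cycle is scored once. A cycle that
    uses a pair absent from `weights` is discarded.
    """
    first, rest = contigs[0], contigs[1:]
    scored = []
    for perm in permutations(rest):
        if len(perm) > 1 and perm[0] > perm[-1]:
            continue  # mirror image of a cycle already considered
        order = (first,) + perm
        total = 0.0
        for a, b in zip(order, order[1:] + order[:1]):
            edge = frozenset((a, b))
            if edge not in weights:
                break  # no evidence linking a and b
            total += weights[edge]
        else:
            scored.append((total, order))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top]


if __name__ == "__main__":
    w = {frozenset(p): s for p, s in [
        (("A", "B"), 5), (("B", "C"), 4), (("C", "D"), 5),
        (("D", "A"), 4), (("A", "C"), 1), (("B", "D"), 1)]}
    for score, order in rank_cycles(["A", "B", "C", "D"], w):
        print(score, order)
```

With four contigs there are only three distinct undirected cycles, and the strongly supported order A-B-C-D is ranked first.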
7.
Bioinform Adv ; 2(1): vbab028, 2022.
Article in English | MEDLINE | ID: mdl-36699349

ABSTRACT

Summary: Cutevariant is a graphical user interface (GUI)-based desktop application designed to filter variants from annotated VCF files. The application imports data into a local SQLite database where complex filter queries can be built either from GUI controls or using a domain-specific language called Variant Query Language. Cutevariant provides more features than existing applications and is fully customizable thanks to a complete plugin architecture. Availability and implementation: Cutevariant is distributed as a multiplatform client-side software under an open source license and is available at https://github.com/labsquare/cutevariant.
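To illustrate the general idea of importing annotated variants into SQLite and filtering them with queries, the sketch below loads a few hypothetical VCF records into an in-memory database with Python's standard `sqlite3` module. The table schema, the `GENE` INFO key, the gene names, and the helper names are all assumptions; this is not Cutevariant's database schema, and the query is plain SQL rather than Variant Query Language.

```python
import sqlite3

# Hypothetical minimal VCF records with a GENE annotation in the INFO column.
VCF_TEXT = """##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
chr1\t12345\t.\tA\tG\t50\tPASS\tGENE=GENE1
chr1\t22222\t.\tC\tT\t12\tPASS\tGENE=GENE2
chr2\t33333\t.\tG\tA\t80\tPASS\tGENE=GENE1
"""


def load_vcf(conn: sqlite3.Connection, text: str) -> None:
    """Parse the fixed VCF columns plus INFO and insert them into a table."""
    conn.execute("CREATE TABLE variants (chrom TEXT, pos INTEGER, ref TEXT, "
                 "alt TEXT, qual REAL, gene TEXT)")
    rows = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip meta-information and header lines
        chrom, pos, _id, ref, alt, qual, _filter, info = line.split("\t")[:8]
        fields = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
        rows.append((chrom, int(pos), ref, alt, float(qual), fields.get("GENE")))
    conn.executemany("INSERT INTO variants VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()


def filter_variants(conn: sqlite3.Connection, gene: str, min_qual: float):
    """Return (chrom, pos) of variants in `gene` with QUAL >= min_qual."""
    cursor = conn.execute(
        "SELECT chrom, pos FROM variants WHERE gene = ? AND qual >= ? "
        "ORDER BY chrom, pos",
        (gene, min_qual),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_vcf(conn, VCF_TEXT)
    print(filter_variants(conn, "GENE1", 30))
```

Storing the records in a database lets the filter be expressed declaratively and re-run cheaply, which is the motivation the abstract gives for the SQLite import step.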

8.
Nat Biotechnol ; 39(3): 302-308, 2021 03.
Article in English | MEDLINE | ID: mdl-33288906

ABSTRACT

Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing [1,2] with continuous long-read or high-fidelity [3] sequencing data. Employing this strategy, we produced a completely phased de novo genome assembly for each haplotype of an individual of Puerto Rican descent (HG00733) in the absence of parental data. The assemblies are accurate (quality value > 40) and highly contiguous (contig N50 > 23 Mbp) with low switch error rates (0.17%), providing fully phased single-nucleotide variants, indels and structural variants. A comparison of Oxford Nanopore Technologies and Pacific Biosciences phased assemblies identified 154 regions that are preferential sites of contig breaks, irrespective of sequencing technology or phasing algorithms.


Subject(s)
Genome, Human , High-Throughput Nucleotide Sequencing/methods , Parents , Sequence Analysis, DNA/methods , Single-Cell Analysis/methods , Algorithms , Haplotypes , Humans , Puerto Rico/ethnology
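For reference, contig N50 (used above as the contiguity measure) is the length L such that contigs of length L or longer together contain at least half of all assembled bases. A minimal sketch of the computation; the function name is hypothetical:

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths."""
    lengths = sorted(lengths, reverse=True)  # longest contigs first
    if not lengths:
        raise ValueError("N50 is undefined for an empty assembly")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Stop at the first contig that brings coverage to >= 50% of bases.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the loop always reaches 100%")


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

For example, contigs of lengths 2 through 10 total 54 bases; the three longest (10, 9 and 8) reach 27 bases, exactly half, so the N50 is 8.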