ABSTRACT
MOTIVATION: Taxonomic classification of 16S ribosomal RNA gene amplicons is an efficient and economical approach to microbiome analysis. The 16S rRNA sequence databases used in downstream bioinformatic pipelines, such as SILVA, RDP, EzBioCloud and HOMD, are limited either by sequence redundancy or by delays in incorporating new sequences. To improve 16S rRNA gene-based taxonomic classification, we systematically merged these widely used databases and a collection of novel sequences into an integrated resource. RESULTS: MetaSquare version 1.0 is an integrated 16S rRNA sequence database. It comprises more than 6 million sequences and improves taxonomic classification resolution for both long-read and short-read methods. AVAILABILITY AND IMPLEMENTATION: Accessible at https://hub.docker.com/r/lsbnb/metasquare_db and https://github.com/lsbnb/MetaSquare. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
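As a rough illustration of what merging several 16S reference sets while limiting redundancy can involve, the sketch below collapses records that share an identical sequence and keeps track of which source each copy came from. The abstract does not describe how MetaSquare performs its merge, so this is an assumption-laden toy: exact-match deduplication is a stand-in technique, and the source names (`db_a`, `db_b`), headers, and sequences are hypothetical.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_nonredundant(sources):
    """Merge {source_name: fasta_text} into {sequence: [source|header, ...]}.

    Sequences are normalized to upper-case DNA (U -> T) so that RNA- and
    DNA-alphabet copies of the same sequence collapse into one entry.
    Only exact duplicates are merged (an assumption made for this sketch).
    """
    merged = {}
    for name, text in sources.items():
        for header, seq in parse_fasta(text):
            key = seq.upper().replace("U", "T")
            merged.setdefault(key, []).append(f"{name}|{header}")
    return merged


# Hypothetical toy inputs standing in for two reference databases.
toy_sources = {
    "db_a": ">a1 Escherichia\nACGTACGT\n>a2 Bacillus\nGGGTTTAA\n",
    "db_b": ">b1 Escherichia\nacguacgu\n>b2 Novel\nTTTTCCCC\n",
}
merged = merge_nonredundant(toy_sources)
for seq, origins in merged.items():
    print(seq, origins)
```

Keeping every originating header per sequence, rather than discarding duplicates outright, preserves the taxonomy labels from each source so that conflicts between databases can be inspected later.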
Subject(s)
Microbiota , Genes, rRNA , Microbiota/genetics , Phylogeny , RNA, Ribosomal, 16S/genetics , Sequence Analysis, DNA/methods
ABSTRACT
Cerebral arachnoid cysts (ACs) are among the most common and most poorly understood types of developmental brain lesion. To begin to elucidate AC pathogenesis, we performed an integrated analysis of 617 patient-parent (trio) exomes, 152,898 human brain and mouse meningeal single-cell RNA sequencing transcriptomes and natural language processing data of patient medical records. We found that damaging de novo variants (DNVs) were highly enriched in patients with ACs compared with healthy individuals (P = 1.57 × 10⁻³³). Seven genes harbored an exome-wide significant DNV burden. AC-associated genes were enriched for chromatin modifiers and converged in midgestational transcription networks essential for neural and meningeal development. Unsupervised clustering of patient phenotypes identified four AC subtypes, and clinical severity correlated with the presence of a damaging DNV. These data provide insights into the coordinated regulation of brain and meningeal development and implicate epigenomic dysregulation due to DNVs in AC pathogenesis. Our results provide a preliminary indication that, in the appropriate clinical context, ACs may be considered radiographic harbingers of neurodevelopmental pathology warranting genetic testing and neurobehavioral follow-up. These data highlight the utility of a systems-level, multiomics approach to elucidate sporadic structural brain disease.
Subject(s)
Arachnoid Cysts , Multiomics , Humans , Animals , Mice , Arachnoid Cysts/diagnostic imaging , Arachnoid Cysts/genetics , Brain/diagnostic imaging , Exome/genetics , Genetic Testing
ABSTRACT
To elucidate the pathogenesis of vein of Galen malformations (VOGMs), the most common and severe congenital brain arteriovenous malformation, we performed an integrated analysis of 310 VOGM proband-family exomes and 336,326 human cerebrovasculature single-cell transcriptomes. We found the Ras suppressor p120 RasGAP (RASA1) harbored a genome-wide significant burden of loss-of-function de novo variants (p = 4.79 × 10⁻⁷). Rare, damaging transmitted variants were enriched in Ephrin receptor-B4 (EPHB4) (p = 1.22 × 10⁻⁵), which cooperates with p120 RasGAP to limit Ras activation. Other probands had pathogenic variants in ACVRL1, NOTCH1, ITGB1, and PTPN11. ACVRL1 variants were also identified in a multi-generational VOGM pedigree. Integrative genomics defined developing endothelial cells as a key spatio-temporal locus of VOGM pathophysiology. Mice expressing a VOGM-specific EPHB4 kinase-domain missense variant exhibited constitutive endothelial Ras/ERK/MAPK activation and impaired hierarchical development of angiogenesis-regulated arterial-capillary-venous networks, but only when carrying a "second-hit" allele. These results illuminate human arterio-venous development and VOGM pathobiology and have clinical implications.
ABSTRACT
To elucidate the pathogenesis of vein of Galen malformations (VOGMs), the most common and most severe of congenital brain arteriovenous malformations, we performed an integrated analysis of 310 VOGM proband-family exomes and 336,326 human cerebrovasculature single-cell transcriptomes. We found the Ras suppressor p120 RasGAP (RASA1) harbored a genome-wide significant burden of loss-of-function de novo variants (2042.5-fold, p = 4.79 × 10⁻⁷). Rare, damaging transmitted variants were enriched in Ephrin receptor-B4 (EPHB4) (17.5-fold, p = 1.22 × 10⁻⁵), which cooperates with p120 RasGAP to regulate vascular development. Additional probands had damaging variants in ACVRL1, NOTCH1, ITGB1, and PTPN11. ACVRL1 variants were also identified in a multi-generational VOGM pedigree. Integrative genomic analysis defined developing endothelial cells as a likely spatio-temporal locus of VOGM pathophysiology. Mice expressing a VOGM-specific EPHB4 kinase-domain missense variant (Phe867Leu) exhibited disrupted developmental angiogenesis and impaired hierarchical development of arterial-capillary-venous networks, but only in the presence of a "second-hit" allele. These results illuminate human arterio-venous development and VOGM pathobiology and have implications for patients and their families.
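The fold-enrichment and p-value pairs quoted above are the kind of output a gene-level de novo burden test produces. A common formulation, sketched here, compares the observed number of de novo variants in a gene with the number expected under a background mutation model, using a one-sided Poisson test. The abstract does not state which test or mutation model the authors used, so the Poisson choice is an assumption, and the counts below are hypothetical values that are not taken from the study.

```python
import math


def poisson_upper_tail(observed, expected, max_terms=1000):
    """Return P(X >= observed) for X ~ Poisson(expected).

    For observed > expected (the enrichment regime) the tail is summed
    directly, which avoids the precision loss of computing 1 - CDF when
    the p-value is very small. Otherwise the complement of the lower
    tail is used.
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    if observed <= 0:
        return 1.0
    if observed <= expected:
        lower = sum(
            math.exp(-expected + k * math.log(expected) - math.lgamma(k + 1))
            for k in range(observed)
        )
        return max(0.0, 1.0 - lower)
    # First tail term exp(-lambda) * lambda^k / k!, computed in log space.
    term = math.exp(
        -expected + observed * math.log(expected) - math.lgamma(observed + 1)
    )
    total, k = 0.0, observed
    for _ in range(max_terms):
        total += term
        k += 1
        term *= expected / k  # ratio of consecutive Poisson probabilities
        if term < total * 1e-17:
            break
    return min(total, 1.0)


# Hypothetical counts for one gene across a cohort (not from the paper).
observed_dnvs = 3      # de novo variants seen in the gene
expected_dnvs = 0.01   # expected under an assumed mutation model
fold = observed_dnvs / expected_dnvs
p_value = poisson_upper_tail(observed_dnvs, expected_dnvs)
print(f"fold enrichment = {fold:.1f}, p = {p_value:.3g}")
```

In practice the resulting p-value would be compared against a multiple-testing threshold, for example a Bonferroni correction over the number of genes tested; the exact threshold depends on the study design.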
Subject(s)
Vascular Diseases , Vein of Galen Malformations , Humans , Animals , Mice , Vein of Galen Malformations/genetics , Vein of Galen Malformations/pathology , Endothelial Cells/pathology , Mutation , Signal Transduction/genetics , Mutation, Missense , GTPase-Activating Proteins/genetics , Activin Receptors, Type II/genetics , p120 GTPase Activating Protein/genetics
ABSTRACT
Rapid methodological advances in statistical and computational genomics have enabled researchers to better identify and interpret both rare and common variants responsible for complex human diseases. As these advances continue to expand, it is imperative for researchers to understand the resources and methodologies available for various data types and study designs. In this review, we provide an overview of recent methods for identifying rare and common variants and understanding their roles in disease etiology. Additionally, we discuss the strategies, challenges, and promise of gene therapy. As computational and statistical approaches continue to improve, we will have an opportunity to translate human genetic findings into personalized health care.