ABSTRACT
The exploration of genotypic variants impacting phenotypes is a cornerstone in genetics research. The emergence of vast collections containing deeply genotyped and phenotyped families has made it possible to pursue the search for variants associated with complex diseases. However, managing these large-scale datasets requires specialized computational tools tailored to organize and analyze the extensive data. GPF (Genotypes and Phenotypes in Families) is an open-source platform (https://github.com/iossifovlab/gpf) that manages genotypes and phenotypes derived from collections of families. The GPF interface provides interactive exploration of genetic variants, enrichment analysis for de novo mutations, and phenotype/genotype association tools. In addition, GPF allows researchers to share their data securely with the broader scientific community. GPF is used to disseminate two large-scale family collection datasets (SSC, SPARK) for the study of autism funded by the SFARI foundation. However, GPF is versatile and can manage genotypic data from other small or large family collections. Our GPF instance, GPF-SFARI (https://gpf.sfari.org/), provides protected access to comprehensive genotypic and phenotypic data for the SSC and SPARK. In addition, GPF-SFARI provides public access to an extensive collection of de novo mutations identified in individuals with autism and related disorders and to gene-level statistics of the protected datasets characterizing the genes' roles in autism. Here, we highlight the primary features of GPF within the context of GPF-SFARI.
ABSTRACT
Whole-exome sequencing of autism spectrum disorder (ASD) probands and unaffected family members has identified many genes harboring de novo variants suspected to play a causal role in the disorder. Of these, chromodomain helicase DNA-binding protein 8 (CHD8) is the most recurrently mutated. Despite the prevalence of CHD8 mutations, we have little insight into how CHD8 loss affects genome organization or the functional consequences of these molecular alterations in neurons. Here, we engineered two isogenic human embryonic stem cell lines with CHD8 loss-of-function mutations and characterized differences in differentiated human cortical neurons. We identified hundreds of genes with altered expression, including many involved in neural development and excitatory synaptic transmission. Field recordings and single-cell electrophysiology revealed a 3-fold decrease in firing rates and synaptic activity in CHD8+/- neurons, as well as a similar firing-rate deficit in primary cortical neurons from Chd8+/- mice. These alterations in neuron and synapse function can be reversed by CHD8 overexpression. Moreover, CHD8+/- neurons displayed a large increase in open chromatin across the genome, where the greatest change in compaction was near autism susceptibility candidate 2 (AUTS2), which encodes a transcriptional regulator implicated in ASD. Genes with changes in chromatin accessibility and expression in CHD8+/- neurons have significant overlap with genes mutated in probands for ASD, intellectual disability, and schizophrenia but not with genes mutated in healthy controls or other disease cohorts. Overall, this study characterizes key molecular alterations in genome structure and expression in CHD8+/- neurons and links these changes to impaired neuronal and synaptic function.
Subjects
Autism Spectrum Disorder , Autistic Disorder , Humans , Animals , Mice , Autistic Disorder/genetics , Autism Spectrum Disorder/genetics , Chromatin/genetics , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Gene Expression , Transcription Factors/genetics
ABSTRACT
Studying thousands of families, we find siblings concordant for autism share more of their parental genomes than expected by chance, and discordant siblings share less, consistent with a role of transmission in autism incidence. The excess sharing of the father is highly significant (p value of 0.0014), with less significance for the mother (p value of 0.31). To compare parental sharing, we adjust for differences in meiotic recombination to obtain a p value of 0.15 that they are shared equally. These observations are contrary to certain models in which the mother carries a greater load than the father. Nevertheless, we present models in which greater sharing of the father is observed even though the mother carries a greater load. More generally, our observations of sharing establish quantitative constraints that any complete genetic model of autism must satisfy, and our methods may be applicable to other complex disorders.
ABSTRACT
Exonic variants present some of the strongest links between genotype and phenotype. However, these variants can have significant inter-individual pathogenicity differences, known as variable penetrance. In this study, we propose a model where genetically controlled mRNA splicing modulates the pathogenicity of exonic variants. By first cataloging exonic inclusion from RNA-sequencing data in GTEx V8, we find that pathogenic alleles are depleted on highly included exons. Using large-scale phased whole genome sequencing data from the TOPMed consortium, we observe that this effect may be driven by common splice-regulatory genetic variants, and that natural selection acts on haplotype configurations that reduce the transcript inclusion of putatively pathogenic variants, especially when limiting to haploinsufficient genes. Finally, we test if this effect may be relevant for autism risk using families from the Simons Simplex Collection, but find that splicing of pathogenic alleles has a penetrance-reducing effect here as well. Overall, our results indicate that common splice-regulatory variants may play a role in reducing the damaging effects of rare exonic variants.
Subjects
RNA Splice Sites , RNA Splicing , Penetrance , Exons , Genotype , RNA, Messenger/genetics , Alternative Splicing
ABSTRACT
Exonic variants present some of the strongest links between genotype and phenotype. However, these variants can have significant inter-individual pathogenicity differences, known as variable penetrance. In this study, we propose a model where genetically controlled mRNA splicing modulates the pathogenicity of exonic variants. By first cataloging exonic inclusion from RNA-seq data in GTEx v8, we find that pathogenic alleles are depleted on highly included exons. Using large-scale phased WGS data from the TOPMed consortium, we observe that this effect may be driven by common splice-regulatory genetic variants, and that natural selection acts on haplotype configurations that reduce the transcript inclusion of putatively pathogenic variants, especially when limiting to haploinsufficient genes. Finally, we test if this effect may be relevant for autism risk using families from the Simons Simplex Collection, but find that splicing of pathogenic alleles has a penetrance-reducing effect here as well. Overall, our results indicate that common splice-regulatory variants may play a role in reducing the damaging effects of rare exonic variants.
ABSTRACT
Autism arises in high and low-risk families. De novo mutation contributes to autism incidence in low-risk families as there is a higher incidence in the affected of the simplex families than in their unaffected siblings. But the extent of contribution in low-risk families cannot be determined solely from simplex families as they are a mixture of low and high-risk. The rate of de novo mutation in nearly pure populations of high-risk families, the multiplex families, has not previously been rigorously determined. Moreover, rates of de novo mutation have been underestimated from studies based on low resolution microarrays and whole exome sequencing. Here we report on findings from whole genome sequence (WGS) of both simplex families from the Simons Simplex Collection (SSC) and multiplex families from the Autism Genetic Resource Exchange (AGRE). After removing the multiplex samples with excessive cell-line genetic drift, we find that the contribution of de novo mutation in multiplex is significantly smaller than the contribution in simplex. We use WGS to provide high resolution CNV profiles and to analyze more than coding regions, and revise upward the rate in simplex autism due to an excess of de novo events targeting introns. Based on this study, we now estimate that de novo events contribute to 52-67% of cases of autism arising from low risk families, and 30-39% of cases of all autism.
Subjects
Autistic Disorder/epidemiology , Genetic Predisposition to Disease/genetics , Mutation , Adult , Autism Spectrum Disorder , Autistic Disorder/genetics , Female , Humans , Incidence , Male , Middle Aged , New York/epidemiology , Risk Factors , Young Adult
ABSTRACT
The ribosomal p70 S6 Kinase 1 (S6K1) has been implicated in the etiology of complex neurological diseases including autism, depression and dementia. Though no major gene disruption has been reported in humans in RPS6KB1, single nucleotide variants (SNVs) causing missense mutations have been identified, which have not been assessed for their impact on protein function. These S6K1 mutations have the potential to influence disease progression and treatment response. We mined the Simons Simplex Collection (SSC) and SPARK autism database to find inherited SNVs in S6K1 and characterized the effect of two missense SNVs, Asp14Asn (allele frequency = 0.03282%) and Glu44Gln (allele frequency = 0.0008244%), on S6K1 function in HEK293, human ES cells and primary neurons. Expressing Asp14Asn in HEK293 cells resulted in increased basal phosphorylation of downstream targets of S6K1 and increased de novo translation. This variant also showed blunted response to the specific S6K1 inhibitor, FS-115. In human embryonic cell line Shef4, Asp14Asn enhanced spontaneous neural fate specification in the absence of differentiating growth factors. In addition to enhanced translation, neurons expressing Asp14Asn exhibited impaired dendritic arborization and increased levels of phosphorylated ERK 1/2. Finally, in the SSC families tracked, Asp14Asn segregated with lower IQ scores when found in the autistic individual rather than the unaffected sibling. The Glu44Gln mutation showed a milder, but opposite phenotype in HEK cells as compared to Asp14Asn. Although the Glu44Gln mutation displayed increased neuronal translation, it had no impact on neuronal morphology. Our results provide the first characterization of naturally occurring human S6K1 variants on cognitive phenotype, neuronal morphology and maturation, underscoring again the importance of translation control in neural development and plasticity.
Subjects
Hippocampus/metabolism , Neurons/metabolism , Ribosomal Protein S6 Kinases, 70-kDa/metabolism , Signal Transduction/physiology , Alleles , Animals , Cell Shape/genetics , Gene Frequency , HEK293 Cells , Hippocampus/cytology , Humans , Mutation , Neurogenesis/physiology , Neurons/cytology , Phosphorylation , Rats , Rats, Sprague-Dawley , Ribosomal Protein S6 Kinases, 70-kDa/genetics
ABSTRACT
Determining pathogenicity of genomic variation identified by next-generation sequencing techniques can be supported by recurrent disruptive variants in the same gene in phenotypically similar individuals. However, interpretation of novel variants in a specific gene in individuals with mild-moderate intellectual disability (ID) without recognizable syndromic features can be challenging and reverse phenotyping is often required. We describe 24 individuals with a de novo disease-causing variant in, or partial deletion of, the F-box only protein 11 gene (FBXO11, also known as VIT1 and PRMT9). FBXO11 is part of the SCF (SKP1-cullin-F-box) complex, a multi-protein E3 ubiquitin-ligase complex catalyzing the ubiquitination of proteins destined for proteasomal degradation. Twenty-two variants were identified by next-generation sequencing, comprising 2 in-frame deletions, 11 missense variants, 1 canonical splice site variant, and 8 nonsense or frameshift variants leading to a truncated protein or degraded transcript. The remaining two variants were identified by array-comparative genomic hybridization and consisted of a partial deletion of FBXO11. All individuals had borderline to severe ID, and behavioral problems (autism spectrum disorder, attention-deficit/hyperactivity disorder, anxiety, aggression) were observed in most of them. The most relevant common facial features included a thin upper lip and a broad prominent space between the paramedian peaks of the upper lip. Other features were hypotonia and hyperlaxity of the joints. We show that de novo variants in FBXO11 cause a syndromic form of ID. The current series shows the power of reverse phenotyping in the interpretation of novel genetic variants in individuals who initially did not appear to have a clear recognizable phenotype.
Subjects
Abnormalities, Multiple/genetics , Behavior , F-Box Proteins/genetics , Genetic Variation , Intellectual Disability/genetics , Protein-Arginine N-Methyltransferases/genetics , Gene Deletion , Humans , Syndrome
ABSTRACT
Coding variants represent many of the strongest associations between genotype and phenotype; however, they exhibit inter-individual differences in effect, termed 'variable penetrance'. Here, we study how cis-regulatory variation modifies the penetrance of coding variants. Using functional genomic and genetic data from the Genotype-Tissue Expression Project (GTEx), we observed that in the general population, purifying selection has depleted haplotype combinations predicted to increase pathogenic coding variant penetrance. Conversely, in cancer and autism patients, we observed an enrichment of penetrance increasing haplotype configurations for pathogenic variants in disease-implicated genes, providing evidence that regulatory haplotype configuration of coding variants affects disease risk. Finally, we experimentally validated this model by editing a Mendelian single-nucleotide polymorphism (SNP) using CRISPR/Cas9 on distinct expression haplotypes with the transcriptome as a phenotypic readout. Our results demonstrate that joint regulatory and coding variant effects are an important part of the genetic architecture of human traits and contribute to modified penetrance of disease-causing variants.
Subjects
Disease/genetics , Genetic Predisposition to Disease , Polymorphism, Single Nucleotide , CRISPR-Cas Systems , Genome, Human , Haplotypes , Humans , Phenotype , Quantitative Trait Loci , Transcriptome
ABSTRACT
In individuals with autism spectrum disorder (ASD), de novo mutations have previously been shown to be significantly correlated with lower IQ but not with the core characteristics of ASD: deficits in social communication and interaction and restricted interests and repetitive patterns of behavior. We extend these findings by demonstrating in the Simons Simplex Collection that damaging de novo mutations in ASD individuals are also significantly and convincingly correlated with measures of impaired motor skills. This correlation is not explained by a correlation between IQ and motor skills. We find that IQ and motor skills are distinctly associated with damaging mutations and, in particular, that motor skills are a more sensitive indicator of mutational severity than is IQ, as judged by mutational type and target gene. We use this finding to propose a combined classification of phenotypic severity: mild (little impairment of either), moderate (impairment mainly to motor skills), and severe (impairment of both IQ and motor skills).
Subjects
Autism Spectrum Disorder/genetics , Motor Skills/physiology , Child , Female , Genotype , Humans , Male , Mutation
ABSTRACT
We develop a method of analysis [affected to discordant sibling pairs (A2DS)] that tests if shared variants contribute to a disorder. Using a standard measure of genetic relation, test individuals are compared with a cohort of discordant sibling pairs (CDS) to derive a comparative similarity score. We ask if a test individual is more similar to an unrelated affected than to the unrelated unaffected sibling from the CDS and then sum over such individuals and pairs. Statistical significance is judged by randomly permuting the affected status in the CDS. In the analysis of published genotype data from the Simons Simplex Collection (SSC) and the Autism Genetic Resource Exchange (AGRE) cohorts of children with autism spectrum disorder (ASD), we find strong statistical significance that the affected are more similar to the affected than to the unaffected of the CDS (P value ∼ 0.00001). Fathers in multiplex families have marginally greater similarity (P value = 0.02) to unrelated affected individuals. These results do not depend on ethnic matching or gender.
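The comparison-and-permutation logic described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the similarity measure or the exact form of the score, so the precomputed similarity matrix `sim`, the +1/-1 scoring rule, and the function names `a2ds_score` and `permutation_p_value` are all assumptions made for illustration.

```python
import random


def a2ds_score(sim, pairs):
    """Sum, over test individuals and discordant pairs, +1 when the test
    individual is more similar to the affected sibling and -1 when more
    similar to the unaffected sibling (hypothetical scoring rule).

    sim[i][j] is an assumed precomputed similarity between test individual i
    and cohort member j; pairs holds (affected_index, unaffected_index).
    """
    score = 0
    for row in sim:
        for affected, unaffected in pairs:
            if row[affected] > row[unaffected]:
                score += 1
            elif row[affected] < row[unaffected]:
                score -= 1
    return score


def permutation_p_value(sim, pairs, n_perm=1000, seed=0):
    """Estimate a one-sided p value by randomly swapping the affected status
    within each discordant pair and recomputing the score."""
    rng = random.Random(seed)
    observed = a2ds_score(sim, pairs)
    hits = 0
    for _ in range(n_perm):
        # Each pair keeps or swaps its affected/unaffected labels at random.
        shuffled = [(u, a) if rng.random() < 0.5 else (a, u) for a, u in pairs]
        if a2ds_score(sim, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

With real data, `sim` would come from whatever genetic relation measure is chosen, and the permutation count would need to be large enough to resolve the small p values reported in the abstract.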
Subjects
Autistic Disorder/genetics , Autistic Disorder/physiopathology , Siblings , Autism Spectrum Disorder/genetics , Autism Spectrum Disorder/physiopathology , Child , Child, Preschool , Cohort Studies , Computer Simulation , Family Health , Female , Genotype , Humans , Male , Models, Statistical , Polymorphism, Single Nucleotide , Sex Factors
ABSTRACT
As the second most common type of variation in the human genome, insertions and deletions (indels) have been linked to many diseases, but the discovery of indels of more than a few bases in size from short-read sequencing data remains challenging. Scalpel (http://scalpel.sourceforge.net) is an open-source software for reliable indel detection based on the microassembly technique. It has been successfully used to discover mutations in novel candidate genes for autism, and it is extensively used in other large-scale studies of human diseases. This protocol gives an overview of the algorithm and describes how to use Scalpel to perform highly accurate indel calling from whole-genome and whole-exome sequencing data. We provide detailed instructions for an exemplary family-based de novo study, but we also characterize the other two supported modes of operation: single-sample and somatic analysis. Indel normalization, visualization and annotation of the mutations are also illustrated. Using a standard server, indel discovery and characterization in the exonic regions of the example sequencing data can be completed in ∼5 h after read mapping.
Subjects
DNA Mutational Analysis/methods , High-Throughput Nucleotide Sequencing/methods , INDEL Mutation , Alleles , Genomics , Humans , Molecular Sequence Annotation , Polymorphism, Single Nucleotide
ABSTRACT
We performed whole-genome sequencing (WGS) of 208 genomes from 53 families affected by simplex autism. For the majority of these families, no copy-number variant (CNV) or candidate de novo gene-disruptive single-nucleotide variant (SNV) had been detected by microarray or whole-exome sequencing (WES). We integrated multiple CNV and SNV analyses and extensive experimental validation to identify additional candidate mutations in eight families. We report that compared to control individuals, probands showed a significant (p = 0.03) enrichment of de novo and private disruptive mutations within fetal CNS DNase I hypersensitive sites (i.e., putative regulatory regions). This effect was only observed within 50 kb of genes that have been previously associated with autism risk, including genes where dosage sensitivity has already been established by recurrent disruptive de novo protein-coding mutations (ARID1B, SCN2A, NR3C2, PRKCA, and DSCAM). In addition, we provide evidence of gene-disruptive CNVs (in DISC1, WNT7A, RBFOX1, and MBD5), as well as smaller de novo CNVs and exon-specific SNVs missed by exome sequencing in neurodevelopmental genes (e.g., CANX, SAE1, and PIK3CA). Our results suggest that the detection of smaller, often multiple CNVs affecting putative regulatory elements might help explain additional risk of simplex autism.
Subjects
Autistic Disorder/genetics , DNA/genetics , Genome, Human , Exome , Female , Humans , Male , Pedigree , Polymorphism, Single Nucleotide
ABSTRACT
We previously computed that genes with de novo (DN) likely gene-disruptive (LGD) mutations in children with autism spectrum disorders (ASD) have high vulnerability: disruptive mutations in many of these genes, the vulnerable autism genes, will have a high likelihood of resulting in ASD. Because individuals with ASD have lower fecundity, such mutations in autism genes would be under strong negative selection pressure. An immediate prediction is that these genes will have a lower LGD load than typical genes in the human gene pool. We confirm this hypothesis in an explicit test by measuring the load of disruptive mutations in whole-exome sequence databases from two cohorts. We use information about mutational load to show that lower and higher intelligence quotients (IQ) affected individuals can be distinguished by the mutational load in their respective gene targets, as well as to help prioritize gene targets by their likelihood of being autism genes. Moreover, we demonstrate that transmission of rare disruptions in genes with a lower LGD load occurs more often to affected offspring; we show transmission originates most often from the mother, and transmission of such variants is seen more often in offspring with lower IQ. A surprising proportion of transmission of these rare events comes from genes expressed in the embryonic brain that show sharply reduced expression shortly after birth.
Subjects
Autistic Disorder/genetics , Databases, Genetic , Exome , Gene Pool , Models, Genetic , Mutation , Child , Child, Preschool , Female , Humans , Male
ABSTRACT
Autism spectrum disorders (ASDs) are a group of developmental disabilities that affect social interaction and communication and are characterized by repetitive behaviors. There is now a large body of evidence that suggests a complex role of genetics in ASDs, in which many different loci are involved. Although many current population-scale genomic studies have been demonstrably fruitful, these studies generally focus on analyzing a limited part of the genome or use a limited set of bioinformatics tools. These limitations preclude the analysis of genome-wide perturbations that may contribute to the development and severity of ASD-related phenotypes. To overcome these limitations, we have developed and utilized an integrative clinical and bioinformatics pipeline for generating a more complete and reliable set of genomic variants for downstream analyses. Our study focuses on the analysis of three simplex autism families consisting of one affected child, unaffected parents, and one unaffected sibling. All members were clinically evaluated and widely phenotyped. Genotyping arrays and whole-genome sequencing were performed on each member, and the resulting sequencing data were analyzed using a variety of available bioinformatics tools. We searched for rare variants of putative functional impact that were found to be segregating according to de novo, autosomal recessive, X-linked, mitochondrial, and compound heterozygote transmission models. The resulting candidate variants included three small heterozygous copy-number variations (CNVs), a rare heterozygous de novo nonsense mutation in MYBBP1A located within exon 1, and a novel de novo missense variant in LAMB3. Our work demonstrates how more comprehensive analyses that include rich clinical data and whole-genome sequencing data can generate reliable results for use in downstream investigations.
ABSTRACT
Congenital heart disease (CHD) patients have an increased prevalence of extracardiac congenital anomalies (CAs) and risk of neurodevelopmental disabilities (NDDs). Exome sequencing of 1213 CHD parent-offspring trios identified an excess of protein-damaging de novo mutations, especially in genes highly expressed in the developing heart and brain. These mutations accounted for 20% of patients with CHD, NDD, and CA but only 2% of patients with isolated CHD. Mutations altered genes involved in morphogenesis, chromatin modification, and transcriptional regulation, including multiple mutations in RBFOX2, a regulator of mRNA splicing. Genes mutated in other cohorts examined for NDD were enriched in CHD cases, particularly those with coexisting NDD. These findings reveal shared genetic contributions to CHD, NDD, and CA and provide opportunities for improved prognostic assessment and early therapeutic intervention in CHD patients.
Subjects
Heart Defects, Congenital/diagnosis , Heart Defects, Congenital/genetics , Nervous System Malformations/genetics , Neurogenesis/genetics , Brain/abnormalities , Brain/metabolism , Child , Congenital Abnormalities/genetics , Exome/genetics , Humans , Mutation , Prognosis , RNA Splicing/genetics , RNA Splicing Factors , RNA, Messenger/genetics , RNA-Binding Proteins/genetics , Repressor Proteins/genetics , Transcription, Genetic
ABSTRACT
Whole exome sequencing has proven to be a powerful tool for understanding the genetic architecture of human disease. Here we apply it to more than 2,500 simplex families, each having a child with an autistic spectrum disorder. By comparing affected to unaffected siblings, we show that 13% of de novo missense mutations and 43% of de novo likely gene-disrupting (LGD) mutations contribute to 12% and 9% of diagnoses, respectively. Including copy number variants, coding de novo mutations contribute to about 30% of all simplex and 45% of female diagnoses. Almost all LGD mutations occur opposite wild-type alleles. LGD targets in affected females significantly overlap the targets in males of lower intelligence quotient (IQ), but neither overlaps significantly with targets in males of higher IQ. We estimate that LGD mutation in about 400 genes can contribute to the joint class of affected females and males of lower IQ, with an overlapping and similar number of genes vulnerable to contributory missense mutation. LGD targets in the joint class overlap with published targets for intellectual disability and schizophrenia, and are enriched for chromatin modifiers, FMRP-associated genes and embryonically expressed genes. Most of the significance for the latter comes from affected females.
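The two kinds of percentage quoted in this abstract (the share of mutations that contribute, and the share of diagnoses explained) can be related through a simple sibling-comparison calculation. The Python sketch below is an illustration under stated assumptions, not the authors' published method: it assumes that the excess per-child mutation rate in affected over unaffected siblings is entirely contributory and that each contributory mutation accounts for one diagnosis. The function names are hypothetical.

```python
def ascertainment_differential(rate_affected, rate_unaffected):
    """Given per-child de novo mutation rates in affected and unaffected
    siblings, return (fraction of mutations in affected children that are
    contributory, fraction of diagnoses they account for).

    Assumes the excess rate in affected children is entirely contributory.
    """
    excess = rate_affected - rate_unaffected
    return excess / rate_affected, excess


def implied_rates(frac_contributory, frac_diagnoses):
    """Invert the relationship: recover the per-child rates implied by the
    two reported fractions under the same assumptions."""
    rate_affected = frac_diagnoses / frac_contributory
    rate_unaffected = rate_affected - frac_diagnoses
    return rate_affected, rate_unaffected


# Usage with the LGD figures quoted in the abstract (43% of mutations, 9% of
# diagnoses); the returned rates are implied by this model, not reported values.
lgd_affected, lgd_unaffected = implied_rates(0.43, 0.09)
```

The same call with the missense figures (0.13 and 0.12) shows why a small contributory fraction can still explain a comparable share of diagnoses: missense events are far more frequent per child.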
Subjects
Child Development Disorders, Pervasive/genetics , Genetic Predisposition to Disease/genetics , Mutation/genetics , Open Reading Frames/genetics , Child , Cluster Analysis , Exome/genetics , Female , Genes , Humans , Intelligence Tests , Male , Reproducibility of Results
ABSTRACT
BACKGROUND: INDELs, especially those disrupting protein-coding regions of the genome, have been strongly associated with human diseases. However, there are still many errors with INDEL variant calling, driven by library preparation, sequencing biases, and algorithm artifacts. METHODS: We characterized whole genome sequencing (WGS), whole exome sequencing (WES), and PCR-free sequencing data from the same samples to investigate the sources of INDEL errors. We also developed a classification scheme based on the coverage and composition to rank high and low quality INDEL calls. We performed a large-scale validation experiment on 600 loci, and find high-quality INDELs to have a substantially lower error rate than low-quality INDELs (7% vs. 51%). RESULTS: Simulation and experimental data show that assembly based callers are significantly more sensitive and robust for detecting large INDELs (>5 bp) than alignment based callers, consistent with published data. The concordance of INDEL detection between WGS and WES is low (53%), and WGS data uniquely identifies 10.8-fold more high-quality INDELs. The validation rate for WGS-specific INDELs is also much higher than that for WES-specific INDELs (84% vs. 57%), and WES misses many large INDELs. In addition, the concordance for INDEL detection between standard WGS and PCR-free sequencing is 71%, and standard WGS data uniquely identifies 6.3-fold more low-quality INDELs. Furthermore, accurate detection with Scalpel of heterozygous INDELs requires 1.2-fold higher coverage than that for homozygous INDELs. Lastly, homopolymer A/T INDELs are a major source of low-quality INDEL calls, and they are highly enriched in the WES data. CONCLUSIONS: Overall, we show that accuracy of INDEL detection with WGS is much greater than WES even in the targeted region. We calculated that 60X WGS depth of coverage from the HiSeq platform is needed to recover 95% of INDELs detected by Scalpel. While this is higher than current sequencing practice, the deeper coverage may save total project costs because of the greater accuracy and sensitivity. Finally, we investigate sources of INDEL errors (for example, capture deficiency, PCR amplification, homopolymers) with various data that will serve as a guideline to effectively reduce INDEL errors in genome sequencing.
ABSTRACT
RATIONALE: Congenital heart disease (CHD) is among the most common birth defects. Most cases are of unknown pathogenesis. OBJECTIVE: To determine the contribution of de novo copy number variants (CNVs) in the pathogenesis of sporadic CHD. METHODS AND RESULTS: We studied 538 CHD trios using genome-wide dense single nucleotide polymorphism arrays and whole exome sequencing. Results were experimentally validated using digital droplet polymerase chain reaction. We compared validated CNVs in CHD cases with CNVs in 1301 healthy control trios. The 2 complementary high-resolution technologies identified 63 validated de novo CNVs in 51 CHD cases. A significant increase in CNV burden was observed when comparing CHD trios with healthy trios, using either single nucleotide polymorphism array (P=7×10⁻⁵; odds ratio, 4.6) or whole exome sequencing data (P=6×10⁻⁴; odds ratio, 3.5) and remained after removing 16% of de novo CNV loci previously reported as pathogenic (P=0.02; odds ratio, 2.7). We observed recurrent de novo CNVs on 15q11.2 encompassing CYFIP1, NIPA1, and NIPA2 and single de novo CNVs encompassing DUSP1, JUN, JUP, MED15, MED9, PTPRE, SREBF1, TOP2A, and ZEB2, genes that interact with established CHD proteins NKX2-5 and GATA4. Integrating de novo variants in whole exome sequencing and CNV data suggests that ETS1 is the pathogenic gene altered by 11q24.2-q25 deletions in Jacobsen syndrome and that CTBP2 is the pathogenic gene in 10q subtelomeric deletions. CONCLUSIONS: We demonstrate a significantly increased frequency of rare de novo CNVs in CHD patients compared with healthy controls and suggest several novel genetic loci for CHD.
Subjects
DNA Copy Number Variations/genetics , Exome/genetics , Gene Frequency/genetics , Heart Defects, Congenital/genetics , Polymorphism, Single Nucleotide/genetics , Case-Control Studies , Cohort Studies , Gene Regulatory Networks/genetics , Heart Defects, Congenital/diagnosis , Humans , Molecular Sequence Data
ABSTRACT
We present an open-source algorithm, Scalpel (http://scalpel.sourceforge.net/), which combines mapping and assembly for sensitive and specific discovery of insertions and deletions (indels) in exome-capture data. A detailed repeat analysis coupled with a self-tuning k-mer strategy allows Scalpel to outperform other state-of-the-art approaches for indel discovery, particularly in regions containing near-perfect repeats. We analyzed 593 families from the Simons Simplex Collection and demonstrated Scalpel's power to detect long (≥30 bp) transmitted events and enrichment for de novo likely gene-disrupting indels in autistic children.