Abstract
BACKGROUND AND AIMS: In the classical form of α1-antitrypsin deficiency, a misfolded variant α1-antitrypsin Z accumulates in the endoplasmic reticulum of liver cells and causes liver cell injury by gain-of-function proteotoxicity in a sub-group of affected homozygotes but relatively little is known about putative modifiers. Here, we carried out genomic sequencing in a uniquely affected family with an index case of liver failure and 2 homozygous siblings with minimal or no liver disease. Their sequences were compared to sequences in well-characterized cohorts of homozygotes with or without liver disease, and then candidate sequence variants were tested for changes in the kinetics of α1-antitrypsin variant Z degradation in iPS-derived hepatocyte-like cells derived from the affected siblings themselves. APPROACH AND RESULTS: Specific variants in autophagy genes MTMR12 and FAM134A could each accelerate the degradation of α1-antitrypsin variant Z in cells from the index patient, but both MTMR12 and FAM134A variants were needed to slow the degradation of α1-antitrypsin variant Z in cells from a protected sib, indicating that inheritance of both variants is needed to mediate the pathogenic effects of hepatic proteotoxicity at the cellular level. Analysis of homozygote cohorts showed that multiple patient-specific variants in proteostasis genes are likely to explain liver disease susceptibility at the population level. CONCLUSIONS: These results validate the concept that genetic variation in autophagy function can determine susceptibility to liver disease in α1-antitrypsin deficiency and provide evidence that polygenic mechanisms and multiple patient-specific variants are likely needed for proteotoxic pathology.
Subjects
Autophagy, Intracellular Signaling Peptides and Proteins, Membrane Proteins, Phenotype, alpha 1-Antitrypsin Deficiency, alpha 1-Antitrypsin, Humans, alpha 1-Antitrypsin Deficiency/genetics, alpha 1-Antitrypsin Deficiency/pathology, Membrane Proteins/genetics, Male, Autophagy/genetics, Female, Intracellular Signaling Peptides and Proteins/genetics, alpha 1-Antitrypsin/genetics, alpha 1-Antitrypsin/metabolism, Adult, Pedigree, Liver/pathology, Liver/metabolism
Abstract
Exome-sequencing studies have generally been underpowered to identify deleterious alleles with a large effect on complex traits as such alleles are mostly rare. Because the population of northern and eastern Finland has expanded considerably and in isolation following a series of bottlenecks, individuals of these populations have numerous deleterious alleles at a relatively high frequency. Here, using exome sequencing of nearly 20,000 individuals from these regions, we investigate the role of rare coding variants in clinically relevant quantitative cardiometabolic traits. Exome-wide association studies for 64 quantitative traits identified 26 newly associated deleterious alleles. Of these 26 alleles, 19 are either unique to or more than 20 times more frequent in Finnish individuals than in other Europeans and show geographical clustering comparable to Mendelian disease mutations that are characteristic of the Finnish population. We estimate that sequencing studies of populations without this unique history would require hundreds of thousands to millions of participants to achieve comparable association power.
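The closing sample-size estimate can be illustrated with a back-of-the-envelope calculation. The sketch below is not the study's power analysis: it assumes a simple additive model for a quantitative trait, in which power at a fixed effect size depends on N × 2p(1 − p), and the allele frequencies in the usage note are hypothetical values chosen only to mirror the "more than 20 times more frequent" enrichment described above.

```python
def required_sample_size(n_ref: float, maf_ref: float, maf_new: float) -> float:
    """Sample size needed to match the power of a reference study.

    Assumption (not from the source): under an additive model the
    non-centrality of the association test scales with
    N * 2 * p * (1 - p) for a fixed per-allele effect, so matching
    power means matching that product.
    """
    var_ref = 2.0 * maf_ref * (1.0 - maf_ref)  # genotype variance, reference population
    var_new = 2.0 * maf_new * (1.0 - maf_new)  # genotype variance, comparison population
    return n_ref * var_ref / var_new


if __name__ == "__main__":
    # Hypothetical allele at 1% in the enriched population and 0.05% elsewhere (20-fold).
    n = required_sample_size(20_000, maf_ref=0.01, maf_new=0.0005)
    print(f"Participants needed at the lower frequency: {n:,.0f}")
```

With these illustrative inputs, a 20-fold drop in allele frequency raises the required cohort from 20,000 to roughly 396,000 participants, which is consistent in order of magnitude with the "hundreds of thousands" stated in the abstract.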
Subjects
Exome Sequencing, Genetic Association Studies/methods, Genetic Predisposition to Disease/genetics, Genetic Variation/genetics, Quantitative Trait Loci/genetics, Alleles, Cholesterol, HDL/genetics, Cluster Analysis, Endpoint Determination, Finland, Geographic Mapping, Humans, Multifactorial Inheritance/genetics, Reproducibility of Results
Abstract
An Amendment to this paper has been published and can be accessed via a link at the top of the paper.
Abstract
Each human genome includes de novo mutations that arose during gametogenesis. While these germline mutations represent a fundamental source of new genetic diversity, they can also create deleterious alleles that impact fitness. Whereas the rate and patterns of point mutations in the human germline are now well understood, far less is known about the frequency and features that impact de novo structural variants (dnSVs). We report a family-based study of germline mutations among 9,599 human genomes from 33 multigenerational CEPH-Utah families and 2,384 families from the Simons Foundation Autism Research Initiative. We find that de novo structural mutations detected by alignment-based, short-read WGS occur at an overall rate of at least 0.160 events per genome in unaffected individuals, and we observe a significantly higher rate (0.206 per genome) in ASD-affected individuals. In both probands and unaffected samples, nearly 73% of de novo structural mutations arose in paternal gametes, and we predict most de novo structural mutations to be caused by mutational mechanisms that do not require sequence homology. After multiple testing correction, we did not observe a statistically significant correlation between parental age and the rate of de novo structural variation in offspring. These results highlight that a spectrum of mutational mechanisms contribute to germline structural mutations and that these mechanisms most likely have markedly different rates and selective pressures than those leading to point mutations.
Subjects
Family, Genome, Human/genetics, Germ Cells, Germ-Line Mutation/genetics, Mutation Rate, Aging/genetics, Autistic Disorder/genetics, Bias, DNA Copy Number Variations/genetics, DNA Mutational Analysis, Female, Humans, Male, Paternal Age, Point Mutation/genetics
Abstract
The NFIX gene encodes a DNA-binding protein belonging to the nuclear factor one (NFI) family of transcription factors. Pathogenic variants of NFIX are associated with two autosomal dominant Mendelian disorders, Malan syndrome (MIM 614753) and Marshall-Smith syndrome (MIM 602535), which are clinically distinct due to different disease-causing mechanisms. NFIX variants associated with Malan syndrome are missense variants mostly located in exon 2 encoding the N-terminal DNA binding and dimerization domain or are protein-truncating variants that trigger nonsense-mediated mRNA decay (NMD) resulting in NFIX haploinsufficiency. NFIX variants associated with Marshall-Smith syndrome are protein-truncating and are clustered between exons 6 and 10, including a recurrent Alu-mediated deletion of exons 6 and 7, which can escape NMD. The more severe phenotype of Marshall-Smith syndrome is likely due to a dominant-negative effect of these protein-truncating variants that escape NMD. Here, we report a child with clinical features of Malan syndrome who has a de novo NFIX intragenic duplication. Using genome sequencing, exon-level microarray analysis, and RNA sequencing, we show that this duplication encompasses exons 6 and 7 and leads to NFIX haploinsufficiency. To our knowledge, this is the first reported case of Malan Syndrome caused by an intragenic NFIX duplication.
Subjects
Abnormalities, Multiple, Bone Diseases, Developmental, Craniofacial Abnormalities, Intellectual Disability, Megalencephaly, Septo-Optic Dysplasia, Sotos Syndrome, Child, Humans, NFI Transcription Factors/genetics, Sotos Syndrome/genetics, Exons/genetics, Megalencephaly/genetics, Intellectual Disability/genetics, Sequence Analysis, RNA
Abstract
BACKGROUND: Identification of deleterious genetic variants using DNA sequencing data relies on increasingly detailed filtering strategies to isolate the small subset of variants that are more likely to underlie a disease phenotype. Datasets reflecting population allele frequencies of different types of variants serve as powerful filtering tools, especially in the context of rare disease analysis. While such population-scale allele frequency datasets now exist for structural variants (SVs), it remains a challenge to match SV calls between multiple datasets, thereby complicating estimates of a putative SV's population allele frequency. RESULTS: We introduce SVAFotate, a software tool that enables the annotation of SVs with variant allele frequency and related information from existing SV datasets. As a result, VCF files annotated by SVAFotate offer a variety of metrics to aid in the stratification of SVs as common or rare in the broader human population. CONCLUSIONS: Here we demonstrate the use of SVAFotate in the classification of SVs with regards to their population frequency and illustrate how SVAFotate's annotations can be used to filter and prioritize SVs. Lastly, we detail how best to utilize these SV annotations in the analysis of genetic variation in studies of rare disease.
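The matching problem described above can be sketched in a few lines. This is not SVAFotate's implementation or command-line interface; it is a minimal illustration that assumes SVs are matched by type and by a reciprocal-overlap threshold, and that a query SV is annotated with the highest allele frequency among its matches. The `SV` class, function names, and the 0.8 and 0.01 thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SV:
    chrom: str
    start: int        # 0-based, inclusive
    end: int          # exclusive
    svtype: str       # e.g. "DEL", "DUP", "INV"
    af: float = 0.0   # population allele frequency, if known


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Smaller of the two overlap fractions (shared bases / each SV's length)."""
    if a.chrom != b.chrom:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def max_matching_af(query: SV, population: List[SV], min_overlap: float = 0.8) -> float:
    """Highest AF among population SVs of the same type that match the query."""
    matches = [
        p.af
        for p in population
        if p.svtype == query.svtype and reciprocal_overlap(query, p) >= min_overlap
    ]
    return max(matches, default=0.0)  # no match is treated as AF 0 (assumption)


def is_rare(query: SV, population: List[SV], af_cutoff: float = 0.01,
            min_overlap: float = 0.8) -> bool:
    """Classify the query as rare when no sufficiently similar common SV exists."""
    return max_matching_af(query, population, min_overlap) < af_cutoff
```

Taking the maximum frequency across matches is a conservative choice for rare-disease filtering: an SV is kept as a candidate only if none of the similar population SVs is common.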
Subjects
Gene Frequency, High-Throughput Nucleotide Sequencing, Software, Humans, Rare Diseases
Abstract
The somatic mutation burden in healthy white blood cells (WBCs) is not well known. Based on deep whole-genome sequencing, we estimate that approximately 450 somatic mutations accumulated in the nonrepetitive genome within the healthy blood compartment of a 115-yr-old woman. The detected mutations appear to have been harmless passenger mutations: They were enriched in noncoding, AT-rich regions that are not evolutionarily conserved, and they were depleted for genomic elements where mutations might have favorable or adverse effects on cellular fitness, such as regions with actively transcribed genes. The distribution of variant allele frequencies of these mutations suggests that the majority of the peripheral white blood cells were offspring of two related hematopoietic stem cell (HSC) clones. Moreover, telomere lengths of the WBCs were significantly shorter than telomere lengths from other tissues. Together, this suggests that the finite lifespan of HSCs, rather than somatic mutation effects, may lead to hematopoietic clonal evolution at extreme ages.
Subjects
Clonal Evolution, Hematopoiesis, Leukocytes/metabolism, Longevity/genetics, Mutation, AT Rich Sequence, Aged, 80 and over, Cell Lineage, Conserved Sequence, Female, Gene Frequency, Genome, Hematopoietic Stem Cells/cytology, Hematopoietic Stem Cells/metabolism, Hematopoietic Stem Cells/physiology, Humans, Leukocytes/cytology, Leukocytes/physiology, Telomere/genetics, Telomere Shortening
Abstract
Using five complementary short- and long-read sequencing technologies, we phased and assembled >95% of each diploid human genome in a four-generation, 28-member family (CEPH 1463) allowing us to systematically assess de novo mutations (DNMs) and recombination. From this family, we estimate an average of 192 DNMs per generation, including 75.5 de novo single-nucleotide variants (SNVs), 7.4 non-tandem repeat indels, 79.6 de novo indels or structural variants (SVs) originating from tandem repeats, 7.7 centromeric de novo SVs and SNVs, and 12.4 de novo Y chromosome events per generation. STRs and VNTRs are the most mutable with 32 loci exhibiting recurrent mutation through the generations. We accurately assemble 288 centromeres and six Y chromosomes across the generations, documenting de novo SVs, and demonstrate that the DNM rate varies by an order of magnitude depending on repeat content, length, and sequence identity. We show a strong paternal bias (75-81%) for all forms of germline DNM, yet we estimate that 17% of de novo SNVs are postzygotic in origin with no paternal bias. We place all this variation in the context of a high-resolution recombination map (~3.5 kbp breakpoint resolution). We observe a strong maternal recombination bias (1.36 maternal:paternal ratio) with a consistent reduction in the number of crossovers with increasing paternal (r=0.85) and maternal (r=0.65) age. However, we observe no correlation between meiotic crossover locations and de novo SVs, arguing against non-allelic homologous recombination as a predominant mechanism. The use of multiple orthogonal technologies, near-telomere-to-telomere phased genome assemblies, and a multi-generation family to assess transmission has created the most comprehensive, publicly available "truth set" of all classes of genomic variants. The resource can be used to test and benchmark new algorithms and technologies to understand the most fundamental processes underlying human genetic variation.
Abstract
The size, shape, and behavior of the modern domesticated dog has been sculpted by artificial selection for at least 14,000 years. The genetic substrates of selective breeding, however, remain largely unknown. Here, we describe a genome-wide scan for selection in 275 dogs from 10 phenotypically diverse breeds that were genotyped for over 21,000 autosomal SNPs. We identified 155 genomic regions that possess strong signatures of recent selection and contain candidate genes for phenotypes that vary most conspicuously among breeds, including size, coat color and texture, behavior, skeletal morphology, and physiology. In addition, we demonstrate a significant association between HAS2 and skin wrinkling in the Shar-Pei, and provide evidence that regulatory evolution has played a prominent role in the phenotypic diversification of modern dog breeds. Our results provide a first-generation map of selection in the dog, illustrate how such maps can rapidly inform the genetic basis of canine phenotypic variation, and provide a framework for delineating the mechanistic basis of how artificial selection promotes rapid and pronounced phenotypic evolution.
Subjects
Dogs/genetics, Genome, Selection, Genetic, Animals, Phenotype, Polymorphism, Single Nucleotide, Species Specificity
Abstract
BACKGROUND: There are likely to be hundreds of monogenic forms of human male infertility. Whole genome sequencing (WGS) is the most efficient way to make progress in mapping the causative genetic variants, and ultimately improve clinical management of the disease in each patient. Recruitment of consanguineous families is an effective approach to ascertain the genetic forms of many diseases. OBJECTIVES: To apply WGS to large consanguineous families with likely hereditary male infertility and identify potential genetic cases. MATERIALS AND METHODS: We recruited seven large families with clinically diagnosed male infertility from rural Pakistan, including five with a history of consanguinity. We generated WGS data on 26 individuals (3-5 per family) and analyzed the resulting data with a computational pipeline to identify potentially causal single nucleotide variants, indels, and copy number variants. RESULTS: We identified plausible genetic causes in five of the seven families, including a homozygous 10 kb deletion of exon 2 in a well-established male infertility gene (M1AP), and biallelic missense substitutions (SPAG6, CCDC9, TUBA3C) and an in-frame hemizygous deletion (TKTL1) in genes with emerging relevance. DISCUSSION AND CONCLUSION: The rate of genetic findings using the current approach (71%) was much higher than what we recently achieved using whole-exome sequencing (WES) of unrelated singleton cases (20%). Furthermore, we identified a pathogenic single-exon deletion in M1AP that would be undetectable by WES. Screening more families with WGS, especially in underrepresented populations, will further reveal the types of variants underlying male infertility and accelerate the use of genetics in the patient management.
Abstract
Congenital myasthenic syndrome (CMS) is a group of 32 disorders involving genetic dysfunction at the neuromuscular junction resulting in skeletal muscle weakness that worsens with physical activity. Precise diagnosis and molecular subtype identification are critical for treatment as medication for one subtype may exacerbate disease in another (Engel et al., Lancet Neurol 14: 420 [2015]; Finsterer, Orphanet J Rare Dis 14: 57 [2019]; Prior and Ghosh, J Child Neurol 36: 610 [2021]). The SNAP25-related CMS subtype (congenital myasthenic syndrome 18, CMS18; MIM #616330) is a rare disorder characterized by muscle fatigability, delayed psychomotor development, and ataxia. Herein, we performed rapid whole-genome sequencing (rWGS) on a critically ill newborn leading to the discovery of an unreported pathogenic de novo SNAP25 c.529C>T; p.Gln177Ter variant. In this report, we present a novel case of CMS18 with complex neonatal consequence. This discovery offers unique insight into the extent of phenotypic severity in CMS18, expands the reported SNAP25 variant phenotype, and paves a foundation for personalized management for CMS18.
Subjects
Myasthenic Syndromes, Congenital, Humans, Chromosome Mapping, Myasthenic Syndromes, Congenital/diagnosis, Myasthenic Syndromes, Congenital/genetics, Pedigree, Phenotype, Synaptosomal-Associated Protein 25/genetics, Whole Genome Sequencing
Abstract
BACKGROUND: Genetic disorders contribute to significant morbidity and mortality in critically ill newborns. Despite advances in genome sequencing technologies, a majority of neonatal cases remain unsolved. Complex structural variants (SVs) often elude conventional genome sequencing variant calling pipelines and will explain a portion of these unsolved cases. METHODS: As part of the Utah NeoSeq project, we used a research-based, rapid whole-genome sequencing (WGS) protocol to investigate the genomic etiology for a newborn with a left-sided congenital diaphragmatic hernia (CDH) and cardiac malformations, whose mother also had a history of CDH and atrial septal defect. RESULTS: Using both a novel alignment-free variant caller and traditional alignment-based variant callers, we identified a maternally inherited complex SV on chromosome 8, consisting of an inversion flanked by deletions. This complex inversion, further confirmed using orthogonal molecular techniques, disrupts the ZFPM2 gene, which is associated with both CDH and various congenital heart defects. CONCLUSIONS: Our results demonstrate that complex structural events, which often are unidentifiable or not reported by clinically validated testing procedures, can be discovered and accurately characterized with conventional, short-read sequencing and underscore the utility of WGS as a first-line diagnostic tool.
Subjects
Hernias, Diaphragmatic, Congenital, DNA-Binding Proteins/genetics, Genomics, Hernias, Diaphragmatic, Congenital/genetics, Humans, Infant, Newborn, Transcription Factors/genetics, Whole Genome Sequencing/methods
Abstract
BACKGROUND: Structural variation contributes to the rich genetic and phenotypic diversity of the modern domestic dog, Canis lupus familiaris, although compared to other organisms, catalogs of canine copy number variants (CNVs) are poorly defined. To this end, we developed a customized high-density tiling array across the canine genome and used it to discover CNVs in nine genetically diverse dogs and a gray wolf. RESULTS: In total, we identified 403 CNVs that overlap 401 genes, which are enriched for defense/immunity, oxidoreductase, protease, receptor, signaling molecule and transporter genes. Furthermore, we performed detailed comparisons between CNVs located within versus outside of segmental duplications (SDs) and find that CNVs in SDs are enriched for gene content and complexity. Finally, we compiled all known dog CNV regions and genotyped them with a custom aCGH chip in 61 dogs from 12 diverse breeds. These data allowed us to perform the first population genetics analysis of canine structural variation and identify CNVs that potentially contribute to breed specific traits. CONCLUSIONS: Our comprehensive analysis of canine CNVs will be an important resource in genetically dissecting canine phenotypic and behavioral variation.
Subjects
Chromosome Mapping/methods, DNA Copy Number Variations/genetics, Dogs/genetics, Genomics/methods, Animals, Comparative Genomic Hybridization, Dogs/classification, Genotype, Oligonucleotide Array Sequence Analysis, Phenotype, Segmental Duplications, Genomic/genetics, Species Specificity
Abstract
We describe a system that permits the automated analysis of reporter gene expression in Caenorhabditis elegans with cellular resolution continuously during embryogenesis. We demonstrate its utility by defining the expression patterns of reporters for several embryonically expressed transcription factors. The invariant cell lineage permits the automated alignment of multiple expression profiles, allowing direct comparison of the expression of different genes' reporters. We also used this system to monitor perturbations to normal development involving changes both in cell-division timing and in cell fate. Systematic application of this system could reveal the gene activity of each cell throughout development.
Subjects
Caenorhabditis elegans Proteins/analysis, Caenorhabditis elegans Proteins/genetics, Caenorhabditis elegans/embryology, Caenorhabditis elegans/genetics, Gene Expression Profiling/methods, Gene Expression Regulation, Developmental/genetics, Animals, Automation, Caenorhabditis elegans/cytology, Cell Lineage, Genes, Reporter/genetics, Organ Specificity, Reproducibility of Results, Sensitivity and Specificity
Abstract
BACKGROUND: DNA sequencing has unveiled extensive tumor heterogeneity in several different cancer types, with many exhibiting diverse subclonal populations. Identifying and tracing mutations throughout the expansion and progression of a tumor represents a significant challenge. Furthermore, prioritizing the subset of such mutations most likely to contribute to tumor evolution or that could serve as potential therapeutic targets represents an ongoing problem. RESULTS: Here, we describe OncoGEMINI, a new tool designed for exploring the complex patterns and trajectory of somatic and inherited variation observed in heterogeneous tumors biopsied over the course of treatment. This is accomplished by creating a searchable database of variants that includes tumor sampling time points and allows for filtering methods that reflect specific changes in variant allele frequencies over time. Additionally, by incorporating existing annotations and resources that facilitate the interpretation of cancer mutations (e.g., CIViC, DGIdb), OncoGEMINI enables rapid searches for, and potential identification of, mutations that may be driving subclonal evolution. CONCLUSIONS: By combining relevant genomic annotations alongside specific filtering tools, OncoGEMINI provides powerful and customizable approaches that enable the quick identification of individual tumor variants that meet specified criteria. It can be applied to a wide range of tumor-derived sequence data, but is especially designed for studies with multiple samples, including longitudinal datasets. It is available under an MIT license at github.com/fakedrtom/oncogemini .
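Filtering on changes in variant allele frequency (VAF) over time, as described above, can be sketched as follows. This is not OncoGEMINI's code or query syntax; it is a minimal illustration that assumes each variant carries an ordered list of VAFs, one per tumor sampling time point. The function name, thresholds, and variant labels are hypothetical.

```python
from typing import Dict, List


def emerging_variants(
    vafs: Dict[str, List[float]],
    max_initial: float = 0.0,
    min_final: float = 0.1,
    require_monotonic: bool = False,
) -> List[str]:
    """Return variants that are absent (or nearly so) at the first time point
    and present at the last, optionally requiring a non-decreasing VAF."""
    selected = []
    for variant, series in vafs.items():
        if len(series) < 2:
            continue  # a trend needs at least two time points
        first, last = series[0], series[-1]
        rising = all(later >= earlier for earlier, later in zip(series, series[1:]))
        if first <= max_initial and last >= min_final and (rising or not require_monotonic):
            selected.append(variant)
    return selected


if __name__ == "__main__":
    # Hypothetical VAFs at three biopsies (e.g. diagnosis, on treatment, relapse).
    example = {
        "var_A": [0.00, 0.05, 0.32],   # emerges and expands
        "var_B": [0.45, 0.44, 0.47],   # stable, likely clonal or germline
        "var_C": [0.00, 0.20, 0.12],   # emerges, then recedes
    }
    print(emerging_variants(example))
    print(emerging_variants(example, require_monotonic=True))
```

Variants selected this way are only candidates for subclonal expansion; in practice the thresholds would need to account for sequencing depth and tumor purity, which this sketch ignores.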
Subjects
Breast Neoplasms/genetics, Breast Neoplasms/pathology, Genetic Variation, Software, Biopsy, Databases, Genetic, Female, Humans, Longitudinal Studies, Neoplasm Metastasis
Abstract
BACKGROUND: Why do males and females behave differently? Sexually dimorphic behaviors could arise from sex-specific neurons or by the modification of circuits present in both sexes. C. elegans males exhibit different behaviors than hermaphrodites. Although there is a single class of sex-specific sensory neurons in the head of males, most of their neurons are part of a core nervous system also present in hermaphrodites. Are the behavioral differences due to sex-specific or core neurons? RESULTS: We demonstrate that C. elegans males chemotax to a source of hermaphrodite pheromones. This sexual-attraction behavior depends on a TRPV (transient receptor potential vanilloid) channel encoded by the osm-9, ocr-1, and ocr-2 genes. OSM-9 is required in three classes of sensory neurons: the AWA and AWC olfactory neurons and the male-specific CEM neurons. The absence of OSM-9 from any of these neurons impairs attraction, suggesting that their ensemble output elicits sexual attraction. Likewise, the ablation of any of these classes after sexual maturation impairs attraction behavior. If ablations are performed before sexual maturation, attraction is unimpaired, demonstrating that these neurons compensate for one another. Thus, males lacking sex-specific neurons are still attracted to pheromones, suggesting that core neurons are sexualized. Similarly, transgender nematodes (animals that appear morphologically to be hermaphrodites but have a masculinized core nervous system) are attracted to hermaphrodite pheromones. CONCLUSIONS: Both sexually dimorphic and core sensory neurons are normally required in the adult for sexual attraction, but they can replace each other during sexual maturation if necessary to generate robust male-specific sexual attraction behavior.
Subjects
Caenorhabditis elegans/physiology, Neurons, Afferent/physiology, Animals, Caenorhabditis elegans Proteins/physiology, Disorders of Sex Development, Female, Male, Nerve Tissue Proteins/physiology, Sex Attractants/metabolism, TRPV Cation Channels, Transient Receptor Potential Channels/physiology
Abstract
SV-plaudit is a framework for rapidly curating structural variant (SV) predictions. For each SV, we generate an image that visualizes the coverage and alignment signals from a set of samples. Images are uploaded to our cloud framework where users assess the quality of each image using a client-side web application. Reports can then be generated as a tab-delimited file or annotated Variant Call Format (VCF) file. As a proof of principle, nine researchers collaborated for 1 hour to evaluate 1,350 SVs each. We anticipate that SV-plaudit will become a standard step in variant calling pipelines and the crowd-sourced curation of other biological results. Code available at https://github.com/jbelyeu/SV-plaudit. Demonstration video available at https://www.youtube.com/watch?v=ono8kHMKxDs.
Subjects
Genomics/methods, High-Throughput Nucleotide Sequencing, Medical Informatics/methods, Sequence Alignment, Sequence Analysis, DNA, False Positive Reactions, Genetic Variation, Genome, Human, Humans, Internet, Software
Abstract
The contribution of genetic and environmental factors to the pathogenesis of sporadic amyotrophic lateral sclerosis (ALS) remains unclear. To investigate the genetic component of the disease, we performed whole genome sequencing on ALS discordant monozygotic twins. Illumina whole genome sequencing on white blood cell DNA of five ALS-discordant monozygotic twin pairs (10 samples in total) yielded ~30x coverage per individual. All single nucleotide variants, indels, and structural variants (copy number variants, inversions and translocations) were called and evaluated for functional consequence, evolutionary conservation, population frequency and overlap with known ALS associated variants and genes. Results showed that no validated discordant coding or regulatory single nucleotide variants or indels were found, and nor were any genome-wide discordant structural variants detected. Concordant variants of particular interest were: 1) two rare, highly-conserved heterozygous non-synonymous variants in SYT9 and EWSR1, genes previously associated with ALS (out of 2044 rare heterozygous variants detected); 2) three rare homozygous missense variants; and 3) three novel copy number deletions that overlapped genes. In conclusion, no convincing coding or regulatory nucleotide or genome-wide structural differences were found between ALS discordant monozygotic twins. The results suggest that more work is needed to elucidate possible environmental, epigenetic, oligogenic and somatic genetic factors that could underlie susceptibility to sporadic ALS.
Subjects
Amyotrophic Lateral Sclerosis/genetics, DNA Copy Number Variations/genetics, Twins, Monozygotic/genetics, Aged, Calmodulin-Binding Proteins/genetics, Female, Gene Frequency, Genetic Association Studies, Genetic Predisposition to Disease, Genome, Heterozygote, Humans, Male, Middle Aged, RNA-Binding Protein EWS, RNA-Binding Proteins/genetics, Synaptotagmins/genetics
Abstract
Structural variation is an important and abundant source of genetic and phenotypic variation. Here we describe the first systematic and genome-wide analysis of segmental duplications and associated copy number variants (CNVs) in the modern domesticated dog, Canis familiaris, which exhibits considerable morphological, physiological, and behavioral variation. Through computational analyses of the publicly available canine reference sequence, we estimate that segmental duplications comprise approximately 4.21% of the canine genome. Segmental duplications overlap 841 genes and are significantly enriched for specific biological functions such as immunity and defense and KRAB box transcription factors. We designed high-density tiling arrays spanning all predicted segmental duplications and performed aCGH in a panel of 17 breeds and a gray wolf. In total, we identified 3583 CNVs, approximately 68% of which were found in two or more samples that map to 678 unique regions. CNVs span 429 genes that are involved in a wide variety of biological processes such as olfaction, immunity, and gene regulation. Our results provide insight into mechanisms of canine genome evolution and generate a valuable resource for future evolutionary and phenotypic studies.