ABSTRACT
The 1000 Genomes Project (1kGP) is the largest fully open resource of whole-genome sequencing (WGS) data consented for public distribution without access or use restrictions. The final, phase 3 release of the 1kGP included 2,504 unrelated samples from 26 populations and was based primarily on low-coverage WGS. Here, we present a high-coverage 3,202-sample WGS 1kGP resource, which now includes 602 complete trios, sequenced to a depth of 30X on the Illumina platform. We performed single-nucleotide variant (SNV) and short insertion and deletion (INDEL) discovery and generated a comprehensive set of structural variants (SVs) by integrating multiple analytic methods through a machine learning model. We show gains in sensitivity and precision of variant calls compared to phase 3, especially among rare SNVs as well as INDELs and SVs spanning the frequency spectrum. We also generated an improved reference imputation panel, making the variants discovered here accessible for association studies.
Subject(s)
Genome, Human , Whole Genome Sequencing , Female , High-Throughput Nucleotide Sequencing/methods , Humans , INDEL Mutation , Male , Polymorphism, Single Nucleotide
ABSTRACT
Endometriosis is a common chronic inflammatory condition causing pelvic pain and infertility in women, with limited treatment options and 50% heritability. We leveraged genetic analyses in two species with spontaneous endometriosis, humans and the rhesus macaque, to uncover treatment targets. We sequenced DNA from 32 human families contributing to a genetic linkage signal on chromosome 7p13-15 and observed significant overrepresentation of predicted deleterious low-frequency coding variants in NPSR1, the gene encoding neuropeptide S receptor 1, in cases (predominantly stage III/IV) versus controls (P = 7.8 × 10^-4). Significant linkage to the region orthologous to human 7p13-15 was replicated in a pedigree of 849 rhesus macaques (P = 0.0095). Targeted association analyses in 3194 surgically confirmed, unrelated cases and 7060 controls revealed that a common insertion/deletion variant, rs142885915, was significantly associated with stage III/IV endometriosis (P = 5.2 × 10^-5; odds ratio, 1.23; 95% CI, 1.09 to 1.39). Immunohistochemistry, qRT-PCR, and flow cytometry experiments demonstrated that NPSR1 was expressed in glandular epithelium from eutopic and ectopic endometrium, and on monocytes in peritoneal fluid. The NPSR1 inhibitor SHA 68R blocked NPSR1-mediated signaling, proinflammatory TNF-α release, and monocyte chemotaxis in vitro (P < 0.01), and led to a significant reduction of inflammatory cell infiltrate and abdominal pain (P < 0.05) in a mouse model of peritoneal inflammation as well as in a mouse model of endometriosis. We conclude that the NPSR1/NPS system is a genetically validated, nonhormonal target for the treatment of endometriosis with likely increased relevance to stage III/IV disease.
Subject(s)
Endometriosis , Receptors, G-Protein-Coupled/genetics , Animals , Endometriosis/drug therapy , Endometriosis/genetics , Endometrium , Female , Humans , Macaca mulatta , Mice , Tumor Necrosis Factor-alpha
ABSTRACT
Neutrophils are short-lived blood cells that play a critical role in host defense against infections. To better understand neutrophil functions and their regulation, we provide a complete epigenetic overview, assessing important functional features of their differentiation stages from bone marrow-resident progenitors to mature circulating cells. Integration of chromatin modifications, methylation, and transcriptome dynamics reveals enforced regulation of differentiation for cellular functions such as the release of proteases, respiratory burst, cell cycle regulation, and apoptosis. We observe early establishment of cytotoxic capability, whereas the signaling components that activate these antimicrobial mechanisms are transcribed at later stages, outside the bone marrow, thus preventing toxic effects in the bone marrow niche. Altogether, these data reveal how the developmental dynamics of the chromatin landscape orchestrate the daily production of the large number of neutrophils required for innate host defense, and they provide a comprehensive overview of differentiating human neutrophils.
Subject(s)
Bone Marrow Cells/cytology , Bone Marrow Cells/metabolism , Neutrophils/cytology , Neutrophils/metabolism , Cell Differentiation/genetics , Cell Differentiation/physiology , Chromatin/genetics , Chromatin/metabolism , Gene Expression Regulation/genetics , Gene Expression Regulation/physiology , Humans
ABSTRACT
Chronic lymphocytic leukemia (CLL) is a frequent hematological neoplasm whose underlying epigenetic alterations are only partially understood. Here, we analyze the reference epigenome of seven primary CLLs and the regulatory chromatin landscape of 107 primary cases in the context of normal B cell differentiation. We find that the CLL chromatin landscape is largely influenced by distinct dynamics during normal B cell maturation. Beyond this, we define extensive catalogues of regulatory elements reprogrammed de novo in CLL as a whole and in its major clinico-biological subtypes, classified by IGHV somatic hypermutation levels. We also find that IGHV-unmutated CLLs harbor more active and open chromatin than IGHV-mutated cases. Furthermore, we show that de novo active regions in CLL are enriched for binding sites of the NFAT, FOX and TCF/LEF transcription factor families. Although most genetic alterations are not associated with consistent epigenetic profiles, CLLs with MYD88 mutations and trisomy 12 show distinct chromatin configurations. In addition, we observe that non-coding mutations in IGHV-mutated CLLs are enriched in H3K27ac-associated regulatory elements outside accessible chromatin. Overall, this study provides an integrative portrait of the CLL epigenome, identifies extensive networks of altered regulatory elements and sheds light on the relationship between the genetic and epigenetic architecture of the disease.
Subject(s)
Chromatin/metabolism , Epigenomics , Leukemia, Lymphocytic, Chronic, B-Cell/genetics , B-Lymphocytes/metabolism , Base Sequence , Cohort Studies , Humans
ABSTRACT
The International Genome Sample Resource (IGSR; http://www.internationalgenome.org) expands the resources from the 1000 Genomes Project in both data type and population diversity. IGSR represents the largest open collection of human variation data and provides easy access to these resources. IGSR was established in 2015 to maintain and extend the 1000 Genomes Project data, which has been widely used as a reference set of human variation and by researchers developing analysis methods. IGSR has mapped all of the 1000 Genomes sequence to the newest human reference (GRCh38) and will release updated variant calls to ensure maximal usefulness of the existing data. IGSR is collecting new structural variation data on the 1000 Genomes samples from long-read sequencing and other technologies, and will collect relevant functional data into a single comprehensive resource. IGSR is also extending coverage with new populations sequenced by collaborating groups. Here, we present the new data and analysis that IGSR has made available. We have also introduced a new data portal that increases the discoverability of our data (previously browsable only through our FTP site) by focusing on particular samples, populations or data sets of interest.
Subject(s)
Computational Biology/methods , Databases, Genetic , Genetic Variation , Genome , Genomics/methods , Web Browser
ABSTRACT
BACKGROUND: Genomic studies of endangered species provide insights into their evolution and demographic history, reveal patterns of genomic erosion that might limit their viability, and offer tools for their effective conservation. The Iberian lynx (Lynx pardinus) is the most endangered felid and a unique example of a species on the brink of extinction. RESULTS: We generate the first annotated draft of the Iberian lynx genome and carry out genome-based analyses of lynx demography, evolution, and population genetics. We identify a series of severe population bottlenecks in the history of the Iberian lynx that predate its known demographic decline during the 20th century and have greatly impacted its genome evolution. We observe drastically reduced rates of weak-to-strong substitutions associated with GC-biased gene conversion and increased rates of fixation of transposable elements. We also find multiple signatures of genetic erosion in the two remnant Iberian lynx populations, including a high frequency of potentially deleterious variants and substitutions, as well as the lowest genome-wide genetic diversity reported so far in any species. CONCLUSIONS: The genomic features observed in the Iberian lynx genome may hamper short- and long-term viability through reduced fitness and adaptive potential. The knowledge and resources developed in this study will boost research on felid evolution and conservation genomics and will benefit the ongoing conservation and management of this emblematic species.
Subject(s)
Genetics, Population , Genome , Lynx/genetics , Animals , Endangered Species , Genetic Variation , High-Throughput Nucleotide Sequencing , Molecular Sequence Annotation , Sequence Analysis, DNA
ABSTRACT
DNA methylation and the localization and post-translational modification of nucleosomes are interdependent factors that contribute to the generation of distinct phenotypes from genetically identical cells. With 112 whole-genome bisulfite sequencing datasets from the BLUEPRINT Epigenome Project, we analyzed the global development of DNA methylation patterns during lineage commitment and maturation of a range of immune system effector cells and the cancers that arise from them. We show clear trends in methylation patterns that are distinct in the innate and adaptive arms of the human immune system, both globally and in relation to consistently positioned nucleosomes. Most notable are a progressive loss of methylation in developing lymphocytes and the consistent occurrence of non-CG methylation in specific cell types. Cancer samples from the two lineages are further polarized, suggesting the involvement of distinct lineage-specific epigenetic mechanisms. We anticipate broad utility for this resource as a basis for further comparative epigenetic analyses.
Subject(s)
Adaptive Immunity/genetics , DNA Methylation/genetics , Immunity, Innate/genetics , B-Lymphocytes/metabolism , Base Sequence , Binding Sites , CCCTC-Binding Factor , Dinucleoside Phosphates/genetics , Exons/genetics , Humans , Lymphocytes/metabolism , Myeloid Cells/metabolism , Nucleosomes
ABSTRACT
The incidence of type 1 diabetes (T1D) has substantially increased over the past decade, suggesting a role for non-genetic factors such as epigenetic mechanisms in disease development. Here we present an epigenome-wide association study across 406,365 CpGs in 52 monozygotic twin pairs discordant for T1D in three immune effector cell types. We observe a substantial enrichment of differentially variable CpG positions (DVPs) in T1D twins when compared with their healthy co-twins and when compared with healthy, unrelated individuals. These T1D-associated DVPs are found to be temporally stable and enriched at gene regulatory elements. Integration with cell type-specific gene regulatory circuits highlights pathways involved in immune cell metabolism and the cell cycle, including mTOR signalling. Evidence from cord blood of newborns who progress to overt T1D suggests that the DVPs likely emerge after birth. Our findings, based on 772 methylomes, implicate epigenetic changes that could contribute to disease pathogenesis in T1D.
Subject(s)
DNA Methylation/genetics , Diabetes Mellitus, Type 1/genetics , Diabetes Mellitus, Type 1/immunology , CpG Islands/genetics , Fetal Blood/metabolism , Humans , Molecular Sequence Annotation , Time Factors , Twins, Monozygotic/genetics
ABSTRACT
Characterizing the multifaceted contribution of genetic and epigenetic factors to disease phenotypes is a major challenge in human genetics and medicine. We carried out high-resolution genetic, epigenetic, and transcriptomic profiling in three major human immune cell types (CD14+ monocytes, CD16+ neutrophils, and naive CD4+ T cells) from up to 197 individuals. We quantitatively assess the relative contribution of cis-genetic and epigenetic factors to transcription and evaluate their impact as potential sources of confounding in epigenome-wide association studies. Further, we characterize highly coordinated genetic effects on gene expression, methylation, and histone variation through quantitative trait locus (QTL) mapping and allele-specific (AS) analyses. Finally, we demonstrate colocalization of molecular trait QTLs at 345 unique immune disease loci. This expansive, high-resolution atlas of multi-omics changes yields insights into both cell-type-specific and more generalizable correlations between diverse genomic inputs, and defines molecular events that may underpin complex disease risk.
Subject(s)
Epigenomics , Immune System Diseases/genetics , Monocytes/metabolism , Neutrophils/metabolism , T-Lymphocytes/metabolism , Transcription, Genetic , Adult , Aged , Alternative Splicing , Female , Genetic Predisposition to Disease , Hematopoietic Stem Cells/metabolism , Histone Code , Humans , Male , Middle Aged , Quantitative Trait Loci , Young Adult
ABSTRACT
BACKGROUND: Legumes are the third largest family of angiosperms and the second most important crop class. Legume genomes have been shaped by extensive large-scale gene duplications, including an approximately 58 million year old whole genome duplication shared by most crop legumes. RESULTS: We report the genome and the transcription atlas of coding and non-coding genes of a Mesoamerican genotype of common bean (Phaseolus vulgaris L., BAT93). Using a comprehensive phylogenomics analysis, we assessed the past and recent evolution of common bean and traced the diversification of gene expression patterns following duplication. We find that successive rounds of gene duplication in legumes have shaped tissue and developmental expression, leading to increased levels of specialization in larger gene families. We also find that many long non-coding RNAs are preferentially expressed in germ-line-related tissues (pods and seeds), suggesting that they play a significant role in fruit development. Our results also suggest that most bean-specific gene family expansions, including resistance gene clusters, predate the split of the Mesoamerican and Andean gene pools. CONCLUSIONS: The genome and transcriptome data generated here for a Mesoamerican genotype represent a counterpart to the genomic resources already available for the Andean gene pool. Altogether, this information will allow genetic dissection of the traits involved in the domestication and adaptation of this important crop and their subsequent use in breeding strategies.
Subject(s)
Genome, Plant , Microsatellite Repeats/genetics , Phaseolus/genetics , Transcriptome/genetics , DNA, Plant/genetics , Gene Duplication , Gene Expression Profiling , Genotype , Humans , Phylogeny , Seeds/genetics , Sequence Analysis, DNA
ABSTRACT
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrative resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including reference sequence, gene models, transcriptional data, genetic variation and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments. These include the development of new analyses and views to represent polyploid genomes (of which bread wheat is the primary exemplar), and the continued up-scaling of the resource, which now includes over 23 000 bacterial genomes, 400 fungal genomes and 100 protist genomes, in addition to 55 genomes from invertebrate metazoa and 39 genomes from plants. This dramatic increase in the number of included genomes is one part of a broader effort to automate the integration of archival data (genome sequence, but also associated RNA sequence data and variant calls) within the context of reference genomes and make it available through the Ensembl user interfaces.
Subject(s)
Databases, Genetic , Genome, Bacterial , Genome, Fungal , Genome, Plant , Invertebrates/genetics , Animals , Diploidy , Eukaryota/genetics , Genetic Variation , Genome , Polyploidy , Sequence Alignment
ABSTRACT
Phenotypic plasticity is important in adaptation and shapes the evolution of organisms. However, we understand little about which aspects of the genome are important in facilitating plasticity. Eusocial insect societies produce plastic phenotypes from the same genome, as reproductives (queens) and nonreproductives (workers). The greatest plasticity is found in simple eusocial insect societies, in which individuals retain the ability to switch between reproductive and nonreproductive phenotypes as adults. We lack comprehensive data on the molecular basis of plastic phenotypes. Here, we sequenced genomes, microRNAs (miRNAs), and multiple transcriptomes and methylomes from individual brains in a wasp (Polistes canadensis) and an ant (Dinoponera quadriceps) that live in simple eusocial societies. In both species, we found few differences between phenotypes at the transcriptional level, with little functional specialization, and no evidence that phenotype-specific gene expression is driven by DNA methylation or miRNAs. Instead, phenotypic differentiation was defined more subtly by nonrandom transcriptional network organization, with roles in these networks for both conserved and taxon-restricted genes. The general lack of highly methylated regions or methylome patterning in both species may be an important mechanism for achieving plasticity among phenotypes during adulthood. These findings generate new hypotheses about the genomic processes that facilitate plasticity and suggest that the molecular hallmarks of social behavior are likely to differ with the level of social complexity.
Subject(s)
Ants/genetics , Gene Expression Regulation/genetics , Hierarchy, Social , Models, Genetic , Phenotype , Social Behavior , Wasps/genetics , Animals , Ants/physiology , Base Sequence , Brain/metabolism , DNA Methylation/genetics , Genome, Insect/genetics , High-Throughput Nucleotide Sequencing , MicroRNAs/genetics , Molecular Sequence Data , Transcriptome/genetics , Wasps/physiology
ABSTRACT
Variation in vectorial capacity for human malaria among Anopheles mosquito species is determined by many factors, including behavior, immunity, and life history. To investigate the genomic basis of vectorial capacity and explore new avenues for vector control, we sequenced the genomes of 16 anopheline mosquito species from diverse locations spanning ~100 million years of evolution. Comparative analyses show faster rates of gene gain and loss, elevated gene shuffling on the X chromosome, and more intron losses, relative to Drosophila. Some determinants of vectorial capacity, such as chemosensory genes, do not show elevated turnover but instead diversify through protein-sequence changes. This dynamism of anopheline genes and genomes may contribute to their flexible capacity to take advantage of new ecological niches, including adapting to humans as primary hosts.
Subject(s)
Anopheles/genetics , Evolution, Molecular , Genome, Insect , Insect Vectors/genetics , Malaria/transmission , Animals , Anopheles/classification , Base Sequence , Chromosomes, Insect/genetics , Drosophila/genetics , Humans , Insect Vectors/classification , Molecular Sequence Data , Phylogeny , Sequence Alignment
ABSTRACT
Schizosaccharomyces pombe displays a large transcriptional response common to several stress conditions, regulated primarily by the transcription factor Atf1. Atf1-dependent promoters contain especially broad nucleosome-depleted regions (NDRs) prior to stress imposition. We show here that basal binding of Atf1 to these promoters competes with histones to create wider NDRs at stress genes. Moreover, deletion of atf1 results in nucleosome disorganization specifically at the coding regions of stress genes and derepresses antisense transcription. Our data indicate that transcription factor binding to promoters acts as an effective barrier that fixes the +1 nucleosome and phases downstream nucleosome arrays, thereby preventing cryptic transcription.
Subject(s)
Activating Transcription Factor 1/metabolism , Nucleosomes/metabolism , Phosphoproteins/metabolism , Promoter Regions, Genetic , Schizosaccharomyces pombe Proteins/metabolism , Transcription, Genetic , Activating Transcription Factor 1/chemistry , Binding Sites , Genes, Fungal , Phosphoproteins/chemistry , Protein Structure, Tertiary , Schizosaccharomyces/genetics , Schizosaccharomyces pombe Proteins/chemistry
ABSTRACT
The Plant Resistance Genes database (PRGdb; http://prgdb.org) is a comprehensive resource on resistance genes (R-genes), a major class of genes in plant genomes that confer disease resistance against pathogens. Initiated in 2009, the database has grown more than 6-fold and now includes annotations derived from recent plant genome sequencing projects. Release 2.0 currently hosts biological information on a set of 112 known and 104 310 putative R-genes present in 233 plant species and conferring resistance to 122 different pathogens. Moreover, the website has been completely redesigned using Semantic MediaWiki technologies, which make our repository freely accessible and easily editable by any scientist. To this end, we encourage expert plant biologists to join our annotation effort and share their knowledge of resistance-gene biology with the rest of the scientific community.
Subject(s)
Databases, Genetic , Disease Resistance/genetics , Genes, Plant , Genome, Plant , Internet , Models, Genetic
ABSTRACT
We report the genome sequence of melon, an important horticultural crop worldwide. We assembled 375 Mb of the double-haploid line DHL92, representing 83.3% of the estimated melon genome. We predicted 27,427 protein-coding genes, which we analyzed by reconstructing 22,218 phylogenetic trees, allowing mapping of the orthology and paralogy relationships of sequenced plant genomes. We observed the absence of recent whole-genome duplications in the melon lineage since the ancient eudicot triplication, and our data suggest that transposon amplification may in part explain the increased size of the melon genome compared with the close relative cucumber. A low number of nucleotide-binding site-leucine-rich repeat disease resistance genes were annotated, suggesting the existence of specific defense mechanisms in this species. The DHL92 genome was compared with those of its parental lines, allowing quantification of sequence variability in the species. Use of the genome sequence in future investigations will facilitate understanding of cucurbit evolution and the improvement of breeding strategies.
Subject(s)
Biological Evolution , Cucumis melo/genetics , Genome, Plant/genetics , Phylogeny , Base Sequence , Chromosome Mapping , Chromosomes, Artificial, Bacterial/genetics , DNA Transposable Elements/genetics , Disease Resistance/genetics , Genes, Duplicate/genetics , Genes, Plant/genetics , Genomics/methods , Likelihood Functions , Models, Genetic , Molecular Sequence Annotation , Molecular Sequence Data , Sequence Alignment , Sequence Analysis, DNA
ABSTRACT
Forkhead-box protein P2 is a transcription factor that has been associated with intriguing aspects of cognitive function in humans, non-human mammals, and song-learning birds. Heterozygous mutations of the human FOXP2 gene cause a monogenic speech and language disorder. Reduced functional dosage of the mouse version (Foxp2) causes deficient cortico-striatal synaptic plasticity and impairs motor-skill learning. Moreover, the songbird orthologue appears critically important for vocal learning. Across diverse vertebrate species, this well-conserved transcription factor is highly expressed in the developing and adult central nervous system. Very little is known about the mechanisms regulated by Foxp2 during brain development. We used an integrated functional genomics strategy to robustly define Foxp2-dependent pathways, both direct and indirect targets, in the embryonic brain. Specifically, we performed genome-wide in vivo ChIP-chip screens for Foxp2 binding and thereby identified a set of 264 high-confidence neural targets under strict, empirically derived significance thresholds. The findings, coupled to expression profiling and in situ hybridization of brain tissue from wild-type and mutant mouse embryos, strongly highlighted gene networks linked to neurite development. We followed up our genomics data with functional experiments, showing that Foxp2 affects neurite outgrowth in primary neurons and in neuronal cell models. Our data indicate that Foxp2 modulates neuronal network formation by directly and indirectly regulating mRNAs involved in the development and plasticity of neuronal connections.
Subject(s)
Brain/embryology , Forkhead Transcription Factors/genetics , Gene Regulatory Networks , Neurites/metabolism , Repressor Proteins/genetics , Animals , Cell Line, Tumor , Chromatin Immunoprecipitation , Corpus Striatum/growth & development , Gene Expression Profiling , Gene Expression Regulation, Developmental , Mice , Mice, Inbred C57BL , Models, Biological , Mutation , Oligonucleotide Array Sequence Analysis/methods , Primary Cell Culture , RNA, Messenger/genetics , RNA, Messenger/metabolism
ABSTRACT
The integrated analysis of genotypic and expression data for association with complex traits could identify novel genetic pathways involved in those traits. We profiled 19,573 expression probes in Epstein-Barr virus-transformed lymphoblastoid cell lines (LCLs) from 299 twins and correlated these with 44 quantitative traits (QTs). For 939 expressed probes correlating with more than one QT, we investigated the presence of eQTL associations in three datasets of 57 CEU HapMap founders and 86 unrelated twins. Genome-wide association analysis of these probes with 2.2 million SNPs revealed 131 potential eQTLs (1,989 eQTL SNPs) overlapping between the HapMap datasets, five of which were in cis (58 eQTL SNPs). We then tested 535 SNPs tagging the eQTL SNPs for association with the relevant QT in 2,905 twins. We identified nine potential SNP-QT associations (P<0.01), but none replicated significantly in five large consortia of 1,097-16,129 subjects. We also failed to replicate previously reported eQTL associations with body mass index, plasma low-density lipoprotein cholesterol, high-density lipoprotein cholesterol and triglyceride levels derived from lymphocytes, adipose and liver tissue. Our results and additional power calculations suggest that proponents may have been overoptimistic about the power of LCLs in eQTL approaches to elucidate regulatory genetic effects on complex traits using the small datasets generated to date. Nevertheless, larger tissue-specific expression data sets relevant to specific traits are becoming available and should enable the adoption of similar integrated analyses in the near future.
Subject(s)
Gene Regulatory Networks/genetics , Genome, Human/genetics , Genome-Wide Association Study , Lymphocytes/metabolism , Quantitative Trait Loci/genetics , Quantitative Trait, Heritable , Adult , Aged , Aged, 80 and over , Cell Line , Cohort Studies , Databases, Genetic , Female , Gene Expression Regulation , Haplotypes/genetics , Humans , Inheritance Patterns/genetics , Middle Aged , Polymorphism, Single Nucleotide/genetics , Principal Component Analysis , Reproducibility of Results , Sample Size , Young Adult
ABSTRACT
UNLABELLED: TEQC is an R/Bioconductor package for quality assessment of target enrichment experiments. Quality measures comprise specificity and sensitivity of the capture, enrichment, per-target read coverage and its relation to hybridization probe characteristics, coverage uniformity and reproducibility, and read duplicate analysis. Several diagnostic plots allow visual inspection of the data quality. AVAILABILITY AND IMPLEMENTATION: TEQC is implemented in the R language (version >2.12.0) and is available as a Bioconductor package for Linux, Windows and MacOS from www.bioconductor.org.
Subject(s)
Computational Biology/methods , Sequence Analysis, DNA/methods , Software , DNA Probes , Nucleic Acid Hybridization , Polymerase Chain Reaction , Quality Control , Reproducibility of Results , Sensitivity and Specificity
ABSTRACT
BACKGROUND: Autism spectrum disorders (ASDs) are characterized by social, communication, and behavioral deficits and complex genetic etiology. A recent study of 517 ASD families implicated DOCK4 by single nucleotide polymorphism (SNP) association and a microdeletion in an affected sibling pair. METHODS: The DOCK4 microdeletion on 7q31.1 was further characterized in this family using QuantiSNP analysis of 1M SNP array data and reverse transcription polymerase chain reaction. Extended family members were tested by polymerase chain reaction amplification of junction fragments. DOCK4 dosage was measured in additional samples using SNP arrays. Since QuantiSNP analysis identified a novel CNTNAP5 microdeletion in the same affected sibling pair, this gene was sequenced in 143 additional ASD families. Further polymerase chain reaction-restriction fragment length polymorphism analysis included 380 ASD cases and suitable control subjects. RESULTS: The maternally inherited microdeletion encompassed chr7:110,663,978-111,257,682 and led to a DOCK4-IMMP2L fusion transcript. It was also detected in five extended family members with no ASD. However, six of nine individuals with this microdeletion had poor reading ability, which prompted us to screen 606 other dyslexia cases. This led to the identification of a second DOCK4 microdeletion co-segregating with dyslexia. Assessment of genomic background in the original ASD family detected a paternal 2q14.3 microdeletion disrupting CNTNAP5 that was also transmitted to both affected siblings. Analysis of other ASD cohorts revealed four additional rare missense changes in CNTNAP5. No exonic deletions of DOCK4 or CNTNAP5 were seen in 2091 control subjects. CONCLUSIONS: This study highlights two new risk factors for ASD and dyslexia and demonstrates the importance of performing a high-resolution assessment of genomic background, even after detection of a rare and likely damaging microdeletion using a targeted approach.