ABSTRACT
The 1000 Genomes Project (1kGP) is the largest fully open resource of whole-genome sequencing (WGS) data consented for public distribution without access or use restrictions. The final, phase 3 release of the 1kGP included 2,504 unrelated samples from 26 populations and was based primarily on low-coverage WGS. Here, we present a high-coverage 3,202-sample WGS 1kGP resource, which now includes 602 complete trios, sequenced to a depth of 30X using Illumina. We performed single-nucleotide variant (SNV) and short insertion and deletion (INDEL) discovery and generated a comprehensive set of structural variants (SVs) by integrating multiple analytic methods through a machine learning model. We show gains in sensitivity and precision of variant calls compared to phase 3, especially among rare SNVs as well as INDELs and SVs spanning the frequency spectrum. We also generated an improved reference imputation panel, making variants discovered here accessible for association studies.
Subjects
Genome, Human , Whole Genome Sequencing , Female , High-Throughput Nucleotide Sequencing/methods , Humans , INDEL Mutation , Male , Polymorphism, Single Nucleotide
ABSTRACT
Endometriosis is a common chronic inflammatory condition causing pelvic pain and infertility in women, with limited treatment options and 50% heritability. We leveraged genetic analyses in two species with spontaneous endometriosis, humans and the rhesus macaque, to uncover treatment targets. We sequenced DNA from 32 human families contributing to a genetic linkage signal on chromosome 7p13-15 and observed significant overrepresentation of predicted deleterious low-frequency coding variants in NPSR1, the gene encoding neuropeptide S receptor 1, in cases (predominantly stage III/IV) versus controls (P = 7.8 × 10⁻⁴). Significant linkage to the region orthologous to human 7p13-15 was replicated in a pedigree of 849 rhesus macaques (P = 0.0095). Targeted association analyses in 3194 surgically confirmed, unrelated cases and 7060 controls revealed that a common insertion/deletion variant, rs142885915, was significantly associated with stage III/IV endometriosis (P = 5.2 × 10⁻⁵; odds ratio, 1.23; 95% CI, 1.09 to 1.39). Immunohistochemistry, qRT-PCR, and flow cytometry experiments demonstrated that NPSR1 was expressed in glandular epithelium from eutopic and ectopic endometrium, and on monocytes in peritoneal fluid. The NPSR1 inhibitor SHA 68R blocked NPSR1-mediated signaling, proinflammatory TNF-α release, and monocyte chemotaxis in vitro (P < 0.01), and led to a significant reduction of inflammatory cell infiltrate and abdominal pain (P < 0.05) in a mouse model of peritoneal inflammation as well as in a mouse model of endometriosis. We conclude that the NPSR1/NPS system is a genetically validated, nonhormonal target for the treatment of endometriosis with likely increased relevance to stage III/IV disease.
Subjects
Endometriosis , Receptors, G-Protein-Coupled/genetics , Animals , Endometriosis/drug therapy , Endometriosis/genetics , Endometrium , Female , Humans , Macaca mulatta , Mice , Tumor Necrosis Factor-alpha
ABSTRACT
Neutrophils are short-lived blood cells that play a critical role in host defense against infections. To better comprehend neutrophil functions and their regulation, we provide a complete epigenetic overview, assessing important functional features of their differentiation stages from bone marrow-residing progenitors to mature circulating cells. Integration of chromatin modifications, methylation, and transcriptome dynamics reveals an enforced regulation of differentiation, for cellular functions such as release of proteases, respiratory burst, cell cycle regulation, and apoptosis. We observe an early establishment of the cytotoxic capability, while the signaling components that activate these antimicrobial mechanisms are transcribed at later stages, outside the bone marrow, thus preventing toxic effects in the bone marrow niche. Altogether, these data reveal how the developmental dynamics of the chromatin landscape orchestrate the daily production of a large number of neutrophils required for innate host defense and provide a comprehensive overview of differentiating human neutrophils.
Subjects
Bone Marrow Cells/cytology , Bone Marrow Cells/metabolism , Neutrophils/cytology , Neutrophils/metabolism , Cell Differentiation/genetics , Cell Differentiation/physiology , Chromatin/genetics , Chromatin/metabolism , Gene Expression Regulation/genetics , Gene Expression Regulation/physiology , Humans
ABSTRACT
Chronic lymphocytic leukemia (CLL) is a frequent hematological neoplasm in which underlying epigenetic alterations are only partially understood. Here, we analyze the reference epigenome of seven primary CLLs and the regulatory chromatin landscape of 107 primary cases in the context of normal B cell differentiation. We identify that the CLL chromatin landscape is largely influenced by distinct dynamics during normal B cell maturation. Beyond this, we define extensive catalogues of regulatory elements de novo reprogrammed in CLL as a whole and in its major clinico-biological subtypes classified by IGHV somatic hypermutation levels. We uncover that IGHV-unmutated CLLs harbor more active and open chromatin than IGHV-mutated cases. Furthermore, we show that de novo active regions in CLL are enriched for NFAT, FOX and TCF/LEF transcription factor family binding sites. Although most genetic alterations are not associated with consistent epigenetic profiles, CLLs with MYD88 mutations and trisomy 12 show distinct chromatin configurations. Furthermore, we observe that non-coding mutations in IGHV-mutated CLLs are enriched in H3K27ac-associated regulatory elements outside accessible chromatin. Overall, this study provides an integrative portrait of the CLL epigenome, identifies extensive networks of altered regulatory elements and sheds light on the relationship between the genetic and epigenetic architecture of the disease.
Subjects
Chromatin/metabolism , Epigenomics , Leukemia, Lymphocytic, Chronic, B-Cell/genetics , B-Lymphocytes/metabolism , Base Sequence , Cohort Studies , Humans
ABSTRACT
The International Genome Sample Resource (IGSR; http://www.internationalgenome.org) expands in data type and population diversity the resources from the 1000 Genomes Project. IGSR represents the largest open collection of human variation data and provides easy access to these resources. IGSR was established in 2015 to maintain and extend the 1000 Genomes Project data, which has been widely used as a reference set of human variation and by researchers developing analysis methods. IGSR has mapped all of the 1000 Genomes sequence to the newest human reference (GRCh38), and will release updated variant calls to ensure maximal usefulness of the existing data. IGSR is collecting new structural variation data on the 1000 Genomes samples from long read sequencing and other technologies, and will collect relevant functional data into a single comprehensive resource. IGSR is extending coverage with new populations sequenced by collaborating groups. Here, we present the new data and analysis that IGSR has made available. We have also introduced a new data portal that increases discoverability of our data (previously only browseable through our FTP site) by focusing on particular samples, populations or data sets of interest.
Subjects
Computational Biology/methods , Databases, Genetic , Genetic Variation , Genome , Genomics/methods , Web Browser
ABSTRACT
BACKGROUND: Genomic studies of endangered species provide insights into their evolution and demographic history, reveal patterns of genomic erosion that might limit their viability, and offer tools for their effective conservation. The Iberian lynx (Lynx pardinus) is the most endangered felid and a unique example of a species on the brink of extinction. RESULTS: We generate the first annotated draft of the Iberian lynx genome and carry out genome-based analyses of lynx demography, evolution, and population genetics. We identify a series of severe population bottlenecks in the history of the Iberian lynx that predate its known demographic decline during the 20th century and have greatly impacted its genome evolution. We observe drastically reduced rates of weak-to-strong substitutions associated with GC-biased gene conversion and increased rates of fixation of transposable elements. We also find multiple signatures of genetic erosion in the two remnant Iberian lynx populations, including a high frequency of potentially deleterious variants and substitutions, as well as the lowest genome-wide genetic diversity reported so far in any species. CONCLUSIONS: The genomic features observed in the Iberian lynx genome may hamper short- and long-term viability through reduced fitness and adaptive potential. The knowledge and resources developed in this study will boost the research on felid evolution and conservation genomics and will benefit the ongoing conservation and management of this emblematic species.
Subjects
Genetics, Population , Genome , Lynx/genetics , Animals , Endangered Species , Genetic Variation , High-Throughput Nucleotide Sequencing , Molecular Sequence Annotation , Sequence Analysis, DNA
ABSTRACT
DNA methylation and the localization and post-translational modification of nucleosomes are interdependent factors that contribute to the generation of distinct phenotypes from genetically identical cells. With 112 whole-genome bisulfite sequencing datasets from the BLUEPRINT Epigenome Project, we analyzed the global development of DNA methylation patterns during lineage commitment and maturation of a range of immune system effector cells and the cancers that arise from them. We show clear trends in methylation patterns that are distinct in the innate and adaptive arms of the human immune system, both globally and in relation to consistently positioned nucleosomes. Most notable are a progressive loss of methylation in developing lymphocytes and the consistent occurrence of non-CG methylation in specific cell types. Cancer samples from the two lineages are further polarized, suggesting the involvement of distinct lineage-specific epigenetic mechanisms. We anticipate broad utility for this resource as a basis for further comparative epigenetic analyses.
Subjects
Adaptive Immunity/genetics , DNA Methylation/genetics , Immunity, Innate/genetics , B-Lymphocytes/metabolism , Base Sequence , Binding Sites , CCCTC-Binding Factor , Dinucleoside Phosphates/genetics , Exons/genetics , Humans , Lymphocytes/metabolism , Myeloid Cells/metabolism , Nucleosomes
ABSTRACT
The incidence of type 1 diabetes (T1D) has substantially increased over the past decade, suggesting a role for non-genetic factors such as epigenetic mechanisms in disease development. Here we present an epigenome-wide association study across 406,365 CpGs in 52 monozygotic twin pairs discordant for T1D in three immune effector cell types. We observe a substantial enrichment of differentially variable CpG positions (DVPs) in T1D twins when compared with their healthy co-twins and when compared with healthy, unrelated individuals. These T1D-associated DVPs are found to be temporally stable and enriched at gene regulatory elements. Integration with cell type-specific gene regulatory circuits highlights pathways involved in immune cell metabolism and the cell cycle, including mTOR signalling. Evidence from cord blood of newborns who progress to overt T1D suggests that the DVPs likely emerge after birth. Our findings, based on 772 methylomes, implicate epigenetic changes that could contribute to disease pathogenesis in T1D.
Subjects
DNA Methylation/genetics , Diabetes Mellitus, Type 1/genetics , Diabetes Mellitus, Type 1/immunology , CpG Islands/genetics , Fetal Blood/metabolism , Humans , Molecular Sequence Annotation , Time Factors , Twins, Monozygotic/genetics
ABSTRACT
Characterizing the multifaceted contribution of genetic and epigenetic factors to disease phenotypes is a major challenge in human genetics and medicine. We carried out high-resolution genetic, epigenetic, and transcriptomic profiling in three major human immune cell types (CD14+ monocytes, CD16+ neutrophils, and naive CD4+ T cells) from up to 197 individuals. We assess, quantitatively, the relative contribution of cis-genetic and epigenetic factors to transcription and evaluate their impact as potential sources of confounding in epigenome-wide association studies. Further, we characterize highly coordinated genetic effects on gene expression, methylation, and histone variation through quantitative trait locus (QTL) mapping and allele-specific (AS) analyses. Finally, we demonstrate colocalization of molecular trait QTLs at 345 unique immune disease loci. This expansive, high-resolution atlas of multi-omics changes yields insights into cell-type-specific correlation between diverse genomic inputs, more generalizable correlations between these inputs, and defines molecular events that may underpin complex disease risk.
Subjects
Epigenomics , Immune System Diseases/genetics , Monocytes/metabolism , Neutrophils/metabolism , T-Lymphocytes/metabolism , Transcription, Genetic , Adult , Aged , Alternative Splicing , Female , Genetic Predisposition to Disease , Hematopoietic Stem Cells/metabolism , Histone Code , Humans , Male , Middle Aged , Quantitative Trait Loci , Young Adult
ABSTRACT
BACKGROUND: Legumes are the third largest family of angiosperms and the second most important crop class. Legume genomes have been shaped by extensive large-scale gene duplications, including an approximately 58 million year old whole genome duplication shared by most crop legumes. RESULTS: We report the genome and the transcription atlas of coding and non-coding genes of a Mesoamerican genotype of common bean (Phaseolus vulgaris L., BAT93). Using a comprehensive phylogenomics analysis, we assessed the past and recent evolution of common bean, and traced the diversification of patterns of gene expression following duplication. We find that successive rounds of gene duplications in legumes have shaped tissue and developmental expression, leading to increased levels of specialization in larger gene families. We also find that many long non-coding RNAs are preferentially expressed in germ-line-related tissues (pods and seeds), suggesting that they play a significant role in fruit development. Our results also suggest that most bean-specific gene family expansions, including resistance gene clusters, predate the split of the Mesoamerican and Andean gene pools. CONCLUSIONS: The genome and transcriptome data herein generated for a Mesoamerican genotype represent a counterpart to the genomic resources already available for the Andean gene pool. Altogether, this information will allow the genetic dissection of the characters involved in the domestication and adaptation of the crop, and their further implementation in breeding strategies for this important crop.
Subjects
Genome, Plant , Microsatellite Repeats/genetics , Phaseolus/genetics , Transcriptome/genetics , DNA, Plant/genetics , Gene Duplication , Gene Expression Profiling , Genotype , Humans , Phylogeny , Seeds/genetics , Sequence Analysis, DNA
ABSTRACT
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including reference sequence, gene models, transcriptional data, genetic variation and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments. These include the development of new analyses and views to represent polyploid genomes (of which bread wheat is the primary exemplar); and the continued up-scaling of the resource, which now includes over 23 000 bacterial genomes, 400 fungal genomes and 100 protist genomes, in addition to 55 genomes from invertebrate metazoa and 39 genomes from plants. This dramatic increase in the number of included genomes is one part of a broader effort to automate the integration of archival data (genome sequence, but also associated RNA sequence data and variant calls) within the context of reference genomes and make it available through the Ensembl user interfaces.
Subjects
Databases, Genetic , Genome, Bacterial , Genome, Fungal , Genome, Plant , Invertebrates/genetics , Animals , Diploidy , Eukaryota/genetics , Genetic Variation , Genome , Polyploidy , Sequence Alignment
ABSTRACT
Phenotypic plasticity is important in adaptation and shapes the evolution of organisms. However, we understand little about what aspects of the genome are important in facilitating plasticity. Eusocial insect societies produce plastic phenotypes from the same genome, as reproductives (queens) and nonreproductives (workers). The greatest plasticity is found in the simple eusocial insect societies in which individuals retain the ability to switch between reproductive and nonreproductive phenotypes as adults. We lack comprehensive data on the molecular basis of plastic phenotypes. Here, we sequenced genomes, microRNAs (miRNAs), and multiple transcriptomes and methylomes from individual brains in a wasp (Polistes canadensis) and an ant (Dinoponera quadriceps) that live in simple eusocial societies. In both species, we found few differences between phenotypes at the transcriptional level, with little functional specialization, and no evidence that phenotype-specific gene expression is driven by DNA methylation or miRNAs. Instead, phenotypic differentiation was defined more subtly by nonrandom transcriptional network organization, with roles in these networks for both conserved and taxon-restricted genes. The general lack of highly methylated regions or methylome patterning in both species may be an important mechanism for achieving plasticity among phenotypes during adulthood. These findings define previously unidentified hypotheses on the genomic processes that facilitate plasticity and suggest that the molecular hallmarks of social behavior are likely to differ with the level of social complexity.
Subjects
Ants/genetics , Gene Expression Regulation/genetics , Hierarchy, Social , Models, Genetic , Phenotype , Social Behavior , Wasps/genetics , Animals , Ants/physiology , Base Sequence , Brain/metabolism , DNA Methylation/genetics , Genome, Insect/genetics , High-Throughput Nucleotide Sequencing , MicroRNAs/genetics , Molecular Sequence Data , Transcriptome/genetics , Wasps/physiology
ABSTRACT
Variation in vectorial capacity for human malaria among Anopheles mosquito species is determined by many factors, including behavior, immunity, and life history. To investigate the genomic basis of vectorial capacity and explore new avenues for vector control, we sequenced the genomes of 16 anopheline mosquito species from diverse locations spanning ~100 million years of evolution. Comparative analyses show faster rates of gene gain and loss, elevated gene shuffling on the X chromosome, and more intron losses, relative to Drosophila. Some determinants of vectorial capacity, such as chemosensory genes, do not show elevated turnover but instead diversify through protein-sequence changes. This dynamism of anopheline genes and genomes may contribute to their flexible capacity to take advantage of new ecological niches, including adapting to humans as primary hosts.
Subjects
Anopheles/genetics , Evolution, Molecular , Genome, Insect , Insect Vectors/genetics , Malaria/transmission , Animals , Anopheles/classification , Base Sequence , Chromosomes, Insect/genetics , Drosophila/genetics , Humans , Insect Vectors/classification , Molecular Sequence Data , Phylogeny , Sequence Alignment
ABSTRACT
Schizosaccharomyces pombe displays a large transcriptional response common to several stress conditions, regulated primarily by the transcription factor Atf1. Atf1-dependent promoters contain especially broad nucleosome depleted regions (NDRs) prior to stress imposition. We show here that basal binding of Atf1 to these promoters competes with histones to create wider NDRs at stress genes. Moreover, deletion of atf1 results in nucleosome disorganization specifically at stress coding regions and derepresses antisense transcription. Our data indicate that the transcription factor binding to promoters acts as an effective barrier to fix the +1 nucleosome and phase downstream nucleosome arrays to prevent cryptic transcription.
Subjects
Activating Transcription Factor 1/metabolism , Nucleosomes/metabolism , Phosphoproteins/metabolism , Promoter Regions, Genetic , Schizosaccharomyces pombe Proteins/metabolism , Transcription, Genetic , Activating Transcription Factor 1/chemistry , Binding Sites , Genes, Fungal , Phosphoproteins/chemistry , Protein Structure, Tertiary , Schizosaccharomyces/genetics , Schizosaccharomyces pombe Proteins/chemistry
ABSTRACT
The Plant Resistance Genes database (PRGdb; http://prgdb.org) is a comprehensive resource on resistance genes (R-genes), a major class of genes in plant genomes that convey disease resistance against pathogens. Initiated in 2009, the database has grown more than 6-fold and now includes annotation derived from recent plant genome sequencing projects. Release 2.0 currently hosts useful biological information on a set of 112 known and 104 310 putative R-genes present in 233 plant species and conferring resistance to 122 different pathogens. Moreover, the website has been completely redesigned with the implementation of Semantic MediaWiki technologies, which makes our repository freely accessible and easily editable by any scientist. To this end, we encourage expert plant biologists to join our annotation effort and share their knowledge on resistance-gene biology with the rest of the scientific community.
Subjects
Databases, Genetic , Disease Resistance/genetics , Genes, Plant , Genome, Plant , Internet , Models, Genetic
ABSTRACT
We report the genome sequence of melon, an important horticultural crop worldwide. We assembled 375 Mb of the double-haploid line DHL92, representing 83.3% of the estimated melon genome. We predicted 27,427 protein-coding genes, which we analyzed by reconstructing 22,218 phylogenetic trees, allowing mapping of the orthology and paralogy relationships of sequenced plant genomes. We observed the absence of recent whole-genome duplications in the melon lineage since the ancient eudicot triplication, and our data suggest that transposon amplification may in part explain the increased size of the melon genome compared with the close relative cucumber. A low number of nucleotide-binding site-leucine-rich repeat disease resistance genes were annotated, suggesting the existence of specific defense mechanisms in this species. The DHL92 genome was compared with that of its parental lines allowing the quantification of sequence variability in the species. The use of the genome sequence in future investigations will facilitate the understanding of evolution of cucurbits and the improvement of breeding strategies.
Subjects
Biological Evolution , Cucumis melo/genetics , Genome, Plant/genetics , Phylogeny , Base Sequence , Chromosome Mapping , Chromosomes, Artificial, Bacterial/genetics , DNA Transposable Elements/genetics , Disease Resistance/genetics , Genes, Duplicate/genetics , Genes, Plant/genetics , Genomics/methods , Likelihood Functions , Models, Genetic , Molecular Sequence Annotation , Molecular Sequence Data , Sequence Alignment , Sequence Analysis, DNA
ABSTRACT
Forkhead-box protein P2 is a transcription factor that has been associated with intriguing aspects of cognitive function in humans, non-human mammals, and song-learning birds. Heterozygous mutations of the human FOXP2 gene cause a monogenic speech and language disorder. Reduced functional dosage of the mouse version (Foxp2) causes deficient cortico-striatal synaptic plasticity and impairs motor-skill learning. Moreover, the songbird orthologue appears critically important for vocal learning. Across diverse vertebrate species, this well-conserved transcription factor is highly expressed in the developing and adult central nervous system. Very little is known about the mechanisms regulated by Foxp2 during brain development. We used an integrated functional genomics strategy to robustly define Foxp2-dependent pathways, both direct and indirect targets, in the embryonic brain. Specifically, we performed genome-wide in vivo ChIP-chip screens for Foxp2-binding and thereby identified a set of 264 high-confidence neural targets under strict, empirically derived significance thresholds. The findings, coupled to expression profiling and in situ hybridization of brain tissue from wild-type and mutant mouse embryos, strongly highlighted gene networks linked to neurite development. We followed up our genomics data with functional experiments, showing that Foxp2 impacts on neurite outgrowth in primary neurons and in neuronal cell models. Our data indicate that Foxp2 modulates neuronal network formation, by directly and indirectly regulating mRNAs involved in the development and plasticity of neuronal connections.
Subjects
Brain/embryology , Forkhead Transcription Factors/genetics , Gene Regulatory Networks , Neurites/metabolism , Repressor Proteins/genetics , Animals , Cell Line, Tumor , Chromatin Immunoprecipitation , Corpus Striatum/growth & development , Gene Expression Profiling , Gene Expression Regulation, Developmental , Mice , Mice, Inbred C57BL , Models, Biological , Mutation , Oligonucleotide Array Sequence Analysis/methods , Primary Cell Culture , RNA, Messenger/genetics , RNA, Messenger/metabolism
ABSTRACT
The integrated analysis of genotypic and expression data for association with complex traits could identify novel genetic pathways involved in complex traits. We profiled 19,573 expression probes in Epstein-Barr virus-transformed lymphoblastoid cell lines (LCLs) from 299 twins and correlated these with 44 quantitative traits (QTs). For 939 expressed probes correlating with more than one QT, we investigated the presence of eQTL associations in three datasets of 57 CEU HapMap founders and 86 unrelated twins. Genome-wide association analysis of these probes with 2.2 million SNPs revealed 131 potential eQTLs (1,989 eQTL SNPs) overlapping between the HapMap datasets, five of which were in cis (58 eQTL SNPs). We then tested 535 SNPs tagging the eQTL SNPs for association with the relevant QT in 2,905 twins. We identified nine potential SNP-QT associations (P<0.01), but none significantly replicated in five large consortia of 1,097-16,129 subjects. We also failed to replicate previously reported eQTL associations with body mass index, plasma low-density lipoprotein cholesterol, high-density lipoprotein cholesterol and triglyceride levels derived from lymphocytes, adipose and liver tissue. Our results and additional power calculations suggest that proponents may have been overoptimistic about the power of LCLs in eQTL approaches to elucidate regulatory genetic effects on complex traits using the small datasets generated to date. Nevertheless, larger tissue-specific expression data sets relevant to specific traits are becoming available, and should enable the adoption of similar integrated analyses in the near future.
Subjects
Gene Regulatory Networks/genetics , Genome, Human/genetics , Genome-Wide Association Study , Lymphocytes/metabolism , Quantitative Trait Loci/genetics , Quantitative Trait, Heritable , Adult , Aged , Aged, 80 and over , Cell Line , Cohort Studies , Databases, Genetic , Female , Gene Expression Regulation , Haplotypes/genetics , Humans , Inheritance Patterns/genetics , Middle Aged , Polymorphism, Single Nucleotide/genetics , Principal Component Analysis , Reproducibility of Results , Sample Size , Young Adult
ABSTRACT
TEQC is an R/Bioconductor package for quality assessment of target enrichment experiments. Quality measures comprise specificity and sensitivity of the capture, enrichment, per-target read coverage and its relation to hybridization probe characteristics, coverage uniformity and reproducibility, and read duplicate analysis. Several diagnostic plots allow visual inspection of the data quality. AVAILABILITY AND IMPLEMENTATION: TEQC is implemented in the R language (version >2.12.0) and is available as a Bioconductor package for Linux, Windows and MacOS from www.bioconductor.org.
Subjects
Computational Biology/methods , Sequence Analysis, DNA/methods , Software , DNA Probes , Nucleic Acid Hybridization , Polymerase Chain Reaction , Quality Control , Reproducibility of Results , Sensitivity and Specificity
ABSTRACT
BACKGROUND: Autism spectrum disorders (ASDs) are characterized by social, communication, and behavioral deficits and complex genetic etiology. A recent study of 517 ASD families implicated DOCK4 by single nucleotide polymorphism (SNP) association and a microdeletion in an affected sibling pair. METHODS: The DOCK4 microdeletion on 7q31.1 was further characterized in this family using QuantiSNP analysis of 1M SNP array data and reverse transcription polymerase chain reaction. Extended family members were tested by polymerase chain reaction amplification of junction fragments. DOCK4 dosage was measured in additional samples using SNP arrays. Since QuantiSNP analysis identified a novel CNTNAP5 microdeletion in the same affected sibling pair, this gene was sequenced in 143 additional ASD families. Further polymerase chain reaction-restriction fragment length polymorphism analysis included 380 ASD cases and suitable control subjects. RESULTS: The maternally inherited microdeletion encompassed chr7:110,663,978-111,257,682 and led to a DOCK4-IMMP2L fusion transcript. It was also detected in five extended family members with no ASD. However, six of nine individuals with this microdeletion had poor reading ability, which prompted us to screen 606 other dyslexia cases. This led to the identification of a second DOCK4 microdeletion co-segregating with dyslexia. Assessment of genomic background in the original ASD family detected a paternal 2q14.3 microdeletion disrupting CNTNAP5 that was also transmitted to both affected siblings. Analysis of other ASD cohorts revealed four additional rare missense changes in CNTNAP5. No exonic deletions of DOCK4 or CNTNAP5 were seen in 2091 control subjects. CONCLUSIONS: This study highlights two new risk factors for ASD and dyslexia and demonstrates the importance of performing a high-resolution assessment of genomic background, even after detection of a rare and likely damaging microdeletion using a targeted approach.