ABSTRACT
Rare copy-number variation (CNV) is an important source of risk for autism spectrum disorders (ASDs). We analyzed 2,446 ASD-affected families and confirmed an excess of genic deletions and duplications in affected versus control groups (1.41-fold, p = 1.0 × 10^-5) and an increase in affected subjects carrying exonic pathogenic CNVs overlapping known loci associated with dominant or X-linked ASD and intellectual disability (odds ratio = 12.62, p = 2.7 × 10^-15, ~3% of ASD subjects). Pathogenic CNVs, often showing variable expressivity, included rare de novo and inherited events at 36 loci, implicating ASD-associated genes (CHD2, HDAC4, and GDI1) previously linked to other neurodevelopmental disorders, as well as other genes such as SETD5, MIR137, and HDAC9. Consistent with hypothesized gender-specific modulators, females with ASD were more likely to have highly penetrant CNVs (p = 0.017) and were also overrepresented among subjects with fragile X syndrome protein targets (p = 0.02). Genes affected by de novo CNVs and/or loss-of-function single-nucleotide variants converged on networks related to neuronal signaling and development, synapse function, and chromatin regulation.
Subjects
Child Development Disorders, Pervasive/genetics; DNA Copy Number Variations; Metabolic Networks and Pathways/genetics; Child; Female; Gene Regulatory Networks; Humans; Male; Multigene Family; Pedigree; Sequence Deletion
ABSTRACT
The autism spectrum disorders (ASDs) are a group of conditions characterized by impairments in reciprocal social interaction and communication, and the presence of restricted and repetitive behaviours. Individuals with an ASD vary greatly in cognitive development, which can range from above average to intellectual disability. Although ASDs are known to be highly heritable (approximately 90%), the underlying genetic determinants are still largely unknown. Here we analysed the genome-wide characteristics of rare (<1% frequency) copy number variation in ASD using dense genotyping arrays. When comparing 996 ASD individuals of European ancestry to 1,287 matched controls, cases were found to carry a higher global burden of rare, genic copy number variants (CNVs) (1.19 fold, P = 0.012), especially so for loci previously implicated in either ASD and/or intellectual disability (1.69 fold, P = 3.4 × 10^-4). Among the CNVs there were numerous de novo and inherited events, sometimes in combination in a given family, implicating many novel ASD genes such as SHANK2, SYNGAP1, DLGAP2 and the X-linked DDX53-PTCHD1 locus. We also discovered an enrichment of CNVs disrupting functional gene sets involved in cellular proliferation, projection and motility, and GTPase/Ras signalling. Our results reveal many new genetic and functional targets in ASD that may lead to final connected pathways.
Subjects
Child Development Disorders, Pervasive/genetics; Child Development Disorders, Pervasive/physiopathology; DNA Copy Number Variations/genetics; Gene Dosage/genetics; Genetic Predisposition to Disease/genetics; Case-Control Studies; Cell Movement; Child; Child Development Disorders, Pervasive/pathology; Cytoprotection; Europe/ethnology; Genome-Wide Association Study; Humans; Signal Transduction; Social Behavior
ABSTRACT
While it is apparent that rare variation can play an important role in the genetic architecture of autism spectrum disorders (ASDs), the contribution of common variation to the risk of developing ASD is less clear. To produce a more comprehensive picture, we report Stage 2 of the Autism Genome Project genome-wide association study, adding 1301 ASD families and bringing the total to 2705 families analysed (Stages 1 and 2). In addition to evaluating the association of individual single nucleotide polymorphisms (SNPs), we also sought evidence that common variants, en masse, might affect the risk. Despite genotyping over a million SNPs covering the genome, no single SNP shows significant association with ASD or selected phenotypes at a genome-wide level. The SNP that achieves the smallest P-value from secondary analyses is rs1718101. It falls in CNTNAP2, a gene previously implicated in susceptibility for ASD. This SNP also shows modest association with age of word/phrase acquisition in ASD subjects, of interest because features of language development are also associated with other variation in CNTNAP2. In contrast, allele scores derived from the transmission of common alleles to Stage 1 cases significantly predict case status in the independent Stage 2 sample. Despite being significant, the variance explained by these allele scores was small (Vm < 1%). Based on results from individual SNPs and their en masse effect on risk, as inferred from the allele score results, it is reasonable to conclude that common variants affect the risk for ASD but their individual effects are modest.
Subjects
Child Development Disorders, Pervasive/genetics; Genetic Predisposition to Disease; Genome-Wide Association Study; Membrane Proteins/genetics; Nerve Tissue Proteins/genetics; Alleles; Child; Child Development Disorders, Pervasive/physiopathology; Female; Gene Frequency; Genotype; Humans; Language Development; Male; Polymorphism, Single Nucleotide; Risk Factors
ABSTRACT
INTRODUCTION: Autism is a common neurodevelopmental condition with a complex genetic aetiology that includes contributions from monogenic and polygenic factors. Many autistic people have unmet healthcare needs that could be served by genomics-informed research and clinical trials. The primary aim of the European Autism GEnomics Registry (EAGER) is to establish a registry of participants with a diagnosis of autism or an associated rare genetic condition who have undergone whole-genome sequencing. The registry can facilitate recruitment for future clinical trials and research studies, based on genetic, clinical and phenotypic profiles, as well as participant preferences. The secondary aim of EAGER is to investigate the association between mental and physical health characteristics and participants' genetic profiles. METHODS AND ANALYSIS: EAGER is a European multisite cohort study and registry and is part of the AIMS-2-TRIALS consortium. EAGER was developed with input from the AIMS-2-TRIALS Autism Representatives and representatives from the rare genetic conditions community. 1500 participants with a diagnosis of autism or an associated rare genetic condition will be recruited at 13 sites across 8 countries. Participants will provide a blood or saliva sample for whole-genome sequencing and answer a series of online questionnaires. Participants may also consent to the study accessing pre-existing clinical data. Participants will be added to the EAGER registry and data will be shared externally through established AIMS-2-TRIALS mechanisms. ETHICS AND DISSEMINATION: To date, EAGER has received full ethical approval for 11 out of the 13 sites in the UK (REC 23/SC/0022), Germany (S-375/2023), Portugal (CE-085/2023), Spain (HCB/2023/0038, PIC-164-22), Sweden (Dnr 2023-06737-01), Ireland (230907) and Italy (CET_62/2023, CEL-IRCCS OASI/24-01-2024/EM01, EM 2024-13/1032 EAGER). Findings will be disseminated via scientific publications and conferences but also beyond to participants and the wider community (eg, the AIMS-2-TRIALS website, stakeholder meetings, newsletters).
Subjects
Autistic Disorder; Genomics; Registries; Whole Genome Sequencing; Child; Humans; Male; Autistic Disorder/genetics; Cohort Studies; Europe; Multicenter Studies as Topic; Research Design
ABSTRACT
Although autism spectrum disorders (ASDs) have a substantial genetic basis, most of the known genetic risk has been traced to rare variants, principally copy number variants (CNVs). To identify common risk variation, the Autism Genome Project (AGP) Consortium genotyped 1558 rigorously defined ASD families for 1 million single-nucleotide polymorphisms (SNPs) and analyzed these SNP genotypes for association with ASD. In one of four primary association analyses, the association signal for marker rs4141463, located within MACROD2, crossed the genome-wide association significance threshold of P < 5 × 10^-8. When a smaller replication sample was analyzed, the risk allele at rs4141463 was again over-transmitted; yet, consistent with the winner's curse, its effect size in the replication sample was much smaller; and, for the combined samples, the association signal barely fell below the P < 5 × 10^-8 threshold. Exploratory analyses of phenotypic subtypes yielded no significant associations after correction for multiple testing. They did, however, yield strong signals within several genes, KIAA0564, PLD5, POU6F2, ST8SIA2 and TAF1C.
Subjects
Autistic Disorder/genetics; Genetic Predisposition to Disease; Genome-Wide Association Study; Polymorphism, Single Nucleotide; Alleles; DNA Copy Number Variations; Databases, Genetic; Genetic Variation; Genome, Human; Genotype; Humans; Risk Factors; White People/genetics
ABSTRACT
Autism spectrum disorder (ASD) is a highly heritable disorder of complex and heterogeneous aetiology. It is primarily characterized by altered cognitive ability including impaired language and communication skills and fundamental deficits in social reciprocity. Despite some notable successes in neuropsychiatric genetics, overall, the high heritability of ASD (~90%) remains poorly explained by common genetic risk variants. However, recent studies suggest that rare genomic variation, in particular copy number variation, may account for a significant proportion of the genetic basis of ASD. We present a large scale analysis to identify candidate genes which may contain low-frequency recessive variation contributing to ASD while taking into account the potential contribution of population differences to the genetic heterogeneity of ASD. Our strategy, homozygous haplotype (HH) mapping, aims to detect homozygous segments of identical haplotype structure that are shared at a higher frequency amongst ASD patients compared to parental controls. The analysis was performed on 1,402 Autism Genome Project trios genotyped for 1 million single nucleotide polymorphisms (SNPs). We identified 25 known and 1,218 novel ASD candidate genes in the discovery analysis including CADM2, ABHD14A, CHRFAM7A, GRIK2, GRM3, EPHA3, FGF10, KCND2, PDZK1, IMMP2L and FOXP2. Furthermore, 10 of the previously reported ASD genes and 300 of the novel candidates identified in the discovery analysis were replicated in an independent sample of 1,182 trios. Our results demonstrate that regions of HH are significantly enriched for previously reported ASD candidate genes and the observed association is independent of gene size (odds ratio 2.10). Our findings highlight the applicability of HH mapping in complex disorders such as ASD and offer an alternative approach to the analysis of genome-wide association data.
Subjects
Child Development Disorders, Pervasive/genetics; Genetic Predisposition to Disease/genetics; Genome-Wide Association Study/methods; Haplotypes/genetics; Adult; Child; Cluster Analysis; Cohort Studies; DNA Copy Number Variations; Female; Genotype; Homozygote; Humans; Linkage Disequilibrium; Male; Middle Aged; Nuclear Family; Polymorphism, Single Nucleotide
ABSTRACT
OBJECTIVES: The main objective of this research is to apply clustering and cluster validity methods to estimate the number of clusters in cancer tumour datasets. A weighted voting technique is used to improve the prediction of the number of clusters based on different data mining techniques. These tools may be used to identify new tumour classes from DNA microarray datasets, and the estimation approach may serve as a useful tool to support biological and biomedical knowledge discovery. METHODS: Three clustering and two validation algorithms were applied to two cancer tumour datasets. Recent studies confirm that there is no universal pattern recognition and clustering model to predict molecular profiles across different datasets. It is therefore useful not to rely on a single clustering or validation method but to apply a variety of approaches, and a combination of these methods may be used to estimate the number of clusters. RESULTS: The methods implemented in this research may contribute to the validation of clustering results and the estimation of the number of clusters. The results show that this estimation approach may represent an effective tool to support biomedical knowledge discovery and healthcare applications. CONCLUSION: The methods implemented in this research may be successfully used to estimate the number of clusters and to validate clustering results. These tools may be used to identify new tumour classes using gene expression profiles.
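The weighted voting idea described in this abstract can be illustrated with a minimal Python sketch. The abstract does not name the clustering or validation algorithms, so the voter labels ("kmeans", "hierarchical", "silhouette", "dunn"), the weights, and all scores below are hypothetical placeholders rather than values or methods from the study. The sketch assumes every validity score is oriented so that higher means better.

```python
from collections import defaultdict


def weighted_vote_k(voter_scores, weights):
    """Estimate the number of clusters by weighted voting.

    voter_scores: {voter: {k: validity score}}, higher score = better partition.
    weights:      {voter: vote weight}; voters without a weight count as 1.0.
    Returns the winning k and the full vote tally.
    """
    votes = defaultdict(float)
    for voter, scores in voter_scores.items():
        # Each voter nominates the k with its best validity score.
        best_k = max(scores, key=scores.get)
        votes[best_k] += weights.get(voter, 1.0)
    # Highest total weight wins; ties go to the smaller k.
    winner = min(votes, key=lambda k: (-votes[k], k))
    return winner, dict(votes)


# Hypothetical scores for candidate k = 2, 3, 4 (illustrative only).
voter_scores = {
    ("kmeans", "silhouette"): {2: 0.41, 3: 0.58, 4: 0.52},
    ("kmeans", "dunn"): {2: 0.30, 3: 0.27, 4: 0.19},
    ("hierarchical", "silhouette"): {2: 0.44, 3: 0.55, 4: 0.37},
    ("hierarchical", "dunn"): {2: 0.22, 3: 0.31, 4: 0.25},
}
weights = {
    ("kmeans", "silhouette"): 1.0,
    ("kmeans", "dunn"): 0.5,
    ("hierarchical", "silhouette"): 1.0,
    ("hierarchical", "dunn"): 0.5,
}

best_k, tally = weighted_vote_k(voter_scores, weights)
print(best_k, tally)
```

Each (clustering algorithm, validity index) pair acts as one voter, so disagreement between methods is resolved by the weights rather than by trusting any single index.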
Subjects
Oligonucleotide Array Sequence Analysis; Central Nervous System Neoplasms/genetics; Cluster Analysis; Humans; Leukemia/genetics; United Kingdom
ABSTRACT
In a gene expression data matrix, a bicluster is a submatrix of genes and conditions that exhibits a high correlation of expression activity across both rows and columns. The problem of locating the most significant bicluster has been shown to be NP-complete. Heuristic approaches such as Cheng and Church's greedy node deletion algorithm have been previously employed. It is to be expected that stochastic search techniques such as evolutionary algorithms or simulated annealing might improve upon such greedy techniques. In this paper we show that an approach based on simulated annealing is well suited to this problem, and we present a comparative evaluation of simulated annealing and node deletion on a variety of datasets. We show that simulated annealing discovers more significant biclusters in many cases. Furthermore, we also test the ability of our technique to locate biologically verifiable biclusters within an annotated set of genes.
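As a rough illustration of the comparison described above, the following Python sketch scores a candidate bicluster with the mean squared residue used by Cheng and Church and searches for a low-residue submatrix with simulated annealing. The abstract does not give the fitness function, move set, or cooling schedule actually used, so the single row/column toggle move, the geometric cooling, the minimum-size constraint, and all parameter values here are assumptions made for illustration.

```python
import math
import random


def mean_squared_residue(data, rows, cols):
    """Mean squared residue H(I, J) of the submatrix given by rows and cols."""
    row_mean = {i: sum(data[i][j] for j in cols) / len(cols) for i in rows}
    col_mean = {j: sum(data[i][j] for i in rows) / len(rows) for j in cols}
    all_mean = sum(row_mean.values()) / len(rows)
    total = sum(
        (data[i][j] - row_mean[i] - col_mean[j] + all_mean) ** 2
        for i in rows
        for j in cols
    )
    return total / (len(rows) * len(cols))


def anneal_bicluster(data, min_rows=2, min_cols=2, t_start=1.0,
                     cooling=0.95, steps_per_temp=50, t_end=1e-3, seed=0):
    """Search for a low-residue bicluster (assumed, simplified schedule)."""
    rng = random.Random(seed)
    n_rows, n_cols = len(data), len(data[0])
    rows, cols = set(range(n_rows)), set(range(n_cols))
    score = mean_squared_residue(data, rows, cols)
    best = (score, set(rows), set(cols))
    t = t_start
    while t > t_end:
        for _ in range(steps_per_temp):
            new_rows, new_cols = set(rows), set(cols)
            # Move: toggle one random row or column in or out of the bicluster.
            if rng.random() < 0.5:
                new_rows ^= {rng.randrange(n_rows)}
            else:
                new_cols ^= {rng.randrange(n_cols)}
            # Reject moves that shrink the bicluster below the minimum size.
            if len(new_rows) < min_rows or len(new_cols) < min_cols:
                continue
            new_score = mean_squared_residue(data, new_rows, new_cols)
            delta = new_score - score
            # Metropolis rule: always accept improvements, sometimes accept
            # worse states, with probability falling as the temperature drops.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                rows, cols, score = new_rows, new_cols, new_score
                if score < best[0]:
                    best = (score, set(rows), set(cols))
        t *= cooling
    return best


# Hypothetical 4 x 4 expression matrix (illustrative values only).
matrix = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 0.0],
    [4.0, 5.0, 6.0, 7.0],
    [9.0, 1.0, 8.0, 2.0],
]
best_score, best_rows, best_cols = anneal_bicluster(matrix)
print(best_score, sorted(best_rows), sorted(best_cols))
```

Minimising the residue alone favours small biclusters, which is why the sketch enforces a minimum size; a fuller implementation would trade residue against bicluster volume.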
Subjects
Algorithms; Gene Expression Profiling/methods; Gene Expression/physiology; Models, Biological; Multigene Family/physiology; Oligonucleotide Array Sequence Analysis/methods; Pattern Recognition, Automated/methods; Animals; Artificial Intelligence; Cluster Analysis; Computer Simulation; Humans; Information Storage and Retrieval/methods
ABSTRACT
cluML is a new markup language for microarray data clustering and cluster validity assessment. The XML-based format has been designed to address some of the limitations observed in traditional formats, such as the inability to store multiple clustering (including biclustering) and validation results within a dataset. cluML is an effective tool to support biomedical knowledge representation in gene expression data analysis. Although cluML was developed for DNA microarray analysis applications, it can also be used to represent clustering and validation results for other biomedical and physical data.
Subjects
Cluster Analysis; Computational Biology/methods; Oligonucleotide Array Sequence Analysis/methods; Programming Languages; Algorithms; Computers; Databases, Genetic; Gene Expression Profiling; Information Storage and Retrieval; Pattern Recognition, Automated; Software
ABSTRACT
BACKGROUND: There is an urgent need for expanding and enhancing autism spectrum disorder (ASD) samples, in order to better understand causes of ASD. METHODS: In a unique public-private partnership, 13 sites with extensive experience in both the assessment and diagnosis of ASD embarked on an ambitious, 2-year program to collect samples for genetic and phenotypic research and begin analyses on these samples. The program was called The Autism Simplex Collection (TASC). TASC sample collection began in 2008 and was completed in 2010, and included nine sites from North America and four sites from Western Europe, as well as a centralized Data Coordinating Center. RESULTS: Over 1,700 trios are part of this collection, with DNA from transformed cells now available through the National Institute of Mental Health (NIMH). Autism Diagnostic Interview-Revised (ADI-R) and Autism Diagnostic Observation Schedule-Generic (ADOS-G) measures are available for all probands, as are standardized IQ measures, Vineland Adaptive Behavioral Scales (VABS), the Social Responsiveness Scale (SRS), Peabody Picture Vocabulary Test (PPVT), and physical measures (height, weight, and head circumference). At almost every site, additional phenotypic measures were collected, including the Broad Autism Phenotype Questionnaire (BAPQ) and Repetitive Behavior Scale-Revised (RBS-R), as well as the non-word repetition scale, Communication Checklist (Children's or Adult), and Aberrant Behavior Checklist (ABC). Moreover, for nearly 1,000 trios, the Autism Genome Project Consortium (AGP) has carried out Illumina 1 M SNP genotyping and called copy number variation (CNV) in the samples, with data being made available through the National Institutes of Health (NIH). Whole exome sequencing (WES) has been carried out in over 500 probands, together with ancestry matched controls, and this data is also available through the NIH. 
Additional WES is being carried out by the Autism Sequencing Consortium (ASC), where the focus is on sequencing complete trios. ASC sequencing for the first 1,000 samples (all from whole-blood DNA) is complete and data will be released in 2014. Data is being made available through NIH databases (database of Genotypes and Phenotypes (dbGaP) and National Database for Autism Research (NDAR)) with DNA released in Dist 11.0. Primary funding for the collection, genotyping, sequencing and distribution of TASC samples was provided by Autism Speaks and the NIH, including the National Institute of Mental Health (NIMH) and the National Human Genome Research Institute (NHGRI). CONCLUSIONS: TASC represents an important sample set that leverages expert sites. Similar approaches, leveraging expert sites and ongoing studies, represent an important path towards further enhancing available ASD samples.
ABSTRACT
This paper presents an approach to assessing cluster validity based on similarity knowledge extracted from the Gene Ontology. AVAILABILITY: The program is freely available for non-profit use on request from the authors.
Subjects
Artificial Intelligence; Database Management Systems; Databases, Protein; Gene Expression Profiling/methods; Oligonucleotide Array Sequence Analysis/methods; Pattern Recognition, Automated/methods; Software; User-Computer Interface; Algorithms; Cluster Analysis; Saccharomyces cerevisiae Proteins/metabolism
ABSTRACT
In this paper we present a data mining system that allows different clustering and cluster validity algorithms to be applied to DNA microarray data. This tool may improve the quality of data analysis results and may support the prediction of the number of relevant clusters in microarray datasets. This systematic evaluation approach may significantly aid genome expression analyses for knowledge discovery applications. The software may be used for clustering and validation not only in DNA microarray expression analysis but also with other biomedical and physical data. AVAILABILITY: The program is freely available for non-profit use on request at http://www.cs.tcd.ie/Nadia.Bolshakova/Machaon.html CONTACT: Nadia.Bolshakova@cs.tcd.ie.
Subjects
Algorithms; Gene Expression Profiling/methods; Oligonucleotide Array Sequence Analysis/methods; Pattern Recognition, Automated/methods; Software; User-Computer Interface; Benchmarking/methods; Cluster Analysis; Database Management Systems; Information Storage and Retrieval; Systems Integration
ABSTRACT
This paper presents a cluster validation tool for gene expression data. The Machaon CVE (Clustering and Validation Environment) system aims to partition samples or genes into groups characterized by similar expression patterns, and to evaluate the quality of the clusters obtained. AVAILABILITY: The program is freely available for non-profit use on request at http://www.cs.tcd.ie/Nadia.Bolshakova/Machaon.html SUPPLEMENTARY INFORMATION: http://www.cs.tcd.ie/Nadia.Bolshakova/Machaon.html