Outgroup Machine Learning Approach Identifies Single Nucleotide Variants in Noncoding DNA Associated with Autism Spectrum Disorder.
Pac Symp Biocomput; 24: 260-271, 2019.
Article | En | MEDLINE | ID: mdl-30864328
Autism spectrum disorder (ASD) is a heritable neurodevelopmental disorder affecting 1 in 59 children. While noncoding genetic variation has been shown to play a major role in many complex disorders, the contribution of these regions to ASD susceptibility remains unclear. Genetic analyses of ASD typically use unaffected family members as controls; however, we hypothesize that this method does not effectively elevate variant signal in the noncoding region due to family members having subclinical phenotypes arising from common genetic mechanisms. In this study, we use a separate, unrelated outgroup of individuals with progressive supranuclear palsy (PSP), a neurodegenerative condition with no known etiological overlap with ASD, as a control population. We use whole genome sequencing data from a large cohort of 2182 children with ASD and 379 controls with PSP, sequenced at the same facility with the same machines and variant calling pipeline, in order to investigate the role of noncoding variation in the ASD phenotype. We analyze seven major types of noncoding variants: microRNAs, human accelerated regions, hypersensitive sites, transcription factor binding sites, DNA repeat sequences, simple repeat sequences, and CpG islands. After identifying and removing batch effects between the two groups, we trained an ℓ1-regularized logistic regression classifier to predict ASD status from each set of variants. The classifier trained on simple repeat sequences performed well on a held-out test set (AUC-ROC = 0.960); this classifier was also able to differentiate ASD cases from controls when applied to a completely independent dataset (AUC-ROC = 0.960). This suggests that variation in simple repeat regions is predictive of the ASD phenotype and may contribute to ASD risk. Our results show the importance of the noncoding region and the utility of independent control groups in effectively linking genetic variation to disease phenotype for complex disorders.
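The classification step described in the abstract (an ℓ1-regularized logistic regression scored by AUC-ROC on a held-out set) can be illustrated with a minimal, standard-library Python sketch. This is not the authors' pipeline: the data are synthetic binary variant indicators, the solver is a simple proximal gradient method chosen for self-containment, and every name and hyperparameter (`fit_l1_logistic`, `make_synthetic`, `lam`, `lr`, the 300/100 split) is an assumption made for illustration only.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(w, t):
    # Proximal operator of the l1 penalty: shrink w toward zero by t.
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0

def fit_l1_logistic(X, y, lam=0.02, lr=0.5, n_iter=300):
    """Proximal gradient descent for l1-regularized logistic regression.

    lam, lr and n_iter are illustrative values, not taken from the paper.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                if xj:
                    grad_w[j] += err * xj
        b -= lr * grad_b / n  # the intercept is not penalized
        w = [soft_threshold(wj - lr * gj / n, lr * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b

def predict_scores(X, w, b):
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]

def auc_roc(y_true, scores):
    # Probability that a random case outscores a random control (ties count 0.5).
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def make_synthetic(n, d, n_informative, rng):
    # Toy data: only the first n_informative columns depend on the label.
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        row = []
        for j in range(d):
            p = 0.3
            if j < n_informative:
                p = 0.6 if label == 1 else 0.2
            row.append(1 if rng.random() < p else 0)
        X.append(row)
        y.append(label)
    return X, y

rng = random.Random(0)
X, y = make_synthetic(400, 30, 5, rng)
X_train, y_train = X[:300], y[:300]  # training split
X_test, y_test = X[300:], y[300:]    # held-out split
w, b = fit_l1_logistic(X_train, y_train)
test_auc = auc_roc(y_test, predict_scores(X_test, w, b))
n_nonzero = sum(1 for wj in w if wj != 0.0)
print(f"held-out AUC-ROC: {test_auc:.3f}, nonzero weights: {n_nonzero} of {len(w)}")
```

The ℓ1 penalty drives the weights of uninformative features to exactly zero, which is presumably why this kind of model suits settings with many candidate variants and relatively few samples; the numbers printed here describe the toy data only and say nothing about the study's results.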
Full text:
1
Databases:
MEDLINE
Main subject:
Genetic Variation
/
DNA
/
Autism Spectrum Disorder
/
Machine Learning
Study type:
Etiology_studies
/
Incidence_studies
/
Observational_studies
/
Prognostic_studies
/
Risk_factors_studies
Limits:
Child
/
Female
/
Humans
/
Male
Language:
En
Journal:
Pac Symp Biocomput
Journal subject:
BIOTECHNOLOGY
/
MEDICAL INFORMATICS
Year:
2019
Document type:
Article
Affiliation country:
United States