Search | VHL Regional Portal

Identifying crossovers and shared genetic material in whole genome sequencing data from families.

Paskov, Kelley; Chrisman, Brianna; Stockham, Nathaniel; Washington, Peter Yigitcan; Dunlap, Kaitlyn; Jung, Jae-Yoon; Wall, Dennis P.

Genome Res ; 33(10): 1747-1756, 2023 Oct.

Article in English | MEDLINE | ID: mdl-37879861

ABSTRACT

Large, whole-genome sequencing (WGS) data sets containing families provide an important opportunity to identify crossovers and shared genetic material in siblings. However, the high variant calling error rates of WGS in some areas of the genome can result in spurious crossover calls, and the special inheritance status of the X Chromosome presents challenges. We have developed a hidden Markov model that addresses these issues by modeling the inheritance of variants in families in the presence of error-prone regions and inherited deletions. We call our method PhasingFamilies. We validate PhasingFamilies using the platinum genome family NA1281 (precision: 0.81; recall: 0.97), as well as simulated genomes with known crossover positions (precision: 0.93; recall: 0.92). Using 1925 quads from the Simons Simplex Collection, we found that PhasingFamilies resolves crossovers to a median resolution of 3527.5 bp. These crossovers recapitulate existing recombination rate maps, including for the X Chromosome; produce sibling pair IBD that matches expected distributions; and are validated by the haplotype estimation tool SHAPEIT. We provide an efficient, open-source implementation of PhasingFamilies that can be used to identify crossovers from family sequencing data.

Subject(s)

Genome; Inheritance Patterns; Humans; Whole Genome Sequencing; Haplotypes
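
The hidden Markov model idea in the abstract above can be illustrated with a toy decoder. This is a minimal sketch and not the PhasingFamilies implementation: it assumes a single child, two hidden states (which of one parent's two haplotypes was inherited), and a per-site observation that already suggests one haplotype or is uninformative. The function names and the `p_crossover` and `p_error` values are hypothetical.

```python
import math

def viterbi_inheritance(obs, p_crossover=0.01, p_error=0.05):
    """Return the most likely inherited haplotype (0 or 1) at each site.

    obs[i] is the haplotype suggested by the genotype calls at site i
    (0 or 1), or None when the site carries no information.
    """
    states = (0, 1)
    log_stay = math.log(1 - p_crossover)
    log_switch = math.log(p_crossover)

    def log_emit(state, o):
        if o is None:
            return 0.0  # uninformative site: no evidence either way
        return math.log(1 - p_error) if o == state else math.log(p_error)

    # Uniform prior over the two haplotypes at the first site.
    score = [math.log(0.5) + log_emit(s, obs[0]) for s in states]
    back = []
    for o in obs[1:]:
        new_score, pointers = [], []
        for s in states:
            cand = [score[p] + (log_stay if p == s else log_switch) for p in states]
            best_prev = max(states, key=lambda p: cand[p])
            new_score.append(cand[best_prev] + log_emit(s, o))
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

def crossover_positions(path):
    """Indices where the decoded haplotype differs from the previous site."""
    return [i for i in range(1, len(path)) if path[i] != path[i - 1]]

# One isolated miscall at index 10, one real switch at index 21.
obs = [0] * 10 + [1] + [0] * 10 + [1] * 20
path = viterbi_inheritance(obs)
print(crossover_positions(path))
```

Because a switch costs much more than a single miscall under these assumed parameters, the isolated error is absorbed and only the sustained change is reported as a crossover, which is the behavior the abstract attributes to modeling error-prone regions.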

A Method for Localizing Non-Reference Sequences to the Human Genome.

Chrisman, Brianna Sierra; Paskov, Kelley M; He, Chloe; Jung, Jae-Yoon; Stockham, Nate; Washington, Peter Yigitcan; Wall, Dennis Paul.

Pac Symp Biocomput ; 27: 313-324, 2022.

Article in English | MEDLINE | ID: mdl-34890159

ABSTRACT

As the last decade of human genomics research begins to bear the fruit of advancements in precision medicine, it is important to ensure that genomics' improvements in human health are distributed globally and equitably. An important step to ensuring health equity is to improve the human reference genome to capture global diversity by including a wide variety of alternative haplotypes, sequences that are not currently captured on the reference genome. We present a method that localizes 100-base-pair (bp) sequences extracted from short-read sequencing and that can ultimately be used to identify which regions of the human genome non-reference sequences belong to. We extract reads that don't align to the reference genome, and compute the population's distribution of 100-mers found within the unmapped reads. We use genetic data from families to identify shared genetic material between siblings and match the distribution of unmapped k-mers to these inheritance patterns to determine the most likely genomic region of a k-mer. We perform this localization with two highly interpretable methods of artificial intelligence: a computationally tractable Hidden Markov Model coupled to a Maximum Likelihood Estimator. Using a set of alternative haplotypes with known locations on the genome, we show that our algorithm is able to localize 96% of k-mers with over 90% accuracy and less than 1 Mb median resolution. As the collection of sequenced human genomes grows larger and more diverse, we hope that this method can be used to improve the human reference genome, a critical step in addressing precision medicine's diversity crisis.

Subject(s)

Artificial Intelligence; Genome, Human; Computational Biology; Genomics; High-Throughput Nucleotide Sequencing; Humans; Sequence Analysis, DNA
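
The k-mer step described in the abstract above (counting 100-mers in unmapped reads and summarizing them across a population) could look roughly like the sketch below. It is an illustration under assumptions, not the authors' pipeline: reads are plain strings already extracted from the alignment, k-mers containing `N` are skipped, reverse complements are not merged, and the function names are hypothetical.

```python
from collections import Counter

def count_kmers(reads, k=100):
    """Count every k-mer occurring in a list of read sequences."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:  # skip windows with ambiguous bases
                counts[kmer] += 1
    return counts

def kmer_sample_prevalence(reads_by_sample, k=100):
    """For each k-mer, count how many samples contain it at least once."""
    prevalence = Counter()
    for reads in reads_by_sample.values():
        # A set makes each sample contribute at most once per k-mer.
        prevalence.update(set(count_kmers(reads, k)))
    return prevalence

# Tiny example with k=4 so the output stays readable.
samples = {"s1": ["ACGTAC"], "s2": ["ACGT", "ACGTN"]}
print(kmer_sample_prevalence(samples, k=4))
```

A per-sample presence or absence table of this kind is what would then be compared against sibling sharing patterns; that matching step is not shown here.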

Outgroup Machine Learning Approach Identifies Single Nucleotide Variants in Noncoding DNA Associated with Autism Spectrum Disorder.

Varma, Maya; Paskov, Kelley Marie; Jung, Jae-Yoon; Sierra Chrisman, Brianna; Stockham, Nate Tyler; Washington, Peter Yigitcan; Wall, Dennis Paul.

Pac Symp Biocomput ; 24: 260-271, 2019.

Article in English | MEDLINE | ID: mdl-30864328

ABSTRACT

Autism spectrum disorder (ASD) is a heritable neurodevelopmental disorder affecting 1 in 59 children. While noncoding genetic variation has been shown to play a major role in many complex disorders, the contribution of these regions to ASD susceptibility remains unclear. Genetic analyses of ASD typically use unaffected family members as controls; however, we hypothesize that this method does not effectively elevate variant signal in the noncoding region due to family members having subclinical phenotypes arising from common genetic mechanisms. In this study, we use a separate, unrelated outgroup of individuals with progressive supranuclear palsy (PSP), a neurodegenerative condition with no known etiological overlap with ASD, as a control population. We use whole genome sequencing data from a large cohort of 2182 children with ASD and 379 controls with PSP, sequenced at the same facility with the same machines and variant calling pipeline, in order to investigate the role of noncoding variation in the ASD phenotype. We analyze seven major types of noncoding variants: microRNAs, human accelerated regions, hypersensitive sites, transcription factor binding sites, DNA repeat sequences, simple repeat sequences, and CpG islands. After identifying and removing batch effects between the two groups, we trained an ℓ1-regularized logistic regression classifier to predict ASD status from each set of variants. The classifier trained on simple repeat sequences performed well on a held-out test set (AUC-ROC = 0.960); this classifier was also able to differentiate ASD cases from controls when applied to a completely independent dataset (AUC-ROC = 0.960). This suggests that variation in simple repeat regions is predictive of the ASD phenotype and may contribute to ASD risk. Our results show the importance of the noncoding region and the utility of independent control groups in effectively linking genetic variation to disease phenotype for complex disorders.

Subject(s)

Autism Spectrum Disorder/genetics; DNA/genetics; Genetic Variation; Machine Learning; Case-Control Studies; Child; Cohort Studies; Computational Biology; CpG Islands; Female; Gene Regulatory Networks; Genetic Association Studies; Genetic Predisposition to Disease; Humans; Logistic Models; Male; MicroRNAs/genetics; Microsatellite Repeats; Phenotype; Polymorphism, Single Nucleotide; RNA, Untranslated/genetics; Supranuclear Palsy, Progressive/genetics; Whole Genome Sequencing
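
To make the ℓ1-regularized logistic regression mentioned in the abstract above concrete, here is a small dependency-free sketch that fits such a model with proximal gradient descent (soft-thresholding). It is not the study's code: the solver, the toy data, and the `lam`, `lr`, and `n_iter` values are assumptions chosen only to show how the ℓ1 penalty drives uninformative weights to exactly zero.

```python
import math

def soft_threshold(w, t):
    """Shrink w toward zero by t; values within [-t, t] become zero."""
    return math.copysign(max(abs(w) - t, 0.0), w)

def fit_l1_logistic(X, y, lam=0.1, lr=0.1, n_iter=500):
    """Fit logistic regression with an l1 penalty on the weights.

    X is a list of feature rows, y a list of 0/1 labels.
    The intercept b is not penalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        # Gradient step on the log-loss, then the l1 proximal step.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b

# Feature 0 tracks the label; feature 1 is unrelated to it.
X = [[1, 0], [1, 1], [0, 0], [0, 1]]
y = [1, 1, 0, 0]
w, b = fit_l1_logistic(X, y)
print(w, b)
```

On this toy input the weight for the informative feature grows while the weight for the unrelated feature stays at zero, which is the sparsity property that makes ℓ1 models easy to interpret when there are many candidate variants.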
