De Novo Genome Assemblies From Two Indigenous Americans from Arizona Identify New Polymorphisms in Non-Reference Sequences.
Genome Biol Evol; 16(9), 2024 Sep 03.
Article in English | MEDLINE | ID: mdl-39190003
ABSTRACT
There is a collective push to diversify human genetic studies by including underrepresented populations. However, analyzing DNA sequence reads involves the initial step of aligning the reads to the GRCh38/hg38 reference genome, which is inadequate for non-European ancestries. In this study, using long-read sequencing technology, we constructed de novo genome assemblies from two Indigenous Americans from Arizona (IAZ). Each assembly included ∼17 Mb of DNA sequence not present in hg38 [nonreference sequence (NRS)], most of which consists of repeat elements. Forty NRSs totaling 240 kb were uniquely anchored to the hg38 primary assembly, generating a modified hg38-NRS reference genome. DNA sequence alignment and variant calling were then conducted on whole-genome sequencing (WGS) data from 387 IAZ individuals using both the hg38 and the modified hg38-NRS reference maps. Variant calling with the hg38-NRS map identified ∼50,000 single-nucleotide variants, present in at least 5% of the WGS samples, that were not detected with the hg38 reference map. We also directly assessed the NRSs positioned within genes. Seventeen NRSs anchored to genic regions, including an identical 187 bp NRS found in both de novo assemblies. This NRS is located in HCN2, 79 bp downstream of Exon 3, and contains several putative transcriptional regulatory elements. Genotyping of the HCN2-NRS revealed that the insertion is enriched in IAZ (minor allele frequency = 0.45) compared with the other reference populations tested. This study shows that including population-specific NRSs can dramatically change the variant profile in underrepresented ethnic groups and thereby lead to the discovery of previously missed common variation.
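The abstract does not spell out how the 40 anchored NRSs were merged into hg38 to form the hg38-NRS map. As a minimal sketch, assuming each NRS is a simple insertion at a known 0-based anchor position on a single in-memory sequence, the Python below shows one way a modified reference could be assembled and how original coordinates shift as a result. The names `build_modified_reference` and `lift_position` are hypothetical and are not taken from any tool used in the study.

```python
from typing import List, Tuple

# Hypothetical representation: (anchor, nrs) means "insert nrs immediately
# before the base at 0-based position `anchor` of the ORIGINAL reference".
Insertion = Tuple[int, str]


def build_modified_reference(ref: str, insertions: List[Insertion]) -> str:
    """Splice every NRS into `ref` and return the modified sequence."""
    for anchor, _ in insertions:
        if not 0 <= anchor <= len(ref):
            raise ValueError(f"anchor {anchor} outside reference of length {len(ref)}")
    seq = ref
    # Apply from the rightmost anchor leftwards so that anchors still to be
    # processed are not shifted by insertions already made.
    for anchor, nrs in sorted(insertions, key=lambda item: item[0], reverse=True):
        seq = seq[:anchor] + nrs + seq[anchor:]
    return seq


def lift_position(pos: int, insertions: List[Insertion]) -> int:
    """Map a 0-based position on the original reference to the modified one.

    Every NRS anchored at or before `pos` pushes that base to the right by
    the length of the inserted sequence.
    """
    shift = sum(len(nrs) for anchor, nrs in insertions if anchor <= pos)
    return pos + shift


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    insertions = [(2, "TT"), (6, "GGG")]
    modified = build_modified_reference(ref, insertions)
    print(modified)                      # ACTTGTACGGGGTAC
    print(lift_position(6, insertions))  # 11
```

Working right to left keeps every anchor expressed in original-reference coordinates, which is also what `lift_position` expects. A real pipeline would instead edit FASTA files, track each insertion per chromosome, and rebuild the aligner's index before realigning reads.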
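Two frequency summaries in the abstract are the HCN2-NRS minor allele frequency and the requirement that newly detected SNVs be present in at least 5% of the WGS samples. A short Python sketch can illustrate both. It assumes biallelic diploid genotypes coded as alternative-allele dosages (0, 1 or 2) with no missing calls, and it reads "present in at least 5% of the samples" as a carrier fraction. That reading, the function names, and the toy genotypes (chosen so their MAF happens to equal 0.45) are all assumptions for illustration, not the study's data or code.

```python
from typing import Sequence


def _check(dosages: Sequence[int]) -> None:
    if not dosages:
        raise ValueError("no genotyped samples")
    if any(d not in (0, 1, 2) for d in dosages):
        raise ValueError("dosages must be 0, 1 or 2 for diploid calls")


def minor_allele_frequency(dosages: Sequence[int]) -> float:
    """MAF from per-sample alternative-allele dosages (0, 1 or 2 copies)."""
    _check(dosages)
    alt_freq = sum(dosages) / (2 * len(dosages))  # two alleles per diploid sample
    return min(alt_freq, 1.0 - alt_freq)


def carrier_fraction(dosages: Sequence[int]) -> float:
    """Fraction of samples carrying at least one copy of the alternative allele."""
    _check(dosages)
    return sum(1 for d in dosages if d > 0) / len(dosages)


def passes_sample_threshold(dosages: Sequence[int], min_fraction: float = 0.05) -> bool:
    """Keep a variant only if it is present in at least `min_fraction` of samples."""
    return carrier_fraction(dosages) >= min_fraction


if __name__ == "__main__":
    # Made-up dosages for ten samples, for illustration only.
    toy = [0, 1, 2, 1, 0, 0, 1, 2, 0, 2]
    print(round(minor_allele_frequency(toy), 2))  # 0.45
    print(carrier_fraction(toy))                  # 0.6
    print(passes_sample_threshold(toy))           # True
```

Allele frequency and carrier fraction differ: here 9 of 20 alleles are alternative (MAF 0.45), while 6 of 10 samples carry at least one copy. This is why the sketch keeps the two measures separate.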
Collection: 01-internacional
Database: MEDLINE
Main subject: Genome, Human
Limits: Humans
Country/Region as subject: North America
Language: English
Journal: Genome Biol Evol
Journal subject: Biology / Molecular Biology
Year: 2024
Document type: Article
Affiliation country: United States
Country of publication: United Kingdom