Results 1 - 20 of 64
1.
Am J Hum Genet ; 111(5): 979-989, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38604166

ABSTRACT

Genotype imputation is now fundamental for genome-wide association studies but lacks fairness due to the underrepresentation of references from non-European ancestries. The state-of-the-art imputation reference panel released by the Trans-Omics for Precision Medicine (TOPMed) initiative improved the imputation of admixed African-ancestry and Hispanic/Latino samples, but imputation for populations primarily residing outside of North America may still fall short in performance due to persisting underrepresentation. To illustrate this point, we imputed the genotypes of over 43,000 individuals across 123 populations around the world and identified numerous populations where imputation accuracy paled in comparison to that of European-ancestry populations. For instance, the mean imputation r-squared (Rsq) for variants with minor allele frequencies between 1% and 5% in Saudi Arabians (n = 1,061), Vietnamese (n = 1,264), Thai (n = 2,435), and Papua New Guineans (n = 776) were 0.79, 0.78, 0.76, and 0.62, respectively, compared to 0.90-0.93 for comparable European populations matched in sample size and SNP array content. Outside of Africa and Latin America, Rsq appeared to decrease as genetic distances to European-ancestry reference increased, as predicted. Using sequencing data as ground truth, we also showed that Rsq may over-estimate imputation accuracy for non-European populations more than European populations, suggesting further disparity in accuracy between populations. Using 1,496 sequenced individuals from Taiwan Biobank as a second reference panel to TOPMed, we also assessed a strategy to improve imputation for non-European populations with meta-imputation, but this design did not improve accuracy across frequency spectra. Taken together, our analyses suggest that we must ultimately strive to increase diversity and size to promote equity within genetics research.
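
The comparison against sequencing data described in this abstract can be illustrated with a minimal sketch: for one variant, the squared Pearson correlation between imputed dosages and sequenced genotypes. This is not the Rsq reported by imputation software, which is estimated without access to true genotypes, and it is not the authors' pipeline; the function name `squared_correlation` and the toy inputs are assumptions made for illustration.

```python
from statistics import mean


def squared_correlation(imputed_dosages, true_genotypes):
    """Squared Pearson correlation between imputed dosages (0..2)
    and sequenced genotypes coded as 0, 1, or 2 alternate alleles.

    Hypothetical helper for illustration only.
    """
    if len(imputed_dosages) != len(true_genotypes):
        raise ValueError("inputs must have the same length")
    mx = mean(imputed_dosages)
    my = mean(true_genotypes)
    sxy = sum((x - mx) * (y - my)
              for x, y in zip(imputed_dosages, true_genotypes))
    sxx = sum((x - mx) ** 2 for x in imputed_dosages)
    syy = sum((y - my) ** 2 for y in true_genotypes)
    # A monomorphic variant has zero variance, so the correlation is undefined.
    if sxx == 0 or syy == 0:
        return float("nan")
    return (sxy * sxy) / (sxx * syy)


# Toy example: four individuals, one variant (made-up values).
r2 = squared_correlation([0.1, 0.9, 1.8, 0.2], [0, 1, 2, 0])
```

Averaging such per-variant values within a minor-allele-frequency bin would give an empirical accuracy that can be set beside the software-reported Rsq for the same bin.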


Subjects
Gene Frequency, Population Genetics, Genome-Wide Association Study, Single Nucleotide Polymorphism, Humans, Genotype, Human Genome, White People/genetics
2.
Cell Genom ; 4(4): 100526, 2024 Apr 10.
Article in English | MEDLINE | ID: mdl-38537633

ABSTRACT

Hispanic/Latino children have the highest risk of acute lymphoblastic leukemia (ALL) in the US compared to other racial/ethnic groups, yet the basis of this remains incompletely understood. Through genetic fine-mapping analyses, we identified a new independent childhood ALL risk signal near IKZF1 in self-reported Hispanic/Latino individuals, but not in non-Hispanic White individuals, with an effect size of ∼1.44 (95% confidence interval = 1.33-1.55) and a risk allele frequency of ∼18% in Hispanic/Latino populations and <0.5% in European populations. This risk allele was positively associated with Indigenous American ancestry, showed evidence of selection in human history, and was associated with reduced IKZF1 expression. We identified a putative causal variant in a downstream enhancer that is most active in pro-B cells and interacts with the IKZF1 promoter. This variant disrupts IKZF1 autoregulation at this enhancer and results in reduced enhancer activity in B cell progenitors. Our study reveals a genetic basis for the increased ALL risk in Hispanic/Latino children.


Subjects
Genetic Predisposition to Disease, Precursor Cell Lymphoblastic Leukemia-Lymphoma, Humans, Child, Genetic Predisposition to Disease/genetics, Single Nucleotide Polymorphism, Transcription Factors/genetics, Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics, Hispanic or Latino/genetics, Ikaros Transcription Factor/genetics
3.
Genome Biol Evol ; 16(2), 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38242694

ABSTRACT

The ancestral recombination graph (ARG) is a structure that represents the history of coalescent and recombination events connecting a set of sequences (Hudson RR. In: Futuyma D, Antonovics J, editors. Gene genealogies and the coalescent process. In: Oxford Surveys in Evolutionary Biology; 1991. p. 1-44.). The full ARG can be represented as a set of genealogical trees at every locus in the genome, annotated with recombination events that change the topology of the trees between adjacent loci and the mutations that occurred along the branches of those trees (Griffiths RC, Marjoram P. An ancestral recombination graph. In: Donnelly P, Tavare S, editors. Progress in population genetics and human evolution. Springer; 1997. p. 257-270.). Valuable insights can be gained into past evolutionary processes, such as demographic events or the influence of natural selection, by studying the ARG. It is regarded as the "holy grail" of population genetics (Hubisz M, Siepel A. Inference of ancestral recombination graphs using ARGweaver. In: Dutheil JY, editors. Statistical population genomics. New York, NY: Springer US; 2020. p. 231-266.) since it encodes the processes that generate all patterns of allelic and haplotypic variation from which all commonly used summary statistics in population genetic research (e.g. heterozygosity and linkage disequilibrium) can be derived. Many previous evolutionary inferences relied on summary statistics extracted from the genotype matrix. Evolutionary inferences using the ARG represent a significant advancement as the ARG is a representation of the evolutionary history of a sample that shows the past history of recombination, coalescence, and mutation events across a particular sequence. This representation in theory contains as much information, if not more, than the combination of all independent summary statistics that could be derived from the genotype matrix.
Consistent with this idea, some of the first ARG-based analyses have proven to be more powerful than summary statistic-based analyses (Speidel L, Forest M, Shi S, Myers SR. A method for genome-wide genealogy estimation for thousands of samples. Nat Genet. 2019:51(9):1321-1329.; Stern AJ, Wilton PR, Nielsen R. An approximate full-likelihood method for inferring selection and allele frequency trajectories from DNA sequence data. PLoS Genet. 2019:15(9):e1008384.; Hubisz MJ, Williams AL, Siepel A. Mapping gene flow between ancient hominins through demography-aware inference of the ancestral recombination graph. PLoS Genet. 2020:16(8):e1008895.; Fan C, Mancuso N, Chiang CWK. A genealogical estimate of genetic relationships. Am J Hum Genet. 2022:109(5):812-824.; Fan C, Cahoon JL, Dinh BL, Ortega-Del Vecchyo D, Huber C, Edge MD, Mancuso N, Chiang CWK. A likelihood-based framework for demographic inference from genealogical trees. bioRxiv. 2023.10.10.561787. 2023.; Hejase HA, Mo Z, Campagna L, Siepel A. A deep-learning approach for inference of selective sweeps from the ancestral recombination graph. Mol Biol Evol. 2022:39(1):msab332.; Link V, Schraiber JG, Fan C, Dinh B, Mancuso N, Chiang CWK, Edge MD. Tree-based QTL mapping with expected local genetic relatedness matrices. bioRxiv. 2023.04.07.536093. 2023.; Zhang BC, Biddanda A, Gunnarsson ÁF, Cooper F, Palamara PF. Biobank-scale inference of ancestral recombination graphs enables genealogical analysis of complex traits. Nat Genet. 2023:55(5):768-776.). As such, there has been significant interest in the field to investigate 2 main problems related to the ARG: (i) How can we estimate the ARG based on genomic data, and (ii) how can we extract information of past evolutionary processes from the ARG?
In this perspective, we highlight 3 topics that pertain to these main issues: The development of computational innovations that enable the estimation of the ARG; remaining challenges in estimating the ARG; and methodological advances for deducing evolutionary forces and mechanisms using the ARG. This perspective serves to introduce the readers to the types of questions that can be explored using the ARG and to highlight some of the most pressing issues that must be addressed in order to make ARG-based inference an indispensable tool for evolutionary research.


Subjects
Algorithms, Genetic Recombination, Humans, Likelihood Functions, Chromosome Mapping, Mutation, Genetic Models
5.
Hum Genet ; 143(1): 85-99, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38157018

ABSTRACT

Recombination events establish the patterns of haplotypic structure in a population and estimates of recombination rates are used in several downstream population and statistical genetic analyses. Using suboptimal maps from distantly related populations may reduce the efficacy of genomic analyses, particularly for underrepresented populations such as the Native Hawaiians. To overcome this challenge, we constructed recombination maps using genome-wide array data from two study samples of Native Hawaiians: one reflecting the current admixed state of Native Hawaiians (NH map) and one based on individuals of enriched Polynesian ancestries (PNS map) with the potential to be used for less admixed Polynesian populations such as the Samoans. We found the recombination landscape to be less correlated with those from other continental populations (e.g. Spearman's rho = 0.79 between PNS and CEU (Utah residents with Northern and Western European ancestry) compared to 0.92 between YRI (Yoruba in Ibadan, Nigeria) and CEU at 50 kb resolution), likely driven by the unique demographic history of the Native Hawaiians. PNS also shared the fewest recombination hotspots with other populations (e.g. 8% of hotspots shared between PNS and CEU compared to 27% of hotspots shared between YRI and CEU). We found that downstream analyses in the Native Hawaiian population, such as local ancestry inference, imputation, and IBD segment and relatedness detections, would achieve similar efficacy when using the NH map compared to an omnibus map. However, for genome scans of adaptive loci using integrated haplotype scores, we found several loci with apparent genome-wide significant signals (|Z-score|> 4) in Native Hawaiians that would not have been significant when analyzed using NH-specific maps. 
Population-specific recombination maps may therefore improve the robustness of haplotype-based statistics and help us better characterize the evolutionary history that may underlie Native Hawaiian-specific health conditions that persist today.


Subjects
Genomics, Native Hawaiian or Other Pacific Islander, Genetic Recombination, Humans, Hawaii/epidemiology, Native Hawaiian or Other Pacific Islander/genetics
6.
Am J Hum Genet ; 110(12): 2077-2091, 2023 Dec 07.
Article in English | MEDLINE | ID: mdl-38065072

ABSTRACT

Understanding the genetic basis of complex phenotypes is a central pursuit of genetics. Genome-wide association studies (GWASs) are a powerful way to find genetic loci associated with phenotypes. GWASs are widely and successfully used, but they face challenges related to the fact that variants are tested for association with a phenotype independently, whereas in reality variants at different sites are correlated because of their shared evolutionary history. One way to model this shared history is through the ancestral recombination graph (ARG), which encodes a series of local coalescent trees. Recent computational and methodological breakthroughs have made it feasible to estimate approximate ARGs from large-scale samples. Here, we explore the potential of an ARG-based approach to quantitative-trait locus (QTL) mapping, echoing existing variance-components approaches. We propose a framework that relies on the conditional expectation of a local genetic relatedness matrix (local eGRM) given the ARG. Simulations show that our method is especially beneficial for finding QTLs in the presence of allelic heterogeneity. By framing QTL mapping in terms of the estimated ARG, we can also facilitate the detection of QTLs in understudied populations. We use local eGRM to analyze two chromosomes containing known body size loci in a sample of Native Hawaiians. Our investigations can provide intuition about the benefits of using estimated ARGs in population- and statistical-genetic methods in general.


Subjects
Population Genetics, Genome-Wide Association Study, Quantitative Trait Loci, Humans, Chromosome Mapping/methods, Genetic Models, Phenotype, Quantitative Trait Loci/genetics, Native Hawaiian or Other Pacific Islander/genetics
7.
bioRxiv ; 2023 Oct 13.
Article in English | MEDLINE | ID: mdl-37873208

ABSTRACT

The demographic history of a population drives the pattern of genetic variation and is encoded in the gene-genealogical trees of the sampled alleles. However, existing methods to infer demographic history from genetic data tend to use relatively low-dimensional summaries of the genealogy, such as allele frequency spectra. As a step toward capturing more of the information encoded in the genome-wide sequence of genealogical trees, here we propose a novel framework called the genealogical likelihood (gLike), which derives the full likelihood of a genealogical tree under any hypothesized demographic history. Employing a graph-based structure, gLike summarizes across independent trees the relationships among all lineages in a tree with all possible trajectories of population memberships through time and efficiently computes the exact marginal probability under a parameterized demographic model. Through extensive simulations and empirical applications on populations that have experienced multiple admixtures, we showed that gLike can accurately estimate dozens of demographic parameters when the true genealogy is known, including ancestral population sizes, admixture timing, and admixture proportions. Moreover, when using genealogical trees inferred from genetic data, we showed that gLike outperformed conventional demographic inference methods that leverage only the allele-frequency spectrum and yielded parameter estimates that align with established historical knowledge of the past demographic histories for populations like Latino Americans and Native Hawaiians. Furthermore, our framework can trace ancestral histories by analyzing a sample from the admixed population without proxies for its source populations, removing the need to sample ancestral populations that may no longer exist. 
Taken together, our proposed gLike framework harnesses underutilized genealogical information to offer exceptional sensitivity and accuracy in inferring complex demographies for humans and other species, particularly as estimation of genome-wide genealogies improves.

8.
medRxiv ; 2023 Oct 04.
Article in English | MEDLINE | ID: mdl-37873425

ABSTRACT

The neural tissue is rich in polyunsaturated fatty acids (PUFAs), components that are indispensable for the proper functioning of neurons, such as neurotransmission. PUFA nutritional deficiency and imbalance have been linked to a variety of chronic brain disorders, including major depressive disorder (MDD), anxiety, and anorexia. However, the effects of PUFAs on brain disorders remain inconclusive, and the extent of their shared genetic determinants is largely unknown. Here, we used genome-wide association summary statistics to systematically examine the shared genetic basis between six phenotypes of circulating PUFAs (N = 114,999) and 20 brain disorders (N = 9,725-762,917), infer their potential causal relationships, identify colocalized regions, and pinpoint shared genetic variants. Genetic correlation and polygenic overlap analyses revealed a widespread shared genetic basis for 77 trait pairs between six PUFA phenotypes and 16 brain disorders. Two-sample Mendelian randomization analysis indicated potential causal relationships for 16 pairs of PUFAs and brain disorders, including alcohol consumption, bipolar disorder (BIP), and MDD. Colocalization analysis identified 40 shared loci (13 unique) among six PUFAs and ten brain disorders. Twenty-two unique variants were statistically inferred as candidate shared causal variants, including rs1260326 (GCKR), rs174564 (FADS2) and rs4818766 (ADARB1). These findings reveal a widespread shared genetic basis between PUFAs and brain disorders, pinpoint specific shared variants, and provide support for the potential effects of PUFAs on certain brain disorders, especially MDD, BIP, and alcohol consumption.

9.
Am J Hum Genet ; 110(11): 1853-1862, 2023 Nov 02.
Article in English | MEDLINE | ID: mdl-37875120

ABSTRACT

The heritability explained by local ancestry markers in an admixed population ($h_\gamma^2$) provides crucial insight into the genetic architecture of a complex disease or trait. Estimation of $h_\gamma^2$ can be susceptible to biases due to population structure in ancestral populations. Here, we present heritability estimation from admixture mapping summary statistics (HAMSTA), an approach that uses summary statistics from admixture mapping to infer heritability explained by local ancestry while adjusting for biases due to ancestral stratification. Through extensive simulations, we demonstrate that HAMSTA $h_\gamma^2$ estimates are approximately unbiased and are robust to ancestral stratification compared to existing approaches. In the presence of ancestral stratification, we show a HAMSTA-derived sampling scheme provides a calibrated family-wise error rate (FWER) of ~5% for admixture mapping, unlike existing FWER estimation approaches. We apply HAMSTA to 20 quantitative phenotypes of up to 15,988 self-reported African American individuals in the Population Architecture using Genomics and Epidemiology (PAGE) study. We observe that $\hat{h}_\gamma^2$ in the 20 phenotypes ranges from 0.0025 to 0.033 (mean $\hat{h}_\gamma^2 = 0.012 \pm 9.2 \times 10^{-4}$), which translates to $\hat{h}^2$ ranging from 0.062 to 0.85 (mean $\hat{h}^2 = 0.30 \pm 0.023$). Across these phenotypes we find little evidence of inflation due to ancestral population stratification in current admixture mapping studies (mean inflation factor of 0.99 ± 0.001). Overall, HAMSTA provides a fast and powerful approach to estimate genome-wide heritability and evaluate biases in test statistics of admixture mapping studies.


Subjects
Black or African American, Population Genetics, Humans, Chromosome Mapping, Phenotype, Single Nucleotide Polymorphism/genetics
10.
HGG Adv ; 4(4): 100239, 2023 Oct 12.
Article in English | MEDLINE | ID: mdl-37710962

ABSTRACT

The utility of polygenic risk score (PRS) models has not been comprehensively evaluated for childhood acute lymphoblastic leukemia (ALL), the most common type of cancer in children. Previous PRS models for ALL were based on significant loci observed in genome-wide association studies (GWASs), even though genomic PRS models have been shown to improve prediction performance for a number of complex diseases. In the United States, Latino (LAT) children have the highest risk of ALL, but the transferability of PRS models to LAT children has not been studied. In this study, we constructed and evaluated genomic PRS models based on either non-Latino White (NLW) GWAS or a multi-ancestry GWAS. We found that the best PRS models performed similarly between held-out NLW and LAT samples (pseudo-$R^2$ = 0.086 ± 0.023 in NLW vs. 0.060 ± 0.020 in LAT), and can be improved for LAT if we performed GWAS in LAT-only (pseudo-$R^2$ = 0.116 ± 0.026) or multi-ancestry samples (pseudo-$R^2$ = 0.131 ± 0.025). However, the best genomic models currently do not have better prediction accuracy than a conventional model using all known ALL-associated loci in the literature (pseudo-$R^2$ = 0.166 ± 0.025), which includes loci from GWAS populations that we could not access to train genomic PRS models. Our results suggest that larger and more inclusive GWASs may be needed for genomic PRS to be useful for ALL. Moreover, the comparable performance between populations may suggest a more oligogenic architecture for ALL, where some large effect loci may be shared between populations. Future PRS models that move away from the infinite causal loci assumption may further improve PRS for ALL.


Subjects
Genetic Risk Score, Precursor Cell Lymphoblastic Leukemia-Lymphoma, Child, Humans, United States/epidemiology, Genetic Predisposition to Disease/genetics, Genome-Wide Association Study, Genomics, Hispanic or Latino/genetics, Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics
11.
medRxiv ; 2023 Jun 12.
Article in English | MEDLINE | ID: mdl-37398036

ABSTRACT

The utility of polygenic risk score (PRS) models has not been comprehensively evaluated for childhood acute lymphoblastic leukemia (ALL), the most common type of cancer in children. Previous PRS models for ALL were based on significant loci observed in genome-wide association studies (GWAS), even though genomic PRS models have been shown to improve prediction performance for a number of complex diseases. In the United States, Latino (LAT) children have the highest risk of ALL, but the transferability of PRS models to LAT children has not been studied. In this study we constructed and evaluated genomic PRS models based on either non-Latino white (NLW) GWAS or a multi-ancestry GWAS. We found that the best PRS models performed similarly between held-out NLW and LAT samples (pseudo-$R^2$ = 0.086 ± 0.023 in NLW vs. 0.060 ± 0.020 in LAT), and can be improved for LAT if we performed GWAS in LAT-only (pseudo-$R^2$ = 0.116 ± 0.026) or multi-ancestry samples (pseudo-$R^2$ = 0.131 ± 0.025). However, the best genomic models currently do not have better prediction accuracy than a conventional model using all known ALL-associated loci in the literature (pseudo-$R^2$ = 0.166 ± 0.025), which includes loci from GWAS populations that we could not access to train genomic PRS models. Our results suggest that larger and more inclusive GWAS may be needed for genomic PRS to be useful for ALL. Moreover, the comparable performance between populations may suggest a more oligogenic architecture for ALL, where some large effect loci may be shared between populations. Future PRS models that move away from the infinite causal loci assumption may further improve PRS for ALL.

12.
bioRxiv ; 2023 Sep 02.
Article in English | MEDLINE | ID: mdl-37503129

ABSTRACT

Recombination events establish the patterns of haplotypic structure in a population and estimates of recombination rates are used in several downstream population and statistical genetic analyses. Using suboptimal maps from distantly related populations may reduce the efficacy of genomic analyses, particularly for underrepresented populations such as the Native Hawaiians. To overcome this challenge, we constructed recombination maps using genome-wide array data from two study samples of Native Hawaiians: one reflecting the current admixed state of Native Hawaiians (NH map), and one based on individuals of enriched Polynesian ancestries (PNS map) with the potential to be used for less admixed Polynesian populations such as the Samoans. We found the recombination landscape to be less correlated with those from other continental populations (e.g. Spearman's rho = 0.79 between PNS and CEU (Utah residents with Northern and Western European ancestry) compared to 0.92 between YRI (Yoruba in Ibadan, Nigeria) and CEU at 50 kb resolution), likely driven by the unique demographic history of the Native Hawaiians. PNS also shared the fewest recombination hotspots with other populations (e.g. 8% of hotspots shared between PNS and CEU compared to 27% of hotspots shared between YRI and CEU). We found that downstream analyses in the Native Hawaiian population, such as local ancestry inference, imputation, and IBD segment and relatedness detections, would achieve similar efficacy when using the NH map compared to an omnibus map. However, for genome scans of adaptive loci using integrated haplotype scores, we found several loci with apparent genome-wide significant signals (|Z-score| > 4) in Native Hawaiians that would not have been significant when analyzed using NH-specific maps. 
Population-specific recombination maps may therefore improve the robustness of haplotype-based statistics and help us better characterize the evolutionary history that may underlie Native Hawaiian-specific health conditions that persist today.

13.
bioRxiv ; 2023 Oct 26.
Article in English | MEDLINE | ID: mdl-37292811

ABSTRACT

Genotype imputation is now fundamental for genome-wide association studies but lacks fairness due to the underrepresentation of populations with non-European ancestries. The state-of-the-art imputation reference panel released by the Trans-Omics for Precision Medicine (TOPMed) initiative contains a substantial number of admixed African-ancestry and Hispanic/Latino samples to impute these populations with nearly the same accuracy as European-ancestry cohorts. However, imputation for populations primarily residing outside of North America may still fall short in performance due to persisting underrepresentation. To illustrate this point, we curated genome-wide array data from 23 publications published between 2008 and 2021. In total, we imputed over 43k individuals across 123 populations around the world. We identified a number of populations where imputation accuracy paled in comparison to that of European-ancestry populations. For instance, the mean imputation r-squared (Rsq) for 1-5% alleles in Saudi Arabians (N=1061), Vietnamese (N=1264), Thai (N=2435), and Papua New Guineans (N=776) were 0.79, 0.78, 0.76, and 0.62, respectively. In contrast, the mean Rsq ranged from 0.90 to 0.93 for comparable European populations matched in sample size and SNP content. Outside of Africa and Latin America, Rsq appeared to decrease as genetic distances to European reference increased, as predicted. Further analysis using sequencing data as ground truth suggested that imputation software may over-estimate imputation accuracy for non-European populations more than for European populations, suggesting further disparity between populations. Using 1496 whole genome sequenced individuals from Taiwan Biobank as a reference, we also assessed a strategy to improve imputation for non-European populations with meta-imputation, which can combine results from TOPMed with smaller population-specific reference panels. We found that meta-imputation in this design did not improve Rsq genome-wide. Taken together, our analysis suggests that with the current size of alternative reference panels, meta-imputation alone cannot improve imputation efficacy for underrepresented cohorts and we must ultimately strive to increase diversity and size to promote equity within genetics research.

14.
bioRxiv ; 2023 Apr 18.
Article in English | MEDLINE | ID: mdl-37131817

ABSTRACT

The heritability explained by local ancestry markers in an admixed population ($h_\gamma^2$) provides crucial insight into the genetic architecture of a complex disease or trait. Estimation of $h_\gamma^2$ can be susceptible to biases due to population structure in ancestral populations. Here, we present a novel approach, Heritability estimation from Admixture Mapping Summary STAtistics (HAMSTA), which uses summary statistics from admixture mapping to infer heritability explained by local ancestry while adjusting for biases due to ancestral stratification. Through extensive simulations, we demonstrate that HAMSTA $h_\gamma^2$ estimates are approximately unbiased and are robust to ancestral stratification compared to existing approaches. In the presence of ancestral stratification, we show a HAMSTA-derived sampling scheme provides a calibrated family-wise error rate (FWER) of ~5% for admixture mapping, unlike existing FWER estimation approaches. We apply HAMSTA to 20 quantitative phenotypes of up to 15,988 self-reported African American individuals in the Population Architecture using Genomics and Epidemiology (PAGE) study. We observe that $\hat{h}_\gamma^2$ in the 20 phenotypes ranges from 0.0025 to 0.033 (mean $\hat{h}_\gamma^2 = 0.012 \pm 9.2 \times 10^{-4}$), which translates to $\hat{h}^2$ ranging from 0.062 to 0.85 (mean $\hat{h}^2 = 0.30 \pm 0.023$). Across these phenotypes we find little evidence of inflation due to ancestral population stratification in current admixture mapping studies (mean inflation factor of 0.99 ± 0.001). Overall, HAMSTA provides a fast and powerful approach to estimate genome-wide heritability and evaluate biases in test statistics of admixture mapping studies.

15.
bioRxiv ; 2023 Apr 08.
Article in English | MEDLINE | ID: mdl-37066144

ABSTRACT

Understanding the genetic basis of complex phenotypes is a central pursuit of genetics. Genome-wide Association Studies (GWAS) are a powerful way to find genetic loci associated with phenotypes. GWAS are widely and successfully used, but they face challenges related to the fact that variants are tested for association with a phenotype independently, whereas in reality variants at different sites are correlated because of their shared evolutionary history. One way to model this shared history is through the ancestral recombination graph (ARG), which encodes a series of local coalescent trees. Recent computational and methodological breakthroughs have made it feasible to estimate approximate ARGs from large-scale samples. Here, we explore the potential of an ARG-based approach to quantitative-trait locus (QTL) mapping, echoing existing variance-components approaches. We propose a framework that relies on the conditional expectation of a local genetic relatedness matrix given the ARG (local eGRM). Simulations show that our method is especially beneficial for finding QTLs in the presence of allelic heterogeneity. By framing QTL mapping in terms of the estimated ARG, we can also facilitate the detection of QTLs in understudied populations. We use local eGRM to identify a large-effect BMI locus, the CREBRF gene, in a sample of Native Hawaiians in which it was not previously detectable by GWAS because of a lack of population-specific imputation resources. Our investigations can provide intuition about the benefits of using estimated ARGs in population- and statistical-genetic methods in general.

17.
HGG Adv ; 4(1): 100159, 2023 Jan 12.
Article in English | MEDLINE | ID: mdl-36465187

ABSTRACT

Over the last two decades, the human reference genome has undergone multiple updates as we complete a linear representation of our genome. Two versions of human references are currently used in the biomedical literature, GRCh37/hg19 and GRCh38. Conversions between these versions are critical for quality control, imputation, and association analysis. In the present study, we show that single-nucleotide variants (SNVs) in regions inverted between different builds of the reference genome are often mishandled bioinformatically. Depending on the array type, SNVs are found in approximately 2-5 Mb of the genome that are inverted between reference builds. Coordinate conversions of these variants are mishandled by both the TOPMed imputation server and routine in-house quality control pipelines, leading to underrecognized downstream analytical consequences. Specifically, we observe that undetected allelic conversion errors for palindromic (i.e., A/T or C/G) variants in these inverted regions would destabilize the local haplotype structure, leading to loss of imputation accuracy and power in association analyses. Though only a small proportion of the genome is affected, these regions include important disease susceptibility variants that would be affected. For example, the p value of a known locus associated with prostate cancer on chromosome 10 (chr10) would drop from $2.86 \times 10^{-7}$ to 0.0011 in a case-control analysis of 20,286 Africans and African Americans (10,643 cases and 9,643 controls). We devise a straightforward heuristic based on the popular tool, liftOver, that can easily detect and correct these variants in the inverted regions between genome builds to locally improve imputation accuracy.
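
Why palindromic variants escape detection can be shown with a short sketch: when the two alleles are complements of each other, a strand flip yields the same allele pair, so an error introduced by an inversion between builds is not visible from the alleles alone. The code below only flags such variants inside a list of inverted intervals. It does not call liftOver and is not the authors' heuristic; the function names, the tuple layout, and the example coordinates are all hypothetical.

```python
# Watson-Crick complements of the four bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_palindromic(allele1, allele2):
    """True when the alleles are complements (A/T or C/G), so a strand
    flip leaves the allele pair looking unchanged."""
    a1, a2 = allele1.upper(), allele2.upper()
    return a1 in COMPLEMENT and COMPLEMENT[a1] == a2


def flag_ambiguous(variants, inverted_regions):
    """Return the variants that are palindromic and lie inside an
    inverted interval.

    variants: iterable of (chrom, pos, allele1, allele2)
    inverted_regions: iterable of (chrom, start, end), ends inclusive
    """
    flagged = []
    for chrom, pos, a1, a2 in variants:
        in_inversion = any(
            c == chrom and start <= pos <= end
            for c, start, end in inverted_regions
        )
        if in_inversion and is_palindromic(a1, a2):
            flagged.append((chrom, pos, a1, a2))
    return flagged


# Made-up coordinates, for illustration only.
regions = [("chr10", 1000, 2000)]
snvs = [
    ("chr10", 1500, "A", "T"),  # palindromic, inside the interval
    ("chr10", 1600, "A", "G"),  # not palindromic
    ("chr10", 5000, "C", "G"),  # palindromic, outside the interval
]
flagged = flag_ambiguous(snvs, regions)
```

Non-palindromic variants such as A/G can be checked directly, because a strand flip turns them into T/C, which no longer matches the reference alleles.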


Subjects
Genome-Wide Association Study, Genomics, Male, Humans, Human Genome/genetics, Haplotypes/genetics, Black or African American
18.
medRxiv ; 2023 Dec 28.
Article in English | MEDLINE | ID: mdl-38234828

ABSTRACT

Polygenic scores (PGS) are promising for stratifying individuals based on their genetic susceptibility to complex diseases or traits. However, PGS models, typically trained in European- or East Asian-ancestry populations, tend to perform poorly in other ethnic minority populations, and their accuracies have not been evaluated for Native Hawaiians. Using body mass index, height, and type-2 diabetes as examples of highly polygenic traits, we evaluated the prediction accuracies of PGS models in a large Native Hawaiian sample from the Multiethnic Cohort with up to 5,300 individuals. We evaluated both publicly available PGS models and genome-wide PGS models trained in this study using the largest available GWAS. We found evidence of lowered prediction accuracies for the PGS models in some cases, particularly for height. We also found that using the Native Hawaiian samples as an optimization cohort during training did not consistently improve PGS performance. Moreover, even the best performing PGS models among Native Hawaiians would have lowered prediction accuracy among the subset of individuals most enriched with Polynesian ancestry. Our findings indicate that factors such as admixture histories, sample size and diversity in GWAS can influence PGS performance for complex traits among Native Hawaiian samples. This study provides an initial survey of PGS performance among Native Hawaiians and exposes the current gaps and challenges associated with improving polygenic prediction models for underrepresented minority populations.
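
As a reminder of what is being evaluated, a polygenic score is a weighted sum of allele dosages, and one common accuracy measure for a quantitative trait is the squared correlation between score and phenotype. The abstract does not say which metric the study used, so the sketch below is an illustration under that assumption; the variant IDs, weights, and function names are hypothetical.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages for one individual.

    dosages: {variant_id: dosage in [0, 2]}
    weights: {variant_id: per-allele effect size from a PGS model}
    Variants absent from the model contribute nothing.
    """
    return sum(weights[v] * d for v, d in dosages.items() if v in weights)


def r_squared(scores, phenotypes):
    """Squared Pearson correlation between scores and a quantitative trait."""
    n = len(scores)
    mean_s = sum(scores) / n
    mean_y = sum(phenotypes) / n
    s_sy = sum((s - mean_s) * (y - mean_y) for s, y in zip(scores, phenotypes))
    s_ss = sum((s - mean_s) ** 2 for s in scores)
    s_yy = sum((y - mean_y) ** 2 for y in phenotypes)
    return s_sy * s_sy / (s_ss * s_yy)
```

Comparing this quantity between a full cohort and a subset, such as the individuals most enriched for one ancestry, is one way to expose the kind of accuracy gap the abstract describes.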

19.
Brief Bioinform ; 23(5), 2022 09 20.
Article in English | MEDLINE | ID: mdl-36056746

ABSTRACT

Identifying genomic regions influenced by natural selection provides fundamental insights into the genetic basis of local adaptation. However, it remains challenging to detect loci under complex spatially varying selection. We propose a deep learning-based framework, DeepGenomeScan, which can detect signatures of spatially varying selection. We demonstrate that DeepGenomeScan outperformed principal component analysis- and redundancy analysis-based genome scans in identifying loci underlying quantitative traits subject to complex spatial patterns of selection. Notably, DeepGenomeScan increases statistical power by up to 47.25% under nonlinear environmental selection patterns. We applied DeepGenomeScan to a European human genetic dataset and identified some well-known genes under selection and a substantial number of clinically important genes that were not identified by SPA, iHS, Fst and Bayenv when applied to the same dataset.


Subjects
Deep Learning , Genome , Genomics , Humans , Polymorphism, Single Nucleotide , Selection, Genetic
20.
PLoS Genet ; 18(9): e1010388, 2022 09.
Article in English | MEDLINE | ID: mdl-36070312

ABSTRACT

BACKGROUND: Pilocytic astrocytoma (PA) is the most common pediatric brain tumor. PA has at least a 50% higher incidence in populations of European ancestry compared to other ancestral groups, which may be due in part to genetic differences. METHODS: We first compared the global proportions of European, African, and Amerindian ancestries in 301 PA cases and 1,185 controls of self-identified Latino ethnicity from the California Biobank. We then conducted admixture mapping analysis to assess the association of PA risk with local ancestry. RESULTS: We found PA cases had a significantly higher proportion of global European ancestry than controls (case median = 0.55, control median = 0.51, P value = 3.5 × 10⁻³). Admixture mapping identified 13 SNPs in the 6q14.3 region (SNX14) contributing to risk, as well as three other peaks approaching significance on chromosomes 7, 10 and 13. Downstream fine mapping in these regions revealed several SNPs potentially contributing to childhood PA risk. CONCLUSIONS: There is a significant difference in genomic ancestry associated with Latino PA risk and several genomic loci potentially mediating this risk.


Subjects
Astrocytoma , Genome-Wide Association Study , Astrocytoma/genetics , Child , Chromosome Mapping , Hispanic or Latino/genetics , Humans , Polymorphism, Single Nucleotide/genetics