ABSTRACT
The genetic basis of phenotypic variation across populations remains poorly explained for most traits. Disparities may arise from several factors, ranging from environmental variation to divergent population genetic structure. We hypothesized that a population-level polygenic risk score (PRS), based solely on risk allele frequencies, can explain phenotypic variation among geographic populations. We applied a population-specific PRS (psPRS) to 26 populations from the 1000 Genomes Project for four phenotypes: lactase persistence (LP), melanoma, multiple sclerosis (MS) and height. Following convention, our models assumed an additive genetic architecture among the polymorphisms in each psPRS. Linear psPRSs explained a significant proportion of trait variance, ranging from 0.32 for height in men to 0.88 for melanoma. The best models for LP and height were linear, whereas those for melanoma and MS were nonlinear. Because not all variants in a PRS may confer similar, or even any, risk across diverse populations, we also filtered SNPs to assess whether psPRSs with fewer SNPs explained more variance. Variance explained usually improved with fewer SNPs and reached 0.99 for height in men using only 548 of the initial 4208 SNPs. That reducing the number of SNPs improves psPRS performance may indicate that missing heritability is partially due to complex genetic architecture that does not mandate additivity, to undiscovered variants, or to spurious associations in the databases. We demonstrated that PRS-based analyses can be used across diverse populations and phenotypes for population-level prediction and that these comparisons can identify universal risk variants.
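As a rough illustration of the idea, the sketch below computes an additive population-level score from risk allele frequencies and the variance in a population-level trait that a linear fit on that score explains. It is a minimal sketch under stated assumptions, not the procedure used in the study: the unweighted mean across SNPs, the function names `ps_prs` and `r_squared`, the population labels, and all numbers are hypothetical.

```python
from statistics import mean


def ps_prs(risk_allele_freqs):
    """Additive population-level score (assumed form): the unweighted
    mean risk allele frequency across the SNPs in the score."""
    return mean(risk_allele_freqs)


def r_squared(x, y):
    """Proportion of variance in y explained by a simple linear regression on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


# Hypothetical toy data: risk allele frequencies at three SNPs per population,
# and a made-up population-level trait value (for example, prevalence).
freqs = {
    "POP_A": [0.10, 0.20, 0.30],
    "POP_B": [0.30, 0.40, 0.50],
    "POP_C": [0.50, 0.60, 0.70],
    "POP_D": [0.70, 0.80, 0.90],
}
trait = {"POP_A": 0.12, "POP_B": 0.31, "POP_C": 0.48, "POP_D": 0.72}

pops = sorted(freqs)
scores = [ps_prs(freqs[p]) for p in pops]   # one score per population
values = [trait[p] for p in pops]           # matching trait values
variance_explained = r_squared(scores, values)
print(f"variance explained: {variance_explained:.3f}")
```

Filtering SNPs, as described above, would amount to recomputing `ps_prs` on a subset of each population's frequency list and comparing the resulting variance explained.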