A benchmark study on current GWAS models in admixed populations.

Yang, Zikun; Huaman, Basilio Cieza; Reyes-Dumeyer, Dolly; Montesinos, Rosa; Soto-Añari, Marcio; Custodio, Nilton; Tosto, Giuseppe

Affiliation

Yang Z; Taub Institute for Research on Alzheimer's Disease and the Aging Brain, College of Physicians and Surgeons, Columbia University. 630 West 168 Street, New York, NY 10032, USA.
Huaman BC; The Gertrude H. Sergievsky Center, College of Physicians and Surgeons, Columbia University. 630 West 168 Street, New York, NY 10032, USA.
Reyes-Dumeyer D; Taub Institute for Research on Alzheimer's Disease and the Aging Brain, College of Physicians and Surgeons, Columbia University. 630 West 168 Street, New York, NY 10032, USA.
Montesinos R; The Gertrude H. Sergievsky Center, College of Physicians and Surgeons, Columbia University. 630 West 168 Street, New York, NY 10032, USA.
Soto-Añari M; Taub Institute for Research on Alzheimer's Disease and the Aging Brain, College of Physicians and Surgeons, Columbia University. 630 West 168 Street, New York, NY 10032, USA.
Custodio N; The Gertrude H. Sergievsky Center, College of Physicians and Surgeons, Columbia University. 630 West 168 Street, New York, NY 10032, USA.
Tosto G; Department of Neurology, College of Physicians and Surgeons, Columbia University and the New York Presbyterian Hospital. 710 West 168 Street, New York, NY 10032, USA.

bioRxiv; 2023 Apr 30.

Article in English | MEDLINE | ID: mdl-37163101

ABSTRACT

Objective: The performance of popular genome-wide association study (GWAS) models has not yet been examined consistently under genetic admixture, which introduces several challenges, such as heterogeneity of minor allele frequency (MAF), a wide spectrum of case-control ratios, and varying effect sizes. Methods: We generated a cohort of synthetic individuals (N=19,234) that simulates 1) a large sample size; 2) two-way admixture (Native American-European ancestry); and 3) a binary phenotype. We then examined the inflation factors produced by three popular GWAS tools: GMMAT, SAIGE, and Tractor. We also estimated power under different MAFs, case-control ratios, and ancestry percentages. Next, we used a cohort of Peruvians (N=249) to further examine the performance of the tested models on 1) real genetic data and 2) small sample sizes. Finally, we validated these findings in an independent Peruvian cohort (N=109) included in the 1000 Genomes Project (1000G). Results: In the synthetic cohort, SAIGE performed better than GMMAT and Tractor in terms of type I error rate, especially under severely unbalanced case-control ratios. In contrast, power analysis identified Tractor as the best method for pinpointing ancestry-specific causal variants, although its power decreased when no adequate heterogeneity of the true effect sizes was simulated between ancestries. The real Peruvian data showed that Tractor is severely affected by small sample sizes and produced severely inflated statistics, which we replicated in the 1000G Peruvian cohort. Discussion: The current study illustrates the limitations of available GWAS tools under different scenarios of genetic admixture. We urge caution when interpreting results under complex population scenarios.
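
The abstract does not say how the inflation factors were computed. As an illustration only, the sketch below uses the conventional genomic inflation factor: the median observed 1-df chi-square statistic divided by its expected median under the null. The function name `genomic_inflation_factor` and the assumption that each tool's output has been reduced to a plain list of p-values are hypothetical and are not taken from the study.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Expected median of a 1-df chi-square distribution under the null (about 0.455),
# obtained as the square of the 75th percentile of the standard normal.
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation_factor(p_values):
    """Conventional lambda_GC from two-sided association p-values.

    Assumption: this is the standard textbook definition, not necessarily
    the exact procedure used by the authors.
    """
    chi2_stats = []
    for p in p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError(f"p-value out of range (0, 1]: {p}")
        # A two-sided p-value maps to |z| via the lower-tail quantile at p/2;
        # squaring z gives the equivalent 1-df chi-square statistic.
        z = _STD_NORMAL.inv_cdf(p / 2.0)
        chi2_stats.append(z * z)
    if not chi2_stats:
        raise ValueError("at least one p-value is required")
    return median(chi2_stats) / CHI2_1DF_MEDIAN
```

Under this definition, values close to 1 indicate well-calibrated test statistics, while values well above 1 correspond to the kind of inflation the abstract reports for small samples.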

Full text: 1 Collection: 01-internacional Database: MEDLINE Study type: Prognostic_studies Language: En Journal: BioRxiv Year: 2023 Document type: Article Country of affiliation: United States
