Assessing polygenic risk score models for applications in populations with under-represented genomics data: an example of Vietnam.
Pham, Duy; Truong, Buu; Tran, Khai; Ni, Guiyan; Nguyen, Dat; Tran, Trang T H; Tran, Mai H; Nguyen Thuy, Duong; Vo, Nam S; Nguyen, Quan.
Affiliation
  • Pham D; Institute for Molecular Bioscience, The University of Queensland, Carmody Rd, 4072, Queensland, Australia.
  • Truong B; UniSA STEM, University of South Australia, Mawson Lakes, 5095, South Australia, Australia.
  • Tran K; Center for Biomedical Informatics, Vingroup Big Data Institute, 458 Minh Khai, 10000, Hanoi, Vietnam.
  • Ni G; Institute for Molecular Bioscience, The University of Queensland, Carmody Rd, 4072, Queensland, Australia.
  • Nguyen D; Center for Biomedical Informatics, Vingroup Big Data Institute, 458 Minh Khai, 10000, Hanoi, Vietnam.
  • Vo NS; Center for Biomedical Informatics, Vingroup Big Data Institute, 458 Minh Khai, 10000, Hanoi, Vietnam.
  • Nguyen Q; Institute for Molecular Bioscience, The University of Queensland, Carmody Rd, 4072, Queensland, Australia.
Brief Bioinform; 23(6), 2022 Nov 19.
Article in English | MEDLINE | ID: mdl-36326078
ABSTRACT
Most polygenic risk score (PRS) models have been based on data from populations of European origin, which account for the majority of the large genomics datasets (e.g. >78% in the UK Biobank and >85% in the GTEx project). Although several large-scale Asian biobanks have been initiated (e.g. Japanese, Korean and Han Chinese biobanks), most other Asian countries have little or near-zero genomics data. To implement PRS models for under-represented populations, we explored transfer learning approaches, assuming that information from existing large datasets can compensate for the small sample sizes that can feasibly be obtained in developing countries such as Vietnam. Here, we benchmark 13 common PRS methods under a meta-population strategy (combining individual genotype data from multiple populations) and a multi-population strategy (combining summary statistics from multiple populations). Our results highlight the complementarity of different populations and indicate that the choice of method should depend on the target population. Based on these results, we discuss a set of guidelines to help users select the best method for their datasets. We developed robust and comprehensive software to allow benchmarking comparisons between methods, and we propose a computational framework for improving PRS performance in a dataset with a small sample size. This work is expected to inform the development of genomics applications in under-represented populations. The PRSUP framework is available at https://github.com/BiomedicalMachineLearning/VGP.
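As a rough illustration of the terms used in the abstract, the sketch below computes a PRS in its conventional form, a weighted sum of effect-allele dosages, and shows one standard way of combining per-population summary statistics, a fixed-effect inverse-variance-weighted average. Both function names and all numbers are hypothetical, and the combination rule is an assumption chosen for illustration; the abstract does not say which rule any of the 13 benchmarked methods or the PRSUP framework uses.

```python
from typing import Sequence, Tuple


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of effect-allele dosages (0, 1 or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def inverse_variance_meta(
    betas: Sequence[float], ses: Sequence[float]
) -> Tuple[float, float]:
    """Combine per-population effect estimates for ONE variant.

    Assumption: fixed-effect inverse-variance weighting, used here only as
    an example of merging summary statistics across populations.
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and equally long")
    inv_var = [1.0 / (se * se) for se in ses]
    total = sum(inv_var)
    beta = sum(b * w for b, w in zip(betas, inv_var)) / total
    se = (1.0 / total) ** 0.5
    return beta, se


# Toy numbers (hypothetical): one individual, three variants.
score = polygenic_score([0, 1, 2], [0.10, -0.20, 0.05])

# Toy numbers (hypothetical): one variant, a large and a small cohort.
combined_beta, combined_se = inverse_variance_meta([0.10, 0.20], [0.05, 0.10])
```

In this toy case the estimate with the smaller standard error dominates the combined effect, which is the intuition behind borrowing strength from a large dataset when the target cohort is small.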

Full text: 1 | Database: MEDLINE | Main subject: Multifactorial Inheritance / Genome-Wide Association Study | Type of study: Etiology_studies / Prognostic_studies / Risk_factors_studies | Limits: Humans | Country/Region as subject: Asia | Language: En | Journal: Brief Bioinform | Journal subject: BIOLOGY / MEDICAL INFORMATICS | Year of publication: 2022 | Document type: Article | Affiliation country: Australia
