Small-group originating model: Optimized individual-level GWAS simulation featured by SLiM and using open-access data.
Cui, Zuxi; Schumacher, Fredrick R.
Affiliation
  • Cui Z; Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH, USA.
  • Schumacher FR; Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH, USA; Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH, USA. Electronic address: frs2@case.edu.
Comput Biol Chem ; 112: 108147, 2024 Oct.
Article in En | MEDLINE | ID: mdl-39033733
ABSTRACT
The development of analytical methods for genome-wide association studies (GWAS) has outpaced the evolution of simulation techniques and pipelines. This disparity underscores the need for simulation methods that can keep pace with the rapidly increasing scale of GWAS. The median sample size of GWAS over the past ten years has exceeded 50,000 individuals, so simulation tools must be able to generate data on a similar or larger scale. This paper introduces a novel method, the small-group originating (SGO) model, which uses the SLiM software to simulate individual-level GWAS data. Our standardized protocol generates tens of thousands of pseudo-individuals with millions of variants from small (30-90) open-access datasets. Compared with the widely used resampling method in HapGen, SGO shows superior simulation efficiency for large sample sizes (> 13,000) of unrelated individuals. This capability is particularly relevant given the current trajectory towards larger GWAS, which requires tools that can simulate datasets reflective of this growth. Additionally, SGO provides customization options and can model dynamic life cycles and mating across generations, making it a promising alternative for GWAS simulations. In a case study, sensitivity analyses of chromosome-level principal component analysis and kinship coefficient estimation were conducted. The results highlighted the poor robustness of chromosome-level quality control (QC) indexes and the uneven distribution of population structure across chromosomes and ancestries, and they caution against relying solely on chromosome-level QC statistics. With its flexible and efficient approach to generating pseudo GWAS data, our standardized SGO protocol is a valuable asset for method development, power analysis, and benchmarking in GWAS research. It is especially useful for meeting the demand for large-scale simulations that match the current and future scale of GWAS.
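The core idea of growing a large cohort from a small founder group by mating across generations can be illustrated with a toy forward simulation. The sketch below is not the authors' protocol and does not use SLiM (which is scripted in its own language, Eidos); it is a minimal Python illustration under simplifying assumptions: non-overlapping generations, random mating, exactly one crossover per gamete, and no mutation or selection. The names `make_gamete`, `expand_population`, and the `growth` parameter are hypothetical.

```python
import random


def make_gamete(individual, rng):
    """Return one recombinant haplotype from a diploid individual.

    Assumption: exactly one crossover at a uniformly chosen position.
    """
    hap_a, hap_b = individual
    # Choose at random which parental haplotype the gamete starts on.
    if rng.random() < 0.5:
        hap_a, hap_b = hap_b, hap_a
    cut = rng.randrange(len(hap_a) + 1)
    return hap_a[:cut] + hap_b[cut:]


def expand_population(founders, target_size, growth=2.0, seed=0):
    """Grow a small founder group to target_size by random mating.

    Each generation replaces the previous one (non-overlapping generations)
    and is at most `growth` times larger (hypothetical parameter).
    """
    rng = random.Random(seed)
    population = list(founders)
    while len(population) < target_size:
        grown = max(len(population) + 1, int(len(population) * growth))
        next_size = min(target_size, grown)
        children = []
        for _ in range(next_size):
            # Two distinct parents drawn at random; each transmits one gamete.
            mother, father = rng.sample(population, 2)
            children.append((make_gamete(mother, rng), make_gamete(father, rng)))
        population = children
    return population


# Toy input: 30 founders, each with two random haplotypes of 20 biallelic sites.
founder_rng = random.Random(1)
founders = [
    tuple([founder_rng.randint(0, 1) for _ in range(20)] for _ in range(2))
    for _ in range(30)
]
simulated = expand_population(founders, target_size=1000)
print(len(simulated))
```

In a realistic setting the founder haplotypes would come from a phased open-access reference panel rather than random bits, and recombination would follow a genetic map; those details are outside this sketch.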

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Genome-Wide Association Study Limits: Humans Language: En Journal: Comput Biol Chem Journal subject: Biology / Medical Informatics / Chemistry Year: 2024 Document type: Article Country of publication: United Kingdom