Your browser doesn't support javascript.
loading
Two-phase sample selection strategies for design and analysis in post-genome-wide association fine-mapping studies.
Espin-Garcia, Osvaldo; Craiu, Radu V; Bull, Shelley B.
Afiliação
  • Espin-Garcia O; Department of Biostatistics, Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada.
  • Craiu RV; Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada.
  • Bull SB; Department of Statistical Sciences, Faculty of Arts and Sciences, University of Toronto, Toronto, Ontario, Canada.
Stat Med ; 40(30): 6792-6817, 2021 12 30.
Article em En | MEDLINE | ID: mdl-34596256
ABSTRACT
Post-GWAS analysis, in many cases, focuses on fine-mapping targeted genetic regions discovered at GWAS-stage; that is, the aim is to pinpoint potential causal variants and susceptibility genes for complex traits and disease outcomes using next-generation sequencing (NGS) technologies. Large-scale GWAS cohorts are necessary to identify target regions given the typically modest genetic effect sizes. In this context, two-phase sampling design and analysis is a cost-reduction technique that utilizes data collected during phase 1 GWAS to select an informative subsample for phase 2 sequencing. The main goal is to make inference for genetic variants measured via NGS by efficiently combining data from phases 1 and 2. We propose two approaches for selecting a phase 2 design under a budget constraint. The first method identifies sampling fractions that select a phase 2 design yielding an asymptotic variance covariance matrix with certain optimal characteristics, for example, smallest trace, via Lagrange multipliers (LM). The second relies on a genetic algorithm (GA) with a defined fitness function to identify exactly a phase 2 subsample. We perform comprehensive simulation studies to evaluate the empirical properties of the proposed designs for a genetic association study of a quantitative trait. We compare our methods against two ranked designs residual-dependent sampling and a recently identified optimal design. Our findings demonstrate that the proposed designs, GA in particular, can render competitive power in combined phase 1 and 2 analysis compared with alternative designs while preserving type 1 error control. These results are especially evident under the more practical scenario where design values need to be defined a priori and are subject to misspecification. We illustrate the proposed methods in a study of triglyceride levels in the North Finland Birth Cohort of 1966. R code to reproduce our results is available at github.com/egosv/TwoPhase_postGWAS.
Assuntos
Palavras-chave

Texto completo: 1 Base de dados: MEDLINE Assunto principal: Polimorfismo de Nucleotídeo Único / Estudo de Associação Genômica Ampla Tipo de estudo: Prognostic_studies / Risk_factors_studies Limite: Humans Idioma: En Ano de publicação: 2021 Tipo de documento: Article

Texto completo: 1 Base de dados: MEDLINE Assunto principal: Polimorfismo de Nucleotídeo Único / Estudo de Associação Genômica Ampla Tipo de estudo: Prognostic_studies / Risk_factors_studies Limite: Humans Idioma: En Ano de publicação: 2021 Tipo de documento: Article