Computationally efficient whole-genome regression for quantitative and binary traits.
Mbatchou, Joelle; Barnard, Leland; Backman, Joshua; Marcketta, Anthony; Kosmicki, Jack A; Ziyatdinov, Andrey; Benner, Christian; O'Dushlaine, Colm; Barber, Mathew; Boutkov, Boris; Habegger, Lukas; Ferreira, Manuel; Baras, Aris; Reid, Jeffrey; Abecasis, Goncalo; Maxwell, Evan; Marchini, Jonathan.
Affiliation
  • Mbatchou J; Regeneron Genetics Center, Tarrytown, NY, USA.
  • Barnard L; Regeneron Genetics Center, Tarrytown, NY, USA.
  • Backman J; Regeneron Genetics Center, Tarrytown, NY, USA.
  • Marcketta A; Regeneron Genetics Center, Tarrytown, NY, USA.
  • Kosmicki JA; Regeneron Genetics Center, Tarrytown, NY, USA.
  • Ziyatdinov A; Regeneron Genetics Center, Tarrytown, NY, USA.
  • Benner C; Regeneron Genetics Center, Tarrytown, NY, USA.
  • O'Dushlaine C; Regeneron Genetics Center, Tarrytown, NY, USA.
  • Barber M; Regeneron Genetics Center, Tarrytown, NY, USA.
  • Boutkov B; Regeneron Genetics Center, Tarrytown, NY, USA.
  • Habegger L; Regeneron Genetics Center, Tarrytown, NY, USA.
  • Ferreira M; Regeneron Genetics Center, Tarrytown, NY, USA.
  • Baras A; Regeneron Genetics Center, Tarrytown, NY, USA.
  • Reid J; Regeneron Genetics Center, Tarrytown, NY, USA.
  • Abecasis G; Regeneron Genetics Center, Tarrytown, NY, USA.
  • Maxwell E; Regeneron Genetics Center, Tarrytown, NY, USA.
  • Marchini J; Regeneron Genetics Center, Tarrytown, NY, USA. jonathan.marchini@regeneron.com.
Nat Genet; 53(7): 1097-1103, 2021 Jul.
Article in En | MEDLINE | ID: mdl-34017140
ABSTRACT
Genome-wide association analysis of cohorts with thousands of phenotypes is computationally expensive, particularly when accounting for sample relatedness or population structure. Here we present a novel machine-learning method called REGENIE for fitting a whole-genome regression model for quantitative and binary phenotypes that is substantially faster than alternatives in multi-trait analyses while maintaining statistical efficiency. The method naturally accommodates parallel analysis of multiple phenotypes and requires only local segments of the genotype matrix to be loaded in memory, in contrast to existing alternatives, which must load genome-wide matrices into memory. This results in substantial savings in compute time and memory usage. We introduce a fast, approximate Firth logistic regression test for unbalanced case-control phenotypes. The method is ideally suited to take advantage of distributed computing frameworks. We demonstrate the accuracy and computational benefits of this approach using the UK Biobank dataset with up to 407,746 individuals.
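The memory pattern described in the abstract (only a local segment of the genotype matrix is held at a time, and several phenotypes are processed in the same pass) can be illustrated with a toy sketch. This is not the REGENIE algorithm and does not fit a whole-genome regression; it only computes a simple centered cross-product score per variant and phenotype. The function names (`iter_blocks`, `center`, `blockwise_scores`), the `[variant][sample]` layout, and the in-memory list standing in for an on-disk genotype file are all assumptions made for illustration.

```python
def iter_blocks(genotypes, block_size):
    """Yield consecutive blocks of variants (hypothetical stand-in for
    reading one segment of a genotype file from disk at a time)."""
    for start in range(0, len(genotypes), block_size):
        yield genotypes[start:start + block_size]


def center(values):
    """Subtract the mean from every element."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def blockwise_scores(genotypes, phenotypes, block_size):
    """Return scores[v][p]: the centered cross-product between variant v
    and phenotype p, computed one genotype block at a time."""
    # Phenotypes are small relative to genotypes, so they stay in memory.
    centered_phenos = [center(y) for y in phenotypes]
    scores = []
    for block in iter_blocks(genotypes, block_size):
        # Only this block of variants is needed at this point.
        for variant in block:
            g = center(variant)
            # All phenotypes are scored against the variant in one pass.
            scores.append(
                [sum(gi * yi for gi, yi in zip(g, y)) for y in centered_phenos]
            )
    return scores


# Toy data: 3 variants x 4 samples, and 2 phenotypes x 4 samples.
genotypes = [[0, 1, 2, 1], [2, 2, 0, 0], [1, 0, 1, 0]]
phenotypes = [[1.0, 2.0, 3.0, 2.0], [0.0, 0.0, 1.0, 1.0]]
scores = blockwise_scores(genotypes, phenotypes, block_size=2)
```

The result does not depend on `block_size`, which is the point of the sketch: the block size trades memory for the number of reads without changing the statistics.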
Subjects

Full text: 1 Database: MEDLINE Main subject: Computational Biology / Genomics / Genome-Wide Association Study Language: En Publication year: 2021 Document type: Article