Accelerating Genome- and Phenome-Wide Association Studies using GPUs - A case study using data from the Million Veteran Program.
Rodriguez, Alex; Kim, Youngdae; Nandi, Tarak Nath; Keat, Karl; Kumar, Rachit; Bhukar, Rohan; Conery, Mitchell; Liu, Molei; Hessington, John; Maheshwari, Ketan; Schmidt, Drew; Begoli, Edmon; Tourassi, Georgia; Muralidhar, Sumitra; Natarajan, Pradeep; Voight, Benjamin F; Cho, Kelly; Gaziano, J Michael; Damrauer, Scott M; Liao, Katherine P; Zhou, Wei; Huffman, Jennifer E; Verma, Anurag; Madduri, Ravi K.
Affiliation
  • Rodriguez A; Data Science and Learning, Argonne National Laboratory, Lemont, IL, 60439, USA.
  • Kim Y; Mathematics and Computer Science Division, Argonne National Laboratory, Lemont, IL, 60439, USA.
  • Nandi TN; Data Science and Learning, Argonne National Laboratory, Lemont, IL, 60439, USA.
  • Keat K; Institute for Biomedical Informatics, University of Pennsylvania - Perelman School of Medicine, Philadelphia, PA, 19104, USA.
  • Kumar R; Institute for Biomedical Informatics, University of Pennsylvania - Perelman School of Medicine, Philadelphia, PA, 19104, USA.
  • Bhukar R; Program in Medical and Population Genetics, Cambridge, MA, 02142, USA.
  • Conery M; Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, 02114, USA.
  • Liu M; Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania - Perelman School of Medicine, Philadelphia, PA, 19104, USA.
  • Hessington J; Department of Biostatistics, Columbia University's Mailman School of Public Health, New York, NY, 10032, USA.
  • Maheshwari K; Information systems, University of Pennsylvania, Philadelphia, PA, 19104, USA.
  • Schmidt D; Oak Ridge National Laboratory, Oak Ridge, TN, USA.
  • Tourassi G; Oak Ridge National Laboratory, Oak Ridge, TN, USA.
  • Muralidhar S; Computing and Computational Sciences Directorate, Oak Ridge National Laboratory, Oak Ridge, TN, 37830, USA.
  • Natarajan P; Office of Research and Development, Department of Veterans Affairs, Washington, DC, 20420, USA.
  • Voight BF; Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, 02114, USA.
  • Cho K; Department of Medicine, Harvard Medical School, Boston, MA, 02115, USA.
  • Gaziano JM; Program in Medical and Population Genetics and Cardiovascular Disease Initiative, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
  • Damrauer SM; Cardiology Division, Massachusetts General Hospital, Boston, MA, 02114, USA.
  • Liao KP; Corporal Michael Crescenz VA Medical Center, Philadelphia, PA, 19104, USA.
  • Zhou W; Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania - Perelman School of Medicine, Philadelphia, PA, 19104, USA.
  • Huffman JE; Department of Genetics, University of Pennsylvania - Perelman School of Medicine, Philadelphia, PA, 19104, USA.
  • Verma A; Institute of Translational Medicine and Therapeutics, University of Pennsylvania - Perelman School of Medicine, Philadelphia, PA, 19104, USA.
  • Madduri RK; MVP Boston Coordinating Center, VA Boston Healthcare System, Boston, MA, 02111, USA.
bioRxiv ; 2024 May 22.
Article in En | MEDLINE | ID: mdl-38826407
ABSTRACT
The expansion of biobanks has significantly propelled genomic discoveries, yet the sheer scale of the data in these repositories poses formidable computational hurdles, particularly in handling the extensive matrix operations required by prevailing statistical frameworks. In this work, we introduce computational optimizations to the SAIGE (Scalable and Accurate Implementation of Generalized Mixed Model) algorithm, notably a GPU-based distributed computing approach, to tackle these challenges. We applied these optimizations to conduct a large-scale genome-wide association study (GWAS) across 2,068 phenotypes derived from the electronic health records of 635,969 diverse participants in the Veterans Affairs (VA) Million Veteran Program (MVP). Our strategies enabled scaling the analysis to over 6,000 nodes on the Department of Energy (DOE) Oak Ridge Leadership Computing Facility (OLCF) Summit High-Performance Computer (HPC), resulting in a 20-fold acceleration compared to the baseline model. We also provide a Docker container with our optimizations, which was successfully used on multiple cloud infrastructures with the UK Biobank and All of Us datasets, where we showed significant time and cost benefits over the baseline SAIGE model.
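
The abstract does not spell out the optimizations, so the following is only an illustrative sketch of why the matrix operations it mentions lend themselves to distributed computing. It assumes the dominant cost is a product of a genetic relationship matrix K = GᵀG / M with a vector v, where G holds M standardized markers for N samples, and that markers can be split across workers whose partial results are summed (the role an all-reduce would play across GPUs or nodes). The function names, the data layout, and the round-robin split are hypothetical and are not taken from the SAIGE code or from the authors' implementation.

```python
from typing import List

Matrix = List[List[float]]  # rows = markers, columns = samples (assumed layout)


def chunk_matvec(g_chunk: Matrix, v: List[float]) -> List[float]:
    """Partial product G_c^T (G_c v) for one worker's chunk of markers."""
    out = [0.0] * len(v)
    for row in g_chunk:
        # Scalar (G_c v) entry for this marker.
        dot = sum(g * x for g, x in zip(row, v))
        # Accumulate this marker's contribution to G_c^T (G_c v).
        for i, g in enumerate(row):
            out[i] += g * dot
    return out


def grm_matvec(genotypes: Matrix, v: List[float], n_workers: int) -> List[float]:
    """Compute K v with K = G^T G / M by splitting markers across workers.

    The chunks are processed sequentially here; in a distributed setting each
    chunk would live on its own device and the final sum would be an all-reduce.
    """
    m = len(genotypes)
    # Hypothetical round-robin assignment of markers to workers.
    chunks = [genotypes[w::n_workers] for w in range(n_workers)]
    partials = [chunk_matvec(c, v) for c in chunks]
    # Element-wise sum of the partial vectors, then scale by 1 / M.
    return [sum(p[i] for p in partials) / m for i in range(len(v))]


if __name__ == "__main__":
    toy_genotypes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 markers, 2 samples
    print(grm_matvec(toy_genotypes, [1.0, 2.0], n_workers=2))
```

Because each worker only needs its own slice of markers and the vector v, the full N × N matrix never has to be formed, and the result is the same regardless of how many workers share the markers.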

Full text: 1 Collection: 01-internacional Database: MEDLINE Language: En Journal: BioRxiv Year: 2024 Document type: Article