Scalability and cost-effectiveness analysis of whole genome-wide association studies on Google Cloud Platform and Amazon Web Services.
J Am Med Inform Assoc; 27(9): 1425-1430, 2020 09 01.
Article in En | MEDLINE | ID: mdl-32719837
OBJECTIVE: Advancements in human genomics have generated a surge of available data, fueling the growth and accessibility of databases for more comprehensive, in-depth genetic studies.
METHODS: We provide a straightforward and innovative methodology to optimize cloud configuration for conducting genome-wide association studies. We utilized Spark clusters on both Google Cloud Platform and Amazon Web Services, as well as Hail (http://doi.org/10.5281/zenodo.2646680), for analysis and exploration of genomic variant datasets.
RESULTS: Comparative evaluation of numerous cloud-based cluster configurations demonstrates a successful and unprecedented compromise between speed and cost for performing genome-wide association studies on 4 distinct whole-genome sequencing datasets. Results are consistent across the 2 cloud providers and could be highly useful for accelerating research in genetics.
CONCLUSIONS: We present a timely piece addressing one of the most frequently asked questions when moving to the cloud: what is the trade-off between speed and cost?
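The speed-versus-cost question raised in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the `ClusterRun` type, the configuration names, the worker counts, the per-worker hourly price, and the runtimes are all hypothetical values chosen for illustration, and the cost model (workers x hourly price x runtime) is a simplifying assumption that ignores storage, networking, and billing granularity. The sketch keeps only the configurations that no other configuration beats on both cost and runtime (a Pareto front).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterRun:
    """One benchmark run of a cluster configuration (all values hypothetical)."""
    name: str
    workers: int
    price_per_worker_hour: float  # USD; assumed flat per-worker price
    runtime_hours: float

    @property
    def cost(self) -> float:
        # Simplified cost model: workers x hourly price x wall-clock runtime.
        return self.workers * self.price_per_worker_hour * self.runtime_hours


def pareto_front(runs):
    """Return runs not dominated on (cost, runtime), sorted fastest first."""
    front = []
    for r in runs:
        dominated = any(
            o.cost <= r.cost
            and o.runtime_hours <= r.runtime_hours
            and (o.cost < r.cost or o.runtime_hours < r.runtime_hours)
            for o in runs
        )
        if not dominated:
            front.append(r)
    return sorted(front, key=lambda r: r.runtime_hours)


# Hypothetical benchmark results, not taken from the article.
runs = [
    ClusterRun("small", workers=4, price_per_worker_hour=0.20, runtime_hours=10.0),
    ClusterRun("medium", workers=16, price_per_worker_hour=0.20, runtime_hours=3.0),
    ClusterRun("large", workers=64, price_per_worker_hour=0.20, runtime_hours=1.5),
    ClusterRun("wasteful", workers=64, price_per_worker_hour=0.20, runtime_hours=3.5),
]

for run in pareto_front(runs):
    print(f"{run.name}: {run.runtime_hours:.1f} h, ${run.cost:.2f}")
```

With these made-up numbers, the "wasteful" configuration drops out because "medium" is both cheaper and faster; the remaining three represent different points on the trade-off, and choosing among them depends on how much a shorter runtime is worth to the analyst.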
Collection:
01-internacional
Database:
MEDLINE
Main subject:
Genome-Wide Association Study / Cloud Computing
Type of study:
Evaluation_studies / Health_economic_evaluation / Risk_factors_studies
Limits:
Humans
Language:
En
Journal:
J Am Med Inform Assoc
Journal subject:
Medical Informatics
Year:
2020
Document type:
Article
Country of publication:
United Kingdom