A Case Study and Methodology for OpenSWATH Parameter Optimization Using the ProCan90 Data Set and 45 810 Computational Analysis Runs.

Peters, Sean; Hains, Peter G; Lucas, Natasha; Robinson, Phillip J; Tully, Brett

Affiliation

Peters S; ProCan, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia.
Hains PG; ProCan, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia.
Lucas N; Cell Signalling Unit, Children's Medical Research Institute, The University of Sydney, Westmead, NSW 2145, Australia.
Robinson PJ; ProCan, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia.
Tully B; ProCan, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia.

J Proteome Res; 18(3): 1019-1031, 2019 Mar 01.

Article in English | MEDLINE | ID: mdl-30652484

ABSTRACT

In the current study, we show how ProCan90, a curated data set of HEK293 technical replicates, can be used to optimize the configuration options for algorithms in the OpenSWATH pipeline. Furthermore, we use this case study as a proof of concept for horizontal scaling of such a pipeline, allowing 45 810 computational analysis runs of OpenSWATH to be completed within four and a half days on a budget of US $10 000. Using Amazon Web Services (AWS), we successfully processed each of the ProCan90 files with 506 combinations of input parameters. In total, the project consumed more than 340 000 core hours of compute and generated in excess of 26 TB of data. Using the resulting data and a set of quantitative metrics, we present an analysis pathway that yields two optimal parameter sets: one for a compute-rich environment (where run time is not a constraint) and another for a compute-poor environment (where run time is optimized). For the same input files and the compute-rich parameter set, we show a 29.8% improvement in the number of quality protein (>2 peptides) identifications compared with the current OpenSWATH defaults, with negligible adverse effects on quantification reproducibility, a negligible drop in identification confidence, and a median run time of 75 min (a 103% increase). For the compute-poor parameter set, we find a 55% improvement in run time over the default parameter set, at the expense of a 3.4% decrease in the number of quality protein identifications, with the intensity CV decreasing from 14.0% to 13.7%.
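The aggregate figures in the abstract imply several per-run and per-core quantities that are not stated directly. The Python sketch below derives them under explicit assumptions: the core-hour and data totals are treated as exact, although the abstract gives them only as lower bounds ("more than", "in excess of"); the US $10 000 budget stands in for the actual spend, which is not reported; and the "103% increase" in run time is read as relative to the median run time of the default parameter set. All constant and function names are hypothetical, introduced only for illustration.

```python
# Back-of-envelope figures derived from the abstract (illustrative sketch).
# ASSUMPTIONS: totals quoted as "more than"/"in excess of" are treated as exact;
# the budget is used in place of the (unreported) actual spend; the 103% run-time
# increase is taken relative to the default parameter set's median run time.

TOTAL_RUNS = 45_810            # OpenSWATH analysis runs
CORE_HOURS = 340_000           # lower bound on compute consumed
WALL_CLOCK_DAYS = 4.5          # elapsed time for the whole campaign
BUDGET_USD = 10_000            # budget, not necessarily the actual spend
DATA_TB = 26                   # lower bound on data generated (decimal TB)
RICH_MEDIAN_MIN = 75.0         # compute-rich median run time, minutes
RICH_RUNTIME_INCREASE = 1.03   # "103% increase" expressed as a fraction


def derived_figures() -> dict:
    """Return rough quantities implied by the abstract's totals."""
    wall_clock_hours = WALL_CLOCK_DAYS * 24
    return {
        # Average number of cores busy at any moment over the campaign.
        "mean_concurrent_cores": CORE_HOURS / wall_clock_hours,
        # Average compute per analysis run.
        "core_hours_per_run": CORE_HOURS / TOTAL_RUNS,
        # Budget ceiling per core hour (an upper bound on the true cost rate).
        "usd_per_core_hour": BUDGET_USD / CORE_HOURS,
        # Average output per run, in decimal gigabytes.
        "gb_per_run": DATA_TB * 1000 / TOTAL_RUNS,
        # Median run time of the default parameter set implied by the 103% figure.
        "default_median_min": RICH_MEDIAN_MIN / (1 + RICH_RUNTIME_INCREASE),
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the campaign averages roughly 3 150 concurrent cores and about 7.4 core hours per analysis run, at no more than about US $0.03 per core hour, and the implied default median run time is about 37 min. These are rough estimates, not figures reported by the authors.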

Subject(s)

Computational Biology/methods; Databases, Protein/standards; Datasets as Topic/standards; HEK293 Cells; Humans; Proteins/analysis; Proteomics/methods; Reproducibility of Results; Time Factors

Keywords

Amazon Web Services; HEK293; OpenSWATH; ProCan; big data; mass spectrometry; parameter optimization; proteomics; scalability; sensitivity analysis

Collection: 01-internacional | Database: MEDLINE | Main subject: Computational Biology / Databases, Protein / Datasets as Topic | Study type: Prognostic_studies | Limit: Humans | Language: English | Journal: J Proteome Res | Journal subject: Biochemistry | Year: 2019 | Document type: Article | Country of affiliation: Australia