ABSTRACT
Lung cancer is the most common cause of cancer-related mortality worldwide, characterized by late clinical presentation (49-53% of patients are diagnosed at stage IV) and consequently poor outcomes. One challenge in identifying biomarkers of early disease is the collection of samples from patients prior to symptomatic presentation. We used blood collected during surgical resection of lung tumors in an iTRAQ isobaric tagging experiment to identify proteins effluxing from tumors into pulmonary veins. Forty proteins were identified as having increased abundance in the vein draining the tumor compared to "healthy" pulmonary veins. These protein markers were then assessed in a second cohort using a mass spectrometry (MS) technique, sequential window acquisition of all theoretical fragment ion spectra (SWATH) MS. SWATH-MS was used to measure proteins in serum samples taken from 25 patients <50 months prior to and at lung cancer diagnosis, and from 25 matched controls. The SWATH-MS analysis alone produced an 11-protein marker panel. A machine learning classification model was generated that could discriminate samples taken from patients within 12 months of lung cancer diagnosis from control samples. The model was evaluated as having a mean AUC of 0.89, with an accuracy of 0.89. This panel was combined with the SWATH-MS data from one of the markers from the first cohort to create a 12-protein panel. The proteome signature developed for lung cancer risk can now be further developed in additional cohorts.