Search | VHL Regional Portal

Optimizing the PROTREC network-based missing protein prediction algorithm.

Wu, Wenshan; Huang, Zelu; Kong, Weijia; Peng, Hui; Goh, Wilson Wen Bin.

Proteomics ; 24(1-2): e2200332, 2024 Jan.

Article in English | MEDLINE | ID: mdl-37876146

ABSTRACT

This article summarizes the PROTREC method and investigates the impact of its hyper-parameters on the task of missing protein prediction. We evaluate missing protein recovery rates using different PROTREC score selection approaches (MAX, MIN, MEDIAN, and MEAN), different PROTREC score thresholds, and different complex size thresholds. In addition, we included two additional cancer datasets in our analysis and introduced a new validation method to check both the robustness of the PROTREC method and the correctness of our analysis. Our analysis showed that the missing protein recovery rate can be improved by adopting the PROTREC score selection operations MIN, MEDIAN, and MEAN instead of the default MAX. However, this may come at the cost of fewer proteins being predicted and validated. Users should therefore choose their hyper-parameters carefully to balance the accuracy-quantity trade-off. We also explored the possibility of combining PROTREC with a p-value-based method (FCS) and demonstrated that PROTREC is able to perform well independently, without any help from a p-value-based method. Furthermore, we conducted a downstream enrichment analysis to understand the biological pathways and protein networks within the cancerous tissues using the recovered proteins.

Missing protein recovery rate using PROTREC can be improved by selecting a different PROTREC score selection method. Different PROTREC score selection methods and other hyper-parameters, such as the PROTREC score threshold and the complex size threshold, introduce an accuracy-quantity trade-off. PROTREC is able to perform well independently of any filtering using a p-value-based method. Verification of the PROTREC method on additional cancer datasets. Downstream enrichment analysis to understand the biological pathways and protein networks in cancerous tissues.
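The three hyper-parameters named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a protein receives one score from each complex it belongs to and that the selection operation collapses those scores into a single value. The function name `select_protein_scores`, the input layout, and the default values for `score_threshold` and `min_complex_size` are all hypothetical.

```python
from statistics import mean, median

# Selection operations named in the abstract, mapped to plain Python reducers.
SELECTORS = {"MAX": max, "MIN": min, "MEDIAN": median, "MEAN": mean}


def select_protein_scores(complex_scores, method="MAX",
                          score_threshold=0.95, min_complex_size=5):
    """Collapse per-complex scores into one score per protein.

    complex_scores: dict mapping a complex id to {protein: score}.
    The defaults are illustrative assumptions, not values from the paper.
    """
    combine = SELECTORS[method]
    per_protein = {}
    for members in complex_scores.values():
        # Complex size threshold: ignore complexes with too few members.
        if len(members) < min_complex_size:
            continue
        for protein, score in members.items():
            per_protein.setdefault(protein, []).append(score)
    # Score selection: one value per protein across its complexes.
    selected = {p: combine(scores) for p, scores in per_protein.items()}
    # Score threshold: keep only proteins that reach the cut-off.
    return {p: s for p, s in selected.items() if s >= score_threshold}


# Toy input (made up): protein "A" sits in a high- and a low-scoring complex.
toy = {
    "C1": {"A": 0.9, "B": 0.9, "C": 0.9},
    "C2": {"A": 0.5, "D": 0.5, "E": 0.5},
    "C3": {"A": 0.7, "F": 0.7},  # dropped when min_complex_size=3
}
predicted_max = select_protein_scores(toy, "MAX", 0.6, 3)
predicted_min = select_protein_scores(toy, "MIN", 0.6, 3)
```

On this toy input, MAX keeps protein "A" because of its best complex, while MIN drops it because of its worst one. A stricter selection therefore returns fewer predictions, which is one way to picture the accuracy-quantity trade-off described above.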

Subject(s)

Algorithms, Neoplasms, Humans

Proteomic datasets of HeLa and SiHa cell lines acquired by DDA-PASEF and diaPASEF.

Huang, Zelu; Kong, Weijia; Wong, Bertrand Jernhan; Gao, Huanhuan; Guo, Tiannan; Liu, Xianming; Du, Xiaoxian; Wong, Limsoon; Goh, Wilson Wen Bin.

Data Brief ; 41: 107919, 2022 Apr.

Article in English | MEDLINE | ID: mdl-35198691

ABSTRACT

We present four datasets on proteomics profiling of HeLa and SiHa cell lines associated with the research described in the paper "PROTREC: A probability-based approach for recovering missing proteins based on biological networks" [1]. Proteins in each cell line were acquired by two different data acquisition methods. The first was Data Dependent Acquisition-Parallel Accumulation Serial Fragmentation (DDA-PASEF) and the second was Parallel Accumulation-Serial Fragmentation combined with data-independent acquisition (diaPASEF) [2], [3]. Protein assembly was performed following a search against the Swiss-Prot Human database, using Peaks Studio for the DDA datasets and Spectronaut for the DIA datasets. The assembled result contains the identified PSMs, peptides, and proteins that are above threshold for each HeLa and SiHa sample. Coverage-wise, approximately 6,090 and 7,298 proteins were quantified by DDA-PASEF for the HeLa and SiHa samples, while 13,339 and 8,773 proteins were quantified by diaPASEF for the HeLa and SiHa samples, respectively. Consistency-wise, diaPASEF has fewer missing values (~2%) compared to its DDA counterparts (~5-7%). The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the iProX partner repository [4] with the dataset identifier PXD029773.
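A missing-value percentage like the ones quoted above can be computed as follows. This is a minimal sketch that assumes the figure is the share of empty cells in a protein-by-sample quantification matrix. The abstract does not state the exact definition, and the function name `missing_percentage` and the use of `None` for a missing value are hypothetical.

```python
def missing_percentage(matrix):
    """Return the percentage of missing cells in a protein-by-sample matrix.

    matrix: list of rows (one per protein), each a list of per-sample
    values, with None marking a missing value (an assumed convention).
    """
    total = sum(len(row) for row in matrix)
    missing = sum(value is None for row in matrix for value in row)
    return 100.0 * missing / total if total else 0.0


# Made-up example: 2 proteins x 2 samples with one missing cell.
example = [[1.0, None], [2.0, 3.0]]
example_rate = missing_percentage(example)
```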
