IsoBayes: a Bayesian approach for single-isoform proteomics inference.

Bollon, Jordy; Shortreed, Michael R; Jordan, Ben T; Miller, Rachel; Jeffery, Erin; Cavalli, Andrea; Smith, Lloyd M; Dewey, Colin; Sheynkman, Gloria M; Tiberi, Simone

Affiliation

Bollon J; Computational and Chemical Biology, Italian Institute of Technology, CMPVdA, Aosta, Italy.
Shortreed MR; Astronomical Observatory of the Autonomous Region of the Aosta Valley (OAVdA), Nus, Italy.
Jordan BT; Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA.
Miller R; Frederick National Laboratory for Cancer Research, Frederick, MD, USA.
Jeffery E; Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA.
Cavalli A; Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA.
Smith LM; Computational and Chemical Biology, Italian Institute of Technology, CMPVdA, Aosta, Italy.
Dewey C; Centre Européen de Calcul Atomique et Moléculaire, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland.
Sheynkman GM; Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA.
Tiberi S; Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI, USA.

bioRxiv; 2024 Jun 11.

Article in En | MEDLINE | ID: mdl-38915658

ABSTRACT

Studying protein isoforms is an essential step in biomedical research; at present, the main approach for analyzing proteins is bottom-up mass spectrometry proteomics, which returns peptide identifications that are indirectly used to infer the presence of protein isoforms. However, the detection and quantification processes are noisy; in particular, peptides may be erroneously detected, and most peptides, known as shared peptides, are associated with multiple protein isoforms. As a consequence, studying individual protein isoforms is challenging, and inferred protein results are often abstracted to the gene level or to groups of protein isoforms. Here, we introduce IsoBayes, a novel statistical method to perform inference at the isoform level. Our method enhances the information available by integrating mass spectrometry proteomics and transcriptomics data in a Bayesian probabilistic framework. To account for the uncertainty in the measurement process, we propose a two-layer latent variable approach: first, we sample whether a peptide has been correctly detected (or, alternatively, filter peptides); second, we allocate the abundance of the selected peptides across the protein(s) they are compatible with. This enables us, starting from peptide-level data, to recover protein-level data; in particular, we i) infer the presence/absence of each protein isoform (via a posterior probability), ii) estimate its abundance (and credible interval), and iii) target isoforms where transcript and protein relative abundances significantly differ. We benchmarked our approach in simulations and in two multi-protease real datasets: our method displays good sensitivity and specificity when detecting protein isoforms, its estimated abundances correlate highly with the ground truth, and it can detect changes between protein and transcript relative abundances. IsoBayes is freely distributed as a Bioconductor R package, and is accompanied by an example usage vignette.
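The two-layer latent variable idea described above can be illustrated with a toy sampler. This is a minimal sketch under stated assumptions, not the IsoBayes implementation: the function names (`toy_two_layer_sampler`, `sample_dirichlet`), the peptide fields (`isoforms`, `abundance`, `p_correct`), the Dirichlet prior, and the example numbers are all hypothetical and chosen only for illustration. In the first layer, each peptide is kept with its assumed probability of correct detection; in the second layer, the abundance of each kept peptide is allocated across its compatible isoforms in proportion to the current relative abundances. A prior informed by transcript abundances could be passed in place of the flat one used here, although the abstract does not specify that form.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def toy_two_layer_sampler(peptides, prior, n_iter=2000, burn_in=500, seed=0):
    """Toy sampler (hypothetical, for illustration only).

    peptides: list of dicts with keys
        "isoforms"  - indices of the isoforms the peptide is compatible with
        "abundance" - integer peptide abundance (e.g. a spectral count)
        "p_correct" - assumed probability that the detection is correct
    prior: positive Dirichlet parameters, one per isoform.
    Returns (presence probabilities, mean allocated abundances).
    """
    rng = random.Random(seed)
    n_iso = len(prior)
    pi = [1.0 / n_iso] * n_iso  # current isoform relative abundances
    presence = [0] * n_iso
    abundance = [0.0] * n_iso
    kept = n_iter - burn_in

    for it in range(n_iter):
        counts = [0] * n_iso
        for pep in peptides:
            # Layer 1: sample whether this peptide was correctly detected.
            if rng.random() >= pep["p_correct"]:
                continue
            # Layer 2: allocate its abundance across compatible isoforms,
            # with weights proportional to the current relative abundances.
            weights = [max(pi[i], 1e-12) for i in pep["isoforms"]]
            picks = rng.choices(pep["isoforms"], weights=weights,
                                k=pep["abundance"])
            for i in picks:
                counts[i] += 1
        # Update relative abundances given the allocated counts.
        pi = sample_dirichlet([prior[i] + counts[i] for i in range(n_iso)], rng)
        if it >= burn_in:
            for i in range(n_iso):
                presence[i] += counts[i] > 0
                abundance[i] += counts[i]

    return [p / kept for p in presence], [a / kept for a in abundance]


# Hypothetical example: three isoforms, one shared peptide, one doubtful peptide.
peptides = [
    {"isoforms": [0], "abundance": 10, "p_correct": 0.99},
    {"isoforms": [0, 1], "abundance": 6, "p_correct": 0.90},
    {"isoforms": [2], "abundance": 1, "p_correct": 0.10},
]
prior = [1.0, 1.0, 1.0]  # flat prior; an assumption of this sketch
presence, abundance = toy_two_layer_sampler(peptides, prior)
print("presence probabilities:", presence)
print("mean allocated abundance:", abundance)
```

In this toy setting, the fraction of retained iterations in which an isoform receives any abundance plays the role of a posterior probability of presence, and the per-iteration counts could also be summarised by quantiles to mimic a credible interval.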

Key words

Bayesian inference; Latent variables; Mass spectrometry proteomics; Single-isoform inference; Statistical software tool

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Language: En | Journal: BioRxiv | Year: 2024 | Document type: Article | Affiliation country: Italy | Country of publication: United States
