Biosurfer for systematic tracking of regulatory mechanisms leading to protein isoform diversity.
Murali, Mayank; Saquing, Jamie; Lu, Senbao; Gao, Ziyang; Jordan, Ben; Wakefield, Zachary Peters; Fiszbein, Ana; Cooper, David R; Castaldi, Peter J; Korkin, Dmitry; Sheynkman, Gloria.
Affiliation
  • Murali M; Broad Institute of MIT and Harvard University, Cambridge, MA, USA.
  • Saquing J; Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA.
  • Lu S; Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA, USA.
  • Gao Z; Computer Science Department, Worcester Polytechnic Institute, Worcester, MA, USA.
  • Jordan B; Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA, USA.
  • Wakefield ZP; Computer Science Department, Worcester Polytechnic Institute, Worcester, MA, USA.
  • Fiszbein A; Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA.
  • Cooper DR; Bioinformatics Program, Boston University, Boston, MA, USA.
  • Castaldi PJ; Department of Biology, Boston University, Boston, MA, USA.
  • Korkin D; Bioinformatics Program, Boston University, Boston, MA, USA.
  • Sheynkman G; Department of Biology, Boston University, Boston, MA, USA.
bioRxiv; 2024 Mar 17.
Article in English | MEDLINE | ID: mdl-38559226
ABSTRACT
Long-read RNA sequencing has shed light on transcriptomic complexity, but questions remain about the functionality of the downstream protein products. We introduce Biosurfer, a computational approach for comparing protein isoforms that systematically tracks the transcriptional, splicing, and translational variations underlying differences in the sequences of the protein products. Using Biosurfer, we analyzed the differences in 32,799 pairs of GENCODE-annotated protein isoforms and found that a majority (70%) of variable N-termini are due to alternative transcription start sites, while only 9% arise from 5' UTR alternative splicing. Biosurfer's detailed tracking of nucleotide-to-residue relationships helped reveal an uncommonly tracked source of single amino acid changes arising from codon splits at splice junctions. For 17% of internal sequence changes, such split codon patterns lead to single-residue differences, termed "ragged codons". Of variable C-termini, 72% involve splice- or intron retention-induced reading frameshifts. We found an unusual pattern of reading frame changes in which a first frameshift is closely followed by a distinct second frameshift that restores the original frame; we term this a "snapback" frameshift. We analyzed the long-read RNA-seq-predicted proteome of a human cell line and found trends similar to those of our GENCODE analysis, with the exception of a higher proportion of isoforms predicted to undergo nonsense-mediated decay. Biosurfer's comprehensive characterization of long-read RNA-seq datasets should accelerate insights into the functional role of protein isoforms, providing mechanistic explanations of the origins of the proteomic diversity driven by alternative splicing. Biosurfer is available as a Python package at https://github.com/sheynkman-lab/biosurfer.
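The "ragged codon" effect can be illustrated with a toy exon-skipping example. This is a minimal sketch, not Biosurfer's API: the exon sequences and the helper names `translate` and `split_codons` are hypothetical, and the code assumes the coding sequence starts at the first base of the first exon and is read in frame 0. Skipping an exon whose length is a multiple of 3 preserves the downstream frame, but because the junctions fall inside codons, the skipping isoform gains a junction codon (TGG, Trp) that the inclusion isoform does not contain.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering
# (third codon position varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate a CDS in frame 0, stopping at the first stop codon (not included)."""
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def split_codons(exons: list[str]) -> list[tuple[int, str, str]]:
    """Hypothetical helper: for each exon-exon junction that falls inside a codon,
    return (residue index, codon, amino acid). Assumes the CDS starts at exons[0][0]."""
    cds = "".join(exons)
    results = []
    boundary = 0
    for exon in exons[:-1]:
        boundary += len(exon)  # index of the first base of the next exon
        if boundary % 3 != 0:  # the junction lies inside a codon
            start = boundary - boundary % 3
            codon = cds[start:start + 3]
            if len(codon) == 3:  # skip an incomplete trailing codon
                results.append((start // 3, codon, CODON_TABLE[codon]))
    return results


# Toy isoforms (hypothetical sequences): exon 2 is 6 nt long, so skipping it keeps the frame.
exon1, exon2, exon3 = "ATGTG", "CAAAGG", "GTTCTGA"
inclusion = [exon1, exon2, exon3]
skipping = [exon1, exon3]

print(translate("".join(inclusion)), split_codons(inclusion))
print(translate("".join(skipping)), split_codons(skipping))
```

Here the inclusion isoform translates to `MCKGF` and the skipping isoform to `MWF`: instead of a clean in-frame deletion of whole residues, the skip replaces C-K-G with a single W encoded by the new split codon at the exon 1/exon 3 junction.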

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Language: English | Journal: BioRxiv | Year: 2024 | Document type: Article | Country of affiliation: United States