Genome-Wide Functional Annotation of Human Protein-Coding Splice Variants Using Multiple Instance Learning.
J Proteome Res; 15(6): 1747-53, 2016 06 03.
Article in En | MEDLINE | ID: mdl-27142340
ABSTRACT
The vast majority of human multiexon genes undergo alternative splicing and produce a variety of splice variant transcripts and proteins, which can perform different functions. These protein-coding splice variants (PCSVs) greatly increase the functional diversity of proteins. Most functional annotation algorithms have been developed at the gene level; the lack of isoform-level gold standards is an important intellectual limitation for currently available machine learning algorithms. The accumulation of a large amount of RNA-seq data in the public domain greatly increases our ability to examine the functional annotation of genes at the isoform level. In the present study, we used a multiple instance learning (MIL)-based approach to predict the function of PCSVs. As input, we used transcript-level expression values and gene-level functional associations from the Gene Ontology database. Models were built with support vector machines (SVMs) and assessed by 5-fold cross-validation. Performance was better for genes with multiple PCSVs than for single-PCSV genes, and it also improved when more examples were available to train the models. We illustrated our predictions with literature evidence for the ADAM15, LMNA/C, and DMXL2 genes. All predictions have been implemented in a web resource called "IsoFunc", which is freely available to the global scientific community at http://guanlab.ccmb.med.umich.edu/isofunc .
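The abstract does not spell out the MIL formulation, so the following is only a minimal sketch of one common way to train on gene-level labels when the instances are isoforms. It assumes a witness-selection heuristic similar in spirit to MI-SVM: each gene is a bag of isoforms, a positive gene must contain at least one positive isoform (the "witness"), and every isoform of a negative gene is negative. The linear SVM trained by subgradient descent, the two-feature toy data, and all function and variable names are hypothetical and are not taken from the authors' pipeline; cross-validation is left out for brevity.

```python
# Illustrative sketch only: toy data and hypothetical names, not the authors' code.

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit a linear SVM by full-batch subgradient descent on the hinge loss."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0  # L2 regularisation on w only
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # this instance violates the margin
                for j in range(dim):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def score(model, x):
    """Signed decision value of one isoform (instance)."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def bag_score(model, bag):
    """A gene (bag) is scored by its highest-scoring isoform."""
    return max(score(model, x) for x in bag)

def train_mil(bags, labels, rounds=5):
    """bags: one list of isoform feature vectors per gene; labels: +1/-1 per gene."""
    # Round 1: every isoform of a positive gene inherits the gene-level label.
    witnesses = {g: list(range(len(bag)))
                 for g, bag in enumerate(bags) if labels[g] == 1}
    model = None
    for _ in range(rounds):
        X, y = [], []
        for g, bag in enumerate(bags):
            if labels[g] == 1:
                X += [bag[i] for i in witnesses[g]]
                y += [1] * len(witnesses[g])
            else:  # all isoforms of a negative gene are negative
                X += bag
                y += [-1] * len(bag)
        model = train_linear_svm(X, y)
        # Keep only the top-scoring isoform of each positive gene as its witness.
        new = {g: [max(range(len(bags[g])), key=lambda i: score(model, bags[g][i]))]
               for g in witnesses}
        if new == witnesses:  # witness assignment is stable
            break
        witnesses = new
    return model, witnesses

# Toy bags: each vector stands in for transcript-level expression features.
bags = [
    [[2.0, 1.5], [-1.0, -1.1]],                 # hypothetical gene A (annotated)
    [[1.8, 1.7], [-1.2, -0.9], [-0.9, -1.0]],   # hypothetical gene B (annotated)
    [[2.2, 1.4]],                               # hypothetical gene C (annotated)
    [[-1.1, -0.9], [-0.8, -1.2]],               # hypothetical gene D (not annotated)
    [[-1.3, -1.0]],                             # hypothetical gene E (not annotated)
    [[-0.7, -1.1], [-1.0, -0.8]],               # hypothetical gene F (not annotated)
]
labels = [1, 1, 1, -1, -1, -1]

model, witnesses = train_mil(bags, labels)
for g, idx in sorted(witnesses.items()):
    print("positive bag", g, "-> witness isoform", idx[0])
```

In this sketch the gene-level annotation is attributed to whichever isoform the model scores highest, which is how a per-isoform function prediction can emerge from gene-level labels alone.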
Database: MEDLINE
Main subject: Protein Isoforms / Molecular Sequence Annotation
Type of study: Prognostic_studies
Limits: Humans
Language: En
Journal: J Proteome Res
Journal subject: Biochemistry
Year: 2016
Type: Article
Affiliation country: United States