|Title||Genome-wide Functional Annotation of Human Protein-coding Splice Variants Using Multiple Instance Learning.|
|Publication Type||Journal Article|
|Year of Publication||2016|
|Authors||Panwar B, Menon R, Eksi R, Li H, Omenn GS, Guan Y|
|Journal||J Proteome Res|
|Date Published||2016 May 4|
The vast majority of human multi-exon genes undergo alternative splicing and produce a variety of splice variant transcripts and proteins, which can perform different functions. These protein-coding splice variants (PCSVs) greatly increase the functional diversity of proteins. Most functional annotation algorithms have been developed at the gene level; the lack of isoform-level gold standards is an important intellectual limitation for currently available machine learning algorithms. The accumulation of a large amount of RNA-seq data in the public domain greatly increases our ability to examine the functional annotation of genes at the isoform level. In the present study, we used a multiple instance learning (MIL)-based approach to predict the function of PCSVs. As input, we used transcript-level expression values and gene-level functional associations from the Gene Ontology database. Support vector machine (SVM) models were evaluated with five-fold cross-validation. Prediction performance was better for genes with multiple PCSVs than for single-PCSV genes, and it also improved when more examples were available to train the models. We illustrate our predictions with literature evidence for the ADAM15, LMNA/C, and DMXL2 genes. All predictions are available in a web resource called 'IsoFunc', which is freely accessible to the global scientific community at http://guanlab.ccmb.med.umich.edu/isofunc.
|Alternate Journal||J. Proteome Res.|
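
The multiple instance learning setup named in the abstract (function labels known only per gene, predictions wanted per isoform) can be illustrated with a small sketch. This is not the authors' implementation: it assumes one common MIL formulation in which each positive gene is represented by a single "witness" isoform that is re-selected after every training round, and it swaps in a minimal linear SVM fitted by batch subgradient descent so that the example needs no external libraries. The gene names, the two-dimensional "expression" vectors, and every function name below are hypothetical, and the five-fold cross-validation step is omitted.

```python
# Toy MIL sketch: genes are bags, isoforms are instances, labels exist only per gene.
# All data and names here are hypothetical illustrations.

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_linear_svm(X, y, lam=0.01, lr=0.05, iters=500):
    """Linear SVM (hinge loss + L2 penalty on w) fitted by batch subgradient descent."""
    n = len(X)
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(iters):
        grad_w, grad_b = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # instance violates the margin
                grad_w = [g - yi * xj / n for g, xj in zip(grad_w, xi)]
                grad_b -= yi / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def centroid(bag):
    """Mean feature vector of all isoforms in one gene."""
    return [sum(col) / len(bag) for col in zip(*bag)]

def train_mil(bags, labels, max_rounds=10):
    """Alternate between fitting the SVM and re-picking one witness isoform per positive gene."""
    pos = [g for g in bags if labels[g] == 1]
    # Every isoform of a negative gene is treated as a negative instance.
    neg_X = [x for g in bags if labels[g] == -1 for x in bags[g]]
    # Start by representing each positive gene with the centroid of its isoforms.
    reps = [centroid(bags[g]) for g in pos]
    witnesses = None
    for _ in range(max_rounds):
        X = reps + neg_X
        y = [1] * len(reps) + [-1] * len(neg_X)
        w, b = train_linear_svm(X, y)
        # The witness is the highest-scoring isoform within each positive gene.
        new = {
            g: max(range(len(bags[g])), key=lambda i, g=g: dot(w, bags[g][i]) + b)
            for g in pos
        }
        if new == witnesses:  # witness choice is stable, so stop
            break
        witnesses = new
        reps = [bags[g][witnesses[g]] for g in pos]
    return w, b, witnesses

# Hypothetical toy data: each gene maps to a list of isoform feature vectors.
bags = {
    "geneA": [[2.0, 2.1], [-2.0, -1.9]],
    "geneB": [[1.9, 2.2], [-2.1, -2.0], [-1.8, -2.2]],
    "geneC": [[-2.0, -2.0], [-1.9, -2.1]],
    "geneD": [[-2.2, -1.8]],
}
labels = {"geneA": 1, "geneB": 1, "geneC": -1, "geneD": -1}  # gene-level labels only

w, b, witnesses = train_mil(bags, labels)
isoform_scores = {g: [dot(w, x) + b for x in bags[g]] for g in bags}
print(witnesses)
print(isoform_scores)
```

In this sketch an isoform-level score is the SVM decision value for that isoform, and a gene-level score could be taken as the maximum over its isoforms. A real analysis would use high-dimensional expression profiles, one model per Gene Ontology term, and held-out genes for evaluation.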