|Title||Revisiting the identification of canonical splice isoforms through integration of functional genomics and proteomics evidence.|
|Publication Type||Journal Article|
|Year of Publication||2014|
|Authors||Li H-D, Menon R, Omenn GS, Guan Y|
|Note||Hong-dong Li was selected as the winner of the AB SCIEX Young Investigator Award for this work.|
|Date Published||2014 Sep 29|
Canonical isoforms in different databases have been defined as the most prevalent, most conserved, most expressed, or longest isoform, or as the one with the clearest description of domains or post-translational modifications. In this article, we revisit these definitions of canonical isoforms in light of functional genomics and proteomics evidence, focusing on mouse data. We report a novel approach, based on a functional relationship network, for identifying the Highest Connected Isoforms (HCIs). We show that 46% of these HCIs are not the longest transcripts. The approach also revealed many genes that have more than one highly connected isoform. Averaged across 175 RNA-seq datasets covering diverse tissues and conditions, 65% of the HCIs show higher transcript-level expression than the non-highest connected isoforms (NCIs). At the protein level, the HCIs overlap substantially with the splice variants detected in proteomic data from eight different normal tissues. These results suggest that canonical isoforms can be defined more confidently by integrating multiple lines of evidence: highest connected isoforms defined by biological processes and pathways, expression prevalence at the transcript level, and relative or absolute abundance at the protein level. This integrative proteogenomics approach can successfully identify the principal isoforms responsible for the canonical functions of genes.
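The abstract does not spell out how connectivity is scored, so the following is only a minimal illustrative sketch, not the authors' method. It assumes an undirected, weighted isoform-level functional network, takes an isoform's connectivity to be its weighted degree (sum of incident edge weights), and labels every isoform tied for the maximum within its gene as an HCI. It also computes the fraction of HCIs that are not the longest transcript of their gene, the kind of statistic reported above. All function names, isoform IDs, weights, and lengths are hypothetical toy values.

```python
import math
from collections import defaultdict


def weighted_degree(edges):
    """Sum the weights of all edges touching each isoform (undirected network)."""
    degree = defaultdict(float)
    for a, b, w in edges:
        degree[a] += w
        degree[b] += w
    return degree


def highest_connected_isoforms(edges, isoform_to_gene):
    """For each gene, return the isoform(s) with maximal weighted degree.

    Isoforms tied (within floating-point tolerance) for the maximum are all
    returned, so a gene can end up with more than one HCI. Isoforms with no
    edges are treated as having degree 0.
    """
    degree = weighted_degree(edges)
    by_gene = defaultdict(list)
    for isoform, gene in isoform_to_gene.items():
        by_gene[gene].append(isoform)
    hcis = {}
    for gene, isoforms in by_gene.items():
        best = max(degree.get(i, 0.0) for i in isoforms)
        hcis[gene] = sorted(
            i for i in isoforms if math.isclose(degree.get(i, 0.0), best)
        )
    return hcis


def fraction_not_longest(hcis, isoform_to_gene, lengths):
    """Fraction of HCIs that are not the longest transcript of their gene.

    If two isoforms tie for longest, the first one encountered is used.
    """
    longest = {}
    for isoform, gene in isoform_to_gene.items():
        if gene not in longest or lengths[isoform] > lengths[longest[gene]]:
            longest[gene] = isoform
    all_hcis = [(g, i) for g, isos in hcis.items() for i in isos]
    not_longest = sum(1 for g, i in all_hcis if i != longest[g])
    return not_longest / len(all_hcis)


# Hypothetical toy data (not from the paper): two genes, two isoforms each.
ISOFORM_TO_GENE = {"g1_t1": "G1", "g1_t2": "G1", "g2_t1": "G2", "g2_t2": "G2"}
EDGES = [("g1_t1", "g2_t1", 0.9), ("g1_t2", "g2_t1", 0.4), ("g1_t1", "g2_t2", 0.3)]
LENGTHS = {"g1_t1": 1500, "g1_t2": 2100, "g2_t1": 3000, "g2_t2": 900}

hcis = highest_connected_isoforms(EDGES, ISOFORM_TO_GENE)
print(hcis)
print(fraction_not_longest(hcis, ISOFORM_TO_GENE, LENGTHS))
```

In this toy network, g1_t1 is the HCI of G1 even though g1_t2 is the longer transcript, while G2's HCI is also its longest isoform, so half of the HCIs are not the longest. Weighted degree is just one simple choice of connectivity score; a real analysis could substitute a different centrality measure or a threshold-based notion of "highly connected".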