Khan IK et al. (2017), DextMP: deep dive into text for predicting moon...

XB-ART-54002

Bioinformatics 2017 Jul 15;33(14):i83-i91. doi: 10.1093/bioinformatics/btx231.


DextMP: deep dive into text for predicting moonlighting proteins.

Khan IK, Bhuiyan M, Kihara D.

Abstract

Motivation: Moonlighting proteins (MPs) are an important class of proteins that perform more than one independent cellular function. MPs have gained attention in recent years because they play important roles in various systems, including disease development. MPs also have a significant impact on computational function prediction and on annotation in databases. Currently, MPs are not labeled as such in biological databases, even when multiple distinct functions are known for the proteins. In this work, we propose a novel method named DextMP, which predicts whether a protein is an MP based on textual features extracted from scientific literature and the UniProt database.

Results: DextMP extracts three categories of textual information for a protein: titles and abstracts from the literature, and the function description in UniProt. Three language models were applied and compared: a state-of-the-art deep unsupervised learning algorithm and two other language models of different types, Term Frequency-Inverse Document Frequency in the bag-of-words category and Latent Dirichlet Allocation in the topic modeling category. Cross-validation results on a dataset of known MPs and non-MPs showed that DextMP successfully predicted MPs with over 91% accuracy, a significant improvement over existing MP prediction methods. Lastly, we ran DextMP with the best performing language models and text-based feature combinations on three genomes (human, yeast and Xenopus laevis) and found that about 2.5-35% of the proteomes are potential MPs.

Availability and Implementation: Code available at http://kiharalab.org/DextMP. Contact: dkihara@purdue.edu.
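As a rough illustration of the bag-of-words model named in the abstract, the sketch below computes Term Frequency-Inverse Document Frequency weights with the common formulation tf × log(N / df). This is not the DextMP implementation: the function name `tfidf_vectors`, the whitespace tokenization, the exact weighting variant and the toy titles are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    # Assumption: lowercase whitespace tokenization; real pipelines do more.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical toy titles, not taken from the DextMP dataset.
titles = [
    "enzyme acts as transcription factor",
    "enzyme catalyzes glycolysis step",
    "enzyme binds receptor",
]
vectors = tfidf_vectors(titles)
```

A term that occurs in every text (here "enzyme") receives weight zero, while terms specific to one text receive positive weights, which is why such features can help separate texts describing different kinds of function.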

PubMed ID: 28881966
PMC ID: PMC5870774
Article link: Bioinformatics

Species referenced: Xenopus laevis
GO keywords: kinase activity



	Fig. 1. Distribution of the number of abstracts per protein. Black, MP; gray, non-MP in the control dataset. The first bar is for 1 and 2 abstracts, the next bar is for 3 and 4, and so on.
	Fig. 2. Schematic diagram of DextMP. The upper panel shows the text prediction process, while the bottom panel shows the prediction model that uses predicted text labels to make the final MP/non-MP classification. P1, Protein 1; CL, class label.
	Fig. 3. Word clouds of the text information in the moonlighting protein dataset. The size of a word in the visualization is proportional to the number of times the word appears in the input text. (A–C): titles, function descriptions and abstracts, respectively. The images were generated at http://www.wordle.net/.
	Fig. 4. Protein-level cross-validation F-scores for weighted and non-weighted majority votes. Results for 21 (text type)-(language model)-(classifier) combinations are compared.
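The captions for Fig. 2 and Fig. 4 refer to combining per-text predicted labels into a protein-level call by weighted or non-weighted majority vote. The record does not say how the weights are defined, so the sketch below assumes, purely for illustration, that each text prediction carries a classifier confidence used as its weight and that ties go to "non-MP". The function name `majority_vote` and the example values are hypothetical.

```python
from collections import defaultdict


def majority_vote(text_predictions, weighted=False):
    """Combine per-text (label, confidence) pairs into one protein-level label.

    Assumption: confidence is a classifier score used as the vote weight;
    the weighting scheme actually used by DextMP is not given here.
    """
    if not text_predictions:
        raise ValueError("need at least one text prediction")
    totals = defaultdict(float)
    for label, confidence in text_predictions:
        # Non-weighted voting counts each text once; weighted voting
        # adds the confidence instead.
        totals[label] += confidence if weighted else 1.0
    # Assumption: ties are resolved toward "non-MP".
    return "MP" if totals["MP"] > totals["non-MP"] else "non-MP"


# Hypothetical predictions for the texts of a single protein.
predictions = [
    ("MP", 0.95),
    ("MP", 0.90),
    ("non-MP", 0.55),
    ("non-MP", 0.60),
    ("non-MP", 0.55),
]
unweighted_label = majority_vote(predictions)
weighted_label = majority_vote(predictions, weighted=True)
```

In this toy case the two schemes disagree: three of five texts vote non-MP, but the two MP votes carry more total confidence, so only the weighted vote returns "MP".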