gSemSim: Semantic Similarity Measure for Intra Gene Ontology Terms

Pages: 32-40

Author(s)

Muhammad Naeem 1,*, Saira Gillani 2, Muhammad Abdul Qadir 1, Sohail Asghar 3

1. Department of Computer Science, M. A. Jinnah University, Islamabad, Pakistan

2. Centre of Research in Networks & Telecom (CoReNeT), M. A. Jinnah University, Islamabad, Pakistan

3. University Institute of IT, PMAS-Arid Agriculture University, Rawalpindi, Pakistan

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2013.06.05

Received: 11 Aug. 2012 / Revised: 8 Jan. 2013 / Accepted: 26 Feb. 2013 / Published: 8 May 2013

Index Terms

Semantic Similarity Measures, Intra-Ontology Similarity, Gene Annotation

Abstract

Gene Ontology (GO) is an important bioinformatics scheme for unifying the representation of gene and gene product attributes across all species. Measuring similarity or distance between GO terms is a key step in uncovering hidden relationships between genes, and it is a common step in knowledge discovery tasks. Various similarity measures between GO terms have been proposed in the literature. We introduce a novel similarity measure scheme that improves on three conventional similarity measures by reducing their limitations. The salient feature of the proposed GO Semantic Similarity (gSemSim) measure is its ability to show a more realistic similarity between concepts from the perspective of domain knowledge. A comparison with other techniques is also presented, showing that the proposed semantic similarity better reflects contextual meaning. This study is expected to assist the bioinformatics community in selecting a better similarity measure for the correct annotation of genes in the Gene Ontology.

Cite This Paper

Muhammad Naeem, Saira Gillani, Muhammad Abdul Qadir, Sohail Asghar, "gSemSim: Semantic Similarity Measure for Intra Gene Ontology Terms", International Journal of Information Technology and Computer Science (IJITCS), vol. 5, no. 6, pp. 32-40, 2013. DOI: 10.5815/ijitcs.2013.06.05
