IJISA Vol. 7, No. 1, 8 Dec. 2014
Urdu Prediction Interface, N-Gram Language Model, QASKU, Word and Sequence Prediction, Corpus-Based Application
This work lays a foundation for text prediction in Urdu, an inflected and under-resourced language. The interface developed is not limited to a T9 (Text on 9 keys) style application of the kind used in embedded devices, which can only predict a word after its initial characters have been typed. It can predict a word as T9 does, and it can also predict a sequence of words following a given word in a continuous manner, enabling fast document typing. This stochastic interface is based on an N-gram language model and handles three N-gram levels, from unigram to trigram, independently. The unigram mode serves T9-like applications, while the bigram and trigram modes serve sentence prediction. The evaluation measures are the percentage of keystrokes saved, keystrokes until completion, and the percentage of time saved during typing. Two different corpora are merged to obtain a sufficient amount of data, and the test data is split equally into a test set and a held-out set for experimental purposes. With this setup, the QASKU system outperforms FastType with almost 15% more keystrokes saved.
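The abstract names three independent N-gram modes and a keystrokes-saved measure but gives no formulas. The sketch below illustrates both ideas under stated assumptions: maximum-likelihood counts with no smoothing or backoff, top-k suggestions filtered by the characters typed so far, one extra keystroke to accept a suggestion, and a tiny romanized toy corpus. `NGramPredictor`, `keystroke_saving` and `TOY_CORPUS` are hypothetical names for illustration, not the QASKU implementation.

```python
from collections import Counter, defaultdict

# Hypothetical romanized toy corpus (illustration only, not the paper's data).
TOY_CORPUS = [
    "main ghar ja raha hun",
    "main school ja raha hun",
    "woh ghar ja rahi hai",
]


class NGramPredictor:
    """Maximum-likelihood N-gram predictor for a single order (1, 2 or 3).

    Each order is used on its own, with no smoothing or backoff (assumption).
    """

    def __init__(self, order: int) -> None:
        if order not in (1, 2, 3):
            raise ValueError("order must be 1 (unigram), 2 (bigram) or 3 (trigram)")
        self.order = order
        # Maps a context tuple of (order - 1) words to counts of the next word.
        self.counts = defaultdict(Counter)

    def _context(self, history: list[str]) -> tuple:
        # Pad with sentence-start markers so the first words also have a context;
        # for the unigram mode the context is always the empty tuple.
        padded = ["<s>"] * (self.order - 1) + list(history)
        return tuple(padded[len(padded) - (self.order - 1):])

    def train(self, sentences: list[str]) -> None:
        for sentence in sentences:
            words = sentence.split()
            for i, word in enumerate(words):
                self.counts[self._context(words[:i])][word] += 1

    def predict(self, history: list[str], prefix: str = "", k: int = 3) -> list[str]:
        """Return up to k most frequent next words that start with `prefix`."""
        followers = self.counts.get(self._context(history), Counter())
        candidates = [(w, c) for w, c in followers.items() if w.startswith(prefix)]
        # Highest count first; ties broken alphabetically for determinism.
        candidates.sort(key=lambda wc: (-wc[1], wc[0]))
        return [w for w, _ in candidates[:k]]


def keystrokes_with_prediction(model: NGramPredictor, sentence: str, k: int = 3) -> int:
    """Simulate typing: one key per character, one key to accept a suggestion (assumption)."""
    words = sentence.split()
    total = 0
    for i, word in enumerate(words):
        cost = len(word)  # fallback: the whole word is typed by hand
        for typed in range(len(word)):
            if word in model.predict(words[:i], word[:typed], k):
                cost = min(typed + 1, len(word))  # typed characters plus one selection key
                break
        total += cost
    return total


def keystroke_saving(model: NGramPredictor, sentences: list[str], k: int = 3) -> float:
    """Percentage of character keystrokes saved relative to typing every character."""
    without = sum(len(w) for s in sentences for w in s.split())
    if without == 0:
        raise ValueError("no characters to type")
    with_prediction = sum(keystrokes_with_prediction(model, s, k) for s in sentences)
    return 100.0 * (without - with_prediction) / without


if __name__ == "__main__":
    unseen = ["woh school ja raha hai"]  # hypothetical sentence not in TOY_CORPUS
    for order in (1, 2, 3):
        model = NGramPredictor(order)
        model.train(TOY_CORPUS)
        print(order, model.predict(["main", "ghar"]), round(keystroke_saving(model, unseen), 2))
```

On this toy data the unigram mode saves 50.0% of keystrokes on the unseen sentence, the bigram mode 33.33% and the trigram mode 27.78%, because longer contexts are more often unseen in so small a corpus. These numbers only exercise the mechanics and say nothing about the results reported for QASKU.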
Qaiser Abbas, "A Stochastic Prediction Interface for Urdu", International Journal of Intelligent Systems and Applications (IJISA), vol.7, no.1, pp.94-100, 2015. DOI: 10.5815/ijisa.2015.01.09