International Journal of Intelligent Systems and Applications(IJISA)
ISSN: 2074-904X (Print), ISSN: 2074-9058 (Online)
Published By: MECS Press
IJISA Vol.7, No.1, Dec. 2014
A Stochastic Prediction Interface for Urdu
PP.94-100
This work lays a foundation for text prediction in Urdu, an inflected and under-resourced language. The interface developed is not limited to a T9 (Text on 9 keys) application of the kind used in embedded devices, which can only predict a word after its initial characters have been typed. It can predict a word as T9 does, and it can also predict a sequence of words, one word after another, in a continuous manner for fast document typing. It is based on an N-gram language model. This stochastic interface handles three N-gram levels, from unary to ternary (uni-gram to tri-gram), independently. The uni-gram mode is used for T9-like applications, while the bi-gram and tri-gram modes are used for sentence prediction. The evaluation measures are the percentage of keystrokes saved, the keystrokes until completion, and the percentage of time saved during typing. Two different corpora are merged to build a sufficient amount of data. The test data is divided equally into a test set and a held-out set for experimental purposes. With this setup, the QASKU system outperforms FastType, saving almost 15% more keystrokes.
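The two prediction modes and the keystroke measure described above can be illustrated with a minimal sketch. This is not the QASKU implementation: the function names, the plain frequency ranking without smoothing, the toy romanized tokens, and the keystroke-savings formula (characters not typed divided by total characters) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train(sentences):
    """Count uni-grams and bi-grams from tokenized sentences (hypothetical helper)."""
    unigrams = Counter()
    bigrams = defaultdict(Counter)
    for tokens in sentences:
        unigrams.update(tokens)
        for prev, nxt in zip(tokens, tokens[1:]):
            bigrams[prev][nxt] += 1
    return unigrams, bigrams


def complete(unigrams, prefix, k=3):
    """Uni-gram mode: the k most frequent words that start with the typed prefix."""
    candidates = [(w, c) for w, c in unigrams.items() if w.startswith(prefix)]
    # Sort by descending count, then alphabetically so ties are deterministic.
    candidates.sort(key=lambda wc: (-wc[1], wc[0]))
    return [w for w, _ in candidates[:k]]


def predict_next(bigrams, prev, k=3):
    """Bi-gram mode: the k most frequent words observed after the previous word."""
    followers = bigrams.get(prev, Counter())
    ranked = sorted(followers.items(), key=lambda wc: (-wc[1], wc[0]))
    return [w for w, _ in ranked[:k]]


def keystrokes_saved_pct(total_chars, typed_keys):
    """Assumed definition: share of characters the user did not have to type."""
    return 100.0 * (total_chars - typed_keys) / total_chars


# Toy romanized corpus, used only as placeholder data.
corpus = [
    ["yeh", "kitab", "hai"],
    ["yeh", "kitab", "achi", "hai"],
    ["yeh", "qalam", "hai"],
]
unigrams, bigrams = train(corpus)
print(complete(unigrams, "k"))        # word completion from a typed prefix
print(predict_next(bigrams, "yeh"))   # next-word suggestions after "yeh"
print(keystrokes_saved_pct(10, 4))    # 10-character text entered with 4 keystrokes
```

A tri-gram mode would follow the same pattern, keyed on the previous two words instead of one. A practical system would also need smoothing for unseen contexts, which this sketch leaves out.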
Cite This Paper
Qaiser Abbas,"A Stochastic Prediction Interface for Urdu", International Journal of Intelligent Systems and Applications(IJISA), vol.7, no.1, pp.94-100, 2015. DOI: 10.5815/ijisa.2015.01.09