A Comparison between Syllable, Di-Phone, and Phoneme-based Myanmar Speech Synthesis

pp. 58-66


Author(s)

Aye Thida 1,*, Chaw Su Hlaing 1

1. Artificial Intelligence Lab, Faculty of Computer Science, University of Computer Studies, Mandalay, Myanmar

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2018.11.06

Received: 30 Aug. 2018 / Revised: 5 Sep. 2018 / Accepted: 24 Sep. 2018 / Published: 8 Nov. 2018

Index Terms

Myanmar language, phoneme, concatenative speech synthesis

Abstract

Among speech synthesis approaches, the concatenative method is one of the most popular because it can produce more natural-sounding speech output. The main challenge in this method is choosing an appropriate unit for building the speech database. Commonly used units are words, syllables, di-phones, tri-phones, and phonemes, and the choice of unit involves a trade-off in speech quality: larger units need fewer concatenation points per utterance but a larger database, while smaller units need a smaller database but more joins. This paper presents three concatenative speech synthesis systems for the Myanmar language, based respectively on syllable, di-phone, and phoneme units. We then compare the speech quality of the three systems using subjective tests.
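To make the three unit types concrete, the sketch below splits one utterance into syllable, di-phone, and phoneme units and counts the concatenation points each requires. It is an illustration only, not the authors' implementation: the romanized transcription of "min-ga-la-ba" (a common Myanmar greeting) is a hypothetical, simplified one that ignores tone, and the helper names (`syllable_units`, `diphone_units`, `phoneme_units`, `join_points`) and the `sil` silence label are assumptions introduced here.

```python
from typing import List

# Assumed label for the silence padding at utterance edges (hypothetical).
SIL = "sil"


def phoneme_units(phonemes: List[str]) -> List[str]:
    """Phoneme-based synthesis: each phoneme is one database unit."""
    return list(phonemes)


def diphone_units(phonemes: List[str]) -> List[str]:
    """Di-phone synthesis: each unit spans the transition between two adjacent
    phones; silence is padded at both ends so the utterance edges are covered."""
    padded = [SIL] + list(phonemes) + [SIL]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def syllable_units(syllables: List[List[str]]) -> List[str]:
    """Syllable-based synthesis: each syllable is one database unit."""
    return ["".join(s) for s in syllables]


def join_points(units: List[str]) -> int:
    """Number of concatenation points needed to join the units in sequence."""
    return max(len(units) - 1, 0)


if __name__ == "__main__":
    # Hypothetical, tone-free romanization of "min-ga-la-ba", syllabified by hand.
    syllables = [["m", "i", "n"], ["g", "a"], ["l", "a"], ["b", "a"]]
    phonemes = [p for s in syllables for p in s]
    for name, units in [
        ("syllable", syllable_units(syllables)),
        ("di-phone", diphone_units(phonemes)),
        ("phoneme", phoneme_units(phonemes)),
    ]:
        print(f"{name:9} {len(units):2} units, {join_points(units):2} joins: {units}")
```

For this utterance the syllable system needs 3 joins, the phoneme system 8, and the di-phone system 9. Di-phones need slightly more joins than phonemes here, but each join falls in the relatively stable middle of a phone rather than at a phone boundary, which is the usual motivation for that unit.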

Cite This Paper

Aye Thida, Chaw Su Hlaing, "A Comparison between Syllable, Di-Phone, and Phoneme-based Myanmar Speech Synthesis", International Journal of Information Technology and Computer Science (IJITCS), Vol.10, No.11, pp.58-66, 2018. DOI:10.5815/ijitcs.2018.11.06
