Agile Intelligent Information Technology for Speech Synthesis Based on Transfer Function Approximation Methods Using Continued Fractions

Author(s)

Lyubomyr Chyrun 1, Victoria Vysotska 2, Sofia Chyrun 3, Zhengbing Hu 4, Yuriy Ushenko 5,6,*, Dmytro Uhryn 6

1. Applied Mathematics Department, Faculty of Applied Mathematics and Informatics, Ivan Franko National University of Lviv, Lviv, 79000, Ukraine

2. Lviv Polytechnic National University, Lviv, 79013, Ukraine

3. Telecommunication Department, Lviv Polytechnic National University, Lviv, 79013, Ukraine

4. School of Computer Science, Hubei University of Technology, Wuhan, China

5. Department of Physics, Shaoxing University, Shaoxing, Zhejiang Province 312000, China

6. Yuriy Fedkovych Chernivtsi National University, Chernivtsi, 58012, Ukraine

* Corresponding author.

DOI: https://doi.org/10.5815/ijigsp.2025.02.01

Received: 20 Jan. 2025 / Revised: 22 Feb. 2025 / Accepted: 18 Mar. 2025 / Published: 8 Apr. 2025

Index Terms

Text-To-Speech, NLP, Speech Recognition, Continued Fractions, Speech Recognition Systems, Speech Synthesis Systems, Autoregressive Modelling, Formant Synthesis, Continued Fraction Approximation

Abstract

The study considers a methodology for using continued fractions to approximate transfer functions in speech synthesis systems. The main results are higher approximation accuracy, faster computation, and a new method of convergence analysis. Continued fractions reduced the approximation error of transfer functions compared with classical methods. At an error tolerance of 1.0E-06, the continued fraction method requires 3–13 terms, whereas the power series requires 3–15 terms. Continued fractions also reduced the time needed to calculate transfer functions by 2–3%. The Δ-algorithm and the α-algorithm were found to be the most effective for calculating the values of continued fractions. A new convergence criterion for continued fractions is proposed, which makes it possible to sum fractions that are "divergent" in the classical sense. Graphs used to classify different types of continued fractions clarified their structure and their potential for application in speech synthesis. Software for calculating transfer function values based on continued fraction expansion has been developed and tested; it automates the approximation process and increases the efficiency of speech synthesis systems. The results improve the quality of synthesised speech while reducing computational complexity. Systems using continued fractions consume less memory and reproduce voice more accurately. In summary, the work presents a new approach to the approximation of transfer functions, which is essential for optimising speech synthesis systems.
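
The term-count comparison in the abstract can be illustrated with a minimal sketch. It is not the authors' software and does not implement the Δ-algorithm or the α-algorithm; it assumes arctan(x) as a stand-in function (the abstract does not name the transfer functions used), evaluates Gauss's continued fraction for it bottom-up, and counts how many terms each expansion needs to reach a tolerance of 1.0E-06. The names arctan_cf, arctan_series and terms_needed are hypothetical.

```python
import math


def arctan_cf(x, n_terms):
    """Gauss's continued fraction for arctan(x), truncated after n_terms
    partial denominators:
        arctan(x) = x / (1 + (1x)^2 / (3 + (2x)^2 / (5 + ...)))
    Evaluated bottom-up (innermost level first)."""
    value = 2 * n_terms - 1  # innermost partial denominator
    for k in range(n_terms - 1, 0, -1):
        value = (2 * k - 1) + (k * x) ** 2 / value
    return x / value


def arctan_series(x, n_terms):
    """First n_terms of the Maclaurin series x - x^3/3 + x^5/5 - ..."""
    return sum((-1) ** k * x ** (2 * k + 1) / (2 * k + 1)
               for k in range(n_terms))


def terms_needed(approx, x, tol=1.0e-6, max_terms=200):
    """Smallest n with |approx(x, n) - atan(x)| <= tol, or None if the
    tolerance is not reached within max_terms (assumed cut-off)."""
    exact = math.atan(x)
    for n in range(1, max_terms + 1):
        if abs(approx(x, n) - exact) <= tol:
            return n
    return None


if __name__ == "__main__":
    # Illustrative sample points only; not taken from the paper.
    for x in (0.1, 0.5, 0.9):
        print(x, terms_needed(arctan_cf, x), terms_needed(arctan_series, x))
```

The sample points are chosen inside the unit interval, where the power series converges; the continued fraction also converges for larger real arguments, which is one reason such expansions are attractive for approximation.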

Cite This Paper

Lyubomyr Chyrun, Victoria Vysotska, Sofia Chyrun, Zhengbing Hu, Yuriy Ushenko, Dmytro Uhryn, "Agile Intelligent Information Technology for Speech Synthesis Based on Transfer Function Approximation Methods Using Continued Fractions", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol. 17, No. 2, pp. 1-28, 2025. DOI: 10.5815/ijigsp.2025.02.01
