Shreyas Reddy; Rashmi Ranjan Das; Anjali Mohapatra

An Integrated Pipeline with Internal Image Processing for Efficient Image to Text to Speech Conversion

pp. 1-8

Author(s)

Shreyas Reddy ¹,*, Rashmi Ranjan Das ¹, Anjali Mohapatra ¹

1. International Institute of Information Technology, Bhubaneswar, Odisha, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijem.2023.06.01

Received: 27 May 2023 / Revised: 1 Aug. 2023 / Accepted: 23 Aug. 2023 / Published: 8 Dec. 2023

Index Terms

Optical Character Recognition (OCR), Text-to-Speech (TTS), Image Processing, Character Error Rate (CER)

Abstract

Optical Character Recognition (OCR) systems are tools that enable computers to read text from images of documents. They let machines extract the written content without a person having to read the text aloud, which makes it easy to digitize historical documents, archival material, and medical records and thereby reduces their retrieval times. However, the accuracy of OCR systems depends heavily on the quality of the input images. To reduce this dependence, in this paper we propose an image pre-processing pipeline, integrated with the OCR system, that enhances the quality of input images for efficient image-to-text conversion. This method produces easily understandable text output with a lower Character Error Rate (CER) than current methods. In addition, we explore a technique for converting text from a document or image into machine-readable form and then converting it to audio output using gTTS, a Python library that interfaces with Google Translate's text-to-speech API. We assess the effectiveness of this approach and show that it substantially improves OCR accuracy compared with other existing methods. The paper also presents an overview of the development stages and the significant obstacles encountered, along with comparisons of the results achieved by various methods.
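The abstract does not list the individual steps of the pre-processing pipeline. As a hedged illustration only, the sketch below shows two operations commonly applied to document images before OCR: grayscale conversion (using the ITU-R BT.601 luma weights) and global binarization with Otsu's method. The function names, the input format (a flat, row-major list of 8-bit pixels), and the choice of these two steps are assumptions made for the example; they are not the authors' pipeline.

```python
from __future__ import annotations

# Hypothetical pre-processing helpers (illustrative assumption, not the paper's code).


def to_grayscale(rgb_pixels: list[tuple[int, int, int]]) -> list[int]:
    """Convert RGB pixels to 8-bit grey levels using the BT.601 luma weights."""
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in rgb_pixels]


def otsu_threshold(grey: list[int]) -> int:
    """Return the grey level t that maximises the between-class variance
    when pixels <= t form one class and pixels > t form the other."""
    hist = [0] * 256
    for p in grey:
        hist[p] += 1
    total = len(grey)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_between = 0, -1.0
    weight_bg = 0  # number of pixels with level <= t
    sum_bg = 0     # sum of the levels of those pixels
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue  # no pixels at or below t yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is already in the lower class
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance, up to the constant factor 1 / total**2
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


def binarize(grey: list[int]) -> list[int]:
    """Map dark pixels (likely ink) to 0 and light pixels (paper) to 255."""
    t = otsu_threshold(grey)
    return [0 if p <= t else 255 for p in grey]
```

For example, an image with 50 pixels at level 10 (ink) and 50 at level 200 (paper) yields a threshold between the two levels, so `binarize` maps the ink to 0 and the paper to 255. In practice the same binarization is available in OpenCV through `cv2.threshold` with the `cv2.THRESH_OTSU` flag; the pure-Python version is shown only to make the computation explicit.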
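The Character Error Rate used to compare methods is conventionally defined as CER = (S + D + I) / N, where S, D and I are the numbers of character substitutions, deletions and insertions needed to transform the ground-truth text into the OCR output, and N is the number of characters in the ground truth; S + D + I is the Levenshtein edit distance. The minimal sketch below computes it with the standard dynamic-programming recurrence. It assumes this conventional definition; the abstract does not say whether the text is normalized (case, whitespace) before scoring, and the function names are illustrative.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character substitutions, deletions and
    insertions needed to turn `ref` into `hyp`."""
    # prev[j] holds the distance between the previous prefix of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # delete r
                curr[j - 1] + 1,     # insert h
                prev[j - 1] + cost,  # substitute r -> h (free if they match)
            )
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = (S + D + I) / N, with N the length of the reference text."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

For instance, `character_error_rate("hello world", "helo wor1d")` returns 2/11 ≈ 0.18 (one deletion and one substitution over 11 reference characters). Because insertions are counted, CER can exceed 1 when the OCR output is much longer than the reference.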

Cite This Paper

Shreyas Reddy, Rashmi Ranjan Das, Anjali Mohapatra, "An Integrated Pipeline with Internal Image Processing for Efficient Image to Text to Speech Conversion", International Journal of Engineering and Manufacturing (IJEM), Vol.13, No.6, pp. 1-8, 2023. DOI:10.5815/ijem.2023.06.01

International Journal of Engineering and Manufacturing (IJEM)