Agile Intelligent Information Technology for Speech Synthesis Based on Transfer Function Approximation Methods Using Continued Fractions

By Lyubomyr Chyrun Victoria Vysotska Sofia Chyrun Zhengbing Hu Yuriy Ushenko Dmytro Uhryn

DOI: https://doi.org/10.5815/ijigsp.2025.02.01, Pub. Date: 8 Apr. 2025

The study considers a methodology for using continued fractions to approximate transfer functions in speech synthesis systems. The main results of the research are an increase in approximation accuracy, faster calculations, and a new method of convergence analysis. The use of continued fractions reduced the approximation error of transfer functions compared to classical methods. For an error of 1.0E-06, the continued fraction method requires only 3–13 terms, while the power series requires 3–15 terms. The use of continued fractions reduced the time for calculating transfer functions by 2–3%. The Δ-algorithm and the α-algorithm were found to be the most effective for calculating the values of continued fractions. A new criterion for the convergence of continued fractions is proposed, which allows summing fractions that are "divergent" in the classical sense. Graphs used to classify different types of continued fractions gave a better understanding of their structure and potential for application in speech synthesis. Software for calculating transfer function values based on continued fraction decomposition has been developed and tested. It automates the approximation process and increases the efficiency of speech synthesis systems. The results obtained improve the quality of synthesised speech while simultaneously reducing the complexity of calculations. Systems using continued fractions consume less memory and provide more accurate voice reproduction. In summary, the work presents a new approach to the approximation of transfer functions, which is essential for optimising speech synthesis systems.
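
As a rough illustration of how a continued fraction can reach a given tolerance in few terms, the sketch below evaluates Lambert's continued fraction for tanh(x) by plain backward recurrence and counts the terms needed for an error below 1.0E-06. The choice of tanh, the helper names, and the evaluation scheme are assumptions made for illustration; this is neither the Δ-algorithm nor the α-algorithm mentioned in the abstract, and it does not use the paper's transfer functions.

```python
import math

def eval_continued_fraction(numerators, denominators):
    """Evaluate a1/(b1 + a2/(b2 + ... + an/bn)) by backward recurrence."""
    value = 0.0
    for a, b in zip(reversed(numerators), reversed(denominators)):
        value = a / (b + value)
    return value

def tanh_cf(x, n_terms):
    """Truncation of tanh(x) = x/(1 + x^2/(3 + x^2/(5 + ...))) after n_terms levels."""
    numerators = [x] + [x * x] * (n_terms - 1)
    denominators = [2 * k + 1 for k in range(n_terms)]
    return eval_continued_fraction(numerators, denominators)

def terms_needed(x, tol=1e-6, max_terms=50):
    """Smallest number of levels whose truncation error is below tol."""
    for n in range(1, max_terms + 1):
        if abs(tanh_cf(x, n) - math.tanh(x)) < tol:
            return n
    raise ValueError("tolerance not reached within max_terms")

print(terms_needed(1.0))
```

Backward recurrence needs the number of terms fixed in advance; a forward scheme would be preferable when terms are added until a tolerance is met.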

A Novel Local Adaptive Percentage Split Distribution Method for Image Binarization and Classification

By Joy Christy A. Umamakeswari A. Shanthi P. Srilakshmi A. Siva Chandrasekaran

DOI: https://doi.org/10.5815/ijigsp.2025.02.02, Pub. Date: 8 Apr. 2025

Binary thresholding methods separate image pixels into two groups, 0s and 1s. The two types of binary thresholding methods are global thresholding and local thresholding. Global thresholding methods are appropriate for binarizing images that have a smooth distribution of pixel intensities and contrast. Global thresholding struggles with distorted and tampered images, since such degradation introduces additional noise and causes variation in contrast and illumination. Local adaptive thresholding methods address the issue by assigning every pixel a threshold based on the contrast distribution of neighboring pixels. This paper introduces the Local Adaptive Percentage Split Distribution (LAPSD) method for binarization. LAPSD computes the threshold based on a percentage-wise split of neighboring pixels. The performance of LAPSD is compared with benchmark binary thresholding methods such as Bradley’s, Niblack’s, and Sauvola’s using the PSNR, SSIM, and MSE metrics. The accuracy of LAPSD image binarization is measured using Convolutional Neural Network (CNN) models, and the results show that the proposed method surpasses traditional methods in all respects.
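
The idea of a per-pixel threshold derived from neighboring pixels can be sketched as follows. This is a generic local-mean rule in the spirit of Bradley's method, written naively without integral images; it is not LAPSD, whose split rule is not described in the abstract. The function name, window size, and the parameter `t` are assumptions.

```python
def local_mean_threshold(image, window=3, t=0.15):
    """Binarize a grayscale image (list of rows) with a per-pixel threshold.

    A pixel becomes 1 if it is brighter than (1 - t) times the mean of its
    window-by-window neighborhood, and 0 otherwise.
    """
    height, width = len(image), len(image[0])
    half = window // 2
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Clip the neighborhood at the image borders.
            y0, y1 = max(0, y - half), min(height, y + half + 1)
            x0, x1 = max(0, x - half), min(width, x + half + 1)
            neighbours = [image[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            local_mean = sum(neighbours) / len(neighbours)
            result[y][x] = 1 if image[y][x] > local_mean * (1.0 - t) else 0
    return result

sample = [
    [200, 200, 200],
    [200, 50, 200],
    [200, 200, 200],
]
binary = local_mean_threshold(sample)
```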

A Novel Approach for Enhancing COVID-19 Diagnosis Accuracy through Graph Neural Networks Using Respiratory Sound Data

By Nagaraju Sonti Rukmini M. S. S. Venkatesh Munagala

DOI: https://doi.org/10.5815/ijigsp.2025.02.03, Pub. Date: 8 Apr. 2025

This research presents a groundbreaking method using graph neural networks (GNN) for the accurate identification of COVID-19 through the analysis of respiratory sounds. The method utilizes advanced signal processing and machine learning techniques, including Fast Fourier Transforms (FFTs), Mel-spectrograms, and GNN methodology. FFTs are used as a preprocessing step to convert raw respiratory sound signals into frequency-domain representations, enhancing signal quality and isolating informative acoustic patterns. Mel-spectrograms are used to extract essential feature vectors for diagnostic classification, enhancing the model's ability to discern subtle patterns indicative of COVID-19 infection.
The GNN methodology feeds preprocessed audio features into a graph neural network architecture, which excels at capturing complex relationships and dependencies within data by modeling them as graphs. In this context, respiratory sound data is represented as a graph, with nodes corresponding to specific audio features and edges representing relationships between them. The GNN learns to propagate information across the graph, enabling it to identify meaningful patterns indicative of COVID-19 infection. The research findings show that the GNN surpasses a convolutional neural network (CNN) in terms of accuracy, precision, recall, and F1 score, indicating significant progress in the application of GNNs in medical diagnostics. The study provides a comprehensive examination of the possibilities of using advanced neural network techniques to transform disease detection and diagnosis, with a validation accuracy of up to 97% under rigorous constraints.
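
Propagating information across a graph can be illustrated with a single round of mean aggregation, in which each node replaces its feature vector with the average over itself and its neighbors. This is a minimal sketch with no learned weights or nonlinearity; the function name, the undirected-edge assumption, and the toy graph are hypothetical and do not reflect the paper's architecture.

```python
def propagate(features, edges):
    """One round of mean-aggregation message passing on an undirected graph.

    features: list of per-node feature vectors (lists of floats).
    edges: list of (u, v) node-index pairs.
    """
    n_nodes = len(features)
    # Each node aggregates over itself (self-loop) and its neighbors.
    neighbours = {i: [i] for i in range(n_nodes)}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    dim = len(features[0])
    return [
        [sum(features[j][d] for j in neighbours[i]) / len(neighbours[i]) for d in range(dim)]
        for i in range(n_nodes)
    ]

# A three-node chain 0 - 1 - 2 with one feature per node.
updated = propagate([[1.0], [3.0], [5.0]], [(0, 1), (1, 2)])
```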

Brain Tissue Segmentation from the MR Images Affected by Noise and Intensity Inhomogeneity Using a Novel Linguistic Fuzzifier-Based FCM Algorithm

By Sandhya Gudise

DOI: https://doi.org/10.5815/ijigsp.2025.02.04, Pub. Date: 8 Apr. 2025

Brain MRI is mainly affected by noise and intensity inhomogeneity (IIH) during its acquisition. Brain tissue segmentation plays an important role in biomedical research and clinical applications, and is essential for physicians in the proper diagnosis and treatment of brain-related disorders. Fuzzy C-means (FCM) clustering is one of the most widely used algorithms for brain tissue segmentation. Traditional FCM is limited by the misclassification of pixels, which leads to inaccurate cluster centers. As a result, it is unable to address the issues of noise and IIH. In FCM there is also uncertainty in controlling the fuzziness of the clusters, because the fuzzifier is fixed. This paper proposes a novel linguistic fuzzifier-based FCM (LFFCM) to overcome the limitations of traditional FCM during brain tissue segmentation from MR images. In this method, a linguistic fuzzifier is used instead of a fixed fuzzifier. The spatial information incorporated in the membership function can reduce the misclassification of pixels. The inclusion of adaptive weights in the membership function results in accurate cluster centers, and these highly accurate cluster centers allow the proposed LFFCM to handle IIH. Various brain MR images are used to evaluate the proposed technique, and the results are compared with some state-of-the-art techniques. The results reveal that the proposed method performs better than the others.
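
For context, the traditional FCM iteration with a fixed fuzzifier m can be sketched on one-dimensional intensities as below. This is the standard textbook update, not the proposed LFFCM: it has no linguistic fuzzifier, spatial information, or adaptive weights. The function name, the `eps` guard against zero distances, and the sample data are assumptions.

```python
def fcm_1d(data, centers, m=2.0, iterations=20, eps=1e-12):
    """Standard fuzzy C-means on scalar data with a fixed fuzzifier m > 1."""
    centers = list(centers)
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iterations):
        # Membership update: u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1)).
        memberships = []
        for x in data:
            dists = [abs(x - c) + eps for c in centers]
            memberships.append(
                [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]
            )
        # Center update: mean of the data weighted by u^m.
        centers = [
            sum((u[i] ** m) * x for u, x in zip(memberships, data))
            / sum(u[i] ** m for u in memberships)
            for i in range(len(centers))
        ]
    return centers, memberships

intensities = [0.0, 0.1, 0.2, 9.8, 9.9, 10.0]
final_centers, final_memberships = fcm_1d(intensities, centers=[1.0, 8.0])
```

With m fixed, every pixel is clustered with the same degree of fuzziness, which is the limitation the abstract attributes to traditional FCM.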

Design of an Efficient UNet-Based Transfer Learning Model for Enhancing Skin Cancer Segmentation and Classification Performance

By Namrata Verma Pankaj Kumar Mishra

DOI: https://doi.org/10.5815/ijigsp.2025.02.05, Pub. Date: 8 Apr. 2025

Accurate and efficient segmentation and classification are indispensable for the early diagnosis and treatment of skin cancer, a common and potentially fatal condition. In this paper, we present a novel method that combines the UNet architecture with Auto Encoders for robust skin cancer segmentation, followed by binary cascade Convolutional Neural Networks (CNNs) for accurately classifying melanoma and basal cell carcinoma. This research is motivated by the limited ability of existing models to achieve high precision, accuracy, and recall rates while maintaining a high Peak Signal-to-Noise Ratio (PSNR) for accurate image reconstruction. Our proposed model overcomes these limitations and performs exceptionally well on the ISIC, HAM10000, PH2 Dataset, and Dermofit Image Library datasets. Using UNet together with Auto Encoders combines the advantages of both architectures. The UNet architecture, renowned for its superior performance in image segmentation tasks, provides a solid foundation for separating skin cancer regions from surrounding tissue. The Auto Encoder component simultaneously facilitates feature extraction and image reconstruction, leading to improved representation learning and segmentation results. By exploiting the complementary capabilities of these models, our method improves the accuracy and efficiency of skin cancer segmentation. Using binary cascade CNNs for classification also improves our model's performance. The binary cascade architecture employs a hierarchical classification method that iteratively refines classification decisions at each stage. This facilitates the differentiation between basal cell carcinoma, melanoma, and melanocytic nevi, resulting in highly accurate and trustworthy predictions. Extensive experiments were conducted on the ISIC, HAM10000, PH2 Dataset, and Dermofit Image Library to evaluate the performance of our proposed model.
The achieved precision of 99.2%, accuracy of 98.3%, recall of 98.9%, and PSNR greater than 42 dB demonstrate the effectiveness of our strategy. These results suggest that our model has a great deal of potential for assisting dermatologists in the early identification and classification of skin cancer, ultimately leading to improved patient outcomes. The combination of UNet with Auto Encoders and binary cascade CNNs has proven effective for segmenting and classifying skin cancer. Our proposed model outperforms current methods in terms of precision, accuracy, recall, and PSNR, demonstrating its potential to have a significant impact on the field of dermatology and aid in the early detection and treatment of skin cancers.
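
For reference, PSNR is conventionally computed from the mean squared error between a reference image and its reconstruction. The sketch below assumes 8-bit grayscale images stored as nested lists; the function name and the sample pixel values are hypothetical and are not taken from the paper.

```python
import math

def psnr(reference, reconstructed, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized grayscale images."""
    squared_errors = [
        (r - s) ** 2
        for ref_row, rec_row in zip(reference, reconstructed)
        for r, s in zip(ref_row, rec_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

original = [[10, 20], [30, 40]]
restored = [[11, 19], [31, 39]]  # every pixel off by one, so MSE = 1
print(round(psnr(original, restored), 2))
```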

CNN and GAN Based Stroke Detection Using CT Scan Images

By Archana Chaudhari Atharva Rajadhyaksha Sharvil Patil Himanshu Pawar

DOI: https://doi.org/10.5815/ijigsp.2025.02.06, Pub. Date: 8 Apr. 2025

The objective of this research is to detect stroke using CT scan images. An analysis of the 3D CNN method for stroke detection is presented, along with a new method of stroke detection using semi-supervised Generative Adversarial Networks (SGAN). 3D CNN is the traditional approach to any type of image classification problem, but because CNNs are data-hungry, they are difficult to use when data is scarce. High-quality medical data is difficult to find, and hence alternative approaches seem worth exploring. The relatively new GANs can generate images like the training images, and the SGAN variant can use these generated images for training the classifier. We investigate the usefulness of SGANs in comparison with CNNs in this paper. The proposed SGAN method is compared with state-of-the-art methods in the literature using accuracy, sensitivity, and specificity. The SGAN method demonstrates an accuracy of 93%, a sensitivity of 100%, and a specificity of 90%. For small datasets in medical imaging, the proposed SGAN method exhibits encouraging performance compared to other methods that use large datasets.
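
The three evaluation metrics follow directly from the counts of a binary confusion matrix, as in the sketch below. The function name and the counts in the usage line are made up for illustration and are not the paper's data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity (true positive rate), and specificity (true negative rate)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical counts: 10 stroke cases and 90 non-stroke cases.
metrics = classification_metrics(tp=8, fp=5, tn=85, fn=2)
```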

Voice Comparison Using Acoustic Analysis and Generative Adversarial Network for Forensics

By Kruthika S. G Trisiladevi C Nagavi P. Mahesha Abhishek Kumar

DOI: https://doi.org/10.5815/ijigsp.2025.02.07, Pub. Date: 8 Apr. 2025

Forensic Voice Comparison (FVC) is a scientific analysis in digital forensics that examines audio recordings to determine whether they come from the same or different speakers. In this research work, the experiment uses three stages: preprocessing, feature extraction, and classification. In preprocessing, a stationary noise reduction algorithm removes unwanted background noise, increasing the clarity of the speech. This in turn improves the overall audio quality by reducing distractions. Next, acoustic features such as Mel Frequency Cepstral Coefficients (MFCC) are extracted from the audio signals to characterize and analyze the unique vocal patterns of different individuals. A Generative Adversarial Network (GAN) is then used to generate synthetic MFCC features and to augment the data samples. Finally, Logistic Regression (LR) is realized using the UK framework for classification, predicting whether the result is true or false. The accuracy achieved is 62% with 3899 samples and 85% with a set of 985 samples for the Australian English datasets.
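
MFCC extraction relies on a filterbank whose centre frequencies are evenly spaced on the mel scale. The sketch below uses one common form of the Hz-to-mel conversion (2595 log10(1 + f/700)) to place such centres; the function names and the 0 to 8000 Hz range are assumptions, and the abstract does not state which MFCC configuration the authors used.

```python
import math

def hz_to_mel(frequency_hz):
    """Convert a frequency in Hz to mels (one common variant of the formula)."""
    return 2595.0 * math.log10(1.0 + frequency_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filter_centres(low_hz, high_hz, n_filters):
    """Centre frequencies in Hz of n_filters filters evenly spaced on the mel scale."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (n_filters + 1)
    return [mel_to_hz(low_mel + step * (k + 1)) for k in range(n_filters)]

centres = mel_filter_centres(0.0, 8000.0, 10)
```

The centres come out densely packed at low frequencies and sparse at high frequencies, which is the property that makes mel-based features suited to speech.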

Performance Comparison and Investigation of Tropical Cyclone Intensity Estimation from Satellite Images Using Deep Learning and Machine Learning

By Md. Ahsan Rahat Nusrat Sharmin Fairooz Nawar Nawme Sabbir Rahman

DOI: https://doi.org/10.5815/ijigsp.2025.02.08, Pub. Date: 8 Apr. 2025

Tropical cyclones, considered extreme weather events, can cause significant damage to coastal areas, impacting millions of people and animals while also posing the risk of substantial economic losses. Traditionally, the Dvorak technique has been employed to assess the intensity of these cyclones, involving the visual analysis of satellite data to evaluate the storm’s cloud patterns and strength. In recent years, various studies have explored the use of deep learning (DL) and machine learning (ML) techniques to estimate tropical cyclone intensity. However, there is a lack of research providing a comparative analysis that integrates both ML and DL approaches for the estimation of tropical cyclone intensity. This study investigates the use of ML and DL techniques to estimate the strength of tropical cyclones. Using various datasets and satellite imagery, we examine the performance of convolutional neural networks (CNN, VGG16, DenseNet), recurrent neural networks (LSTM), and other ML methods (XGBoost, CatBoost, SVM, DT). Our results indicate that both ML and DL approaches show significant promise for improving the accuracy of tropical cyclone intensity estimation; however, in our case study, DL algorithms outperformed ML algorithms.

International Journal of Image, Graphics and Signal Processing (IJIGSP)

MECS Press Journal