Sushma Jaiswal

Work place: Dept. of CSIT, Guru Ghasidas Central University, Bilaspur, India

E-mail: jaiswal1302@gmail.com

ORCID: https://orcid.org/0000-0002-6253-7327

Research Interests: Image Processing, Image Manipulation, Image Compression, 2D Computer Graphics, Computer Graphics and Visualization, Computer Vision

Biography

Sushma Jaiswal is an Assistant Professor in the Department of CSIT at Guru Ghasidas Central University, Bilaspur (C.G.), India. She completed her Ph.D. in the field of image processing, and her fields of interest are computer graphics and machine vision. She has 18 years of teaching experience, has published many papers in national and international journals, and has presented her work at various conferences.

Author Articles
Optimized Image Captioning: Hybrid Transformers Vision Transformers and Convolutional Neural Networks: Enhanced with Beam Search

By Sushma Jaiswal, Harikumar Pallthadka, Rajesh P. Chinchewadi, Tarun Jaiswal

DOI: https://doi.org/10.5815/ijisa.2024.02.05, Pub. Date: 8 Apr. 2024

Deep learning has substantially improved image captioning. The Transformer, a neural-network architecture originally built for natural language processing, now excels at image captioning and other computer vision tasks. This paper reviews Transformer-based image captioning methods in detail. In traditional image captioning, convolutional neural networks (CNNs) extracted image features while RNNs or LSTM networks generated the captions; this approach often suffers from information bottlenecks and has trouble capturing long-range dependencies. The Transformer architecture revolutionized natural language processing with its attention mechanism and parallel processing, and researchers have carried that success over to image captioning. By integrating visual and textual information in a single model, Transformer-based image captioning systems outperform previous methods in both accuracy and efficiency. This paper discusses how the Transformer's self-attention mechanisms and positional encodings are adapted for image captioning, and covers Vision Transformers (ViTs) and CNN-Transformer hybrid models. We also discuss pre-training, fine-tuning, and reinforcement learning as ways to improve caption quality. Remaining difficulties, current trends, and future directions for Transformer-based image captioning are examined; multimodal fusion, visual-text alignment, and caption interpretability are open challenges. We expect future research to address these issues and to extend Transformer-based image captioning to domains such as medical imaging and remote sensing. This paper covers how Transformer-based approaches have changed image captioning and their potential to revolutionize multimodal interpretation and generation, advancing artificial intelligence and human-computer interaction.
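As an illustration of the beam-search decoding named in the title, here is a minimal, self-contained sketch of beam search over caption tokens. The `step_fn` scoring hook, the toy vocabulary, and all parameter values are illustrative assumptions, not the authors' implementation.

```python
import math

def beam_search(step_fn, start_token, end_token, beam_width=3, max_len=20):
    """Minimal beam-search decoder sketch. `step_fn(seq)` is assumed to
    return a list of (token, log_prob) pairs for the next position."""
    # Each beam entry is (cumulative log-probability, token sequence).
    beams = [(0.0, [start_token])]
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq[-1] == end_token:          # finished captions pass through
                candidates.append((score, seq))
                continue
            for tok, logp in step_fn(seq):    # expand with next-token options
                candidates.append((score + logp, seq + [tok]))
        # Keep only the top-k partial captions by cumulative log-probability.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        if all(seq[-1] == end_token for _, seq in beams):
            break
    return beams[0][1]

# Toy usage with a hard-coded next-token distribution (illustrative only);
# in a real captioner, step_fn would query the Transformer decoder.
vocab_logprobs = {
    ("<s>",): [("a", math.log(0.6)), ("the", math.log(0.4))],
}
def toy_step(seq):
    return vocab_logprobs.get(tuple(seq), [("</s>", 0.0)])

print(beam_search(toy_step, "<s>", "</s>"))
```

In practice the beam width trades caption diversity against decoding cost; width 1 reduces to greedy decoding.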

Linear Discriminate Analysis based Robust Watermarking in DWT and LWT Domain with PCA based Statistical Feature Reduction

By Sushma Jaiswal, Manoj Kumar Pandey

DOI: https://doi.org/10.5815/ijigsp.2023.02.07, Pub. Date: 8 Apr. 2023

Aiming to design a novel image watermarking technique, this paper presents a method of image watermarking using the lifting wavelet transform (LWT), the discrete wavelet transform (DWT), and one-dimensional linear discriminant analysis (LDA). In this blind watermarking technique, statistical features of the watermarked image are used to prepare the training and testing sets. Principal component analysis (PCA) is then applied to reduce the feature set, which shortens training time and improves accuracy. One-dimensional LDA is used for binary classification because it classifies with good accuracy. The technique applies the DWT and the LWT in two different watermarking schemes for the image transformation; both transforms tolerate image distortion better than other conventional transforms. One of the significant challenges of any watermarking technique is maintaining the proper balance between robustness and imperceptibility. With no attack, the proposed blind technique exhibits an imperceptibility of 43.70 dB on the Lena image for the first scheme (using LWT) and 44.71 dB for the second scheme (using DWT+LWT). The first scheme is tested for robustness and performs well against most image attacks. Compared with some existing similar watermarking methods, the proposed technique is robust against most image attacks while maintaining excellent watermarked-image quality.
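To make the feature pipeline concrete, the sketch below chains DWT sub-band statistics, PCA reduction, and LDA classification in the spirit of the abstract. The specific statistics, wavelet, block size, and synthetic labels are assumptions for illustration; this is not the paper's actual embedding or extraction scheme.

```python
# Sketch: DWT sub-band statistics -> PCA reduction -> LDA classification.
import numpy as np
import pywt                                   # PyWavelets, for dwt2
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

def subband_features(block):
    """Statistical features from one level of 2-D DWT on an image block."""
    cA, (cH, cV, cD) = pywt.dwt2(block.astype(float), "haar")
    feats = []
    for band in (cA, cH, cV, cD):             # approximation + detail bands
        feats += [band.mean(), band.std(), np.abs(band).max()]
    return np.array(feats)

# Hypothetical training data: image blocks labelled with the embedded bit.
rng = np.random.default_rng(0)
blocks = rng.integers(0, 256, size=(200, 8, 8))
bits = rng.integers(0, 2, size=200)

X = np.stack([subband_features(b) for b in blocks])
clf = make_pipeline(PCA(n_components=6),      # reduce feature dimensionality
                    LinearDiscriminantAnalysis())
clf.fit(X, bits)
print(clf.predict(X[:5]))                     # recovered watermark bits
```

The PCA step mirrors the abstract's motivation: a smaller feature set cuts classifier training time, and LDA then makes the binary watermark-bit decision.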
