Work place: A.K. Choudhury Institute of Technology, Calcutta University, Kolkata, India
E-mail: saptarsi007@gmail.com
Research Interests: Computational Science and Engineering, Computational Engineering, Software Construction, Software Development Process, Software Engineering, Data Structures and Algorithms
Biography
Saptarsi Goswami: He is an Assistant Professor at A.K. Choudhury Institute of Technology, University of Calcutta, India, and a Research Scholar at the A. K. Choudhury School of Information Technology, University of Calcutta. He has more than 10 years of working experience in the IT industry. His areas of interest include feature selection, outlier detection and mining unstructured data. He has several publications in reputed journals (such as Expert Systems with Applications, ASEJ and JUCS) and international conferences.
By Sourav Saha, Saptarsi Goswami, Priya Ranjan Sinha Mahapatra
DOI: https://doi.org/10.5815/ijigsp.2018.04.06, Pub. Date: 8 Apr. 2018
This paper presents a heuristic approach to approximating a two-dimensional planar shape by a thick-edged polygonal representation based on optimal criteria. The criteria primarily focus on deriving the minimal thickness of an edge of the polygonal representation so as to handle noisy contours. Vertices of the shape-approximating polygon are extracted through a heuristic exploration using a digital-geometric approach that finds an optimally thick line to represent a discrete curve. The merit of such a strategy depends on how efficiently a polygon with a minimal number of vertices can be generated, at modest computational complexity, as a meaningful representation of a shape without loss of significant visual characteristics. An extensive empirical study on standard data sets shows that the performance of the proposed framework is comparable to that of existing schemes.
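The paper's digital-geometric heuristic is not reproduced here, but the role of a thickness tolerance in polygonal shape approximation can be illustrated with the classical Ramer–Douglas–Peucker algorithm; in this minimal sketch the tolerance `eps` plays a role loosely analogous to the minimal edge thickness. This is a standard textbook algorithm, not the authors' method:

```python
import numpy as np

def rdp(points, eps):
    """Ramer-Douglas-Peucker polyline simplification: points farther than
    eps from the chord joining the endpoints are kept as vertices."""
    pts = np.asarray(points, dtype=float)
    if len(pts) < 3:
        return pts
    start, end = pts[0], pts[-1]
    chord = end - start
    norm = np.linalg.norm(chord)
    d = pts - start
    if norm == 0:
        dists = np.linalg.norm(d, axis=1)
    else:
        # Perpendicular distance of each point from the start-end chord.
        dists = np.abs(chord[0] * d[:, 1] - chord[1] * d[:, 0]) / norm
    i = int(np.argmax(dists))
    if dists[i] > eps:
        # Recurse on the two halves, sharing the split vertex.
        left = rdp(pts[: i + 1], eps)
        right = rdp(pts[i:], eps)
        return np.vstack([left[:-1], right])
    return np.vstack([start, end])
```

A larger `eps` tolerates more deviation per edge and so yields a polygon with fewer vertices, which is the trade-off the abstract describes between vertex count and fidelity.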
By Saptarsi Goswami, Sanjay Chakraborty, Himadri Nath Saha
DOI: https://doi.org/10.5815/ijisa.2017.10.03, Pub. Date: 8 Oct. 2017
Feature selection plays a very important role in all pattern recognition tasks. It has several benefits in terms of reduced data collection effort, better interpretability of the models, and reduced model building and execution time. Many problems in feature selection have been shown to be NP-hard. There has been significant research on feature selection over the last three decades; however, feature selection for clustering is still quite an open area, mainly because no target variable is available, in contrast to supervised tasks. In this paper, five properties or metafeatures, namely entropy, skewness, kurtosis, coefficient of variation and average correlation of the features, are studied and analysed. An extensive study has been conducted over 21 publicly available datasets to evaluate the viability of a feature elimination strategy based on the values of the metafeatures for feature selection in clustering. A strategy to select the most appropriate metafeatures for a particular dataset is also outlined. The results indicate that the decrease in performance is not statistically significant.
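As an illustration of the five metafeatures named above, the sketch below computes them for each column of a numeric data matrix. The 10-bin histogram discretization used for entropy is an assumed choice, not taken from the paper:

```python
import numpy as np

def metafeatures(X):
    """Per feature (column) of X, compute: entropy, skewness, excess
    kurtosis, coefficient of variation, and the average absolute
    correlation with the remaining features."""
    n_features = X.shape[1]
    results = []
    for j in range(n_features):
        col = X[:, j]
        m, s = col.mean(), col.std()
        # Entropy of a 10-bin histogram discretization (assumed binning).
        hist, _ = np.histogram(col, bins=10)
        p = hist / hist.sum()
        p = p[p > 0]
        entropy = float(-np.sum(p * np.log2(p)))
        z = (col - m) / s
        skewness = float(np.mean(z ** 3))
        kurtosis = float(np.mean(z ** 4) - 3.0)  # excess kurtosis
        cv = float(s / m) if m != 0 else float("inf")
        avg_corr = float(np.mean([abs(np.corrcoef(col, X[:, k])[0, 1])
                                  for k in range(n_features) if k != j]))
        results.append({"entropy": entropy, "skewness": skewness,
                        "kurtosis": kurtosis, "cv": cv, "avg_corr": avg_corr})
    return results
```

Features whose metafeature values suggest low information content (e.g. very low entropy, or near-total redundancy with other features) would be candidates for elimination under a strategy of the kind the abstract evaluates.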
By Saptarsi Goswami, Sourav Saha, Subhayu Chakravorty, Amlan Chakrabarti, Basabi Chakraborty
DOI: https://doi.org/10.5815/ijisa.2015.10.04, Pub. Date: 8 Sep. 2015
Feature selection is one of the most important preprocessing steps for a data mining, pattern recognition or machine learning problem. Finding an optimal subset of features among all combinations is an NP-complete problem. Much research has been done on feature selection; however, as dataset sizes grow and optimality is a subjective notion, further research is needed to find better techniques. In this paper, a genetic algorithm based feature subset selection method is proposed with a novel feature evaluation measure as the fitness function. The evaluation measure differs in three primary ways: a) it considers the information content of the features apart from their relevance with respect to the target; b) redundancy is considered only when it is over a threshold value; c) the cardinality of the subset is penalized less heavily. As the measure accepts values for a few parameters, it can be tuned to the needs of a particular problem domain. Experiments conducted over 21 well-known publicly available datasets reveal superior performance, and hypothesis testing shows the accuracy improvement to be statistically significant.
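The paper's exact evaluation measure is not given here, so the sketch below uses an illustrative fitness that only mirrors two of the stated properties, a redundancy term counted only above a threshold and a mild cardinality penalty, inside a minimal genetic algorithm over feature bitmasks. The correlation-based relevance term and all parameter values are assumptions:

```python
import numpy as np

def fitness(mask, X, y, redundancy_threshold=0.7, card_penalty=0.01):
    """Illustrative fitness for a feature bitmask: average |correlation|
    with the target, minus pairwise redundancy counted only when it
    exceeds a threshold, minus a mild penalty on subset cardinality."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return -np.inf
    rel = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in idx])
    red, pairs = 0.0, 0
    for a in range(len(idx)):
        for b in range(a + 1, len(idx)):
            c = abs(np.corrcoef(X[:, idx[a]], X[:, idx[b]])[0, 1])
            if c > redundancy_threshold:  # redundancy only over the threshold
                red += c
            pairs += 1
    red = red / pairs if pairs else 0.0
    return rel - red - card_penalty * idx.size

def ga_select(X, y, pop_size=20, generations=30, seed=0):
    """Minimal GA: tournament selection, one-point crossover, bit-flip
    mutation; returns the best feature bitmask found."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    pop = rng.integers(0, 2, size=(pop_size, n))
    for _ in range(generations):
        scores = np.array([fitness(ind, X, y) for ind in pop])
        new = []
        for _ in range(pop_size):
            i, j = rng.integers(0, pop_size, 2)
            p1 = pop[i] if scores[i] >= scores[j] else pop[j]
            i, j = rng.integers(0, pop_size, 2)
            p2 = pop[i] if scores[i] >= scores[j] else pop[j]
            cut = rng.integers(1, n)
            child = np.concatenate([p1[:cut], p2[cut:]])
            flip = rng.random(n) < 0.05
            new.append(np.where(flip, 1 - child, child))
        pop = np.array(new)
    scores = np.array([fitness(ind, X, y) for ind in pop])
    return pop[int(np.argmax(scores))]
```

The thresholded redundancy term means two mildly correlated features are not penalized at all, and the small `card_penalty` corresponds to the lighter penalization of subset size described in the abstract; both parameters are exposed for tuning.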
By Saptarsi Goswami, Amlan Chakrabarti
DOI: https://doi.org/10.5815/ijitcs.2014.11.10, Pub. Date: 8 Oct. 2014
Feature selection is one of the most important preprocessing steps in data mining and knowledge engineering. In this short review paper, apart from a brief taxonomy of current feature selection methods, we review feature selection methods that are used in practice and produce a near-comprehensive list of problems that have been solved using feature selection across technical and commercial domains. This can serve as a valuable tool for practitioners in industry and academia. We also present empirical results of filter-based methods on various datasets; the empirical study covers the tasks of classification, regression, text classification and clustering. Finally, we compare filter-based ranking methods using rank correlation.
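As a small illustration of comparing filter-based ranking methods with rank correlation, the sketch below ranks features under two simple filter criteria, variance and absolute correlation with the target (chosen here purely for illustration, not taken from the paper), and compares the two rankings with Spearman's coefficient:

```python
import numpy as np

def rank_scores(scores):
    """Return the rank of each feature under a score vector (0 = best)."""
    order = np.argsort(-np.asarray(scores))
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(order))
    return ranks

def spearman(r1, r2):
    """Spearman rank correlation between two rankings (no ties assumed)."""
    n = len(r1)
    d = r1.astype(float) - r2.astype(float)
    return 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))

def compare_filters(X, y):
    """Rank features by two filter criteria and report how strongly the
    resulting rankings agree."""
    var_score = X.var(axis=0)
    corr_score = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                           for j in range(X.shape[1])])
    return spearman(rank_scores(var_score), rank_scores(corr_score))
```

A coefficient near 1 means the two filters order the features almost identically, so either ranking could be used for selection; a value near 0 or below means the choice of filter materially changes which features are kept.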