International Journal of Information Engineering and Electronic Business(IJIEEB)

ISSN: 2074-9023 (Print), ISSN: 2074-9031 (Online)

Published By: MECS Press

IJIEEB Vol.10, No.4, Jul. 2018

Map Reduce and Match Aggregate Pipeline Performance Analysis in Metadata Identification and Analysis for Document, Audio, Image, and Video

Full Text (PDF, 645KB), PP.1-7



Mardhani Riasetiawan

Index Terms

Metadata;Document;Audio;Image;Video;MapReduce;Match Aggregate Pipeline;FITS;Self-Assignment Data Management


This study examines metadata identification and analysis for document, audio, image, and video files. The process uses MapReduce and a Match Aggregate Pipeline to identify, classify, and categorize files. The inputs are FITS result arrays, processed in XML form. The work consists of the extraction process, identification and analysis, classification, and metadata information. The objective is to establish file information based on volume, variety, veracity, and velocity criteria, as part of the task identification component in Self-Assignment Data Management. Testing covers all file types, with the number and size of files set according to the grouping. The results show a pattern: the Match Aggregate Pipeline has a longer processing time than MapReduce at small block sizes (64 MB, 128 MB, and 256 MB), but a shorter processing time at larger block sizes (1024 MB and 2048 MB). The contribution is that metadata processing for large files can be carried out by adjusting the block size in the Match Aggregate Pipeline.
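The two processing styles named in the abstract can be contrasted with a minimal Python sketch. It is illustrative only and not the study's implementation: it assumes the Match Aggregate Pipeline follows a filter-then-group pattern, and the records, field names (`kind`, `size_mb`), and function names are hypothetical stand-ins for parsed FITS output.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical metadata records standing in for parsed FITS output.
RECORDS = [
    {"name": "a.pdf", "kind": "document", "size_mb": 2},
    {"name": "b.mp3", "kind": "audio", "size_mb": 5},
    {"name": "c.jpg", "kind": "image", "size_mb": 1},
    {"name": "d.mp4", "kind": "video", "size_mb": 700},
    {"name": "e.pdf", "kind": "document", "size_mb": 3},
]


def map_reduce(records):
    """Count files and sum sizes per kind using map, shuffle, and reduce phases."""
    # Map phase: emit (key, value) pairs.
    mapped = [(r["kind"], (1, r["size_mb"])) for r in records]
    # Shuffle phase: collect all values that share a key.
    grouped = defaultdict(list)
    for key, value in mapped:
        grouped[key].append(value)
    # Reduce phase: fold each key's values into one (count, total_size) pair.
    return {
        key: reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), values)
        for key, values in grouped.items()
    }


def match_aggregate(records, min_size_mb=0):
    """Filter records first (match), then accumulate per kind (aggregate)."""
    totals = {}
    for r in records:
        if r["size_mb"] < min_size_mb:  # match stage: drop small files
            continue
        count, size = totals.get(r["kind"], (0, 0))
        totals[r["kind"]] = (count + 1, size + r["size_mb"])  # group stage
    return totals
```

With no filter applied, both functions return the same per-kind counts and total sizes. The difference lies in how the work is organized: the first materializes intermediate key-value pairs, while the second filters and accumulates in one pass.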

Cite This Paper

Mardhani Riasetiawan, "Map Reduce and Match Aggregate Pipeline Performance Analysis in Metadata Identification and Analysis for Document, Audio, Image, and Video", International Journal of Information Engineering and Electronic Business (IJIEEB), Vol.10, No.4, pp. 1-7, 2018. DOI: 10.5815/ijieeb.2018.04.01

