IJISA Vol. 6, No. 5, 8 Apr. 2014
Cover page and Table of Contents: PDF (size: 810KB)
Full Text (PDF, 810KB), PP.70-75
Views: 0 Downloads: 0
Dynamic Load Balancing, Task assignment, GPU, Task stealing, SMP
To get maximum performance on the many-core graphics processors, it is important to have an even balance of the workload so that all processing units contribute equally to the task at hand. This can be hard to achieve when the cost of a task is not known beforehand and when new sub-tasks are created dynamically during execution. Both the dynamic load balancing methods using Static task assignment and work stealing using deques are compared to see which one is more suited to the highly parallel world of graphics processors. They have been evaluated on the task of simulating a computer move against the human move, in the famous four in a row game. The experiments showed that synchronization can be very expensive, and those new methods which use graphics processor features wisely might be required.
R Mohan, N P Gopalan, "Dynamic Load Balancing using Graphics Processors", International Journal of Intelligent Systems and Applications(IJISA), vol.6, no.5, pp.70-75, 2014. DOI:10.5815/ijisa.2014.05.07
[1]Stanley Tzeng, Brandon Lloyd, John D. Owens. A GPU Task-Parallel Model with Dependency Resolution. Computer, vol. 45, Aug 2012, no. 8, pp. 34-41.
[2]Christopher P. Stone, Earl P. N. Duque, Yao Zhang, David Car, John D. Owens, and Roger L. Davis. GPGPU parallel algorithms for structured-grid CFD codes. Proceedings of the 20th AIAA Computational Fluid Dynamics Conference, number 2011-3221, June 2011.
[3]Shubhabrata Sengupta, Mark Harris, Michael Garland, and John D. Owens. Efficient Parallel Scan Algorithms for many-core GPUs. In Jakub Kurzak, David A. Bader, and Jack Dongarra, editors, Scientific Computing with Multicore and Accelerators, Chapman & Hall/CRC Computational Science, chapter 19, pages 413–442. Taylor & Francis, January 2011.
[4]Arora N.S., Blumfoe R. D., Plaxton C.G. Thread Scheduling for Multiprogrammed Multiprocessors. In Proceedings of the ACM Symposium on Parallel Algorithms and Architectures (1998), pp. 119–129.
[5]Blumfoe R., Leiserson C. Scheduling multithreaded computations by work stealing. In Proceedings of the 35th Annual Symposium on Foundations of Computer Science, Santa Fe, New Mexico. (1994), pp. 356–368.
[6]Heirich A., Arvo J. A Competitive Analysis of Load Balancing Strategies for Parallel Ray Tracing. J. Supercomputer. 12, 1-2 (1998), 57–68
[7]Daniel Cederman and Philippas Tsigas. On Dynamic Load Balancing on Graphics Processors. Graphics Hardware (2008)
[8]Nyland L., Harris M., Prins J. Fast NBody Simulation with CUDA. In GPU Gems 3. Addison-Wesley, 2007, ch. 31, pp. 677–695
[9]R.D. Blumofe, C.F. Joerg, B.C. Kuszmaul, C.E. Leiserson, K.H. Randall, Y. Zhou, Cilk: an efficient multithreaded runtime system, in: R.L. Wexelblat (Ed.), Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), ACM, Santa Barbara, CA, 1995, pp. 207–216
[10]N.S. Arora, R.D. Blumofe, C. Greg Plaxton, Thread scheduling for multiprogrammed multiprocessors, in: Proceedings of the ACM Symposium on Parallel Algorithms and Architectures, ACM, Puerto Vallarta, Mexico, 1998, pp. 119–129.
[11]Daniel Cederman and Philippas Tsigas. Dynamic load balancing using work-stealing. In Wen-mei W. Hwu, editor, GPU Computing Gems, volume 2, chapter 35, pages 485–499. Morgan Kaufmann, October 2011.
[12]Stanley Tzeng, Anjul Patney, and John D. Owens. Task management for irregular-parallel workloads on the GPU. In Proceedings of High Performance Graphics 2010, pages 29–37, June 2010.