Design and Implementation of an Efficient Thread Partitioning Algorithm

José Nelson Amaral, Guang Gao, Erturk D. Kocalar, Patrick O'Neill, and Xinan Tang

The development of fine-grain multi-threaded program execution models has created an interesting challenge: how to partition a program into threads that can exploit machine parallelism, achieve latency tolerance, and maintain reasonable locality of reference? A successful algorithm must produce a thread partition that best utilizes multiple execution units on a single processing node and handles long and unpredictable latencies.

In this paper, we introduce a new thread partitioning algorithm that can meet the above challenge for a range of machine architecture models.  A quantitative affinity heuristic is introduced to guide the placement of operations into threads. This heuristic addresses the trade-off between exploiting parallelism and preserving locality.  The algorithm is surprisingly simple due to the use of a time-ordered
event list to account for the multiple execution unit activities.  We have implemented the proposed algorithm and our experiments, performed on a wide range of examples, have demonstrated its efficiency and effectiveness.
 

Return to José Nelson Amaral's Publications

Send comments to: amaral AT cs DOT ualberta DOC ca


Return to Amaral's home page