In this paper, we introduce a new thread partitioning algorithm that
can meet the above challenge for a range of machine architecture models.
A quantitative affinity heuristic is introduced to guide the placement
of operations into threads. This heuristic addresses the trade-off between
exploiting parallelism and preserving locality. The algorithm is
surprisingly simple due to the use of a time-ordered
event list to account for the multiple execution unit activities.
We have implemented the proposed algorithm and our experiments, performed
on a wide range of examples, have demonstrated its efficiency and effectiveness.
Send comments to: amaral AT cs DOT ualberta DOC ca