Schedule Overview

Program Details

Sunday, August 24: Workshops & Tutorials

8:00 - 9:00 Registration and Breakfast

9:00 - 17:30 Data-Flow Models Workshop

Data Flow Models (DFM)

The purpose of DFM continues to be bringing together researchers interested in novel computational models based on data-flow principles of execution. The switch to multi-core systems has made concurrency a major issue if we are to use the increasing number of cores on a chip.

9:00 - 12:30 Advanced Design, Analysis and Verification of NoC Architectures Workshop

Advanced Design, Analysis and Verification of NoC Architectures (NoC)

Communication fabrics are critical for the quality (correctness, performance, energy, reliability) and fast integration of future computer systems in all segments. Examples of communication fabrics range from high-end regular rings and meshes in high-end servers and high-performance computing to SoC interconnect and IO fabric. Interconnect design is one of the greatest challenges faced by designers regardless of whether the interconnect fabrics are regular or irregular in structure.

Designing communication fabrics is a multidimensional challenge that involves complex functional and performance validation, cost analysis (area, power, design cost), and multilayered optimization (logical performance of the interconnect vs. physical design aspects of the chip). This tutorial will summarize the results of the Communication Fabrics research program funded by the Intel University Research Office and executed by multiple universities in collaboration with Intel Corp. In particular, the tutorial will cover:

  • Functional correctness (memory consistency proofs, deadlock-freedom, livelocks)
  • Traffic models for the evaluation of communication fabrics
  • Quality of service analysis and optimization
  • Uncore, interconnect, and system power management
  • Physically-aware performance and area optimization for communication fabrics

14:00 - 17:30 Toward Improved Performance Solutions Tutorial

Toward Improved Performance Solutions (TIPS)

The HPC community is now using powerful supercomputing systems composed of heterogeneous nodes built from multi-core processors and accelerators such as NVIDIA GPUs and Intel Xeon Phi, providing an aggregate node performance of more than one TeraFlop/s. This tremendous computational power can only be efficiently utilized with the appropriate programming models and software infrastructure. Indeed, it requires end-users to rethink and redesign their algorithms to take advantage of the underlying architecture while keeping productivity in mind.

This tutorial is designed for members of the HPC community interested in best practices for efficiently speeding up their parallel codes. A comprehensive overview of GPU architecture and the Intel Xeon Phi coprocessor, also known as Intel Many Integrated Core (MIC), will be presented. Technical sessions will include interactive examples using high-level programming models such as OpenACC and OpenMP. With a focus on high productivity in programming manycore architectures, examples of currently supported software packages such as numerical libraries (e.g., cuBLAS, MAGMA) and a broad range of applications (e.g., computational electromagnetics, computational chemistry, CFD, seismic imaging) will also be highlighted. Participants are encouraged to bring laptop computers and follow live demonstrations with detailed examples.

Monday, August 25: Keynote & Sessions

8:00 - 9:00 Registration and Breakfast

9:00 - 10:00 Keynote: Klara Nahrstedt

Internet of Mobile Things: Challenges and Opportunities

The Internet of Things (IoT) concept has been around for some time, and applications such as transportation, health care, education, travel, smart grid, and retail are and will be major beneficiaries of this concept. However, only recently, due to technological advances in sensor devices and rich wireless connectivity, is the Internet of Things at scale becoming a reality. For example, Cisco’s Internet of Things Group predicts over 50 billion connected sensory devices by 2020.

In this talk, we will discuss the Internet of Mobile Things (IoMT), since several game-changing technological advances have happened on ‘mobile things’ such as mobile phones, trains, and cars, where rich sets of sensors, connected via diverse sets of wireless Internet technologies, are changing and influencing how people communicate, move, and download and distribute information. In this space, challenges come from the need to determine (1) contextual information such as location, duration of contact, and density of devices, utilizing networked sensory information; (2) higher-level knowledge such as users’ activity detection, mood detection, application usage pattern detection, and user interactions on ‘mobile things’, utilizing contextual information; and (3) adaptive and real-time parallel and distributed architectures that integrate context, activity, mood, and usage patterns into mobile application services on mobile ‘things’. Solving these challenges will provide enormous opportunities to improve the utility of mobile ‘things’, optimizing scarce resources on mobile ‘things’ such as energy, memory, and bandwidth.

Klara Nahrstedt (University of Illinois at Urbana-Champaign)

Klara Nahrstedt is the Ralph and Catherine Fisher Professor in the Computer Science Department and Acting Director of the Coordinated Science Laboratory in the College of Engineering at the University of Illinois at Urbana-Champaign. Her research interests are directed toward 3D teleimmersive systems, mobile systems, Quality of Service (QoS) and resource management, Quality of Experience in multimedia systems, and real-time security in mission-critical systems. She is the co-author of the widely used multimedia books ‘Multimedia: Computing, Communications and Applications’, published by Prentice Hall, and ‘Multimedia Systems’, published by Springer Verlag. She is the recipient of the IEEE Communication Society Leonard Abraham Award for Research Achievements, University Scholar, Humboldt Award, and IEEE Computer Society Technical Achievement Award, and is the former chair of the ACM Special Interest Group in Multimedia. She was the general chair of ACM Multimedia 2006, ACM NOSSDAV 2007, and IEEE Percom 2009.

Klara Nahrstedt received her Diploma in Mathematics (numerical analysis) from Humboldt University, Berlin, Germany, in 1985. In 1995 she received her PhD from the University of Pennsylvania in the Department of Computer and Information Science. She is an ACM Fellow, an IEEE Fellow, and a Member of the Leopoldina German National Academy of Sciences.

10:00 - 10:30 Coffee Break

10:30 - 12:30 Session 1: Best Papers

Virtues and Limitations of Commodity Hardware Transactional Memory

  • Nuno Diegues (INESC-ID / Instituto Superior Técnico, University of Lisbon)
  • Paolo Romano (INESC-ID / Instituto Superior Técnico, University of Lisbon)
  • Luís Rodrigues (INESC-ID / Instituto Superior Técnico, University of Lisbon)

Cooperative Cache Scrubbing

  • Jennifer B. Sartor (Ghent University)
  • Wim Heirman (Ghent University)
  • Stephen M. Blackburn (Australian National University)
  • Lieven Eeckhout (Ghent University)
  • Kathryn S. McKinley (Microsoft)

KLA: A New Algorithmic Paradigm for Parallel Graph Computations

  • Harshvardhan (Texas A&M University)
  • Adam Fidel (Texas A&M University)
  • Nancy M. Amato (Texas A&M University)
  • Lawrence Rauchwerger (Texas A&M University)

Tiling and Optimizing Time-Iterated Computations over Periodic Domains

  • Uday Bondhugula (Indian Institute of Science)
  • Vinayaka Bandishti (Indian Institute of Science)
  • Albert Cohen (INRIA)
  • Guillain Potron (ENS)
  • Nicolas Vasilache (Reservoir Labs)

12:30 - 14:00 Lunch

14:00 - 16:00 Session 2A: Cache Hierarchies (I)

ATCache: Reducing DRAM-cache Latency via a Small SRAM Tag Cache

  • Cheng-Chieh Huang (University of Edinburgh)
  • Vijay Nagarajan (University of Edinburgh)

SpongeDirectory: Flexible Sparse Directories Utilizing Multi-Level Memristors

  • Lunkai Zhang (ICT, Chinese Academy of Sciences)
  • Dmitri Strukov (Electrical and Computer Engineering, UC Santa Barbara)
  • Hebatallah Saadeldeen (Department of Computer Science, UC Santa Barbara)
  • Dongrui Fan (ICT, Chinese Academy of Sciences)
  • Mingzhe Zhang (ICT, Chinese Academy of Sciences)
  • Diana Franklin (Department of Computer Science, UC Santa Barbara)

EFetch: Optimizing Instruction Fetch for Event-Driven Web Applications

  • Gaurav Chadha (University of Michigan)
  • Scott Mahlke (University of Michigan)
  • Satish Narayanasamy (University of Michigan)

XStream: Cross-core Spatial Streaming based MLC Prefetchers for Parallel Applications in CMPs

  • Biswabandan Panda (IIT Madras, India)
  • Shankar Balachandran (IIT Madras, India)

14:00 - 15:00 Session 2B1: Parallelism Studies

What is the Cost of Weak Determinism?

  • Cedomir Segulja (University of Toronto)
  • Tarek Abdelrahman (University of Toronto)

ILP and TLP in Shared Memory Applications: A Limit Study

  • Ehsan Fatehi (Texas A&M University)
  • Paul V. Gratz (Texas A&M University)

15:00 - 16:00 Session 2B2: Algorithms

Versatile and Scalable Parallel Histogram Construction

  • Wookeun Jung (Seoul National University)
  • Jongsoo Park (Parallel Computing Lab, Intel Corporation)
  • Jaejin Lee (Seoul National University)

Bitwise Data Parallelism in Regular Expression Matching

  • Rob Cameron (Simon Fraser University)
  • Tom Shermer (Simon Fraser University)
  • Arrvindh Shriraman (Simon Fraser University)
  • Ken Herdy (Simon Fraser University)
  • Dan Lin (Simon Fraser University)
  • Ben Hull (Simon Fraser University)
  • Meng Lin (Simon Fraser University)

16:00 - 16:30 Coffee Break

16:30 - 18:00 PACT poster presentations (4 min each)

18:30 - 20:30 PACT poster session & SRC posters at reception

Tuesday, August 26: Sessions

9:00 - 10:30 Session 3A: GPUs (I)

Adaptive heterogeneous scheduling on integrated GPUs

  • Rashid Kaleem (UT-Austin)
  • Raj Barik (Intel Labs)
  • Tatiana Shpeisman (Intel Labs)
  • Brian T. Lewis (Intel Labs)
  • Chunling Hu (Intel Labs)
  • Keshav Pingali (UT-Austin)

Warp-Aware Trace Scheduling for GPUs

  • James Jablin (Brown University)
  • Thomas Jablin (UIUC)
  • Onur Mutlu (CMU)
  • Maurice Herlihy (Brown University)

CAWS: Criticality-Aware Warp Scheduling for GPGPU Workloads

  • Shin-Ying Lee (Arizona State University)
  • Carole-Jean Wu (Arizona State University)

9:00 - 10:30 Session 3B: Transactional Memory

Invyswell: A Hybrid Transactional Memory for Haswell's Restricted Transactional Memory

  • Irina Calciu (Brown University)
  • Justin Gottschlich (Intel Labs)
  • Tatiana Shpeisman (Intel Labs)
  • Gilles Pokam (Intel Labs)
  • Maurice Herlihy (Brown University)

Consolidated Conflict Detection in Hardware Transactional Memory

  • Lihang Zhao (Information Sciences Institute / USC)
  • Jeffrey Draper (Information Sciences Institute / USC)

DeSTM: Harnessing Determinism in STMs for Application Development

  • Kaushik Ravichandran (Georgia Institute of Technology)
  • Ada Gavrilovska (Georgia Institute of Technology)
  • Santosh Pande (Georgia Institute of Technology)

10:30 - 11:00 Coffee Break

11:00 - 12:30 Session 4A: Energy Efficiency

Pattern Aware Scheduling and Power Gating for GPGPUs

  • Qiumin Xu (University of Southern California)
  • Murali Annavaram (University of Southern California)

Heterogeneous Microarchitectures Trump Voltage Scaling for Mobile Cores

  • Andrew Lukefahr (University of Michigan)
  • Shruti Padmanabha (University of Michigan)
  • Reetuparna Das (University of Michigan)
  • Ronald Dreslinski Jr. (University of Michigan)
  • Thomas F. Wenisch (University of Michigan)
  • Scott Mahlke (University of Michigan)

RCS: Runtime Resource and Core Scaling for Power-Constrained Multi-core

  • Hamid Reza Ghasemi (University of Wisconsin-Madison)
  • Nam Sung Kim (University of Wisconsin-Madison / AMD Research)

11:00 - 12:30 Session 4B: Runtime Systems

Realm: An Event-Based Low-Level Runtime for Distributed Memory Architectures

  • Sean Treichler (Stanford University)
  • Michael Bauer (Stanford University)
  • Alex Aiken (Stanford University)

kMAF: Automatic Kernel-Level Management of Thread and Data Affinity

  • Matthias Diener (Federal University of Rio Grande do Sul)
  • Eduardo H. M. Cruz (Federal University of Rio Grande do Sul)
  • Philippe O. A. Navaux (Federal University of Rio Grande do Sul)
  • Anselm Busse (Technische Universität Berlin)
  • Hans-Ulrich Heiss (Technische Universität Berlin)

Shuffling: A Framework for Lock Contention Aware Thread Scheduling for Multicore Multiprocessor Systems

  • Kishore Kumar Pusukuri (University of California, Riverside)
  • Rajiv Gupta (University of California, Riverside)
  • Laxmi N. Bhuyan (University of California, Riverside)

12:30 - 14:00 Lunch

14:00 - 17:00 Excursion to Fort Edmonton

18:00 - 19:00 Tour of Art Gallery of Alberta

19:30 - 22:00 Banquet at Hotel MacDonald (Best Paper Award announced)

Wednesday, August 27: Keynote & Sessions

9:00 - 10:00 Keynote: Bob Blainey

Domain-Specific Models for Innovation in Analytics

Big data is a transformational force for businesses and organizations of every stripe. The ability to rapidly and accurately derive insights from massive amounts of data is becoming a critical competitive differentiator, so it is driving continuous innovation among business analysts, data scientists, and computer engineers. Two of the most important success factors for analytic techniques are the ability to quickly develop and incrementally evolve them to suit changing business needs and the ability to scale these techniques using parallel computing to process huge collections of data. Unfortunately, these goals are often at odds with each other because innovation at the algorithm and data model level requires a combination of domain knowledge and expertise in data analysis, while achieving high scale demands expertise in parallel computing, cloud computing, and even hardware acceleration. In this talk, I will examine various approaches to bridging these two goals, with a focus on domain-specific models which simultaneously improve the agility of analytics development and the achievement of efficient parallel scaling.

Bob Blainey (IBM Canada Software Laboratory)

Bob Blainey is an IBM Fellow and the technical architect of the Hardware Acceleration Laboratory in IBM's Software Group. Bob has been with IBM for over 20 years, with a consistent focus on deep optimization of software for IBM systems. He spent many years working on compiler-based transformations for parallelism and for high performance on systems, at the microprocessor, node and cluster level. More recently, Bob has had a focus on re-imagining the relationship between software and hardware in the post-scaling world, which includes optimization of IBM systems and software in the short term and the creation of new system structures in the longer term using disruptive technologies.

10:00 - 10:30 Coffee Break

10:30 - 11:30 Session 5A1: Compiler Frameworks

OpenTuner: An Extensible Framework for Program Autotuning

  • Jason Ansel (MIT)
  • Shoaib Kamil (MIT)
  • Kalyan Veeramachaneni (MIT)
  • Una-May O'Reilly (MIT)
  • Saman Amarasinghe (MIT)

Velociraptor: A compiler toolkit for numerical programs targeting CPUs and GPUs

  • Rahul Garg (McGill University)
  • Laurie Hendren (McGill University)

11:30 - 12:30 Session 5A2: Scheduling

Memory Scheduling Towards High-Throughput Cooperative Heterogeneous Computing

  • Hao Wang (University of Wisconsin-Madison)
  • Ripudaman Singh (University of Wisconsin-Madison)
  • Michael Schulte (AMD Research)
  • Nam Sung Kim (University of Wisconsin-Madison / AMD Research)

Bounded memory scheduling of dynamic task graphs

  • Dragos Sbirlea (Rice University)
  • Zoran Budimlic (Rice University)
  • Vivek Sarkar (Rice University)

10:30 - 12:30 Session 5B: ACM Student Research Competition (SRC) Presentations

12:30 - 14:00 Lunch (SRC Awards Announced)

14:00 - 15:30 Session 6A: Cache Hierarchies (II)

Trading Cache Hit Rate for Memory Performance

  • Wei Ding (Penn State)
  • Mahmut Kandemir (Penn State)
  • Diana Guttman (Penn State)
  • Adwait Jog (Penn State)
  • Chita R. Das (Penn State)
  • Praveen Yedlapalli (Penn State)

Compiler Support for Selective Page Migration in NUMA Architectures

  • Guilherme Piccoli (UNICAMP)
  • Henrique Nazare Santos (UFMG)
  • Raphael Rodrigues (UFMG)
  • Christiane Pousa (ETH Zurich)
  • Edison Borin (UNICAMP)
  • Fernando Magno Quintao Pereira (UFMG)

COLORIS: A Dynamic Cache Partitioning System Using Page Coloring

  • Ying Ye (Boston University)
  • Richard West (Boston University)
  • Zhuoqun Cheng (Boston University)
  • Ye Li (Boston University)

14:00 - 15:30 Session 6B: Performance Tools and I/O

PEMOGEN: Automatic Adaptive Performance Modeling during Program Runtime

  • Arnamoy Bhattacharyya (ETH Zurich)
  • Torsten Hoefler (ETH Zurich)

ArrayTool: A Lightweight Profiler to Guide Array Regrouping

  • Xu Liu (Rice University)
  • Kamal Sharma (Rice University)
  • John Mellor-Crummey (Rice University)

Design for Scalability in Enterprise SSDs

  • Arash Tavakkol (Sharif University of Technology)
  • Mohammad Arjomand (Sharif University of Technology)
  • Hamid Sarbazi-Azad (Sharif University of Technology and Institute for Research in Fundamental Sciences)

15:30 - 16:00 Coffee Break

16:00 - 17:30 Session 7: GPUs (II)

D2MA: Accelerating Coarse-Grained Data Transfer for GPUs

  • Davoud Anoushe Jamshidi (University of Michigan)
  • Mehrzad Samadi (University of Michigan)
  • Scott Mahlke (University of Michigan)

VAST: The Illusion of a Large Memory Space for GPUs

  • Janghaeng Lee (University of Michigan)
  • Mehrzad Samadi (University of Michigan)
  • Scott Mahlke (University of Michigan)

Automatic Optimization of Thread-Coarsening for Graphics Processors

  • Alberto Magni (University of Edinburgh)
  • Christophe Dubach (University of Edinburgh)
  • Michael O'Boyle (University of Edinburgh)

SM-Centric Transformation: Circumventing Hardware Restrictions for Flexible GPU Scheduling

  • Bo Wu (The College of William and Mary)
  • Guoyang Chen (The College of William and Mary)
  • Dong Li (Oak Ridge National Laboratory)
  • Xipeng Shen (The College of William and Mary)
  • Jeffrey Vetter (Oak Ridge National Laboratory)

Using STT-RAM to Enable Energy-Efficient Near-Threshold Chip Multiprocessors

  • Xiang Pan (The Ohio State University)
  • Radu Teodorescu (The Ohio State University)

Automatic execution of single-GPU computations across multiple GPUs

  • Javier Cabezas (Barcelona Supercomputing Center)
  • Lluís Vilanova (Barcelona Supercomputing Center)
  • Isaac Gelado (NVIDIA Corporation)
  • Thomas B. Jablin (UIUC)
  • Nacho Navarro (UPC)
  • Wen-mei Hwu (UIUC)

SQRL: Hardware Accelerator for Collecting Software Data Structures

  • Snehasish Kumar (Simon Fraser University)
  • Arrvindh Shriraman (Simon Fraser University)
  • Vijayalakshmi Srinivasan (IBM Research)
  • Dan Lin (Simon Fraser University)
  • Jordon Philips (Simon Fraser University)

Rollback-Free Value Prediction with Approximate Memory Loads

  • Bradley Thwaites (Georgia Institute of Technology)
  • Gennady Pekhimenko (Carnegie Mellon University)
  • Amir Yazdanbakhsh (Georgia Institute of Technology)
  • Girish Mururu (Georgia Institute of Technology)
  • Jongse Park (Georgia Institute of Technology)
  • Hadi Esmaeilzadeh (Georgia Institute of Technology)
  • Onur Mutlu (Carnegie Mellon University)
  • Todd C. Mowry (Carnegie Mellon University)

Preemptive Thread Block Scheduling with Online Structural Runtime Prediction for Concurrent GPGPU Kernels

  • Sreepathi Pai (The University of Texas at Austin)
  • R. Govindarajan (Indian Institute of Science)
  • Matthew J. Thazhuthaveetil (Indian Institute of Science)

LCA: A memory Link and Cache-Aware co-scheduling approach for CMPs

  • Alexandros-Herodotos Haritatos (National Technical University of Athens)
  • Georgios Goumas (National Technical University of Athens)
  • Nikos Anastopoulos (National Technical University of Athens)
  • Konstantinos Nikas (National Technical University of Athens)
  • Kornilios Kourtis (Department of Computer Science, ETH, Zurich)
  • Nectarios Koziris (National Technical University of Athens)

GDP: Accurate Dynamic Performance Accounting for Chip Multiprocessor Memory Systems

  • Magnus Jahre (Norwegian University of Science and Technology)

Protection and Utilization in Shared Cache Through Rationing

  • Raj Parihar (University of Rochester)
  • Jacob Brock (University of Rochester)
  • Chen Ding (University of Rochester)
  • Michael Huang (University of Rochester)

Automatic Parallelism through Macro Dataflow in High-level Array Languages

  • Pushkar Ratnalikar (Indiana University - Bloomington)
  • Arun Chauhan (Indiana University - Bloomington)

Specializing Compiler Optimizations Through Programmable Composition For Dense Matrix Computations

  • Qing Yi (University of Colorado at Colorado Springs)
  • Qian Wang (Institute of Software, Chinese Academy of Sciences)
  • Huimin Cui (Institute of Computing, Chinese Academy of Sciences)

Measuring Flexibility in Single-ISA Heterogeneous Processors

  • Erik Tomusk (University of Edinburgh)
  • Christophe Dubach (University of Edinburgh)
  • Michael O'Boyle (University of Edinburgh)

Active Learning Accelerated Automatic Heuristic Construction for Parallel Program Mapping

  • William F Ogilvie (The University of Edinburgh)
  • Pavlos Petoumenos (The University of Edinburgh)
  • Zheng Wang (Lancaster University)
  • Hugh Leather (The University of Edinburgh)

Power Manager with Application Parallelism Awareness for Many-Core Systems

  • Simon Holmbacka (Abo Akademi University)
  • Sebastien Lafond (Abo Akademi University)
  • Johan Lilius (Abo Akademi University)

A Runtime Support Mechanism for Fast Mode Switching of a Self-Morphing Core for Power Efficiency

  • Sudarshan Srinivasan (UMass Amherst)
  • Nithesh Kurella (UMass Amherst)
  • Rance Rodrigues (Nvidia)
  • Sandip Kundu (UMass Amherst)
  • Israel Koren (UMass Amherst)

Automatic Data Layout Framework for Heterogeneous Architectures

  • Deepak Majeti (Rice University)
  • Kuldeep S. Meel (Rice University)
  • Rajkishore Barik (Intel Labs)
  • Vivek Sarkar (Rice University)

Locality of Computation for Stencil Optimization

  • Yulong Luo (Institute of Computing Technology, Chinese Academy of Sciences)
  • Guangming Tan (Institute of Computing Technology, Chinese Academy of Sciences)
  • Ninghui Sun (Institute of Computing Technology, Chinese Academy of Sciences)
