The Trellis Project
Introduction
The Trellis Project is investigating techniques to create a single metacomputer
from a collection of high-performance systems.
This includes work in scheduling, remote data access, distributed file systems,
security, and a Web-based portal
to metacomputers.
Different user-level overlay metacomputers can be created from
the same constituent systems, on a per-user basis if desired.
The Principal Investigator is
Paul Lu.
The current graduate students are
Mike Closson,
Cam Macdonell,
Paul Nalos,
and
Yang Wang.
The project has benefited greatly from the significant
contributions of
Chris Pinchak,
Jonathan Schaeffer,
Meng Ding,
Mark Goldenberg,
Nicholas Lamb,
George Ma,
Danny Ngo,
Victor Salamon,
Mark Lee,
Morgan Kan,
Nolan Bard,
and others.
We gratefully acknowledge research support from
C3.ca,
Alberta's Ministry of Innovation and Science,
NSERC,
CFI,
and
SGI.
From September 15 to September 17, 2004, the
CISS-3 Experiment
(the Third Canadian Internetworked Scientific Supercomputer)
used the latest Trellis technology to harness over 4,000 processors
from across Canada into a virtual supercomputer or metacomputer.
With a ramp-up and ramp-down,
the actual number of concurrent jobs varied over the 48-hour
period of the experiment,
but at the peak we had more than 4,000 jobs running.
Two applications were used: one studying protein folding with GROMACS and
another studying biological membranes with CHARMM.
CISS-3 also marked the first production use of the Trellis File System
(see below) in both library-based and NFS-based forms.
On November 4, 2002, the
CISS
Experiment
(the Canadian Internetworked Scientific Supercomputer)
used technology from the Trellis Project to harness over 1,300 processors
from across Canada into a virtual supercomputer or metacomputer.
The application was a MOLPRO-based computational chemistry problem from
Dr. Wolfgang Jäger's group (Dept. of Chemistry, U. of Alberta).
Canadian Internetworked Scientific Supercomputer (CISS) FAQ
Overview and Context
Put simply, a metacomputer is a useful aggregation of individual
computer systems.
We have developed techniques to schedule jobs and create file systems across
computers that span different institutions and administrative domains.
The advantages of the Trellis approach are a high level of functionality,
simplicity, and minimal changes to the underlying software infrastructure.
In computing science, the dream of metacomputing has been around for
decades. In various forms (and with important distinctions), it has
also been known as “distributed computing”, “batch scheduling”,
“cycle stealing”, and (most recently)
“grid computing”.
Some well-known, contemporary examples in this area include
SETI@home, Project RC5/distributed.net, Condor, Legion, PUNCH, UNICORE,
and the projects associated with
Globus/Open Grid Services Architecture (OGSA).
In Canada,
Grid Canada
has been exploring Globus technology since September 2001.
SETI@home, Project RC5, and similar projects are based
on single applications (e.g., signal processing, cracking codes);
Trellis is designed to support arbitrary applications.
Currently, we do not use any of the new software that might
be considered part of “grid computing”;
Trellis is exploring alternate approaches to metacomputing.
However, the design of Trellis allows us
to incorporate and/or co-exist with grid technology in the future,
for example through a grid-enabled Secure Shell.
Of course, there are many, many other related projects.
Google the above terms or look on Slashdot.
Some of the projects are cited in our academic papers, listed below.
Trellis NFS and Brief CISS-3 Results
Abstract:
In metacomputing and grid computing, a computational job may execute on
a node that is geographically far away from its data files. In such a
situation, some of the issues to be resolved are: First, how can the
job access its data? Second, how can the high latency and low bandwidth
bottlenecks of typical wide-area networks (WANs) be tolerated? Third,
how can the deployment of distributed file systems be made easier?
The Trellis Network File System (Trellis NFS) uses a simple, global
namespace to provide basic remote data access. Data from any node
accessible by Secure Copy can be opened like a file. Aggressive caching
strategies for file data and metadata can greatly improve performance
across WANs. And, by layering Trellis NFS over local file systems
and by using a bridging strategy (rather than a re-implementation strategy)
between the well-known Network File System (NFS) and wide-area protocols,
deployment is greatly simplified.
As part of the Third Canadian Internetworked Scientific Supercomputer
(CISS-3) experiment, Trellis NFS was used as a distributed file
system between high-performance computing (HPC) sites across Canada.
CISS-3 ramped up over several months, ran in production mode for over
48 hours, and at its peak, had over 4,000 jobs running concurrently.
Typically, there were about 180 concurrent jobs using Trellis NFS.
We discuss the functionality, scalability, and benchmarked performance
of Trellis NFS. Our hands-on experience with CISS and Trellis NFS has
reinforced our design philosophy of layering, overlaying, and bridging
systems to provide new functionality.
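To make the idea of the global namespace concrete, the rough Python sketch below shows one way an scp-style name (user@host:/path) could be fetched over Secure Copy into a local cache and then opened like an ordinary file. It is only an illustration of the layering idea; the cache directory, helper name, and host are assumptions, not part of Trellis NFS itself.

    import os
    import subprocess
    import hashlib

    CACHE_DIR = os.path.expanduser("~/.trellis-cache")   # hypothetical cache location

    def fetch_cached(global_name: str) -> str:
        """Map an scp-style name (user@host:/path) to a locally cached copy."""
        os.makedirs(CACHE_DIR, exist_ok=True)
        local = os.path.join(CACHE_DIR,
                             hashlib.sha1(global_name.encode()).hexdigest())
        if not os.path.exists(local):
            # scp reuses the user's existing SSH keys/agent; no new security
            # infrastructure or privileged setup is needed.
            subprocess.run(["scp", "-q", global_name, local], check=True)
        return local

    # Data on any scp-reachable node can then be opened like a file:
    with open(fetch_cached("user@hpc.example.org:/project/input.dat")) as f:
        first_line = f.readline()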
CISS-1 and CISS-2 Overview and Results
Christopher Pinchak, Paul Lu, Jonathan Schaeffer, and Mark Goldenberg.
The Canadian Internetworked Scientific Supercomputer,
17th International Symposium on High Performance Computing Systems and Applications (HPCS),
Sherbrooke, Quebec, Canada, May 11--14, 2003.
pp. 193--199.
Acrobat PDF (270 kbytes).
Yunjie Xu, Aiko Huckauf, Wolfgang Jäger, Paul Lu, Jonathan Schaeffer, and Christopher Pinchak.
The CISS-1 Experiment: ab initio Study of Chiral Interactions,
39th International Union of Pure and Applied Chemistry (IUPAC) Congress and 86th Conference of The Canadian Society for Chemistry,
Ottawa, Ontario, Canada, August 10--15, 2003.
Refereed poster.
Acrobat PDF (12 kbytes).
Abstract:
On November 4, 2002, a Canada-wide virtual supercomputer gave a
chemistry research team the opportunity to do several years' worth of computing in a single day.
This experiment, called CISS-1 (Canadian Internetworked Scientific Supercomputer),
had three research impacts:
(1)
in partnership with C3.ca, it created a new precedent for cooperation
among Canadian high-performance computing sites,
(2)
it demonstrated the scalability and capabilities of the Trellis system for
wide-area high-performance metacomputing, and
(3)
it produced a new computational chemistry result.
Using roughly 1,376 dedicated processors at 20 facilities,
in 18 administrative domains across the country,
approximately 3.5 CPU-years of computing were completed.
CISS-1 is a prototype for future research projects requiring large-scale computing in Canada.
This is the first step towards making CISS a regular event on the
Canadian research landscape.
Trellis Security Infrastructure (TSI)
Morgan Kan,
Danny Ngo,
Mark Lee,
Paul Lu,
Nolan Bard,
Michael Closson,
Meng Ding,
Mark Goldenberg,
Nicholas Lamb,
Ron Senda,
Edmund Sumbar,
and
Yang Wang.
The Trellis Security Infrastructure: A Layered Approach to Overlay Metacomputers,
18th International Symposium on High Performance Computing Systems and Applications (HPCS),
pp. 109--117,
Winnipeg, Manitoba, Canada, May 16--19, 2004.
Acrobat PDF (82 kbytes).
NOTE:
An expanded version of this paper is in press (2006) for publication
in a special issue of the Journal of Parallel and Distributed
Computing. Please email paullu@cs.ualberta.ca if you wish
to see a pre-print.
Abstract:
Researchers often have
access to a variety of different high-performance computer (HPC)
systems in different
administrative domains, possibly across a wide-area network.
Consequently, the security infrastructure becomes an important component
of an overlay metacomputer: a user-level aggregation of HPC systems.
The Grid Security Infrastructure (GSI)
uses a sophisticated approach
based on proxies and certification authorities.
However, GSI requires
a substantial amount of installation support
and human-negotiated
organization-to-organization security agreements.
In contrast, the Trellis Security Infrastructure (TSI)
is layered on top of
the widely-deployed Secure Shell (SSH)
and systems administrators only need to provide
unprivileged accounts to the users.
The contribution of the
TSI approach is in demonstrating that
a single sign-on (SSO) system can be implemented without
requiring a new security infrastructure.
We describe the design of the TSI and provide a tutorial
of some of the tools created to make the TSI easier to use.
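As a rough illustration of the layering idea (not the actual TSI tools), the Python sketch below runs the same command on several unprivileged accounts over plain SSH; once a key is loaded into ssh-agent, no further passwords are needed, which is the essence of single sign-on on top of SSH. The host and account names are hypothetical.

    import subprocess

    SITES = ["user1@hpc1.example.ca", "user2@hpc2.example.ca"]   # hypothetical accounts

    def run_everywhere(command: str) -> None:
        """Run one command on each site over plain SSH, reusing the agent's key."""
        for site in SITES:
            result = subprocess.run(["ssh", "-o", "BatchMode=yes", site, command],
                                    capture_output=True, text=True, check=True)
            print(site, ":", result.stdout.strip())

    # For example, check the batch queues at every site with a single sign-on:
    # run_everywhere("qstat -u $USER")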
Placeholder Scheduling
Christopher Pinchak, Paul Lu, and Mark Goldenberg.
Practical Heterogeneous Placeholder Scheduling in Overlay Metacomputers: Early Experiences,
8th Workshop on Job Scheduling Strategies for Parallel Processing,
Edinburgh, Scotland, U.K., July 24, 2002,
pp. 85--105.
Also published as Springer-Verlag LNCS 2537 (2003), pages 205--228.
Gzipped postscript (88 kbytes).
BibTeX entry here.
Also available in
Acrobat PDF
(299 kbytes).
Abstract:
A practical problem faced by users of high-performance computers is:
How can I automatically load balance my jobs across different batch queues,
which are in different administrative domains, if there is no existing grid
infrastructure?
It is common to have user accounts for a number of individual high-performance
systems (e.g., departmental, university, regional) that are administered by
different groups. Without an administration-deployed grid infrastructure,
one can still create a purely user-level aggregation of individual computing
systems.
The Trellis Project is developing the techniques and tools to take advantage
of a user-level overlay metacomputer.
Because placeholder scheduling does not require superuser permissions to set
up or configure, it is well-suited to overlay metacomputers.
This paper contributes to the practical side of grid and metacomputing by
empirically demonstrating that placeholder scheduling can work across different
administrative domains, across different local schedulers
(i.e., PBS and Sun Grid Engine), and across different programming models
(i.e., Pthreads, MPI, and sequential).
We also describe a new metaqueue system to manage jobs with explicit
workflow dependencies.
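The following Python sketch illustrates the placeholder idea under stated assumptions: it is the kind of script a placeholder job might run once a local batch scheduler starts it, pulling the next actual command from a central metaqueue at the last moment (late binding). The metaqueue host and its next-job helper are hypothetical, not the actual Trellis tools.

    import subprocess
    import sys

    METAQUEUE = "user@metaqueue.example.ca"   # hypothetical central metaqueue host

    def pull_next_command() -> str:
        """Ask the central metaqueue for the next unit of work (late binding)."""
        result = subprocess.run(["ssh", METAQUEUE, "next-job"],
                                capture_output=True, text=True, check=True)
        return result.stdout.strip()

    command = pull_next_command()
    if not command:
        sys.exit(0)                  # nothing left to do; the placeholder exits
    subprocess.run(command, shell=True, check=True)
    # A real placeholder would normally resubmit itself to the local batch queue
    # here, so that more work can be pulled the next time it is scheduled.

Because the placeholder is an ordinary batch job running under an ordinary user account, no superuser permissions are involved, which matches the user-level approach described above.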
Workflow and DAG Scheduling
Mark Goldenberg, Paul Lu, and Jonathan Schaeffer.
TrellisDAG: A System for Structured DAG Scheduling,
9th Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP),
Seattle, Washington, U.S.A., June 24, 2003.
To appear as a volume of Springer-Verlag's LNCS.
Draft LNCS version:
Acrobat PDF (316 kbytes).
Abstract:
High-performance computing often involves sets of jobs or workloads
that must be scheduled. If there are dependencies in the ordering of
the jobs (e.g., pipelines or directed acyclic graphs),
the user often has to carefully and manually
submit the jobs in the right order and/or delay submitting dependent
jobs until other jobs have finished.
If the user can submit the entire workload with
dependencies,
then the scheduler has more information about future jobs in the workflow.
We have designed and implemented TrellisDAG, a system that
combines the use of
placeholder scheduling
and a subsystem for describing workflows to
provide novel mechanisms for computing non-trivial
workloads with inter-job dependencies.
TrellisDAG also has a modular architecture for implementing
different scheduling policies, which will be
the topic of future work. Currently, TrellisDAG supports:
(1) A spectrum of mechanisms for users to specify both simple and
complicated workflows.
(2) The ability to load balance across multiple administrative domains.
(3) A convenient tool to monitor complicated workflows.
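A minimal Python sketch of the underlying idea (not the TrellisDAG input language): a workflow is a set of jobs with explicit dependencies, and only jobs whose dependencies have completed are released to the scheduler. The job names and commands are hypothetical.

    # Each job maps to (command, list of jobs it depends on).
    workflow = {
        "prepare":  ("./prepare.sh",       []),
        "simulate": ("./simulate.sh run1", ["prepare"]),
        "analyze":  ("./analyze.sh run1",  ["simulate"]),
        "report":   ("./report.sh",        ["analyze"]),
    }

    def ready_jobs(done):
        """Jobs not yet run whose dependencies have all completed."""
        return [name for name, (_, deps) in workflow.items()
                if name not in done and all(d in done for d in deps)]

    done = set()
    while len(done) < len(workflow):
        for name in ready_jobs(done):
            command, _ = workflow[name]
            print("releasing", name, ":", command)  # hand off to the scheduler
            done.add(name)   # a real system would wait for the job to finish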
Trellis Driver for interfacing with Java, using a bioinformatics case study
Nicholas Lamb,
Paul Lu,
and
Alona Fyshe.
Trellis Driver:
Distributing a Java Workflow Across a Network of Workstations,
6th International Workshop on High Performance Scientific and Engineering Computing with Applications (HPSEC-04)
held with the
33rd International Conference on Parallel Processing (ICPP-04),
Montreal, Quebec, Canada, August 15--18, 2004,
pp. 198--205.
Acrobat PDF (87 kbytes).
Abstract:
Some applications in science and engineering consist of a main job
that invokes, or drives, other jobs. For example, a server process
may receive a request, then invoke a workflow of stand-alone scripts or
executables to handle the request, and then generate the final response.
Java's Runtime.exec() function allows jobs to be invoked from within a
master Java program. However, these jobs are usually restricted to the
same machine. If the number of jobs in the workflow is large, then it
can be desirable to load balance the workload across different servers
to maximize throughput.
We describe the design and implementation of the Trellis Driver, a
newly developed Java module that runs jobs using TrellisDriver.exec()
and allows jobs to be scheduled across clusters and metacomputers (i.e.,
aggregations of servers). Using a Java-based bioinformatics application
as a case study, we evaluate the performance improvement Trellis Driver
offers through workflow parallelism.
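As a conceptual analogue only (the real Trellis Driver is a Java module built around TrellisDriver.exec()), the Python sketch below farms a workflow of independent commands out to a pool of workstations over SSH and overlaps them to improve throughput. The host names and job script are hypothetical.

    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    HOSTS = ["ws1.example.ca", "ws2.example.ca", "ws3.example.ca"]  # hypothetical pool

    def remote_exec(task):
        """Run one stand-alone command on a workstation chosen round-robin."""
        index, command = task
        host = HOSTS[index % len(HOSTS)]
        result = subprocess.run(["ssh", host, command],
                                capture_output=True, text=True, check=True)
        return result.stdout

    # A workflow of independent sub-jobs (e.g., one analysis per input file)
    # is overlapped across the pool instead of running serially on one machine:
    commands = ["./job.sh input%d" % i for i in range(8)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        outputs = list(pool.map(remote_exec, enumerate(commands)))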
Trellis-SDP for simple data-parallel applications
Meng Ding
and
Paul Lu.
Trellis-SDP: A Simple Data-Parallel Programming Interface,
3rd Workshop on Compile and Runtime Techniques for Parallel Computing (CRTPC)
held with the
33rd International Conference on Parallel Processing (ICPP-04),
Montreal, Quebec, Canada, August 15--18, 2004,
pp. 498--505.
Acrobat PDF (129 kbytes).
Abstract:
Some datasets and computing environments are inherently distributed. For
example, image data may be gathered and stored at different locations.
Although data parallelism is a well-known computational model, there
are few programming systems that are both easy to program (for simple
applications) and able to work across administrative domains.
We have designed and implemented a simple programming system, called
Trellis-SDP, that facilitates the rapid development of data-intensive
applications. Trellis-SDP is layered on top of the Trellis infrastructure,
a software system for creating overlay metacomputers: user-level
aggregations of computer systems. Trellis-SDP provides a master-worker
programming framework where the worker components can run self-contained,
new or existing binary applications. We describe two interface
functions, namely trellis_scan() and trellis_gather(),
and show how easy it is to get reasonable performance with simple
data-parallel applications, such as Content Based Image Retrieval (CBIR)
and Parallel Sorting by Regular Sampling (PSRS).
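The Python sketch below illustrates only the master-worker scan/gather pattern; the actual trellis_scan() and trellis_gather() interface is not reproduced here, and the signatures used are assumptions. The idea is that "scan" hands each worker a slice of the input and "gather" merges the partial results at the master.

    from concurrent.futures import ProcessPoolExecutor

    def scan(data, worker, nworkers=4):
        """Split the input into slices and apply `worker` to each in parallel."""
        step = (len(data) + nworkers - 1) // nworkers
        slices = [data[i:i + step] for i in range(0, len(data), step)]
        with ProcessPoolExecutor(max_workers=nworkers) as pool:
            return list(pool.map(worker, slices))

    def gather(partials):
        """Merge the partial results from all workers (here, by concatenation)."""
        return [item for part in partials for item in part]

    def worker(chunk):
        # Stand-in for a self-contained binary or script run by each worker.
        return sorted(chunk)

    if __name__ == "__main__":
        partials = scan(list(range(100, 0, -1)), worker)
        result = sorted(gather(partials))   # master combines the partial results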
TrellisWeb Portal (formerly PBSWeb)
George Ma, Victor Salamon, and Paul Lu.
Security and History Management Improvements to PBSWeb,
15th International Symposium on High Performance Computing Systems and Applications (HPCS),
Windsor, Ontario, Canada, June 18--20, 2001.
To be published by Kluwer Academic Publishers.
Final version not yet available. Contact paullu@cs.ualberta.ca for a draft version.
Abstract:
The resource managers (e.g., batch queue schedulers) used at many parallel
and distributed computing centers can be complicated systems for the average
user.
A large number of command-line options, environment variables, and
site-specific configuration parameters can be overwhelming.
Therefore, we have developed a simple Web-based interface, called PBSWeb,
to the Portable Batch System (PBS), which is our local resource manager
system.
By using a Web browser and server software infrastructure, PBSWeb supports
both local and remote users, maintains a history of past job parameters,
and hides much of the complexity of the underlying scheduler.
The architecture and implementation techniques used in PBSWeb can be
applied to other resource managers.
Since our first description of the PBSWeb project, we have completely
re-implemented the system to address three important deficiencies:
-
Instead of using the filesystem to manage job histories,
we are now using the PostgreSQL relational DBMS to improve
history queries and reliability.
-
Instead of clear-text socket connections, we are using the Apache
Web server with the Secure Sockets Layer (SSL) to improve the security of
communications between browser and server.
-
Instead of using Perl and raw HTML documents, we are using the PHP
server-side scripting language to access the PostgreSQL database,
to manage user sessions, and to create dynamic Web pages.
We describe the security and history management benefits of the new PBSWeb.