2006 Algorithmica

A. Driga, P. Lu, J. Schaeffer, D. Szafron, K. Charter and I. Parsons. FastLSA: A Fast, Linear-Space, Parallel and Sequential Algorithm for Sequence Alignment, Algorithmica, 2006, 337-375. abstract or pdf.


Abstract:

Sequence alignment is a fundamental operation for homology search in bioinformatics. For two DNA or protein sequences of length m and n, full-matrix (FM), dynamic programming alignment algorithms such as Needleman-Wunsch and Smith-Waterman take O(m * n) time and use a possibly prohibitive O(m * n) space. Hirschberg’s algorithm reduces the space requirements to O(min(m, n)), but requires approximately twice the number of operations required by the FM algorithms. The Fast Linear Space Alignment (FastLSA) algorithm adapts to the amount of space available by trading space for operations. FastLSA can effectively adapt to use either linear or quadratic space, depending on the specific machine. Our experiments show that, in practice, due to memory caching effects, FastLSA is always as fast or faster than Hirschberg and the FM algorithms. To further improve the performance of FastLSA, we have parallelized it using a simple but effective form of wavefront parallelism. Our experimental results show that Parallel FastLSA exhibits good speedups, almost linear for 8 processors or less, and also that the efficiency of Parallel FastLSA increases with the size of the sequences that are aligned. Consequently, parallel and sequential FastLSA can be flexibly and effectively used with high performance in situations where space and the number of parallel processors can vary greatly.