The University of Alberta GAMES Group

GAMES
Game-playing, Analytical methods, Minimax search and Empirical Studies

Home

Biological Sequence Alignment

Sequence alignment is a crucial operation in bioinformatics, and genetics research. We have applied our knowledge of single agent search to the sequence alignment problem in biology. In sequence alignment, two or more strings are aligned together in order to get the highest number of matching characters. Gaps may be inserted into a string in order to shift the remaining characters into better matches. Typically a scoring function is used to rank different alignments so that biologically plausible alignments score higher. The task of optimal sequence alignment is to find the best possible alignment for a given scoring function and set of sequences. Sequence alignments are used to probe sequence databases for similar sequences, for genetic disease research, construction of phylogenetic trees, comparing functions between similar genes, and more.

         ----GTGAC-TGCGAT--AAG-----CTT--AGATCC---TCTT-AAAAT
             ||||| ||||||  |||     |||  ||||||   |||| ||
         GAGGGAGACATGCGATACAAGGGATCCTTGTAGATCTGCGTCTTTAA---

FastLSA (Fast Linear Space Alignment)

A clever generalization of Hirschberg's Divide and Conquer algorithm for sequence alignment allows a tradeoff between the recomputation (affecting speed) and memory usage.

Kevin Charter, Jonathan Schaeffer and Duane Szafron. Sequence Alignment using FastLSA, International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences (METMBS'2000), pp.239-245, 2000.

A* variants for Multiple Sequence Alignment

Matthew McNaughton is working on the very difficult problem of multiple sequence alignment. Several sequences must be aligned to find the global alignment between all sequences. The size of the search space is explosive with the number of sequences. There are several interesting variants of A* search that can be used to attack this problem.

LBD-ALign (Linear Bounded Diagonal Alignment)

Aaron Davidson has done some work on a bounded dynamic programming algorithm for optimal pairwise sequence alignment. It builds on the Needleman-Wunch dynamic programming algorithm, but adds pruning of the DP-matrix. This approach marries the benefits of A*'s pruning with dynamic programming's low overhead. Linear space versions are discussed as well.

Aaron Davidson. A Fast Pruning Algorithm for Optimal Sequence Alignment. Proceedings of The 2nd IEEE International Symposium on Bioinformatics & Bioengineering (BIBE'2001), November 2001.

Home