Sequence alignment is a crucial operation in bioinformatics, and
genetics research. We have applied our knowledge of single agent
search to the sequence alignment problem in biology.
In sequence alignment, two or more strings are aligned together
in order to get the highest number of matching characters. Gaps may
be inserted into a string in order to shift the remaining characters
into better matches. Typically a scoring function is used
to rank different alignments so that biologically plausible alignments
score higher. The task of optimal sequence alignment is to find the best
possible alignment for a given scoring function and set of sequences.
Sequence alignments are used to probe sequence databases for similar
sequences, for genetic disease research, construction of phylogenetic
trees, comparing functions between similar genes, and more.
A clever generalization of Hirschberg's Divide and Conquer
algorithm for sequence alignment allows a tradeoff between
the recomputation (affecting speed) and memory usage.
Kevin Charter, Jonathan Schaeffer and Duane Szafron.
Sequence Alignment using FastLSA,
International Conference on Mathematics and
Engineering Techniques in Medicine and Biological Sciences (METMBS'2000),
pp.239-245, 2000.
A* variants for Multiple Sequence Alignment
Matthew McNaughton is working on the very difficult problem of
multiple sequence alignment. Several sequences must be
aligned to find the global alignment between all sequences. The
size of the search space is explosive with the number of
sequences. There are several interesting variants of A* search
that can be used to attack this problem.
LBD-ALign (Linear Bounded Diagonal Alignment)
Aaron Davidson
has done some work on a bounded dynamic programming algorithm
for optimal pairwise sequence alignment. It builds on the
Needleman-Wunch dynamic programming algorithm, but adds pruning
of the DP-matrix. This approach marries the benefits of A*'s
pruning with dynamic programming's low overhead. Linear space
versions are discussed as well.