Profile Hidden Markov Model Resources

For Protein Modeling in Bioinformatics


Disclaimer: This is a collection of links and pages I gathered while doing a literature review on HMMs in bioinformatics for CMPUT 606. I do not consider myself to be an expert on the subject, but I thought it would be useful to others to have a collection of starting points listed somewhere.

Profile HMMs are statistical models that capture what the amino acid sequences of a protein family have in common. They are considered more expressive than a standard consensus sequence or a regular expression: profile HMMs allow position-dependent insertion and deletion penalties, as well as the option to use a separate distribution for inserted portions of the amino acid sequence. Once a model is trained on a number of amino acid sequences from a given family or group, it is most commonly used for three purposes:

  1. By aligning sequences to the model, one can construct multiple alignments.
  2. The model itself can offer insight into the characteristics of the family when one examines the structure and probabilities of the trained HMM.
  3. The model can be used to score how well a new protein sequence fits the family motif. For example, one could train a model on a number of proteins in a family, and then match sequences in a database to that model in order to try to find other family members. This technique is also used to infer protein structure and function.
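To make the third use a bit more concrete, here is a minimal Python sketch of scoring sequences against a toy profile HMM with the Viterbi algorithm. Everything in it is an assumption made for illustration, not how any of the programs listed on this page work: the function names (`build_match_emissions`, `viterbi_log_score`), the gap-free training alignment, the uniform insert distribution, and the transition probabilities, which are shared across all positions here even though a real profile HMM learns them per position.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
NEG_INF = float("-inf")


def build_match_emissions(alignment, pseudocount=1.0):
    """Estimate one emission distribution per column of a gap-free alignment.

    Assumption: every column is a match column and all rows have equal length.
    The pseudocount keeps unseen residues from getting probability zero.
    """
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    return [
        {aa: (col.count(aa) + pseudocount) / total for aa in AMINO_ACIDS}
        for col in zip(*alignment)
    ]


def viterbi_log_score(seq, match_emit, trans, insert_emit=None):
    """Log-probability of the best path of `seq` through the toy model.

    trans holds position-independent probabilities for the transitions
    MM, MI, MD, IM, II, DM, DD (no direct insert<->delete moves).
    seq must contain only the 20 standard residues.
    """
    if insert_emit is None:  # assumption: uniform insert distribution
        insert_emit = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    n_cols, n_res = len(match_emit), len(seq)
    log_t = {name: math.log(p) for name, p in trans.items()}

    def table():
        return [[NEG_INF] * (n_res + 1) for _ in range(n_cols + 1)]

    # X[j][i]: best log-prob of being in state X at column j
    # after emitting the first i residues.
    M, I, D = table(), table(), table()
    M[0][0] = 0.0  # the begin state is treated as M_0
    for j in range(n_cols + 1):
        for i in range(n_res + 1):
            if i > 0:  # insert state j emits residue i
                I[j][i] = math.log(insert_emit[seq[i - 1]]) + max(
                    M[j][i - 1] + log_t["MI"],
                    I[j][i - 1] + log_t["II"],
                )
            if j > 0:  # delete state j skips column j silently
                D[j][i] = max(
                    M[j - 1][i] + log_t["MD"],
                    D[j - 1][i] + log_t["DD"],
                )
                if i > 0:  # match state j emits residue i
                    M[j][i] = math.log(match_emit[j - 1][seq[i - 1]]) + max(
                        M[j - 1][i - 1] + log_t["MM"],
                        I[j - 1][i - 1] + log_t["IM"],
                        D[j - 1][i - 1] + log_t["DM"],
                    )
    # Move from the last column into the end state.
    return max(
        M[n_cols][n_res] + log_t["MM"],
        I[n_cols][n_res] + log_t["IM"],
        D[n_cols][n_res] + log_t["DM"],
    )


# Hypothetical four-column "family" and made-up transition probabilities.
family = ["ACDE", "ACDE", "ACDF"]
transitions = {"MM": 0.9, "MI": 0.05, "MD": 0.05,
               "IM": 0.5, "II": 0.5, "DM": 0.5, "DD": 0.5}
model = build_match_emissions(family)
for candidate in ["ACDE", "ADE", "ACDDE", "WWWW"]:
    print(candidate, round(viterbi_log_score(candidate, model, transitions), 2))
```

Raw Viterbi scores like these are only comparable between sequences of similar length. Tools such as HMMER instead report log-odds scores relative to a null (background) model, which this sketch leaves out.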

Some particularly useful links to start with:

  • New Link: Instead of making your own models, you can search with a target amino acid sequence against the database of models at Pfam. References can be found on their site and on Sean Eddy's page.

Listed below are links to several profile programs that make use of HMMs.

  • SAM. Link: "SAM: Sequence Alignment and Modeling System". This page also contains links to many papers on the technologies behind SAM and profile HMMs in general.
  • HMMER. Link: "Sean Eddy Lab Home Page". Links to publications in and around this subject can be found here.
  • HMMpro. Link: "Net-ID, Inc: Bioinformatics and Data Mining". A list of publications can be found here.
  • An online copy of Baldi and Brunak's book, "Bioinformatics: the Machine Learning Approach", is available at NetLibrary, but you must register with them with a U of A IP address before you can access it off campus.
  • GENEWISE. Link: "The Sanger Institute : Informatics Software: Wise2". Still looking for publications.
  • PROBE. Link: "FTP Link". One publication is found here. Originally from Jun Liu's Homepage.
  • META-MEME. Link: "Meta-MEME Source Code, Version 1.0". This page also contains links to instructions for the software and a paper on the technology.
  • PFTOOLS. No link yet. Will post publications as they become available.

Site maintained by: Colin Cherry.
