Single Nucleotide Polymorphism (SNP)

Supplementary materials for "Most parsimonious haplotype allele sharing determination" (BMC Bioinformatics):

  1. iLinker (a.k.a. SNPLink) executables;
  2. xPedPhase code calling PedPhase (readme);
  3. 500 10K genotype datasets with missing rate 0%, 0.5%, 1.0%, 1.5%, 2.0%, 2.5%, 3.0%, respectively;
  4. 500 50K genotype datasets with missing rate 0%, 0.5%, 1.0%, 1.5%, 2.0%, 2.5%, 3.0%, respectively.


  1. Z. Cai, H. Sabaa, Y. Wang, R. Goebel, Z. Wang, J. Xu, P. Stothard, and ——.
    Most parsimonious haplotype allele sharing determination. (Open Access)
    BMC Bioinformatics. 10(2009): 115.

  2. ——, Z. Wang, L. Wang, Y.-L. Lau, and W. Yang.
    Identification of linked regions using high-density SNP genotype data for linkage analyses. (Open Access)
    Bioinformatics. 24(1)(2008), 86-93.


  1. Using PedPhase for haplotype allele sharing determination - done/submitted
  2. Modifying PedPhase for one whole genome scan - done
  3. Implement a linear time zero-recombination haplotyping algorithm
  4. Improve iLinker for all tree pedigrees
  5. Improve PedPhase for all tree pedigrees - done
  6. Haplotype based quantitative association study
  7. Dairy dataset
  8. Beef dataset (not ready yet)
  9. Combining pedigree and population datasets
  10. Crossover hotspot/pattern identification

Background: The genome of a species consists of chromosomes, each a double-stranded DNA molecule. The genomes of different individuals are almost identical; human genomes, for example, share around 99.9% similarity. Yet at roughly one locus in a thousand, individuals carry different nucleotides, and these differences contribute to the variety of morphological features. Such loci, or sites, are referred to as Single Nucleotide Polymorphisms (SNPs).

In general, a SNP site can be located by its flanking (conserved) DNA segments, so its value (or state, or allele) can be determined with modern microarray technology, namely SNP chips.

In diploid organisms, chromosomes come in pairs. The status of the two alleles at a particular SNP locus on a pair of chromosomes is called a marker genotype. The genotype at a bi-allelic SNP locus can be denoted as an unordered pair of values from {A, B}. If the two alleles are the same, that is, AA or BB, the genotype is homozygous. Otherwise, it is heterozygous, i.e., AB. A haplotype consists of all alleles, one from each locus, that are on the same chromosome.
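The definitions above can be illustrated with a minimal Python sketch. The function names and the encoding (a haplotype as a string over "A" and "B", a genotype as a sorted pair of alleles) are assumptions made for this example only, not part of any of the tools listed on this page.

```python
from typing import List, Tuple

# A genotype is an unordered pair of alleles; it is stored here as a sorted tuple.
Genotype = Tuple[str, ...]


def genotypes_from_haplotypes(hap1: str, hap2: str) -> List[Genotype]:
    """Combine two haplotypes (one per chromosome) into per-locus genotypes."""
    if len(hap1) != len(hap2):
        raise ValueError("haplotypes must cover the same loci")
    # Sorting discards the chromosome of origin, which is exactly the
    # information that haplotyping later tries to recover.
    return [tuple(sorted(pair)) for pair in zip(hap1, hap2)]


def is_homozygous(genotype: Genotype) -> bool:
    """A genotype is homozygous when both alleles are the same (AA or BB)."""
    return genotype[0] == genotype[1]


def heterozygous_sites(genotypes: List[Genotype]) -> List[int]:
    """Return the 0-based indices of the heterozygous (AB) loci."""
    return [i for i, g in enumerate(genotypes) if not is_homozygous(g)]


if __name__ == "__main__":
    genotypes = genotypes_from_haplotypes("ABBA", "AABA")
    print(genotypes)
    print(heterozygous_sites(genotypes))
```

Going from haplotypes to genotypes loses information: the pairs "ABBA"/"AABA" and "AABA"/"ABBA" produce the same genotype list.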

  • Haplotyping:
    At a single SNP site, haplotyping determines which of the two alleles is paternal (the other is then maternal). At a homozygous site there is nothing to resolve, since both alleles are the same. Given a (population or pedigree) genotype dataset, haplotyping infers the paternal and maternal alleles of each individual.

  • Haplotype-Based Association Studies:
    Association studies fall into three categories by disease outcome: case-control, quantitative (continuous), and categorical. Each can involve either a single SNP or multiple SNPs; the single-SNP case is considered much easier than the general multi-SNP case. Most association study methods that deal with multiple SNPs are regression based, and they become either ineffective (undesirable results) or inefficient (exceptionally long computational time) as the number of SNPs increases [Bal06]. SNP tagging has been proposed to reduce the number of SNPs to a minimum while retaining as much as possible of the genetic variation of the full SNP set. In practice, however, tagging is only effective in capturing common variants.

    A popular strategy, suggested by the block-like structure of the human genome [AKS02,GSN02], is to use haplotypes to try to capture the correlation structure of SNPs in regions of little recombination [SRT02,TDW03,LZ06]. This approach can lead to analyses with significantly reduced degrees of freedom and, more importantly, haplotypes are able to capture the combined effects of tightly linked causal variants.
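To make the single-site haplotyping task described above concrete, here is a minimal Python sketch that phases one child in a father-mother-child trio using only the Mendelian law of inheritance. It is an illustration under stated assumptions (two-character genotype strings such as "AB", no missing data, no genotyping errors, one site considered in isolation); the function name `phase_child_site` is hypothetical, and this is not the method used by PedPhase or iLinker, which work on whole pedigrees and many loci at once.

```python
from typing import Optional, Tuple


def phase_child_site(father: str, mother: str, child: str) -> Optional[Tuple[str, str]]:
    """Phase one SNP site of a child from the trio's genotypes.

    Returns (paternal_allele, maternal_allele) when the assignment is
    unique, or None when both assignments are consistent with the parents.
    Raises ValueError when no assignment obeys Mendelian inheritance.
    """
    candidates = set()
    # Try both ways of splitting the child's alleles between the parents.
    for paternal, maternal in ((child[0], child[1]), (child[1], child[0])):
        if paternal in father and maternal in mother:
            candidates.add((paternal, maternal))
    if not candidates:
        raise ValueError("genotypes violate Mendelian inheritance")
    if len(candidates) == 1:
        return candidates.pop()
    return None  # ambiguous, e.g. all three individuals are AB


if __name__ == "__main__":
    print(phase_child_site("AA", "AB", "AB"))  # father can only pass A
    print(phase_child_site("AB", "AB", "AB"))  # cannot be resolved at this site
```

Sites that stay ambiguous in isolation are where multi-locus assumptions, such as zero or minimum recombination across the pedigree, become necessary.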

Been Used in:


  • General:
    1. The International HapMap Consortium.
      International HapMap Project.
      Nature. 426(2003), 789-796.
  • Zero-Recombination Haplotyping:
    1. O(m^3 n^3): J. Li and T. Jiang.
      Efficient Rule-Based Haplotyping Algorithm for Pedigree Data.
      In Proceedings of the 7th Annual Conference on Research in Computational Molecular Biology (RECOMB'03). Pages 197-206, 2003.
    2. J. Li and T. Jiang.
      Computing the Minimum Recombinant Haplotype Configuration from Incomplete Genotype Data on a Pedigree by Integer Linear Programming.
      Journal of Computational Biology. 12(2005), 719-739.
    3. O(m n^2 + n^3 log^2 n log log n); O(m n + n^3) loop-free particular; O(m n^2 + n^3) loop-free general: J. Xiao, L. Liu, L. Xia, and T. Jiang.
      Fast Elimination of Redundant Linear Equations and Reconstruction of Recombination-Free Mendelian Inheritance on a Pedigree.
      In Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'07). Pages 655-664, 2007.
    4. O(m n) loop-free particular: M. Y. Chan, W. Chan, F. Chin, S. Fung, and M. Kao.
      Linear-Time Haplotype Inference on Pedigrees without Recombinations.
      In Proceedings of the 6th Annual Workshop on Algorithms in Bioinformatics (WABI'06). Pages 56-67, 2006.
    5. O(m n) loop-free particular; O(m n^2) loop-free general: L. Liu and T. Jiang.
      A Linear-Time Algorithm for Reconstructing Zero-Recombinant Haplotype Configuration on Pedigrees without Mating Loops.
      Journal of Combinatorial Optimization. Submitted, 2008.
  • Pedigree Haplotyping:
  • Population Haplotyping:

  Last modified: April 21 2009 19:07:53  © Guohui Lin