SNPLink: SNP for Linkage Analysis


  • SNPLink deterministically identifies the chromosomal regions shared by family members, using high-density SNP genotype data. The current version (v2.1 and earlier) handles only "tree pedigrees", with either one founder or a couple of founders. Given the disease status of the members, SNPLink unambiguously locates their shared chromosomal region(s), where the responsible gene(s) might reside.
    Extension to more general pedigrees is under way.


  1. Identification of Linked Regions Using High Density SNP Genotype Data for Linkage Analysis.
    G.-H. Lin, Z. Wang, L. Wang, Y.-L. Lau, and W. Yang.
    Submitted for publication. July 2, 2007.

Release Notes

  1. October 6, 2007:
    SNPLink version 2.1 released: for Windows, for Linux.
  2. July 3, 2007:
    SNPLink version 2.0 released.

Required Input Files [Pedigree File, Genotype File, Locus File]

  1. Pedigree file:
    This file describes the pedigree structure. The first line contains a single number, which is the total number of family members in the pedigree. Each of the remaining rows contains 5 columns: person, father, mother, sex (male, female), and disease status (affected, unaffected). Every such row specifies the parents of that particular person, where a '-1' indicates 'unknown/missing'. Pedigree1.txt describes the sample pedigree used in the above submission.
  2. Genotype file:
    This file contains columns of genotypes, exactly one for each family member. Therefore, the number of columns is equal to the number of family members. The first row starts with a '>' sign and contains the individuals' names, as used in the pedigree file. In the succeeding rows, every column entry contains two characters from 'A', 'B', and '?', where a '?' indicates a missing value. 'A' and 'B' can be replaced by other pairs of letters used for describing genotype values. Here is the sample Genotype File used in the submission for producing figures.
  3. Physical locus file:
    This file contains the physical loci of the SNP markers, one per row, so its number of rows should be equal to the number of rows in the genotype file minus 1 (the header row). Correspondingly, here is the sample Locus File associated with the sample genotype file.
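
    The consistency rules stated for the three input files can be checked before running SNPLink. The following Python sketch is not part of SNPLink; it assumes that all three files are whitespace-delimited, that the names follow the '>' sign on the same line, and that the locus file holds one locus per non-empty line. The function names are hypothetical. The sex and disease status columns are treated as opaque text, since their exact encoding is not specified here.

```python
from typing import List, Tuple


def parse_pedigree(text: str) -> List[Tuple[str, ...]]:
    """Parse a pedigree file: a member count, then one 5-column row per member."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    count = int(lines[0][0])
    rows = [tuple(fields) for fields in lines[1:]]
    if len(rows) != count:
        raise ValueError(f"expected {count} members, found {len(rows)}")
    for row in rows:
        # Columns: person, father, mother, sex, disease status.
        if len(row) != 5:
            raise ValueError(f"pedigree row does not have 5 columns: {row}")
    return rows


def parse_genotypes(text: str) -> Tuple[List[str], List[List[str]]]:
    """Parse a genotype file: a '>' header of names, then one row per marker."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines[0].startswith(">"):
        raise ValueError("genotype header must start with '>'")
    names = lines[0][1:].split()
    markers = [ln.split() for ln in lines[1:]]
    for row in markers:
        if len(row) != len(names):
            raise ValueError("genotype row width differs from the header")
        # Allele letters may vary, so only the two-character length is checked.
        if any(len(entry) != 2 for entry in row):
            raise ValueError(f"genotype entry is not two characters: {row}")
    return names, markers


def check_inputs(pedigree_text: str, genotype_text: str, locus_text: str) -> int:
    """Cross-check the three inputs and return the number of SNP markers."""
    members = parse_pedigree(pedigree_text)
    names, markers = parse_genotypes(genotype_text)
    if len(names) != len(members):
        raise ValueError("genotype columns must equal the number of members")
    known = {row[0] for row in members}
    unknown = [name for name in names if name not in known]
    if unknown:
        raise ValueError(f"names missing from the pedigree file: {unknown}")
    loci = [ln for ln in locus_text.splitlines() if ln.strip()]
    if len(loci) != len(markers):
        raise ValueError("locus rows must equal genotype rows minus 1")
    return len(markers)
```

    Calling check_inputs on the contents of the three files raises a ValueError at the first rule that fails, and otherwise returns the marker count.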

Command Line Options

  • SNPLink parses commands through options, and therefore the order of input files is irrelevant. The basic command line looks like:

    SNPLink [-b | -c] -p pedigree1.txt -g genotype1.txt -l locus1.txt

    Without '-b' or '-c', SNPLink determines the haplotype allele sharing status among the family members as is, without any adjustment.

  • In some cases, a particular site may show a crossover breakpoint in too many family members at once, which is unlikely.

    With option '-b', SNPLink tries to push such breakpoints to the grandparental haplotypes and re-assigns the haplotypes for the concerned family members.

  • In some other cases, the genotype data might contain obvious errors, resulting in two breakpoints in an individual that are close (less than 1 Mbp apart) but unlikely (fewer than 3 supporting markers).

    With option '-c', SNPLink corrects these errors by revising the haplotype phase assignment, but without changing the genotype data.

  • SNPLink always generates 'output.txt' containing all the detailed intermediate output (see below).
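
    For scripted runs, the command line above can be assembled programmatically. The following Python sketch is not part of SNPLink; the function name build_snplink_command and its parameters are hypothetical. It reads the '[-b | -c]' synopsis as "at most one of the two options", which is an assumption, and it takes the executable to be named SNPLink and to be on the search path.

```python
from typing import List, Optional


def build_snplink_command(
    pedigree: str,
    genotype: str,
    locus: str,
    mode: Optional[str] = None,
    executable: str = "SNPLink",
) -> List[str]:
    """Return the argument list for one SNPLink run.

    mode is None for the basic run, "b" for '-b', or "c" for '-c'.
    Assumption: '-b' and '-c' are not combined in a single run.
    """
    if mode not in (None, "b", "c"):
        raise ValueError("mode must be None, 'b', or 'c'")
    argv = [executable]
    if mode is not None:
        argv.append("-" + mode)
    # Options are parsed by flag, so the order of the input files is irrelevant.
    argv += ["-p", pedigree, "-g", genotype, "-l", locus]
    return argv


if __name__ == "__main__":
    command = build_snplink_command(
        "pedigree1.txt", "genotype1.txt", "locus1.txt", mode="c"
    )
    print(" ".join(command))
```

    The returned list can be passed to subprocess.run(command, check=True) from the Python standard library to start the program.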

Output Files

  • Paternal/Maternal Allele Sharing:
    We have not yet constructed a user-friendly GUI for SNPLink. The program writes the allele sharing status to text files, together with the corresponding gnuplot scripts for plotting the sharing. For example, using 'paternal_allele_sharing.gnu' and 'paternal_allele_sharing.txt', the user will be able to generate 'paternal_allele_sharing.eps', which displays the paternal allele sharing among the involved members.

    If you run SNPLink on Linux, you will get the .eps files when the program terminates. Otherwise, you may run the following command to generate the .eps files separately:

    gnuplot paternal_allele_sharing.gnu

  • Third Generation Allele Sharing:
    Using Mendelian inheritance rules, a portion of grandparental allele sharing among the third generation members can be unambiguously deduced. SNPLink also generates 'grand_allele_sharing.txt' which describes such sharing returned from the program.
  • All Intermediate Output:
    For users interested in seeing all the intermediate output, 'output.txt' serves this purpose. Note that SNPLink uses '0/1' to represent the 'paternal/maternal' haplotype.
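
    On platforms where the .eps files are not produced automatically, the gnuplot step can be repeated for every generated script. The following Python sketch is not part of SNPLink; the function names are hypothetical. It assumes that gnuplot is installed and on the search path, that the scripts carry the '.gnu' extension as in the example above, and that each script refers to its '.txt' data file by a relative name, which is why gnuplot is run from inside the output directory.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def gnuplot_commands(directory: str = ".") -> List[List[str]]:
    """Return one 'gnuplot <script>' command per .gnu file in the directory."""
    scripts = sorted(Path(directory).glob("*.gnu"))
    return [["gnuplot", script.name] for script in scripts]


def render_all(directory: str = ".") -> None:
    """Run gnuplot on every .gnu script found in the directory."""
    if shutil.which("gnuplot") is None:
        raise RuntimeError("gnuplot was not found on the search path")
    for command in gnuplot_commands(directory):
        # Run inside the directory so relative data file names resolve.
        subprocess.run(command, cwd=directory, check=True)
```

    Calling render_all() in the directory where SNPLink wrote its output runs gnuplot once per script found there.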


The development of SNPLink is a joint effort.
GL was partially supported by CFI and NSERC; LW was supported by RGC of HKSAR; YLL and WY were partially supported by The Shun Tak District Min Yuen Tong of Hong Kong.


Please email us with any additional questions you might have.

  Last modified: October 06 2007 12:23:41  © Guohui Lin