HIV-1 Subtyping: Supplementary Materials

References

  1. X. Wu, Z. Cai, X.-F. Wan, T. Hoang, R. Goebel, and ——.
    Nucleotide Composition String Selection in HIV-1 Subtyping Using Whole Genomes.
    Bioinformatics.
    Submitted on November 11, 2006.

Release Notes

  1. April 28, 2007: The following supplementary materials were released:
    • The list of 331 HIV-1 recombinant strains, the genomic sequences, and the statistics:
      • 196 CRF01AE
      • 52 CRF02AG
      • 3 CRF03AB
      • 3 CRF04CPX
      • 3 CRF05DF
      • 8 CRF06CPX
      • 7 CRF07BC
      • 4 CRF08BC
      • 5 CRF09CPX
      • 3 CRF10CD
      • 10 CRF11CPX
      • 10 CRF12BF
      • 6 CRF13CPX
      • 7 CRF14BG
      • 5 CRF1501B (AE/B)
      • 2 CRF16A2D
      • 4 CRF18CPX
      • 3 CRF19CPX
    • The accession number list of 91 HIV-1 recombinant strains with deterministic forms (in bold), and the accession number list of 240 HIV-1 recombinant strains with ambiguous forms.
    • The prediction accuracy plot of the 91 HIV-1 recombinant strains by our method, where 1,900 subtypes were reported for each strain, and by the NCBI Genotyping Tool, where for each sliding window the two closest subtypes were reported.
    • The list of 65 HIV-1 recombinant strains among the 331 that, together with 3 extra strains, are used as reference recombinant strains in the NCBI Genotyping Tool; the list of the other 266 HIV-1 recombinant strains.
    • Merging these 68 reference recombinant strains with our 42 reference pure subtype strains, and using the 5,000 selected nucleotide strings, our method correctly predicted 242 of the 266 recombinant strains. The following 24 strains were all predicted as CRF02AG:
      • AF197341
      • AY358045
      • AY358063
      • AY444805
      • AY945727
      • DQ859180
      • EF036530
      • EF036532
      • EF036536
      • AY227107
      • AY535659
      • DQ400856
      • AY037272
      • AY536238
      • AY771588
      • AY771589
      • AY781128
      • DQ845387
      • DQ845388
      • AY586540
      • AY586541
      • AY894993
      • AY588971
      • AY894994
    • Among these 825 pure subtype strains, the NCBI Genotyping Tool assigned incorrect subtypes to (2 A + 2 G = 4) strains:
      • AF413987
      • AF193275
      • AF423760
      • AF450098
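
    The counts quoted in this release note can be cross-checked with a few lines of Python. This is only a bookkeeping sketch: the numbers are copied from the lists above, and the variable names are made up for illustration.

```python
# Strain counts per recombinant form, copied from the list in this release note.
crf_counts = {
    "CRF01AE": 196, "CRF02AG": 52, "CRF03AB": 3, "CRF04CPX": 3,
    "CRF05DF": 3, "CRF06CPX": 8, "CRF07BC": 7, "CRF08BC": 4,
    "CRF09CPX": 5, "CRF10CD": 3, "CRF11CPX": 10, "CRF12BF": 10,
    "CRF13CPX": 6, "CRF14BG": 7, "CRF1501B": 5, "CRF16A2D": 2,
    "CRF18CPX": 4, "CRF19CPX": 3,
}
total_recombinants = sum(crf_counts.values())

# Other splits of the same strains, as stated in the bullets above.
deterministic, ambiguous = 91, 240        # deterministic vs. ambiguous forms
ncbi_refs_among_331, others = 65, 266     # NCBI reference strains vs. the rest
correct = 242                             # correctly predicted among the 266

print("total strains:", total_recombinants)
print("deterministic + ambiguous:", deterministic + ambiguous)
print("NCBI references + others:", ncbi_refs_among_331 + others)
print("not predicted correctly:", others - correct)
print(f"fraction predicted correctly: {correct / others:.1%}")
```

    Each split adds up to 331 strains, the 24 strains that were not predicted correctly match the 24 accession numbers listed, and 242 of 266 is about 91%.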
  2. March 9, 2007: The following supplementary materials were released:
    • The list of 42 HIV-1 pure subtype reference strains, and the genomic sequences.
    • The CIV outgroup strain is AF447763.
    • The list of 825 HIV-1 pure subtype independent testing strains, and the genomic sequences.
    • Among these 825 pure subtype strains, BioAfrica/REGA did not assign a subtype to (4 A + 3 D = 7) strains:
      • AM000053
      • AM000054
      • AM000055
      • DQ083238
      • DQ054367
      • AF133821
      • AY773340
    • Among these 825 pure subtype strains, STAR did not assign a subtype to (17 A + 5 B + 4 C + 4 D + 1 G = 31) strains:
      • AF286237
      • AF286238
      • AM000053
      • AM000054
      • AM000055
      • AY521629
      • AY521630
      • AY521631
      • AY829203
      • AY829205
      • AY829206
      • AY829208
      • AY829209
      • AY829212
      • DQ083238
      • DQ207944
      • DQ396400
      • AY781125
      • AY839827
      • DQ085869
      • DQ085870
      • DQ383755
      • AY727526
      • AY727527
      • DQ164125
      • DQ164128
      • AY773339
      • AY773340
      • AY773341
      • U88822
      • DQ168573
    • The list of 7 HIV-1 recombinant strains, and the genomic sequences.
    • Java executables:

      HivTrainTestFrequency.jar outputs three files: one contains the specified number of nucleotide strings and their relative entropies, one contains the pairwise distances between each testing strain and every training strain, and one contains the pairwise distances between testing strains in PHYLIP format.
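
      The PHYLIP distance file can be loaded for further analysis with a small reader such as the sketch below. It assumes a square matrix with one whitespace-separated row per line and strain names without spaces; strict PHYLIP pads names to 10 characters and may wrap rows, which this sketch does not handle. The function name, the strain names S1 to S3, and the distances are made up for illustration and are not output of the tool.

```python
def read_phylip_distances(text):
    """Parse a square PHYLIP distance matrix given as a string.

    Assumption: fields are whitespace-separated, each row sits on one
    line, and names contain no spaces.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n = int(lines[0].split()[0])          # first line: number of strains
    names, rows = [], []
    for ln in lines[1:n + 1]:
        fields = ln.split()
        names.append(fields[0])
        rows.append([float(x) for x in fields[1:]])
    if len(rows) != n or any(len(r) != n for r in rows):
        raise ValueError("expected a square n-by-n distance matrix")
    return names, rows


# Made-up example input, for illustration only.
sample = """3
S1  0.00  0.12  0.30
S2  0.12  0.00  0.25
S3  0.30  0.25  0.00
"""

names, dist = read_phylip_distances(sample)
for i, name in enumerate(names):
    # Nearest other strain: smallest distance in row i, skipping the diagonal.
    j = min((k for k in range(len(names)) if k != i), key=lambda k: dist[i][k])
    print(name, "is closest to", names[j])
```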

      Example command:

      $ java -Xms512m -Xmx4096m -jar HivTrainTestFrequency.jar -P Reference42 -E Testing825 -L 21 -T 500

      The parameters are:

      -P the path where the reference dataset is located
      -E the path where the testing dataset is located
      -L the maximum string length
      -T the number of top ranked strings
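
      To give a feel for what -L and -T control, the sketch below counts every nucleotide string of length 1 up to a maximum length in two sequences and keeps the top-ranked strings under a relative-entropy-style score. The scoring formula used by HivTrainTestFrequency.jar is not described on this page, so the symmetrised score, the pseudocount, and all function names here are assumptions chosen only for illustration.

```python
import math
from collections import Counter


def string_frequencies(seq, max_len):
    """Relative frequency of every substring of length 1..max_len in seq."""
    freqs = {}
    for k in range(1, max_len + 1):
        counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
        total = sum(counts.values())
        for s, c in counts.items():
            freqs[s] = c / total          # frequency among strings of length k
    return freqs


def top_strings(seq_a, seq_b, max_len, top, pseudo=1e-6):
    """Rank strings by a symmetrised relative-entropy term (an assumption)."""
    fa = string_frequencies(seq_a, max_len)
    fb = string_frequencies(seq_b, max_len)
    scores = {}
    for s in set(fa) | set(fb):
        p = fa.get(s, 0.0) + pseudo       # pseudocount avoids log(0)
        q = fb.get(s, 0.0) + pseudo
        scores[s] = (p - q) * math.log(p / q)
    return sorted(scores, key=scores.get, reverse=True)[:top]


# Toy sequences, not real HIV-1 data.
print(top_strings("ACGTACGTAAAA", "ACGTACGTCCCC", max_len=3, top=5))
```

      In this sketch, max_len plays the role of -L and top plays the role of -T. Strings whose frequencies differ most between the two sequences rank highest.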

Acknowledgments

This research is partially supported by AICML, CFI, iCORE, NSERC, and the University of Alberta.

Feedback

Please email ghlin[at]cs[dot]ualberta[dot]ca for any additional questions you might have.


  Last modified: April 28 2007 12:56:26  © Guohui Lin