2008.1.29. by Tomoko Tahira



dbQSNP data can be downloaded as two kinds of files.
Sequences of SNPs and their flanking regions are provided as "dbQSNP.#.fasta.txt". They are derived from sequence analysis of the STSs (KG....) in which the SNPs are located. (# is a two-digit version number.)
Allele frequencies determined by PLACE-SSCP analysis are provided as "frequency.#.txt". (# is a two-digit version number.)

*****************************************************************************
In dbQSNP.#.fasta.txt, each header line starts with '>' and has the following fields, separated by '|'.
  gnl: Object_type (general). 
  dbQSNP: Database name.
  QH#: dbQSNP ID.
  allele_Pos: Position of the variation (its first base) in the fasta sequence. It always equals the length of the 5' flank plus 1.
  total_Len: Total number of bases in the fasta sequence, i.e. the sum of the lengths of the 5' flank, the 3' flank and the variation (the variation can be longer than 1 base in the case of ins/del polymorphisms).
  taxid: NCBI taxonomy id 
  KG*: Internal STS ID
  alleles: Alleles of the SNP, separated by '/'.

For example,

> gnl|dbQSNP|QH00012|allele_Pos=154|total_Len=269|taxid=9606|KG00B0007|alleles='g/t'
agggaggaggaggaaaaccctccctgggactgtgcactgccagctggggctcgggaaagcatggagtctgaattcgccctcagacctgggctggaaagctcagacagggaagtcaaagactgtggccccggaggctggccggggcagtcagagktgcttctggaaggacccagctgagtccaggcagagagagggcaaggttgagcaccaggcgccccagatcccggggggtattgaaatgggcatctttgagcagatgacctgcagga
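
The header layout described above can be split into fields with a few lines of Python. This is only a minimal sketch: the function name parse_dbqsnp_header and the dictionary keys object_type, database, dbqsnp_id, sts_id and five_prime_len are illustrative choices, not part of dbQSNP. It assumes that every header follows the field order of the example, and that the only field after the first three without an '=' is the STS ID.

```python
def parse_dbqsnp_header(line):
    """Split one dbQSNP FASTA header line into a dict (illustrative helper)."""
    # Drop the leading '>' and any surrounding whitespace, then split on '|'.
    fields = [f.strip() for f in line.strip().lstrip(">").strip().split("|")]
    record = {
        "object_type": fields[0],  # 'gnl'
        "database": fields[1],     # 'dbQSNP'
        "dbqsnp_id": fields[2],    # 'QH#'
    }
    for field in fields[3:]:
        if "=" in field:
            key, value = field.split("=", 1)
            record[key] = value.strip("'\"")  # alleles are quoted in the header
        else:
            record["sts_id"] = field          # assumed to be the KG* STS ID
    record["allele_Pos"] = int(record["allele_Pos"])
    record["total_Len"] = int(record["total_Len"])
    record["alleles"] = record["alleles"].split("/")
    # allele_Pos is the 5' length plus 1, so the 5' flank has allele_Pos - 1 bases.
    record["five_prime_len"] = record["allele_Pos"] - 1
    return record


header = "> gnl|dbQSNP|QH00012|allele_Pos=154|total_Len=269|taxid=9606|KG00B0007|alleles='g/t'"
rec = parse_dbqsnp_header(header)
print(rec["dbqsnp_id"], rec["allele_Pos"], rec["alleles"])
```

Because allele_Pos is 1-based, the 5' flank of a sequence string seq would be seq[:rec["five_prime_len"]] in Python's 0-based slicing.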

*****************************************************************************
"frequency.#.txt" is tab-delimited, and the first line gives the name of each column. The columns are:

  dbQSNP_id: QH# 
  strand: Orientation of the STS relative to the GenBank contig sequence (NT sequence). Note that SNPs are always described as sequences on the forward strand of the STS.
  contig_id: ID of the genomic contig that showed the highest homology to the STS
  rs#: dbSNP accession for the refSNP
  allele_1: sequence of allele 1
  allele_2: sequence of allele 2
  AF_allele_1: allele frequency of allele 1
  AF_allele_2: allele frequency of allele 2
  pool_id: ID of pooled DNA (see below)
  #individual: The number of individuals examined
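
A tab-delimited file with these columns can be read with Python's standard csv module. The sketch below assumes that the column names in the first line are spelled exactly as listed above and that the frequency and individual-count columns are always filled with numbers. The function name read_frequency_rows is illustrative, and the SAMPLE row contains made-up placeholder values (including the strand notation), not real dbQSNP data.

```python
import csv
import io


def read_frequency_rows(handle):
    """Read frequency rows from an open tab-delimited file (illustrative helper)."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = []
    for row in reader:
        # Convert the numeric columns; everything else stays a string.
        row["AF_allele_1"] = float(row["AF_allele_1"])
        row["AF_allele_2"] = float(row["AF_allele_2"])
        row["#individual"] = int(row["#individual"])
        rows.append(row)
    return rows


# Placeholder content for demonstration only; the values are invented.
SAMPLE = (
    "dbQSNP_id\tstrand\tcontig_id\trs#\tallele_1\tallele_2\t"
    "AF_allele_1\tAF_allele_2\tpool_id\t#individual\n"
    "QH00000\t+\tNT_000000\trs0\tg\tt\t0.75\t0.25\tJPK\t110\n"
)

rows = read_frequency_rows(io.StringIO(SAMPLE))
print(rows[0]["dbQSNP_id"], rows[0]["AF_allele_1"], rows[0]["pool_id"])
```

For a downloaded file, the same function could be called on open(path, newline=""), where path is the local file name.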

***************************************************************************************
IDs of pooled DNA are as follows.

Caucasian pools
  CP              : CEPH parents (78 individuals) (http://www.cephb.fr/cephdb/) 
  COP             : Caucasian (100 individuals) from CORIELL (http://locus.umdnj.edu/nigms/) 
  CAU200          : Caucasian (200 individuals) from CORIELL

Japanese pools
 JPK                   : Control (110 individuals) collected locally
 JPK2                  : Control (100 individuals) collected locally
 JPK3                  : Control (100 individuals) collected locally
 JSA426                : Control (426 individuals) collected locally
 JCM253                : Control (253 individuals) collected locally
 NCEP or NCEP135       : Control (age < 50, 135 individuals) collected in local area A
 NCL_P or NCLP134      : Control (age >= 50, 134 individuals) collected in local area A
 NCP269 or NCP269super : Control (269 individuals) collected in local area A
 JNCE_SP               : Control (age < 50, 162 individuals) collected in local area B
 JNCL_SP               : Control (age >= 50, 186 individuals) collected in local area B
 BREP                  : Breast cancer patients (age < 50, 68 individuals) collected in local area A 
 BRLP                  : Breast cancer patients (age >= 50, 93 individuals) collected in local area A
 JBRE_SP               : Breast cancer patients (age < 50, 161 individuals) collected in local area B 
 JBRL_SP               : Breast cancer patients (age >= 50, 210 individuals) collected in local area B
 SLEP264*              : SLE patients (264 individuals)
 SLEP183super          : SLE patients (183 individuals)
***************************************************************************************

*SLEP264, SLEP264SP and SLEP264super indicate the same pool.

"Note"
Some pool IDs are followed by suffixes such as #1, #2 or #3. These indicate repeated assays
performed on the same pool. When the allele frequencies obtained from repeated assays have been
averaged (this is not always done), the pool IDs are joined by "/" (e.g., NCP269#1/NCP269#2)
in the frequency data.
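
The pool ID conventions above can be unpacked programmatically. The following is a minimal sketch with an illustrative function name (parse_pool_id) and an illustrative ALIASES table built only from the alternative spellings listed in this file. It assumes that the text after '#' is always an integer assay number.

```python
# Alternative spellings listed in this file, mapped to one name per pool.
ALIASES = {
    "NCEP135": "NCEP",
    "NCLP134": "NCL_P",
    "NCP269super": "NCP269",
    "SLEP264SP": "SLEP264",
    "SLEP264super": "SLEP264",
}


def parse_pool_id(pool_id):
    """Return a list of (pool name, assay number or None) pairs (illustrative helper)."""
    assays = []
    # Averaged assays are joined by '/', e.g. NCP269#1/NCP269#2.
    for part in pool_id.split("/"):
        name, _, repeat = part.strip().partition("#")
        name = ALIASES.get(name, name)
        assays.append((name, int(repeat) if repeat else None))
    return assays


print(parse_pool_id("NCP269#1/NCP269#2"))
print(parse_pool_id("SLEP264super"))
```

A result with more than one pair means the reported frequency is an average over those assays.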
