D-HaploDB

2016/12/13

Web Site: (Closed) http://orca.gen.kyushu-u.ac.jp/
HTTPS Site: https://dbarchive.biosciencedbc.jp/data/dhaplodb/

This database presents true haplotypes and LD structures of the Japanese genome, determined using DNA samples obtained from complete hydatidiform moles (CHMs).

README Content

  1. Database Component
  2. Data Description
  3. License
  4. Update History
  5. Literature
  6. Contact address

1. Database Component

  1. README
  2. SNP List (Phase II)
  3. SNP List (Phase III)
  4. LD bin list (Phase II)
  5. LD bin list (Phase III)
  6. Genotype Data (Phase II)
  7. Genotype Data (Phase III)
  8. LD_bin Data (Phase II)
  9. LD_bin Data (Phase III)

2. Data Description

2.1 README

Data name README
Description of data contents HTML file describing the "D-HaploDB" data.
File README_e.html (English)

2.2 SNP List (Phase II)

Data name SNP List (Phase II)
Description of data contents

A list of SNPs in D2 (Phase II). SNP genotypes in D1 (Phase I, Perlegen 281K SNPs) and those determined using the Affymetrix 500K Array for the 74 overlapping CHM samples were merged and subjected to QC. LD bins were then determined.

File dhaplo_d2_snp_list.zip (16.6MB)

Data items are the following:
Data item    Description
RefSNP ID RefSNP ID (rs number) given by dbSNP. (Linked to dbSNP in Quick Search)
Affy/Perlegen ID SNP ID given by Affymetrix or Perlegen
Chromosome Chromosome on which each SNP resides
Position Chromosomal nucleotide position (NCBI Build 35) of each SNP
Alleles Alleles
MAF Minor allele frequency
Genotypes Genotypes for the 74 CHM samples
LD bin Name of LD bin (Linked to LD bin list in Quick Search)
tagSNP The flag that indicates whether the SNP is a tagSNP or not.
1: tagSNP
0: non-tagSNP
-: SNP not included in LD bin calculation (MAF < 0.05)
Best tagSNP The flag that indicates whether the SNP is the best tagSNP or not.
1: Best tagSNP
0: non-best tagSNP
-: SNP not included in LD bin calculation (MAF < 0.05)
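
The three values of the tagSNP and Best tagSNP flags can be interpreted programmatically. The following is a minimal Python sketch, not part of the database distribution. The function name parse_flag is hypothetical, and the sketch assumes each flag has already been read as a string.

```python
from typing import Optional


def parse_flag(value: str) -> Optional[bool]:
    """Interpret a tagSNP or Best tagSNP flag from the SNP list.

    "1" -> True  (tagSNP / best tagSNP)
    "0" -> False (non-tagSNP / non-best tagSNP)
    "-" -> None  (SNP not included in LD bin calculation, MAF < 0.05)
    """
    value = value.strip()
    if value == "1":
        return True
    if value == "0":
        return False
    if value == "-":
        return None
    raise ValueError(f"unexpected flag value: {value!r}")
```

Returning None for "-" keeps excluded SNPs distinguishable from SNPs that were evaluated but are not tagSNPs.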

2.3 SNP List (Phase III)

Data name SNP List (Phase III)
Description of data contents

A list of SNPs in D3 (Phase III). The data are essentially the same as those described in Kukita et al. (2010), but include two additional samples (CHM010 and CHM035): although these samples were excluded at the QC steps in the previous report, their genotype data were judged to be acceptable. LD bins were then determined.

File dhaplo_d3_snp_list.zip (23.9MB)

Data items are the following:
Data item    Description
RefSNP ID RefSNP ID (rs number) given by dbSNP. (Linked to dbSNP in Quick Search)
Affy/Perlegen ID SNP ID given by Affymetrix
Chromosome Chromosome on which each SNP resides
Position Chromosomal nucleotide position (NCBI Build 36) of each SNP
Alleles Alleles
MAF Minor allele frequency
Genotypes Genotypes for the 87 CHM samples
LD bin Name of LD bin (Linked to LD bin list in Quick Search)
tagSNP The flag that indicates whether the SNP is a tagSNP or not.
1: tagSNP
0: non-tagSNP
-: SNP not included in LD bin calculation (MAF < 0.05)
Best tagSNP The flag that indicates whether the SNP is the best tagSNP or not.
1: Best tagSNP
0: non-best tagSNP
-: SNP not included in LD bin calculation (MAF < 0.05)
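
To illustrate the MAF < 0.05 cutoff, the sketch below computes a minor allele frequency from per-sample allele calls and checks it against the threshold. It assumes one allele call per CHM sample and a hypothetical missing-call symbol "N". The actual encoding of the Genotypes field is not described in this README and may differ, and all function names are illustrative.

```python
from collections import Counter
from typing import Sequence

MAF_THRESHOLD = 0.05  # SNPs with MAF below this are flagged "-" in the SNP list


def minor_allele_frequency(calls: Sequence[str], missing: str = "N") -> float:
    """Return the minor allele frequency among the observed calls.

    `calls` holds one allele per sample (an assumption of this sketch);
    calls equal to `missing` are ignored.
    """
    observed = [c for c in calls if c != missing]
    if not observed:
        raise ValueError("no observed calls")
    counts = Counter(observed)
    if len(counts) > 2:
        raise ValueError("more than two alleles observed")
    if len(counts) == 1:
        return 0.0  # monomorphic among the observed samples
    return min(counts.values()) / len(observed)


def included_in_ld_bins(calls: Sequence[str]) -> bool:
    """True if the SNP passes the MAF cutoff (MAF >= 0.05)."""
    return minor_allele_frequency(calls) >= MAF_THRESHOLD
```

For example, 7 minor-allele calls among 87 samples gives a MAF of about 0.08, which passes the cutoff, whereas 2 among 87 (about 0.02) does not.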

2.4 LD bin list (Phase II)

Data name LD bin list (Phase II)
Description of data contents

LD bin list of D2 (Phase II). An LD bin is a group of SNPs that mutually show high LD (r2 > 0.8). See below for details.
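
For phased haplotype data, r2 between two biallelic SNPs can be computed directly from allele and haplotype frequencies: r2 = D^2 / (pA (1 - pA) pB (1 - pB)), where D = pAB - pA pB. The Python sketch below illustrates only this pairwise measure and the 0.8 threshold. It is not the binning procedure used to build the database, and the 0/1 allele coding and function names are assumptions made for illustration.

```python
from typing import Sequence

R2_THRESHOLD = 0.8  # threshold quoted in the LD bin definition


def r_squared(snp_a: Sequence[int], snp_b: Sequence[int]) -> float:
    """r2 between two biallelic SNPs, one 0/1 allele per haplotype."""
    if len(snp_a) != len(snp_b) or len(snp_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(snp_a)
    p_a = sum(snp_a) / n  # frequency of allele 1 at SNP A
    p_b = sum(snp_b) / n  # frequency of allele 1 at SNP B
    # frequency of haplotypes carrying allele 1 at both SNPs
    p_ab = sum(1 for a, b in zip(snp_a, snp_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r2 is undefined for a monomorphic SNP")
    d = p_ab - p_a * p_b
    return d * d / denom


def in_high_ld(snp_a: Sequence[int], snp_b: Sequence[int]) -> bool:
    """True if the pair exceeds the r2 > 0.8 threshold."""
    return r_squared(snp_a, snp_b) > R2_THRESHOLD
```

Two SNPs with identical allele patterns across all haplotypes give r2 = 1, and statistically independent patterns give r2 = 0.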

File dhaplo_d2_ld_bin_list.zip (3.0MB)

Data items are the following:
Data item    Description
LD bin Name of LD bin
Chromosome Chromosome on which each LD bin resides (Chr1 - Chr22, ChrX)
Position Start Start position of LD bin (nucleotide position in each chromosome, according to NCBI Build 35)
Position End End position of LD bin (nucleotide position in each chromosome, according to NCBI Build 35)
SNPs Count Number of SNPs in LD bin
tagSNPs Count Number of tagSNPs in LD bin
Best tagSNP tagSNP that showed the highest mean r2, given by RefSNP ID (rs number)

2.5 LD bin list (Phase III)

Data name LD bin list (Phase III)
Description of data contents

LD bin list of D3 (Phase III). An LD bin is a group of SNPs that mutually show high LD (r2 > 0.8). See below for details.

File dhaplo_d3_ld_bin_list.zip (3.1MB)

Data items are the following:
Data item    Description
LD bin Name of LD bin
Chromosome Chromosome on which each LD bin resides (Chr1 - Chr22, ChrX)
Position Start Start position of LD bin (nucleotide position in each chromosome, according to NCBI Build 36)
Position End End position of LD bin (nucleotide position in each chromosome, according to NCBI Build 36)
SNPs Count Number of SNPs in LD bin
tagSNPs Count Number of tagSNPs in LD bin
Best tagSNP tagSNP that showed the highest mean r2, given by RefSNP ID (rs number)
(Link to dbSNP available in quick search)

2.6 Genotype Data (Phase II)

Data name Genotype Data (Phase II)
Description of data contents

A list of SNP genotypes in D2 (Phase II). SNP genotypes in D1 (Phase I, Perlegen 281K SNPs) and those determined using the Affymetrix 500K Array for the 74 overlapping CHM samples were merged and subjected to QC.

File mole_info_DhaploD2.txt.gz (13.7MB)

Data items are the following:
Data item    Description
rs RefSNP accession ID (rs number)
chr Chromosome on which the SNP resides (1 - 22, X)
pos Nucleotide position of the SNP on the chromosome
allele1 allele 1
allele2 allele 2
gtype genotypes of 74 samples of CHMs
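
A hedged sketch of how the genotype file might be read in Python follows. This README does not state the delimiter or whether the file has a header line, so the sketch assumes a tab-delimited file without a header whose columns appear in the order listed above. Everything after the fifth delimiter is kept in gtype. The function name read_genotype_rows is hypothetical.

```python
import gzip
from typing import Dict, Iterator

# Column order as listed in the data item table (assumed to match the file).
D2_COLUMNS = ["rs", "chr", "pos", "allele1", "allele2", "gtype"]


def read_genotype_rows(path: str, delimiter: str = "\t") -> Iterator[Dict[str, str]]:
    """Yield one dict per non-empty line of a gzip-compressed genotype file.

    Assumes no header line; the delimiter is an assumption and can be changed.
    """
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line:
                continue
            # Split at most 5 times so that gtype receives the remainder,
            # whether the genotypes are one string or several columns.
            fields = line.split(delimiter, len(D2_COLUMNS) - 1)
            yield dict(zip(D2_COLUMNS, fields))
```

Usage would be along the lines of iterating over read_genotype_rows("mole_info_DhaploD2.txt.gz") and inspecting the first few rows to confirm that the assumed layout matches the actual file.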

2.7 Genotype Data (Phase III)

Data name Genotype Data (Phase III)
Description of data contents

Genotype data (876K SNPs, 87 samples). Essentially the same as described in Kukita et al. (2010), except that two additional samples (CHM010 and CHM035) are included. No CNV information is included in the download data.

File mole_info_DhaploD3.txt.gz (23.8MB)

Data items are the following:
Data item    Description
chr Chromosome number (1-22,X)
sample Sample (CHM) name
rs RefSNP accession ID (rs number)
pos Nucleotide position on chromosome
allele1 allele 1
allele2 allele 2
gtype genotypes of 87 CHM samples
ss Unique ID, given by Affymetrix

2.8 LD_bin Data (Phase II)

Data name LD_bin Data (Phase II)
Description of data contents

Results of LD bin calculations for the D2 (Phase II) data set. The file is in GFF format and contains two kinds of lines, distinguishable by column #3.
- LD_BIN line: SNPs in an LD bin. tagSNPs and best tagSNPs (*) are marked.
- LD_BIN_BOUNDARIES line: limits of an LD bin
*SNP that shows the highest mean r2 among the SNPs in the bin

File bin_2R80M5.gff.gz (12.1MB)

Data items are the following:
Column Number    Definition in GFF format    Description
#1 seqname Chromosome on which the SNP resides (e.g. Chr1)
#2 source name of dataset (e.g. CHM_2R80M5Z)
#3 feature description of data. SNP information or LD bin boundary (e.g. LD_BIN, LD_BIN_BOUNDARIES)
#4 start Chromosomal position of SNP or start position of bin (NCBI Build 35)
#5 end Chromosomal position of SNP or end position of bin
#6 score LD_BIN line: 2 for best tagSNP, 1 for tagSNP, 0 for any other SNP.
LD_BIN_BOUNDARIES line: always "."
#7 strand always "+"
#8 frame always "."
#9 attributes

This column contains the following items.
- ld_bin: LD bin name
- RSID: RefSNP ID (rs number)
- tagging: flag that indicates tagSNP
- besttag: flag that indicates Best tagSNP
- SNPID: unique ID given by Affymetrix
(e.g. ld_bin 2R80M5Z_10_1 ; RSID rs16930466 ; tagging 1 ; besttag 0 ; SNPID SNP_A-2110939)
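
Based on the example above, the attributes column appears to be a list of "key value" pairs separated by semicolons. The following Python sketch parses such a string into a dictionary. It assumes that keys contain no spaces and that every record follows the layout of the example. The function name parse_attributes is hypothetical.

```python
from typing import Dict


def parse_attributes(column9: str) -> Dict[str, str]:
    """Parse the GFF attributes column into a {key: value} dictionary.

    Expects items of the form "key value" separated by ";",
    as in the example shown in this README.
    """
    attributes: Dict[str, str] = {}
    for item in column9.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ";" or empty items
        key, _, value = item.partition(" ")
        attributes[key] = value.strip()
    return attributes


example = "ld_bin 2R80M5Z_10_1 ; RSID rs16930466 ; tagging 1 ; besttag 0 ; SNPID SNP_A-2110939"
parsed = parse_attributes(example)
# parsed["ld_bin"] is "2R80M5Z_10_1" and parsed["tagging"] is "1"
```

All values are kept as strings, so the tagging and besttag flags can be compared against "1" and "0" directly.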


2.9 LD_bin Data (Phase III)

Data name LD_bin Data (Phase III)
Description of data contents

Results of LD bin calculations for the D3 (Phase III) data set. The file is in GFF format and contains two kinds of lines, distinguishable by column #3.
- LD_BIN line: SNPs in an LD bin. tagSNPs and best tagSNPs (*) are marked.
- LD_BIN_BOUNDARIES line: limits of an LD bin
*SNP that shows the highest mean r2 among the SNPs in the bin

File bin_3R80M5Zb36.gff.gz (12.8MB)

Data items are the following:
Column Number    Definition in GFF format    Description
#1 seqname Chromosome on which the SNP resides (e.g. Chr1)
#2 source name of dataset (e.g. CHM_3R80M5Z)
#3 feature description of data. SNP information or LD bin boundary (e.g. LD_BIN, LD_BIN_BOUNDARIES)
#4 start Chromosomal position of SNP or start position of bin (NCBI Build 36)
#5 end Chromosomal position of SNP or end position of bin
#6 score LD_BIN line: 2 for best tagSNP, 1 for tagSNP, 0 for any other SNP.
LD_BIN_BOUNDARIES line: always "."
#7 strand always "+"
#8 frame always "."
#9 attributes

This column contains the following items.
- ld_bin: LD bin name
- RSID: RefSNP ID (rs number)
- tagging: flag that indicates tagSNP
- besttag: flag that indicates Best tagSNP
- SNPID: unique ID given by Affymetrix
(e.g. ld_bin 3R80M5Z_10_2 ; RSID rs16930466 ; tagging 1 ; besttag 0 ; SNPID SNP_A-2110939)
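
The column definitions above can be applied to a single line of the GFF file as in the Python sketch below. It assumes the nine columns are tab-separated, as is conventional for GFF, and it leaves column #9 as a raw string. The names GffRecord and parse_gff_line are hypothetical.

```python
from typing import NamedTuple, Optional

# Meaning of column #6 (score) on LD_BIN lines, as described in the table.
SCORE_MEANING = {"2": "best tagSNP", "1": "tagSNP", "0": "other SNP"}


class GffRecord(NamedTuple):
    seqname: str          # column #1, e.g. Chr1
    source: str           # column #2, dataset name
    feature: str          # column #3, LD_BIN or LD_BIN_BOUNDARIES
    start: int            # column #4
    end: int              # column #5
    role: Optional[str]   # decoded column #6; None on boundary lines
    attributes: str       # column #9, unparsed


def parse_gff_line(line: str) -> GffRecord:
    """Split one GFF line into its columns and decode the score."""
    fields = line.rstrip("\r\n").split("\t", 8)
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqname, source, feature, start, end, score, _strand, _frame, attributes = fields
    if feature == "LD_BIN":
        role = SCORE_MEANING.get(score)
        if role is None:
            raise ValueError(f"unexpected score on LD_BIN line: {score!r}")
    elif feature == "LD_BIN_BOUNDARIES":
        role = None  # score is always "." on boundary lines
    else:
        raise ValueError(f"unexpected feature: {feature!r}")
    return GffRecord(seqname, source, feature, int(start), int(end), role, attributes)
```

Checking the feature column first mirrors the description above: SNP lines and bin boundary lines share the same columns but are interpreted differently.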


3. License

Last updated : 2011/08/25

You may use this database in compliance with the terms and conditions of the license described below. The license specifies the license terms regarding the use of this database and the requirements you must follow in using this database.

Creative Commons License
This database is licensed under the Creative Commons Attribution-Share Alike 2.1 Japan license.
If you use data from this database, please be sure to attribute this database as follows: "D-HaploDB © Kenshi Hayashi (Kyushu Univ.) licensed under CC Attribution-Share Alike 2.1 Japan".

The summary of the Creative Commons Attribution-Share Alike 2.1 Japan is found here.

With regard to this database, you are licensed to:

  1. freely access part or whole of this database, and acquire data;
  2. freely redistribute part or whole of the data from this database; and
  3. freely create and distribute database and other derivative works based on part or whole of the data from this database,
under the license, as long as you comply with the following conditions:

  1. You must attribute this database in the manner specified by the author or licensor when distributing part or whole of this database or any derivative work.
  2. You must distribute any derivative work based on part or whole of the data from this database under the license.
  3. You need to contact the Licensor shown below to request a license for use of this database or any part thereof not licensed under the license.

Tomoko Tahira
Kinjo Gakuin University
E-mail: ttahira[at]kinjo-u[dot]ac[dot]jp


4. Update History

Date    Update contents
2016/12/13 Description of the original site is updated.
2011/09/22 D-HaploDB English archive site is opened.
2005/07/20 D-HaploDB (http://orca.gen.kyushu-u.ac.jp/) is released.

5. Literature

Kukita Y, Miyatake K, Stokowski R, Hinds D, Higasa K, Wake N, Hirakawa T, Kato H, Matsuda T, Pant K, Cox D, Tahira T, Hayashi K.
Genome-wide definitive haplotypes determined using a collection of complete hydatidiform moles.
Genome Res. 2005 Nov;15(11):1511-8.
PMID: 16251461

Higasa K, Miyatake K, Kukita Y, Tahira T, Hayashi K.
D-HaploDB: a database of definitive haplotypes determined by genotyping complete hydatidiform mole samples.
Nucleic Acids Res. 2007 Jan;35(Database issue):D685-9.
PMID: 17166862

Higasa K, Kukita Y, Kato K, Wake N, Tahira T, Hayashi K.
Evaluation of haplotype inference using definitive haplotype data obtained from complete hydatidiform moles, and its significance for the analyses of positively selected regions.
PLoS Genet. 2009 May;5(5):e1000468.
PMID: 19424418

Kukita Y, Yahara K, Tahira T, Higasa K, Sonoda M, Yamamoto K, Kato K, Wake N, Hayashi K.
A definitive haplotype map as determined by genotyping duplicated haploid genomes finds a predominant haplotype preference at copy-number variation events.
Am. J. Hum. Genet. 2010 Jun;86(6):918-28.
PMID: 20537301


6. Contact address

If you have any questions about "D-HaploDB", please contact:

Tomoko Tahira
Kinjo Gakuin University
E-mail: ttahira[at]kinjo-u[dot]ac[dot]jp
