Sample Scripts for Illumina 1M-Duo

This page is offering some sample scripts to you effective for the data processing of Illumina 1M-Duo. It is useful to make huge data of genotype and/or phenotype with PAGE-OM standard format.

dbSNP is operating on MS SQLServer. Moreover, only the script for SQLServer is being offered. Then, This page offers the script that can be used on MySQL.


Outline chart of procedure




1. Importing a set of illumina 1M-Duo marker information for RDB

This script helps you to create a table. For Human1M-Duov3_FinalMarkerList_1199187.txt from illumina web site. But this marker file does not contain sequence data. So, we had to get the sequence data from dbSNP database.

Example of script: create_table_tbl_array.sql

CREATE TABLE `tbl_array` (
  `rsname` varchar(15) DEFAULT NULL,
  `genomebuild` varchar(4) DEFAULT NULL,
  `chr` varchar(4) DEFAULT NULL,
  `mapinfo` varchar(12) DEFAULT NULL,
  `arrayid` varchar(9) DEFAULT NULL,
  `flank_down` varchar(255) DEFAULT NULL,
  `flank_up` varchar(255) DEFAULT NULL,
  `subsnp_id` varchar(20) DEFAULT NULL,
  `snp_id` varchar(20) DEFAULT NULL,
  KEY `k1` (`rsname`),
  KEY `k2` (`snp_id`),
  KEY `k3` (`subsnp_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Download: create_table_tbl_array.sql


2. Making a local copy of dbSNP tables

These scripts help you to create over one hundred tables of local copy of dbSNP on MySQL.

Example of script: 1_shared_schema_dbSNP_main_table.sql

CREATE TABLE Allele (
	allele_id int NOT NULL ,
	allele varchar (255) NOT NULL ,
	create_time datetime NOT NULL ,
	rev_allele_id int NULL ,
	src varchar (10) NULL ,
	last_updated_time datetime NULL 
)
;


CREATE TABLE AlleleFlagCode (
	code tinyint NOT NULL ,
	abbrev varchar (12) NOT NULL ,
	descrip varchar (255) NOT NULL ,
	create_time datetime NOT NULL 
)
;


CREATE TABLE AlleleMotif (
	allele_id int NOT NULL ,
	motif_order int NOT NULL ,
	motif_id int NOT NULL ,
	repeat_cnt real NOT NULL ,
	create_time datetime NOT NULL 
)
;

...
...
Download: create_tables_sql.zip


3. Importing a set of dbSNP data into dbSNP tables of local RDB

This script helps you to load data into many tables.

Example of script: load_data.txt

LOAD DATA LOCAL INFILE '/home/pageom/70_ftp_NCBI_dbSNP/organism_data/human_9606/AlleleFreqBySsPop.bcp' INTO TABLE AlleleFreqBySsPop;
LOAD DATA LOCAL INFILE '/home/pageom/70_ftp_NCBI_dbSNP/organism_data/human_9606/Batch.bcp' INTO TABLE Batch;
LOAD DATA LOCAL INFILE '/home/pageom/70_ftp_NCBI_dbSNP/organism_data/human_9606/BatchCita.bcp' INTO TABLE BatchCita;
...
...
Download: load_data.txt


4. Importing the result of typing data of the illumina 1M-Duo

This script helps you to create the result table of all typing data.

Example of script: create_table_tbl_fr_recluster.sql

CREATE TABLE `tbl_fr_recluster3` (
  `rsid` varchar(12) DEFAULT NULL,
  `rgsampleid` varchar(10) DEFAULT NULL,
  `allele` varchar(2) DEFAULT NULL,
  KEY `k2` (`rgsampleid`(8)),
  KEY `k1` (`rsid`(7))
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Download: create_table_tbl_fr_recluster.sql


5. Generating PaGE-OM XML file from RDB of illumina 1M-Duo

This sample Perl script helps you to extract data from MySQL by an example. The user table is necessary to execute it appropriately.

Example of script: makexml_gv_full.pl

#!usr/bin/perl

#########################
# xmlmake_gv_full.pl
#

use DBI;

$ds = 'DBI:mysql:pageomdb;host=localhost;port=3306';
$user = 'pageom';
$pass = 'pageompassxxxx';

$string = 'rs112078';

$db = DBI->connect($ds, $user, $pass) || die "Got error $DBI::errstr when connecting to $ds\n";

# for test
$sth = $db->prepare("select * from tbl_array WHERE rsname like \'$string\%\' and subsnp_id is not null");

# real thing
#$sth = $db->prepare("select * from tbl_array WHERE subsnp_id is not null");

$sth->execute;

#print "Content-type: text/html;\n\n";
while(@row = $sth->fetchrow_array) {

# for debug

#	print "@row\n";

#	print $row[0] . "\n";
#	print $row[1] . "\n";
#	print $row[2] . "\n";

	print "\t\n";
	print "\t\t$row[0]\n";
	print "\t\tSNP\n";
...
...
Download: makexml_gv_full.pl


About the processing time of RDB

We used the following computer.
CPUInter Pentium Dual CPU E2160 @ 1.80GHz
Memory5.0GB
HDD1.5TB
HostOSWindows Server 2008 x86
VMVMware Server 2.01
GuestOSUbuntu Server 8.04 LTS
GuestOS Memory4.0GB
RDBMSMySQL 5.0.51

It took quite a lot of time for some processing. For example: