Database of sequence clusters obtained as a result of all-against-all BLAST search of proteins in 95 organism species.
Data name | README |
Description of data contents | HTML file to describe "Gclust Server" data. |
File | README_e.html(English) |
Data name | Amino acid sequences of predicted proteins and their annotation for 95 organism species. |
Description of data contents |
Amino acid sequences of predicted proteins and their annotation for 95 organism species. The data are given in a CSV format text file. |
File | gclust_seq.zip (152MB) |
Data item | Primary key | Foreign key | Description |
---|---|---|---|
Sequence ID | * | | ID of a sequence |
Cluster ID | | * | ID of cluster. gclust_cluster is referenced. |
Annotation in original database | | | Annotation at the original website |
Species | | | Species name |
Length | | | Amino acid sequence length |
Sequence | | | Amino acid sequence |
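A minimal sketch of reading the sequence table in Python. It assumes that the CSV file inside gclust_seq.zip has no header row and that its columns follow the order of the data items listed above; both points should be checked against the actual file. The `SeqRecord` name and the sample row are hypothetical and exist only for illustration.

```python
import csv
import io
from typing import Iterator, NamedTuple


class SeqRecord(NamedTuple):
    sequence_id: str
    cluster_id: str
    annotation: str
    species: str
    length: int
    sequence: str


def read_seq_table(handle) -> Iterator[SeqRecord]:
    """Yield one SeqRecord per CSV row (assumed column order, no header row)."""
    for row in csv.reader(handle):
        if len(row) != 6:
            continue  # skip blank or malformed lines
        seq_id, cluster_id, annotation, species, length, sequence = row
        yield SeqRecord(seq_id, cluster_id, annotation, species, int(length), sequence)


# Hypothetical row, not taken from the real data.
sample = io.StringIO('SEQ0001,42,"example protein, putative",Example species,5,MKVLA\n')
records = list(read_seq_table(sample))
```

Using the `csv` module rather than splitting on commas keeps quoted annotations that contain commas intact.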
Data name | Cluster based on sequence comparison of homologous proteins of 95 organism species |
Description of data contents |
Clusters were built from an all-against-all (round-robin) BLASTP search of the amino acid sequence data above, followed by heuristic estimation of a similarity threshold for the homologs of each protein using the entropy-optimized organism count method (Bioinformatics 2009 Mar 1;25(5):599-605). The data are given in a CSV format text file. |
File | gclust_cluster.zip (8.72MB) |
Data item | Primary key | Foreign key | Description |
---|---|---|---|
Cluster ID | * | | ID of cluster |
Representative sequence ID | | * | ID of a sequence that represents the cluster. gclust_seq is referenced. |
Link to cluster sequences | | | Link to the list of sequences belonging to the cluster (empty space) |
Link to related sequences | | | Link to the list of sequences that are similar to the cluster, but not clustered |
Sequence length | | | Amino acid sequence length |
Representative annotation | | | Representative annotation of the cluster |
Number of Sequences | | | Number of sequences contained in the cluster |
Homologs | | | Number of sequences contained in the cluster |
Clustering threshold | | | The threshold of E-value used for clustering |
Plants (7species) (%) | | | The appearance rate of this cluster in the plant and algal group (including 7 species) |
Other bikonts (9 species) (%) | | | The appearance rate of this cluster in other Bikonta (Chromalveolata, Excavata) group (including 9 species) |
Cyano (25species) (%) | | | The appearance rate of this cluster in the cyanobacteria group (including 25 species) |
Photo Bact (15species) (%) | | | The appearance rate of this cluster in the photosynthetic bacteria group (including 15 species) |
Other Bact (31 species) (%) | | | The appearance rate of this cluster in the non-photosynthetic bacteria group (including 31 species) |
Opisthokonts (8species) (%) | | | The appearance rate of this cluster in the opisthokont group (including 8 species) |
Number of Sequences for each species | | | The number of sequences by organism species contained in the cluster. |
Species not appearing in this cluster | | | Organism species not contained in the cluster. |
Data name | Proteins in similarity relationship with the cluster |
Description of data contents |
Protein sequences that are similar to any clustered sequence of the 95 organism species, but not clustered. The data are given in a CSV format text file. |
File | gclust_related.zip (69MB) |
Data item | Description |
---|---|
Cluster ID | ID of cluster |
Sequence ID | ID of a sequence |
Data name | Amino acid sequences used for clustering (Multi FASTA format) |
Description of data contents |
Amino acid sequences of predicted proteins and their annotation for 95 organism species. FASTA format file. |
File | all95.fa.zip (161MB) |
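As a rough sketch, the multi-FASTA file can be read with a small generator like the one below. The function name `read_fasta` and the sample entries are hypothetical; the exact layout of the header lines in all95.fa is not described here, so the header is returned unparsed.

```python
import io
from typing import Iterator, Tuple


def read_fasta(handle) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a multi-FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous entry, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical entries, not taken from the real data.
sample = io.StringIO(">SEQ0001 example protein\nMKV\nLA\n>SEQ0002 another example\nMST\n")
entries = list(read_fasta(sample))
```

Because the function is a generator, a file of this size can be processed one entry at a time without loading it all into memory.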
Data name | Sequence ID and annotation information |
Description of data contents |
A tab-delimited text file specifying the ID, length and annotation information of the amino acid sequences of the predicted proteins for 95 organism species. |
File | all95.p.table.zip (7.28MB) |
Data item | Description |
---|---|
Field 1 | ID of amino acid sequence (Sequence ID) |
Field 2 | Length of amino acid sequence |
Field 3 | Annotation of amino acid sequence |
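The three fields above could be loaded into a lookup table along the following lines. This is a sketch that assumes there is no header line and that only the first two tabs on a line are field separators; the function names and sample lines are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def parse_table_line(line: str) -> Tuple[str, int, str]:
    """Split one line into (sequence ID, length, annotation)."""
    # maxsplit=2 keeps any further tabs inside the annotation field.
    seq_id, length, annotation = line.rstrip("\n").split("\t", 2)
    return seq_id, int(length), annotation


def load_table(lines: Iterable[str]) -> Dict[str, Tuple[int, str]]:
    """Map each sequence ID to its (length, annotation) pair."""
    table = {}
    for line in lines:
        if line.strip():
            seq_id, length, annotation = parse_table_line(line)
            table[seq_id] = (length, annotation)
    return table


# Hypothetical lines, not taken from the real data.
sample_lines = ["SEQ0001\t5\texample protein\n", "SEQ0002\t3\tanother example\n"]
table = load_table(sample_lines)
```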
Data name | Prefix list for each organism |
Description of data contents |
List of prefixes for the organisms used in Gclust. Each prefix is prepended to the sequence IDs of the corresponding organism. The first line specifies the number of organism species (95). From the second line on, one organism prefix is listed per line, and "//END" is entered on the last line. |
File | prefix_all95 (1KB) |
Prefix | Organism name |
---|---|
ATH | Arabidopsis thaliana |
CME | Cyanidioschyzon merolae |
CRE | Chlamydomonas reinhardtii |
OSA | Oryza sativa |
OTAU | Ostreococcus tauri |
PPT | Physcomitrella patens |
PoTR | Populus trichocarpa |
DPTM | Paramecium tetraurelia |
GTH | Guillardia theta |
NGR | Naegleria gruberi |
PFA | Plasmodium falciparum |
PHRA | Phytophthora ramorum |
PHSO | Phytophthora sojae |
PTR | Phaeodactylum tricornutum |
TET | Tetrahymena thermophila SB210 |
TPS | Thalassiosira pseudonana |
Ana | Anabaena sp. PCC 7120 |
Ava | Anabaena variabilis ATCC 29413 |
Glv | Gloeobacter violaceus |
Npun | Nostoc punctiforme sp. PCC73102 |
Pm1 | Prochlorococcus marinus MED4 |
Pm2 | Prochlorococcus marinus MIT9313 |
Pm3 | Prochlorococcus marinus SS120 |
Pm4 | Prochlorococcus marinus MIT9312 |
Pm5 | Prochlorococcus marinus NATL2A |
Pm6 | Prochlorococcus marinus MIT9301 |
Pm7 | Prochlorococcus marinus MIT9303 |
Pm8 | Prochlorococcus marinus MIT9315 |
Pm9 | Prochlorococcus marinus NATL1A |
PmA | Prochlorococcus marinus AS9601 |
S63 | Synechococcus sp. PCC 6301 |
S79 | Synechococcus sp. PCC 7942 |
S81 | Synechococcus sp. WH8102 |
S93 | Synechococcus sp. CC9311 |
S96 | Synechococcus sp. CC9605 |
Syn | Synechocystis sp. PCC 6803 |
Tel | Thermosynechococcus elongatus |
Ter | Trichodesmium erythraeum 405 1 |
YelA | Cyanobacterium Yellowstone A-prime |
YelB | Cyanobacterium Yellowstone B-prime |
Caur | Chloroflexus aurantiacus |
Cch | Chlorobium chlorochromatii CaD3 |
Clim | Chlorobium limicola DSM 245 |
Cph | Chlorobium phaeobacteroides DSM 266 |
Ctep | Chlorobium tepidum |
Pvi | Prosthecochloris vibrioformis DSM 265 |
Rde | Roseobacter denitrificans Och 114 |
Rpa1 | Rhodopseudomonas palustris BisA53 |
Rpa2 | Rhodopseudomonas palustris BisB4 |
Rpa3 | Rhodopseudomonas palustris BisB18 |
Rpa4 | Rhodopseudomonas palustris HaA2 |
Rpal | Rhodopseudomonas palustris |
Rrub | Rhodospirillum rubrum ATCC 11170 |
Rsh | Rhodobacter sphaeroides ATCC 17029 |
Rsp | Rhodobacter sphaeroides 2.4.1 |
Afu | Archaeoglobus fulgidus DSM 4304 |
Ape | Aeropyrum pernix K1 |
Atu | Agrobacterium tumefaciens str. C58 |
Bja | Bradyrhizobium japonicum USDA 110 |
Bma | Burkholderia mallei ATCC 23344 |
Bms | Brucella suis 1330 |
Bpe | Bordetella pertussis Tohama I |
Bsu | Bacillus subtilis Marburg 168 |
Ccr | Caulobacter crescentus CB15 |
Cvi | Chromobacterium violaceum ATCC 12472 |
Eba | Azoarcus sp EbN1 |
Eco | Escherichia coli K-12 |
Fal | Frankia alni ACN14a |
Fra | Frankia sp. CcI3 |
Gox | Gluconobacter_oxydans_621H |
Hal | Halobacterium sp. NRC-1 |
Mac | Methanosarcina acetivorans str. C2A |
Mes | Mesorhizobium sp. BNC1 |
Mlo | Mesorhizobium loti MAFF303099 |
Mtu | Mycobacterium tuberculosis H37Rv |
Neq | Nanoarchaeum equitans Kin4-M |
Pho | Pyrococcus horikoshii OT3 |
Pst | Pseudomonas syringae pv. tomato str. DC3000 |
Rhe | Rhizobium_etli_CFN_42 |
Rle | Rhizobium leguminosarum |
Rso | Ralstonia solanacearum GMI1000 |
Sco | Streptomyces coelicolor A3(2) |
Sep | Staphylococcus epidermidis ATCC 12228 |
Sme | Sinorhizobium meliloti 1021 |
Sto | Sulfolobus tokodaii str. 7 |
Vvy | Vibrio vulnificus YJ016 |
CEL | Caenorhabditis elegans |
DCGR | Candida glabrata CBS138 |
DKLA | Kluyveromyces lactis NRRL Y-1140 |
DME | Drosophila melanogaster |
HSA | Homo sapiens |
SPO | Schizosaccharomyces pombe |
S99 | Synechococcus sp. CC9902 |
NCR | Neurospora crassa 74-OR23-1A |
SCE | Saccharomyces cerevisiae |
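The layout of the prefix_all95 file (a count on the first line, one prefix per line, then "//END") can be read with a sketch like this one. It assumes that the first line holds only the number and that the prefix is the first whitespace-separated token on each line; `parse_prefix_list` is a hypothetical name, and the sample is shortened to three prefixes.

```python
from typing import List


def parse_prefix_list(text: str) -> List[str]:
    """Return the prefixes listed between the count line and //END."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    expected = int(lines[0])  # first line: number of organism species
    prefixes = []
    for ln in lines[1:]:
        if ln == "//END":
            break
        prefixes.append(ln.split()[0])
    if len(prefixes) != expected:
        raise ValueError(f"expected {expected} prefixes, found {len(prefixes)}")
    return prefixes


# Shortened sample in the described layout (the real file lists 95 prefixes).
sample = "3\nATH\nCME\nCRE\n//END\n"
prefixes = parse_prefix_list(sample)
```

Checking the count against the first line is a cheap way to detect a truncated download.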
Data name | Designation of organism group |
Description of data contents |
Defines how the 95 organism species are grouped. The first line specifies the number of organism species, and "//END" is entered on the final line. Lines starting with "#" are comments. The data are provided in a tab-delimited text file. |
File | grp_def1 (1KB) |
Data item | Description |
---|---|
Field 1 | Prefix of the sequence ID of organism |
Field 2 | Group (Numbers from 1 to 6) |
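A possible way to read grp_def1 into a prefix-to-group mapping is sketched below. It follows the layout described above (count line, "#" comments, tab-delimited fields, "//END"); the function name is hypothetical, and the group numbers in the sample are made up rather than taken from the real file.

```python
from typing import Dict


def parse_group_definition(text: str) -> Dict[str, int]:
    """Map each organism prefix to its group number (1 to 6)."""
    # Drop blank lines and comment lines before interpreting the content.
    data = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    expected = int(data[0].strip())  # first line: number of organism species
    groups = {}
    for ln in data[1:]:
        if ln.strip() == "//END":
            break
        prefix, group = ln.split("\t")[:2]
        groups[prefix.strip()] = int(group)
    if len(groups) != expected:
        raise ValueError(f"expected {expected} entries, found {len(groups)}")
    return groups


# Illustrative sample; the group assignments here are invented.
sample = "3\n# prefix\tgroup\nATH\t1\nSyn\t3\nHSA\t6\n//END\n"
groups = parse_group_definition(sample)
```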
Data name | Parameters for Organism Grouping |
Description of data contents |
This file specifies, for each organism group, the threshold ratio of species in the group that must show homology (have a sequence in the cluster) for the cluster to be allocated to that group. For example, when the designated value is 0.5, a cluster is assigned to the "Plants" group if it contains sequences from four or more of the seven species in that group. |
File | pat_def1 (1KB) |
Data item | Description |
---|---|
Field 1 | Group number |
Field 2 | Designated value for allocation to organism group |
Field 3 | Group name |
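The allocation rule in the description above can be written out as a short check. This is a sketch, not the Gclust implementation: the function name is hypothetical, and treating the threshold as inclusive (ratio greater than or equal to the designated value) is an assumption that happens to agree with the "four of seven at 0.5" example.

```python
def belongs_to_group(species_present: int, species_in_group: int, threshold: float) -> bool:
    """True if the fraction of group species found in the cluster reaches the threshold.

    Assumption: the comparison is inclusive (>=).
    """
    return species_present / species_in_group >= threshold


# "Plants" group: 7 species, designated value 0.5.
four_of_seven = belongs_to_group(4, 7, 0.5)   # 4/7 is about 0.57
three_of_seven = belongs_to_group(3, 7, 0.5)  # 3/7 is about 0.43
```

With seven species, three present gives a ratio below 0.5 and four gives a ratio above it, which matches the worked example in the description.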
Data name | Clustering results |
Description of data contents |
Results of running Gclust program. The data include such information as the requirements for running the program, the cluster ID, the threshold used for cluster grouping, the ID of the sequence belonging to the cluster and the sequence ID of the related group. |
File | all95m8.hom.1.zip (140MB) |
Lines 1 to 80: Requirements for running the gclust program.
From line 81 on: Information for each cluster.
Format for Each Cluster
Group [Cluster ID]: [Number of sequences belonging to cluster] sequences. Final thr = [Threshold]
[ID of sequence belonging to cluster] [Sequence length] [Presence of homology between sequences within cluster (number of rows equal to the number of sequences belonging to cluster)] [Part of annotation]
(Individual information of the sequences belonging to the cluster is given.)
…
(List of related group)
Related groups
[Related cluster ID]([Number of sequences belonging to the cluster ID on left]): [Sequence ID 0]
END Related groups
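The cluster header line in the format above, "Group [Cluster ID]: [Number of sequences belonging to cluster] sequences. Final thr = [Threshold]", could be recognized with a regular expression as sketched here. The exact spacing and the numeric notation of the threshold in the real file are not documented here, so the pattern tolerates variable whitespace and returns the threshold as an unparsed string; the function name and sample line are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed shape of the header line; whitespace is matched loosely.
GROUP_HEADER = re.compile(
    r"^Group\s+(\S+):\s+(\d+)\s+sequences\.\s+Final thr\s*=\s*(\S+)"
)


def parse_group_header(line: str) -> Optional[Tuple[str, int, str]]:
    """Return (cluster ID, number of sequences, threshold text), or None if no match."""
    match = GROUP_HEADER.match(line.strip())
    if match is None:
        return None
    cluster_id, n_seqs, threshold = match.groups()
    return cluster_id, int(n_seqs), threshold


# Hypothetical header line, not taken from the real data.
header = parse_group_header("Group 12: 3 sequences. Final thr = 1e-20")
not_a_header = parse_group_header("Related groups")
```

Scanning the file for lines where this function returns a value gives the start of each cluster block; lines that return None belong to the current block.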
Data name | Table of Cluster and Organism Species Number |
Description of data contents |
A tab-delimited text table that lists, for each cluster, the cluster ID, the representative sequence ID and its length, the number of sequences contained in the cluster, and the number of sequences belonging to the cluster for each of the 95 organism species. |
File | all95.tbl.zip (4.53MB) |
Data item | Description |
---|---|
Number | Cluster ID |
ID | Sequence ID |
Length | Sequence length |
seqs | Number of sequences belonging to cluster |
homologs | Number of sequences belonging to cluster |
ATH | Number of sequences belonging to cluster in the Arabidopsis thaliana sequence |
OSA | Number of sequences belonging to cluster in the Oryza sativa sequence |
PoTR | Number of sequences belonging to cluster in the Populus trichocarpa sequence |
PPT | Number of sequences belonging to cluster in the Physcomitrella patens sequence |
CRE | Number of sequences belonging to cluster in the Chlamydomonas reinhardtii sequence |
OTAU | Number of sequences belonging to cluster in the Ostreococcus tauri sequence |
CME | Number of sequences belonging to cluster in the Cyanidioschyzon merolae sequence |
GTH | Number of sequences belonging to cluster in the Guillardia theta sequence |
PFA | Number of sequences belonging to cluster in the Plasmodium falciparum sequence |
PTR | Number of sequences belonging to cluster in the Phaeodactylum tricornutum sequence |
TPS | Number of sequences belonging to cluster in the Thalassiosira pseudonana sequence |
Ter | Number of sequences belonging to cluster in the Trichodesmium erythraeum 405 1 sequence |
Ana | Number of sequences belonging to cluster in the Anabaena sp. PCC 7120 sequence |
Ava | Number of sequences belonging to cluster in the Anabaena variabilis ATCC 29413 sequence |
Npun | Number of sequences belonging to cluster in the Nostoc punctiforme sp. PCC73102 sequence |
Syn | Number of sequences belonging to cluster in the Synechocystis sp. PCC 6803 sequence |
Glv | Number of sequences belonging to cluster in the Gloeobacter violaceus sequence |
Tel | Number of sequences belonging to cluster in the Thermosynechococcus elongatus sequence |
YelA | Number of sequences belonging to cluster in the Cyanobacterium Yellowstone A-prime sequence |
YelB | Number of sequences belonging to cluster in the Cyanobacterium Yellowstone B-prime sequence |
S63 | Number of sequences belonging to cluster in the Synechococcus sp. PCC 6301 sequence |
S79 | Number of sequences belonging to cluster in the Synechococcus sp. PCC 7942 sequence |
S81 | Number of sequences belonging to cluster in the Synechococcus sp. WH8102 sequence |
S93 | Number of sequences belonging to cluster in the Synechococcus sp. CC9311 sequence |
S96 | Number of sequences belonging to cluster in the Synechococcus sp. CC9605 sequence |
S99 | Number of sequences belonging to cluster in the Synechococcus sp. CC9902 sequence |
Pm1 | Number of sequences belonging to cluster in the Prochlorococcus marinus MED4 sequence |
Pm2 | Number of sequences belonging to cluster in the Prochlorococcus marinus MIT9313 sequence |
Pm3 | Number of sequences belonging to cluster in the Prochlorococcus marinus SS120 sequence |
Pm4 | Number of sequences belonging to cluster in the Prochlorococcus marinus MIT9312 sequence |
Pm5 | Number of sequences belonging to cluster in the Prochlorococcus marinus NATL2A sequence |
Pm6 | Number of sequences belonging to cluster in the Prochlorococcus marinus MIT9301 sequence |
Pm7 | Number of sequences belonging to cluster in the Prochlorococcus marinus MIT9303 sequence |
Pm8 | Number of sequences belonging to cluster in the Prochlorococcus marinus MIT9315 sequence |
Pm9 | Number of sequences belonging to cluster in the Prochlorococcus marinus NATL1A sequence |
PmA | Number of sequences belonging to cluster in the Prochlorococcus marinus AS9601 sequence |
Atu | Number of sequences belonging to cluster in the Agrobacterium tumefaciens str. C58 sequence |
Bja | Number of sequences belonging to cluster in the Bradyrhizobium japonicum USDA 110 sequence |
Bms | Number of sequences belonging to cluster in the Brucella suis 1330 sequence |
Ccr | Number of sequences belonging to cluster in the Caulobacter crescentus CB15 sequence |
Gox | Number of sequences belonging to cluster in the Gluconobacter_oxydans_621H sequence |
Mes | Number of sequences belonging to cluster in the Mesorhizobium sp. BNC1 sequence |
Mlo | Number of sequences belonging to cluster in the Mesorhizobium loti MAFF303099 sequence |
Rhe | Number of sequences belonging to cluster in the Rhizobium_etli_CFN_42 sequence |
Rle | Number of sequences belonging to cluster in the Rhizobium leguminosarum sequence |
Sme | Number of sequences belonging to cluster in the Sinorhizobium meliloti 1021 sequence |
Rpa1 | Number of sequences belonging to cluster in the Rhodopseudomonas palustris BisA53 sequence |
Rpa2 | Number of sequences belonging to cluster in the Rhodopseudomonas palustris BisB4 sequence |
Rpa3 | Number of sequences belonging to cluster in the Rhodopseudomonas palustris BisB18 sequence |
Rpa4 | Number of sequences belonging to cluster in the Rhodopseudomonas palustris HaA2 sequence |
Rpal | Number of sequences belonging to cluster in the Rhodopseudomonas palustris sequence |
Rrub | Number of sequences belonging to cluster in the Rhodospirillum rubrum ATCC 11170 sequence |
Rde | Number of sequences belonging to cluster in the Roseobacter denitrificans Och 114 sequence |
Rsh | Number of sequences belonging to cluster in the Rhodobacter sphaeroides ATCC 17029 sequence |
Rsp | Number of sequences belonging to cluster in the Rhodobacter sphaeroides 2.4.1 sequence |
Eco | Number of sequences belonging to cluster in the Escherichia coli K-12 sequence |
Pst | Number of sequences belonging to cluster in the Pseudomonas syringae pv. tomato str. DC3000 sequence |
Vvy | Number of sequences belonging to cluster in the Vibrio vulnificus YJ016 sequence |
Bsu | Number of sequences belonging to cluster in the Bacillus subtilis Marburg 168 sequence |
Sep | Number of sequences belonging to cluster in the Staphylococcus epidermidis ATCC 12228 sequence |
Fal | Number of sequences belonging to cluster in the Frankia alni ACN14a sequence |
Fra | Number of sequences belonging to cluster in the Frankia sp. CcI3 sequence |
Mtu | Number of sequences belonging to cluster in the Mycobacterium tuberculosis H37Rv sequence |
Sco | Number of sequences belonging to cluster in the Streptomyces coelicolor A3(2) sequence |
Rso | Number of sequences belonging to cluster in the Ralstonia solanacearum GMI1000 sequence |
Cvi | Number of sequences belonging to cluster in the Chromobacterium violaceum ATCC 12472 sequence |
Bma | Number of sequences belonging to cluster in the Burkholderia mallei ATCC 23344 sequence |
Bpe | Number of sequences belonging to cluster in the Bordetella pertussis Tohama I sequence |
Eba | Number of sequences belonging to cluster in the Azoarcus sp EbN1 sequence |
Caur | Number of sequences belonging to cluster in the Chloroflexus aurantiacus sequence |
Cch | Number of sequences belonging to cluster in the Chlorobium chlorochromatii CaD3 sequence |
Clim | Number of sequences belonging to cluster in the Chlorobium limicola DSM 245 sequence |
Cph | Number of sequences belonging to cluster in the Chlorobium phaeobacteroides DSM 266 sequence |
Ctep | Number of sequences belonging to cluster in the Chlorobium tepidum sequence |
Pvi | Number of sequences belonging to cluster in the Prosthecochloris vibrioformis DSM 265 sequence |
Afu | Number of sequences belonging to cluster in the Archaeoglobus fulgidus DSM 4304 sequence |
Hal | Number of sequences belonging to cluster in the Halobacterium sp. NRC-1 sequence |
Mac | Number of sequences belonging to cluster in the Methanosarcina acetivorans str. C2A sequence |
Pho | Number of sequences belonging to cluster in the Pyrococcus horikoshii OT3 sequence |
Ape | Number of sequences belonging to cluster in the Aeropyrum pernix K1 sequence |
Sto | Number of sequences belonging to cluster in the Sulfolobus tokodaii str. 7 sequence |
Neq | Number of sequences belonging to cluster in the Nanoarchaeum equitans Kin4-M sequence |
SCE | Number of sequences belonging to cluster in the Saccharomyces cerevisiae sequence |
SPO | Number of sequences belonging to cluster in the Schizosaccharomyces pombe sequence |
PHRA | Number of sequences belonging to cluster in the Phytophthora ramorum sequence |
PHSO | Number of sequences belonging to cluster in the Phytophthora sojae sequence |
DCGR | Number of sequences belonging to cluster in the Candida glabrata CBS138 sequence |
DKLA | Number of sequences belonging to cluster in the Kluyveromyces lactis NRRL Y-1140 sequence |
NCR | Number of sequences belonging to cluster in the Neurospora crassa 74-OR23-1A sequence |
DPTM | Number of sequences belonging to cluster in the Paramecium tetraurelia sequence |
TET | Number of sequences belonging to cluster in the Tetrahymena thermophila SB210 sequence |
NGR | Number of sequences belonging to cluster in the Naegleria gruberi sequence |
HSA | Number of sequences belonging to cluster in the Homo sapiens sequence |
DME | Number of sequences belonging to cluster in the Drosophila melanogaster sequence |
CEL | Number of sequences belonging to cluster in the Caenorhabditis elegans sequence |
Annotations | Annotation |
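As an illustration of how the per-species columns might be used, the sketch below lists the species that contribute at least one sequence to a cluster. It assumes that the file starts with a header row whose names match the data items above; that should be verified against the actual all95.tbl file. The sample is cut down to three species columns, and its values are invented.

```python
import csv
import io
from typing import Dict, List

# Columns that are not per-species counts (names taken from the table above).
FIXED_COLUMNS = {"Number", "ID", "Length", "seqs", "homologs", "Annotations"}


def species_present(row: Dict[str, str]) -> List[str]:
    """Return the species prefixes with at least one sequence in this cluster."""
    return [
        name
        for name, value in row.items()
        if name not in FIXED_COLUMNS and int(value) > 0
    ]


# Reduced sample with a hypothetical row; the real table has 95 species columns.
sample = io.StringIO(
    "Number\tID\tLength\tseqs\thomologs\tATH\tOSA\tSyn\tAnnotations\n"
    "1\tSEQ0001\t120\t3\t3\t2\t0\t1\texample protein\n"
)
rows = list(csv.DictReader(sample, delimiter="\t"))
present = species_present(rows[0])
```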
The Standard License specifies the license terms regarding the use of this database and the requirements you must follow in using this database.
The Additional License specifies those items that are exceptionally permitted even though they are generally prohibited in the Standard License.
The Standard License for this database is the license specified in the Creative Commons Attribution-Share Alike 2.1 Japan.
If you use data from this database, please be sure to attribute this database as follows:
"Gclust Server, Copyright© 2008-2009 Department of Life Sciences, Graduate School of Arts and Sciences, The University of Tokyo licensed under CC Attribution-Share Alike 2.1 Japan".
The summary of the Creative Commons Attribution-Share Alike 2.1 Japan is found here.
With regard to this database, you are licensed to:
under the Standard License, as long as you comply with the following conditions:
1. You must display this Additional License along with the Standard License when distributing any derivative work based on part or whole of the data from this database.
2. When you conduct research using this database and describe the results in an article or paper, you must cite this database and specify its name and URL in the article or paper.
3. You need to contact the Licensor shown below to request a license for use of this database or any part thereof not licensed under the Standard License and the above Additional License.
Naoki Sato
Laboratory of Plant Functional Genomics, Department of Life Sciences, Graduate School of Arts and Sciences, The University of Tokyo
E-Mail: naokisat[at]bio[dot]c[dot]u-tokyo[dot]ac[dot]jp
You may freely link to any content in this database. Note, however, that the content may be changed without notice.
Date | Update contents |
---|---|
2010/03/29 | Gclust Server English archive site is opened. |
2009/8 | Data is updated. |
2006/6 | Gclust Server(http://gclust.c.u-tokyo.ac.jp/) is released. |