Cluster based on sequence comparison of homologous proteins of 95 organism species

Data description
Data name
Cluster based on sequence comparison of homologous proteins of 95 organism species
DOI
10.18908/lsdba.nbdc00464-002
Description of data contents
Clustering is based on the E-value and the overlap score from an all-against-all (round-robin) BLASTP search of the amino acid sequence data of the 95 organism species, followed by heuristic estimation of a similarity threshold for the homologs of each protein by the entropy-optimized organism count method (Bioinformatics 2009 Mar 1;25(5):599-605; http://gclust.c.u-tokyo.ac.jp/). The data are provided as a CSV-format text file.
Data file
File name :
gclust_cluster.zip
File URL :
File size :
8.72 MB
Simple search URL
http://togodb.biosciencedbc.jp/togodb/view/gclust_cluster#en
Data acquisition method

Sequence data described in "Amino acid sequences of predicted proteins and their annotation for 95 organism species".

Data analysis method

All-against-all BLASTP search of the above amino acid sequence data, followed by heuristic estimation of a similarity threshold for the homologs of each protein by the entropy-optimized organism count method (Bioinformatics 2009 Mar 1;25(5):599-605).
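
As a rough illustration of what threshold-based grouping of all-against-all hits can look like, the Python sketch below merges proteins that are linked by hits whose E-value is at or below a cutoff (single linkage via union-find). It is not the Gclust implementation: the fixed `max_evalue` cutoff, the `(query, subject, evalue)` hit tuples, and the function name `cluster_hits` are assumptions made for illustration, whereas the published method estimates a similarity threshold for each protein heuristically.

```python
from collections import defaultdict


def cluster_hits(hits, max_evalue=1e-5):
    """Group sequence IDs by single linkage over hits passing an E-value cutoff.

    hits: iterable of (query_id, subject_id, evalue) tuples. This input shape is
    hypothetical, e.g. values parsed from tabular BLASTP output.
    max_evalue: fixed cutoff (an assumption; not the per-protein threshold
    estimated by the published method).
    """
    parent = {}

    def find(seq_id):
        # Register unseen IDs as their own root, then walk up with path halving.
        parent.setdefault(seq_id, seq_id)
        while parent[seq_id] != seq_id:
            parent[seq_id] = parent[parent[seq_id]]
            seq_id = parent[seq_id]
        return seq_id

    for query, subject, evalue in hits:
        # Calling find() on both IDs keeps sequences without good hits as singletons.
        root_q, root_s = find(query), find(subject)
        if evalue <= max_evalue and root_q != root_s:
            parent[root_s] = root_q

    clusters = defaultdict(set)
    for seq_id in list(parent):
        clusters[find(seq_id)].add(seq_id)
    return sorted(sorted(members) for members in clusters.values())


# Made-up hits for demonstration only.
example_hits = [
    ("A", "B", 1e-30),
    ("B", "C", 1e-10),
    ("C", "D", 0.5),   # too weak to link D to the A/B/C group
    ("E", "E", 0.0),   # self-hit only
]
example_clusters = cluster_hits(example_hits)
```

With the made-up hits above, A, B, and C end up in one group, while D and E remain singletons. Single linkage with one global cutoff is the simplest possible choice and tends to chain unrelated multidomain proteins together, which is one reason a per-protein threshold is attractive.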

Number of data entries

206,764 entries

Data detail
Data item                                Description
Cluster ID                               -
Representative sequence ID               -
Link to cluster sequences                -
Link to related sequences                -
Sequence length                          -
Representative annotation                -
Number of Sequences                      -
Homologs                                 -
Clustering threshold                     -
Plants (7 species) (%)                   -
Other bikonts (9 species) (%)            -
Cyano (25 species) (%)                   -
Photo Bact (15 species) (%)              -
Other Bact (31 species) (%)              -
Opisthokonts (8 species) (%)             -
Number of Sequences for each species     -
Species not appearing in this cluster    -
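
The data file could be read along the following lines, using only the Python standard library. This is a minimal sketch under stated assumptions: that `gclust_cluster.zip` holds a single CSV text file, that the file has a header row, and that it is UTF-8 encoded. None of these details are confirmed by this page, and the function name `read_cluster_rows` is hypothetical.

```python
import csv
import io
import zipfile


def read_cluster_rows(zip_source, encoding="utf-8"):
    """Yield one dict per CSV row from the first file inside the archive.

    zip_source: a path or a binary file-like object.
    encoding: assumed text encoding of the CSV member.
    """
    with zipfile.ZipFile(zip_source) as archive:
        # Skip directory entries; assume the first real member is the CSV file.
        members = [name for name in archive.namelist() if not name.endswith("/")]
        if not members:
            raise ValueError("archive contains no files")
        with archive.open(members[0]) as raw:
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            # DictReader takes the first row as column headers (an assumption).
            yield from csv.DictReader(text)


# Self-contained demonstration with a tiny in-memory archive (made-up values).
_demo = io.BytesIO()
with zipfile.ZipFile(_demo, "w") as _zf:
    _zf.writestr("demo.csv", "Cluster ID,Number of Sequences\n1,12\n2,3\n")
_demo.seek(0)
demo_rows = list(read_cluster_rows(_demo))
```

For the real file, the call would be `read_cluster_rows("gclust_cluster.zip")` after downloading the archive. Whether the CSV headers match the data item names listed above (for example `Cluster ID`) should be checked against the file itself.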