PGDBj - Ortholog DB

2016/07/29

Web Site: http://pgdbj.jp/pages/index.html?dir=&page=od&ln=en
FTP Site: ftp://ftp.biosciencedbc.jp/archive/pgdbj-ortholog-db/

A database of orthologous relationships among genes, determined computationally from similarities between amino acid sequences.

README Content

  1. Database Component
  2. Data Description
  3. License
  4. Update History
  5. Literature
  6. Contact address

1. Database Component

  1. README
  2. Protein (Viridiplantae)
  3. Cluster (Viridiplantae)
  4. Taxon (Viridiplantae)
  5. Protein (Cyanobacteria)
  6. Cluster (Cyanobacteria)
  7. Taxon (Cyanobacteria)

2. Data Description

2.1 README

Data name: README
Description of data contents: HTML file describing the "PGDBj - Ortholog DB" data.
File: README_e.html (English)

2.2 Protein (Viridiplantae)

Data name: Protein (Viridiplantae)
Description of data contents: Amino acid sequences of Viridiplantae (green plants) obtained from the NCBI Reference Sequence Database, with their NCBI GI numbers, Reference Sequence IDs, and annotations. For each taxon, the ID of the cluster to which the amino acid sequence belongs is indicated.
File: pgdbj_ortholog_db_viridiplantae_protein.zip (85 MB)

The data items are as follows:
Data item             Description
GI number             NCBI GI number of the amino acid sequence
RefSeq ID             NCBI Reference Sequence ID
Cluster (Kingdom)     Cluster ID (rank: Kingdom)
Cluster (Phylum)      Cluster ID (rank: Phylum)
Cluster (No rank 1)   Cluster ID (rank: No rank 1)
Cluster (No rank 2)   Cluster ID (rank: No rank 2)
Cluster (No rank 3)   Cluster ID (rank: No rank 3)
Cluster (No rank 4)   Cluster ID (rank: No rank 4)
Cluster (No rank 5)   Cluster ID (rank: No rank 5)
Cluster (No rank 6)   Cluster ID (rank: No rank 6)
Cluster (No rank 7)   Cluster ID (rank: No rank 7)
Cluster (No rank 8)   Cluster ID (rank: No rank 8)
Cluster (Class)       Cluster ID (rank: Class)
Cluster (Subclass)    Cluster ID (rank: Subclass)
Cluster (No rank 9)   Cluster ID (rank: No rank 9)
Cluster (Order)       Cluster ID (rank: Order)
Cluster (Family)      Cluster ID (rank: Family)
Cluster (No rank 10)  Cluster ID (rank: No rank 10)
Cluster (Subfamily)   Cluster ID (rank: Subfamily)
Cluster (Tribe)       Cluster ID (rank: Tribe)
Cluster (Genus)       Cluster ID (rank: Genus)
Cluster (Subgenus)    Cluster ID (rank: Subgenus)
Cluster (Species)     Cluster ID (rank: Species)
Cluster (Subspecies)  Cluster ID (rank: Subspecies)
Cluster (Forma)       Cluster ID (rank: Forma)
Cluster (No rank 11)  Cluster ID (rank: No rank 11)
Annotation            Annotation of the protein
Organism              Organism
AA sequence           Amino acid sequence
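
The layout of the file inside the archive is not described in this README. As a minimal sketch, assuming each protein is stored as one tab-separated line whose fields follow the order of the data items listed above, a record could be split into its identifiers and a rank-to-cluster mapping. The function name, the tab delimiter, the sample values, and the reading of an empty field as "no cluster at this rank" are all assumptions made for illustration.

```python
# Hypothetical sketch: the tab delimiter and the field order are assumptions
# based on the data item list, not a documented file format.
RANKS = (
    ["Kingdom", "Phylum"]
    + [f"No rank {i}" for i in range(1, 9)]
    + ["Class", "Subclass", "No rank 9", "Order", "Family", "No rank 10",
       "Subfamily", "Tribe", "Genus", "Subgenus", "Species", "Subspecies",
       "Forma", "No rank 11"]
)


def parse_protein_line(line):
    """Split one assumed tab-separated record into named parts."""
    fields = line.rstrip("\r\n").split("\t")
    expected = 2 + len(RANKS) + 3  # two IDs + cluster columns + three trailing items
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, got {len(fields)}")
    cluster_fields = fields[2:2 + len(RANKS)]
    return {
        "gi": fields[0],
        "refseq_id": fields[1],
        # Keep only ranks that carry a cluster ID (empty is assumed to mean none).
        "clusters": {r: c for r, c in zip(RANKS, cluster_fields) if c},
        "annotation": fields[-3],
        "organism": fields[-2],
        "sequence": fields[-1],
    }


# Made-up record for illustration only; these values do not come from the data.
cluster_values = {"Kingdom": "33090.0", "Species": "3702.5"}
sample = "\t".join(
    ["1", "NP_000000.1"]
    + [cluster_values.get(r, "") for r in RANKS]
    + ["example protein", "Arabidopsis thaliana", "MSTK"]
)
record = parse_protein_line(sample)
print(record["refseq_id"], record["clusters"])
```

If the real file uses a different delimiter or includes a header line, only the splitting step would need to change.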

2.3 Cluster (Viridiplantae)

Data name: Cluster (Viridiplantae)
Description of data contents: Clusters of amino acid sequences of Viridiplantae (green plants) obtained from the NCBI Reference Sequence Database. Following the phylogenetic tree, clusters were generated for the Viridiplantae taxon and for each of its sub-taxa from the results of all-against-all BLAST searches among the amino acid sequences. Within a given taxon, each amino acid sequence belongs to exactly one cluster.
File: pgdbj_ortholog_db_viridiplantae_cluster.zip (15.6 MB)

The data items are as follows:
Data item         Description
Cluster ID        A Taxonomy ID followed by a serial number that starts at “0”. For instance, “cluster ID: 33090.0” denotes cluster number 0 among the clusters in “taxon: 33090”. Cluster IDs are assigned uniquely by the PGDBj Ortholog Database.
Explanatory note  Selected from the annotations of the amino acid sequences in the ortholog cluster. The frequency of each word across those annotations was calculated, and the annotation containing the most high-frequency words was selected as the explanatory note for the cluster.
Cluster size      Number of proteins in the cluster
Supercluster      Next supercluster
Subcluster        Next subcluster
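
The cluster ID format described above (a Taxonomy ID, a period, and a serial number) can be split with a small helper. This is a sketch rather than part of the database tooling; the names `ClusterId` and `parse_cluster_id` are hypothetical.

```python
import re
from typing import NamedTuple

# Digits, a period, digits: the "TaxonomyID.serial" form described in the table.
_CLUSTER_ID = re.compile(r"(\d+)\.(\d+)", re.ASCII)


class ClusterId(NamedTuple):
    taxonomy_id: int  # NCBI Taxonomy ID of the taxon the cluster was built in
    serial: int       # serial number of the cluster within that taxon, from 0


def parse_cluster_id(text):
    """Return the two parts of a cluster ID, or raise ValueError if malformed."""
    match = _CLUSTER_ID.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a cluster ID: {text!r}")
    return ClusterId(int(match.group(1)), int(match.group(2)))


cid = parse_cluster_id("33090.0")
print(cid.taxonomy_id, cid.serial)
```

Keeping the two parts as integers makes it straightforward to group clusters by taxon or to sort them by serial number.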

2.4 Taxon (Viridiplantae)

Data name: Taxon (Viridiplantae)
Description of data contents: Phylogenetic relationships among the recursively generated clusters in "Cluster (Viridiplantae)."
File: pgdbj_ortholog_db_viridiplantae_taxon.zip (2.3 KB)

The data items are as follows:
Data item           Description
Taxonomy name       NCBI Taxonomy name
Taxonomy ID         NCBI Taxonomy ID
Taxonomy rank       NCBI Taxonomy rank
Number of clusters  Number of clusters belonging to the taxon
Number of proteins  Number of proteins belonging to the taxon
Higher taxon        Next higher taxon
Lower taxon         Next lower taxon
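
Because every taxon record names its higher taxon, the path from any taxon up to the root can be recovered by following those links. The sketch below assumes the "Higher taxon" item holds the Taxonomy ID of the parent and is empty for the root; the README does not say whether the field holds an ID or a name, so treat this as an assumption. The rows are a simplified toy hierarchy with intermediate ranks omitted, not data from the archive.

```python
def lineage(taxon_id, higher_of):
    """Follow 'Higher taxon' links from taxon_id up to the root."""
    path = [taxon_id]
    seen = {taxon_id}
    while True:
        parent = higher_of.get(path[-1])
        if parent is None:  # root reached, or the parent is not in the table
            return path
        if parent in seen:  # guard against malformed, cyclic input
            raise ValueError(f"cycle detected at taxon {parent}")
        path.append(parent)
        seen.add(parent)


# Toy rows for illustration only (assumed field names and contents).
rows = [
    {"Taxonomy ID": "33090", "Taxonomy name": "Viridiplantae", "Higher taxon": ""},
    {"Taxonomy ID": "3701", "Taxonomy name": "Arabidopsis", "Higher taxon": "33090"},
    {"Taxonomy ID": "3702", "Taxonomy name": "Arabidopsis thaliana", "Higher taxon": "3701"},
]
higher_of = {row["Taxonomy ID"]: row["Higher taxon"] or None for row in rows}
print(lineage("3702", higher_of))
```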

2.5 Protein (Cyanobacteria)

Data name: Protein (Cyanobacteria)
Description of data contents: Amino acid sequences of Cyanobacteria (blue-green algae) obtained from the NCBI Reference Sequence Database, with their NCBI GI numbers, Reference Sequence IDs, and annotations. For each taxon, the ID of the cluster to which the amino acid sequence belongs is indicated.
File: pgdbj_ortholog_db_cyanobacteria_protein.zip (60 MB)

The data items are as follows:
Data item             Description
GI number             NCBI GI number of the amino acid sequence
RefSeq ID             NCBI Reference Sequence ID
Cluster (Phylum)      Cluster ID (rank: Phylum)
Cluster (Class)       Cluster ID (rank: Class)
Cluster (Order)       Cluster ID (rank: Order)
Cluster (No rank 1)   Cluster ID (rank: No rank 1)
Cluster (Family)      Cluster ID (rank: Family)
Cluster (Genus)       Cluster ID (rank: Genus)
Cluster (No rank 2)   Cluster ID (rank: No rank 2)
Cluster (Species)     Cluster ID (rank: Species)
Cluster (No rank 3)   Cluster ID (rank: No rank 3)
Cluster (Subspecies)  Cluster ID (rank: Subspecies)
Cluster (No rank 4)   Cluster ID (rank: No rank 4)
Annotation            Annotation of the protein
Organism              Organism
AA sequence           Amino acid sequence
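
Since each protein carries one cluster ID per rank, the members of every ortholog cluster at a chosen rank can be collected by grouping on that column. The sketch below assumes the records have already been loaded into dictionaries with a `refseq_id` key and a `clusters` mapping from rank name to cluster ID; that in-memory layout, the function name, and the sample values are hypothetical.

```python
from collections import defaultdict


def members_by_cluster(records, rank):
    """Group RefSeq IDs by their cluster ID at the given rank."""
    groups = defaultdict(list)
    for rec in records:
        cluster_id = rec["clusters"].get(rank)
        if cluster_id:  # skip proteins with no cluster at this rank
            groups[cluster_id].append(rec["refseq_id"])
    return dict(groups)


# Made-up records for illustration only.
records = [
    {"refseq_id": "WP_000000001.1", "clusters": {"Phylum": "1117.0"}},
    {"refseq_id": "WP_000000002.1", "clusters": {"Phylum": "1117.0"}},
    {"refseq_id": "WP_000000003.1", "clusters": {"Phylum": "1117.1"}},
]
for cluster_id, members in members_by_cluster(records, "Phylum").items():
    print(cluster_id, members)
```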

2.6 Cluster (Cyanobacteria)

Data name: Cluster (Cyanobacteria)
Description of data contents: Clusters of amino acid sequences of Cyanobacteria (blue-green algae) obtained from the NCBI Reference Sequence Database. Following the phylogenetic tree, clusters were generated for the Cyanobacteria taxon and for each of its sub-taxa from the results of all-against-all BLAST searches among the amino acid sequences. Within a given taxon, each amino acid sequence belongs to exactly one cluster.
File: pgdbj_ortholog_db_cyanobacteria_cluster.zip (9.6 MB)

The data items are as follows:
Data item         Description
Cluster ID        A Taxonomy ID followed by a serial number that starts at “0”. For instance, “cluster ID: 33090.0” denotes cluster number 0 among the clusters in “taxon: 33090”. Cluster IDs are assigned uniquely by the PGDBj Ortholog Database.
Explanatory note  Selected from the annotations of the amino acid sequences in the ortholog cluster. The frequency of each word across those annotations was calculated, and the annotation containing the most high-frequency words was selected as the explanatory note for the cluster.
Cluster size      Number of proteins in the cluster
Supercluster      Next supercluster
Subcluster        Next subcluster
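
The explanatory note selection described above can be illustrated with a short sketch. The README does not give the exact scoring, so this is one plausible reading and not the database's actual procedure: words are split on whitespace and lowercased, each annotation is scored by the summed cluster-wide frequency of its distinct words, and ties go to the first annotation. The function name and the sample annotations are hypothetical.

```python
from collections import Counter


def pick_explanatory_note(annotations):
    """Pick the annotation whose words are most frequent across the cluster."""
    if not annotations:
        raise ValueError("cluster has no annotations")
    tokenized = [text.lower().split() for text in annotations]
    # Word frequencies over all annotations in the cluster.
    freq = Counter(word for words in tokenized for word in words)

    def score(words):
        # Assumed scoring: sum the frequencies of the distinct words.
        return sum(freq[word] for word in set(words))

    # max() keeps the first index on ties, so earlier annotations win.
    best = max(range(len(annotations)), key=lambda i: score(tokenized[i]))
    return annotations[best]


# Made-up annotations for illustration only.
cluster_annotations = [
    "photosystem II protein D1",
    "photosystem II D1 protein",
    "hypothetical protein",
    "photosystem II protein D1",
]
print(pick_explanatory_note(cluster_annotations))
```

A summed score favors longer annotations; dividing by the number of words would be an equally defensible variant.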

2.7 Taxon (Cyanobacteria)

Data name: Taxon (Cyanobacteria)
Description of data contents: Phylogenetic relationships among the recursively generated clusters in "Cluster (Cyanobacteria)."
File: pgdbj_ortholog_db_cyanobacteria_taxon.zip (4.3 KB)

The data items are as follows:
Data item           Description
Taxonomy name       NCBI Taxonomy name
Taxonomy ID         NCBI Taxonomy ID
Taxonomy rank       NCBI Taxonomy rank
Number of clusters  Number of clusters belonging to the taxon
Number of proteins  Number of proteins belonging to the taxon
Higher taxon        Next higher taxon
Lower taxon         Next lower taxon

3. License

Last updated: 2014/04/04

You may use this database in compliance with the terms and conditions of the license described below. The license specifies the license terms regarding the use of this database and the requirements you must follow in using this database.

Creative Commons License

This database is licensed under the Creative Commons Attribution-Share Alike 2.1 Japan license.
If you use data from this database, please be sure to attribute this database as follows: "PGDBj - Ortholog DB © Akihiro Nakaya (Osaka University) licensed under CC Attribution-Share Alike 2.1 Japan".

A summary of the Creative Commons Attribution-Share Alike 2.1 Japan license is available on the Creative Commons website.

With regard to this database, you are licensed to:

  1. freely access part or whole of this database, and acquire data;
  2. freely redistribute part or whole of the data from this database; and
  3. freely create and distribute database and other derivative works based on part or whole of the data from this database,

under the license, as long as you comply with the following conditions:

  1. You must attribute this database in the manner specified by the author or licensor when distributing part or whole of this database or any derivative work.
  2. You must distribute any derivative work based on part or whole of the data from this database under the license.
  3. You need to contact the Licensor shown below to request a license for use of this database or any part thereof not licensed under the license.

Department of Genome Informatics, Graduate School of Medicine, Osaka University
2-2 Yamadaoka, Suita, Osaka 565-0871, JAPAN
E-mail: pgdbj[at]kazusa[dot]or[dot]jp / nakaya[at]gi[dot]med[dot]osaka-u[dot]ac[dot]jp


4. Update History

Date        Update contents
2016/07/29  The Database Description page was updated:
              • The URL of the whole data download
              • The URL of the original website information
2014/05/12  The PGDBj Ortholog DB English archive site opened.
2012/08/01  PGDBj Ortholog DB (http://pgdbj.jp/ortholog-db.html) opened.

5. Literature

-

6. Contact address

If you have any questions about "PGDBj - Ortholog DB", contact:

Department of Genome Informatics, Graduate School of Medicine, Osaka University
2-2 Yamadaoka, Suita, Osaka 565-0871, JAPAN
E-mail: pgdbj[at]kazusa[dot]or[dot]jp / nakaya[at]gi[dot]med[dot]osaka-u[dot]ac[dot]jp
