
Sequence Classification

Data description
Data name
Sequence Classification
DOI
10.18908/lsdba.nbdc00713-002
Description of data contents
Results of predicting whether each amino acid sequence in the genomes is a β-barrel membrane protein or a transmembrane helical protein, obtained by applying statistical and machine learning methods. The statistical methods are based on amino acid composition, residue pair preference (dipeptides), and motifs (two amino acid residues separated by a residue gap). The machine learning methods use the combination of amino acid and dipeptide compositions as their main attributes.
Data file
File name :
tmbeta_genome_sequence_classification.zip
File URL :
File size :
177 MB
Simple search URL
http://togodb.biosciencedbc.jp/togodb/view/tmbeta_genome_sequence_classification#en
Data acquisition method

Amino acid sequences were taken from the NCBI database.

Data analysis method

-

Number of data entries

903,989 entries

Data detail
Data item Description
Sequence ID

Serial number assigned to each amino acid sequence.

Sequence Collection ID

Serial number assigned to each genome.

New Approach

Result of predicting β-barrel membrane proteins using a newly developed method, which consists of the following steps:
1. Identify β-barrel membrane proteins using the dipeptide compositions of β-barrel membrane proteins and globular proteins.
2. Refine the search using the dipeptide compositions of β-barrel membrane proteins and transmembrane helical proteins.
3. Remove short sequences (proteins with fewer than 50 amino acid residues).
4. Eliminate transmembrane helical proteins using SOSUI, a prediction system for transmembrane helical proteins, with the criterion that SOSUI identifies at least two membrane-spanning helical segments.
5. Exclude globular and transmembrane helical proteins that have > 70% sequence identity and 80% coverage with proteins deposited in the PDB.
6. Exclude globular and transmembrane helical proteins that have > 80% sequence identity with proteins deposited in the SWISS-PROT database.
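
The later filtering steps listed above (length cutoff, SOSUI helix count, and similarity to known proteins) can be sketched as a single predicate. This is only an illustration under stated assumptions: the `Candidate` record and its field names are hypothetical and are not the schema of the data file, the two dipeptide screens are assumed to have been evaluated elsewhere, and the 80% coverage criterion is read here as "at least 80%".

```python
from dataclasses import dataclass

# Thresholds taken from the step descriptions above.
MIN_LENGTH = 50              # proteins with fewer residues are removed
MIN_HELICES_TO_REJECT = 2    # SOSUI helix count that marks a helical protein
PDB_IDENTITY = 70.0          # percent sequence identity, strict ">"
PDB_COVERAGE = 80.0          # percent coverage (assumed to mean ">=")
SWISSPROT_IDENTITY = 80.0    # percent sequence identity, strict ">"


@dataclass
class Candidate:
    """Hypothetical record; field names are illustrative only."""
    sequence: str
    passed_dipeptide_screens: bool   # outcome of the two dipeptide-based steps
    sosui_helix_count: int           # helical segments reported by SOSUI
    pdb_identity: float = 0.0        # best hit to a globular/helical PDB entry
    pdb_coverage: float = 0.0
    swissprot_identity: float = 0.0  # best hit to a globular/helical SWISS-PROT entry


def keep_candidate(c: Candidate) -> bool:
    """Return True if the candidate survives every filter."""
    if not c.passed_dipeptide_screens:
        return False
    if len(c.sequence) < MIN_LENGTH:
        return False
    if c.sosui_helix_count >= MIN_HELICES_TO_REJECT:
        return False
    if c.pdb_identity > PDB_IDENTITY and c.pdb_coverage >= PDB_COVERAGE:
        return False
    if c.swissprot_identity > SWISSPROT_IDENTITY:
        return False
    return True
```

Keeping each criterion as its own early return makes it easy to see which step rejected a sequence; how the identity and coverage values are obtained (for example, from a sequence alignment search) is outside the scope of this sketch.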

SOSUI

Result of predicting transmembrane helical proteins using SOSUI.

Amino Acid

Result of predicting β-barrel membrane proteins with a statistical method using amino acid composition (TMBETADISC-COMP).

Dipeptide

Result of predicting β-barrel membrane proteins with a statistical method using residue pair preference (TMBETADISC-DIPEPTIDE).

Motif

Result of predicting β-barrel membrane proteins with a statistical method using motifs (TMBETADISC-MOTIF).

SVM

Result of predicting β-barrel membrane proteins with a machine learning method using amino acid composition and residue pair preference (TMBETA-SVM).

Header

Header line of the amino acid sequence entry in the FASTA file.

Sequence

Amino acid sequence
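
Given a value of the Sequence item, the kinds of features named in the data description (amino acid composition, dipeptide composition, and residue pairs separated by a gap) can be computed roughly as follows. This is a minimal sketch, not the implementation behind the prediction methods: the function names are hypothetical, compositions are expressed as percentages by assumption, and residues outside the 20 standard amino acids are simply skipped.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq: str) -> dict[str, float]:
    """Percentage of each standard residue in the sequence."""
    counts = Counter(r for r in seq.upper() if r in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: (100.0 * counts[aa] / total if total else 0.0)
            for aa in AMINO_ACIDS}


def pair_composition(seq: str, gap: int = 0) -> dict[str, float]:
    """Percentage of each ordered residue pair.

    gap=0 counts adjacent residues (dipeptides); gap=k counts pairs
    separated by k intervening residues (a simple gapped motif).
    """
    seq = seq.upper()
    step = gap + 1
    pairs = Counter()
    for i in range(len(seq) - step):
        a, b = seq[i], seq[i + step]
        if a in AMINO_ACIDS and b in AMINO_ACIDS:
            pairs[a + b] += 1
    total = sum(pairs.values())
    return {a + b: (100.0 * pairs[a + b] / total if total else 0.0)
            for a, b in product(AMINO_ACIDS, repeat=2)}
```

For example, `amino_acid_composition(seq)` yields a 20-element feature vector and `pair_composition(seq)` a 400-element one; concatenating the two gives the sort of combined attribute set described for the machine learning method, although the exact encoding used there is not stated in this record.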