|
Data description
|
Data name
|
Sequence Classification
|
DOI
|
10.18908/lsdba.nbdc00713-002
|
Description of data contents
|
Results of predicting β-barrel membrane proteins or transmembrane helical proteins by applying statistical and machine learning methods to each amino acid sequence in the genomes. The statistical methods are based on amino acid composition, residue pair preference (dipeptides), and motifs (pairs of amino acid residues separated by one intervening residue). The machine learning methods mainly use the combination of amino acid and dipeptide compositions as attributes.
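The three kinds of sequence features described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the TMBETADISC or TMBETA-SVM implementation: compositions are expressed as percentages, non-standard residue letters are skipped, and all function names are hypothetical. The "motif" features are taken to be residue pairs separated by exactly one intervening residue, as the description states.

```python
from collections import Counter
from itertools import product

# Assumption: only the 20 standard amino acids are counted; letters such as
# X, B, Z or U are skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def amino_acid_composition(seq: str) -> dict[str, float]:
    """Percentage of each standard residue in the sequence (20 features)."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    counts = Counter(residues)
    total = len(residues) or 1  # avoid division by zero on empty input
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def pair_composition(seq: str, gap: int = 0) -> dict[str, float]:
    """Percentage of residue pairs at positions (i, i + gap + 1).

    gap=0 gives the dipeptide composition (adjacent residues);
    gap=1 gives pairs separated by one intervening residue ("motifs").
    """
    seq = seq.upper()
    step = gap + 1
    pairs = [
        seq[i] + seq[i + step]
        for i in range(len(seq) - step)
        if seq[i] in AMINO_ACIDS and seq[i + step] in AMINO_ACIDS
    ]
    counts = Counter(pairs)
    total = len(pairs) or 1
    return {dp: 100.0 * counts[dp] / total for dp in DIPEPTIDES}


def feature_vector(seq: str) -> list[float]:
    """Amino acid + dipeptide composition: 20 + 400 = 420 values."""
    aac = amino_acid_composition(seq)
    dpc = pair_composition(seq, gap=0)
    return [aac[aa] for aa in AMINO_ACIDS] + [dpc[dp] for dp in DIPEPTIDES]


if __name__ == "__main__":
    example = "MKKTAIAIAVALAGFATVAQA"  # arbitrary toy sequence
    print(len(feature_vector(example)))
```

A vector like the one returned by `feature_vector` is the kind of input a classifier such as an SVM would consume; how the original methods weight or normalize these values is not stated here.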
|
Data file
|
File name: tmbeta_genome_sequence_classification.zip
File URL:
File size: 177 MB
|
Simple search URL
|
http://togodb.biosciencedbc.jp/togodb/view/tmbeta_genome_sequence_classification#en
|
Data acquisition method
|
Amino acid sequences were taken from the NCBI database.
|
Data analysis method
|
-
|
Number of data entries
|
903,989 entries
|
Data detail
|
| Data item | Description |
| Sequence ID | Sequential serial number assigned to each amino acid sequence. |
| Sequence Collection ID | Sequential serial number assigned to each genome. |
| New Approach | Result of predicting β-barrel membrane proteins using a newly developed method, which consists of the following steps:
1. Identify β-barrel membrane proteins using the dipeptide compositions of β-barrel membrane proteins and globular proteins.
2. Refine the search using the dipeptide compositions of β-barrel membrane proteins and transmembrane helical proteins.
3. Remove short sequences (proteins with fewer than 50 amino acid residues).
4. Eliminate transmembrane helical proteins using SOSUI, a prediction system for transmembrane helical proteins; a sequence is eliminated if SOSUI identifies at least two membrane-spanning helical segments.
5. Exclude globular and transmembrane helical proteins that have > 70% sequence identity and 80% coverage with an entry deposited in PDB.
6. Exclude globular and transmembrane helical proteins that have > 80% sequence identity with an entry in the SWISS-PROT database. |
| SOSUI | Result of predicting transmembrane helical proteins using SOSUI. |
| Amino Acid | Result of predicting β-barrel membrane proteins with a statistical method based on amino acid composition (TMBETADISC-COMP). |
| Dipeptide | Result of predicting β-barrel membrane proteins with a statistical method based on residue pair preference (TMBETADISC_DIPEPTIDE). |
| Motif | Result of predicting β-barrel membrane proteins with a statistical method based on motifs (TMBETADISC-MOTIF). |
| SVM | Result of predicting β-barrel membrane proteins with a machine learning method based on amino acid composition and residue pair preference (TMBETA-SVM). |
| Header | Header line of the amino acid sequence entry in the FASTA file. |
| Sequence | Amino acid sequence. |
|
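The New Approach steps can be sketched as a chain of pass/fail filters. This sketch assumes each step is an independent yes/no test on a single sequence. The dipeptide discriminators (steps 1 and 2), SOSUI (step 4), and the PDB and SWISS-PROT similarity searches (steps 5 and 6, merged here into one check) are stood in for by hypothetical caller-supplied functions, because their models and search settings are not given in this description. Only the 50-residue length cutoff and the two-helix SOSUI criterion come from the text above.

```python
from collections.abc import Callable, Iterable
from dataclasses import dataclass

MIN_LENGTH = 50     # step 3: drop proteins with fewer than 50 residues
MIN_TM_HELICES = 2  # step 4: >= 2 predicted helices means transmembrane helical


@dataclass
class Protein:
    seq_id: int      # corresponds to the "Sequence ID" column
    sequence: str


def new_approach(
    proteins: Iterable[Protein],
    beta_vs_globular: Callable[[str], bool],       # hypothetical step 1 classifier
    beta_vs_helical: Callable[[str], bool],        # hypothetical step 2 classifier
    count_tm_helices: Callable[[str], int],        # stand-in for SOSUI (step 4)
    known_non_beta_homolog: Callable[[str], bool], # stand-in for steps 5 and 6
) -> list[Protein]:
    """Return the proteins that pass every step, in input order."""
    kept = []
    for p in proteins:
        s = p.sequence
        if not beta_vs_globular(s):                  # step 1
            continue
        if not beta_vs_helical(s):                   # step 2
            continue
        if len(s) < MIN_LENGTH:                      # step 3
            continue
        if count_tm_helices(s) >= MIN_TM_HELICES:    # step 4
            continue
        if known_non_beta_homolog(s):                # steps 5 and 6
            continue
        kept.append(p)
    return kept
```

Because every step in this sketch only removes sequences, applying the steps in another order (for example, the cheap length check first) would return the same result and change only the computational cost.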