|
Data description
|
Data name
|
MeCab user dictionary: Nikkaji (Japan Chemical Substance Dictionary)
|
DOI
|
10.18908/lsdba.nbdc02358-003.V002
|
Description of data contents
|
A user dictionary for the morphological analysis engine MeCab (http://taku910.github.io/mecab/), built from J-GLOBAL science and technology terms that are linked to the Japan Chemical Substance Dictionary (Nikkaji), an organic compound dictionary database prepared by the Japan Science and Technology Agency. The dictionary items are based on the IPA dictionary. The CSV file is encoded in Shift-JIS and the dic file is encoded in UTF-8.
|
Data file
|
File name :
Nikkaji.dic.zip (MeCab dic format)
File URL :
File size :
6.6 MB
|
Simple search URL
|
http://togodb.biosciencedbc.jp/togodb/view/mecab_nikkaji#en
|
Data acquisition method
|
IPA dictionary (mecab-ipadic-2.7.0-20070801, downloaded from the MeCab site [see above]), J-GLOBAL Knowledge
|
Data analysis method
|
-
|
Number of data entries
|
82,922 entries
|
|
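Because the CSV file is encoded in Shift-JIS rather than UTF-8, it has to be decoded explicitly before use. The following is a minimal sketch of one way to do that in Python; the function name `read_dictionary_csv` and the path passed to it are hypothetical, since this record does not give the CSV file name. If some characters fail to decode, Python's `cp932` codec (Microsoft's Shift-JIS superset) may be worth trying instead; that is an assumption, not something stated in this record.

```python
import csv


def read_dictionary_csv(path, encoding="shift_jis"):
    """Return every row of a MeCab dictionary CSV as a list of strings.

    `path` is a hypothetical location of the unpacked CSV file.
    `encoding` defaults to Shift-JIS, as stated in the data description.
    """
    # newline="" lets the csv module handle line endings itself.
    with open(path, encoding=encoding, newline="") as handle:
        return list(csv.reader(handle))


# Usage (the file name is an assumption):
# rows = read_dictionary_csv("Nikkaji.csv")
# print(len(rows))
```

The number of rows returned can be compared with the entry count listed above as a quick sanity check.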
Data detail
|
|
Data item
|
Description
|
| Surface form |
The word itself |
| Left-context ID |
MeCab internal ID for left context (see http://taku910.github.io/mecab/dic.html) |
| Right-context ID |
MeCab internal ID for right context (see http://taku910.github.io/mecab/dic.html) |
| Cost |
Word occurrence cost: the lower the value, the more likely the word is to appear in a sentence |
| POS |
Part of speech |
| POS subcategory 1 |
POS subcategory 1 |
| POS subcategory 2 |
POS subcategory 2 |
| POS subcategory 3 |
POS subcategory 3 |
| Conjugation type |
Conjugation type |
| Conjugation form |
Conjugation form |
| Base form |
Same as the surface form |
| Reading ('Furigana') |
(empty) |
| Pronunciation |
(empty) |
| Source dictionary |
Fixed value: 'Nikkaji' |
| ID in Source dictionary |
MeSH UID |
| J-GLOBAL ID |
ID in J-GLOBAL |
| Headword Flag |
Fixed value: 'C' |
| Category code |
Category code of science fields in JST Thesaurus |
| Common word flag 1 |
1: the IPA dictionary has one or more entries for the surface form; 0: the IPA dictionary has no entry for the surface form |
| Common word flag 2 |
Derived from "IPA dictionary analysis results". When Common word flag 1 is 1: the part of speech given by the IPA dictionary analysis result. When Common word flag 1 is 0: UNKNOWN_1 if the result is a single unknown word; UNKNOWN_2 if the result is multiple tokens that include an unknown word; MULTI_WORD if the result is multiple tokens in the IPA dictionary |
| IPA dictionary analysis results |
Results of morphological analysis with the original IPA dictionary (and with a dictionary of IPA dictionary entries in which full-width (zenkaku) alphanumeric characters and symbols are converted into the corresponding half-width (hankaku) characters). If the result is divided into multiple tokens, they are separated by whitespace. The results are not manually corrected. |
|
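The field list above can be turned into a small parser for one dictionary line. The sketch below assumes that the CSV columns appear in exactly the order of the table; this record does not confirm that. The field keys, the function name `parse_entry`, and every value in `SAMPLE_LINE` are made up for illustration and are not taken from the actual dictionary.

```python
import csv
import io

# Hypothetical keys, one per data item in the table, in table order.
FIELD_NAMES = [
    "surface_form", "left_context_id", "right_context_id", "cost",
    "pos", "pos_sub1", "pos_sub2", "pos_sub3",
    "conjugation_type", "conjugation_form", "base_form",
    "reading", "pronunciation",
    "source_dictionary", "source_id", "jglobal_id",
    "headword_flag", "category_code",
    "common_word_flag_1", "common_word_flag_2", "ipa_analysis",
]

# Illustrative line only: all IDs, costs and codes here are placeholders.
SAMPLE_LINE = (
    "塩化ナトリウム,1285,1285,5000,名詞,一般,*,*,*,*,塩化ナトリウム,,,"
    "Nikkaji,ID-PLACEHOLDER,JG-PLACEHOLDER,C,XX00,0,MULTI_WORD,塩化 ナトリウム"
)


def parse_entry(line):
    """Parse one CSV line into a dict keyed by FIELD_NAMES."""
    row = next(csv.reader(io.StringIO(line)))
    if len(row) != len(FIELD_NAMES):
        raise ValueError(
            f"expected {len(FIELD_NAMES)} fields, got {len(row)}"
        )
    entry = dict(zip(FIELD_NAMES, row))
    # Context IDs and cost are integers in MeCab dictionary files.
    for key in ("left_context_id", "right_context_id", "cost"):
        entry[key] = int(entry[key])
    # The analysis result is whitespace-separated when it has several tokens.
    entry["ipa_tokens"] = entry["ipa_analysis"].split()
    return entry


entry = parse_entry(SAMPLE_LINE)
print(entry["surface_form"], entry["common_word_flag_2"], entry["ipa_tokens"])
```

Checking the field count up front makes a mismatch between the assumed column order and the real file fail loudly instead of silently shifting values into the wrong keys.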