The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Shigeru YOSHIDA, Takashi MORIHARA, Hironori YAHAGI, Noriko ITANI, "Application of a Word-Based Text Compression Method to Japanese and Chinese Texts" in IEICE TRANSACTIONS on Fundamentals,
vol. E85-A, no. 12, pp. 2933-2938, December 2002.
Abstract: 16-bit Asian language codes cannot be compressed well by conventional text compression schemes that sample the input in 8-bit units. Previously, we reported the application of a word-based text compression method that uses 16-bit sampling to the compression of Japanese texts. This paper describes our further efforts in applying a word-based method with a static canonical Huffman encoder to both Japanese and Chinese texts. The method is intended to support a multilingual environment, since the word dictionary and the canonical Huffman code table are simply replaced with those of the respective language. A computer simulation showed that the method is effective for both languages. The compression ratio obtained was slightly below 0.5 when the Markov context was ignored, and around 0.4 when the first-order Markov context was taken into account.
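The paper itself provides no code; the following is a minimal sketch, under assumptions, of the general idea described in the abstract: encode word tokens with a static canonical Huffman code instead of compressing the raw 16-bit character stream. Whitespace tokenization stands in for the language-specific word dictionary the authors use to segment Japanese and Chinese text, and all function names are illustrative, not the authors' implementation.

```python
# Minimal sketch (assumed, not the authors' implementation) of word-based
# compression with a static canonical Huffman code.
import heapq
from collections import Counter

def huffman_code_lengths(freqs):
    """Return {symbol: code length} for a symbol -> frequency map."""
    if len(freqs) == 1:                           # degenerate single-symbol case
        return {next(iter(freqs)): 1}
    # Heap items: (weight, tie-breaker, symbols in this subtree).
    heap = [(w, i, [s]) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:                         # each merge deepens its members by 1
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, tie, s1 + s2))
        tie += 1
    return lengths

def canonical_codes(lengths):
    """Assign canonical codes: sort by (length, symbol), count upward in binary."""
    code, prev_len, codes = 0, 0, {}
    for s in sorted(lengths, key=lambda s: (lengths[s], s)):
        code <<= lengths[s] - prev_len            # pad with zeros when length grows
        codes[s] = format(code, "0{}b".format(lengths[s]))
        code, prev_len = code + 1, lengths[s]
    return codes

# Toy usage: whitespace tokens stand in for dictionary-segmented words.
words = "the cat sat on the mat the cat".split()
codes = canonical_codes(huffman_code_lengths(Counter(words)))
bits = "".join(codes[w] for w in words)
print(codes)
print(len(bits), "bits vs", 16 * len(words), "bits at a fixed 16 bits per word")
```

A canonical code is convenient in a multilingual setting because the decoder only needs the code lengths together with the shared word dictionary, not an explicit Huffman tree, which keeps the per-language tables that are swapped in and out compact.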
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/e85-a_12_2933/_p
@ARTICLE{e85-a_12_2933,
author={Shigeru YOSHIDA and Takashi MORIHARA and Hironori YAHAGI and Noriko ITANI},
journal={IEICE TRANSACTIONS on Fundamentals},
title={Application of a Word-Based Text Compression Method to Japanese and Chinese Texts},
year={2002},
volume={E85-A},
number={12},
pages={2933-2938},
abstract={16-bit Asian language codes cannot be compressed well by conventional text compression schemes that sample the input in 8-bit units. Previously, we reported the application of a word-based text compression method that uses 16-bit sampling to the compression of Japanese texts. This paper describes our further efforts in applying a word-based method with a static canonical Huffman encoder to both Japanese and Chinese texts. The method is intended to support a multilingual environment, since the word dictionary and the canonical Huffman code table are simply replaced with those of the respective language. A computer simulation showed that the method is effective for both languages. The compression ratio obtained was slightly below 0.5 when the Markov context was ignored, and around 0.4 when the first-order Markov context was taken into account.},
keywords={},
doi={},
ISSN={},
month={December},
}
TY - JOUR
TI - Application of a Word-Based Text Compression Method to Japanese and Chinese Texts
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 2933
EP - 2938
AU - Shigeru YOSHIDA
AU - Takashi MORIHARA
AU - Hironori YAHAGI
AU - Noriko ITANI
PY - 2002
DO -
JO - IEICE TRANSACTIONS on Fundamentals
SN -
VL - E85-A
IS - 12
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - December 2002
AB - 16-bit Asian language codes cannot be compressed well by conventional text compression schemes that sample the input in 8-bit units. Previously, we reported the application of a word-based text compression method that uses 16-bit sampling to the compression of Japanese texts. This paper describes our further efforts in applying a word-based method with a static canonical Huffman encoder to both Japanese and Chinese texts. The method is intended to support a multilingual environment, since the word dictionary and the canonical Huffman code table are simply replaced with those of the respective language. A computer simulation showed that the method is effective for both languages. The compression ratio obtained was slightly below 0.5 when the Markov context was ignored, and around 0.4 when the first-order Markov context was taken into account.
ER -