We propose discovering approximate primary functional dependencies (aPFDs) for web tables. These dependencies focus on the determination relationship between primary attributes and non-primary attributes, and are more helpful for entity column detection and topic discovery on web tables. Based on association rules and information theory, we propose the metrics Conf and InfoGain to evaluate PFDs. By quantifying the strength of PFDs and designing pruning strategies to eliminate false positives, our method can effectively select minimal non-trivial approximate PFDs and scales to large tables. Comprehensive experimental results on real web datasets show that our method significantly outperforms previous work in both effectiveness and efficiency.
Siyu CHEN
Beijing Jiaotong University
Ning WANG
Beijing Jiaotong University
Mengmeng ZHANG
North China University of Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Siyu CHEN, Ning WANG, Mengmeng ZHANG, "Mining Approximate Primary Functional Dependency on Web Tables," in IEICE TRANSACTIONS on Information, vol. E102-D, no. 3, pp. 650-654, March 2019, doi: 10.1587/transinf.2018EDL8130.
Abstract: We propose to discover approximate primary functional dependency (aPFD) for web tables, which focus on the determination relationship between primary attributes and non-primary attributes and are more helpful for entity column detection and topic discovery on web tables. Based on association rules and information theory, we propose metrics Conf and InfoGain to evaluate PFDs. By quantifying PFDs' strength and designing pruning strategies to eliminate false positives, our method could select minimal non-trivial approximate PFD effectively and are scalable to large tables. The comprehensive experimental results on real web datasets show that our method significantly outperforms previous work in both effectiveness and efficiency.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2018EDL8130/_p
@ARTICLE{e102-d_3_650,
author={Siyu CHEN and Ning WANG and Mengmeng ZHANG},
journal={IEICE TRANSACTIONS on Information},
title={Mining Approximate Primary Functional Dependency on Web Tables},
year={2019},
volume={E102-D},
number={3},
pages={650--654},
abstract={We propose to discover approximate primary functional dependency (aPFD) for web tables, which focus on the determination relationship between primary attributes and non-primary attributes and are more helpful for entity column detection and topic discovery on web tables. Based on association rules and information theory, we propose metrics Conf and InfoGain to evaluate PFDs. By quantifying PFDs' strength and designing pruning strategies to eliminate false positives, our method could select minimal non-trivial approximate PFD effectively and are scalable to large tables. The comprehensive experimental results on real web datasets show that our method significantly outperforms previous work in both effectiveness and efficiency.},
keywords={},
doi={10.1587/transinf.2018EDL8130},
ISSN={1745-1361},
month={March},
}
TY - JOUR
TI - Mining Approximate Primary Functional Dependency on Web Tables
T2 - IEICE TRANSACTIONS on Information
SP - 650
EP - 654
AU - Siyu CHEN
AU - Ning WANG
AU - Mengmeng ZHANG
PY - 2019
DO - 10.1587/transinf.2018EDL8130
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E102-D
IS - 3
JA - IEICE TRANSACTIONS on Information
Y1 - March 2019
AB - We propose to discover approximate primary functional dependency (aPFD) for web tables, which focus on the determination relationship between primary attributes and non-primary attributes and are more helpful for entity column detection and topic discovery on web tables. Based on association rules and information theory, we propose metrics Conf and InfoGain to evaluate PFDs. By quantifying PFDs' strength and designing pruning strategies to eliminate false positives, our method could select minimal non-trivial approximate PFD effectively and are scalable to large tables. The comprehensive experimental results on real web datasets show that our method significantly outperforms previous work in both effectiveness and efficiency.
ER -