A deep web data extraction model for web mining: a review

Sabri, Ily Amalina Ahmad; Man, Mustafa

doi:10.11591/ijeecs.v23.i1.pp519-528

Published July 1, 2021 | Version v1

Journal article Open


1. Faculty of Ocean Engineering Technology and Informatics, Universiti Malaysia Terengganu, Kuala Nerus, Terengganu, Malaysia

The World Wide Web has become a large pool of information. Extracting structured data from published web pages has drawn attention over the last decade. The process of web data extraction (WDE) faces many challenges, owing to the variety of web data and the unstructured nature of data in hypertext markup language (HTML) files. The aim of this paper is to provide a comprehensive overview of current web data extraction techniques in terms of the quality of the extracted data. This paper focuses on data extraction using wrapper approaches and compares them with one another to identify the best approach for extracting data from online sites. To observe the efficiency of the proposed model, we compare the performance of single web page data extraction across different models: document object model (DOM), wrapper using hybrid DOM and JSON (WHDJ), wrapper extraction of image using DOM and JSON (WEIDJ), and WEIDJ (no-rules). Finally, the experiments showed that WEIDJ extracts data fastest, with the lowest time consumption among the compared methods.
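To make the idea of extracting images from a single page into JSON and timing the extraction more concrete, here is a minimal sketch in Python. It is not the authors' WEIDJ implementation, whose rules are not described in this record. The names `ImageCollector`, `extract_images_as_json`, and `SAMPLE_PAGE` are hypothetical, and the standard library's event-based `html.parser` stands in for a full DOM traversal.

```python
import json
import time
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collects src/alt attributes of every <img> element seen while parsing."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; keep only src and alt.
        if tag == "img":
            attr_map = dict(attrs)
            self.images.append(
                {"src": attr_map.get("src"), "alt": attr_map.get("alt", "")}
            )


def extract_images_as_json(html):
    """Return (json_string, elapsed_seconds) for the images found in html."""
    start = time.perf_counter()
    collector = ImageCollector()
    collector.feed(html)
    collector.close()
    payload = json.dumps(collector.images)
    elapsed = time.perf_counter() - start
    return payload, elapsed


# Hypothetical sample page, used only for illustration.
SAMPLE_PAGE = """
<html><body>
  <img src="a.png" alt="first">
  <p>text</p>
  <img src="b.jpg">
</body></html>
"""

if __name__ == "__main__":
    payload, elapsed = extract_images_as_json(SAMPLE_PAGE)
    print(payload)
    print(f"extracted in {elapsed:.6f} s")
```

Timing a single call like this is only indicative. A comparison between extraction models of the kind the abstract describes would need repeated runs on the same pages.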


Resource type: Journal article
Publisher: Zenodo
Published in: Indonesian Journal of Electrical Engineering and Computer Science, 23(1), 519-528, 2021.
Languages: English

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: September 21, 2022
Modified: July 16, 2024
