Published July 13, 2025 | Version 1.1.0
Dataset Open

WebClasSeg25

Description

A Dual-Classified Webpage Segmentation Dataset for Functional and Maturity-Based Webpage Analysis

This dataset covers current public-service websites (universities, hospitals, municipalities, courts, tourism, insurance, and other services). It provides polygon and HTML segmentations, supporting both visual and text-based segmentation methods. Each segment is additionally classified by function and by maturity.

The paper describing this dataset was presented at SIGIR 2025. Please cite it if you use the dataset in your research.

Files

README.md

Files (3.0 GB)

Checksum  Size
md5:1935e8fad720db89ae06ba6e9d06d2d8  4.1 MB
md5:597c594fb550c7f91699338a5793de2f  1.3 GB
md5:0ee11b9004141a867fd4e5b3b23203da  4.7 MB
md5:cc2fd1891c0e8276eb55879671298904  4.0 kB
md5:83392439e055333692c36fa1db71741f  1.7 GB
md5:66d34350adaa14500eb532ef88a363a8  26.3 kB

Additional details

Related works

Is described by
Conference paper: 10.1145/3726302.3730309 (DOI)

Funding

Swiss National Science Foundation
Digital Transformation at the Local Tier of Government in Europe: Dynamics and Effects from a Cross-Countries and Over-Time Comparative Perspective (DIGILOG), grant no. 200839