Published July 13, 2025 | Version 1.1.0
Dataset
Open
WebClasSeg25
Creators
Description
A Dual-Classified Webpage Segmentation Dataset for Functional and Maturity-Based Webpage Analysis
This dataset contains current websites of public-service organizations (universities, hospitals, municipalities, courts, tourism and other, and insurance providers). Each page is segmented both as polygons and as HTML, so the dataset supports visual as well as text-based methods. Each segment also carries a functional classification and a maturity classification.
The paper describing this dataset was presented at SIGIR 2025. Please cite that paper if you use the dataset in your research.
Files (3.0 GB)
| Checksum | Size |
|---|---|
| md5:1935e8fad720db89ae06ba6e9d06d2d8 | 4.1 MB |
| md5:597c594fb550c7f91699338a5793de2f | 1.3 GB |
| md5:0ee11b9004141a867fd4e5b3b23203da | 4.7 MB |
| md5:cc2fd1891c0e8276eb55879671298904 | 4.0 kB |
| md5:83392439e055333692c36fa1db71741f | 1.7 GB |
| md5:66d34350adaa14500eb532ef88a363a8 | 26.3 kB |
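Each file above is listed with an MD5 checksum, which can be used to confirm that a download completed without corruption. The following is a minimal sketch of such a check in Python, not part of the dataset itself. The function names `md5_of_file` and `matches_listed_checksum` are illustrative, and the local file path is whatever name you saved the download under.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks.

    Chunked reading keeps memory use flat, which matters for the
    multi-gigabyte files in this record.
    """
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a local file against a checksum as listed on the page.

    `listed` may be given with or without the "md5:" prefix.
    """
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listed_checksum(local_path, "md5:597c594fb550c7f91699338a5793de2f")` would return `True` only if the file at `local_path` (a path you supply) has exactly that digest. A mismatch usually means the download was truncated and should be repeated.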
Additional details
Related works
- Is described by: Conference paper, DOI 10.1145/3726302.3730309
Funding
- Swiss National Science Foundation: Digital Transformation at the Local Tier of Government in Europe: Dynamics and Effects from a Cross-Countries and Over-Time Comparative Perspective (DIGILOG), grant 200839