Published February 17, 2024 | Version v1
Dataset
Open
AdFlush: A Real-World Deployable Machine Learning Solution for Effective Advertisement and Web Tracker Prevention
Creators
Description
This record contains the dataset for the paper "AdFlush: A Real-World Deployable Machine Learning Solution for Effective Advertisement and Web Tracker Prevention", accepted to the Web Conference 2024, Singapore.
Abstract:
Ad blocking and web tracking prevention tools are widely used, but traditional filter list-based methods struggle to cope with web content manipulation. Machine learning-based approaches have been proposed to address these limitations, but they have primarily focused on improving detection accuracy at the expense of practical considerations such as deployment overhead. In this paper, we present *AdFlush*, a lightweight machine learning model for ad blocking and web tracking prevention that is practically designed for the Chrome browser. To develop *AdFlush*, we first evaluated the effectiveness of 883 features, including 350 existing and 533 new features, and ultimately identified 27 key features that achieve optimal detection performance. We then evaluated *AdFlush* using a dataset of 10,000 real-world websites, achieving an F1 score of 0.98, which outperforms state-of-the-art models such as AdGraph (F1 score: 0.93), WebGraph (F1 score: 0.90), and WTAgraph (F1 score: 0.84). Importantly, *AdFlush* also exhibits a significantly reduced computational footprint, requiring 56% less CPU and 80% less memory than AdGraph. We also evaluated the robustness of *AdFlush* against adversarial manipulation, such as URL manipulation and JavaScript obfuscation. Our experimental results show that *AdFlush* exhibits superior robustness with F1 scores of 0.89–0.98, outperforming AdGraph and WebGraph, which achieved F1 scores of 0.81–0.87 against adversarial samples. To demonstrate the real-world applicability of *AdFlush*, we have implemented it as a Chrome browser extension and made it publicly available. We also conducted a six-month longitudinal study, which showed that *AdFlush* maintained a high F1 score above 0.97 without retraining, demonstrating its effectiveness. Additionally, *AdFlush* detected 642 URLs across 108 domains that were missed by commercial filter lists, which we reported to filter list providers.
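The abstract compares models by F1 score, the harmonic mean of precision and recall. As a minimal sketch of that metric (not the authors' evaluation code), the function below computes F1 from confusion-matrix counts. The function name and the example counts are illustrative assumptions and are not taken from the paper or the dataset.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Return the F1 score (harmonic mean of precision and recall).

    Inputs are raw counts for the positive class (for example, requests
    labeled as ads or trackers). Returns 0.0 when precision or recall is
    undefined or when both are zero.
    """
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives
    if predicted_positive == 0 or actual_positive == 0:
        return 0.0
    precision = true_positives / predicted_positive
    recall = true_positives / actual_positive
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the arithmetic:
# 98 correct detections, 2 false alarms, 2 misses.
example = f1_score(true_positives=98, false_positives=2, false_negatives=2)
print(round(example, 2))
```

With these made-up counts, precision and recall are both 0.98, so the harmonic mean is also 0.98. The abstract does not state how its scores were averaged across classes, so this sketch covers only the single-class case.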
Files (5.3 GB)
Previewed file: AdFlush_test.csv
MD5 checksum | Size |
---|---|
md5:fd806d6d16fe5affbea436dfeba499a5 | 35.6 MB |
md5:57db841f85c4ccb5095b97c415c69812 | 142.6 MB |
md5:feaf036ef797d08625be53ac5320ea08 | 1.0 GB |
md5:b2636b7fc6c5c7ae4abc64873743497d | 4.0 GB |
md5:30b5208f15912245546170bbbdabb00d | 12.4 MB |
md5:29d89710efaad705f64ce80aadbe0f8a | 7.9 MB |
md5:08a1b317a2480ba300a63d61ab23de8a | 10.5 MB |
md5:e786d9f381a1e20706a55468cdd30983 | 53.8 MB |
md5:710d6e78c4ebc05d5793cec2edbfc310 | 56.3 MB |
md5:f3df2ee9d6f91e7a02c2db3db4e582ea | 54.9 MB |
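Because several of the files listed above are large (up to 4.0 GB), it can be worth confirming that a download completed intact by comparing it against the `md5:` value shown for that file. The following is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `matches_record` are hypothetical, and the listing above does not show which checksum belongs to which file name, so the pairing has to be taken from the record page itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that multi-gigabyte files never have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, listed_checksum):
    """Compare a local file against a checksum written either as
    'md5:<hex>' (the form used in the file table) or as plain hex."""
    expected = listed_checksum.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

A call would look like `matches_record("<downloaded file>", "md5:<value from the table>")`, which returns `True` only if the digests agree. MD5 is used here because it is the checksum the record publishes; it detects accidental corruption but is not a guarantee against deliberate tampering.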
Additional details
Software
- Repository URL: https://github.com/SKKU-SecLab/AdFlush
- Programming language: Python, JavaScript
- Development Status: Active