Published February 14, 2024 | Version v1
Dataset Open

MAdVerse: A Hierarchical Dataset of Multi-Lingual Ads from Diverse Sources and Categories

  • 1. International Institute of Information Technology, Hyderabad

Description

MAdVerse is an extensive, multilingual compilation of more than 50,000 ads from the web, social media websites, and e-newspapers. Advertisements are grouped hierarchically, with uniform granularity, into 11 categories, which are divided into 51 sub-categories and, at the leaf level, 524 fine-grained brands, each featuring ads in various languages.
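
The three-level grouping described above (category, sub-category, brand) can be pictured as one path-like label per ad. The following is a minimal sketch in Python, assuming each ad carries such a label. The `AdLabel` type, the `level_counts` helper, and all label values are placeholders for illustration and are not taken from the dataset or its repository.

```python
from collections import namedtuple

# Hypothetical label type: one field per level of the hierarchy.
AdLabel = namedtuple("AdLabel", ["category", "sub_category", "brand"])


def level_counts(labels):
    """Count distinct nodes at each level of a three-level hierarchy.

    Sub-categories and brands are keyed together with their ancestors,
    so the same brand name under two different sub-categories is
    counted as two leaves.
    """
    categories = {label.category for label in labels}
    sub_categories = {(label.category, label.sub_category) for label in labels}
    brands = {
        (label.category, label.sub_category, label.brand) for label in labels
    }
    return len(categories), len(sub_categories), len(brands)


# Placeholder labels, not real dataset entries.
example_labels = [
    AdLabel("category_a", "sub_a1", "brand_x"),
    AdLabel("category_a", "sub_a1", "brand_y"),
    AdLabel("category_a", "sub_a2", "brand_z"),
    AdLabel("category_b", "sub_b1", "brand_x"),
]

counts = level_counts(example_labels)
```

If the labels of the full dataset were loaded into this shape, the same helper would offer a quick sanity check against the 11 categories, 51 sub-categories, and 524 brands stated in the description.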

Files

dataset_readme.md

Files (18.2 GB)

Checksum                                Size
md5:8b332d117372e33de81f2c1f44ed49a1    615.4 kB
md5:35a199c4af815a9d8810cb3e3e0f575f    925.8 MB
md5:0c4a43d8b3c765a71ac19c40638ffa67    6.8 kB
md5:8a0552ed62492bafa0b242c23fbc51f4    3.2 GB
md5:05e8afacbc7e57a68079c115438bdc03    2.7 MB
md5:5f5e65817e93c3c2894715aa0c43f366    9.4 GB
md5:e625c7a6211a182c5fe0377e740b383f    2.2 MB
md5:a10e9dfd7bd376863ba53bad8db6c2cc    4.7 GB
md5:f1ff01173be926bfe032c5398ff9441a    7.3 MB
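
Since each file is listed with an MD5 checksum, a download can be checked against the listing before use. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_listing` are illustrative and are not part of the dataset or its repository, and the listing does not say which checksum belongs to which file name, so that pairing has to be supplied by the reader.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file.

    The file is read in fixed-size chunks so that multi-gigabyte
    archives do not have to fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a checksum given as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listing(local_path, "md5:8b332d117372e33de81f2c1f44ed49a1")` returns `True` only if the file at `local_path` (a path chosen by the reader) hashes to the first checksum in the listing. MD5 is adequate here for detecting corrupted or truncated downloads; it is not a safeguard against deliberate tampering.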

Additional details

Software

Repository URL: https://github.com/Amruth-sagar/MAdVerse
Programming language: Python