YouTube Tagging Dataset (2006-2007): 1+ Million Videos from Early YouTube
Description
This dataset contains metadata and user-generated tags from 1,092,310 YouTube
videos collected between November 2, 2006 and January 28, 2007, representing
one of the earliest systematic collections of YouTube user behavior data.
Collection began during YouTube's first full calendar year of operation, days
before Google's acquisition of the company closed in mid-November 2006, and
continued through January 2007, a period before algorithmic recommendations
became dominant. The dataset captures the organic folksonomy and tagging
practices of YouTube's early community.
Dataset Statistics:
- 1,092,310 unique videos
- 517,008 unique tags
- 7,530,904 video-tag pairs
- 537,246 unique users
- 87-day collection period
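The counts above imply a few derived averages. The sketch below only does arithmetic on the published figures. It assumes that every video is attributed to exactly one of the listed users, which the statistics do not state.

```python
from datetime import date

# Counts copied from the statistics list above.
videos = 1_092_310
tags = 517_008
video_tag_pairs = 7_530_904
users = 537_246

# Elapsed days between the first and last collection dates.
collection_days = (date(2007, 1, 28) - date(2006, 11, 2)).days

tags_per_video = video_tag_pairs / videos  # mean tag assignments per video
uses_per_tag = video_tag_pairs / tags      # mean number of videos per distinct tag
videos_per_user = videos / users           # assumes one listed user per video

print(f"collection period: {collection_days} days")
print(f"mean tags per video: {tags_per_video:.2f}")
print(f"mean uses per tag: {uses_per_tag:.2f}")
print(f"mean videos per user: {videos_per_user:.2f}")
```

These are means only. Tag usage in folksonomies is usually heavily skewed, so medians computed from the data itself may differ considerably.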
The dataset is provided in multiple formats for accessibility:
- SQLite database (1.1 GB)
- CSV files (603 MB total)
- JSON Lines format (603 MB total)
- Sample JSON files (1,000 records each)
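The JSON Lines files can be streamed one record at a time rather than loaded whole. The following is a minimal sketch, not the archive's documented interface: the field names `video_id` and `tags` and the sample rows are invented for illustration, so check DATA_DICTIONARY.md for the real schema.

```python
import json
from collections import Counter

def iter_records(lines):
    """Yield one parsed record per non-empty JSON Lines row."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)

def count_tags(records, tag_field="tags"):
    """Count how often each tag occurs; `tag_field` is an assumed field name."""
    counts = Counter()
    for record in records:
        counts.update(record.get(tag_field, []))
    return counts

# In-memory stand-in for a JSON Lines file (invented rows, not dataset content).
sample_lines = [
    '{"video_id": "a1", "tags": ["music", "live"]}',
    "",
    '{"video_id": "b2", "tags": ["music"]}',
]
tag_counts = count_tags(iter_records(sample_lines))
print(tag_counts.most_common(1))
```

With a real file, an open text handle can be passed in place of `sample_lines`, since iterating over a file yields its lines one at a time.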
Historical Significance:
This dataset captures a unique moment in social media history when users
created tags organically without algorithmic suggestion. Analysis showed that
66% of tags had zero relevance to video titles, descriptions, or authors,
demonstrating purely user-driven categorization behavior.
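One way to approximate such a measure is to count the tags that share no word with a video's title, description, or author name. The description does not say how relevance was determined in the original analysis, so the sketch below is an assumption, not the published method. Exact token overlap, the field names, and the sample record are all illustrative.

```python
import re

def tokens(text):
    """Lowercase alphanumeric word tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def unrelated_tag_share(videos):
    """Fraction of tag assignments sharing no token with title, description, or author."""
    total = 0
    unrelated = 0
    for video in videos:
        context = tokens(
            " ".join([video["title"], video["description"], video["author"]])
        )
        for tag in video["tags"]:
            total += 1
            if not tokens(tag) & context:
                unrelated += 1
    return unrelated / total if total else 0.0

# Invented record for illustration; field names are hypothetical.
sample = [
    {
        "title": "Cat plays piano",
        "description": "My cat at home",
        "author": "pianofan",
        "tags": ["cat", "funny", "cute"],
    }
]
share = unrelated_tag_share(sample)
print(f"{share:.0%} of tags share no word with the video's metadata")
```

Exact matching is strict: it treats "cats" and "cat" as unrelated and ignores synonyms, so a stemmed or fuzzy comparison would give a lower share.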
Data Collection:
The data were collected via YouTube's Data API v1 (now deprecated) through
systematic sampling. The collection methodology and findings were published in
peer-reviewed research (see Related Identifiers).
This dataset is valuable for research in:
- Information Science (folksonomy, user-generated metadata)
- Social Computing (early social media practices)
- Digital History (internet culture, YouTube's formative period)
- Computational Linguistics (natural language use in tags)
- Information Retrieval (tag-based search and discovery)
For complete documentation, schema details, and example queries, see
DATA_DICTIONARY.md and README.md included in the archive.
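As a rough idea of the kind of query the SQLite format supports, the sketch below ranks tags by frequency. The table and column names (`video_tags`, `video_id`, `tag`) are hypothetical and the rows are invented. It runs against an in-memory database so that it does not depend on the real schema, which is documented in DATA_DICTIONARY.md.

```python
import sqlite3

# In-memory database with a hypothetical schema and invented rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE video_tags (video_id TEXT, tag TEXT);
    INSERT INTO video_tags VALUES
        ('a1', 'music'), ('a1', 'live'), ('b2', 'music');
    """
)

# Most frequently used tags, ties broken alphabetically.
top_tags = conn.execute(
    "SELECT tag, COUNT(*) AS n FROM video_tags "
    "GROUP BY tag ORDER BY n DESC, tag LIMIT ?",
    (10,),
).fetchall()
conn.close()

print(top_tags)
```

Against the real database, `sqlite3.connect` would take the path to the extracted file, and the table and column names would need to be replaced with the documented ones.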
Files
(611.4 MB)
| Checksum | Size |
|---|---|
| md5:62050d02638eec8c71cc41bceb2e44db | 611.4 MB |
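The md5 checksum in the file listing can be used to confirm that a download is intact. The sketch below reads the file in chunks so the 611.4 MB archive does not need to fit in memory. The path passed to `matches_listing` is a placeholder, since the archive's file name is not shown in the listing.

```python
import hashlib

EXPECTED_MD5 = "62050d02638eec8c71cc41bceb2e44db"  # from the file listing

def md5_of_chunks(chunks):
    """Return the hex MD5 digest of an iterable of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()

def read_chunks(path, chunk_size=1 << 20):
    """Yield a file's contents in chunks of `chunk_size` bytes (1 MiB by default)."""
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            yield chunk

def matches_listing(path):
    """True if the file at `path` (placeholder name) has the listed checksum."""
    return md5_of_chunks(read_chunks(path)) == EXPECTED_MD5
```

MD5 is adequate for detecting transfer corruption, which is its purpose here, but it should not be relied on to detect deliberate tampering.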
Additional details
Identifiers
Related works
- Is cited by: Poster, DOI 10.1145/1255175.1255279
Dates
- Collected: 2006-11-02 (data collection period via YouTube Data API v1)
- Collected: 2007-01-28 (data collection period via YouTube Data API v1)
References
- Geisler, G., & Burns, S. (2008). Tagging Video: Conventions and Strategies of the YouTube Community. Bulletin of IEEE Technical Committee on Digital Libraries (TCDL) 4(1).