Published November 2, 2025 | Version 1.0
Dataset | Open access

YouTube Tagging Dataset (2006-2007): 1+ Million Videos from Early YouTube

  • The University of Texas at Austin

Description

This dataset contains metadata and user-generated tags from 1,092,310 YouTube 
videos collected between November 2, 2006 and January 28, 2007, representing 
one of the earliest systematic collections of YouTube user behavior data.

The data was collected about a year after YouTube's official launch, 
beginning shortly before Google's acquisition of the company was finalized 
in November 2006 and before algorithmic recommendations became dominant. It 
captures the organic folksonomy and tagging practices of YouTube's early 
community.

Dataset Statistics:
- 1,092,310 unique videos
- 517,008 unique tags
- 7,530,904 video-tag pairs
- 537,246 unique users
- 87-day collection period
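The counts above imply a few useful averages, such as how many tags a typical video carried. The sketch below derives them directly from the published figures. It assumes each video is associated with exactly one "user" (presumably the uploader); the data dictionary should confirm what that column actually represents.

```python
from datetime import date

# Counts copied from the Dataset Statistics list above.
VIDEOS = 1_092_310
UNIQUE_TAGS = 517_008
VIDEO_TAG_PAIRS = 7_530_904
UNIQUE_USERS = 537_246


def derived_stats() -> dict[str, float]:
    """Ratios implied by the published counts (illustrative only)."""
    return {
        # Mean number of tags attached to each video.
        "tags_per_video": VIDEO_TAG_PAIRS / VIDEOS,
        # Mean number of videos each distinct tag was applied to.
        "uses_per_tag": VIDEO_TAG_PAIRS / UNIQUE_TAGS,
        # Mean videos per distinct user (assumes one user per video).
        "videos_per_user": VIDEOS / UNIQUE_USERS,
    }


def collection_days(start: date, end: date) -> int:
    """Days elapsed between the first and last collection dates."""
    return (end - start).days


if __name__ == "__main__":
    for name, value in derived_stats().items():
        print(f"{name}: {value:.2f}")
    print("days:", collection_days(date(2006, 11, 2), date(2007, 1, 28)))
```

This works out to roughly 6.89 tags per video, 14.57 uses per distinct tag, and 2.03 videos per user. The elapsed time between 2006-11-02 and 2007-01-28 is 87 days, matching the stated period; counting both endpoint days inclusively would give 88.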

The dataset is provided in multiple formats for accessibility:
- SQLite database (1.1 GB)
- CSV files (603 MB total)
- JSON Lines format (603 MB total)
- Sample JSON files (1,000 records each)
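Because the JSON Lines export is about 600 MB, it is easier to stream it one record at a time than to load it whole. The following is a minimal sketch under stated assumptions: the file name and the field name passed to `count_field` (for example `tag` or `tags`) are hypothetical, and the real keys are listed in DATA_DICTIONARY.md. The function handles both a single value per record and a list of values per record.

```python
import json
from collections import Counter
from typing import Iterator


def iter_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed record per non-blank line of a JSON Lines file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_field(path: str, field: str) -> Counter:
    """Count values of a (hypothetical) field across all records."""
    counts: Counter = Counter()
    for record in iter_jsonl(path):
        value = record.get(field)
        if isinstance(value, list):
            # One record may carry several values, e.g. all tags of a video.
            counts.update(value)
        elif value is not None:
            counts[value] += 1
    return counts
```

A typical call would be `count_field("tags.jsonl", "tag").most_common(10)`, where both arguments are placeholders to be replaced with the names used in the archive.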

Historical Significance:
This dataset captures a unique moment in social media history, when users 
created tags organically without algorithmic suggestion. Analysis showed that 
66% of tags did not match any term in the video's title, description, or 
author name, suggesting that users' tags reflected their own categorization 
rather than repeating existing metadata.
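A figure like the 66% above depends on how a "match" between a tag and the other metadata is defined. The sketch below uses one simple, hypothetical definition (case-insensitive word-token overlap); it is an illustration only and not necessarily the procedure used in the original analysis.

```python
import re

# Word tokens; \w is Unicode-aware in Python 3, so non-English tags are kept.
TOKEN = re.compile(r"\w+")


def tokens(text: str) -> set[str]:
    """Case-folded word tokens of a metadata string."""
    return set(TOKEN.findall(text.casefold()))


def unmatched_tag_share(tags: list[str], title: str,
                        description: str, author: str) -> float:
    """Fraction of tags sharing no token with title, description, or author."""
    if not tags:
        return 0.0
    context = tokens(title) | tokens(description) | tokens(author)
    unmatched = sum(1 for tag in tags if not (tokens(tag) & context))
    return unmatched / len(tags)


if __name__ == "__main__":
    share = unmatched_tag_share(
        ["skateboarding", "funny", "lol", "tony"],
        "Tony Hawk skateboarding trick", "Best trick ever", "skatefan",
    )
    print(share)  # 0.5: "funny" and "lol" appear nowhere else
```

Stricter or looser rules (exact whole-tag matches, stemming, substring matches) would produce different percentages, so any comparison with the published figure should use the same definition.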

Data Collection:
Collected via YouTube's Data API v1 (now retired) through systematic 
sampling. The collection methodology and findings were published in 
peer-reviewed research (see Related works and References).

This dataset is valuable for research in:
- Information Science (folksonomy, user-generated metadata)
- Social Computing (early social media practices)
- Digital History (internet culture, YouTube's formative period)
- Computational Linguistics (natural language use in tags)
- Information Retrieval (tag-based search and discovery)

For complete documentation, schema details, and example queries, see 
DATA_DICTIONARY.md and README.md included in the archive.
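As a rough illustration of querying the SQLite version, the sketch below counts the most common tags. The schema here (tables `videos` and `video_tags` and their columns) is entirely hypothetical and stands in for the real one documented in DATA_DICTIONARY.md; an in-memory database with toy rows keeps the example self-contained.

```python
import sqlite3

# Hypothetical schema for illustration; the real table and column names
# are documented in DATA_DICTIONARY.md and may differ.
SCHEMA = """
CREATE TABLE videos (video_id TEXT PRIMARY KEY, title TEXT, author TEXT);
CREATE TABLE video_tags (video_id TEXT REFERENCES videos(video_id), tag TEXT);
"""


def top_tags(conn: sqlite3.Connection, limit: int = 10) -> list[tuple[str, int]]:
    """Return the most widely used tags with the number of videos using each."""
    query = """
        SELECT tag, COUNT(DISTINCT video_id) AS n_videos
        FROM video_tags
        GROUP BY tag
        ORDER BY n_videos DESC, tag ASC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


if __name__ == "__main__":
    # Replace ":memory:" with the path to the archive's .db file.
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO videos VALUES (?, ?, ?)",
                     [("v1", "Cat video", "alice"), ("v2", "Dog video", "bob")])
    conn.executemany("INSERT INTO video_tags VALUES (?, ?)",
                     [("v1", "funny"), ("v1", "cat"), ("v2", "funny")])
    print(top_tags(conn, 2))  # [('funny', 2), ('cat', 1)]
    conn.close()
```

Counting `DISTINCT video_id` guards against a tag being listed twice for the same video; if the real data guarantees unique video-tag pairs, a plain `COUNT(*)` would give the same result.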

Files (611.4 MB)

Archive: 611.4 MB, md5:62050d02638eec8c71cc41bceb2e44db
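After downloading, the archive can be checked against the published MD5 checksum. This minimal sketch reads the file in chunks so the whole archive never sits in memory; the path passed to `verify_archive` is a placeholder, since the archive's file name is not given here.

```python
import hashlib

# Checksum published for the archive on this page.
EXPECTED_MD5 = "62050d02638eec8c71cc41bceb2e44db"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute a file's MD5 hex digest, reading 1 MiB at a time by default."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: str, expected: str = EXPECTED_MD5) -> bool:
    """True if the file at `path` matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_archive("path/to/downloaded-archive")` should return `True` for an intact download.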

Additional details

Related works

Is cited by
Poster: 10.1145/1255175.1255279 (DOI)

Dates

Collected
2006-11-02 to 2007-01-28
Data collection period via YouTube Data API v1

References

  • Geisler, G., & Burns, S. (2008). Tagging video: Conventions and strategies of the YouTube community. Bulletin of the IEEE Technical Committee on Digital Libraries (TCDL), 4(1).