Published November 27, 2025 | Version v1
Dataset Open

20 Newsgroups (5 Topics) — PII-Augmented version

Authors/Creators

Description

This dataset is a curated subset of the 20 Newsgroups corpus containing five clearly distinguishable topics. It is intended for experiments in intelligent text anonymization and topic classification.

It was created as part of the Bachelor’s thesis “Intelligent anonymization for natural language processing and inference” at FIIT STU, 2025.

Files

20 Newsgroups (5 Topics) — PII-Augmented version.pdf

Files (58.0 kB)