Published February 15, 2020 | Version 1.0
Dataset Open

South African News Data

  • 1. University of Pretoria
  • 2. Council for Scientific and Industrial Research

Description

This repository hosts a collection of South African local-language news compiled and enriched by the University of Pretoria's Data Science for Social Impact research group (https://dsfsi.github.io/). Spanning South Africa's official languages, these localised news datasets aim to support natural language processing and social computing research focused on domestic challenges.

Related Publication

Please cite

@inproceedings{marivate2020investigating,
  title = {Investigating an Approach for Low Resource Language Dataset Creation, Curation and Classification: Setswana and Sepedi},
  author = {Marivate, Vukosi and Sefara, Tshephisho and Chabalala, Vongani and Makhaya, Keamogetswe and Mokgonyane, Tumisho and Mokoena, Rethabile and Modupe, Abiodun},
  booktitle = {Proceedings of the first workshop on Resources for African Indigenous Languages},
  year = {2020},
  address = {Marseille, France},
  publisher = {European Language Resources Association (ELRA)},
  url = {https://aclanthology.org/2020.rail-1.3},
  pages = {15--20},
  language = {English},
  ISBN = {979-10-95546-60-3},
  preprint_url = {https://arxiv.org/abs/2003.04986},
  dataset_url = {https://zenodo.org/record/3668495},
  keywords = {NLP}
}

Datasets

SABC 

We claim no copyright of the SABC original content. 

  • Motsweding FM (An SABC Setswana radio station) Facebook Page
  • Dikgang Tsa Setswana (SABC Setswana News)
  • Thobela FM (An SABC Sepedi radio station) Facebook Page
  • Ditaba Tsa Sepedi (SABC Sepedi News)

Disclaimer

This dataset contains news content extracted from different sources including, but not limited to, SABC. While efforts were made to ensure the accuracy and completeness of this data, there may be errors or discrepancies between the original publications and this dataset. No warranties, guarantees or representations are given in relation to the information contained in the dataset. The members of the Data Science for Social Impact Research Group bear no responsibility and/or liability for any such errors or discrepancies in this dataset. The original owners bear no responsibility and/or liability for any such errors or discrepancies in this dataset. It is recommended that users verify all information contained herein before making decisions based upon this information.

Files

sabc_news_headlines_fb_nso_sepedi.csv

Files (1.4 MB)

Checksum                                Size
md5:cdf3e8f4a68fc608ac54f6c27818306c    100.6 kB
md5:5aeff03136ad84e58129cb59ff629fe0    39.8 kB
md5:101f1794c0e1f1c8ae57d7fa8d059164    841.3 kB
md5:3780440d5981955ba4534526d843241f    241.0 kB
md5:7886e5afff507e018e7a76bb660d2d09    225.3 kB
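
The MD5 checksums listed above can be used to confirm that a downloaded file arrived intact. The following is a minimal sketch using only the Python standard library; the helper names md5_of_file and matches_listed_checksum are hypothetical and are not part of the dataset or of any tooling published with it.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file's MD5 with a listed value written as 'md5:<hex digest>'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

To use it, pass the path of a downloaded file together with the checksum shown beside that file on the record page. This listing does not show which checksum belongs to which file, so the pairing has to be taken from the record itself.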

Additional details

Related works

Is supplement to
Conference paper: arXiv:2003.04986 (arXiv)
https://aclanthology.org/2020.rail-1.3/ (URL)