South African News Data
Creators
- 1. University of Pretoria
- 2. Council for Scientific and Industrial Research
Description
This repository hosts a collection of South African local-language news compiled and enriched by the University of Pretoria's Data Science for Social Impact research group (https://dsfsi.github.io/). Spanning South Africa's official languages, these localised news datasets aim to support natural language processing and social computing research focused on domestic challenges.
Related Publication
Please cite:
@inproceedings{marivate2020investigating,
  title = {Investigating an Approach for Low Resource Language Dataset Creation, Curation and Classification: Setswana and Sepedi},
  author = {Marivate, Vukosi and Sefara, Tshephisho and Chabalala, Vongani and Makhaya, Keamogetswe and Mokgonyane, Tumisho and Mokoena, Rethabile and Modupe, Abiodun},
  booktitle = {Proceedings of the first workshop on Resources for African Indigenous Languages},
  year = {2020},
  address = {Marseille, France},
  publisher = {European Language Resources Association (ELRA)},
  url = {https://aclanthology.org/2020.rail-1.3},
  pages = {15--20},
  language = {English},
  ISBN = {979-10-95546-60-3},
  preprint_url = {https://arxiv.org/abs/2003.04986},
  dataset_url = {https://zenodo.org/record/3668495},
  keywords = {NLP}
}
Datasets
SABC
We claim no copyright over the original SABC content.
- Motsweding FM (An SABC Setswana radio station) Facebook Page
- Dikgang Tsa Setswana (SABC Setswana News)
- Thobela FM (An SABC Sepedi radio station) Facebook Page
- Ditaba Tsa Sepedi (SABC Sepedi News)
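The news items above are distributed as CSV files (for example sabc_news_headlines_fb_nso_sepedi.csv under Files). The record does not document the column layout or encoding, so the minimal sketch below only reports the header row and the number of data rows. It assumes each file has a header row and is UTF-8 encoded (with or without a byte-order mark); `summarize_csv` is a hypothetical helper, not part of the dataset.

```python
import csv


def summarize_csv(path):
    """Return (header, row_count) for a CSV file.

    Assumptions (not stated on the record): the first row is a header and
    the file is UTF-8; "utf-8-sig" also strips a leading BOM if present.
    """
    with open(path, newline="", encoding="utf-8-sig") as fh:
        reader = csv.reader(fh)
        # First record is treated as the header; empty file -> empty header.
        header = next(reader, [])
        # csv.reader yields one item per record, so quoted fields that
        # contain newlines are still counted as a single row.
        row_count = sum(1 for _ in reader)
    return header, row_count
```

Usage, assuming the file has been downloaded to the working directory: `header, n = summarize_csv("sabc_news_headlines_fb_nso_sepedi.csv")`. Counting records through `csv.reader` rather than counting lines avoids overcounting headlines that contain line breaks.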
Disclaimer
This dataset contains news content extracted from different sources including, but not limited to, the SABC. While efforts were made to ensure the accuracy and completeness of this data, there may be errors or discrepancies between the original publications and this dataset. No warranties, guarantees or representations are given in relation to the information contained in the dataset. The members of the Data Science for Social Impact research group bear no responsibility and/or liability for any such errors or discrepancies in this dataset. The original owners bear no responsibility and/or liability for any such errors or discrepancies in this dataset. Users are advised to verify all information contained herein before making decisions based on it.
Files (1.4 MB)
sabc_news_headlines_fb_nso_sepedi.csv
Checksum | Size |
---|---|
md5:cdf3e8f4a68fc608ac54f6c27818306c | 100.6 kB |
md5:5aeff03136ad84e58129cb59ff629fe0 | 39.8 kB |
md5:101f1794c0e1f1c8ae57d7fa8d059164 | 841.3 kB |
md5:3780440d5981955ba4534526d843241f | 241.0 kB |
md5:7886e5afff507e018e7a76bb660d2d09 | 225.3 kB |
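Each file is listed with an MD5 checksum, which can be used to confirm that a download is complete and unaltered. Because the listing shows checksums without the matching file names, the minimal sketch below simply tests whether a file's digest is one of the listed values. The helper names `md5_of` and `is_listed` are hypothetical, and the file is read in chunks so large files are not loaded into memory at once.

```python
import hashlib

# MD5 checksums as listed on the record (file names are not given there).
EXPECTED_MD5 = {
    "cdf3e8f4a68fc608ac54f6c27818306c",  # 100.6 kB
    "5aeff03136ad84e58129cb59ff629fe0",  # 39.8 kB
    "101f1794c0e1f1c8ae57d7fa8d059164",  # 841.3 kB
    "3780440d5981955ba4534526d843241f",  # 241.0 kB
    "7886e5afff507e018e7a76bb660d2d09",  # 225.3 kB
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading 1 MiB at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns b"" at EOF.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed(path):
    """Return True if the file's MD5 matches one of the listed checksums."""
    return md5_of(path) in EXPECTED_MD5
```

For example, after downloading the files into a local folder (the name `data` is only an assumption), `for p in pathlib.Path("data").glob("*.csv"): print(p.name, is_listed(p))` flags any file whose contents do not match a published checksum.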
Additional details
Related works
Is supplement to:
- Conference paper: arXiv:2003.04986 (arXiv)
- https://aclanthology.org/2020.rail-1.3/ (URL)