Source Code for Youtube dataset processing

doi:10.5281/zenodo.5825530

Published January 6, 2022 | Version v1

Software | Open

Nicolas TURENNE¹

1. INRAE

The file CodeSource.rar is an archive of the source code, including the main file
that processes the YouTube corpus about net-activism and whistleblowing
(see https://doi.org/10.5281/zenodo.5824627).

Turenne N. (2022). Net activism and whistleblowing on YouTube: a text mining analysis.

The file _pipeline.txt describes the different steps:
- download,
- storage in a Mongo database,
- splitting of id(s),
- filtering of the collection,
- creation of a collection with only text and sentences,
- linguistic feature extraction,
- feature extraction,
- clustering.
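
As a rough illustration of three of the steps above (splitting ids, filtering the collection, and building a collection with only text and sentences), here is a minimal Python sketch. It is not taken from CodeSource.rar: the function names, the field names `id` and `transcript`, the batch size, the length threshold, and the reading of "splitting id(s)" as batching are all assumptions, and plain lists of dictionaries stand in for Mongo collections so that the example runs without a database.

```python
import re

# Hypothetical helper: "splitting id(s)" is read here as cutting the id list
# into consecutive batches (an assumption, not the archive's documented logic).
def split_ids(video_ids, batch_size):
    return [video_ids[i:i + batch_size]
            for i in range(0, len(video_ids), batch_size)]

# Hypothetical filter: keep records whose transcript exists and has at least
# min_chars characters. The threshold is illustrative only.
def filter_collection(records, min_chars=20):
    return [r for r in records if len(r.get("transcript") or "") >= min_chars]

# Naive sentence boundary: whitespace that follows '.', '!' or '?'.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

# Build a reduced record holding only the text and its sentences.
def to_sentences(record):
    text = record["transcript"].strip()
    sentences = [s for s in _SENTENCE_END.split(text) if s]
    return {"id": record["id"], "text": text, "sentences": sentences}

# Made-up records standing in for documents stored in a Mongo collection.
records = [
    {"id": "vid001",
     "transcript": "Whistleblowers take risks. Platforms amplify them!"},
    {"id": "vid002", "transcript": ""},
    {"id": "vid003", "transcript": None},
]

batches = split_ids([r["id"] for r in records], batch_size=2)
text_collection = [to_sentences(r) for r in filter_collection(records)]
```

With a real database, the lists would be replaced by collection reads and writes, and the regular expression would likely give way to a proper sentence tokenizer before the linguistic feature extraction step.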

The sub-directory called ExtractYoutube contains Java code for getting the transcription of a video from its id.

Running the source code requires a Hadoop server and many libraries (see the _description.txt file).

Files (31.6 MB)

Name	Size	MD5
_description.txt	1.9 kB	be68ffd1cf110becc3c2f2cc6f421f51
_pipeline.txt	10.2 kB	b5d54dcfed4c9130c4af952be5d8da58
CodeSource.rar	31.5 MB	251bd6d184738752dda0b9f97742e39f
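
The MD5 values listed for the three files can be used to check downloaded copies. The following is a minimal Python sketch using only the standard library; the helper names `md5_of` and `verify` are hypothetical, and the sketch assumes the files were saved under their original names in a single directory.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "_description.txt": "be68ffd1cf110becc3c2f2cc6f421f51",
    "_pipeline.txt": "b5d54dcfed4c9130c4af952be5d8da58",
    "CodeSource.rar": "251bd6d184738752dda0b9f97742e39f",
}

def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(directory="."):
    """Map each expected file name to True (match), False (mismatch)
    or None (file not found in the directory)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of(path) == expected) if path.is_file() else None
    return results

if __name__ == "__main__":
    labels = {True: "ok", False: "MISMATCH", None: "missing"}
    for name, status in verify().items():
        print(f"{name}: {labels[status]}")
```

Reading in chunks keeps memory use flat for the 31.5 MB archive. A mismatch usually points to an incomplete or corrupted download.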
