README

This dataset release supports the results of the data pipeline validation and testing carried out in the RADON H2020 project.

Install Java 9

Download and Install JMeter: http://jmeter.apache.org/download_jmeter.cgi

Download the aws-java-sdk-s3 (version 1.11.313) jar files from: https://jar-download.com/artifacts/com.amazonaws/aws-java-sdk-s3/1.11.313/source-code

Copy the jar files to JMeterHome/lib/ext/ (the lib/ext directory of the JMeter installation)

Open the test plan "test_plan.jmx" with JMeter

The access key and secret key of the AWS account, and the region of the S3 bucket, need to be specified in the code of the test plan.

A View Results Tree listener is used to check whether each upload completed successfully.

A CSV config element (CSV Data Set Config) in the test plan reads the "paths.csv" file to obtain the paths of the csv files in the Twiiter_Data dataset folder.

For testing purposes, the Twitter Sentiment Analysis dataset (https://www.kaggle.com/arkhoshghalb/twitter-sentiment-analysis-hatred-speech) has been used. It provides a single csv file containing around 32000 tweets. This file has been split into 100 csv files, each containing about 320 tweets.
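
The split described above could be reproduced along the following lines. This is only a sketch, not the script used for this release: the function names, the output file names (tweets_001.csv, ...) and the choice to repeat the header row in every chunk are assumptions.

```python
import csv
from pathlib import Path


def split_csv(source, out_dir, rows_per_file=320):
    """Split `source` into numbered csv files of at most `rows_per_file` data rows.

    Assumption: the first row of `source` is a header, and it is repeated at the
    top of every chunk. Returns the list of files written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []

    with open(source, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:  # empty source file: nothing to split
            return written
        chunk = []
        for row in reader:
            chunk.append(row)
            if len(chunk) == rows_per_file:
                written.append(_write_chunk(out_dir, len(written) + 1, header, chunk))
                chunk = []
        if chunk:  # leftover rows that do not fill a whole file
            written.append(_write_chunk(out_dir, len(written) + 1, header, chunk))
    return written


def _write_chunk(out_dir, index, header, rows):
    # Hypothetical naming scheme: tweets_001.csv, tweets_002.csv, ...
    path = out_dir / f"tweets_{index:03d}.csv"
    with open(path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(rows)
    return path
```

For example, split_csv("train.csv", "Twiiter_Data") would write the chunks into the Twiiter_Data folder, where "train.csv" stands for whatever name the downloaded Kaggle file has.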

The Python script filepath.py collects the file paths of the csv files stored in the dataset folder. The path to the dataset folder must be specified in the script. The script creates a csv file, "paths.csv", which contains the file paths of all the csv files of the dataset; JMeter later reads these paths to upload the files to the S3 bucket.
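
For readers without access to filepath.py, the behaviour described above can be approximated as follows. This is a minimal sketch, not the released script: the function name write_paths, the one-path-per-line layout without a header, and the use of absolute paths are assumptions that may need adjusting to match the CSV config element in the test plan.

```python
from pathlib import Path


def write_paths(dataset_dir, output_file="paths.csv"):
    """Write the absolute path of every csv file in `dataset_dir` to `output_file`.

    Assumption: one path per line, no header row. The output file itself is
    skipped in case it lives inside the dataset folder.
    """
    dataset_dir = Path(dataset_dir)
    output_file = Path(output_file).resolve()

    paths = sorted(
        p.resolve()
        for p in dataset_dir.glob("*.csv")
        if p.resolve() != output_file
    )

    with open(output_file, "w", encoding="utf-8") as out:
        for p in paths:
            out.write(f"{p}\n")
    return paths
```

Calling write_paths("Twiiter_Data") from the directory that contains the dataset folder would produce "paths.csv" in the current working directory.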


 
Copyright (c) 2020, University of Tartu.

Licensed under the Creative Commons Attribution 4.0 license, https://creativecommons.org/licenses/by/4.0/

DATA:

Input Data: Twitter dataset of 100 csv files (folder Twiiter_Data), used for the data pipeline validation and testing

Quick start:

Use the Twiiter_Data folder to inject the csv files as load for validating and testing the data pipeline.
