Abbreviations:
----------------
P: Phase
S: Session

In total, we describe three phases that were conducted in five sessions (P1S1, P1S2, P2S1, P2S2, and P3).

General Files:
--------------
RawInputDataset: The set of initial 1,000 reviews used as input for phase 1. 
	- Contains additional tabs analyzing average star ratings and posting dates over time.

Files for each phase:
---------------------
Golden: The golden standard created by the researcher, against which the answers of the crowd are compared.

Aggregated: Results with the answers already aggregated by Figure Eight using majority voting. The file does not contain individual judgments, but shows the category that was chosen the most or had the highest confidence rating. 
	- unit_id: generated ID for each uploaded review.
	- golden: true if the review was used as a test question for quality control measures.
	- unit_state: indicates whether all required judgments were gathered.
	- trusted_judgments: number of judgments the review received from crowd workers who passed all quality control measures.
	- last_judgment_at: time when the last judgment was received, completing the classification of a particular review.
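
The majority voting used for aggregation can be sketched as follows; the unit IDs and answers below are illustrative, not taken from the actual files:

```python
from collections import Counter

# Illustrative judgments: unit_id -> list of crowd answers (made-up data).
judgments = {
    "unit_1": ["helpful", "helpful", "useless"],
    "unit_2": ["useless", "useless", "useless"],
}

def majority_vote(answers):
    """Return the most frequent answer and its confidence (share of votes)."""
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)

aggregated = {uid: majority_vote(ans) for uid, ans in judgments.items()}
```

The confidence value corresponds to the "highest confidence rating" mentioned above: the fraction of trusted judgments that agree with the winning category.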

RawOutput: Complete file that contains information about each individual judgment that was received.
	- unit_id: ID provided for each row in the file. 
	- created_at: the time when the contributor submitted the judgment.
	- golden: true if review was used as a test question for quality control measures.
	- id: unique ID number for each specific judgment. 
	- missed: true if an incorrect judgment was received on a test question. 
	- started_at: time when the contributor started working on the judgment.
	- tainted: true if the judgment was contributed by a crowd worker who failed the quality control measures after passing the eligibility test.
	- channel: the work channel that the contributor accessed the job through. 
	- trust: expected level of accuracy based on past performance. Value assigned by the Figure Eight platform itself.
	- worker_id: randomly generated ID that replaces the Figure Eight worker identifier. The replacement was done to preserve privacy.
	- country: country of origin of the crowd worker who submitted the contribution.
	- region: region of origin of the crowd worker who submitted the contribution.
	- city: city of origin of the crowd worker who submitted the contribution.
	- which_category_best_fits_this_review: the answer chosen by the crowd worker.
	- reviews: the review that was judged.
	- which_category_best_fits_this_review__gold: shows the correct answer if the particular row was used as a test question.
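
When working with the RawOutput file, a common first step is to discard tainted judgments before any analysis; a minimal sketch, assuming the rows have already been parsed into dictionaries (the data shown is made up):

```python
# Illustrative RawOutput rows; the keys follow the column names described above.
rows = [
    {"id": 1, "worker_id": "w1", "tainted": "false", "missed": "false"},
    {"id": 2, "worker_id": "w2", "tainted": "true",  "missed": "false"},
    {"id": 3, "worker_id": "w1", "tainted": "false", "missed": "true"},
]

# Keep only judgments from crowd workers who passed the quality control measures.
trusted = [r for r in rows if r["tainted"] == "false"]
```

The trusted_judgments count in the Aggregated file should correspond to the number of such untainted judgments per review.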

TestQResults: Results of the questions used in the eligibility test.
	- id: unique ID number assigned by Figure Eight to each test question.
	- pct_missed: percentage of responses that were incorrect.
	- judgments: total number of judgments this question received.
	- hidden: true if the test question was disabled.
	- contention: number of contentions received from contributors who felt that the question was unfair or set up incorrectly.
	- pct_contested: percentage of contributors who both answered incorrectly and contested. 
	- gold_pool: can be ignored, only relevant for a different type of eligibility test that we did not use.
	- review_state: can be ignored.
	- which_category_best_fits_this_review: can be ignored.
	- which_category_best_fits_this_review__gold: the correct answer.
	- which_category_best_fits_this_review__gold_reason: additional explanation of the answer, shown to the contributor when the question was answered incorrectly.
	- reviews: review used in the test question. 
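
Assuming pct_missed is simply the share of incorrect responses among all judgments a test question received (an assumption; the platform's exact formula is not documented here), it can be recomputed as:

```python
def pct_missed(missed_count, total_judgments):
    """Percentage of incorrect responses on a test question (assumed definition)."""
    return 100.0 * missed_count / total_judgments

# e.g. a question answered incorrectly 3 times out of 12 judgments
rate = pct_missed(3, 12)
```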

AnswerComparison: Where the answers of the crowd are compared to the golden standard for both levels of strictness.
	- ID: number of the review in the golden standard.
	- Crowd: judgment of the crowd based on the aggregated answers from the Figure Eight output (1 = helpful, 0 = useless).
	- Reviews: the review in question.
	- Golden: the judgment of the review by the researcher (from the golden standard).
	- Match: whether the answers of the crowd and the golden standard matched or not.
	- Correction: which of the two parties had the correct answer based on the more lenient way of comparison (only applicable when the judgments of the two parties did not match).
	- Results: shows all the calculations for the precision, recall, and accuracy values for both levels of strictness.
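
The precision, recall, and accuracy calculations in the Results column can be reproduced from the Crowd and Golden columns; a sketch treating 1 (helpful) as the positive class, with illustrative labels rather than the actual data:

```python
def precision_recall_accuracy(crowd, golden):
    """Compare crowd judgments against the golden standard (1 = helpful, 0 = useless)."""
    tp = sum(1 for c, g in zip(crowd, golden) if c == 1 and g == 1)
    fp = sum(1 for c, g in zip(crowd, golden) if c == 1 and g == 0)
    fn = sum(1 for c, g in zip(crowd, golden) if c == 0 and g == 1)
    tn = sum(1 for c, g in zip(crowd, golden) if c == 0 and g == 0)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / len(crowd)
    return precision, recall, accuracy

# Illustrative labels (made-up, five reviews).
crowd  = [1, 1, 0, 1, 0]
golden = [1, 0, 0, 1, 1]
p, r, a = precision_recall_accuracy(crowd, golden)
```

The same function would be run once per level of strictness, with the Golden column adjusted accordingly.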