Published January 25, 2022 | Version v1
Software Open

Web scraping and API projects from "Online Data Collection and Management" (Fall 2021)

Description

As part of the course "Online Data Collection and Management" (taught at Tilburg University, Fall 2021), students collected publicly available datasets for use in academic research projects. With this repository, I am sharing (a) the documentation of these datasets and (b) the associated Python source code used to collect the data. This repository does not contain any of the collected datasets.

The repository covers six data collections:

  • airlinequality.com (airline reviews)
  • untappd.com (craft beer)
  • GitHub.com (search)
  • goodreads.com (reading behavior)
  • Huizenzoeker.nl (housing market in the Netherlands)
  • Reddit API (social media)

Course website: https://odcm.hannesdatta.com. Archived at https://doi.org/10.5281/zenodo.5011458 (check for more recent versions if available).

Files

Airline reviews.zip

Files (1.8 MB)

Checksum                                Size
md5:6be771ee5569f77445c7197537e76f68    320.4 kB
md5:53951387f3548781e514b3b9fb226d04    479.0 kB
md5:ac57d652c1301707237c014d3ee16595    11.3 kB
md5:4408c22eac6100cffea5dd3f02fe025d    492.2 kB
md5:1b2906940808886f05864feb7c3a5803    390.5 kB
md5:e043b3513668e016662c2455ad86afb2    109.9 kB
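
The md5 values in the file listing can be used to check that a downloaded archive arrived intact. The following is a minimal sketch in Python, not part of the repository itself. The helper names `md5_of_file` and `matches_listing` are hypothetical, and because the listing does not show which checksum belongs to which archive, the pairing of a file with its checksum has to be taken from the record page.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file)
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed_checksum):
    """Compare a file's MD5 digest with a listed value such as 'md5:6be7...'.

    The optional 'md5:' prefix is stripped before comparing (hypothetical helper).
    """
    expected = listed_checksum.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

To use it, call `matches_listing` with the local path of a downloaded archive and the checksum shown for that archive on the record page. A result of `False` suggests the download is incomplete or corrupted and should be repeated.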