Importing the SOTorrent data set

  1. Unzip all CSV and XML files: gunzip *.gz. On macOS, please use the built-in "Archive Utility" instead (see this issue description). A shell sketch of the whole import sequence is shown after this list.
  2. Execute the SQL script 1_create_database.sql in your database client (tested on MySQL 5.7) to create the database and tables for the SO dump.
  3. Edit the SQL script 2_create_sotorrent_user.sql to choose a password for the sotorrent user and execute the script to create the user.
  4. Execute the SQL script 3_load_so_from_xml.sql to import the SO dump from the XML files (please use the XML files provided by us, as they have been processed to be compatible with MySQL).
  5. Execute the SQL script 4_create_indices.sql to create the indices for the SO tables.
  6. Execute the SQL script 5_create_sotorrent_tables.sql to add the SOTorrent tables to the SO database.
  7. Execute the SQL script 6_load_sotorrent.sql to import the SOTorrent tables from the CSV files.
  8. Execute the SQL script 7_load_postreferencegh.sql to import the references from GitHub projects to Stack Overflow questions, answers, or comments.
  9. Execute the SQL script 8_load_ghmatches.sql to import the matched source code lines from GitHub projects that reference Stack Overflow.
  10. Execute the SQL script 9_create_sotorrent_indices.sql to create the indices for the SOTorrent tables.
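
The steps above can also be run non-interactively from a shell. The following is a minimal sketch, not part of the official instructions: it assumes the mysql command-line client, an administrative account ("root" here), and that the scripts and the unzipped CSV/XML files are in the current directory; the user name, password handling, and the --local-infile flag may need to be adapted to your MySQL configuration.

```bash
#!/usr/bin/env bash
# Sketch only: runs the import end-to-end with the mysql command-line client.
# Assumptions (adjust to your setup): an administrative MySQL account ("root"
# here) and the SQL scripts plus the unzipped CSV/XML files in the current
# working directory. Remember to edit 2_create_sotorrent_user.sql first to set
# the sotorrent user's password (step 3 above).
set -e

# Step 1: decompress the dump files (on macOS, use Archive Utility instead).
gunzip *.gz

# Steps 2-10: run the scripts in order. Depending on how the scripts load the
# XML/CSV files, the client may need local_infile enabled; storing credentials
# in ~/.my.cnf avoids the repeated password prompt.
for script in \
    1_create_database.sql \
    2_create_sotorrent_user.sql \
    3_load_so_from_xml.sql \
    4_create_indices.sql \
    5_create_sotorrent_tables.sql \
    6_load_sotorrent.sql \
    7_load_postreferencegh.sql \
    8_load_ghmatches.sql \
    9_create_sotorrent_indices.sql
do
    echo "Running ${script} ..."
    mysql --local-infile=1 -u root -p < "${script}"
done
```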
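
Once the last script has finished, a quick row count is a simple sanity check. This is only a sketch: Posts comes from the SO dump and PostReferenceGH is loaded in step 9, but the database name sotorrent below is a placeholder for whatever 1_create_database.sql creates on your system.

```bash
# Hypothetical sanity check; replace "sotorrent" with the actual database name
# created by 1_create_database.sql.
mysql -u sotorrent -p -e "
  SELECT COUNT(*) AS post_count         FROM sotorrent.Posts;
  SELECT COUNT(*) AS gh_reference_count FROM sotorrent.PostReferenceGH;
"
```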

Data

The Stack Overflow data has been extracted from the official Stack Exchange data dump released on 2018-12-02.

The GitHub references have been retrieved from the Google BigQuery GitHub data set on 2018-12-09.