Dataset - High-resolution mapping of wood burning appliance hotspots using Energy Performance Certificates: A case study of England and Wales
Description
This repository contains open data and code to replicate the analysis in the manuscript "High-resolution mapping of wood burning appliance hotspots using Energy Performance Certificates: A case study of England and Wales".
To recreate the analysis on your local device, please carry out the following steps:
1. Clone the GitHub repository (available at: https://github.com/UCL-Wellcome-Trust-Air-Pollution/EPC_mapping_project_code) to your local device, or download the 'Code.tar' file and unzip it in your project directory. Please ensure you use the directory containing the R Project as your root directory.
2. Download the 'Data.tar' file and unzip it in the R Project directory. The data should then sit in a folder called 'Data' in the root directory. All non-EPC data are provided under the UK Open Government Licence version 3.0. EPC data are provided under licence from DLUHC: https://epc.opendatacommunities.org/docs/copyright.
3. Download the main EPC data to your local device and unzip it (see below for detailed instructions on how to do this). For Windows users, the 'Scripts' folder of the repository contains a .bat file that can be used to unzip the data. This file requires 7-Zip to be installed and added to the system PATH; otherwise, the .tar file can be unzipped manually.
4. Run the 'run.R' file in the 'Scripts' folder of the repository. You may need to set the 'path_data_epc_folders' variable to the location of the unzipped EPC data folders on your local device (see step 3). The full pipeline should then run.
5. After the first run of the pipeline, a file called 'data_epc_raw.parquet' should appear in the 'Data/raw/epc_data' folder. Once you have verified that it exists, you can safely delete the original unzipped EPC data folder, which is very large (>40 GB). If you run the pipeline again, you will be told that the raw EPC data .parquet file already exists and given the option to skip merging the raw data files.
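For readers who prefer not to depend on the Windows .bat file and 7-Zip, the archive extraction and the clean-up check described above can be sketched with Python's standard library. This is a minimal, hypothetical helper and is not part of the repository (the pipeline itself is written in R). The function names `extract_tar`, `safe_to_delete_epc_folder` and `delete_epc_folder` are illustrative. The only safeguard before deletion is that 'Data/raw/epc_data/data_epc_raw.parquet' exists and is non-empty; the sketch does not validate the file's contents.

```python
import shutil
import tarfile
from pathlib import Path

# Relative location of the merged EPC file written by the R pipeline (named in the steps above).
RAW_PARQUET = Path("Data") / "raw" / "epc_data" / "data_epc_raw.parquet"


def extract_tar(archive: Path, dest: Path) -> None:
    """Unpack a .tar archive into dest (a standard-library alternative to 7-Zip)."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support extraction filters; 'data' rejects unsafe paths.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)


def safe_to_delete_epc_folder(project_root: Path) -> bool:
    """Return True only if the merged raw EPC .parquet file exists and is non-empty."""
    parquet = project_root / RAW_PARQUET
    return parquet.is_file() and parquet.stat().st_size > 0


def delete_epc_folder(project_root: Path, epc_folder: Path) -> bool:
    """Remove the unzipped EPC folder only after the .parquet check passes.

    Returns True if the folder was deleted, False if it was left in place.
    """
    if not safe_to_delete_epc_folder(project_root):
        return False
    shutil.rmtree(epc_folder)
    return True
```

For example, from the project root you might call `extract_tar(Path("Data.tar"), Path("."))`, and after the first pipeline run `delete_epc_folder(Path("."), Path("path/to/unzipped_epc_folder"))`, where the second path is a placeholder for wherever you unpacked the EPC data. Returning a boolean rather than raising keeps the deletion step explicit: nothing is removed until the merged file is in place.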
Files
Total size: 2.0 GB

| Name | Size | MD5 checksum |
|---|---|---|
| | 313.3 kB | 953f98acad3cb53e62a76901f41a61cb |
| | 2.0 GB | b91436929568a1ce8309c644acb695d2 |
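Before unpacking, you can check that each download is intact by comparing its MD5 hash with the checksum listed in the table above. The sketch below uses Python's standard `hashlib` module; the helper names `md5sum` and `verify_md5` are hypothetical and not part of the repository. Files are read in 1 MiB chunks so the 2.0 GB archive does not need to fit in memory.

```python
import hashlib
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: Path, expected: str) -> bool:
    """Return True if the file's MD5 matches the published checksum (case-insensitive)."""
    return md5sum(path) == expected.strip().lower()
```

For instance, `verify_md5(Path("path/to/downloaded_file.tar"), "b91436929568a1ce8309c644acb695d2")` should return True for an intact copy of the 2.0 GB file; the path is a placeholder, so pair each file with the checksum listed beside its size.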
Additional details
Software
- Repository URL: https://github.com/UCL-Wellcome-Trust-Air-Pollution/EPC_mapping_project_code
- Programming language: R
- Development status: Active