Published November 30, 2022 | Version 0

Overhead Wind Turbine Dataset (NAIP)

Description

1 - OVERVIEW

 

This dataset contains overhead images of wind turbines from three regions of the United States: the Eastern Midwest (EM), Northwest (NW), and Southwest (SW). The images come from the National Agriculture Imagery Program (NAIP) and were extracted using Google Earth Engine and wind turbine latitude-longitude coordinates from the U.S. Wind Turbine Database. In total there are 2,003 NAIP images: 988 contain wind turbines, and the other 1,015 are background images (without wind turbines) collected from regions near the wind turbines. Labels are provided for all images containing wind turbines. We welcome uses of this dataset for object detection and other research purposes.

 

2 - DATA DETAILS

 

Each image is 608 x 608 pixels with a ground sample distance (GSD) of 1 m, so each image covers a ground area of approximately 608 m x 608 m. Because images were collected directly over the exact wind turbine coordinates, they would otherwise be almost perfectly centered on a turbine. To avoid this, each image was randomly shifted by up to 75 m in two directions.
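A quick sketch of the arithmetic implied by the 1 m GSD. It assumes (our assumption, not stated above) that the seed turbine sits at the exact image center before the shift and that the shift of up to 75 m is applied independently along the x and y axes; the constant and function names are illustrative only.

```python
# Assumptions (not specified by the dataset): the seed turbine is at the image
# center before shifting, and the shift is applied independently per axis.
IMAGE_SIZE_PX = 608   # images are 608 x 608 pixels
GSD_M_PER_PX = 1.0    # ground sample distance: 1 m per pixel
MAX_SHIFT_M = 75      # maximum random shift applied to each image


def footprint_m(size_px: int = IMAGE_SIZE_PX, gsd: float = GSD_M_PER_PX) -> float:
    """Side length, in meters, of the ground area covered by one image."""
    return size_px * gsd


def turbine_offset_window_px(max_shift_m: float = MAX_SHIFT_M,
                             gsd: float = GSD_M_PER_PX) -> tuple[float, float]:
    """Per-axis pixel range in which the seed turbine can land after the shift."""
    center = IMAGE_SIZE_PX / 2
    shift_px = max_shift_m / gsd
    return center - shift_px, center + shift_px


print(footprint_m())               # 608.0
print(turbine_offset_window_px())  # (229.0, 379.0)
```

Other turbines in the same wind farm can of course appear anywhere in the frame; the window only bounds the turbine whose coordinate was used to collect the image.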

 

We refer to images without turbines as "background images", and further split the images with turbines into training and testing sets. We call the training images with turbines "real images" and the testing images "test images".

 

Distribution of gathered images by region and type:

 

Domain | Real | Test | Background
-------|------|------|-----------
EM     | 267  | 100  | 244
NW     | 213  | 100  | 415
SW     | 208  | 100  | 356
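As a consistency check, the per-domain counts in the table can be summed to recover the totals given in the overview (988 images with turbines, 1,015 background images, 2,003 overall). The dictionary below simply copies the table; the names are illustrative.

```python
# Image counts copied from the table: Real, Test, and Background per domain.
COUNTS = {
    "EM": {"Real": 267, "Test": 100, "Background": 244},
    "NW": {"Real": 213, "Test": 100, "Background": 415},
    "SW": {"Real": 208, "Test": 100, "Background": 356},
}


def totals(counts: dict) -> tuple[int, int, int]:
    """Return (images with turbines, background images, all images)."""
    with_turbines = sum(c["Real"] + c["Test"] for c in counts.values())
    background = sum(c["Background"] for c in counts.values())
    return with_turbines, background, with_turbines + background


print(totals(COUNTS))  # (988, 1015, 2003)
```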

 

Note that this dataset is part of a larger research project by Duke's 2021-2022 Bass Connections team, Creating Artificial Worlds with AI to Improve Energy Access Data. Our research proposes a technique to synthetically generate images with implanted energy infrastructure objects, and we include the synthetic images we generated alongside the NAIP images above. Generating a synthetic image requires a source (training) domain and a target (testing) domain, so for each of the nine source-target domain pairs (including pairs where source and target are the same) we include 173 synthetically generated images. For a fuller picture of our research, including additional image data from the domain adaptation techniques we benchmark our method against, visit our GitHub repository: https://github.com/energydatalab/closing-the-domain-gap. If you use this dataset, please cite it using the citation in our GitHub README.
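The source-target pairs can be enumerated directly from the three domains. This sketch builds the folder names using the s_{SOURCE}_t_{TARGET} convention from the directory listing in section 3; the overall total assumes every pair folder holds exactly 173 images, which is our reading of the description rather than a separately verified count.

```python
from itertools import product

DOMAINS = ["EM", "NW", "SW"]
IMAGES_PER_PAIR = 173  # synthetic images per source-target pair, per the description

# Every ordered (source, target) pair, including same-domain pairs such as s_EM_t_EM.
pairs = [f"s_{src}_t_{tgt}" for src, tgt in product(DOMAINS, repeat=2)]

print(len(pairs))                    # 9
print(len(pairs) * IMAGES_PER_PAIR)  # 1557 (assuming 173 images in every folder)
```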

 

3 - NAVIGATING THE DATASET

 

Once the data is unzipped, the base level of the dataset contains an images folder and a labels folder, which have exactly the same structure. The images directory is divided as follows:

 

  | - images

  |  | - SW

  |  |  | - Background

  |  |  | - Test

  |  |  | - Real

  |  | - EM

  |  |  | - Background

  |  |  | - Test

  |  |  | - Real

  |  | - NW

  |  |  | - Background

  |  |  | - Test

  |  |  | - Real

  |  | - Synthetic

  |  |  | - s_EM_t_NW

  |  |  | - s_SW_t_NW

  |  |  | - s_NW_t_NW

  |  |  | - s_NW_t_EM

  |  |  | - s_SW_t_EM

  |  |  | - s_EM_t_SW

  |  |  | - s_NW_t_SW

  |  |  | - s_EM_t_EM

  |  |  | - s_SW_t_SW

 

For example, images/SW/Real contains the 208 .jpg images from the Southwest that contain turbines. The Synthetic subdirectory is organized by domain pair: for example, images/Synthetic/s_EM_t_NW contains synthetic images created with a source domain of Eastern Midwest and a target domain of Northwest, meaning the images were stylized to look like Northwest images.
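A minimal sketch for checking an unzipped copy against the counts above: it walks the images tree with pathlib and counts .jpg files per folder. The function name and the unzip location in the usage comment are hypothetical; adjust the root path to wherever the images and labels folders ended up.

```python
from collections import Counter
from pathlib import Path


def count_images(root) -> Counter:
    """Count .jpg files per folder under <root>/images, keyed like 'SW/Real'."""
    images_dir = Path(root) / "images"
    counts = Counter()
    for jpg in images_dir.rglob("*.jpg"):
        counts[jpg.parent.relative_to(images_dir).as_posix()] += 1
    return counts


# Hypothetical usage, assuming the archive was unzipped to "Wind Turbine Data/":
# counts = count_images("Wind Turbine Data")
# counts["SW/Real"] should then match the 208 Southwest real images in the table.
```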

 

We also provide a domain_overview.json file at the top level to help you navigate the directory. Its nested keys mirror the directory structure, so if you load the file as f, then f['images']['SW']['Background'] lists all the background images from the SW. The keys are listed in the order in which we used the images in our experiments: if an experiment used 100 SW background images, it used the images corresponding to the first 100 keys.
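Because Python's json module preserves the order of object keys, the "first N images used in an experiment" can be read straight from the file. This is a sketch: the description does not say whether the innermost value is a JSON object or an array, so the hypothetical helper first_n handles both.

```python
import json


def first_n(overview_path, *keys, n=None):
    """Return entries under nested keys of domain_overview.json, in stored order.

    If the innermost value is a JSON object, its keys are returned; if it is a
    JSON array, its items are returned. With n set, only the first n are kept.
    """
    with open(overview_path, encoding="utf-8") as fh:
        node = json.load(fh)
    for key in keys:
        node = node[key]
    entries = list(node)  # dict -> keys in file order; list -> items in order
    return entries if n is None else entries[:n]


# Example: the first 100 SW background images, in the order used in experiments.
# first_n("domain_overview.json", "images", "SW", "Background", n=100)
```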

 

Naming conventions:

 

1 - Real and Test images:

 

{DOMAIN}_{UNIQUE ID}.jpg

 

For example, 'EM_136.jpg', with corresponding label file 'EM_136.txt', refers to an image from the Eastern Midwest with unique ID 136.
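Since the labels folder mirrors the images folder, the label for an image can be found by swapping the images component of the path for labels and the .jpg extension for .txt. The sketch below does that and also parses the {DOMAIN}_{UNIQUE ID}.jpg name; the function names and the Real subfolder in the example path are illustrative. It does not parse the contents of the label files, whose format is not described here.

```python
import re
from pathlib import Path

REAL_TEST_NAME = re.compile(r"^(?P<domain>EM|NW|SW)_(?P<uid>\d+)\.jpg$")


def parse_real_test_name(filename: str) -> tuple[str, int]:
    """Split a Real/Test file name such as 'EM_136.jpg' into (domain, unique ID)."""
    m = REAL_TEST_NAME.match(filename)
    if m is None:
        raise ValueError(f"not a Real/Test image name: {filename!r}")
    return m["domain"], int(m["uid"])


def label_path_for(image_path) -> Path:
    """Map .../images/.../X.jpg to .../labels/.../X.txt (the two trees mirror each other)."""
    parts = list(Path(image_path).parts)
    if "images" not in parts:
        raise ValueError(f"path is not under an images folder: {image_path!r}")
    idx = len(parts) - 1 - parts[::-1].index("images")  # last 'images' component
    parts[idx] = "labels"
    return Path(*parts).with_suffix(".txt")


print(parse_real_test_name("EM_136.jpg"))                        # ('EM', 136)
print(label_path_for("images/EM/Real/EM_136.jpg").as_posix())    # labels/EM/Real/EM_136.txt
```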

 

2 - Background images:

 

Background images were collected in three waves, with the goal of creating a set of images visually similar to the real images but without turbines:

  1. The first wave came from NAIP images at U.S. Wind Turbine Database coordinates where no wind turbine was present in the snapshot (NAIP imagery spans a relatively long period, so a turbine listed in the database may be missing from a given image). These images are named {DOMAIN}_{UNIQUE ID}_background.jpg, for example 'EM_1612_background.jpg'.

  2. The second wave was collected 4000 m either southeast or northwest (chosen at random) of a wind turbine coordinate. These images are named {DOMAIN}_{UNIQUE ID}_{SHIFT DIRECTION (SE or NW)}_background.jpg. For example, 'NW_12750_SE_background.jpg' refers to an image from the Northwest without turbines, captured 4000 m southeast of the wind turbine with unique ID 12750.

  3. The third wave was collected in the same way, but 6000 m either southeast or northwest of a wind turbine coordinate. These images are named {DOMAIN}_{UNIQUE ID}_{SHIFT DIRECTION (SE or NW)}_6000_background.jpg, for example 'NW_12937_NW_6000_background.jpg'.
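The three background naming patterns can be told apart with a single regular expression. This is a sketch based only on the three example names above; the wave numbers follow the order of the list, and the returned dictionary layout is our own choice.

```python
import re

# {DOMAIN}_{UNIQUE ID}[_{SE|NW}[_6000]]_background.jpg
BACKGROUND_NAME = re.compile(
    r"^(?P<domain>EM|NW|SW)_(?P<uid>\d+)"
    r"(?:_(?P<direction>SE|NW)(?P<far>_6000)?)?"
    r"_background\.jpg$"
)


def parse_background_name(filename: str) -> dict:
    """Identify the collection wave and offset encoded in a background file name."""
    m = BACKGROUND_NAME.match(filename)
    if m is None:
        raise ValueError(f"not a background image name: {filename!r}")
    if m["direction"] is None:
        wave, shift_m = 1, 0      # taken at the database coordinate itself
    elif m["far"] is None:
        wave, shift_m = 2, 4000   # 4000 m SE or NW of the turbine
    else:
        wave, shift_m = 3, 6000   # 6000 m SE or NW of the turbine
    return {"domain": m["domain"], "turbine_id": int(m["uid"]),
            "direction": m["direction"], "shift_m": shift_m, "wave": wave}


print(parse_background_name("NW_12750_SE_background.jpg"))
# {'domain': 'NW', 'turbine_id': 12750, 'direction': 'SE', 'shift_m': 4000, 'wave': 2}
```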

 

3 - Synthetic images

 

Each synthetic image is built from labeled wind turbine examples from the source domain, a background image from the target domain, and a mask. The mask determines where the wind turbine examples are placed, and GP-GAN blends those examples onto the background image. The naming convention for synthetic images is therefore:

 

{BACKGROUND IMAGE NAME FROM TARGET DOMAIN}_m{MASK NUMBER}.jpg

 

For example, images/Synthetic/s_NW_t_SW/SW_2246_m15.jpg is a synthetic image created using labeled wind turbine examples from the Northwest and stylized to look like the Southwest, using Southwest background image SW_2246 and mask 15.
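A synthetic image's full path encodes its source domain, target domain, background image, and mask. The sketch below recovers all four, assuming (from the single example above) that the mask number always appears as "m" followed by digits at the end of the file name.

```python
import re
from pathlib import PurePosixPath

SYNTH_DIR = re.compile(r"^s_(?P<source>EM|NW|SW)_t_(?P<target>EM|NW|SW)$")
SYNTH_FILE = re.compile(r"^(?P<background>.+)_m(?P<mask>\d+)\.jpg$")


def parse_synthetic_path(path: str) -> dict:
    """Extract source/target domains, background name, and mask from a synthetic path."""
    p = PurePosixPath(path)
    d = SYNTH_DIR.match(p.parent.name)
    f = SYNTH_FILE.match(p.name)
    if d is None or f is None:
        raise ValueError(f"not a synthetic image path: {path!r}")
    return {"source": d["source"], "target": d["target"],
            "background": f["background"], "mask": int(f["mask"])}


print(parse_synthetic_path("images/Synthetic/s_NW_t_SW/SW_2246_m15.jpg"))
# {'source': 'NW', 'target': 'SW', 'background': 'SW_2246', 'mask': 15}
```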

 

For any remaining questions, please reach out to the author point of contact at caleb.kornfein@gmail.com.

Files

Wind Turbine Data.zip (179.8 MB)
md5:a6ca0386f6e92caa7659c0f5707814d2
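To confirm that a download is intact before unzipping it, its MD5 digest can be compared with the checksum listed above. A minimal sketch using Python's standard hashlib module, reading the file in chunks so the 179.8 MB archive is never held in memory at once:

```python
import hashlib

EXPECTED_MD5 = "a6ca0386f6e92caa7659c0f5707814d2"


def md5_of(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# assert md5_of("Wind Turbine Data.zip") == EXPECTED_MD5
```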