# Cape Hatteras Landsat8 RGB Images and Labels for Image Segmentation using the program, Segmentation Gym
## Overview
* Datasets and files for testing the [Segmentation Gym](https://github.com/Doodleverse/segmentation_gym) program for image segmentation
* Dataset made by Daniel Buscombe, Marda Science LLC.
* Dataset consists of a time-series of Landsat-8 images of Cape Hatteras National Seashore, courtesy of the U.S. Geological Survey.
* Imagery spans the period February 2015 to September 2021.
* Labels were created by Daniel Buscombe, Marda Science, using the labeling program [Doodler](https://github.com/Doodleverse/dash_doodler).
Download this file and unzip it somewhere on your machine (although *not* inside the `segmentation_gym` folder), then see the relevant page on the [Segmentation Gym wiki](https://github.com/Doodleverse/segmentation_gym/wiki) for further explanation.
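For example, a minimal Python sketch for unpacking the archive (the paths here are placeholders; use wherever you downloaded the file):

```python
import zipfile
from pathlib import Path

# Placeholder paths: adjust to wherever you downloaded the archive.
archive = Path("~/Downloads/my_segmentation_gym_datasets.zip").expanduser()
dest = Path.home()  # anywhere is fine, but NOT inside the segmentation_gym folder

with zipfile.ZipFile(archive) as zf:
    zf.extractall(dest)
    print(f"Extracted {len(zf.namelist())} members to {dest}")
```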
This dataset and the associated models were made by Dr Daniel Buscombe, Marda Science LLC, to demonstrate the functionality of Segmentation Gym. The labels were created using [Doodler](https://github.com/Doodleverse/dash_doodler/).
Previous versions:

* 1.0: https://zenodo.org/record/5895128#.Y1G5s3bMIuU (original release, Oct 2021; conforms to Segmentation Gym functionality as of Oct 2021)
* 2.0: https://zenodo.org/record/7036025#.Y1G57XbMIuU (Jan 23, 2022; conforms to Segmentation Gym functionality as of Jan 23, 2022)

This is version 3.0, created Oct 20, 2022. It has been tested with Segmentation Gym using doodleverse-utils 0.0.11 (https://pypi.org/project/doodleverse-utils/0.0.11/).
## File structure
```{sh}
/Users/Someone/my_segmentation_gym_datasets
├── config
│   └── *.json
├── capehatteras_data
│   ├── fromDoodler
│   │   ├── images
│   │   └── labels
│   ├── npzForModel
│   └── toPredict
├── modelOut
│   └── *.png
└── weights
    └── *.h5
```
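After unzipping, a quick sanity check (assuming the root path shown above) that the expected top-level folders are present:

```python
from pathlib import Path

# Assumed root; substitute your own unzip location.
root = Path("/Users/Someone/my_segmentation_gym_datasets")

for name in ["config", "capehatteras_data", "modelOut", "weights"]:
    print(f"{name}: {'ok' if (root / name).is_dir() else 'MISSING'}")
```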
## config
There are 3 config files:
1. `/config/hatteras_l8_resunet.json`
2. `/config/hatteras_l8_vanilla_unet.json`
3. `/config/hatteras_l8_resunet_model2.json`
The first two are for the Residual U-Net and U-Net models, respectively. The third differs from the first only in kernel size; it is provided as an example of how to conduct model-training experiments, modifying one hyperparameter at a time in an effort to create an optimal model.
They all contain the same essential information and differ only as indicated below.
```
{
"TARGET_SIZE": [768,768], # the size of the imagery you wish the model to train on. This may not be the original size
"MODEL": "resunet", # model name. Otherwise, "unet"
"NCLASSES": 4, # number of classes
"KERNEL":9, # horizontal size of convolution kernel in pixels
"STRIDE":2, # stride in convolution kernel
"BATCH_SIZE": 7, # number of images/labels per batch
"FILTERS":6, # number of filters
"N_DATA_BANDS": 3, # number of image bands
"DROPOUT":0.1, # amount of dropout
"DROPOUT_CHANGE_PER_LAYER":0.0, # change in dropout per layer
"DROPOUT_TYPE":"standard", # type of dropout. Otherwise "spatial"
"USE_DROPOUT_ON_UPSAMPLING":false, # if true, dropout is used on upsampling as well as downsampling
"DO_TRAIN": false, # if false, the model will not train, but you will select this config file, data directory, and the program will load the model weights and test the model on the validation subset
if true, the model will train from scratch (warning! this will overwrite the existing weights file in h5 format)
"LOSS":"dice", # model training loss function, otherwise "cat" for categorical cross-entropy
"PATIENCE": 10, # number of epochs of no model improvement before training is aborted
"MAX_EPOCHS": 100, # maximum number of training epochs
"VALIDATION_SPLIT": 0.6, #proportion to use for validation
"RAMPUP_EPOCHS": 20, # [LR-scheduler] rampup to maximim
"SUSTAIN_EPOCHS": 0.0, # [LR-scheduler] sustain at maximum
"EXP_DECAY": 0.9, # [LR-scheduler] decay rate
"START_LR": 1e-7, # [LR-scheduler] start lr
"MIN_LR": 1e-7, # [LR-scheduler] min lr
"MAX_LR": 1e-4, # [LR-scheduler] max lr
"FILTER_VALUE": 0, #if >0, the size of a median filter to apply on outputs (not recommended unless you have noisy outputs)
"DOPLOT": true, #make plots
"ROOT_STRING": "hatteras_l8_aug_768", #data file (npz) prefix string
"USEMASK": false, # use the convention 'mask' in label image file names, instead of the preferred 'label'
"AUG_ROT": 5, # [augmentation] amount of rotation in degrees
"AUG_ZOOM": 0.05, # [augmentation] amount of zoom as a proportion
"AUG_WIDTHSHIFT": 0.05, # [augmentation] amount of random width shift as a proportion
"AUG_HEIGHTSHIFT": 0.05,# [augmentation] amount of random width shift as a proportion
"AUG_HFLIP": true, # [augmentation] if true, randomly apply horizontal flips
"AUG_VFLIP": false, # [augmentation] if true, randomly apply vertical flips
"AUG_LOOPS": 10, #[augmentation] number of portions to split the data into (recommended > 2 to save memory)
"AUG_COPIES": 5 #[augmentation] number iof augmented copies to make
"SET_GPU": "0" #which GPU to use. If multiple, list separated by a comma, e.g. '0,1,2'. If CPU is requested, use "-1"
"WRITE_MODELMETADATA": false, #if true, the prompts `seg_images_in_folder.py` to write detailed metadata for each sample file
"DO_CRF": true #if true, apply CRF post-processing to outputs
"LOSS_WEIGHTS": false, #if true, apply per-class weights to loss function
"MODE": "all", #'all' means use both non-augmented and augmented files, "noaug" means use non-augmented only, "aug" uses augmented only
"SET_PCI_BUS_ID": true, #if true, make keras aware of the PCI BUS ID (advanced or nonstandard GPU usage)
"TESTTIMEAUG": true, #if true, apply test-time augmentation when model in inference mode
"WRITE_MODELMETADATA": true,# if true, write model metadata per image when model in inference mode
"OTSU_THRESHOLD": true# if true, and NCLASSES=2 only, use per-image Otsu threshold rather than decision boundary of 0.5 on softmax scores
}
```
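The annotated block above uses `#` comments for explanation only; the `.json` files themselves are plain JSON and can be loaded directly. As a sketch of the one-hyperparameter-at-a-time experiment idea, this compares two of the configs and prints the keys whose values differ (paths assume the layout above):

```python
import json
from pathlib import Path

# Assumed location of the unzipped dataset.
cfg_dir = Path("/Users/Someone/my_segmentation_gym_datasets/config")

cfg1 = json.loads((cfg_dir / "hatteras_l8_resunet.json").read_text())
cfg2 = json.loads((cfg_dir / "hatteras_l8_resunet_model2.json").read_text())

# Report hyperparameters that differ between the two experiments,
# e.g. KERNEL for the model2 variant.
for key in sorted(set(cfg1) | set(cfg2)):
    if cfg1.get(key) != cfg2.get(key):
        print(f"{key}: {cfg1.get(key)} -> {cfg2.get(key)}")
```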
## capehatteras_data
Folder containing all the model input data
```{sh}
capehatteras_data: folder containing all the model input data
├── fromDoodler: images and labels exported from Doodler using [this program](https://github.com/dbuscombe-usgs/dash_doodler/blob/main/utils/gen_images_and_labels_4_zoo.py)
│   ├── images: jpg format files, one per label image
│   └── labels: jpg format files, one per image
├── npzForModel: npz format files for model training using [this program](https://github.com/dbuscombe-usgs/segmentation_zoo/blob/main/train_model.py), created following the workflow [documented here](https://github.com/dbuscombe-usgs/segmentation_zoo/wiki/Create-a-model-ready-dataset) using [this program](https://github.com/dbuscombe-usgs/segmentation_zoo/blob/main/make_nd_dataset.py)
└── toPredict: a folder of images for testing model prediction using [this program](https://github.com/dbuscombe-usgs/segmentation_zoo/blob/main/seg_images_in_folder.py)
```
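To inspect one of the model-ready npz files, a minimal numpy sketch (it simply lists whatever arrays the dataset-creation step stored, without assuming their names):

```python
import glob

import numpy as np

# Assumed path to the model-ready training files.
npz_files = sorted(glob.glob(
    "/Users/Someone/my_segmentation_gym_datasets/capehatteras_data/npzForModel/*.npz"))

with np.load(npz_files[0]) as data:
    for name in data.files:
        print(name, data[name].shape, data[name].dtype)
```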
## modelOut
PNG format files containing example model outputs from the train (`_train_` in the filename) and validation (`_val_` in the filename) subsets, as well as an image showing training loss and accuracy curves (`trainhist` in the filename). There are two sets of these files: those associated with the Residual U-Net trained with Dice loss contain `resunet` in their names, and those from the U-Net contain `vanilla_unet`.
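A short sketch that tallies the example outputs by model and subset, using the filename conventions described above:

```python
from pathlib import Path

# Assumed path to the example model outputs.
out_dir = Path("/Users/Someone/my_segmentation_gym_datasets/modelOut")

for png in sorted(out_dir.glob("*.png")):
    model = "vanilla_unet" if "vanilla_unet" in png.name else "resunet"
    if "_train_" in png.name:
        subset = "train sample"
    elif "_val_" in png.name:
        subset = "validation sample"
    elif "trainhist" in png.name:
        subset = "training curves"
    else:
        subset = "other"
    print(f"{png.name}: {model}, {subset}")
```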
## weights
There is a model weights file, in h5 format, associated with each config file.
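The h5 files hold weights rather than full serialized models; in the Gym workflow the network is rebuilt from the matching config and the weights are then loaded (see the `DO_TRAIN: false` note above). A framework-light way to confirm a weights file is readable is to walk it with h5py; the filename below is hypothetical:

```python
import h5py

# Hypothetical filename: substitute one of the .h5 files in the weights folder.
weights = "/Users/Someone/my_segmentation_gym_datasets/weights/hatteras_l8_resunet.h5"

with h5py.File(weights, "r") as f:
    # Print every group/dataset name; datasets also get their array shape.
    f.visititems(lambda name, obj: print(name, getattr(obj, "shape", "")))
```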
## Files

| Name | Size | MD5 checksum |
|---|---|---|
| my_segmentation_gym_datasets.zip | 634.7 MB | md5:3a28d2a0720750483f4dc259f6990ba6 |
| (name not shown in listing) | 1.6 kB | md5:d7881519336e9fea739e72f6b99d0288 |