Arctic Watershed Sentinel-2 RGB Images and Labels for Image Segmentation using the Segmentation Zoo program
Contributors
Data collector (2):
Supervisor:
Description
## Overview
* Watershed dataset and files for testing the [segmentation gym](https://github.com/Doodleverse/segmentation_gym) program for image segmentation
* Watershed dataset collected by Noah Rupert, Aayla Kastning, and Addison Green
* The dataset consists of a series of images from across the Arctic acquired by the Sentinel-2 mission.
* Imagery covers May to July of each year from 2019 to 2023.
* Label-image pairs were created by Noah Rupert (W&M) using the labeling program [Doodler](https://github.com/Doodleverse/dash_doodler).
Download and unzip the archive to replicate the results or to use the data for similar purposes. Scripts are available in a personal fork of Segmentation Gym (https://github.com/ncrupert/segmentation_gym); see the relevant page of the [segmentation gym wiki](https://github.com/Doodleverse/segmentation_gym/wiki) for further explanation.
This dataset and the associated models were made by Noah Rupert, with instruction and guidance from Joanmarie Del Vecchio, to determine whether machine learning models can identify water tracks as accurately as a human observer.
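The download-and-unzip step described above can be done with a minimal standard-library Python sketch like the one below. The archive name `watersheds_test_run.zip` comes from this record's file listing; the helper name `unzip_dataset` and its default destination folder are illustrative assumptions, not part of the dataset or the Segmentation Gym scripts.

```python
import zipfile
from pathlib import Path
from typing import List


def unzip_dataset(archive: str, dest: str = "watersheds_test_run") -> List[str]:
    """Extract the dataset archive into dest and return the member names.

    Hypothetical helper; the default destination folder name is an assumption.
    """
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        # testzip() returns the first member whose CRC check fails, or None if all pass.
        bad = zf.testzip()
        if bad is not None:
            raise zipfile.BadZipFile(f"corrupt member in archive: {bad}")
        zf.extractall(dest_path)
        return zf.namelist()
```

For example, `unzip_dataset("watersheds_test_run.zip")` extracts the archive into `watersheds_test_run/` and returns the list of extracted entries.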
## File structure
```{sh}
/Users/Someone/my_segmentation_zoo_datasets
├── config
│   └── *.json
├── capehatteras_data
│   ├── fromDoodler
│   │   ├── images
│   │   └── labels
│   ├── npzForModel
│   └── toPredict
├── modelOut
│   └── *.png
└── weights
    └── *.h5
```
There are 3 config files:
1. `/watersheds_test_resunet.json`
2. `/watersheds_test_segformer.json`
3. `/watersheds_test_vanilla_unet.json.json`
The first and third files configure the residual U-Net (res-unet) and vanilla U-Net models, respectively; they differ in their kernel-size specification. The second file configures the SegFormer model. Only the SegFormer model was used for our purposes.
The SegFormer config file includes the following settings:

| Key | Value | Description |
|---|---|---|
| `AUG_WIDTHSHIFT` | `0.05` | [augmentation] amount of random width shift, as a proportion of image width |
| `AUG_HEIGHTSHIFT` | `0.05` | [augmentation] amount of random height shift, as a proportion of image height |
| `AUG_HFLIP` | `true` | [augmentation] if true, randomly apply horizontal flips |
| `AUG_VFLIP` | `false` | [augmentation] if true, randomly apply vertical flips |
| `AUG_LOOPS` | `10` | [augmentation] number of portions to split the data into (values > 2 recommended to save memory) |
| `AUG_COPIES` | `5` | [augmentation] number of augmented copies to make |
| `SET_GPU` | `"0"` | which GPU to use; list multiple GPUs separated by commas (e.g. `"0,1,2"`); use `"-1"` to run on the CPU |
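The settings above can be read and interpreted with a short Python sketch like the one below. It assumes the config file on disk is plain JSON, which does not allow inline comments. `parse_gpu_setting` follows the `SET_GPU` rules stated above; `augmentation_plan` is a hypothetical illustration of how `AUG_LOOPS` portions and `AUG_COPIES` copies multiply, not the exact logic used by Segmentation Gym.

```python
import json
from typing import List, Optional


def load_config(path: str) -> dict:
    """Load a config file, assuming it is plain JSON (no inline comments)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def parse_gpu_setting(set_gpu: str) -> Optional[List[int]]:
    """Interpret SET_GPU: "-1" means CPU (returns None); otherwise a
    comma-separated list such as "0,1,2" gives the GPU indices to use."""
    value = str(set_gpu).strip()
    if value == "-1":
        return None
    return [int(part) for part in value.split(",") if part.strip()]


def augmentation_plan(n_images: int, aug_loops: int, aug_copies: int) -> List[int]:
    """Hypothetical illustration: split n_images into aug_loops near-equal
    portions and return the number of augmented images each portion yields
    (portion size * aug_copies)."""
    base, extra = divmod(n_images, aug_loops)
    sizes = [base + (1 if i < extra else 0) for i in range(aug_loops)]
    return [size * aug_copies for size in sizes]
```

With `AUG_LOOPS` = 10 and `AUG_COPIES` = 5, `augmentation_plan(23, 10, 5)` splits a hypothetical set of 23 images into three portions of 3 and seven portions of 2, for 115 augmented images in total.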
## watersheds_test_run data
Folder containing all of the model input data, split into training and validation subsets:

- `/modelOut`
  - `train_data`: `train_images`, `train_labels`, `Train_npzs`
  - `val_data`: `val_images`, `val_labels`, `val_npzs`
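Because an `.npz` file is a zip archive containing one `<name>.npy` member per array, its contents can be inspected with the standard library alone. The sketch below lists the array names in one file and counts the files in a folder such as `train_data/Train_npzs`. The helper names are hypothetical, and which array names the Segmentation Gym scripts actually write is not stated here, so check them rather than assuming. With NumPy installed, `numpy.load(path).files` gives the same list of names.

```python
import zipfile
from pathlib import Path
from typing import List


def npz_keys(npz_path: str) -> List[str]:
    """List the array names stored in an .npz file.

    An .npz file is a zip archive with one "<name>.npy" member per array,
    so zipfile can read the member names without NumPy.
    """
    with zipfile.ZipFile(npz_path) as zf:
        return [n[: -len(".npy")] for n in zf.namelist() if n.endswith(".npy")]


def count_npzs(folder: str) -> int:
    """Count the .npz files directly inside a folder, e.g. train_data/Train_npzs."""
    return sum(1 for _ in Path(folder).glob("*.npz"))
```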
## modelOut
PNG files containing example model outputs from the training (`_train_` in the filename) and validation (`_val_` in the filename) subsets, as well as an image showing the training loss and accuracy curves (`trainhist` in the filename). There are two sets of these files: those from the residual U-Net trained with Dice loss contain `resunet` in their names, and those from the vanilla U-Net contain `vanilla_unet`.
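The filename conventions above are enough to sort the outputs automatically. The helpers below are hypothetical, not part of the dataset; they only check for the substrings `trainhist`, `_train_`, `_val_`, `resunet`, and `vanilla_unet`.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def describe_output(filename: str) -> Dict[str, str]:
    """Classify a modelOut PNG by the substrings used in its file name."""
    name = Path(filename).name
    # Check "trainhist" first, in case a history-plot name also contains "_train_".
    if "trainhist" in name:
        kind = "training history"
    elif "_train_" in name:
        kind = "train example"
    elif "_val_" in name:
        kind = "validation example"
    else:
        kind = "unknown"
    if "resunet" in name:
        model = "resunet"
    elif "vanilla_unet" in name:
        model = "vanilla_unet"
    else:
        model = "unknown"
    return {"kind": kind, "model": model}


def group_outputs(folder: str) -> Dict[Tuple[str, str], List[str]]:
    """Group every PNG in a folder by (model, kind)."""
    groups: Dict[Tuple[str, str], List[str]] = {}
    for png in sorted(Path(folder).glob("*.png")):
        info = describe_output(png.name)
        groups.setdefault((info["model"], info["kind"]), []).append(png.name)
    return groups
```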
## weights
Model weights files (`*.h5`) are provided for each config file.
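If you need to find which weights belong to which config, a sketch like the following matches them by name. It assumes each weights file name begins with the stem of its config file name (for example, a hypothetical `watersheds_test_segformer.h5`); that naming rule is an assumption, so confirm it against the contents of `weights/`.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def pair_weights_with_configs(
    config_names: Iterable[str], weight_names: Iterable[str]
) -> Dict[str, List[str]]:
    """Map each config stem to the weights files whose names start with it.

    Assumption: each weights file name begins with its config file's stem.
    The stem is everything before the first ".", so a double extension such
    as "x.json.json" still yields "x". Longer stems are tried first so that a
    stem which is a prefix of another does not claim its files.
    """
    stems = sorted({Path(c).name.split(".", 1)[0] for c in config_names}, key=len, reverse=True)
    pairs: Dict[str, List[str]] = {stem: [] for stem in stems}
    for name in sorted(Path(p).name for p in weight_names):
        for stem in stems:
            if name.startswith(stem):
                pairs[stem].append(name)
                break
    return pairs
```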
Files

| Name | Size | MD5 checksum |
|---|---|---|
| watersheds_test_run.zip | 209.8 MB | `761469ef1dc323cb037b78bde6771880` |
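To check that the download completed intact, compare the archive's MD5 digest with the checksum listed above. A minimal standard-library sketch (the helper names are illustrative):

```python
import hashlib


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Checksum listed for watersheds_test_run.zip on this record.
EXPECTED_MD5 = "761469ef1dc323cb037b78bde6771880"


def verify_download(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5sum(path) == expected.lower()
```

`verify_download("watersheds_test_run.zip")` should return `True` for an intact copy of the archive.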
Additional details
Funding
- U.S. National Science Foundation: Elements: A workflow for efficient and reproducible permafrost geomorphology analysis (#2311319)
Dates
- Submitted: 2025-05-14
Software
- Repository URL: https://github.com/ncrupert/segmentation_gym
- Programming language: Python