pyeo.classification

Contains every function to do with map classification. This includes model creation, map classification and processes for array manipulation into scikit-learn compatible forms.

pyeo.classification.autochunk(dataset, mem_limit=None)

EXPERIMENTAL. Calculates the number of chunks to break a dataset into without causing a memory error. Presumes that 80% of the memory on the host machine is available for use by Pyeo. The aim is to break the dataset into as few chunks as possible without going over mem_limit. If not specified, mem_limit defaults to the total amount of RAM available on the machine.

Parameters
  • dataset – The dataset to chunk

  • mem_limit – The maximum amount of memory available to the process. Will be automatically populated from os.sysconf if missing.

Returns

The number of chunks to most efficiently break the image into.
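As a rough illustration of the calculation described above, the sketch below divides a dataset's size by 80% of the memory limit and rounds up. The name estimate_chunks and its dataset_bytes argument are hypothetical and not part of pyeo; the real autochunk takes a dataset object and may measure its size differently.

```python
import math
import os


def estimate_chunks(dataset_bytes, mem_limit=None, usable_fraction=0.8):
    """Hypothetical helper: smallest chunk count that keeps each chunk in budget."""
    if mem_limit is None:
        # Total physical RAM in bytes, read via os.sysconf (POSIX systems only).
        mem_limit = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    usable_bytes = mem_limit * usable_fraction
    # Round up so no chunk exceeds the usable budget; always use at least one chunk.
    return max(1, math.ceil(dataset_bytes / usable_bytes))
```

Passing mem_limit explicitly, for example estimate_chunks(10 * 1024**3, mem_limit=4 * 1024**3), avoids the os.sysconf lookup, which is not available on every platform.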

pyeo.classification.change_from_composite(image_path, composite_path, model_path, class_out_path, prob_out_path=None)

Stacks an image with a composite and classifies each pixel change with a scikit-learn model. The image that is classified has the following bands:

  1. composite blue

  2. composite green

  3. composite red

  4. composite IR

  5. image blue

  6. image green

  7. image red

  8. image IR

Parameters
  • image_path – The path to the image

  • composite_path – The path to the composite

  • model_path – The path to a .pkl of a scikit-learn classifier that takes 8 features

  • class_out_path – A location to save the resulting classification .tif

  • prob_out_path – A location to save the probability raster of each pixel
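The band order listed above can be sketched with NumPy as follows. This is only an illustration on in-memory arrays in (bands, y, x) order; stack_composite_and_image is a hypothetical name, and the real function works on raster files rather than arrays.

```python
import numpy as np


def stack_composite_and_image(composite, image):
    """Hypothetical helper: stack two (bands, y, x) arrays, composite bands first."""
    if composite.shape != image.shape:
        raise ValueError("composite and image must have the same shape")
    # Bands 1-4 come from the composite, bands 5-8 from the image.
    return np.concatenate([composite, image], axis=0)


# Two synthetic 4-band rasters (blue, green, red, IR), each 2 x 3 pixels.
composite = np.zeros((4, 2, 3), dtype=np.int32)
image = np.ones((4, 2, 3), dtype=np.int32)
stacked = stack_composite_and_image(composite, image)
```

The resulting 8-band array matches the 8 features expected by the model at model_path.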

pyeo.classification.classify_directory(in_dir, model_path, class_out_dir, prob_out_dir=None, apply_mask=False, out_type='GTiff', num_chunks=10)

Classifies every file ending in .tif in in_dir using the model at model_path. Outputs are saved in class_out_dir and prob_out_dir, named [input_name]_class and _prob, respectively.

See the documentation for classification.classify_image() for more details.

Parameters
  • in_dir – The path to the directory containing the rasters to be classified.

  • model_path – The path to the .pkl file containing the model.

  • class_out_dir – The directory that will store the classified maps

  • prob_out_dir – The directory that will store the probability maps of the classified maps

  • apply_mask – If True, uses the corresponding .msk files to mask the images

  • out_type – The raster format of the class image. Defaults to GTiff (GeoTIFF)

  • num_chunks – The number of chunks to break an image into.
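The naming scheme described above can be sketched as a pure path calculation. The helper plan_outputs is hypothetical, and the .tif extension on the outputs is an assumption; the text above only specifies the _class and _prob suffixes.

```python
from pathlib import PurePosixPath


def plan_outputs(filenames, class_out_dir, prob_out_dir=None):
    """Hypothetical helper: map each .tif name to its class and probability paths."""
    plan = []
    for name in filenames:
        path = PurePosixPath(name)
        if path.suffix != ".tif":
            continue  # only files ending in .tif are classified
        # Output extension is an assumption; only the suffixes are documented.
        class_path = str(PurePosixPath(class_out_dir) / f"{path.stem}_class.tif")
        prob_path = None
        if prob_out_dir is not None:
            prob_path = str(PurePosixPath(prob_out_dir) / f"{path.stem}_prob.tif")
        plan.append((class_path, prob_path))
    return plan
```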

pyeo.classification.classify_image(image_path, model_path, class_out_path, prob_out_path=None, apply_mask=False, out_type='GTiff', num_chunks=10, nodata=0, skip_existing=False)

Produces a class map from a raster and a model. This applies the model’s predict() function to each pixel in the input raster, and saves the result into an output raster. The model is presumed to be a fitted scikit-learn model created using one of the other functions in this library (create_model_from_rasters, create_model_from_signatures).

To fit into a

Parameters
  • image_path – The path to the raster image to be classified.

  • model_path – The path to the .pkl file containing the model

  • class_out_path – The path that the classified map will be saved at.

  • prob_out_path – If present, the path that the class probability map will be stored at.

  • apply_mask – If True, uses the .msk file corresponding to the image at image_path to skip any invalid pixels.

  • out_type – The raster format of the class image. Defaults to GTiff (GeoTIFF)

  • num_chunks – The number of chunks the image is broken into prior to classification. The smaller this number, the faster classification will run, but the more likely you are to get an out-of-memory error.

  • nodata – The value to write to masked pixels

  • skip_existing – If True, do not run if class_out_path already exists
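The trade-off behind num_chunks can be illustrated with a minimal sketch that classifies a (n_pixels, bands) array one chunk at a time. SumThresholdModel and predict_in_chunks are hypothetical stand-ins, not pyeo functions; the real implementation also handles masking, nodata and raster I/O.

```python
import numpy as np


class SumThresholdModel:
    """Stand-in for a fitted model: class 1 if the band values sum above 10."""

    def predict(self, samples):
        return (samples.sum(axis=1) > 10).astype(np.int32)


def predict_in_chunks(model, samples, num_chunks=10):
    """Classify a (n_pixels, bands) array one chunk at a time."""
    results = []
    for chunk in np.array_split(samples, num_chunks):
        if chunk.shape[0] == 0:
            continue  # more chunks than pixels leaves some chunks empty
        results.append(model.predict(chunk))
    return np.concatenate(results)


pixels = np.arange(24).reshape(12, 2)
chunked = predict_in_chunks(SumThresholdModel(), pixels, num_chunks=5)
```

Only one chunk's worth of predictions is computed at a time, so a larger num_chunks lowers peak memory at the cost of more predict() calls.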

Notes

If you want to create a custom model, the object is presumed to have the following methods and attributes:
  • model.n_classes_ : the number of classes the model will produce

  • model.n_cores : The number of CPU cores used to run the model

  • model.predict() : A function that will take a set of band inputs from a pixel and produce a class.

  • model.predict_proba() : If called with prob_out_path, a function that takes a set of n band inputs from a pixel and produces n_classes_ outputs corresponding to the probabilities of a given pixel being that class
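A minimal sketch of an object exposing the attributes and methods listed above might look like this. ThresholdModel is a hypothetical toy classifier, not something shipped with pyeo, and whether pyeo needs anything beyond these four members is not confirmed here.

```python
import numpy as np


class ThresholdModel:
    """Toy two-class model: class 1 if the mean band value exceeds a threshold."""

    n_classes_ = 2  # number of classes the model produces
    n_cores = 1     # number of CPU cores used to run the model

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, samples):
        """Return one class index per row of a (n_pixels, bands) array."""
        samples = np.asarray(samples)
        return (samples.mean(axis=1) > self.threshold).astype(np.int32)

    def predict_proba(self, samples):
        """Return a (n_pixels, n_classes_) array; this toy model is always certain."""
        labels = self.predict(samples)
        probs = np.zeros((labels.shape[0], self.n_classes_))
        probs[np.arange(labels.shape[0]), labels] = 1.0
        return probs
```

Because model_path points at a .pkl file, such an object would also need to be serialisable and importable wherever classify_image runs.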

pyeo.classification.create_model_for_region(path_to_region, model_out, scores_out, attribute='CODE')

Takes all .tif files in a given folder and creates a pickled scikit-learn model for classifying them. Wraps classification.create_trained_model(); see the documentation for that function for details.

Parameters
  • path_to_region – Path to the folder containing the tifs.

  • model_out – Path to location to save the .pkl file

  • scores_out – Path to save the cross-validation scores

  • attribute – The label of the field in the training shapefiles that contains the classification labels.

pyeo.classification.create_model_from_signatures(sig_csv_path, model_out, sig_datatype=<class 'numpy.int32'>)

Takes a .csv file containing class signatures - produced by extract_features_to_csv - and uses it to train and pickle a scikit-learn model.

Parameters
  • sig_csv_path – The path to the signatures file

  • model_out – The location to save the pickled model to.

  • sig_datatype – The datatype to read the csv as. Defaults to int32.

Notes

At present, the model is an ExtraTreesClassifier arrived at by TPOT:

model = ens.ExtraTreesClassifier(bootstrap=False, criterion="gini", max_features=0.55, min_samples_leaf=2, min_samples_split=16, n_estimators=100, n_jobs=4, class_weight='balanced')
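To show the hyperparameters from the note in context, the sketch below fits the same kind of classifier on synthetic signatures. The data is invented for illustration, and the sketch does not reproduce the CSV loading or pickling done by create_model_from_signatures.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic signatures: 200 pixels, 4 bands, two classes separated by brightness.
rng = np.random.default_rng(0)
dark = rng.integers(0, 100, size=(100, 4))
bright = rng.integers(150, 250, size=(100, 4))
features = np.vstack([dark, bright])
labels = np.array([1] * 100 + [2] * 100)

# Hyperparameters copied from the note above.
model = ExtraTreesClassifier(
    bootstrap=False, criterion="gini", max_features=0.55, min_samples_leaf=2,
    min_samples_split=16, n_estimators=100, n_jobs=4, class_weight="balanced",
)
model.fit(features, labels)

# Classify one clearly dark and one clearly bright pixel.
predicted = model.predict(np.array([[5, 5, 5, 5], [245, 245, 245, 245]]))
```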

pyeo.classification.create_trained_model(training_image_file_paths, cross_val_repeats=5, attribute='CODE')

Creates a trained model from a set of training images with associated shapefiles.

This assumes that, for each image in training_image_file_paths, the same directory contains a folder of the same name holding a shapefile of the same name. For example, in the folder training_data:

training_data

  • area1.tif

  • area1

    • area1.shp

    • area1.dbx

    • … rest of shapefile for area 1 …

  • area2.tif

  • area2

    • area2.shp

    • area2.dbx

    • … rest of shapefile for area 2 …

Parameters
  • training_image_file_paths – A list of filepaths to training images.

  • cross_val_repeats – The number of cross-validation repeats to use

  • attribute – The label of the field in the training shapefiles that contains the classification labels.

Returns

  • model – A fitted scikit-learn model. See notes.

  • scores – The cross-validation scores for model

Notes

For full details of how to create an appropriate shapefile, see [here](../index.html#training_data). At present, the model is an ExtraTreesClassifier arrived at by TPOT:

model = ens.ExtraTreesClassifier(bootstrap=False, criterion="gini", max_features=0.55, min_samples_leaf=2, min_samples_split=16, n_estimators=100, n_jobs=4, class_weight='balanced')

pyeo.classification.extract_features_to_csv(in_ras_path, training_shape_path, out_path, attribute='CODE')

Given a raster and a shapefile containing training polygons, extracts all pixels into a CSV file for further analysis.

This produces a CSV file where each row corresponds to a pixel. The columns are as follows:

  • Column 1: Class labels from the shapefile field labelled as ‘attribute’.

  • Column 2… : Band values from the raster at in_ras_path.

Parameters
  • in_ras_path – The path to the raster used for creating the training dataset

  • training_shape_path – The path to the shapefile containing classification polygons

  • out_path – The path for the new .csv file

  • attribute – The label of the field in the training shapefile that contains the classification labels.
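The column layout described above can be sketched as a round trip through an in-memory CSV. The comma delimiter, integer formatting and absence of a header row are assumptions made for this illustration, and the pixel values are invented.

```python
import io

import numpy as np

# Hypothetical training pixels: class labels plus three band values per pixel.
labels = np.array([1, 1, 2])
bands = np.array([[10, 20, 30], [11, 21, 31], [90, 80, 70]])

# One row per pixel: column 1 holds the label, the remaining columns hold bands.
rows = np.column_stack([labels, bands])
buffer = io.StringIO()
np.savetxt(buffer, rows, fmt="%d", delimiter=",")

# Read the table back and split it into labels and band values again.
buffer.seek(0)
table = np.loadtxt(buffer, delimiter=",", dtype=np.int32, ndmin=2)
read_labels, read_bands = table[:, 0], table[:, 1:]
```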

pyeo.classification.get_training_data(image_path, shape_path, attribute='CODE', shape_projection_id=4326)

Given an image and a shapefile with categories, returns training data and features suitable for fitting a scikit-learn classifier.

This extracts every pixel in image_path touched by the polygons in shape_path.

For full details of how to create an appropriate shapefile, see [here](../index.html#training_data).

Parameters
  • image_path – The path to the raster image to extract signatures from

  • shape_path – The path to the shapefile containing labelled class polygons

  • attribute – The field containing the class labels

  • shape_projection_id – The projection of the shapefile

Returns

  • training_data – A numpy array of shape (n_pixels, bands), where n_pixels is the number of pixels covered by the training polygons

  • features – A 1-d numpy array of length (n_pixels) containing the class labels for the corresponding pixel in training_data

Notes

For performance, this uses SciPy’s sparse.nonzero() function to get the location of each training data pixel. As a result, any class with a label of ‘0’ is ignored.
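The effect described in this note can be sketched with plain NumPy. This illustration swaps in np.nonzero for the sparse-matrix lookup and assumes the polygons have already been rasterised into a label array; extract_training_pixels is a hypothetical name.

```python
import numpy as np


def extract_training_pixels(image, label_raster):
    """Return (training_data, labels) for every pixel with a non-zero label.

    image is (bands, y, x); label_raster is (y, x) with 0 meaning "no polygon".
    """
    ys, xs = np.nonzero(label_raster)  # pixels labelled 0 are skipped here
    labels = label_raster[ys, xs]
    training_data = image[:, ys, xs].T  # shape (n_pixels, bands)
    return training_data, labels


image = np.arange(12).reshape(2, 2, 3)          # 2 bands, 2 x 3 pixels
label_raster = np.array([[0, 1, 0], [2, 0, 0]])  # two labelled pixels
training_data, labels = extract_training_pixels(image, label_raster)
```

A class deliberately labelled 0 would be indistinguishable from unlabelled background in this scheme, which is why class labels should start at 1.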

pyeo.classification.load_signatures(sig_csv_path, sig_datatype=<class 'numpy.int32'>)

Extracts features and class labels from a signature CSV.

Parameters
  • sig_csv_path – The path to the signatures file

  • sig_datatype – The datatype to read the csv as. Defaults to int32.

Returns

  • features – a numpy array of the shape (feature_count, sample_count)

  • class_labels – a 1d numpy array of class labels corresponding to the samples in features.

pyeo.classification.raster_reclass_binary(img_path, rcl_value, outFn, outFmt='GTiff', write_out=True)

Takes a raster and reclassifies rcl_value to 1, with all others becoming 0. In-place operation if write_out is True.

Parameters
  • img_path – Path to 1 band input raster.

  • rcl_value – Integer indicating the value that should be reclassified to 1. All other values will be 0.

  • outFn – Output file name.

  • outFmt – Output format. Set to GTiff by default. Other GDAL options available.

  • write_out – Boolean. Set to True by default. Will write raster to disk. If False, only an array is returned

Returns

The reclassified numpy array
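The reclassification itself can be sketched on an in-memory array as below. The helper reclass_binary is hypothetical and the uint8 output type is an assumption; the real function also reads the raster from img_path and optionally writes the result to outFn.

```python
import numpy as np


def reclass_binary(array, rcl_value):
    """Hypothetical helper: 1 where array equals rcl_value, 0 everywhere else."""
    return (array == rcl_value).astype(np.uint8)


class_map = np.array([[1, 3, 3], [2, 3, 1]])
binary = reclass_binary(class_map, 3)
```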

pyeo.classification.reshape_ml_out_to_raster(classes, width, height)

Takes the output of a pixel classifier and reshapes to a single band image.

Parameters
  • classes – A 1-d numpy array of classes from a pixel classifier

  • width – The width in pixels of the image that produced the classification

  • height – The height in pixels of the image that produced the classification

Returns

A 2-dimensional Numpy array of shape (width, height)

pyeo.classification.reshape_prob_out_to_raster(probs, width, height)

Takes the probability output of a pixel classifier and reshapes it to a raster.

Parameters
  • probs – A numpy array of shape (n_pixels, n_classes)

  • width – The width in pixels of the image that produced the probability classification

  • height – The height in pixels of the image that produced the probability classification

Returns

The reshaped image array
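One plausible way to perform this reshape is sketched below. The output layout, one band per class in (n_classes, height, width) order, is an assumption, since the shape of the returned array is not specified above; probs_to_raster is a hypothetical name.

```python
import numpy as np


def probs_to_raster(probs, width, height):
    """Hypothetical helper: (n_pixels, n_classes) -> (n_classes, height, width)."""
    n_pixels, n_classes = probs.shape
    if n_pixels != width * height:
        raise ValueError("probs does not match the given width and height")
    # Transpose so each class becomes a band, then unflatten the pixels row by row.
    return probs.T.reshape(n_classes, height, width)


probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7],
                  [0.4, 0.6], [0.5, 0.5], [1.0, 0.0]])
raster = probs_to_raster(probs, width=3, height=2)
```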

pyeo.classification.reshape_raster_for_ml(image_array)

A low-level function that reshapes an array from gdal order [band, y, x] to scikit-learn features order [x*y, band].

For classification, scikit-learn functions take a 2-dimensional array of features of the shape (samples, features). For pixel classification, features correspond to bands and samples correspond to specific pixels.

Parameters

image_array – A 3-dimensional Numpy array of shape (bands, y, x)

Returns

A 2-dimensional Numpy array of shape (samples, features)
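The reshape described above can be sketched in NumPy as follows, together with an inverse step for a one-value-per-pixel result. Both helper names are hypothetical, and the inverse assumes row-major (y, x) order, that is height rows by width columns, to mirror the gdal layout.

```python
import numpy as np


def raster_to_samples(image_array):
    """Reshape (bands, y, x) into (y * x, bands): one row per pixel."""
    bands, y, x = image_array.shape
    # Move the band axis last so each pixel's band values are contiguous.
    return np.transpose(image_array, (1, 2, 0)).reshape(y * x, bands)


def classes_to_raster(classes, width, height):
    """Assumed inverse for per-pixel results: 1-d array -> (height, width)."""
    return np.reshape(classes, (height, width))


image = np.arange(12).reshape(2, 2, 3)  # 2 bands, 2 rows, 3 columns
samples = raster_to_samples(image)
first_band = classes_to_raster(samples[:, 0], width=3, height=2)
```

Each row of samples holds the band values of one pixel, so it can be passed directly to a scikit-learn predict() call.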