pyeo.classification
Contains every function related to map classification: model creation, map classification, and reshaping arrays into scikit-learn compatible forms.
- pyeo.classification.autochunk(dataset, mem_limit=None)
EXPERIMENTAL. Calculates the number of chunks to break a dataset into so that processing does not raise a memory error. Presumes that 80% of the memory on the host machine is available for use by Pyeo. The aim is to break the dataset into as few chunks as possible without exceeding mem_limit. If mem_limit is not specified, it defaults to the total amount of RAM on the machine.
- Parameters
dataset – The dataset to chunk
mem_limit – The maximum amount of memory available to the process. Will be automatically populated from os.sysconf if missing.
- Returns
The number of chunks to break the image into most efficiently.
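As a rough illustration of the calculation described above, the sketch below divides an estimated in-memory dataset size by 80% of the memory limit and rounds up. The function name `estimate_chunks`, the `dataset_bytes` argument, and the use of `SC_PAGE_SIZE` and `SC_PHYS_PAGES` are assumptions made for this example, not necessarily how pyeo implements autochunk.

```python
import math
import os


def estimate_chunks(dataset_bytes, mem_limit=None, usable_fraction=0.8):
    """Hypothetical helper: smallest chunk count that keeps each chunk within budget."""
    if mem_limit is None:
        # Total physical RAM on POSIX systems (os.sysconf is not available on Windows).
        mem_limit = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    budget = mem_limit * usable_fraction
    # Always use at least one chunk, even for tiny datasets.
    return max(1, math.ceil(dataset_bytes / budget))


# A 10 GiB dataset with an 8 GiB limit: the budget is 6.4 GiB, so 2 chunks are needed.
print(estimate_chunks(10 * 1024**3, mem_limit=8 * 1024**3))
```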
- pyeo.classification.change_from_composite(image_path, composite_path, model_path, class_out_path, prob_out_path=None)
Stacks an image with a composite and classifies the change at each pixel with a scikit-learn model. The stacked image that is classified has the following bands:
composite blue
composite green
composite red
composite IR
image blue
image green
image red
image IR
- Parameters
image_path – The path to the image
composite_path – The path to the composite
model_path – The path to a .pkl of a scikit-learn classifier that takes 8 features
class_out_path – A location to save the resulting classification .tif
prob_out_path – A location to save the probability raster of each pixel
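The band stacking described above can be pictured with plain NumPy arrays. This is a minimal sketch on synthetic data, assuming both rasters are already loaded as (band, y, x) arrays of equal size; it does not read files or call pyeo.

```python
import numpy as np

# Synthetic 4-band rasters in GDAL order (band, y, x); real data would be read from disk.
composite = np.zeros((4, 3, 5), dtype=np.uint16)
image = np.ones((4, 3, 5), dtype=np.uint16)

# Composite bands first, then image bands, matching the band order listed above.
stacked = np.concatenate([composite, image], axis=0)

# One row per pixel, one column per band: the 8 features the classifier receives.
features = stacked.reshape(stacked.shape[0], -1).T

print(stacked.shape, features.shape)
```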
- pyeo.classification.classify_directory(in_dir, model_path, class_out_dir, prob_out_dir=None, apply_mask=False, out_type='GTiff', num_chunks=10)
Classifies every file ending in .tif in in_dir using the model at model_path. Outputs are saved in class_out_dir and prob_out_dir, named [input_name]_class and [input_name]_prob, respectively.
See the documentation for classification.classify_image() for more details.
- Parameters
in_dir – The path to the directory containing the rasters to be classified.
model_path – The path to the .pkl file containing the model.
class_out_dir – The directory that will store the classified maps
prob_out_dir – The directory that will store the probability maps of the classified maps
apply_mask – If True, uses the corresponding .msk files to mask the images
out_type – The raster format of the class image. Defaults to GTiff (GeoTIFF)
num_chunks – The number of chunks to break an image into.
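The output naming convention can be sketched with pathlib. The helper `output_paths`, the example paths, and the assumption that outputs keep a .tif extension are all hypothetical; only the _class and _prob suffixes come from the description above.

```python
from pathlib import Path


def output_paths(image_path, class_out_dir, prob_out_dir=None):
    """Hypothetical helper: build output paths named [input_name]_class / _prob."""
    stem = Path(image_path).stem
    class_path = Path(class_out_dir) / f"{stem}_class.tif"
    # The probability path is only built when a probability directory is given.
    prob_path = Path(prob_out_dir) / f"{stem}_prob.tif" if prob_out_dir else None
    return class_path, prob_path


class_path, prob_path = output_paths("images/tile_01.tif", "classified", "probabilities")
print(class_path.name, prob_path.name)
```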
- pyeo.classification.classify_image(image_path, model_path, class_out_path, prob_out_path=None, apply_mask=False, out_type='GTiff', num_chunks=10, nodata=0, skip_existing=False)
Produces a class map from a raster and a model. This applies the model's predict() function to each pixel in the input raster and saves the result into an output raster. The model is presumed to be a fitted scikit-learn model created using one of the other functions in this library (create_model_from_rasters, create_model_from_signatures).
To fit into a
- Parameters
image_path – The path to the raster image to be classified.
model_path – The path to the .pkl file containing the model
class_out_path – The path that the classified map will be saved at.
prob_out_path – If present, the path that the class probability map will be stored at.
apply_mask – If True, uses the .msk file corresponding to the image at image_path to skip any invalid pixels.
out_type – The raster format of the class image. Defaults to GTiff (GeoTIFF)
num_chunks – The number of chunks the image is broken into prior to classification. The smaller this number, the faster classification will run, but the more likely you are to get an out-of-memory error.
nodata – The value to write to masked pixels
skip_existing – If true, do not run if class_out_path already exists
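To make the roles of num_chunks, apply_mask and nodata concrete, here is a minimal in-memory sketch of chunked classification. The `ThresholdModel` class and the `classify_in_chunks` function are hypothetical stand-ins written for this example; the real function also handles file I/O and may differ in detail.

```python
import numpy as np


class ThresholdModel:
    """Stand-in for a fitted model: class 1 where the first band exceeds 10, else class 2."""

    def predict(self, samples):
        return np.where(samples[:, 0] > 10, 1, 2)


def classify_in_chunks(image_array, model, num_chunks=10, nodata=0, mask=None):
    """Hypothetical sketch: predict a (bands, y, x) array chunk by chunk."""
    bands, height, width = image_array.shape
    samples = image_array.reshape(bands, -1).T  # (n_pixels, bands)
    # Hand the model one chunk at a time to bound the memory used by each predict() call.
    parts = [model.predict(chunk) for chunk in np.array_split(samples, num_chunks)]
    classes = np.concatenate(parts).reshape(height, width)
    if mask is not None:
        # Masked-out pixels (mask == 0) receive the nodata value.
        classes = np.where(mask.astype(bool), classes, nodata)
    return classes


image = np.arange(2 * 4 * 6).reshape(2, 4, 6)
mask = np.ones((4, 6), dtype=np.uint8)
mask[0, 0] = 0  # mark one pixel as invalid
class_map = classify_in_chunks(image, ThresholdModel(), num_chunks=5, mask=mask)
print(class_map)
```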
Notes
If you want to create a custom model, the object is presumed to have the following methods and attributes:
- model.n_classes_ : The number of classes the model will produce
- model.n_cores : The number of CPU cores used to run the model
- model.predict() : A function that takes a set of band inputs from a pixel and produces a class.
- model.predict_proba() : If called with prob_out_path, a function that takes a set of n band inputs from a pixel and produces n_classes_ outputs corresponding to the probabilities of the pixel belonging to each class.
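A custom model only needs to expose the members listed in the notes. The class below is a hypothetical, deliberately trivial example of that interface, written for illustration; like scikit-learn estimators, its methods take an array of shape (n_pixels, bands) rather than a single pixel.

```python
import numpy as np


class BrightnessModel:
    """Hypothetical two-class model exposing the interface described in the notes."""

    n_classes_ = 2  # number of classes the model produces
    n_cores = 1     # number of CPU cores used to run the model

    def predict_proba(self, samples):
        # Probability of class 1 ("bright") grows with mean band value, clipped to [0, 1].
        p_bright = np.clip(samples.mean(axis=1) / 255.0, 0.0, 1.0)
        return np.column_stack([1.0 - p_bright, p_bright])

    def predict(self, samples):
        # Pick the most probable class for each pixel.
        return np.argmax(self.predict_proba(samples), axis=1)


model = BrightnessModel()
pixels = np.array([[10, 20, 30, 40], [250, 240, 230, 220]], dtype=float)
print(model.predict(pixels))
print(model.predict_proba(pixels).shape)
```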
- pyeo.classification.create_model_for_region(path_to_region, model_out, scores_out, attribute='CODE')
Takes all .tif files in a given folder and creates a pickled scikit-learn model for classifying them. Wraps classification.create_trained_model(); see its documentation for details.
- Parameters
path_to_region – Path to the folder containing the tifs.
model_out – Path to location to save the .pkl file
scores_out – Path to save the cross-validation scores
attribute – The label of the field in the training shapefiles that contains the classification labels.
- pyeo.classification.create_model_from_signatures(sig_csv_path, model_out, sig_datatype=<class 'numpy.int32'>)
Takes a .csv file containing class signatures, produced by extract_features_to_csv, and uses it to train and pickle a scikit-learn model.
- Parameters
sig_csv_path – The path to the signatures file
model_out – The location to save the pickled model to.
sig_datatype – The datatype to read the csv as. Defaults to int32.
Notes
At present, the model is an ExtraTreesClassifier arrived at using TPOT:
model = ens.ExtraTreesClassifier(bootstrap=False, criterion="gini", max_features=0.55, min_samples_leaf=2, min_samples_split=16, n_estimators=100, n_jobs=4, class_weight='balanced')
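The following sketch fits a classifier with the hyperparameters quoted in the note and round-trips it through pickle. The synthetic signatures and the in-memory pickling are assumptions for illustration; the real function reads a signature CSV and writes the model to model_out, and its persistence mechanism may differ.

```python
import pickle

import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic signatures: 60 pixels, 4 bands, two well-separated classes (labels 1 and 2).
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(100, 5, (30, 4)), rng.normal(200, 5, (30, 4))])
class_labels = np.array([1] * 30 + [2] * 30)

# Same hyperparameters as quoted in the note above.
model = ExtraTreesClassifier(bootstrap=False, criterion="gini", max_features=0.55,
                             min_samples_leaf=2, min_samples_split=16,
                             n_estimators=100, n_jobs=4, class_weight="balanced")
model.fit(features, class_labels)

# Round-trip through pickle, as a stand-in for saving to and loading from a .pkl file.
restored = pickle.loads(pickle.dumps(model))
print(restored.predict([[101, 99, 102, 98], [199, 201, 198, 202]]))
```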
- pyeo.classification.create_trained_model(training_image_file_paths, cross_val_repeats=5, attribute='CODE')
Creates a trained model from a set of training images with associated shapefiles.
This assumes that each image in training_image_file_paths has, in the same directory, a folder of the same name containing a shapefile of the same name. For example, in the folder training_data:
training_data
    area1.tif
    area1
        area1.shp
        area1.dbf
        … rest of shapefile for area 1 …
    area2.tif
    area2
        area2.shp
        area2.dbf
        … rest of shapefile for area 2 …
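The layout above implies a simple mapping from each training image to its shapefile. The helper below is a hypothetical sketch of that mapping, not a function provided by pyeo.

```python
from pathlib import Path


def shapefile_for(image_path):
    """Hypothetical helper: training_data/area1.tif -> training_data/area1/area1.shp."""
    image_path = Path(image_path)
    return image_path.parent / image_path.stem / f"{image_path.stem}.shp"


print(shapefile_for("training_data/area1.tif").as_posix())
```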
- Parameters
training_image_file_paths – A list of filepaths to training images.
cross_val_repeats – The number of cross-validation repeats to use
attribute – The label of the field in the training shapefiles that contains the classification labels.
- Returns
model – A fitted scikit-learn model. See notes.
scores – The cross-validation scores for model
Notes
For full details of how to create an appropriate shapefile, see [here](../index.html#training_data). At present, the model is an ExtraTreesClassifier arrived at using TPOT:
model = ens.ExtraTreesClassifier(bootstrap=False, criterion="gini", max_features=0.55, min_samples_leaf=2, min_samples_split=16, n_estimators=100, n_jobs=4, class_weight='balanced')
- pyeo.classification.extract_features_to_csv(in_ras_path, training_shape_path, out_path, attribute='CODE')
Given a raster and a shapefile containing training polygons, extracts all pixels into a CSV file for further analysis.
This produces a CSV file where each row corresponds to a pixel. The columns are as follows:
- Column 1: Class labels from the shapefile field labelled as 'attribute'.
- Column 2 onwards: Band values from the raster at in_ras_path.
- Parameters
in_ras_path – The path to the raster used for creating the training dataset
training_shape_path – The path to the shapefile containing classification polygons
out_path – The path for the new .csv file
attribute – The label of the field in the training shapefile that contains the classification labels.
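The CSV layout described above (label first, band values after) can be sketched with the standard csv module. The sample values are invented, and an in-memory buffer stands in for the file at out_path.

```python
import csv
import io

# Hypothetical samples: (class label, band values) for three training pixels.
samples = [(1, [120, 98, 76, 210]), (1, [118, 97, 80, 205]), (2, [40, 52, 61, 33])]

buffer = io.StringIO()  # stands in for the file at out_path
writer = csv.writer(buffer)
for label, bands in samples:
    # Column 1 is the class label; the remaining columns are the band values.
    writer.writerow([label, *bands])

# Read the rows back and split each one into its label and its band values.
rows = [[int(value) for value in row] for row in csv.reader(io.StringIO(buffer.getvalue()))]
labels = [row[0] for row in rows]
band_values = [row[1:] for row in rows]
print(labels, band_values[0])
```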
- pyeo.classification.get_training_data(image_path, shape_path, attribute='CODE', shape_projection_id=4326)
Given an image and a shapefile with categories, returns training data and features suitable for fitting a scikit-learn classifier.
This extracts every pixel in image_path touched by the polygons in shape_path.
For full details of how to create an appropriate shapefile, see [here](../index.html#training_data).
- Parameters
image_path – The path to the raster image to extract signatures from
shape_path – The path to the shapefile containing labelled class polygons
attribute – The field containing the class labels
shape_projection_id – The projection of the shapefile
- Returns
training_data – A numpy array of shape (n_pixels, bands), where n_pixels is the number of pixels covered by the training polygons
features – A 1-d numpy array of length (n_pixels) containing the class labels for the corresponding pixel in training_data
Notes
For performance, this uses SciPy's sparse nonzero() function to get the location of each training data pixel. As a result, any class with a label of 0 is ignored.
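The effect of the non-zero lookup can be seen on a tiny array. This sketch uses np.nonzero as a stand-in for the sparse-matrix lookup and an invented label raster; it only illustrates why a class labelled 0 cannot be selected.

```python
import numpy as np

# Hypothetical rasterised polygons: 0 means "no polygon here", other values are class labels.
label_raster = np.array([[0, 1, 1],
                         [0, 0, 2]])
image = np.arange(2 * 2 * 3).reshape(2, 2, 3)  # (bands, y, x)

# np.nonzero stands in for the sparse lookup: it returns only non-zero locations,
# so a class labelled 0 could never be selected.
ys, xs = np.nonzero(label_raster)
training_data = image[:, ys, xs].T   # (n_pixels, bands)
features = label_raster[ys, xs]      # class label per pixel, named as in the Returns list

print(training_data.shape, features)
```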
- pyeo.classification.load_signatures(sig_csv_path, sig_datatype=<class 'numpy.int32'>)
Extracts features and class labels from a signature CSV.
- Parameters
sig_csv_path – The path to the signatures file
sig_datatype – The datatype to read the CSV as. Defaults to int32.
- Returns
features – a numpy array of the shape (feature_count, sample_count)
class_labels – a 1d numpy array of class labels corresponding to the samples in features.
- pyeo.classification.raster_reclass_binary(img_path, rcl_value, outFn, outFmt='GTiff', write_out=True)
Takes a raster and reclassifies rcl_value to 1, with all others becoming 0. In-place operation if write_out is True.
- Parameters
img_path – Path to 1 band input raster.
rcl_value – Integer indicating the value that should be reclassified to 1. All other values will be 0.
outFn – Output file name.
outFmt – Output format. Set to GTiff by default. Other GDAL options available.
write_out – Boolean. Set to True by default. Will write raster to disk. If False, only an array is returned
- Returns
The reclassified NumPy array
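On the array level, the reclassification amounts to a single comparison. The function `reclass_binary` below is a hypothetical array-only sketch; it does not read or write rasters.

```python
import numpy as np


def reclass_binary(array, rcl_value):
    """Hypothetical array-only equivalent: rcl_value becomes 1, everything else 0."""
    return (array == rcl_value).astype(np.uint8)


classes = np.array([[1, 3, 3],
                    [2, 3, 0]])
print(reclass_binary(classes, 3))
```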
- pyeo.classification.reshape_ml_out_to_raster(classes, width, height)
Takes the output of a pixel classifier and reshapes it to a single-band image.
- Parameters
classes – A 1-d numpy array of classes from a pixel classifier
width – The width in pixels of the image that produced the classification
height – The height in pixels of the image that produced the classification
- Returns
A 2-dimensional NumPy array of shape (width, height)
- pyeo.classification.reshape_prob_out_to_raster(probs, width, height)
Takes the probability output of a pixel classifier and reshapes it to a raster.
- Parameters
probs – A numpy array of shape(n_pixels, n_classes)
width – The width in pixels of the image that produced the probability classification
height – The height in pixels of the image that produced the probability classification
- Returns
The reshaped image array
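One plausible way to perform this reshaping is shown below. The band-first output layout (n_classes, height, width) and the row-major pixel order are assumptions made for this sketch; the documentation above does not state the output shape.

```python
import numpy as np

width, height, n_classes = 3, 2, 2
# Row i holds the class probabilities for pixel i (pixels in row-major order).
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3],
                  [0.4, 0.6], [0.3, 0.7], [0.2, 0.8]])

# Move classes to the front, then unfold the pixel axis into (height, width).
prob_raster = probs.T.reshape(n_classes, height, width)
print(prob_raster.shape)
```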
- pyeo.classification.reshape_raster_for_ml(image_array)
A low-level function that reshapes an array from GDAL order [band, y, x] to scikit-learn feature order [x*y, band].
For classification, scikit-learn functions take a 2-dimensional array of features of the shape (samples, features). For pixel classification, features correspond to bands and samples correspond to specific pixels.
- Parameters
image_array – A 3-dimensional Numpy array of shape (bands, y, x)
- Returns
A 2-dimensional NumPy array of shape (samples, features)
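The reshaping from (bands, y, x) to (samples, features) can be sketched in two NumPy calls. This is an illustrative version on a synthetic array, assuming row-major pixel order; the library's own implementation may differ in detail.

```python
import numpy as np

bands, y_size, x_size = 3, 2, 4
image_array = np.arange(bands * y_size * x_size).reshape(bands, y_size, x_size)

# Move the band axis last, then flatten the two spatial axes into one sample axis.
samples = np.transpose(image_array, (1, 2, 0)).reshape(y_size * x_size, bands)

# Row i of `samples` holds every band value of the pixel at (y, x) = divmod(i, x_size).
y, x = divmod(5, x_size)
print(samples.shape, samples[5], image_array[:, y, x])
```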