geospatial_learn package¶
Submodules¶
geospatial_learn.raster module¶
The raster module.
Description¶
A series of tools for the manipulation of geospatial imagery/rasters, such as masking or raster-algebra type functions, and the conversion of Sentinel-2 data to GDAL-compatible formats.
-
raster.array2raster(array, bands, inRaster, outRas, dtype, FMT=None)¶ Save a raster from a numpy array using the geoinfo from another.
- Parameters
array (np array) – a numpy array.
bands (int) – the no of bands.
inRaster (string) – the path of a raster.
outRas (string) – the path of the output raster.
dtype (int) – a GDAL datatype (see the GDAL website), e.g. gdal.GDT_Int32 – note this is an integer code, so you need to know what the number represents!
FMT (string) – (optional) a GDAL raster format (see the GDAL website) eg Gtiff, HFA, KEA.
-
raster.batch_translate(folder, wildcard, FMT='Gtiff')¶ Using the gdal python API, this function translates the format of files to commonly used formats
- Parameters
folder (string) – the folder containing the rasters to be translated
wildcard (string) – the format wildcard to search for e.g. ‘.tif’
FMT (string (optional)) – a GDAL raster format (see the GDAL website) eg Gtiff, HFA, KEA
-
raster.calc_ndvi(inputIm, outputIm, bandsList, blocksize=256, FMT=None, dtype=None)¶ Create a copy of an image with an NDVI band added
- Parameters
inputIm (string) – the input image
bandsList (list) – a list of band indices to be used, e.g. [3, 4] for Sentinel-2 data
FMT (string) – the output gdal format eg ‘Gtiff’, ‘KEA’, ‘HFA’
blocksize (int) – the chunk of raster read in & written out
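The band written is the standard normalised difference. A minimal numpy sketch of the per-block calculation (the function name and the epsilon guard against divide-by-zero are illustrative assumptions, not part of this module):

```python
import numpy as np

def ndvi(red, nir, eps=1e-9):
    """NDVI = (NIR - red) / (NIR + red), computed per pixel."""
    red = np.asarray(red, dtype="float64")
    nir = np.asarray(nir, dtype="float64")
    return (nir - red) / (nir + red + eps)

red = np.array([[50.0, 100.0]])
nir = np.array([[150.0, 100.0]])
print(ndvi(red, nir))  # first pixel 0.5, second 0.0
```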
-
raster.clip_raster(inRas, inShape, outRas, nodata_value=None, blocksize=None, blockmode=True)¶ Clip a raster
- Parameters
inRas (string) – the input image
inShape (string) – the input polygon file path
outRas (string (optional)) – the clipped raster
nodata_value (numerical (optional)) – self explanatory
blocksize (int (optional)) – the square chunk processed at any one time
blockmode (bool (optional)) – whether the raster is clipped entirely in memory or chunk by chunk
-
raster.color_raster(inRas, color_file, output_file)¶ Generate a txt colorfile and make an RGB image from a grayscale one
- Parameters
inRas (string) – Path to input raster (single band greyscale)
color_file (string) – Path to output colorfile.txt
output_file (string) – Path to the output RGB raster
-
raster.combine_scene(scl, c_scn, blocksize=256)¶ Combine another scene classification with the sen2cor one
- Parameters
scl (string) – the sen2cor scene classification
c_scn (string) – the independently derived one – this will be modified
blocksize (int) – the chunk to process
-
raster.hist_match(inputImage, templateImage)¶ Adjust the pixel values of a grayscale image such that its histogram matches that of a target image.
Writes to the inputImage dataset so that it matches the template
As the entire band histogram is required, this can become memory intensive with big rasters, e.g. 10,000 x 10,000+
Inspired by/adapted from something on Stack Exchange image processing – credit to that author
- Parameters
inputImage (string) – image to transform; the histogram is computed over the flattened array
templateImage (string) – template image can have different dimensions to source
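The quantile mapping at the core of histogram matching can be sketched in numpy as follows (an illustrative version, not necessarily this module's exact implementation):

```python
import numpy as np

def hist_match(source, template):
    """Remap source values so their histogram matches the template's."""
    shape = source.shape
    source = source.ravel()
    s_vals, s_idx, s_counts = np.unique(source, return_inverse=True,
                                        return_counts=True)
    t_vals, t_counts = np.unique(template.ravel(), return_counts=True)
    # empirical CDFs (quantiles) of both images
    s_cdf = np.cumsum(s_counts) / source.size
    t_cdf = np.cumsum(t_counts) / template.size
    # place each source quantile at the template value with the same quantile
    matched = np.interp(s_cdf, t_cdf, t_vals)
    return matched[s_idx].reshape(shape)
```

Note the whole-band arrays are needed to build the CDFs, which is why the docstring warns about memory on big rasters.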
-
raster.jp2_translate(folder, FMT=None, mode='L1C')¶ Translate all files from an S2 download to a usable format
Default FMT is GTiff (leave blank); for .img FMT=’HFA’, for .vrt FMT=’VRT’
If you possess a GDAL compiled with the correct OpenJPEG support, use that
This function might be useful if you wish to retain separate rasters, but the use of stack_S2 is recommended
- Parameters
folder (string) – S2 granule dir
mode (string) – ‘L2A’ , ‘20’, ‘10’, L1C (default)
FMT (string (optional)) – a GDAL raster format (see the GDAL website) eg Gtiff, HFA, KEA
-
raster.jp2_translate_batch(mainFolder, FMT=None, mode=None)¶ Batch version of jp2_translate
Perhaps only useful for the old tile format
- Parameters
mainFolder (string) – the path to S2 tile folder to process
FMT (string) – a GDAL raster format (see the GDAL website) eg Gtiff, HFA, KEA
mode (string (optional)) – ‘L2A’ , ‘20’, ‘10’, L1C (default)
-
raster.mask_raster(inputIm, mval, overwrite=True, outputIm=None, blocksize=None, FMT=None)¶ Perform a numpy masking operation on a raster where all values corresponding to mask value are retained - does this in blocks for efficiency on larger rasters
- Parameters
inputIm (string) – the input raster
mval (int) – the mask value eg 1, 2 etc
FMT (string) – the output gdal format eg ‘Gtiff’, ‘KEA’, ‘HFA’
outputIm (string (optional)) – optionally write a separate output image, if None, will mask the input
blocksize (int) – the chunk of raster to read in
- Returns
A string of the output file path
- Return type
string
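Minus the GDAL block reading/writing machinery, the core numpy operation per block amounts to the following (function name is illustrative):

```python
import numpy as np

def mask_vals(arr, mval, outval=0):
    """Retain pixels equal to the mask value; set everything else to outval."""
    return np.where(arr == mval, arr, outval)

block = np.array([[1, 2, 2],
                  [2, 1, 3]])
print(mask_vals(block, 2))  # [[0 2 2] [2 0 0]]
```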
-
raster.mask_raster_multi(inputIm, mval=1, outval=None, mask=None, blocksize=256, FMT=None, dtype=None)¶ Perform a numpy masking operation on a raster where all values corresponding to mask value are retained - does this in blocks for efficiency on larger rasters
- Parameters
inputIm (string) – the input raster
mval (int) – the masking value that delineates pixels to be kept
outval (numerical dtype eg int, float) – the areas removed will be written to this value, default is 0
mask (string) – the mask raster to be used (optional)
FMT (string) – the output gdal format eg ‘Gtiff’, ‘KEA’, ‘HFA’
mode (string) – None > 10m data, ‘20’ > 20m
blocksize (int) – the chunk of raster read in & written out
-
raster.multi_temp_filter(inRas, outRas, bands=None, windowSize=None)¶ The multi-temporal filter for radar data, as outlined & published by Quegan et al., University of Sheffield
This is only suitable for small images, as it holds intermediate data in memory
- Parameters
inRas (string) – the input raster
outRas (string) – the output raster
blocksize (int) – the chunk processed
windowSize (int) – the filter window size
FMT (string) – GDAL-compatible format (optional), default is GTiff
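Quegan et al.'s filter averages intensity ratios across dates: J_i = (mu_i / N) * sum_k I_k / mu_k, where mu is a local mean. A rough numpy sketch, using a naive box filter as a stand-in for the local-mean window (names and window handling are illustrative assumptions):

```python
import numpy as np

def box_mean(img, w):
    """Local mean over a w x w window (edge-padded); a crude stand-in filter."""
    pad = w // 2
    p = np.pad(img.astype("float64"), pad, mode="edge")
    out = np.zeros(img.shape, dtype="float64")
    for dy in range(w):
        for dx in range(w):
            out += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (w * w)

def multi_temp_filter(stack, w=7):
    """J_i = (mu_i / N) * sum_k I_k / mu_k over N co-registered images."""
    n = len(stack)
    means = [box_mean(b, w) for b in stack]
    ratio_sum = np.sum([b / m for b, m in zip(stack, means)], axis=0)
    return [mu / n * ratio_sum for mu in means]
```

Holding all dates plus their local means explains the docstring's note that the in-memory variant only suits small images.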
-
raster.multi_temp_filter_block(inRas, outRas, bands=None, blocksize=256, windowsize=7, FMT=None)¶ Multi temporal filter implementation for radar data
See Quegan et al., for paper
Requires an installation of OTB
- Parameters
inRas (string) – the input raster
outRas (string) – the output raster
blocksize (int) – the chunk processed
windowsize (int) – the filter window size
FMT (string) – GDAL-compatible format (optional), default is GTiff
-
raster.polygonize(inRas, outPoly, outField=None, mask=True, band=1, filetype='ESRI Shapefile')¶ Lifted straight from the cookbook and gdal func docs.
http://pcjericks.github.io/py-gdalogr-cookbook
- Parameters
inRas (string) – the input image
outPoly (string) – the output polygon file path
outField (string (optional)) – the name of the field containing burned values
mask (bool (optional)) – use the input raster as a mask
band (int) – the input raster band
-
raster.raster2array(inRas, bands=[1])¶ Read a raster and return an array, either single or multiband
- Parameters
inRas (string) – input raster
bands (list) – a list of bands to return in the array
-
raster.remove_cloud_S2(inputIm, sceneIm, blocksize=256, FMT=None, min_size=4, dist=1)¶ Remove cloud using a scene classification
This saves back to the input raster by default
- Parameters
inputIm (string) – the input image
sceneIm (string) – the scenemap to use as a mask for removing cloud It is assumed the scene map consists of 1 shadow, 2 cloud, 3 land, 4 water
FMT (string) – the output gdal format eg ‘Gtiff’, ‘KEA’, ‘HFA’
min_size (int) – size in pixels to retain of cloud mask
blocksize (int) – the square chunk processed at any one time
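With the scene-map convention above (1 shadow, 2 cloud, 3 land, 4 water), the masking step reduces to the following (the morphological treatment governed by min_size/dist is omitted; names are illustrative):

```python
import numpy as np

def remove_cloud(image, scene, cloud_classes=(1, 2)):
    """Zero image pixels flagged as shadow (1) or cloud (2) in the scene map."""
    out = image.copy()
    out[np.isin(scene, cloud_classes)] = 0
    return out

img = np.array([[10, 11], [12, 13]])
scene = np.array([[1, 3], [4, 2]])
print(remove_cloud(img, scene))  # [[ 0 11] [12  0]]
```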
-
raster.remove_cloud_S2_stk(inputIm, sceneIm1, sceneIm2=None, baseIm=None, blocksize=256, FMT=None, max_size=10, dist=1)¶ Remove cloud using the c_utils scene classification. The KEA format is recommended; .tif is the default.
There is no need to add the file extension – this is done automatically
- Parameters
inputIm (string) – the input image
sceneIm1, 2 (string) – the classification rasters used to mask out the areas in the input image
baseIm (string) – another multiband raster of the same size/extent as inputIm, whose values are used rather than simply converting to zero (in the use case of 2 sceneIm classifications)
- Returns
None
Notes
Useful if you have a base image which is a cloudless composite, which you intend to replace with the current image for the next round of classification/change detection
-
raster.rgb_ind(inputIm, outputIm, blocksize=256, FMT=None, dtype=5)¶ Create a copy of an image with RGB-index bands added
- Parameters
inputIm (string) – the input rgb image
outputIm (string) – the output image
FMT (string) – the output gdal format eg ‘Gtiff’, ‘KEA’, ‘HFA’
blocksize (int) – the chunk of raster read in & written out
-
raster.stack_S2(granule, inFMT='jp2', FMT=None, mode=None, old_order=False, blocksize=2048, overwrite=True)¶ Stacks S2 bands downloaded from ESA site
Can translate directly from jp2 format (this is recommended and is default).
If you possess gdal 2.1 with jp2k support then alternatively use gdal_translate
- Parameters
granule (string) – the granule folder
inFMT (string (optional)) – the format of the bands will likely be jp2
FMT (string (optional)) – the output gdal format eg ‘Gtiff’, ‘KEA’, ‘HFA’
mode (string (optional)) – None, ‘10’ ‘20’
old_order (bool (optional)) – if True, order the 20 m bands 2,3,4,5,6,7,11,12,8a (the old ordering); if False, order them 2,3,4,5,6,7,8a,11,12
blocksize (int (optional)) – the chunk of jp2 to read in - glymur seems to work fastest with 2048
- Returns
A string of the output file path
- Return type
string
-
raster.stack_ras(rasterList, outFile)¶ Stack some rasters for change classification
- Parameters
rasterList (list of strings) – the input images
outFile (string) – the output file path including file extension
-
raster.stat_comp(inRas, outMap, bandList=None, stat='percentile', q=95, blocksize=256, FMT=None, dtype=6)¶ Calculate depth wise stat on a multi band raster with selected or all bands
- Parameters
inRas (string) – input Raster
outMap (string) – the output raster calculated
stat (string) – the statistic to be calculated; make sure there are no NaNs, as nanpercentile is far too slow
blocksize (int) – the chunk processed
q (int) – the ith percentile if percentile is the stat used
FMT (string) – GDAL-compatible format (optional), default is GTiff
dtype (int) – a GDAL datatype (the default, 6, is gdal.GDT_Float32)
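The depth-wise statistic is taken along the band axis; a numpy sketch of the per-block computation (function name is illustrative):

```python
import numpy as np

def depth_stat(stack, stat="percentile", q=95):
    """Statistic along axis 0 of a (bands, rows, cols) stack."""
    if stat == "percentile":
        return np.percentile(stack, q, axis=0)
    return getattr(np, stat)(stack, axis=0)

stack = np.stack([np.full((2, 2), v) for v in (1.0, 5.0, 9.0)])
print(depth_stat(stack, "mean"))   # 5.0 everywhere
print(depth_stat(stack, q=50))     # the 50th percentile (median): 5.0 everywhere
```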
-
raster.temporal_comp(fileList, outMap, stat='percentile', q=95, folder=None, blocksize=None, FMT=None, dtype=5)¶ Calculate an image based on a time-series collection of imagery (e.g. a year’s worth of S2 data)
- Parameters
fileList (list of strings) – the files to be input; if None, a folder must be specified
outMap (string) – the output raster calculated
stat (string) – the statistic to be calculated
blocksize (int) – the chunk processed
q (int) – the ith percentile if percentile is the stat used
FMT (string) – GDAL-compatible format (optional), default is GTiff
dtype (int) – a GDAL datatype (default gdal.GDT_Int32)
-
raster.tile_rasters(inImage, outputImage, tilesize)¶ Split a large raster into smaller ones
- Parameters
inImage (string) – the path to input raster
outputImage (string) – the path to the output image
tilesize (int) – the side of a square tile
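At the array level, tiling is plain slicing; a minimal sketch (GDAL windowed reads/writes and per-tile georeferencing omitted):

```python
import numpy as np

def tile_array(arr, tilesize):
    """Split a 2-D array into square tiles; edge tiles may be smaller."""
    rows, cols = arr.shape
    return [arr[r:r + tilesize, c:c + tilesize]
            for r in range(0, rows, tilesize)
            for c in range(0, cols, tilesize)]

tiles = tile_array(np.arange(16).reshape(4, 4), 2)
print(len(tiles), tiles[0].shape)  # 4 tiles, each 2 x 2
```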
geospatial_learn.learning module¶
the learning module
Description¶
The learning module’s set of functions provides a framework to optimise and classify EO data, for both per-pixel and object-based properties
-
learning.RF_oob_opt(model, X_train, min_est, max_est, step, regress=False)¶ This function uses the oob score to find the best parameters.
This cannot be parallelised due to the warm-start bootstrapping, so it is potentially slower than the other cross-validation in the create_model function
This function is based on an example from the sklearn site
This function plots a graph displaying the oob error rate
- Parameters
model (string (.gz)) – path to model to be saved
X_train (np array) – numpy array of training data where the 1st column is labels
min_est (int) – min no of trees
max_est (int) – max no of trees
step (int) – the step at which no of trees is increased
regress (bool) – boolean where if True it is a regressor
- Returns
A tuple of np arrays: error rate, best estimator
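The warm-start / OOB pattern from the sklearn example the docstring refers to looks roughly like this (the synthetic dataset and estimator range are placeholders):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, random_state=0)

clf = RandomForestClassifier(warm_start=True, oob_score=True, random_state=0)
errors = []
for n in range(15, 61, 15):            # min_est, max_est, step
    clf.set_params(n_estimators=n)
    clf.fit(X, y)                      # warm start grows the existing forest
    errors.append((n, 1.0 - clf.oob_score_))

best_n = min(errors, key=lambda t: t[1])[0]
```

Because each fit reuses the previous trees, the loop cannot run the fits in parallel, which is the limitation the docstring mentions.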
-
learning.classify_object(model, inShape, attributes, field_name=None)¶ Classify a polygon/point file attributes (‘object based’) using an sklearn model
- Parameters
model (string) – path to input model
inShape (string) – input shapefile path (must be .shp for now….)
attributes (list of strings) – list of attribute names
field_name (string) – name of classified label field (optional)
-
learning.classify_pixel(model, inputDir, bands, outMap, probMap)¶ A function to classify an image using a pre-saved model - assumes a folder of tiled rasters for memory management - classify_pixel_block is recommended instead of this function
- Parameters
model (sklearn model) – a path to a scikit learn model that has been saved
inputDir (string) – a folder with images to be classified
bands (int) – the no of image bands eg 8
outMap (string) – path to output image excluding the file format ‘pathto/mymap’
probMap (string) – path to output prob image excluding the file format ‘pathto/mymap’
FMT (string) – optional parameter – a gdal readable fmt
-
learning.classify_pixel_bloc(model, inputImage, bands, outMap, blocksize=None, FMT=None, ndvi=None, dtype=5)¶ A block processing classifier for large rasters, supports KEA, HFA, & Gtiff formats. KEA is recommended, Gtiff is the default
- Parameters
model (sklearn model) – a path to a scikit learn model that has been saved
inputImage (string) – path to image including the file fmt ‘Myimage.tif’
bands (band) – the no of image bands eg 8
outMap (string) – path to output image excluding the file format ‘pathto/mymap’
FMT (string) – optional parameter - gdal readable fmt
blocksize (int (optional)) – size of raster chunk in pixels; 256 tends to be quickest. If you put None it will read the size from gdal (this doesn’t always pay off!)
dtype (int (optional – gdal syntax gdal.GDT_Int32)) – a gdal datatype – default is int32
Notes
Block processing is sequential, but quite a few sklearn models are parallel so that has been prioritised rather than raster IO
-
learning.create_model(X_train, outModel, clf='svc', random=False, cv=6, cores=-1, strat=True, regress=False, params=None, scoring=None)¶ Brute-force or random model creation using scikit-learn. Either use the default params in this function or enter your own (recommended – see sklearn)
- Parameters
X_train (np array) – numpy array of training data where the 1st column is labels
outModel (string) – the output model path which is a gz file
clf (string) – an sklearn or xgb classifier/regressor logit, sgd, linsvc, svc, svm, nusvm, erf, rf, gb, xgb
random (bool) – if True, a random param search
cv (int) – no of folds
cores (int or -1 (default)) – the no of parallel jobs
strat (bool) – a stratified grid search
regress (bool) – a regression model if True, a classifier if False
params (a dict of model params (see scikit learn)) – enter your own params dict rather than the range provided
scoring (string) – a suitable sklearn scoring type (see notes)
There are more sophisticated ways to tune a model; this greedily searches everything but can be computationally costly. Fine tuning in a more measured way is likely better. There are numerous books, guides etc. E.g. with gb – first tune the no of trees, then the learning rate, then tree-specific params
From my own experience and reading around:
sklearn SVMs tend not to be great on large training sets and are slower with these (I have tried on HPCs and they time out on multi fits)
sklearn ‘gb’ is very slow to train, though quick to predict
xgb is much faster, but rather different in algorithmic detail – i.e. it won’t produce the same results as sklearn…
xgb also uses the sklearn wrapper params, which differ from those in the xgb docs, hence they are commented next to the relevant area of code
Scoring types – there are a lot, some of which won’t work for multi-class, regression etc. – see the sklearn docs!
‘accuracy’, ‘adjusted_rand_score’, ‘average_precision’, ‘f1’, ‘f1_macro’, ‘f1_micro’, ‘f1_samples’, ‘f1_weighted’, ‘neg_log_loss’, ‘neg_mean_absolute_error’, ‘neg_mean_squared_error’, ‘neg_median_absolute_error’, ‘precision’, ‘precision_macro’, ‘precision_micro’, ‘precision_samples’, ‘precision_weighted’, ‘r2’, ‘recall’, ‘recall_macro’, ‘recall_micro’, ‘recall_samples’, ‘recall_weighted’, ‘roc_auc’
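The brute-force path corresponds to an sklearn grid search; a compact sketch with an SVC (the dataset and parameter grid are placeholders, and persisting the result to a .gz path, as the outModel param describes, would use joblib):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.1]}
search = GridSearchCV(SVC(), param_grid,
                      cv=StratifiedKFold(3), scoring="accuracy")
search.fit(X, y)
# joblib.dump(search.best_estimator_, "outModel.gz")  # persist as in create_model
```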
-
learning.create_model_tpot(X_train, outModel, cv=6, cores=-1, regress=False, params=None, scoring=None)¶ Create a model using the tpot library, where genetic algorithms are used to optimise the pipeline and params.
This also supports xgboost incidentally
- Parameters
X_train (np array) – numpy array of training data where the 1st column is labels
outModel (string) – the output model path (which is a .py file) from which to run the pipeline
cv (int) – no of folds
cores (int or -1 (default)) – the no of parallel jobs
strat (bool) – a stratified grid search
regress (bool) – a regression model if True, a classifier if False
params (a dict of model params (see tpot)) – enter your own params dict rather than the range provided
scoring (string) – a suitable sklearn scoring type (see notes)
-
learning.get_training(inShape, inRas, bands, field, outFile=None)¶ Collect training as an np array for use with create model function
- Parameters
inShape (string) – the input shapefile - must be esri .shp at present
inRas (string) – the input raster from which the training is extracted
bands (int) – no of bands
field (string) – the attribute field containing the training labels
outFile (string (optional)) – path to the training file saved as joblib format (eg - ‘training.gz’)
- Returns
A tuple containing
-np array of training data
-list of polygons with invalid geometry that were not collected
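Once the polygons have been rasterised to a label array, the extraction step pairs labels with band values; a numpy sketch (assumes the labels are already rasterised, with 0 meaning background; names are illustrative):

```python
import numpy as np

def extract_training(label_arr, image_stack):
    """Return array whose 1st column is labels, remaining columns band values."""
    mask = label_arr > 0                 # 0 = no training data
    labels = label_arr[mask]
    feats = image_stack[:, mask].T       # (n_pixels, n_bands)
    return np.column_stack([labels, feats])

labels = np.array([[1, 0],
                   [0, 2]])
stack = np.arange(8).reshape(2, 2, 2)    # 2 bands of a 2 x 2 image
print(extract_training(labels, stack))   # [[1 0 4] [2 3 7]]
```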
-
learning.get_training_point(inShape, inRas, bands, field)¶ Collect training as an np array, for use with the create_model function, using point data
- Parameters
inShape (string) – the input shapefile - must be esri .shp at present
inRas (string) – the input raster from which the training is extracted
bands (int) – no of bands
field (string) – the attribute field containing the training labels
outFile (string (optional)) – path to the training file saved as joblib format (eg - ‘training.gz’)
- Returns
A tuple containing
-np array of training data
-list of polygons with invalid geometry that were not collected
UNFINISHED DO NOT USE
-
learning.get_training_shp(inShape, label_field, feat_fields, outFile=None)¶ Collect training from a shapefile attribute table. Used for object-based classification (typically).
- Parameters
inShape (string) – the input shapefile - must be esri .shp at present
label_field (string) – the field name for the class labels
feat_fields (list) – the field names of the feature data
outFile (string (optional)) – path to training data to be saved (.gz)
- Returns
training data as a dataframe, first column is labels, rest are features
list of reject features
-
learning.plot_feature_importances(modelPth, featureNames)¶ Plot the feature importances of an ensemble classifier
- Parameters
modelPth (string) – A sklearn model path
featureNames (list of strings) – a list of feature names
-
learning.prob_pixel_bloc(model, inputImage, bands, outMap, classes, blocksize=None, FMT=None, one_class=None)¶ A block processing classifier for large rasters that produces a probability output.
Supports KEA, HFA, & Gtiff formats – KEA is recommended, Gtiff is the default
- Parameters
model (string) – a path to a scikit learn model that has been saved
inputImage (string) – path to image including the file fmt ‘Myimage.tif’
bands (int) – the no of image bands eg 8
outMap (string) – path to output image excluding the file format ‘pathto/mymap’
classes (int) – no of classes
blocksize (int (optional)) – size of raster chunk; 256 tends to be quickest. If you put None it will read the size from gdal (this doesn’t always pay off!)
FMT (string) – optional parameter - gdal readable fmt eg ‘Gtiff’
one_class (int) – choose a single class to produce output prob raster
Block processing is sequential, but quite a few sklearn models are parallel so that has been prioritised rather than raster IO
-
learning.rmse_vector_lyr(inShape, attributes)¶ Using sklearn get the rmse of 2 vector attributes (the actual and predicted of course in the order [‘actual’, ‘pred’])
- Parameters
inShape (string) – the input vector of OGR type
attributes (list) – a list of strings denoting the attributes
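The underlying calculation on the two attribute columns is just the root mean squared error:

```python
import numpy as np

def rmse(actual, pred):
    """Root mean squared error between two value arrays."""
    actual = np.asarray(actual, dtype="float64")
    pred = np.asarray(pred, dtype="float64")
    return np.sqrt(np.mean((actual - pred) ** 2))

print(rmse([1, 2, 3], [1, 2, 5]))  # sqrt(4/3) ~= 1.1547
```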
geospatial_learn.shape module¶
The shape module.
Description¶
This module contains various functions for the writing of data in OGR vector formats. The functions are mainly concerned with writing geometric or pixel based attributes, with the view to them being classified in the learning module
-
shape.meshgrid(inRaster, outShp, gridHeight=1, gridWidth=1)¶
-
shape.ms_snake(inShp, inRas, outShp, band=2, buf1=0, buf2=0, algo='ACWE', nodata_value=0, iterations=200, smoothing=1, lambda1=1, lambda2=1, threshold='auto', balloon=-1)¶ Deform a polygon using active contours on the values of an underlying raster.
This uses morphsnakes and explanations are from there.
- Parameters
inShp (string) – input shapefile
inRas (string) – input raster
outShp (string) – output shapefile
band (int) – an integer val eg - 2
algo (string) – either “GAC” (geodesic active contours) or the default “ACWE” (active contours without edges)
buf1 (int) – the buffer if any in map units for the bounding box of the poly which extracts underlying pixel values.
buf2 (int) – the buffer, if any, in map units for the expansion or contraction of the poly which will initialise the active contour. This is here as you may wish to adjust the init polygon so it does not converge on an adjacent one or an undesired area.
nodata_value (numerical) – If used the no data val of the raster
iterations (uint) – Number of iterations to run.
smoothing (uint, optional) – Number of times the smoothing operator is applied per iteration. Reasonable values are around 1-4. Larger values lead to smoother segmentations.
lambda1 (float, optional) – Weight parameter for the outer region. If lambda1 is larger than lambda2, the outer region will contain a larger range of values than the inner region.
lambda2 (float, optional) – Weight parameter for the inner region. If lambda2 is larger than lambda1, the inner region will contain a larger range of values than the outer region.
threshold (float, optional) – Areas of the image with a value smaller than this threshold will be considered borders. The evolution of the contour will stop in these areas.
balloon (float, optional) – Balloon force to guide the contour in non-informative areas of the image, i.e., areas where the gradient of the image is too small to push the contour towards a border. A negative value will shrink the contour, while a positive value will expand the contour in these areas. Setting this to zero will disable the balloon force.
-
shape.ransac_lines(inRas, outRas, sigma=3, row=True, col=True, binwidth=40)¶
-
shape.shape_props(inShape, prop, inRas=None, label_field='ID')¶ Calculate various geometric properties of a set of polygons Output will be relative to geographic units where relevant, but normalised where not (eg Eccentricity)
- Parameters
inShape (string) – input shape file path
inRas (string (optional)) – a raster to get the correct dimensions from, required for the scikit-image props
prop (string) – a scikit-image regionprops prop (see http://scikit-image.org/docs/dev/api/skimage.measure.html)
OGR is used to generate most of these as it is faster, but the string keys are the same as in scikit-image – see notes for which require a raster
Notes
Only a shapefile is needed (OGR / shapely / numpy based):
‘MajorAxisLength’, ‘MinorAxisLength’, ‘Area’, ‘Eccentricity’, ‘Solidity’, ‘Extent’, ‘Perimeter’ (written as ‘Perim’)
Raster required:
‘Orientation’ and the remainder of the props calculable with scikit-image. These process a bit slower than the above ones
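For the shapefile-only props, area and perimeter of a polygon ring follow directly from its vertices; an illustrative shoelace implementation (not the module's own code):

```python
import numpy as np

def area_perimeter(ring):
    """Shoelace area and perimeter of a closed ring of (x, y) vertices."""
    ring = np.asarray(ring, dtype="float64")
    x, y = ring.T
    area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    closed = np.vstack([ring, ring[:1]])          # close the ring
    perim = np.sum(np.hypot(*np.diff(closed, axis=0).T))
    return area, perim

print(area_perimeter([(0, 0), (1, 0), (1, 1), (0, 1)]))  # (1.0, 4.0)
```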
-
shape.shp2gj(inShape, outJson)¶ Converts a shapefile to geojson/json
- Parameters
inShape (string) – input shapefile
outJson (string) – output geojson
Notes
Credit to person who posted this on the pyshp site
-
shape.snake(inShp, inRas, outShp, band=1, buf=1, nodata_value=0, boundary='fixed', alpha=0.1, beta=30.0, w_line=0, w_edge=0, gamma=0.01, max_iterations=2500, smooth=True, eq=False, rgb=False)¶ Deform a line using active contours based on the values of an underlying raster – based on skimage at present, so not quick!
Notes
Param explanations for snake/active contour from scikit-image api
- Parameters
inShp (string) – input shapefile
inRas (string) – input raster
band (int) – an integer val eg - 2
buf (int) – the buffer area to include for the snake deformation
alpha (float) – Snake length shape parameter. Higher values makes snake contract faster.
beta (float) – Snake smoothness shape parameter. Higher values makes snake smoother.
w_line (float) – Controls attraction to brightness. Use negative values to attract toward dark regions.
w_edge (float) – Controls attraction to edges. Use negative values to repel snake from edges.
gamma (float) – Explicit time stepping parameter.
max_iterations (int) – No of iterations to evolve snake
boundary (string) – Scikit-image text: Boundary conditions for the contour. Can be one of ‘periodic’, ‘free’, ‘fixed’, ‘free-fixed’, or ‘fixed-free’. ‘periodic’ attaches the two ends of the snake, ‘fixed’ holds the end-points in place, and ‘free’ allows free movement of the ends. ‘fixed’ and ‘free’ can be combined by parsing ‘fixed-free’, ‘free-fixed’. Parsing ‘fixed-fixed’ or ‘free-free’ yields same behaviour as ‘fixed’ and ‘free’, respectively.
nodata_value (numerical) – If used the no data val of the raster
rgb (bool) – read in bands 1-3 assuming them to be RGB
-
shape.texture_stats(vector_path, raster_path, band, gprop='contrast', offset=2, angle=0, write_stat=None, nodata_value=0, mean=False)¶ Calculate and optionally write texture stats for an OGR compatible polygon based on underlying raster values
- Parameters
vector_path (string) – input shapefile
raster_path (string) – input raster path
gprop (string) – a skimage GLCM property: entropy, contrast, dissimilarity, homogeneity, ASM, energy, correlation
offset (int) – distance in pixels to measure – minimum of 2!
angle (int) – angle in degrees from the pixel: 0 (right), 45, 90 or 135
mean (bool) – take the mean of all offsets
Note that the results will be unreliable for GLCM texture features if seg is True, as non-masked values will be zero or some weird no-data value and will affect the results
Notes
Important
The texture of the bounding box is at present the “reliable” measure
Using the segment only results in potentially spurious values, due to the scikit-image algorithm measuring texture over zero/no-data to number transitions (e.g. 0 > 54). The segment part will be developed in due course to overcome this issue
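A GLCM counts pixel-pair occurrences at a given offset/angle; the library presumably relies on skimage's implementation, but the idea fits in a few numpy lines (contrast shown as the example property; names and the single-offset handling are illustrative):

```python
import numpy as np

def glcm_contrast(img, dx=1, dy=0, levels=4):
    """Normalised co-occurrence matrix for one offset, plus its contrast."""
    P = np.zeros((levels, levels))
    rows, cols = img.shape
    for r in range(rows - dy):
        for c in range(cols - dx):
            P[img[r, c], img[r + dy, c + dx]] += 1
    P /= P.sum()
    i, j = np.indices(P.shape)
    return P, float(np.sum(P * (i - j) ** 2))

img = np.array([[0, 0, 1],
                [0, 0, 1],
                [2, 2, 3]])
P, contrast = glcm_contrast(img)
print(contrast)  # 0.5
```

The zero/no-data problem the notes describe is visible here: any masked-out pixels set to 0 would be counted as real grey-level pairs and skew P.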
-
shape.thresh_seg(inShp, inRas, outShp, band, buf=0, algo='otsu', min_area=4, nodata_value=0)¶ Use an image processing technique to threshold foreground and background in a polygon segment.
The default is Otsu’s method.
- Parameters
inShp (string) – input shapefile
inRas (string) – input raster
band (int) – an integer val eg - 2
algo (string) – ‘otsu’, ‘niblack’ or ‘sauvola’
nodata_value (numerical) – If used the no data val of the raster
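Otsu's method picks the threshold that maximises between-class variance of the pixel histogram; a self-contained numpy version for illustration (skimage's threshold_otsu is presumably what the library uses):

```python
import numpy as np

def otsu_threshold(img, nbins=256):
    """Threshold maximising between-class variance (Otsu's method)."""
    hist, edges = np.histogram(img.ravel(), bins=nbins)
    hist = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)                    # background class probability
    w1 = 1.0 - w0                           # foreground class probability
    mu = np.cumsum(hist * centers)
    # between-class variance; guard against empty classes
    between = (mu[-1] * w0 - mu) ** 2 / np.where(w0 * w1 > 0, w0 * w1, np.inf)
    return centers[np.argmax(between)]

img = np.concatenate([np.full(50, 10.0), np.full(50, 200.0)])
t = otsu_threshold(img)  # lands between the two modes
```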
-
shape.write_text_field(inShape, fieldName, attribute)¶ Write a string to an OGR vector file
- Parameters
inShape (string) – input OGR vector file
fieldName (string) – name of field being written
attribute (string) – ‘text to enter in each entry of column’
-
shape.zonal_rgb_idx(vector_path, raster_path, nodata_value=0)¶ Calculate RGB-based indices per segment/AOI
- Parameters
vector_path (string) – input shapefile
raster_path (string) – input raster
nodata_value (numerical) – If used the no data val of the raster
-
shape.zonal_stats(vector_path, raster_path, band, bandname, stat='mean', write_stat=None, nodata_value=0)¶ Calculate zonal stats for an OGR polygon file
- Parameters
vector_path (string) – input shapefile
raster_path (string) – input raster
band (int) – an integer val eg - 2
bandname (string) – eg - blue
stat (string) – a stat to calculate; if omitted it will be ‘mean’. Others: ‘mode’, ‘min’, ‘max’, ‘std’, ‘sum’, ‘count’, ‘var’, ‘skew’, ‘kurt’
write_stat (bool (optional)) – If True, stat will be written to OGR file, if false, dataframe only returned (bool)
nodata_value (numerical) – If used the no data val of the raster
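Per-zone statistics reduce to grouping pixel values by zone id once the polygons are rasterised; a numpy sketch of the 'mean' case using bincount (names are illustrative):

```python
import numpy as np

def zonal_mean(values, zones):
    """Mean pixel value per integer zone id (arrays of identical shape)."""
    z = zones.ravel()
    sums = np.bincount(z, weights=values.ravel())
    counts = np.bincount(z)
    return {int(i): sums[i] / counts[i] for i in np.unique(z)}

zones = np.array([[1, 1], [2, 2]])
vals = np.array([[10.0, 20.0], [30.0, 50.0]])
print(zonal_mean(vals, zones))  # {1: 15.0, 2: 40.0}
```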
-
shape.zonal_stats_all(vector_path, raster_path, bandnames, statList=['mean', 'min', 'max', 'median', 'std', 'var', 'skew', 'kurt'])¶ Calculate zonal stats for an OGR polygon file
- Parameters
vector_path (string) – input shapefile
raster_path (string) – input raster
band (int) – an integer val eg - 2
bandnames (list) – eg - [‘b’,’g’,’r’,’nir’]
nodata_value (numerical) – If used the no data val of the raster
geospatial_learn.utilities module¶
Created on Thu Sep 8 22:35:39 2016 @author: Ciaran Robb The utilities module - things here don’t have an exact theme or home yet so may eventually move elsewhere
If you use code to publish work cite/acknowledge me and authors of libs etc as appropriate
-
utilities.accum_gabor(inRas, outRas=None, size=(9, 9), stdv=1, no_angles=16, wave_length=3, eccen=1, phase_off=0, pltgrid=(4, 4), blockproc=False)¶ Process with custom Gabor filters and output a raster containing each kernel output as a band
- Parameters
inRas (string) – input raster
outRas (string) – output raster
size (tuple) – size of in gabor kernel in pixels (ksize)
stdv (int) – the standard deviation of the gabor kernel (sigma/stdv)
no_angles (int) – number of angles in gabor kernel (theta)
wave_length (int) – width of stripe in gabor kernel (lambda/wavelength)
phase_off (int) – the phase offset of the kernel
eccen (int) – the ellipticity of the kernel; when = 1 the gaussian envelope is circular
blockproc (bool) – whether to process in chunks – necessary for very large images!
-
utilities.colorscale(seg, prop)¶
-
utilities.combine_hough_seg(inRas1, inRas2, outRas, outShp, min_area=None)¶
-
utilities.get_corners(bboxes)¶ Get corners of bounding boxes
- Parameters
bboxes (numpy.ndarray) – Numpy array containing bounding boxes of shape N X 4 where N is the number of bounding boxes and the bounding boxes are represented in the format x1 y1 x2 y2
- Returns
Numpy array of shape N x 8 containing N bounding boxes each described by their corner co-ordinates x1 y1 x2 y2 x3 y3 x4 y4
- Return type
numpy.ndarray
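Expanding x1 y1 x2 y2 boxes to explicit corner coordinates is a column shuffle (the corner ordering below is an assumption, following the x1 y1 … x4 y4 format described above):

```python
import numpy as np

def get_corners(bboxes):
    """N x 4 boxes (x1 y1 x2 y2) -> N x 8 corner coordinates."""
    x1, y1, x2, y2 = bboxes.T
    return np.stack([x1, y1, x2, y1, x1, y2, x2, y2], axis=1)

boxes = np.array([[0, 0, 2, 3]])
print(get_corners(boxes))  # [[0 0 2 0 0 3 2 3]]
```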
-
utilities.get_enclosing_box(corners)¶ Get an enclosing box for rotated corners of a bounding box
- Parameters
corners (numpy.ndarray) – Numpy array of shape N x 8 containing N bounding boxes each described by their corner co-ordinates x1 y1 x2 y2 x3 y3 x4 y4
- Returns
Numpy array containing enclosing bounding boxes of shape N X 4 where N is the number of bounding boxes and the bounding boxes are represented in the format x1 y1 x2 y2
- Return type
numpy.ndarray
-
utilities.hough2line(inRas, outShp, edge='canny', sigma=2, thresh=None, ratio=2, n_orient=6, n_scale=5, hArray=True, vArray=True, prob=False, line_length=100, line_gap=200, valrange=1, interval=10, band=2, min_area=None)¶
-
utilities.image_thresh(image)¶
-
utilities.iter_ransac(image, sigma=3, no_iter=10, order='col', mxt=2500)¶
-
utilities.min_bound_rectangle(points)¶ Find the smallest bounding rectangle for a set of points. Returns a set of points representing the corners of the bounding box.
- Parameters
points (list) – an nx2 iterable of points
- Returns
an nx2 list of coordinates
- Return type
list
-
utilities.ms_toposeg(inRas, outShp, iterations=100, algo='ACWE', band=2, dist=30, se=3, usemin=False, imtype=None, useedge=True, burnedge=False, merge=False, close=True, sigma=4, hi_t=None, low_t=None, init=4, smooth=1, lambda1=1, lambda2=1, threshold='auto', balloon=1)¶ Topology-preserving segmentation, implemented in python/numpy, inspired by ms_topo and morphsnakes
This uses morphsnakes level sets to make the segments and param explanations are mainly from there.
- Parameters
inRas (string) – input raster whose pixel vals will be used
outShp (string) – the output shapefile
band (int) – an integer val eg - 2
algo (string) – either “GAC” (geodesic active contours) or “ACWE” (active contours without edges)
sigma (float) – the stdv defining the gaussian envelope if using the canny edge detector; a unitless value
iterations (uint) – Number of iterations to run.
smooth (uint, optional) – Number of times the smoothing operator is applied per iteration. Reasonable values are around 1-4. Larger values lead to smoother segmentations.
lambda1 (float, optional) – Weight parameter for the outer region. If lambda1 is larger than lambda2, the outer region will contain a larger range of values than the inner region.
lambda2 (float, optional) – Weight parameter for the inner region. If lambda2 is larger than lambda1, the inner region will contain a larger range of values than the outer region.
threshold (float, optional) – Areas of the image with a value smaller than this threshold will be considered borders. The evolution of the contour will stop in these areas.
balloon (float, optional) – Balloon force to guide the contour in non-informative areas of the image, i.e., areas where the gradient of the image is too small to push the contour towards a border. A negative value will shrink the contour, while a positive value will expand the contour in these areas. Setting this to zero will disable the balloon force.
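The balloon force described above can be pictured as a morphological dilation (positive balloon, expand) or erosion (negative balloon, shrink) of the binary level set at each iteration. A toy numpy sketch of one such step, not the library's or morphsnakes' actual implementation:

```python
import numpy as np

def balloon_step(levelset, balloon):
    """One toy balloon-force step on a binary level set: dilate the
    foreground when balloon > 0, erode it when balloon < 0, and
    leave it unchanged when balloon == 0."""
    if balloon == 0:
        return levelset.copy()
    padded = np.pad(levelset, 1, mode='edge')
    # stack the 3x3 neighbourhood of every pixel
    shifts = [padded[i:i + levelset.shape[0], j:j + levelset.shape[1]]
              for i in range(3) for j in range(3)]
    stacked = np.stack(shifts)
    # max over the neighbourhood = dilation; min = erosion
    return stacked.max(axis=0) if balloon > 0 else stacked.min(axis=0)
```

For example, a single foreground pixel grows into a 3x3 block under a positive balloon and vanishes under a negative one.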
-
utilities.ms_toposnakes(inSeg, inRas, outShp, iterations=100, algo='ACWE', band=2, sigma=4, alpha=100, smooth=1, lambda1=1, lambda2=1, threshold='auto', balloon=-1)¶ Topology preserving morphsnakes, implemented in python/numpy exclusively by C.Robb
This uses morphsnakes; the explanations are from there.
- Parameters
inSeg (string) – input segmentation raster
inRas (string) – input raster whose pixel vals will be used
band (int) – an integer val eg - 2
algo (string) – either “GAC” (geodesic active contours) or “ACWE” (active contours without edges)
sigma (float) – the stdv defining the gaussian envelope if using the canny edge detector; a unitless value
iterations (uint) – Number of iterations to run.
smooth (uint, optional) – Number of times the smoothing operator is applied per iteration. Reasonable values are around 1-4. Larger values lead to smoother segmentations.
lambda1 (float, optional) – Weight parameter for the outer region. If lambda1 is larger than lambda2, the outer region will contain a larger range of values than the inner region.
lambda2 (float, optional) – Weight parameter for the inner region. If lambda2 is larger than lambda1, the inner region will contain a larger range of values than the outer region.
threshold (float, optional) – Areas of the image with a value smaller than this threshold will be considered borders. The evolution of the contour will stop in these areas.
balloon (float, optional) – Balloon force to guide the contour in non-informative areas of the image, i.e., areas where the gradient of the image is too small to push the contour towards a border. A negative value will shrink the contour, while a positive value will expand the contour in these areas. Setting this to zero will disable the balloon force.
-
utilities.ms_toposnakes2(inSeg, inRas, outShp, iterations=100, algo='ACWE', band=2, sigma=4, smooth=1, lambda1=1, lambda2=1, threshold='auto', balloon=-1)¶ Topology preserving morphsnakes, implemented by Jirka Borovec, a version with C++/cython elements - credit to him!
This is memory intensive, so large images will likely fill RAM; it produces similar results to ms_toposnakes.
This uses morphsnakes; the explanations are from there.
- Parameters
inSeg (string) – input segmentation raster
inRas (string) – input raster whose pixel vals will be used
band (int) – an integer val eg - 2
algo (string) – either “GAC” (geodesic active contours) or “ACWE” (active contours without edges)
sigma (float) – the stdv defining the gaussian envelope if using the canny edge detector; a unitless value
iterations (uint) – Number of iterations to run.
smooth (uint, optional) – Number of times the smoothing operator is applied per iteration. Reasonable values are around 1-4. Larger values lead to smoother segmentations.
lambda1 (float, optional) – Weight parameter for the outer region. If lambda1 is larger than lambda2, the outer region will contain a larger range of values than the inner region.
lambda2 (float, optional) – Weight parameter for the inner region. If lambda2 is larger than lambda1, the inner region will contain a larger range of values than the outer region.
threshold (float, optional) – Areas of the image with a value smaller than this threshold will be considered borders. The evolution of the contour will stop in these areas.
balloon (float, optional) – Balloon force to guide the contour in non-informative areas of the image, i.e., areas where the gradient of the image is too small to push the contour towards a border. A negative value will shrink the contour, while a positive value will expand the contour in these areas. Setting this to zero will disable the balloon force.
-
utilities.otbMeanshift(inputImage, radius, rangeF, minSize, outShape)¶ OTB meanshift by calling the otb command line. Written for convenience, as the otb python api is rather verbose
There is a maximum size for the .shp format (2gb) that otb does not seem to want to move beyond, so enormous rasters may need to be subdivided
You will need to install OTB separately
- Parameters
inputImage (string) – the input image
radius (int) – the kernel radius
rangeF (int) – the kernel range
minSize (int) – minimum segment size
outShape (string) – the output shapefile
-
utilities.ragmerge(inSeg, inRas, outShp, band, thresh=0.02)¶
-
utilities.raster2array(inRas, bands=[1])¶ Read a raster and return an array, either single or multiband
- Parameters
inRas (string) – input raster
bands (list) – a list of bands to return in the array
-
utilities.rotate_box(corners, angle, cx, cy, h, w)¶ Rotate the bounding box.
- Parameters
corners (numpy.ndarray) – Numpy array of shape N x 8 containing N bounding boxes each described by their corner co-ordinates x1 y1 x2 y2 x3 y3 x4 y4
angle (float) – angle by which the image is to be rotated
cx (int) – x coordinate of the center of image (about which the box will be rotated)
cy (int) – y coordinate of the center of image (about which the box will be rotated)
h (int) – height of the image
w (int) – width of the image
- Returns
Numpy array of shape N x 8 containing N rotated bounding boxes each described by their corner co-ordinates x1 y1 x2 y2 x3 y3 x4 y4
- Return type
numpy.ndarray
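The geometry here is a standard rotation of each corner about (cx, cy). A minimal numpy sketch of that step only (the `rotate_corners` name is hypothetical, and unlike the library function it ignores the image height/width bookkeeping):

```python
import numpy as np

def rotate_corners(corners, angle, cx, cy):
    """Rotate N x 8 corner coordinates by `angle` degrees
    anticlockwise about the point (cx, cy)."""
    theta = np.deg2rad(angle)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    # treat each box as 4 (x, y) points, rotate about the centre
    pts = corners.reshape(-1, 4, 2) - np.array([cx, cy])
    rotated = pts @ rot.T + np.array([cx, cy])
    return rotated.reshape(-1, 8)
```

For example, rotating the point (1, 0) by 90° about the origin lands it at (0, 1). Note that in image coordinates (y down) the visual sense of the rotation is reversed.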
-
utilities.rotate_im(image, angle)¶ Rotate the image.
Rotate the image such that the rotated image is enclosed inside the tightest rectangle. The area not occupied by the pixels of the original image is colored black.
- Parameters
image (numpy.ndarray) – numpy image
angle (float) – angle by which the image is to be rotated
- Returns
Rotated Image
- Return type
numpy.ndarray
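The size of that tightest enclosing rectangle follows from the rotation geometry: new_w = w·|cos θ| + h·|sin θ| and new_h = w·|sin θ| + h·|cos θ|. A small sketch of just this calculation (the `rotated_canvas_size` helper is illustrative, not part of the library):

```python
import numpy as np

def rotated_canvas_size(w, h, angle):
    """Width and height of the tightest axis-aligned rectangle
    enclosing a w x h image rotated by `angle` degrees."""
    theta = np.deg2rad(angle)
    c, s = abs(np.cos(theta)), abs(np.sin(theta))
    return int(np.round(w * c + h * s)), int(np.round(w * s + h * c))

rotated_canvas_size(200, 100, 90)
# → (100, 200)
```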
-
utilities.spinim(self, img, bboxes)¶
-
utilities.temp_match(vector_path, raster_path, band, nodata_value=0, ind=None)¶ Based on polygons return template matched images
- Parameters
vector_path (string) – input shapefile
raster_path (string) – input raster
band (int) – an integer val eg - 2
nodata_value (numerical) – If used the no data val of the raster
ind (int) – The feature ID to use - if given, this will use one feature and rotate it 90° for the second
- Returns
template match arrays the same size as the input
- Return type
list
-
utilities.test_gabor(im, size=9, freq=0.1, angle=None, funct='cos', plot=True, smooth=True, interp='none')¶ Process an image with a gabor filter bank of a specified orientation, or one derived from the image's positive-values bounding box - implemented with numpy, with more intuitive params
This is the numpy based one
- Parameters
im (numpy.ndarray) – input image
size (int) – size of the gabor kernel in pixels (ksize)
freq (float) – frequency of the gabor carrier
angle (int) – number of angles in the gabor kernel (theta)
-
utilities.test_gabor_cv2(im, size=9, stdv=1, angle=None, wave_length=3, eccen=1, phase_off=0, plot=True, smooth=True, interp='none')¶ Process an image with a gabor filter bank of a specified orientation, or one derived from the image's positive-values bounding box
This is the open cv based one
- Parameters
im (numpy.ndarray) – input image
size (int) – size of the gabor kernel in pixels (ksize)
stdv (int) – stdv of the gabor kernel (sigma)
angle (int) – number of angles in the gabor kernel (theta)
wave_length (int) – width of a stripe in the gabor kernel (lambda/wavelength); optional - best left as None and hence the same as size
phase_off (int) – the phase offset of the kernel
eccen (int) – the ellipticity of the kernel; when = 1 the gaussian envelope is circular (gamma)
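To picture what these parameters control, a real-valued Gabor kernel is a cosine carrier windowed by a Gaussian envelope. A toy numpy sketch using the `size`/`freq`/`angle` naming of test_gabor above (the envelope width chosen here is an arbitrary assumption, not the library's):

```python
import numpy as np

def gabor_kernel(size=9, freq=0.1, angle=0.0):
    """A real Gabor kernel: a cosine at `freq` cycles/pixel along
    orientation `angle` (radians), under a circular Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # rotate the coordinate frame to the requested orientation
    xr = x * np.cos(angle) + y * np.sin(angle)
    envelope = np.exp(-(x**2 + y**2) / (2 * (0.5 * half) ** 2))
    return envelope * np.cos(2 * np.pi * freq * xr)

k = gabor_kernel(9, 0.1, 0.0)   # 9x9 kernel, stripes along the y axis
```

OpenCV's getGaborKernel exposes the same ingredients under the sigma/theta/lambda/gamma/psi names listed for test_gabor_cv2.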
-
utilities.visual_callback_2d(background, fig=None)¶ Returns a callback that can be passed as the argument iter_callback of morphological_geodesic_active_contour and morphological_chan_vese for visualizing the evolution of the levelsets. Only works for 2D images.
- Parameters
background ((M, N) array) – Image to be plotted as the background of the visual evolution.
fig (matplotlib.figure.Figure) – Figure where results will be drawn. If not given, a new figure will be created.
- Returns
callback – A function that receives a levelset and updates the current plot accordingly. This can be passed as the iter_callback argument of morphological_geodesic_active_contour and morphological_chan_vese.
- Return type
Python function