geosnap.analyze.cluster_spatial
geosnap.analyze.cluster_spatial(gdf, n_clusters=6, spatial_weights='rook', method=None, columns=None, threshold_variable='count', threshold=10, time_var='year', id_var='geoid', scaler='std', weights_kwargs=None, **kwargs)
Create a spatial geodemographic typology by running a cluster analysis on the metro area’s neighborhood attributes and including a contiguity constraint.
- Parameters
- gdf
geopandas.GeoDataFrame
long-form geodataframe holding neighborhood attribute and geometry data.
- n_clusters
int
the number of clusters to model (the default is 6).
- spatial_weights
[‘queen’, ‘rook’] or libpysal.weights.W object
spatial weights matrix specification. By default, geosnap will calculate Rook weights, but you can also pass a libpysal.weights.W object for more control over the specification.
- method
str in [‘ward_spatial’, ‘spenc’, ‘skater’, ‘azp’, ‘max_p’]
the clustering algorithm used to identify neighborhood types
- columns
array_like
subset of columns on which to apply the clustering
- threshold_variable
str
for max-p, which variable should define p. The default is “count”, which will grow regions until the threshold number of polygons have been aggregated
- threshold
numeric
threshold to use for max-p clustering (the default is 10).
- time_var
str
which column on the dataframe defines time and/or sequencing of the long-form data. Default is “year”
- id_var
str
which column on the long-form dataframe identifies the stable units over time. In a wide-form dataset, this would be the unique index
- weights_kwargs
dict
If passing a libpysal.weights.W instance to spatial_weights, these additional keyword arguments will be passed to the weights constructor
- scaler
None or scaler class from sklearn.preprocessing
a scikit-learn preprocessing class that will be used to rescale the data. Defaults to sklearn.preprocessing.StandardScaler
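The ‘rook’ and ‘queen’ options for spatial_weights differ in what counts as a neighbor: rook contiguity links units that share an edge, while queen contiguity also links units that touch only at a corner. The sketch below illustrates that difference on a regular grid using plain Python. It is not how geosnap or libpysal builds weights, and the helper name `grid_neighbors` is hypothetical.

```python
from itertools import product


def grid_neighbors(n_rows, n_cols, rule="rook"):
    """Map each (row, col) cell of a regular grid to its contiguous cells.

    rule="rook"  -> cells that share an edge
    rule="queen" -> cells that share an edge or a corner
    """
    if rule not in ("rook", "queen"):
        raise ValueError("rule must be 'rook' or 'queen'")

    # Start from all eight surrounding cells.
    offsets = [
        (dr, dc) for dr, dc in product((-1, 0, 1), repeat=2) if (dr, dc) != (0, 0)
    ]
    if rule == "rook":
        # Keep only moves along a single axis, i.e. a shared edge.
        offsets = [(dr, dc) for dr, dc in offsets if dr == 0 or dc == 0]

    neighbors = {}
    for r, c in product(range(n_rows), range(n_cols)):
        neighbors[(r, c)] = [
            (r + dr, c + dc)
            for dr, dc in offsets
            if 0 <= r + dr < n_rows and 0 <= c + dc < n_cols
        ]
    return neighbors


rook = grid_neighbors(3, 3, "rook")
queen = grid_neighbors(3, 3, "queen")
print(len(rook[(1, 1)]), len(queen[(1, 1)]))  # centre cell: 4 8
print(len(rook[(0, 0)]), len(queen[(0, 0)]))  # corner cell: 2 3
```

Because the contiguity constraint only allows neighboring units to be grouped together, the choice between the two rules changes which regions the clustering can form.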
- Returns
- gdf
geopandas.GeoDataFrame
GeoDataFrame with a column of neighborhood cluster labels appended as a new column. If a column named after the cluster method already exists on the DataFrame, the new column’s name will be incremented.
- models
dict of named tuples
tab-completable dictionary of named tuples keyed on the Community’s time variable (e.g. year). The tuples store model results and have attributes X, columns, labels, instance, and W, which store the input matrix, column names, cluster labels, fitted model instance, and spatial weights matrix, respectively.
- model_name
str
name of model to be stored in a Community
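As a rough illustration of the models return value, the sketch below builds a dictionary of named tuples keyed on a year, using the attribute names listed above. The tuple name `ModelResults`, the column names, and all values are made up for illustration. The objects geosnap actually returns may be structured differently.

```python
from collections import namedtuple

# Field names follow the attributes documented for `models`;
# the tuple name itself is hypothetical.
ModelResults = namedtuple("ModelResults", ["X", "columns", "labels", "instance", "W"])

# Placeholder values only: a real result would hold the rescaled input
# matrix, a fitted model instance, and a spatial weights object.
models = {
    2010: ModelResults(
        X=[[0.1, -1.2], [0.9, 0.4], [-1.0, 0.8]],
        columns=["median_income", "pct_renters"],
        labels=[0, 0, 1],
        instance=None,
        W=None,
    ),
}

result = models[2010]          # look up the results for one time period
print(result.columns)          # attribute access by name
counts = {label: result.labels.count(label) for label in set(result.labels)}
print(counts)                  # number of units assigned to each cluster
```

Keying on the time variable keeps the results for each period separate, so the inputs and labels for a given year can be inspected independently.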