geosnap.analyze.cluster_spatial
geosnap.analyze.cluster_spatial(gdf, n_clusters=6, spatial_weights='rook', method=None, columns=None, threshold_variable='count', threshold=10, time_var='year', id_var='geoid', return_model=False, scaler=None, **kwargs)
Create a spatial geodemographic typology by running a cluster analysis on the metro area’s neighborhood attributes, subject to a contiguity constraint.
- Parameters
- gdf: geopandas.GeoDataFrame
long-form geodataframe holding neighborhood attribute and geometry data.
- n_clusters: int
the number of clusters to model (the default is 6).
- spatial_weights: str, ‘queen’ or ‘rook’
spatial weights matrix specification (the default is “rook”).
- method: str
the clustering algorithm used to identify neighborhood types.
- columns: list-like
subset of columns on which to apply the clustering.
- threshold_variable: str
for max-p, the variable that defines p. The default is “count”, which grows regions until the threshold number of polygons has been aggregated.
- threshold: numeric
threshold to use for max-p clustering (the default is 10).
- time_var: str
the column on the dataframe that defines the time and/or sequencing of the long-form data (the default is “year”).
- id_var: str
the column on the long-form dataframe that identifies the stable units over time (the default is “geoid”). In a wide-form dataset, this would be the unique index.
- scaler: str or sklearn.preprocessing.Scaler
a scikit-learn preprocessing class that will be used to rescale the data. Defaults to StandardScaler
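To make the effect of the `scaler` parameter concrete, the following is a minimal pure-Python sketch of the rescaling that scikit-learn's StandardScaler performs: each column is centered on its mean and divided by its population standard deviation. The helper `standardize` and the two attribute lists are hypothetical and are not part of geosnap; this only illustrates the idea and is not the library's implementation.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Rescale a column to zero mean and unit variance (hypothetical helper)."""
    mean = fmean(values)
    std = pstdev(values)  # population standard deviation (ddof=0)
    if std == 0:
        # A constant column carries no information; return zeros
        # instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical neighborhood attributes measured on very different scales.
median_income = [30000.0, 45000.0, 60000.0, 75000.0]
share_renters = [0.2, 0.4, 0.6, 0.8]

scaled_income = standardize(median_income)
scaled_renters = standardize(share_renters)
```

After rescaling, both columns sit on a comparable scale, so a distance-based clustering algorithm is not dominated by the attribute with the largest raw units.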
- Returns
- geopandas.GeoDataFrame
the geodataframe with the neighborhood cluster labels appended as a new column. An existing column of the same name will be overwritten.
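The contiguity constraint depends on which units count as neighbors. As a rough illustration of the two options for `spatial_weights`, the sketch below builds neighbor lists for cells on a regular grid: under the rook rule, cells are neighbors only if they share an edge; under the queen rule, sharing a single corner is enough. The function `grid_neighbors` is hypothetical and written in plain Python for illustration. It is not how geosnap constructs its weights, and real neighborhood polygons are irregular rather than gridded.

```python
def grid_neighbors(n_rows, n_cols, rule="rook"):
    """Return {cell_id: sorted neighbor ids} for a regular grid (hypothetical helper)."""
    # Rook contiguity: cells that share an edge.
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if rule == "queen":
        # Queen contiguity additionally counts cells that share only a corner.
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    elif rule != "rook":
        raise ValueError("rule must be 'rook' or 'queen'")

    neighbors = {}
    for r in range(n_rows):
        for c in range(n_cols):
            cell = r * n_cols + c  # row-major cell id
            neighbors[cell] = sorted(
                (r + dr) * n_cols + (c + dc)
                for dr, dc in offsets
                if 0 <= r + dr < n_rows and 0 <= c + dc < n_cols
            )
    return neighbors


rook = grid_neighbors(3, 3, "rook")
queen = grid_neighbors(3, 3, "queen")
```

On this 3x3 grid the center cell has four rook neighbors but eight queen neighbors, so the queen rule gives a spatially constrained clustering more ways to join adjacent units into the same region.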