geosnap.analyze.cluster
- geosnap.analyze.cluster(gdf, n_clusters=6, method=None, best_model=False, columns=None, verbose=False, time_var='year', id_var='geoid', scaler='std', pooling='fixed', **kwargs)
Create a geodemographic typology by running a cluster analysis on the study area’s neighborhood attributes.
- Parameters
- gdf : geopandas.GeoDataFrame, required
long-form GeoDataFrame containing neighborhood attributes
- n_clusters : int, optional
the number of clusters to model (the default is 6).
- method : str in ['kmeans', 'ward', 'affinity_propagation', 'spectral', 'gaussian_mixture', 'hdbscan'], required
the clustering algorithm used to identify neighborhood types
- best_model : bool, optional
if using a Gaussian mixture model, use the Bayesian information criterion (BIC) to choose the best n_clusters (the default is False).
- columns : list-like, required
subset of columns on which to apply the clustering
- verbose : bool, optional
whether to print warning messages (the default is False).
- time_var : str, optional
which column on the dataframe defines time and/or sequencing of the long-form data (the default is 'year').
- id_var : str, optional
which column on the long-form dataframe identifies the stable units over time (the default is 'geoid'). In a wide-form dataset, this would be the unique index.
- scaler : None or scaler from sklearn.preprocessing, optional
a scikit-learn preprocessing class that will be used to rescale the data. Defaults to sklearn.preprocessing.StandardScaler
- pooling : ['fixed', 'pooled', 'unique'], optional (default='fixed')
How to treat temporal data when applying scaling. Options include:
fixed : scaling is fixed to each time period
pooled : data are pooled across all time periods
unique : if scaling, apply the scaler to each time period, then generate clusters unique to each time period.
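The difference between the 'fixed' and 'pooled' scaling options can be illustrated with a minimal standard-library sketch. It is not geosnap's implementation: the `rows` records, the `income` column, and the `zscores` and `scale` helpers are hypothetical names chosen for illustration, and the z-score here only approximates what a standard scaler does. The sketch covers the scaling step only; under 'unique', clusters would additionally be fitted separately for each time period.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Toy long-form records: one row per (geoid, year) pair.
# Column names and values are made up for illustration.
rows = [
    {"geoid": "a", "year": 2000, "income": 10.0},
    {"geoid": "b", "year": 2000, "income": 20.0},
    {"geoid": "a", "year": 2010, "income": 30.0},
    {"geoid": "b", "year": 2010, "income": 50.0},
]


def zscores(values):
    """Standardize values to zero mean and unit population variance."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def scale(rows, column, time_var="year", pooling="fixed"):
    """Return standardized values of `column`, in the same order as `rows`."""
    if pooling == "pooled":
        # One mean and variance computed over every time period at once.
        return zscores([r[column] for r in rows])
    if pooling in ("fixed", "unique"):
        # A separate mean and variance for each time period.
        groups = defaultdict(list)
        for position, record in enumerate(rows):
            groups[record[time_var]].append(position)
        scaled = [0.0] * len(rows)
        for positions in groups.values():
            period_values = [rows[p][column] for p in positions]
            for p, z in zip(positions, zscores(period_values)):
                scaled[p] = z
        return scaled
    raise ValueError(f"unknown pooling option: {pooling!r}")


fixed = scale(rows, "income", pooling="fixed")
pooled = scale(rows, "income", pooling="pooled")
```

With per-period scaling, each year's values are centered on that year's own mean, so growth between periods is removed before clustering; with pooled scaling, later periods keep their higher values relative to earlier ones.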
- Returns
- gdf : geopandas.GeoDataFrame
GeoDataFrame with neighborhood cluster labels appended as a new column. If a column named after the cluster method already exists on the DataFrame, the new column name is incremented.
- model : named tuple
A tuple with attributes X, columns, labels, instance, W, which store the input matrix, column labels, cluster labels, fitted model instance, and spatial weights matrix, respectively
- model_name : str
name of model to be stored in a Community
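The shape of the returned values can be sketched with the standard library alone. The field names X, columns, labels, instance, and W come from the description above; everything else is an assumption made for illustration: the `ModelResults` class name, the placeholder values, and in particular the `next_label_column` helper, whose numbering rule is a guess at what "incremented" could mean rather than geosnap's documented behavior.

```python
from collections import namedtuple

# Field names follow the Returns description; the class name is hypothetical.
ModelResults = namedtuple("ModelResults", ["X", "columns", "labels", "instance", "W"])

# Placeholder values standing in for real clustering output.
model = ModelResults(
    X=[[0.1, 1.2], [0.3, 0.8]],   # input matrix, one row per observation
    columns=["income", "rent"],   # column labels for X
    labels=[0, 1],                # cluster label per observation
    instance=None,                # would hold the fitted model object
    W=None,                       # would hold spatial weights, if any were used
)


def next_label_column(method, existing_columns):
    """Hypothetical naming rule: append a counter if `method` is already taken."""
    if method not in existing_columns:
        return method
    taken = sum(1 for name in existing_columns if name.startswith(method))
    return f"{method}{taken}"


first_name = next_label_column("kmeans", ["geoid", "year"])
second_name = next_label_column("kmeans", ["geoid", "year", "kmeans"])
```

Attributes are read by name, for example `model.labels` or `model.columns`, which keeps downstream code readable when the tuple is passed around.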