geosnap.analyze.cluster

geosnap.analyze.cluster(gdf, n_clusters=6, method=None, best_model=False, columns=None, verbose=False, time_var='year', id_var='geoid', scaler='std', pooling='fixed', **kwargs)

Create a geodemographic typology by running a cluster analysis on the study area’s neighborhood attributes.

Parameters
gdf : geopandas.GeoDataFrame, required

long-form GeoDataFrame containing neighborhood attributes

n_clusters : int, optional

the number of clusters to model (the default is 6).

method : str in [‘kmeans’, ‘ward’, ‘affinity_propagation’, ‘spectral’, ‘gaussian_mixture’, ‘hdbscan’], required

the clustering algorithm used to identify neighborhood types

best_model : bool, optional

if using a gaussian mixture model, use BIC to choose the best n_clusters (the default is False).

columns : list-like, required

subset of columns on which to apply the clustering

verbose : bool, optional

whether to print warning messages (the default is False).

time_var : str, optional

which column on the dataframe defines time and/or sequencing of the long-form data. Default is “year”

id_var : str, optional

which column on the long-form dataframe identifies the stable units over time. In a wide-form dataset, this would be the unique index

scaler : None or scaler from sklearn.preprocessing, optional

a scikit-learn preprocessing class that will be used to rescale the data. Defaults to sklearn.preprocessing.StandardScaler

pooling : [“fixed”, “pooled”, “unique”], optional (default=“fixed”)

How to treat temporal data when applying scaling. Options include:

  • fixed : scaling is fixed to each time period

  • pooled : data are pooled across all time periods

  • unique : if scaling, apply the scaler to each time period, then generate clusters unique to each time period.
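The difference between per-period and pooled scaling can be illustrated with a minimal standard-library sketch. It is not geosnap's implementation: the records, the helper names (`zscore`, `scale_fixed`, `scale_pooled`), and the reading of "fixed" as "standardize within each time period" are assumptions made for illustration. The population standard deviation is used because that is what sklearn.preprocessing.StandardScaler computes.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical long-form records: (geoid, year, value)
records = [
    ("a", 2000, 10.0),
    ("b", 2000, 20.0),
    ("a", 2010, 30.0),
    ("b", 2010, 50.0),
]


def zscore(values):
    """Standardize values to zero mean and unit (population) variance."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def scale_fixed(records):
    """Assumed 'fixed' behavior: standardize within each time period."""
    by_year = defaultdict(list)
    for geoid, year, value in records:
        by_year[year].append((geoid, value))
    scaled = {}
    for year, items in by_year.items():
        z = zscore([value for _, value in items])
        for (geoid, _), z_value in zip(items, z):
            scaled[(geoid, year)] = z_value
    return scaled


def scale_pooled(records):
    """Assumed 'pooled' behavior: standardize across all time periods at once."""
    z = zscore([value for _, _, value in records])
    return {(geoid, year): z_value for (geoid, year, _), z_value in zip(records, z)}


fixed = scale_fixed(records)
pooled = scale_pooled(records)
```

Under per-period scaling, unit "a" gets the same standardized value in both years, so growth over time is removed. Under pooled scaling, its value rises from 2000 to 2010 because all periods share one mean and variance.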

Returns
gdf : geopandas.GeoDataFrame

GeoDataFrame with neighborhood cluster labels appended as a new column. If a column named for the cluster method already exists on the DataFrame, the column name will be incremented.

model : named tuple

A tuple with attributes X, columns, labels, instance, W, which store the input matrix, column labels, cluster labels, fitted model instance, and spatial weights matrix

model_name : str

name of model to be stored in a Community
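A call following the signature and Returns section above might look like the sketch below. The wrapper `run_cluster_sketch` is a hypothetical helper, not part of geosnap, and the caller supplies the long-form GeoDataFrame and its column names. The three-value unpacking follows the Returns section as documented here and may differ in other geosnap versions.

```python
def run_cluster_sketch(gdf, columns):
    """Hypothetical wrapper: cluster a long-form GeoDataFrame with k-means."""
    # Imported lazily so the sketch can be defined without geosnap installed.
    from geosnap.analyze import cluster

    # Per the Returns section, three objects come back.
    gdf_labeled, model, model_name = cluster(
        gdf,
        n_clusters=6,        # documented default
        method="kmeans",     # one of the documented method strings
        columns=columns,     # attribute columns to cluster on
        time_var="year",     # documented default
        id_var="geoid",      # documented default
        pooling="fixed",     # documented default
    )
    # model.labels and model.instance are among the documented tuple attributes.
    return gdf_labeled, model, model_name
```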