geosnap.analyze.cluster
geosnap.analyze.cluster(gdf, n_clusters=6, method=None, best_model=False, columns=None, verbose=False, time_var='year', id_var='geoid', return_model=False, scaler=None, **kwargs)
- Create a geodemographic typology by running a cluster analysis on the study area's neighborhood attributes.
- Parameters
- gdf: pandas.DataFrame
long-form (geo)DataFrame containing neighborhood attributes
- n_clusters: int
the number of clusters to model (the default is 6).
- method: str
the clustering algorithm used to identify neighborhood types
- best_model: bool
if using a Gaussian mixture model, use BIC to choose the best n_clusters (the default is False).
- columns: list-like
subset of columns on which to apply the clustering
- verbose: bool
whether to print warning messages (the default is False).
- time_var: str
which column of the dataframe defines time and/or sequencing of the long-form data (the default is "year").
- id_var: str
which column of the long-form dataframe identifies the stable units over time (the default is "geoid"). In a wide-form dataset, this would be the unique index.
- scaler: str or sklearn.preprocessing.Scaler
a scikit-learn preprocessing class that will be used to rescale the data. Defaults to StandardScaler.
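The best_model option can be pictured with a minimal sketch, assuming scikit-learn's GaussianMixture: fit one mixture per candidate number of components and keep the one with the lowest BIC. The helper name pick_n_components and the synthetic data are hypothetical, and geosnap's own search may differ (for example, in which covariance structures it considers).

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def pick_n_components(X, max_components=6, random_state=0):
    """Return (best_k, bic_by_k), where best_k has the lowest BIC.

    Hypothetical helper for illustration; not part of geosnap.
    """
    bic_by_k = {}
    for k in range(1, max_components + 1):
        gmm = GaussianMixture(n_components=k, random_state=random_state)
        gmm.fit(X)
        bic_by_k[k] = gmm.bic(X)  # lower BIC indicates a better trade-off
    best_k = min(bic_by_k, key=bic_by_k.get)
    return best_k, bic_by_k


# Synthetic stand-in for rescaled neighborhood attributes: three blobs.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=center, scale=0.3, size=(50, 2))
    for center in (-3.0, 0.0, 3.0)
])

best_k, bic_by_k = pick_n_components(X)
print(best_k)
```

BIC penalizes extra components, so the selected count can be smaller than the largest candidate even though more components always fit the training data at least as closely.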
- Returns
- pandas.DataFrame with neighborhood cluster labels appended as a new column. Any existing column of the same name is overwritten.
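The workflow described above (select columns, rescale them, cluster, append labels) can be sketched with pandas and scikit-learn alone. This is an illustration under stated assumptions, not geosnap's implementation: the function sketch_typology, the label column name "cluster", the toy data, and the choice of KMeans are all hypothetical, and the sketch rescales all time periods together, which may not match how geosnap handles time_var.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def sketch_typology(df, columns, n_clusters=6, label_col="cluster"):
    """Rescale `columns`, cluster the rows, and append the labels.

    Hypothetical stand-in for illustration; not geosnap's code.
    """
    out = df.copy()  # leave the caller's DataFrame untouched
    scaled = StandardScaler().fit_transform(out[columns])
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    # Assigning to an existing column name would overwrite it.
    out[label_col] = model.fit_predict(scaled)
    return out


# Long-form toy data: one row per (geoid, year) pair.
toy = pd.DataFrame({
    "geoid": ["a", "b", "c", "d"] * 2,
    "year": [2000] * 4 + [2010] * 4,
    "median_income": [30, 32, 80, 85, 35, 36, 90, 95],
    "pct_college": [10, 12, 60, 65, 15, 14, 70, 72],
})

result = sketch_typology(toy, ["median_income", "pct_college"], n_clusters=2)
print(result[["geoid", "year", "cluster"]])
```

Because the data are in long form, each unit receives one label per time period, so a neighborhood's type can change between years.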