geosnap.analyze.analytics module

Tools for the spatial analysis of neighborhood change.

geosnap.analyze.analytics.cluster(gdf, n_clusters=6, method=None, best_model=False, columns=None, verbose=False, time_var='year', id_var='geoid', return_model=False, scaler=None, **kwargs)
Create a geodemographic typology by running a cluster analysis on the study area’s neighborhood attributes.

Parameters
gdf: pandas.DataFrame

long-form (geo)DataFrame containing neighborhood attributes

n_clusters: int

the number of clusters to model (the default is 6).

method: str

the clustering algorithm used to identify neighborhood types

best_model: bool

if using a Gaussian mixture model, use BIC to choose the best n_clusters (the default is False).

columns: list-like

subset of columns on which to apply the clustering

verbose: bool

whether to print warning messages (the default is False).

time_var: str

which column on the dataframe defines time and/or sequencing of the long-form data (the default is “year”).

id_var: str

which column on the long-form dataframe identifies the stable units over time. In a wide-form dataset, this would be the unique index

scaler: str or sklearn.preprocessing.Scaler

a scikit-learn preprocessing class that will be used to rescale the data. Defaults to StandardScaler

Returns
pandas.DataFrame with the neighborhood cluster labels appended as a new column. An existing column of the same name will be overwritten.
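
To make the long-form layout and the default rescaling more concrete, here is a minimal standalone sketch using only the Python standard library. It is not geosnap's implementation: the records, the column names `median_income` and `pct_poverty`, and the helpers `standardize` and `to_wide` are hypothetical. The z-score transformation (zero mean, unit population standard deviation) is what scikit-learn's StandardScaler computes with its default settings; pooling all time periods before rescaling is an assumption made here for simplicity.

```python
from statistics import fmean, pstdev

# Hypothetical long-form records: one row per (id_var, time_var) pair.
rows = [
    {"geoid": "A", "year": 2000, "median_income": 40000.0, "pct_poverty": 20.0},
    {"geoid": "B", "year": 2000, "median_income": 60000.0, "pct_poverty": 10.0},
    {"geoid": "A", "year": 2010, "median_income": 50000.0, "pct_poverty": 15.0},
    {"geoid": "B", "year": 2010, "median_income": 70000.0, "pct_poverty": 5.0},
]
columns = ["median_income", "pct_poverty"]  # subset used for clustering


def standardize(rows, columns):
    """Return copies of the rows with each listed column converted to z-scores."""
    stats = {}
    for col in columns:
        values = [r[col] for r in rows]
        # Population standard deviation (ddof=0), as StandardScaler uses.
        stats[col] = (fmean(values), pstdev(values))
    scaled = []
    for r in rows:
        new = dict(r)  # copy so the input rows are left untouched
        for col in columns:
            mean, std = stats[col]
            new[col] = (r[col] - mean) / std if std else 0.0
        scaled.append(new)
    return scaled


def to_wide(rows, column, id_var="geoid", time_var="year"):
    """Pivot one column so each stable unit maps to its values by time period."""
    wide = {}
    for r in rows:
        wide.setdefault(r[id_var], {})[r[time_var]] = r[column]
    return wide


scaled = standardize(rows, columns)
wide = to_wide(rows, "median_income")
print(wide["A"])
```

In the wide form, each `geoid` appears exactly once, which is why `id_var` would be the unique index there, while the long form repeats it once per value of `time_var`.
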
geosnap.analyze.analytics.cluster_spatial(gdf, n_clusters=6, spatial_weights='rook', method=None, columns=None, threshold_variable='count', threshold=10, time_var='year', id_var='geoid', return_model=False, scaler=None, **kwargs)

Create a spatial geodemographic typology by running a cluster analysis on the metro area’s neighborhood attributes and including a contiguity constraint.

Parameters
gdf: geopandas.GeoDataFrame

long-form geodataframe holding neighborhood attribute and geometry data.

n_clusters: int

the number of clusters to model (the default is 6).

spatial_weights: str, ‘queen’ or ‘rook’

spatial weights matrix specification (the default is “rook”).

method: str

the clustering algorithm used to identify neighborhood types

columns: list-like

subset of columns on which to apply the clustering

threshold_variable: str

for max-p, the variable that defines p. The default is “count”, which grows regions until the threshold number of polygons has been aggregated.

threshold: numeric

threshold to use for max-p clustering (the default is 10).

time_var: str

which column on the dataframe defines time and/or sequencing of the long-form data (the default is “year”).

id_var: str

which column on the long-form dataframe identifies the stable units over time. In a wide-form dataset, this would be the unique index

scaler: str or sklearn.preprocessing.Scaler

a scikit-learn preprocessing class that will be used to rescale the data. Defaults to StandardScaler

Returns
geopandas.GeoDataFrame with the neighborhood cluster labels appended as a new column. An existing column of the same name will be overwritten.
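
The difference between the ‘rook’ and ‘queen’ contiguity rules is easiest to see on a regular grid. The sketch below is a standalone illustration using only the Python standard library; the helper `grid_neighbors` is hypothetical and is not how geosnap derives weights from real polygon geometries. Under the rook rule two cells are neighbors only if they share an edge; under the queen rule a shared corner is also enough.

```python
def grid_neighbors(n_rows, n_cols, rule="rook"):
    """Map each (row, col) cell of a regular grid to its contiguous cells."""
    if rule == "rook":
        # Shared edges only: up, down, left, right.
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif rule == "queen":
        # Shared edges or corners: all eight surrounding cells.
        offsets = [
            (dr, dc)
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)
        ]
    else:
        raise ValueError("rule must be 'rook' or 'queen'")

    neighbors = {}
    for r in range(n_rows):
        for c in range(n_cols):
            neighbors[(r, c)] = [
                (r + dr, c + dc)
                for dr, dc in offsets
                if 0 <= r + dr < n_rows and 0 <= c + dc < n_cols
            ]
    return neighbors


rook = grid_neighbors(3, 3, "rook")
queen = grid_neighbors(3, 3, "queen")
print(len(rook[(1, 1)]), len(queen[(1, 1)]))  # center cell: 4 8
print(len(rook[(0, 0)]), len(queen[(0, 0)]))  # corner cell: 2 3
```

Because a contiguity constraint only allows neighboring units to be grouped into the same region, the choice of rule changes which regions are feasible: queen contiguity admits every rook neighbor plus the diagonal ones.
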