geosnap.Community.sequence

Community.sequence(self, cluster_col, seq_clusters=5, subs_mat=None, dist_type=None, indel=None, time_var='year', id_var='geoid')

Pairwise sequence analysis to evaluate the distance/dissimilarity between every two neighborhood sequences.

Run this method after neighborhood segmentation, because the name of the column holding the neighborhood labels is a required input.

Parameters
cluster_col : str or int

Column name for the neighborhood segmentation, such as “ward”, “kmeans”, etc.

seq_clusters : int, optional

Number of neighborhood sequence clusters. The sequences are clustered using agglomerative clustering with Ward linkage. Default is 5.

subs_mat : array, optional

(k, k) substitution cost matrix. Should be hollow (zero cost between identical types), symmetric, and non-negative.

dist_type : str, optional

One of:
“hamming”: Hamming distance (substitution only, at a constant cost of 1), from sklearn.metrics.
“markov”: empirical transition probabilities are used to define substitution costs.
“interval”: differences between states are used to define substitution costs, and indel=k-1.
“arbitrary”: arbitrary distance for use when there is no strong theoretical guidance: substitution=0.5, indel=1.
“tran”: transition-oriented optimal matching, applied to the sequence of transitions. Based on [Bie11].

indel : float, optional

Insertion/deletion cost.

time_var : str, optional

Column defining the time and/or sequencing of the long-form data. Default is “year”.

id_var : str, optional

Column identifying the unique id of spatial units. Default is “geoid”.
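The three properties required of subs_mat (hollow, symmetric, non-negative) can be checked before the matrix is passed in. The sketch below is illustrative only: `interval_costs` and `is_valid_subs_mat` are hypothetical helpers, not part of geosnap, and the |i - j| cost rule is an assumption about what "differences between states" means for the “interval” option.

```python
import numpy as np


def interval_costs(k):
    """Hypothetical helper: substituting state i for state j costs |i - j|."""
    states = np.arange(k)
    return np.abs(states[:, None] - states[None, :]).astype(float)


def is_valid_subs_mat(subs_mat):
    """Check that a matrix is square, hollow, symmetric, and non-negative."""
    m = np.asarray(subs_mat, dtype=float)
    if m.ndim != 2 or m.shape[0] != m.shape[1]:
        return False
    hollow = np.all(np.diag(m) == 0)      # zero cost between identical types
    symmetric = np.allclose(m, m.T)       # cost(i, j) == cost(j, i)
    non_negative = np.all(m >= 0)
    return bool(hollow and symmetric and non_negative)


k = 4                         # number of neighborhood types (assumed)
subs_mat = interval_costs(k)
indel = k - 1                 # the indel rule stated for "interval"
```

A matrix such as `[[0, 1], [2, 0]]` fails the check because it is not symmetric.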

Returns
gdf_new : Community instance

New Community instance whose “gdf” attribute has a new column of sequence labels.

df_wide : pandas.DataFrame

Wide-form DataFrame with k (k is the number of periods) columns of neighborhood types and 1 column of sequence labels.

seq_dis_mat : array

(n, n) distance/dissimilarity matrix, with one entry for each pair of sequences.
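To make the shapes of the wide-form table and the (n, n) matrix concrete, here is a minimal sketch on toy data. It does not call geosnap: the table, the `ward` label column, and the `pairwise_hamming` helper are all made up for illustration, and the distance shown is a raw mismatch count (library implementations of Hamming distance may normalize by sequence length instead).

```python
import numpy as np
import pandas as pd

# Toy long-form table: one row per (geoid, year) with a neighborhood label.
long_df = pd.DataFrame({
    "geoid": ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
    "year": [1990, 2000, 2010] * 3,
    "ward": [0, 0, 1, 0, 1, 1, 2, 2, 2],
})

# Wide form: one row per spatial unit, one column per period.
wide = long_df.pivot(index="geoid", columns="year", values="ward")


def pairwise_hamming(sequences):
    """Hypothetical helper: count mismatched positions for every pair of rows."""
    seqs = np.asarray(sequences)
    # Broadcasting compares each sequence with every other, position by position.
    mismatches = seqs[:, None, :] != seqs[None, :, :]
    return mismatches.sum(axis=2)


dist = pairwise_hamming(wide.to_numpy())  # shape (n, n), zero diagonal
```

With a Community that already carries a segmentation column named `ward`, the call itself would presumably look like `gdf_new, df_wide, seq_dis_mat = community.sequence(cluster_col="ward", dist_type="hamming")`, assuming the three results are returned in the order documented above.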