geosnap.Community.sequence
Community.sequence(self, cluster_col, seq_clusters=5, subs_mat=None, dist_type=None, indel=None, time_var='year', id_var='geoid')
Pairwise sequence analysis to evaluate the distance/dissimilarity between every pair of neighborhood sequences.
Run the sequence analysis after neighborhood segmentation, because the name of the column holding the neighborhood labels is a required input.
- Parameters
- cluster_col
str or int
Column name for the neighborhood segmentation, such as “ward”, “kmeans”, etc.
- seq_clusters
int, optional
Number of neighborhood sequence clusters. The sequences are clustered using agglomerative clustering with Ward linkage. Default is 5.
- subs_mat
array (k, k)
Substitution cost matrix. Should be hollow (0 cost between the same type), symmetric, and non-negative.
- dist_type
str
One of:
“hamming”: Hamming distance (substitution only, with a constant cost of 1), from sklearn.metrics;
“markov”: substitution costs defined from empirical transition probabilities;
“interval”: substitution costs defined from differences between states, with indel=k-1;
“arbitrary”: arbitrary distance for cases without strong theoretical guidance, with substitution=0.5 and indel=1;
“tran”: transition-oriented optimal matching on the sequence of transitions, based on [Bie11].
- indel
float, optional
Insertion/deletion cost.
- time_var
str, optional
Column defining time and/or sequencing of the long-form data. Default is “year”.
- id_var
str, optional
Column identifying the unique id of spatial units. Default is “geoid”.
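The requirements on subs_mat and the “interval” option can be illustrated with a short sketch. This is not geosnap's implementation: the helper names `interval_costs` and `is_valid_subs_mat` are hypothetical, and the sketch assumes that “differences between states” means the absolute difference between integer state labels 0..k-1.

```python
def interval_costs(k):
    """Build |i - j| substitution costs for k ordered states, with indel = k - 1.

    Assumption: states are the integers 0..k-1 and the cost is their
    absolute difference.
    """
    subs = [[abs(i - j) for j in range(k)] for i in range(k)]
    return subs, k - 1


def is_valid_subs_mat(mat):
    """Check the three documented properties: hollow, symmetric, non-negative."""
    k = len(mat)
    if any(len(row) != k for row in mat):
        return False  # not a square (k, k) matrix
    hollow = all(mat[i][i] == 0 for i in range(k))
    symmetric = all(mat[i][j] == mat[j][i] for i in range(k) for j in range(k))
    non_negative = all(value >= 0 for row in mat for value in row)
    return hollow and symmetric and non_negative


# Four neighborhood types: costs range from 0 to 3, and indel is 3.
subs, indel = interval_costs(4)
print(subs)
print(indel, is_valid_subs_mat(subs))
```

A check of this kind can be useful before passing a hand-built matrix as subs_mat, since an asymmetric or non-hollow matrix does not meet the documented requirements.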
- Returns
- gdf_new
Community instance
New Community instance whose “gdf” attribute has a new column for sequence labels.
- df_wide
pandas.DataFrame
Wide-form DataFrame with k (k is the number of periods) columns of neighborhood types and 1 column of sequence labels.
- seq_dis_mat
array (n, n)
Distance/dissimilarity matrix for each pair of sequences.
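To make the shapes of df_wide and seq_dis_mat concrete, here is a minimal pure-Python sketch of the same idea: pivot long-form records into one sequence per spatial unit, then compute an (n, n) matrix of pairwise distances. It is an illustration under stated assumptions, not the library's code. The records and the helpers `to_wide` and `hamming_matrix` are hypothetical, the panel is assumed to be balanced (every unit observed in every period), and the distance is an unnormalized count of differing positions, so values returned by geosnap may be scaled differently.

```python
# Hypothetical long-form records: (geoid, year, neighborhood type label).
records = [
    ("A", 1990, 0), ("A", 2000, 0), ("A", 2010, 1),
    ("B", 1990, 0), ("B", 2000, 1), ("B", 2010, 1),
    ("C", 1990, 2), ("C", 2000, 2), ("C", 2010, 2),
]


def to_wide(rows):
    """Pivot long-form rows into {id: [label per period, in time order]}.

    Assumes a balanced panel: every id has a label for every period.
    """
    periods = sorted({year for _, year, _ in rows})
    ids = sorted({gid for gid, _, _ in rows})
    lookup = {(gid, year): label for gid, year, label in rows}
    return {gid: [lookup[(gid, p)] for p in periods] for gid in ids}


def hamming_matrix(wide):
    """Return an (n, n) matrix counting positions where two sequences differ."""
    seqs = list(wide.values())
    return [[sum(a != b for a, b in zip(s, t)) for t in seqs] for s in seqs]


wide = to_wide(records)      # one row per spatial unit, one entry per period
dist = hamming_matrix(wide)  # symmetric, with zeros on the diagonal
print(wide)
print(dist)
```

In this toy data, units A and B differ in one period and are therefore closer to each other than either is to C, which is the kind of structure the sequence clustering step groups on.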