geosnap.data.Community.sequence
Community.sequence(self, cluster_col, seq_clusters=5, subs_mat=None, dist_type=None, indel=None, time_var='year', id_var='geoid')
Pairwise sequence analysis to evaluate the distance/dissimilarity between every pair of neighborhood sequences.
Run this method only after neighborhood segmentation, because the name of the column holding the neighborhood labels is a required input.
- Parameters
- cluster_col : string or int
Column name for the neighborhood segmentation, such as “ward”, “kmeans”, etc.
- seq_clusters : int, optional
Number of neighborhood sequence clusters. Agglomerative Clustering with Ward linkage is now used for clustering the sequences. Default is 5.
- subs_mat : array
(k,k), substitution cost matrix. Should be hollow (0 cost between identical types), symmetric, and non-negative.
- dist_type : string
One of the following:
  - “hamming”: Hamming distance (substitution only, at a constant cost of 1), from sklearn.metrics.
  - “markov”: substitution costs defined from empirical transition probabilities.
  - “interval”: substitution costs defined by the differences between states, with indel=k-1.
  - “arbitrary”: arbitrary costs for use when there is no strong theoretical guidance: substitution=0.5, indel=1.
  - “tran”: transition-oriented optimal matching, which compares sequences of transitions. Based on [Bie11].
- indel : float, optional
Insertion/deletion cost.
- time_var : string, optional
Column defining the time and/or sequencing of the long-form data. Default is “year”.
- id_var : string, optional
Column identifying the unique id of spatial units. Default is “geoid”.
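The ideas behind these parameters can be illustrated without geosnap itself. The sketch below is not the library's implementation. It uses a small hypothetical long-form table (the values, the `is_valid_subs_mat` helper, and the choice of two clusters are all assumptions made for illustration). It reshapes the table into one label sequence per spatial unit, computes a Hamming distance counted as the number of mismatched periods, builds an "interval"-style substitution matrix read here as |i - j|, and clusters the sequences with Ward linkage. Passing the rows of the distance matrix to the clusterer as features is also an assumption of this sketch.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import AgglomerativeClustering

# Hypothetical long-form data: one row per spatial unit per time period.
# "ward" plays the role of cluster_col, "year" of time_var, "geoid" of id_var.
long_df = pd.DataFrame({
    "geoid": ["a"] * 3 + ["b"] * 3 + ["c"] * 3 + ["d"] * 3,
    "year": [1990, 2000, 2010] * 4,
    "ward": [0, 0, 1,  0, 0, 1,  2, 2, 2,  2, 2, 1],
})

# One row per unit, one column per period: each row is a neighborhood sequence.
seqs = long_df.pivot(index="geoid", columns="year", values="ward")
vals = seqs.to_numpy()

# Hamming distance as a count of mismatched periods (each substitution costs 1).
dist = (vals[:, None, :] != vals[None, :, :]).sum(axis=2).astype(float)

# Illustrative "interval"-style costs for k ordered states:
# cost(i, j) = |i - j|, paired with indel = k - 1.
k = 3
subs_mat = np.abs(np.subtract.outer(np.arange(k), np.arange(k))).astype(float)
indel = k - 1


def is_valid_subs_mat(m):
    """Check that a matrix is square, hollow, symmetric, and non-negative."""
    m = np.asarray(m, dtype=float)
    return (
        m.ndim == 2
        and m.shape[0] == m.shape[1]
        and np.allclose(np.diag(m), 0)
        and np.allclose(m, m.T)
        and bool((m >= 0).all())
    )


# Ward-linkage agglomerative clustering of the sequences (2 clusters here,
# chosen only because the toy data has four units).
model = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = model.fit_predict(dist)
```

In this toy data, units `a` and `b` follow identical sequences, so their distance is zero and they fall into the same sequence cluster. A matrix passed as `subs_mat` can be screened with a check like `is_valid_subs_mat` before use.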