ClusterSet
- class ClusterSet(clusters, times, linkage, distance_matrix, distance_metric, cluster_method, n_clusters_max, n_max_type)
Base class for a set of clusters (partition) of timepoints
- Variables
clusters (list of int) – Clusters as a list of cluster labels
times (list of (int or float)) – Sorted list of times associated with each clustered snapshot
n_clusters (int) – Number of clusters in the cluster set (partition)
cluster_method (str) – Method used to cluster the snapshots. Examples: ‘k_means’, ‘centroid’, ‘average’, ‘complete’, ‘weighted’, ‘median’, ‘single’, ‘ward’
n_max_type (str) – Method that was used to determine when to stop clustering when creating this cluster set. For example, a cluster set can be created by clustering until a particular number of clusters has been reached (‘maxclust’), or until all clusters are at least a certain distance away from each other (‘distance’).
n_max (int) – Value corresponding to the n_max_type described above.
distance_metric (str) – Distance metric used to compute the distance between snapshots, e.g. ‘euclidean’, with sklearn.metrics.pairwise.paired_distances. It must be one of the options allowed by scipy.spatial.distance.pdist for its metric parameter (e.g. ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘euclidean’, ‘hamming’, ‘jaccard’, etc.), or a metric listed in pairwise.PAIRWISE_DISTANCE_FUNCTIONS.
- Parameters
clusters (list of int) – Clusters as a list of cluster labels
times (list of (int or float)) – Sorted list of times associated with each clustered snapshot
linkage – Linkage of the clustering
distance_matrix (phasik.DistanceMatrix) – Distance matrix from which the clusters were computed
cluster_method (str) – Method used to cluster the snapshots. Examples: ‘k_means’, ‘centroid’, ‘average’, ‘complete’, ‘weighted’, ‘median’, ‘single’, ‘ward’
n_max_type (str) – Method that was used to determine when to stop clustering when creating this cluster set. For example, a cluster set can be created by clustering until a particular number of clusters has been reached (‘maxclust’), or until all clusters are at least a certain distance away from each other (‘distance’).
n_clusters_max (int) – Value corresponding to the n_max_type described above.
distance_metric (str) – Distance metric used to compute the distance between snapshots, e.g. ‘euclidean’, with sklearn.metrics.pairwise.paired_distances. It must be one of the options allowed by scipy.spatial.distance.pdist for its metric parameter (e.g. ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘euclidean’, ‘hamming’, ‘jaccard’, etc.), or a metric listed in pairwise.PAIRWISE_DISTANCE_FUNCTIONS.
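The two stopping criteria named above, ‘maxclust’ and ‘distance’, are also the names of flat-clustering criteria in scipy.cluster.hierarchy.fcluster. The following is a minimal sketch of how they differ, written directly against SciPy rather than against this class; the toy snapshots array and the chosen thresholds are assumptions made for illustration only.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Toy data (assumed): six snapshots, each flattened to a 2-component vector,
# arranged as three well-separated pairs.
snapshots = np.array([
    [0.0, 0.0], [0.1, 0.0],
    [5.0, 5.0], [5.2, 5.0],
    [10.0, 0.0], [10.3, 0.0],
])

# Condensed pairwise distances, then an agglomerative linkage.
condensed = pdist(snapshots, metric="euclidean")
Z = linkage(condensed, method="average")

# 'maxclust': form no more than 3 flat clusters.
labels_maxclust = fcluster(Z, t=3, criterion="maxclust")

# 'distance': merge only clusters whose cophenetic distance is at most 1.0.
labels_distance = fcluster(Z, t=1.0, criterion="distance")

print(labels_maxclust, labels_distance)
```

On this toy data both criteria recover the same three pairs; on real data they generally give different partitions, because one fixes the number of clusters and the other fixes the cut height.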
- property cluster_method
Returns the clustering method used to cluster the temporal data
- property clusters
Returns the clusters, i.e. a list of cluster labels (int)
- property distance_metric
Returns the distance metric used to compute the distance between snapshots, e.g. ‘euclidean’
- distance_threshold()
Calculate the distance at which clustering stops
- Parameters
None
- Returns
Smallest number d such that the distance between any two clusters is < d.
- Return type
float
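As a rough illustration of what "the distance at which clustering stops" can mean, the sketch below reads a cut height off a SciPy linkage matrix. It is not taken from the library's source: the helper cut_height is hypothetical, and it assumes a monotonic linkage (merge heights sorted in ascending order) with distinct heights.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cut_height(Z, n_clusters):
    """Height of the last merge performed when n_clusters clusters remain.

    Hypothetical helper, assuming a monotonic linkage matrix Z.
    """
    n_observations = Z.shape[0] + 1
    if n_clusters >= n_observations:
        return 0.0  # every observation is its own cluster: nothing was merged
    n_clusters = max(n_clusters, 1)
    # Z has n_observations - 1 rows; leaving k clusters means the last
    # k - 1 merges are not performed, so the last performed merge is row -k.
    return float(Z[-n_clusters, 2])


# Toy data (assumed): five points on a line with growing gaps.
points = np.array([[0.0], [1.0], [3.0], [7.0], [15.0]])
Z = linkage(points, method="single")  # merge heights: 1, 2, 4, 8

threshold = cut_height(Z, 3)
labels = fcluster(Z, t=threshold, criterion="distance")
print(threshold, labels)
```

Cutting the tree at this height with the ‘distance’ criterion reproduces the three-cluster partition, which is a quick way to check the indexing.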
- classmethod from_distance_matrix(distance_matrix, n_max_type, n_clusters_max, cluster_method)
Generates a ClusterSet from a distance matrix
- Parameters
distance_matrix (phasik.DistanceMatrix) – Distance matrix from which to cluster
cluster_method (str) – Clustering method used to cluster the temporal network snapshots. Examples: ‘k_means’, ‘centroid’, ‘average’, ‘complete’, ‘weighted’, ‘median’, ‘single’, ‘ward’
n_max_type (str) – The method that determines when to stop clustering. For example, a cluster set can be created by clustering until a particular number of clusters has been reached (‘maxclust’), or until all clusters are at least a certain distance away from each other (‘distance’).
n_clusters_max (int) – Value corresponding to the n_max_type described above.
- Returns
ClusterSet generated from the distance matrix
- Return type
ClusterSet
- classmethod from_temporal_network(temporal_network, distance_metric, clustering_method, n_max_type, n_clusters_max)
Generates a ClusterSet from a temporal network
- Parameters
temporal_network (TemporalNetwork) – Temporal network from which to compute the distance matrix
distance_metric (str) – Distance metric used to compute the distance between snapshots, e.g. ‘euclidean’, with sklearn.metrics.pairwise.paired_distances. It must be one of the options allowed by scipy.spatial.distance.pdist for its metric parameter (e.g. ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘euclidean’, ‘hamming’, ‘jaccard’, etc.), or a metric listed in pairwise.PAIRWISE_DISTANCE_FUNCTIONS.
clustering_method (str) – Clustering method used to cluster the temporal network snapshots. Examples: ‘k_means’, ‘centroid’, ‘average’, ‘complete’, ‘weighted’, ‘median’, ‘single’, ‘ward’
n_max_type (str) – The method that determines when to stop clustering. For example, a cluster set can be created by clustering until a particular number of clusters has been reached (‘maxclust’), or until all clusters are at least a certain distance away from each other (‘distance’).
n_clusters_max (int) – Value corresponding to the n_max_type described above.
- Returns
ClusterSet generated from the temporal network
- Return type
ClusterSet
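The distance_metric parameter is described in terms of the metrics accepted by scipy.spatial.distance.pdist. A minimal sketch of the snapshot-to-distance-matrix step might look like the following; it assumes each snapshot has already been flattened to a vector of edge weights, and the snapshots array is invented for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Assumed toy data: 4 snapshots of a network with 3 edges; each row holds
# the edge weights at one time point.
snapshots = np.array([
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
    [0.0, 1.0, 0.0],
])
times = [0, 1, 2, 3]

# pdist returns the condensed (upper-triangular) distances;
# squareform expands them to a symmetric len(times) x len(times) matrix.
distance_matrix = squareform(pdist(snapshots, metric="cityblock"))
print(distance_matrix)
```

Swapping metric for another pdist option such as "euclidean" or "cosine" changes only the distances, not the shape of the result.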
- property n_max
Returns the value corresponding to n_max_type, e.g. the maximum number of clusters when n_max_type is ‘maxclust’
- property n_max_type
Returns the method (str) that determines when to stop clustering
- plot(ax=None, y_height=0, cmap=<matplotlib.colors.ListedColormap object>, number_of_colors=10, colors=None)
Plots this cluster set as a scatter graph
- Parameters
ax (matplotlib.Axes, optional) – Axes on which to plot
y_height (int or float, optional) – Height at which to plot (default 0)
cmap (matplotlib.cm, optional) – Desired colour map (default ‘tab10’)
number_of_colors (int, optional) – Desired number of colours to use for the colormap (default 10)
colors
- Returns
- Return type
None
- plot_dendrogram(ax=None, distance_threshold=None, leaf_rotation=90, leaf_font_size=6)
Plot this cluster set as a dendrogram
- Parameters
ax (matplotlib.Axes, optional) – Axes on which to plot
leaf_rotation (int or float, optional) – Rotation to apply to the x-axis (leaf) labels (default 90)
leaf_font_size (int or str, optional) – Desired size of the x-axis (leaf) labels (default 6)
- Returns
- Return type
None
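For readers who want to see what such a dendrogram looks like outside this class, here is a minimal sketch using scipy.cluster.hierarchy.dendrogram directly. The toy points, the times used as leaf labels, and the threshold of 3.0 are assumptions; mapping a distance threshold to SciPy's color_threshold argument is an illustrative choice, not a statement about how plot_dendrogram is implemented.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless

import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Assumed toy data: five snapshots reduced to one feature each.
points = np.array([[0.0], [1.0], [3.0], [7.0], [15.0]])
times = [0, 10, 20, 30, 40]  # used as leaf labels
Z = linkage(points, method="average")

fig, ax = plt.subplots()
result = dendrogram(
    Z,
    ax=ax,
    labels=times,
    color_threshold=3.0,  # branches merged below this height share a colour
    leaf_rotation=90,
    leaf_font_size=6,
)
# Mark the cut height on the plot.
ax.axhline(3.0, linestyle="--", color="grey")
fig.savefig("dendrogram.png")
```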
- plot_silhouette_samples(ax=None)
Plot the silhouette samples from this cluster set
- Parameters
ax (matplotlib.Axes, optional) – Axes on which to plot
- Returns
- Return type
None
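The silhouette of a sample compares its mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b), as (b - a) / max(a, b). The sketch below computes these values from a precomputed distance matrix with plain NumPy. The function name is hypothetical and this is not the library's code; scikit-learn's sklearn.metrics.silhouette_samples with metric="precomputed" offers the same computation.

```python
import numpy as np


def silhouette_samples_from_distances(D, labels):
    """Per-sample silhouette values from a square distance matrix.

    Hypothetical helper; requires at least two distinct cluster labels.
    """
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)
    unique = np.unique(labels)
    scores = np.zeros(len(labels))
    for i in range(len(labels)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # convention: a singleton cluster scores 0
        # Mean distance to the other members of the same cluster (D[i, i] is 0).
        a = D[i, own].sum() / (n_own - 1)
        # Smallest mean distance to the members of any other cluster.
        b = min(D[i, labels == c].mean() for c in unique if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return scores


# Assumed toy data: four time points forming two tight groups.
points = np.array([0.0, 1.0, 10.0, 11.0])
D = np.abs(points[:, None] - points[None, :])
labels = [1, 1, 2, 2]

scores = silhouette_samples_from_distances(D, labels)
print(scores, scores.mean())
```

Values close to 1 indicate samples that sit well inside their cluster; the mean of the per-sample values is the average silhouette of the partition.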
- property times
Returns the list of times corresponding to the clustered data points
ClusterSets
- class ClusterSets(cluster_sets, n_max_type, ns_max)
Base class for sets of clusters (partition) of timepoints
- Variables
cluster_sets (iterable of phasik.ClusterSet) – List of ClusterSets
clusters (numpy array of int) – Summary array of the cluster labels, with dim (len(ns_max), len(times))
n_clusters (list of int) – Number of clusters in each cluster set (partition)
times (list of (int or float)) – Sorted list of times associated with each clustered snapshot
distance_metric (str) – Distance metric used to compute the distance between snapshots, e.g. ‘euclidean’, with sklearn.metrics.pairwise.paired_distances. It must be one of the options allowed by scipy.spatial.distance.pdist for its metric parameter (e.g. ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘euclidean’, ‘hamming’, ‘jaccard’, etc.), or a metric listed in pairwise.PAIRWISE_DISTANCE_FUNCTIONS.
n_max_type (str) – Method that was used to determine when to stop clustering when creating these cluster sets. For example, a cluster set can be created by clustering until a particular number of clusters has been reached (‘maxclust’), or until all clusters are at least a certain distance away from each other (‘distance’).
ns_max (list of int) – List of values corresponding to the n_max_type described above, in other words, the list of numbers of clusters to be computed. The number of elements in this list is the number of ClusterSet objects computed.
silhouettes_average (numpy array) – Value of the average silhouette for each clustering
- Parameters
cluster_sets (iterable of ClusterSet) – List of ClusterSets
n_max_type (str) – Method that was used to determine when to stop clustering when creating these cluster sets. For example, a cluster set can be created by clustering until a particular number of clusters has been reached (‘maxclust’), or until all clusters are at least a certain distance away from each other (‘distance’).
ns_max (list of int) – List of values corresponding to the n_max_type described above, in other words, the list of numbers of clusters to be computed. The number of elements in this list is the number of ClusterSet objects computed.
- property clusters_sets
Returns the list of ClusterSet
- classmethod from_distance_matrix(distance_matrix, n_max_type, ns_clusters_max, cluster_method)
Generates ClusterSets from a distance matrix
- Parameters
distance_matrix (phasik.DistanceMatrix) – Distance matrix from which to cluster
cluster_method (str) – Clustering method used to cluster the temporal network snapshots. Examples: ‘k_means’, ‘centroid’, ‘average’, ‘complete’, ‘weighted’, ‘median’, ‘single’, ‘ward’
n_max_type (str) – The method that determines when to stop clustering. For example, a cluster set can be created by clustering until a particular number of clusters has been reached (‘maxclust’), or until all clusters are at least a certain distance away from each other (‘distance’).
ns_clusters_max (list of int) – List of values corresponding to the n_max_type described above, in other words, the list of numbers of clusters to be computed. The number of elements in this list is the number of ClusterSet objects computed.
- Returns
ClusterSets generated from the distance matrix
- Return type
ClusterSets
- plot(axs=None, coloring='consistent', with_silhouettes=False, with_n_clusters=False)
Plots these cluster sets as a scatter graph
- Parameters
axs (matplotlib.Axes, optional) – Axes on which to plot
coloring ({‘ascending’, ‘consistent’, None})
with_silhouettes (bool) – If True, also plot the average silhouettes on a 2nd axis. Defaults to False.
with_n_clusters (bool) – If True, also plot the actual number of clusters on a 3rd axis. Defaults to False.
- Returns
- Return type
None
- plot_and_format_with_average_silhouettes(axs, events, phases, time_ticks=None, coloring='consistent')
Plot and format these cluster sets as a scatter graph, along with the average silhouettes and cluster set sizes
Formatting is generally left to the Jupyter notebooks, but this method is used by several different notebooks, so it is kept here in a common place.
- Parameters
axs (list of matplotlib.Axes) – Axes on which to plot; should be an indexable object with at least three items
events – Any events that should be plotted on the scatter graph
phases – Any phases that should be plotted on the scatter graph
time_ticks (list or array) – The ticks that should be displayed along the x-axis (time axis)
- Returns
- Return type
None
- plot_silhouette_samples(axs, coloring='consistent')
Plot the average silhouettes across this range of cluster sets
- Parameters
axs (list of matplotlib.Axes) – Axes on which to plot; should be an iterable object with at least as many items as there are cluster sets in this class.
- Returns
- Return type
None