ClassNeRV
Description
The code of the ClassNeRV method and the associated quality indicators described in the paper Steering Distortions to Preserve Classes and Neighbors in Supervised Dimensionality Reduction by Benoît Colange, Jaakko Peltonen, Michaël Aupetit, Denys Dutykh and Sylvain Lespinats, accepted at the NeurIPS conference in 2020, available here.
ClassNeRV provides an embedding of multidimensional data with additional class information. The embedding preserves the neighbourhood structure while distorting the class structure as little as possible.
To cite the paper:
@inproceedings{colange_steering_2020,
author = {Colange, Beno\^{\i}t and Peltonen, Jaakko and Aupetit, Michael and Dutykh, Denys and Lespinats, Sylvain},
booktitle = {Advances in Neural Information Processing Systems},
editor = {H. Larochelle and M. Ranzato and R. Hadsell and M. F. Balcan and H. Lin},
pages = {13214--13225},
publisher = {Curran Associates, Inc.},
title = {Steering Distortions to Preserve Classes and Neighbors in Supervised Dimensionality Reduction},
url = {https://proceedings.neurips.cc/paper/2020/file/99607461cdb9c26e2bd5f31b12dcf27a-Paper.pdf},
volume = {33},
year = {2020}
}
The dependencies of the ClassNeRV code (which requires Python 3.8 or higher) may be installed with conda using the following commands:
conda install numpy=1.18.5
conda install matplotlib=3.2.2
conda install scipy=1.5.0
conda install scikit-learn=0.23.1
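After installation, it can be useful to confirm that the active environment really holds the pinned versions. The following is a minimal sketch using only the standard library; the helper name check_versions and the PINNED dictionary are illustrative and are not part of the ClassNeRV code.

```python
from importlib import metadata

# Versions pinned by the conda commands above.
PINNED = {
    "numpy": "1.18.5",
    "matplotlib": "3.2.2",
    "scipy": "1.5.0",
    "scikit-learn": "0.23.1",
}


def check_versions(pinned):
    """Return {package: (installed_version_or_None, matches_pin)}."""
    report = {}
    for name, wanted in pinned.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is absent from this environment
        report[name] = (found, found == wanted)
    return report


for name, (found, ok) in check_versions(PINNED).items():
    status = "ok" if ok else f"expected {PINNED[name]}"
    print(f"{name}: {found or 'not installed'} ({status})")
```

A mismatch does not necessarily mean the code will fail; the pinned versions are simply the ones listed above.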
The dataset, consisting of \(N\) data instances characterized by \(\delta\) features, is stored in the \(N\times \delta \) numpy array data, while the class labels are stored in the \(N\times 1\) numpy array labels. Example datasets used in the paper may be loaded from the repository using the following Python code (where 'three_gaussian_clusters.mat' may be replaced by 'digits_true.mat' or 'digits_rand.mat'):
from scipy import io
data,labels=map(io.loadmat('three_gaussian_clusters.mat').get,['data','labels'])
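Before computing an embedding, the shapes described above can be checked explicitly. The sketch below assumes only the \(N\times \delta\) and \(N\times 1\) layout stated in this description; check_dataset is a hypothetical helper, and the arrays are synthetic stand-ins rather than one of the repository's datasets.

```python
import numpy as np


def check_dataset(data, labels):
    """Validate the (N, delta) data array and the (N, 1) label array.

    Returns a {class_label: count} dictionary when the shapes agree.
    """
    data = np.asarray(data)
    labels = np.asarray(labels)
    if data.ndim != 2:
        raise ValueError(f"data should be 2-D (N x delta), got shape {data.shape}")
    if labels.shape != (data.shape[0], 1):
        raise ValueError(
            f"labels should have shape ({data.shape[0]}, 1), got {labels.shape}"
        )
    classes, counts = np.unique(labels, return_counts=True)
    return dict(zip(classes.tolist(), counts.tolist()))


# Synthetic stand-in for a loaded dataset: 6 instances, 3 features, 2 classes.
rng = np.random.default_rng(0)
data = rng.normal(size=(6, 3))
labels = np.array([[0], [0], [0], [1], [1], [1]])
print(check_dataset(data, labels))
```

The returned class counts also give a quick view of how balanced the classes are.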
Then, an embedding of the data may be obtained using the following Python code. The position of points in the embedding space is stored in the \(N\times d\) numpy array pos, where \(d\) is the dimensionality of the embedding space.
from embedder import ClassNeRV
model=ClassNeRV(perplex=32,scale_out=None,tradeoff_intra=1,tradeoff_inter=0,dim_out=2)
pos=model.fit_transform(data,labels)
The parameters of ClassNeRV are:
-perplex: the perplexity \(p\) (equivalent to a number of neighbours of interest) for defining the data space scaling parameter \(\sigma_i\).
-scale_out: the embedding space scaling parameter \(s_i\) (if it is None, \(s_i=\sigma_i\); otherwise \(s_i\) is equal to the specified value).
-tradeoff_intra: the within-class trade-off \(\tau^{\in}\), controlling the balance between false and missed neighbours within classes. A higher value of \(\tau^{\in}\) leads to more false neighbours and fewer missed neighbours. \(\tau^{\in}\) should be between 0 and 1, and higher than \(\tau^{\notin}\).
-tradeoff_inter: the between-class trade-off \(\tau^{\notin} \), controlling the balance between false and missed neighbours between classes. A lower value of \(\tau^{\notin}\) leads to more missed neighbours and fewer false neighbours. \(\tau^{\notin}\) should be between 0 and 1, and lower than \(\tau^{\in}\).
-dim_out: the embedding space dimensionality \(d\).
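To make the role of these parameters more concrete, the following dependency-free sketch shows how they could interact for a single point \(i\). It is not the repository's implementation. It assumes a Gaussian neighbourhood kernel \(\exp(-d^2/(2\sigma_i^2))\), an SNE-style bisection to match the perplexity, and a generalised Kullback-Leibler penalty for missed and false neighbours. All function names are hypothetical, and equal trade-offs are accepted here as a simplifying assumption.

```python
import math

EPS = 1e-12  # guards the logarithms against memberships that underflow to 0


def memberships(sq_dists, scale):
    """Gaussian neighbourhood memberships of the other points around point i."""
    d_min = min(sq_dists)  # shifting by the minimum avoids an all-zero sum
    weights = [math.exp(-(d2 - d_min) / (2.0 * scale ** 2)) for d2 in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]


def perplexity(probs):
    """Exponential of the Shannon entropy: an effective number of neighbours."""
    return math.exp(-sum(p * math.log(p) for p in probs if p > 0.0))


def calibrate_sigma(sq_dists, perplex, lo=1e-3, hi=1e3, iters=60):
    """Bisect on a log scale for the sigma_i whose perplexity equals perplex."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if perplexity(memberships(sq_dists, mid)) < perplex:
            lo = mid  # neighbourhood too narrow, so widen it
        else:
            hi = mid
    return math.sqrt(lo * hi)


def bregman_kl(x, y):
    """Generalised KL term x*log(x/y) + y - x, which is zero when x == y."""
    x, y = max(x, EPS), max(y, EPS)
    return x * math.log(x / y) + y - x


def pointwise_stress(beta, b, same_class, tradeoff_intra, tradeoff_inter):
    """Trade-off weighted penalty for one point i (assumed, illustrative form)."""
    if not 0.0 <= tradeoff_inter <= tradeoff_intra <= 1.0:
        raise ValueError("expected 0 <= tradeoff_inter <= tradeoff_intra <= 1")
    total = 0.0
    for beta_j, b_j, same in zip(beta, b, same_class):
        tau = tradeoff_intra if same else tradeoff_inter
        missed = bregman_kl(beta_j, b_j)  # grows when j is a missed neighbour
        false = bregman_kl(b_j, beta_j)   # grows when j is a false neighbour
        total += tau * missed + (1.0 - tau) * false
    return total


# Squared distances from point i to four other points, in each space.
data_sq = [1.0, 2.0, 9.0, 16.0]
emb_sq = [9.0, 2.0, 1.0, 16.0]  # the first and third points swapped ranks
same_class = [True, True, False, False]

sigma_i = calibrate_sigma(data_sq, perplex=2)
beta = memberships(data_sq, sigma_i)
b = memberships(emb_sq, sigma_i)  # scale_out=None, i.e. s_i = sigma_i

print(pointwise_stress(beta, b, same_class, 1.0, 0.0))
print(pointwise_stress(beta, b, same_class, 0.5, 0.5))
```

In this toy configuration, a same-class neighbour is pushed away (a missed neighbour within its class) and a point of another class is pulled in (a false neighbour between classes). With tradeoff_intra=1 and tradeoff_inter=0, both distortions receive full weight, so the first printed value is larger than the second, where the class labels play no role.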
The resulting embedding of data (map) may be displayed with matplotlib, using the tab10 colour palette for the classes:
import matplotlib.pyplot as plt
fig,ax=plt.subplots()
ax.scatter(*pos.T,c=plt.get_cmap('tab10')(labels.flatten().astype(int)))
plt.show()