Published October 22, 2020 | Version v1
Software Open

ClassNeRV

Authors/Creators

  • 1. CEA

Description

This record contains the code of the ClassNeRV method and the associated quality indicators described in the paper Steering Distortions to Preserve Classes and Neighbors in Supervised Dimensionality Reduction by Benoît Colange, Jaakko Peltonen, Michaël Aupetit, Denys Dutykh and Sylvain Lespinats, accepted at the NeurIPS 2020 conference and available at https://proceedings.neurips.cc/paper/2020/file/99607461cdb9c26e2bd5f31b12dcf27a-Paper.pdf.

ClassNeRV embeds multidimensional data that come with additional class information. The embedding preserves the neighbourhood structure while distorting the class structure as little as possible.

To cite the paper:

@inproceedings{colange_steering_2020,
 author = {Colange, Beno\^{\i}t and Peltonen, Jaakko and Aupetit, Michael and Dutykh, Denys and Lespinats, Sylvain},
 booktitle = {Advances in Neural Information Processing Systems},
 editor = {H. Larochelle and M. Ranzato and R. Hadsell and M. F. Balcan and H. Lin},
 pages = {13214--13225},
 publisher = {Curran Associates, Inc.},
 title = {Steering Distortions to Preserve Classes and Neighbors in Supervised Dimensionality Reduction},
 url = {https://proceedings.neurips.cc/paper/2020/file/99607461cdb9c26e2bd5f31b12dcf27a-Paper.pdf},
 volume = {33},
 year = {2020}
}

 

The dependencies of the ClassNeRV code (which requires Python 3.8 or higher) may be installed with conda using the following commands:

conda install numpy=1.18.5
conda install matplotlib=3.2.2
conda install scipy=1.5.0
conda install scikit-learn=0.23.1
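
Once the packages are installed, the environment can be compared against the versions pinned above. The following is a minimal sketch using only the standard library (importlib.metadata, available from Python 3.8). The names PINNED and find_mismatches are illustrative and not part of the ClassNeRV code.

```python
from importlib import metadata

# Versions pinned by the conda commands above.
PINNED = {
    "numpy": "1.18.5",
    "matplotlib": "3.2.2",
    "scipy": "1.5.0",
    "scikit-learn": "0.23.1",
}


def find_mismatches(pinned, lookup=metadata.version):
    """Return {package: (expected, found)} for every package whose installed
    version differs from the pinned one (found is None if it is missing)."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


# Report every package that is missing or installed at another version.
for name, (expected, found) in find_mismatches(PINNED).items():
    print(f"{name}: expected {expected}, found {found}")
```

Nothing is printed when the environment matches the pinned versions exactly. Other versions may also work, but this has not been verified here.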

The dataset, made up of \(N\) data instances each characterized by \(\delta\) features, is stored in the \(N\times \delta \) numpy array data, while the class labels are stored in the \(N\times 1\) numpy array labels. The example datasets used in the paper may be loaded from the repository using the following Python code (where 'three_gaussian_clusters.mat' may be replaced by 'digits_true.mat' or 'digits_rand.mat'):

from scipy import io
data,labels=map(io.loadmat('three_gaussian_clusters.mat').get,['data','labels'])
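
When using a dataset other than the ones provided, it may help to confirm that the arrays follow the layout described above before computing an embedding. The following sketch checks the two shapes on a small synthetic stand-in. The helper check_dataset and the demo arrays are hypothetical and not part of the ClassNeRV code.

```python
import numpy as np


def check_dataset(data, labels):
    """Check that data is N x delta and labels is N x 1; return (N, delta)."""
    data = np.asarray(data)
    labels = np.asarray(labels)
    if data.ndim != 2:
        raise ValueError(f"data should be 2-D (N x delta), got shape {data.shape}")
    n, delta = data.shape
    if labels.shape != (n, 1):
        raise ValueError(f"labels should have shape ({n}, 1), got {labels.shape}")
    return n, delta


# Synthetic stand-in for a loaded dataset: 6 instances, 3 features, 2 classes.
demo_data = np.zeros((6, 3))
demo_labels = np.array([[0], [0], [0], [1], [1], [1]])
print(check_dataset(demo_data, demo_labels))
```

The same call can be applied to the data and labels arrays returned by io.loadmat.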

Then, an embedding of the data may be obtained using the following Python code. The positions of the points in the embedding space are stored in the \(N\times d\) numpy array pos, where \(d\) is the dimensionality of the embedding space.

from embedder import ClassNeRV
model=ClassNeRV(perplex=32,scale_out=None,tradeoff_intra=1,tradeoff_inter=0,dim_out=2)
pos=model.fit_transform(data,labels)

The parameters of ClassNeRV are:
-perplex: the perplexity \(p\) (equivalent to a number of neighbours of interest) for defining the data space scaling parameter \(\sigma_i\).
-scale_out: the embedding space scaling parameter \(s_i\) (if it is None, \(s_i=\sigma_i\); otherwise \(s_i\) takes the specified value).
-tradeoff_intra: the within-class trade-off \(\tau^{\in}\), controlling the balance between false and missed neighbours within classes. A higher value of \(\tau^{\in}\) leads to more false neighbours and fewer missed neighbours. \(\tau^{\in}\) should be between 0 and 1, and higher than \(\tau^{\notin}\).
-tradeoff_inter: the between-class trade-off \(\tau^{\notin} \), controlling the balance between false and missed neighbours between classes. A lower value of \(\tau^{\notin}\) leads to more missed neighbours and fewer false neighbours. \(\tau^{\notin}\) should be between 0 and 1, and lower than \(\tau^{\in}\).
-dim_out: the embedding space dimensionality \(d\).
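
The constraints on the two trade-off parameters listed above can be checked before building the model. The following is a minimal sketch. The helper check_tradeoffs is hypothetical and not part of the ClassNeRV code, and accepting equal values for the two trade-offs is an assumption of this sketch.

```python
def check_tradeoffs(tradeoff_intra, tradeoff_inter):
    """Raise ValueError unless 0 <= tradeoff_inter <= tradeoff_intra <= 1.

    Hypothetical helper; allowing tradeoff_inter == tradeoff_intra is an
    assumption, since the description only says "higher than"/"lower than".
    """
    for name, value in (("tradeoff_intra", tradeoff_intra),
                        ("tradeoff_inter", tradeoff_inter)):
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    if tradeoff_inter > tradeoff_intra:
        raise ValueError("tradeoff_inter must not exceed tradeoff_intra")
    return tradeoff_intra, tradeoff_inter


# The values used in the example call above pass the check.
print(check_tradeoffs(tradeoff_intra=1, tradeoff_inter=0))
```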

The resulting embedding of data (map) may be displayed with matplotlib, using the tab10 colour palette for the classes:

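import matplotlib.pyplot as plt
fig,ax=plt.subplots()
# one point per data instance, coloured by its class with the tab10 palette
ax.scatter(*pos.T,c=plt.get_cmap('tab10')(labels.flatten()))
plt.show()  # opens the figure window when run as a script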
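
Indexing the tab10 palette directly with the labels relies on them being integers between 0 and 9. For labels stored in another form (for instance as floats or starting at 1), the following sketch first maps them to consecutive integer indices. The helper class_indices is hypothetical and not part of the ClassNeRV code.

```python
import numpy as np


def class_indices(labels):
    """Map class labels (N x 1 or length N) to integer indices 0..K-1.

    Returns the sorted distinct classes and, for each instance, the index of
    its class. Raises ValueError if there are more classes than the 10
    colours of the tab10 palette.
    """
    flat = np.asarray(labels).ravel()
    classes, indices = np.unique(flat, return_inverse=True)
    if len(classes) > 10:
        raise ValueError(f"tab10 has 10 colours, got {len(classes)} classes")
    return classes, indices


# Float labels that do not start at 0 are mapped to 0, 1, 2.
demo_classes, demo_indices = class_indices(np.array([[3.0], [1.0], [3.0], [7.0]]))
print(demo_classes, demo_indices)
```

The resulting indices may then be passed to the colour map in place of labels.flatten(), for example with c=plt.get_cmap('tab10')(indices).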

 
