Per Class Feature Importance
Authors/Creators
Description
Per Class Feature Importance (PCFI) for Classifiers based on Decision Tree Models
Machine learning models are accurate and effective, but often difficult to interpret. Here, we define a method to explain decision tree-based classifiers named Per Class Feature Importance (PCFI), offering an alternative to SHAP values. As the name suggests, PCFI builds upon the default feature importance, which is calculated from the mean impurity decrease of the feature's splits in the decision tree.
Our aim is to provide a quantitative description for the contribution of each feature to the prediction of one specific class label, thus returning a more detailed description on the model structure and handling of the input data.
The underlying idea is simple. The importance of a given feature to a certain class is the sum of the mean impurity decreases from those (and only those) nodes splitting on that feature and leading to a leaf node that predicts such class (thinking the model as a directed tree graph only with paths from the initial root node to the leaves). In particular, the mean impurity decreases in one node is calculated as if the variables in had a binary label: 1 if the variable belongs to the class under consideration, 0 otherwise. In the case of ensemble methods, such as Random Forest Classifiers, the PCFI is obtained averaging the importance, for one combination of class and feature, across all trees in the ensemble.
In this way, PCFI coincides with the default feature importance in the case of binary classification. Moreover, each vector of feature importances, associated to a given class, effectively provides a measure of how well the model can discriminate such class from the others.
A measure of importance that considers each class separately is particularly relevant to imbalanced datasets, as features that are specific to rare classes would not emerge with the default feature importance. Of note, the latter only provides a ?global? view of the model.
In fact, PCFI was inspired during the analysis of biology data, where rare classes (e.g. different cell types) often are the main subject of investigation. This is often the case in cell type discovery studies, or during the analyses of rare cell types as hematopoietic stem cells in the murine bone marrow.
Files
Pcfi-master (1).zip
Files
(666.5 kB)
| Name | Size | Download all |
|---|---|---|
|
md5:f21d5984697b5accc0001a897aeeb0b3
|
666.5 kB | Preview Download |
Additional details
Identifiers
- Other
- https://github.com/GiulioPr/Pcfi
Software
- Repository URL
- https://github.com/GiulioPr/Pcfi
- Programming language
- Python , Jupyter Notebook