Yellowbrick v0.4
Authors/Creators
- District Data Labs
Description
Yellowbrick is an open source, pure Python project that extends the scikit-learn API with visual analysis and diagnostic tools. The Yellowbrick API also wraps matplotlib to create publication-ready figures and interactive data explorations while still allowing developers fine-grained control of figures. For users, Yellowbrick can help evaluate the performance, stability, and predictive value of machine learning models and assist in diagnosing problems throughout the machine learning workflow.
Changes
This release is the culmination of the Spring 2017 DDL Research Labs that focused on developing Yellowbrick as a community effort guided by a sprint/agile workflow. We added several more visualizers, did a lot of user testing and bug fixes, updated the documentation, and generally discovered how best to make Yellowbrick a friendly project to contribute to.
Notable in this release is the inclusion of two new feature visualizers that use a few simple dimensions to visualize features against the target. The JointPlotVisualizer graphs a scatter plot of two dimensions in the data set and plots a best fit line across it. The ScatterVisualizer also uses two features, but colors the graph by the target variable, adding a third dimension to the visualization.
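To make the "best fit line" idea concrete, here is a minimal pure-Python sketch of an ordinary least-squares line through two-dimensional points. It is an illustration only: the release notes do not say which fitting method the JointPlotVisualizer uses, so least squares is an assumption, and the helper `best_fit_line` and the toy data are hypothetical names that are not part of Yellowbrick.

```python
def best_fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line.

    Hypothetical helper for illustration; not part of the Yellowbrick API.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally sized sequences with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the x/y cross deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data (assumed for the example) lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = best_fit_line(xs, ys)
```

A plotting tool would then draw the line `y = slope * x + intercept` over the scatter of the two features.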
This release also adds support for clustering visualizations, namely the elbow method for selecting K with the KElbowVisualizer, and a visualization of cluster size and density with the SilhouetteVisualizer. The release also adds support for regularization analysis using the AlphaSelection visualizer. Both the text and classification modules were also improved with the inclusion of the PosTagVisualizer and the ConfusionMatrix visualizer, respectively.
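For readers unfamiliar with the quantity behind a silhouette plot, the following is a rough pure-Python sketch of the standard silhouette coefficient for one-dimensional points. It is not Yellowbrick's implementation; the function `silhouette_scores`, the absolute-difference distance, and the toy data are all assumptions made for illustration.

```python
def silhouette_scores(points, labels):
    """Return the silhouette coefficient of each 1-D point.

    Hypothetical helper for illustration; not part of the Yellowbrick API.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself adds nothing to the sum).
        a = sum(abs(p - q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(abs(p - q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Toy data (assumed): two well-separated clusters on a line.
points = [1.0, 2.0, 10.0, 11.0]
labels = [0, 0, 1, 1]
scores = silhouette_scores(points, labels)
mean_score = sum(scores) / len(scores)
```

Scores near 1 indicate points that sit well inside their cluster, and scores near 0 or below indicate points close to, or on the wrong side of, a cluster boundary.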
This release also added an Anaconda repository and distribution so that users can conda install yellowbrick. Even more notable, we got Yellowbrick stickers! We've also updated the documentation to make it more friendly and a bit more visual, and fixed the API rendering errors. All in all, this was a big release with a lot of contributions, and we thank everyone who participated in the lab!
The FreqDistVisualizer implements a frequency distribution plot that shows the frequency of each vocabulary item in the text. In general, it could count any kind of observable event. It is a distribution because it shows how the total number of word tokens in the text is distributed across the vocabulary items.
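The counting behind such a plot can be sketched with the Python standard library alone. This is a minimal illustration of a token frequency distribution, not the FreqDistVisualizer's implementation; the function name, the whitespace tokenization, and the sample sentence are assumptions.

```python
from collections import Counter


def frequency_distribution(tokens):
    """Map each vocabulary item to the number of times it occurs.

    Hypothetical helper for illustration; not part of the Yellowbrick API.
    """
    return Counter(tokens)


# Sample text (assumed), tokenized naively on whitespace.
text = "the quick brown fox jumps over the lazy dog the end"
tokens = text.lower().split()

freq = frequency_distribution(tokens)
top = freq.most_common(3)  # the most frequent vocabulary items first

# The counts sum to the total number of tokens, which is what makes
# this a distribution of tokens over vocabulary items.
total = sum(freq.values())
```

A bar chart of `freq.most_common(n)` is the kind of picture a frequency distribution plot presents.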
- Part of speech tags visualizer -- PosTagVisualizer
- Alpha selection visualizer for regularized regression -- AlphaSelection
- Confusion Matrix Visualizer -- ConfusionMatrix
- Elbow method for selecting K vis -- KElbowVisualizer
- Silhouette score cluster visualization -- SilhouetteVisualizer
- Joint plot visualizer with best fit -- JointPlotVisualizer
- Scatter visualization of features -- ScatterVisualizer
- Added three more example datasets: mushroom, game, and bike share
- Contributor's documentation and style guide
- Maintainers listing and contacts
- Light/Dark background color selection utility
- Structured array detection utility
- Updated classification report to use colormesh
- Added Anaconda packaging and distribution
- Refactoring of the regression, cluster, and classification modules
- Image based testing methodology
- Docstrings updated to a uniform style and rendering
- Submission of several more user studies
Files
yellowbrick-0.4.zip (11.8 MB)
md5:0deb01696d951170649d377a5720416e
Additional details
Related works
- Is documented by
- http://www.scikit-yb.org/en/stable/ (URL)
- Is supplemented by
- https://github.com/DistrictDataLabs/yellowbrick/releases/tag/v0.4 (URL)