Nils Murrugarra Llerena;
Diane M. Napolitano;
Chee Wee Leong
The 1.0 release is finally here! It's been a little over a year since our first public release, and we're ready to say that SKLL is 1.0. Read our massive release notes:
We did make some API- and config-file-breaking changes. They are listed at the end of the release notes. They should all be addressable by a quick find-and-replace.
Fixed path problems in iris example (issue #103, PR #171)
Fixed bug where ablated_features field was incorrect when config file contained multiple feature sets (issue #125)
Fixed bug where CV would crash with rare classes (issue #109, PR #165)
Fixed issue where warning about extremely large feature values was being issued before rescaling
Fixed issue where some warning messages used a mix of new-style and old-style replacement strings with old-style formatting.
Fixed a number of bugs with filtering FeatureSet objects and writing filtered sets to files.
Fixed bug in FeatureSet.__sub__ where feature names were being passed instead of indices.
Fixed issue where MegaMWriter could not print numbers in Python 2.7.
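The FeatureSet.__sub__ fix comes down to translating feature names into column indices before removing columns. Below is a minimal sketch of that idea. It assumes a plain dict as a hypothetical name-to-index vocabulary with contiguous indices, and lists of rows in place of SKLL's actual FeatureSet internals.

```python
def drop_features(rows, vocabulary, names_to_drop):
    """Remove the named features from every row (illustrative sketch).

    rows: list of lists of feature values, one list per instance.
    vocabulary: hypothetical mapping of feature name -> column index (0..n-1).
    names_to_drop: feature names to remove.
    Returns (new_rows, new_vocabulary).
    """
    # Translate names into column indices first; indexing columns by
    # name strings instead of integer positions is the kind of bug fixed here.
    drop_indices = {vocabulary[name] for name in names_to_drop if name in vocabulary}
    keep = [idx for idx in range(len(vocabulary)) if idx not in drop_indices]

    # Keep only the surviving columns in each row.
    new_rows = [[row[idx] for idx in keep] for row in rows]

    # Re-number the vocabulary so surviving names point at their new columns.
    index_to_name = {idx: name for name, idx in vocabulary.items()}
    new_vocabulary = {index_to_name[old]: new for new, old in enumerate(keep)}
    return new_rows, new_vocabulary
```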
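The mixed-placeholder warning bug is easy to reproduce: the `%` operator only fills old-style placeholders such as `%s` and `%g`, so a new-style `{}` in the same string is printed literally. The message text and feature name below are made up for illustration and are not SKLL's actual warning.

```python
def large_value_message(feature_name, value):
    # Consistent old-style placeholders: the % operator fills both %s and %g.
    return "Feature %s has an extremely large value (%g)" % (feature_name, value)


# The buggy pattern: "{}" is a new-style placeholder, which the % operator
# ignores, so it survives verbatim in the final message.
broken = "Feature {} has an extremely large value (%g)" % 1e12

print(broken)                            # Feature {} has an extremely large value (1e+12)
print(large_value_message("f1", 1e12))   # Feature f1 has an extremely large value (1e+12)
```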
SKLL releases are now tied to specific versions of scikit-learn; SKLL 1.0.0 requires scikit-learn 0.15.2 (issue #138, PR #170)
Added tutorial to documentation that walks new users through using SKLL in much the same way as our PyData talks.
Added support for custom learners (issue #92, PR #183)
Added two command-line utilities, join_features and filter_features, for joining and filtering feature files. These replace join_megam and filter_megam (issue #79, PR #198)
Added support for specifying the field in ARFF, CSV, or TSV files that contains the IDs for each instance (issue #204, PR #206)
Added train/test set sizes to result files (issue #150, PR #161)
Added intercept to print_model_weights output (issue #155, PR #163)
Added total time and end time-stamp to experiment results (issue #91, PR #167)
Added exception when featureset_name is longer than 210 characters (issue #121, PR #168)
Added regression example data, boston (issue #162)
Added ability to specify number of grid search folds (issue #122, PR #175)
Added warning message when the number of features in the trained model differs from the number of features in the FeatureSet passed to Learner.predict() (issue #145)
Added conda.yaml file to repository to make conda package creation simpler (issue #159, PR #173)
Added loads more unit tests, greatly increased unit test coverage, and generally cleaned up test modules (issues #97, #148, #157, #188, and #202; PRs #176, #184, #196, #203, and #205)
Added train_file and test_file fields to config files, which can be used to specify single-file feature sets. This greatly simplifies running simple experiments (issue #12, PR #197)
Added support for merging feature sets with IDs in different orders (issue #149, PR #177)
Added ValueError when invalid tuning objective is specified (issues #117 and #179; PRs #174 and #181)
Added shuffle option to config files to decide whether training data should be shuffled before training. By default this is False, but if grid_search is True, we will automatically shuffle. Previously, the default was True, and there was no option in the config files. (issue #189, PR #190)
Updated documentation to indicate that we're using StratifiedKFold (issue #160)
Added FeatureSet.__eq__ and FeatureSet.__getitem__ methods.
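The new shuffle option follows a simple rule: shuffling is off by default, but it is switched on automatically whenever grid_search is True. A minimal sketch of that decision, plus a seeded shuffle that keeps IDs and labels paired, is shown below. The function names and the default seed are hypothetical, not SKLL's internals.

```python
import random


def should_shuffle(shuffle=False, grid_search=False):
    """Effective shuffle setting: off by default, forced on by grid search."""
    return bool(shuffle or grid_search)


def prepare_training_data(ids, labels, shuffle=False, grid_search=False,
                          seed=123456789):
    """Optionally shuffle (id, label) pairs together (hypothetical helper).

    Returns two new lists; the inputs are not modified.
    """
    pairs = list(zip(ids, labels))
    if should_shuffle(shuffle, grid_search):
        # A fixed seed makes the shuffle reproducible across runs.
        random.Random(seed).shuffle(pairs)
    return [id_ for id_, _ in pairs], [label for _, label in pairs]
```

Shuffling the pairs rather than each list separately guarantees that every ID stays attached to its own label.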
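Merging feature sets whose IDs appear in different orders amounts to aligning instances by ID instead of by position. The sketch below uses lists of `(id, feature_dict)` pairs as a stand-in for SKLL's FeatureSet. The choice to raise ValueError on mismatched IDs or duplicate feature names is an assumption made for illustration, not a description of SKLL's exact behavior.

```python
def merge_by_id(left, right):
    """Merge two feature sets keyed by instance ID (illustrative sketch).

    left, right: lists of (id, feature_dict) pairs, possibly in different
    orders. Returns (id, merged_dict) pairs in the order of `left`.
    """
    right_lookup = dict(right)
    left_ids = [id_ for id_, _ in left]

    # Assumption: both sets must cover exactly the same instances.
    if set(left_ids) != set(right_lookup):
        raise ValueError("Feature sets must contain the same instance IDs")

    merged = []
    for id_, feats in left:
        # Look up the matching instance by ID, not by position.
        other = right_lookup[id_]
        overlap = feats.keys() & other.keys()
        if overlap:
            # Assumption: a feature name may appear in only one of the sets.
            raise ValueError(f"Duplicate feature names: {sorted(overlap)}")
        merged.append((id_, {**feats, **other}))
    return merged
```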
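Two of the new checks, the 210-character limit on featureset_name and the ValueError for an invalid tuning objective, are simple validations performed before any training starts. A minimal sketch follows. The objective names are a hypothetical subset chosen for illustration, and using ValueError for the name-length check is an assumption, since the notes only say an exception is raised.

```python
MAX_FEATURESET_NAME_LENGTH = 210

# Hypothetical subset of objective names, for illustration only.
VALID_OBJECTIVES = {"accuracy", "f1_score_micro", "f1_score_macro", "pearson"}


def validate_config(featureset_name, objective):
    """Fail fast on settings that would otherwise break a run later."""
    if len(featureset_name) > MAX_FEATURESET_NAME_LENGTH:
        raise ValueError(
            "featureset_name is %d characters long; the maximum is %d"
            % (len(featureset_name), MAX_FEATURESET_NAME_LENGTH)
        )
    if objective not in VALID_OBJECTIVES:
        raise ValueError(
            "Invalid tuning objective %r; expected one of %s"
            % (objective, sorted(VALID_OBJECTIVES))
        )
```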
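The new Learner.predict() warning compares the feature count seen at training time with the count in the incoming FeatureSet. A minimal sketch using the standard `warnings` module is shown below. The helper name and message wording are hypothetical.

```python
import warnings


def check_feature_count(model_num_features, featureset_num_features):
    """Warn (but do not fail) when feature counts differ; hypothetical helper."""
    if model_num_features != featureset_num_features:
        warnings.warn(
            f"The FeatureSet has {featureset_num_features} features, but the "
            f"model was trained on {model_num_features} features."
        )
        return False
    return True
```

A warning, rather than an exception, lets prediction continue when the mismatch is expected, for example when a test set contains features never seen in training.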
Minor changes without issues
Updated docstrings all over the place to be more accurate.
Updated generate_predictions to use new Reader API.
Added optional argv argument to the main functions of all utility scripts to simplify testing.
Added mock tests, so SKLL now requires mock to work with Python 2.7.
Added prettier SVG badges to README.
Added link to Data Science at the Command Line to README.
LibSVMReader now converts UTF-8 replacement characters that are used by LibSVMWriter when a feature name contains an =, |, #, :, or back to the original ASCII characters.
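The optional argv argument follows a common argparse pattern: `parse_args(None)` falls back to `sys.argv[1:]`, so command-line behavior is unchanged, while tests can pass a list of strings directly instead of patching `sys.argv`. The script and its arguments below are hypothetical.

```python
import argparse


def main(argv=None):
    """Entry point for a hypothetical utility script.

    argv: list of argument strings. When None, argparse reads sys.argv[1:].
    Returns the parsed arguments so tests can inspect them.
    """
    parser = argparse.ArgumentParser(description="Hypothetical SKLL-style utility")
    parser.add_argument("infile", help="Input feature file")
    parser.add_argument("-q", "--quiet", action="store_true",
                        help="Suppress progress output")
    args = parser.parse_args(argv)
    return args
```

A test can then call `main(["features.jsonlines", "--quiet"])` without spawning a subprocess.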
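The LibSVMReader change makes the writer's character substitution reversible. Characters such as `:` (which separates a feature from its value in LibSVM format) and `#` (commonly used to start a comment) cannot appear verbatim in feature names. The writer swaps them for look-alike Unicode characters, and the reader now maps them back. The sketch below shows that round trip. The replacement characters are illustrative stand-ins and are not necessarily the ones LibSVMWriter actually uses.

```python
# Illustrative stand-ins only; not necessarily the characters SKLL uses.
TO_SAFE = {
    "=": "\uA78A",  # MODIFIER LETTER SHORT EQUALS SIGN
    "|": "\u2223",  # DIVIDES
    "#": "\uFF03",  # FULLWIDTH NUMBER SIGN
    ":": "\u2236",  # RATIO
}
FROM_SAFE = {safe: original for original, safe in TO_SAFE.items()}


def escape_name(name):
    """Writer side: replace reserved ASCII characters with look-alikes."""
    return name.translate(str.maketrans(TO_SAFE))


def unescape_name(name):
    """Reader side: restore the original ASCII characters."""
    return name.translate(str.maketrans(FROM_SAFE))
```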