Software Open Access

SciKit-Learn Laboratory (SKLL) 1.0.0

Dan Blanchard; Nitin Madnani; Michael Heilman; Nils Murrugarra Llerena; Diane M. Napolitano; Aoife Cahill; Keelan Evanini; Chee Wee Leong

The 1.0 release is finally here! It's been a little over a year since our first public release, and we're ready to say that SKLL is 1.0. Read our massive release notes:

We did make some API- and config-file-breaking changes. They are listed at the end of the release notes. They should all be addressable by a quick find-and-replace.

Bug fixes

  • Fixed path problems in iris example (issue #103, PR #171)
  • Fixed bug where ablated_features field was incorrect when config file contained multiple feature sets (issue #125)
  • Fixed bug where CV would crash with rare classes (issue #109, PR #165)
  • Fixed issue where warning about extremely large feature values was being issued before rescaling
  • Fixed issue where some warning messages used a mix of new-style and old-style replacement strings with old-style formatting
  • Fixed a number of bugs with filtering FeatureSet objects and writing filtered sets to files.
  • Fixed bug in FeatureSet.__sub__ where feature names were being passed instead of indices.
  • Fixed issue where MegaMWriter could not print numbers in Python 2.7.

New features

  • SKLL releases are now for specific versions of scikit-learn. 1.0.0 requires scikit-learn 0.15.2 (issue #138, PR #170)
  • Added tutorial to documentation that walks new users through using SKLL in much the same way as our PyData talks.
  • Added support for custom learners (issue #92, PR #183)
  • Added two command-line utilities, join_features and filter_features, for joining and filtering feature files. These replace join_megam and filter_megam (issue #79, PR #198)
  • Added support for specifying the field in ARFF, CSV, or TSV files that contains the IDs for each instance (issue #204, PR #206)
  • Added train/test set sizes to result files (issue #150, PR #161)
  • Added intercept to print_model_weights output (issue #155, PR #163)
  • Added total time and end time-stamp to experiment results (issue #91, PR #167)
  • Added exception when featureset_name is longer than 210 characters (issue #121, PR #168)
  • Added regression example data, boston (issue #162)
  • Added ability to specify number of grid search folds (issue #122, PR #175)
  • Added warning message when the number of features in the trained model differs from the number in the FeatureSet passed to Learner.predict() (issue #145)
  • Added conda.yaml file to repository to make conda package creation simpler (issue #159, PR #173)
  • Added loads more unit tests, greatly increased unit test coverage, and generally cleaned up test modules (issues #97, #148, #157, #188, and #202; PRs #176, #184, #196, #203, and #205)
  • Added train_file and test_file fields to config files, which can be used to specify single file feature sets. This greatly simplifies running simple experiments (issue #12, PR #197)
  • Added support for merging feature sets with IDs in different orders (issue #149, PR #177)
  • Added ValueError when invalid tuning objective is specified (issues #117 and #179; PRs #174 and #181)
  • Added shuffle option to config files to decide whether training data should be shuffled before training. By default this is False, but if grid_search is True, we will automatically shuffle. Previously, the default was True, and there was no option in the config files. (issue #189, PR #190)
  • Updated documentation to indicate that we're using StratifiedKFold (issue #160)
  • Added FeatureSet.__eq__ and FeatureSet.__getitem__ methods.
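
As a rough illustration of the train_file, test_file, and shuffle options described above, the sketch below builds a small config with Python's standard configparser module. Only train_file, test_file, shuffle, and grid_search are named in these notes; the section names, the other fields, the learner name, and the file paths are assumptions made for the example and should be checked against the SKLL documentation.

```python
import configparser
import io

# Disable interpolation so values are written exactly as given.
config = configparser.ConfigParser(interpolation=None)

# Section names and every field other than train_file, test_file,
# shuffle, and grid_search are assumptions for this sketch.
config["General"] = {
    "experiment_name": "single_file_example",  # hypothetical name
    "task": "evaluate",                        # assumed task value
}
config["Input"] = {
    "train_file": "train/features.jsonlines",  # hypothetical path
    "test_file": "test/features.jsonlines",    # hypothetical path
    "learners": '["LogisticRegression"]',      # assumed learner name
    "shuffle": "true",                         # default is now False
}
config["Tuning"] = {
    "grid_search": "false",
}

# Serialize to text; the same text could be written to a .cfg file.
buffer = io.StringIO()
config.write(buffer)
config_text = buffer.getvalue()
print(config_text)
```

Setting shuffle explicitly matters mainly when grid_search is off, since the notes above say shuffling is applied automatically when grid_search is True.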

Minor changes without issues

  • Updated docstrings all over the place to be more accurate.
  • Updated generate_predictions to use new Reader API.
  • Added argv optional argument to all utility script main functions to simplify testing.
  • Added mock tests, so SKLL now requires mock to work with Python 2.7.
  • Added prettier SVG badges to README.
  • Added link to Data Science at the Command Line to README.
  • LibSVMReader now converts the UTF-8 replacement characters that LibSVMWriter uses when a feature name contains an =, |, #, :, or space back to the original ASCII characters.

API breaking changes

  • FeatureSetWriter → Writer
  • load_examples(path) → Reader.for_path(path).read()
  • write_feature_file(...) → Writer.for_path(FeatureSet(...)).write()
  • FeatureSet.classes → FeatureSet.labels
  • All other instances of word "classes" changed to "labels" (#166)
  • FeatureSet.feat_vectorizer → FeatureSet.vectorizer
  • run_ablation(all_combos=True) → run_configuration(ablation=None)
  • run_ablation() → run_configuration(ablation=1)
  • ExamplesTuple → FeatureSet
  • Removed feature_hasher argument from all Learner methods, because it's unnecessary
  • Learner.model_type is now the actual type of the underlying model instead of just a string.
  • FeatureSet.__len__ now returns the number of examples instead of the number of features.
  • Removed skll.learner._REGRESSION_MODELS and now we check for regression by seeing if model is subclass of RegressorMixin.
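
The simple one-to-one renames in the list above lend themselves to the find-and-replace the release notes mention. The following is a minimal sketch of that idea using whole-word regular expression matching. The apply_renames helper and the sample snippet are hypothetical. Changes that alter the shape of a call, such as load_examples(path) or run_ablation(), are deliberately left out and would need manual edits.

```python
import re

# Old name -> new name, limited to the plain identifier renames
# listed in the release notes.
API_RENAMES = {
    "FeatureSetWriter": "Writer",
    "ExamplesTuple": "FeatureSet",
    "feat_vectorizer": "vectorizer",
}


def apply_renames(source, renames):
    """Replace whole-word occurrences of each old name with its new name."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in renames) + r")\b"
    )
    return pattern.sub(lambda match: renames[match.group(1)], source)


# Made-up snippet standing in for pre-1.0 user code.
old_code = (
    "# uses FeatureSetWriter and ExamplesTuple\n"
    "vec = examples.feat_vectorizer\n"
)
new_code = apply_renames(old_code, API_RENAMES)
print(new_code)
```

Whole-word matching keeps the substitution from touching longer identifiers that merely contain one of the old names, but the result should still be reviewed by hand.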

Config file breaking changes

  • Removed all short names for learners (PR #199)
  • Can no longer use classifiers instead of learners
  • train_location → train_directory
  • test_location → test_directory
  • cv_folds_location → cv_folds_file
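
The config field renames above can be applied mechanically. The sketch below is one possible way to do so with Python's standard configparser; the migrate_config helper, the [Input] section name, and the sample values are assumptions for illustration. It maps test_location to test_directory on the assumption that this is the intended counterpart of train_directory. It does not expand removed short learner names, and configparser drops comments from the original file.

```python
import configparser
import io

# Old field name -> new field name.
CONFIG_RENAMES = {
    "train_location": "train_directory",
    "test_location": "test_directory",  # assumed counterpart of train_directory
    "cv_folds_location": "cv_folds_file",
    "classifiers": "learners",
}


def migrate_config(old_text):
    """Return config text with pre-1.0 field names replaced by new ones."""
    old = configparser.ConfigParser(interpolation=None)
    old.read_string(old_text)

    new = configparser.ConfigParser(interpolation=None)
    for section in old.sections():
        # Copy every option, renaming the ones listed in CONFIG_RENAMES.
        new[section] = {
            CONFIG_RENAMES.get(key, key): value
            for key, value in old.items(section)
        }

    buffer = io.StringIO()
    new.write(buffer)
    return buffer.getvalue()


# Made-up pre-1.0 config fragment; the section name is an assumption.
old_text = (
    "[Input]\n"
    "train_location = train\n"
    "test_location = test\n"
    'classifiers = ["LogisticRegression"]\n'
)
migrated = migrate_config(old_text)
print(migrated)
```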

Files (179.3 kB)
Name: skll-1.0.0.zip
Size: 179.3 kB
md5: 9a5ba5d08cf37034a93365ba3dee3c1e