skll: SKLL 1.0.0

doi:10.5281/zenodo.12828

Published November 23, 2014 | Version v1.0.0

Software Open

1. Educational Testing Service
2. Bitdeli

The 1.0 release is finally here! It's been a little over a year since our first public release, and we're ready to say that SKLL is 1.0. Read our massive release notes:

We did make some API- and config-file-breaking changes. They are listed at the end of the release notes. They should all be addressable by a quick find-and-replace.

Bug fixes

Fixed path problems in iris example (issue #103, PR #171)
Fixed bug where ablated_features field was incorrect when config file contained multiple feature sets (issue #125)
Fixed bug where CV would crash with rare classes (issue #109, PR #165)
Fixed issue where warning about extremely large feature values was being issued before rescaling
Fixed issue where some warning messages mixed new-style and old-style replacement strings while using old-style formatting.
Fixed a number of bugs with filtering FeatureSet objects and writing filtered sets to files.
Fixed bug in FeatureSet.__sub__ where feature names were being passed instead of indices.
Fixed issue where MegaMWriter could not print numbers in Python 2.7.

New features

SKLL releases are now for specific versions of scikit-learn. 1.0.0 requires scikit-learn 0.15.2 (issue #138, PR #170)
Added tutorial to documentation that walks new users through using SKLL in much the same way as our PyData talks (issue #153).
Added support for custom learners (issue #92, PR #183)
Added two command-line utilities, join_features and filter_features, for joining and filtering feature files. These replace join_megam and filter_megam (issue #79, PR #198)
Added support for specifying the field in ARFF, CSV, or TSV files that contains the IDs for each instance (issue #204, PR #206)
Added train/test set sizes to result files (issue #150, PR #161)
Added intercept to print_model_weights output (issue #155, PR #163)
Added total time and end time-stamp to experiment results (issue #91, PR #167)
Added exception when featureset_name is longer than 210 characters (issue #121, PR #168)
Added regression example data, boston (issue #162)
Added ability to specify number of grid search folds (issue #122, PR #175)
Added warning message when the number of features in the trained model differs from the number in the FeatureSet passed to Learner.predict() (issue #145)
Added conda.yaml file to repository to make conda package creation simpler (issue #159, PR #173)
Added loads more unit tests, greatly increased unit test coverage, and generally cleaned up test modules (issues #97, #148, #157, #188, and #202; PRs #176, #184, #196, #203, and #205)
Added train_file and test_file fields to config files, which can be used to specify single file feature sets. This greatly simplifies running simple experiments (issue #12, PR #197)
Added support for merging feature sets with IDs in different orders (issue #149, PR #177)
Added ValueError when invalid tuning objective is specified (issues #117 and #179; PRs #174 and #181)
Added shuffle option to config files to decide whether training data should be shuffled before training. By default this is False, but if grid_search is True, we will automatically shuffle. Previously, the default was True, and there was no option in the config files. (issue #189, PR #190)
Updated documentation to indicate that we're using StratifiedKFold (issue #160)
Added FeatureSet.__eq__ and FeatureSet.__getitem__ methods.
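
The shuffle rule described above (off by default, but forced on when grid_search is True) can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical helper named effective_shuffle; it is not part of SKLL's API and only mirrors the rule as stated in these notes.

```python
def effective_shuffle(shuffle=False, grid_search=False):
    """Return True if training data would be shuffled under the 1.0.0 rule.

    Hypothetical helper for illustration only; it is not a SKLL function.
    """
    # Shuffling is off by default, but enabling grid search turns it on.
    return bool(shuffle or grid_search)


# The four possible combinations of the two settings.
for shuffle in (False, True):
    for grid_search in (False, True):
        print(shuffle, grid_search, effective_shuffle(shuffle, grid_search))
```

Experiments that relied on the old default of always shuffling would need to set shuffle explicitly in the config file to keep that behavior when grid search is off.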

Minor changes without issues

Overhauled and cleaned up all documentation. Look how pretty it is!
Updated docstrings all over the place to be more accurate.
Updated generate_predictions to use new Reader API.
Added argv optional argument to all utility script main functions to simplify testing.
Added mock tests, so SKLL now requires mock to work with Python 2.7.
Added prettier SVG badges to README.
Added link to Data Science at the Command Line to README.
LibSVMReader now converts UTF-8 replacement characters that are used by LibSVMWriter when a feature name contains an =, |, #, :, or space back to the original ASCII characters.

API breaking changes

FeatureSetWriter → Writer
load_examples(path) → Reader.for_path(path).read()
write_feature_file(...) → Writer.for_path(FeatureSet(...)).write()
FeatureSet.classes → FeatureSet.labels
All other instances of the word "classes" changed to "labels" (#166)
FeatureSet.feat_vectorizer → FeatureSet.vectorizer
run_ablation(all_combos=True) → run_configuration(ablation=None)
run_ablation() → run_configuration(ablation=1)
ExamplesTuple → FeatureSet
Removed feature_hasher argument to all Learner methods, because it's unnecessary
Learner.model_type is now the actual type of the underlying model instead of just a string.
FeatureSet.__len__ now returns the number of examples instead of the number of features.
Removed skll.learner._REGRESSION_MODELS and now we check for regression by seeing if model is subclass of RegressorMixin.
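
The last item above replaces a hard-coded list of regression models with a check on the class hierarchy. A minimal sketch of that pattern follows. The RegressorMixin class here is a local stand-in so the snippet runs without scikit-learn installed (the real one lives in sklearn.base), and ToyRegressor, ToyClassifier, and is_regressor are hypothetical names used only for illustration.

```python
class RegressorMixin:
    """Local stand-in for sklearn.base.RegressorMixin (assumption: used so
    this sketch runs without scikit-learn installed)."""


class ToyRegressor(RegressorMixin):
    """Hypothetical regressor used only for this illustration."""


class ToyClassifier:
    """Hypothetical classifier used only for this illustration."""


def is_regressor(model_type):
    """Detect regression via the class hierarchy, not a list of names."""
    return issubclass(model_type, RegressorMixin)


print(is_regressor(ToyRegressor))
print(is_regressor(ToyClassifier))
```

Because Learner.model_type is now an actual type rather than a string, a check of this kind can presumably be applied to it directly. It also covers the custom learners added in this release without anyone maintaining a list of names.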

Config file breaking changes

Removed all short names for learners (PR #199)
Can no longer use classifiers instead of learners
train_location → train_directory
test_location → test_directory
cv_folds_location → cv_folds_file
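
Since these renames are meant to be handled with a quick find-and-replace, the sketch below shows one way to apply them to the text of an existing config file. The names migrate_config_text and CONFIG_RENAMES are hypothetical, and the [Input] section and sample values are illustrative. test_location is mapped to test_directory, the name that parallels train_directory; treat that as an assumption to check against the SKLL documentation. Removed short learner names are not handled, because these notes do not list them.

```python
import re

# Old config field name -> new field name.
CONFIG_RENAMES = {
    "train_location": "train_directory",
    "test_location": "test_directory",  # assumed target name, see lead-in
    "cv_folds_location": "cv_folds_file",
    "classifiers": "learners",
}

# Match a field name only at the start of a line, followed by "=" or ":",
# so that values containing the same words are left untouched.
_FIELD_RE = re.compile(
    r"^(\s*)(" + "|".join(map(re.escape, CONFIG_RENAMES)) + r")(\s*[=:])",
    flags=re.MULTILINE,
)


def migrate_config_text(text):
    """Rename pre-1.0 field names in the text of a config file."""
    return _FIELD_RE.sub(
        lambda m: m.group(1) + CONFIG_RENAMES[m.group(2)] + m.group(3),
        text,
    )


# Illustrative pre-1.0 config fragment (section and values are assumptions).
old = """[Input]
train_location = train
test_location = test
classifiers = ["LogisticRegression"]
"""
print(migrate_config_text(old))
```

Anchoring the pattern to the start of a line keeps the substitution from rewriting paths or values that happen to contain one of the old field names.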

Files

skll-v1.0.0.zip (179.3 kB), md5:e2f6e66d6215f25e6c56a900b278857e

Additional details

Is supplement to: https://github.com/EducationalTestingService/skll/tree/v1.0.0 (URL)

