Published November 23, 2014
| Version v1.0.0
Software
Open
skll: SKLL 1.0.0
Creators
- 1. Educational Testing Service
- 2. Bitdeli
Description
The 1.0 release is finally here! It's been a little over a year since our first public release, and we're ready to say that SKLL is 1.0. Read our massive release notes:
We did make some API- and config-file-breaking changes. They are listed at the end of the release notes. They should all be addressable by a quick find-and-replace.
Bug fixes
- Fixed path problems in iris example (issue #103, PR #171)
- Fixed bug where ablated_features field was incorrect when config file contained multiple feature sets (issue #125)
- Fixed bug where CV would crash with rare classes (issue #109, PR #165)
- Fixed issue where warning about extremely large feature values was being issued before rescaling
- Fixed issue where some warning messages used mix of new-style and old-style replacement strings with old-style formatting.
- Fixed a number of bugs with filtering FeatureSet objects and writing filtered sets to files.
- Fixed bug in FeatureSet.__sub__ where feature names were being passed instead of indices.
- Fixed issue where MegaMWriter could not print numbers in Python 2.7.
- SKLL releases are now for specific versions of scikit-learn. 1.0.0 requires scikit-learn 0.15.2 (issue #138, PR #170)
- Added tutorial to documentation that walks new users through using SKLL in much the same way as our PyData talks (issue #153).
- Added support for custom learners (issue #92, PR #183)
- Added two command-line utilities, join_features and filter_features, for joining and filtering feature files. These replace join_megam and filter_megam (issue #79, PR #198)
- Added support for specifying the field in ARFF, CSV, or TSV files that contains the IDs for each instance (issue #204, PR #206)
- Added train/test set sizes to result files (issue #150, PR #161)
- Added intercept to print_model_weights output (issue #155, PR #163)
- Added total time and end time-stamp to experiment results (issue #91, PR #167)
- Added exception when featureset_name is longer than 210 characters (issue #121, PR #168)
- Added regression example data, boston (issue #162)
- Added ability to specify number of grid search folds (issue #122, PR #175)
- Added warning message when the number of features in the training model differs from the number in the FeatureSet passed to Learner.predict() (issue #145)
- Added conda.yaml file to repository to make conda package creation simpler (issue #159, PR #173)
- Added loads more unit tests, greatly increased unit test coverage, and generally cleaned up test modules (issues #97, #148, #157, #188, and #202; PRs #176, #184, #196, #203, and #205)
- Added train_file and test_file fields to config files, which can be used to specify single file feature sets. This greatly simplifies running simple experiments (issue #12, PR #197)
- Added support for merging feature sets with IDs in different orders (issue #149, PR #177)
- Added ValueError when invalid tuning objective is specified (issues #117 and #179; PRs #174 and #181)
- Added shuffle option to config files to decide whether training data should be shuffled before training. By default this is False, but if grid_search is True, we will automatically shuffle. Previously, the default was True, and there was no option in the config files. (issue #189, PR #190)
- Updated documentation to indicate that we're using StratifiedKFold (issue #160)
- Added FeatureSet.__eq__ and FeatureSet.__getitem__ methods.
- Overhauled and cleaned up all documentation. Look how pretty it is!
- Updated docstrings all over the place to be more accurate.
- Updated generate_predictions to use new Reader API.
- Added argv optional argument to all utility script main functions to simplify testing.
- Added mock tests, so SKLL now requires mock to work with Python 2.7.
- Added prettier SVG badges to README.
- Added link to Data Science at the Command Line to README.
- LibSVMReader now converts the UTF-8 replacement characters that LibSVMWriter uses when a feature name contains an =, |, #, :, or space back to the original ASCII characters.
API-breaking changes
- FeatureSetWriter → Writer
- load_examples(path) → Reader.for_path(path).read()
- write_feature_file(...) → Writer.for_path(FeatureSet(...)).write()
- FeatureSet.classes → FeatureSet.labels
- All other instances of word "classes" changed to "labels" (#166)
- FeatureSet.feat_vectorizer → FeatureSet.vectorizer
- run_ablation(all_combos=True) → run_configuration(ablation=None)
- run_ablation() → run_configuration(ablation=1)
- ExamplesTuple → FeatureSet
- Removed feature_hasher argument from all Learner methods, because it's unnecessary
- Learner.model_type is now the actual type of the underlying model instead of just a string.
- FeatureSet.__len__ now returns the number of examples instead of the number of features.
- Removed skll.learner._REGRESSION_MODELS; we now check for regression by testing whether the model is a subclass of RegressorMixin.
- Removed all short names for learners (PR #199)
Config-file-breaking changes
- Can no longer use classifiers instead of learners
- train_location → train_directory
- test_location → test_directory
- cv_folds_location → cv_folds_file
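Since the config-file renames listed above are meant to be addressable by a quick find-and-replace, here is a minimal Python sketch of one way to apply them to the text of an old config file. It is not part of SKLL: the function name migrate_config_text, the [Input] section name, and the sample values are illustrative assumptions. It also assumes that test_location maps to test_directory and that each option is written as "name = value" or "name: value" at the start of a line.

```python
import re

# Old option name -> new option name, taken from the list above.
# The mapping for test_location is an assumption (test_directory).
CONFIG_RENAMES = {
    "classifiers": "learners",
    "train_location": "train_directory",
    "test_location": "test_directory",
    "cv_folds_location": "cv_folds_file",
}


def migrate_config_text(text):
    """Return config text with pre-1.0 option names replaced by their 1.0 names."""
    for old, new in CONFIG_RENAMES.items():
        # Only match the option name when it starts a line (after optional
        # indentation) and is followed by "=" or ":", so comments and values
        # that merely mention the old name are left untouched.
        pattern = re.compile(
            r"^([ \t]*)" + re.escape(old) + r"([ \t]*[=:])", re.MULTILINE
        )
        text = pattern.sub(lambda m, new=new: m.group(1) + new + m.group(2), text)
    return text


# Hypothetical pre-1.0 config fragment, used only for illustration.
old_config = """[Input]
train_location = train
test_location = test
classifiers = ["LogisticRegression"]
"""

print(migrate_config_text(old_config))
```

The API renames (for example FeatureSetWriter to Writer) are not covered here, because a blind textual replace in Python source is more likely to touch unrelated identifiers; those are safer to review by hand.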
Files
Name | Size |
---|---|
skll-v1.0.0.zip (md5:e2f6e66d6215f25e6c56a900b278857e) | 179.3 kB |
Additional details
Related works
- Is supplement to
- https://github.com/EducationalTestingService/skll/tree/v1.0.0 (URL)