Published September 27, 2017 | Version v0.9
Software | Open access

rhiever/tpot: Sparse matrix support, early stopping, and checkpointing

Description

  • TPOT now supports sparse matrices through a new built-in configuration, "TPOT sparse". It uses a custom OneHotEncoder implementation that supports missing values and continuous features.

  • We have added an "early stopping" option that stops the optimization process if no improvement is made within a set number of generations. This behavior is controlled by the early_stop parameter.

  • TPOT now reduces the number of duplicated pipelines between generations, which saves time during the optimization process.

  • TPOT now supports custom scoring functions in command-line mode.

  • We have added a new optional argument, periodic_checkpoint_folder, that makes TPOT periodically save the best pipeline found so far to a local folder during the optimization process.

  • TPOT no longer uses sklearn.externals.joblib when n_jobs=1, which avoids a potential freezing issue that affects scikit-learn.

  • TPOT now depends on pandas and uses it, instead of numpy.recfromcsv, to read input datasets, because NumPy's recfromcsv function cannot parse datasets with complex data types.

  • Fixed a bug where a DEFAULT value in the parameter(s) of a nested estimator raised a KeyError when exporting pipelines.

  • Fixed a bug related to setting random_state in nested estimators. The issue occurred in pipelines containing SelectFromModel (with ExtraTreesClassifier as the nested estimator) or StackingEstimator whenever the nested estimator has a random_state parameter.

  • Fixed a bug in TPOT's missing-value imputation function so that it imputes along columns instead of rows.

  • Refined input checking for sparse matrices in TPOT.
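To illustrate what the early_stop option does, here is a minimal sketch of an "N generations without improvement" stopping rule. The function name stopping_generation and the list-of-scores input are hypothetical and chosen for illustration; this is not TPOT's internal code.

```python
# Hypothetical sketch of an early-stopping rule similar in spirit to the
# early_stop parameter; NOT TPOT's actual implementation.
from typing import Iterable, Optional


def stopping_generation(best_scores: Iterable[float], early_stop: int) -> Optional[int]:
    """Return the index of the generation at which optimization would stop,
    or None if it never goes `early_stop` consecutive generations without
    improvement. `best_scores` holds the best score found in each
    generation (higher is better)."""
    best_so_far = float("-inf")
    stalled = 0  # consecutive generations without improvement
    for generation, score in enumerate(best_scores):
        if score > best_so_far:
            best_so_far = score
            stalled = 0  # improvement resets the counter
        else:
            stalled += 1
            if stalled >= early_stop:
                return generation
    return None
```

With per-generation best scores of [0.80, 0.85, 0.85, 0.85, 0.90] and early_stop=2, the rule stops at generation 3, before the later improvement is ever seen. This is the trade-off early stopping makes in exchange for saved time.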
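Reducing duplicated pipelines between generations saves time because a pipeline that was already scored does not need to be cross-validated again. A minimal sketch of that idea caches scores keyed by a pipeline's string representation. The string keys and the evaluate_unique helper are assumptions for illustration, not TPOT's internals.

```python
# Hypothetical sketch: evaluate each distinct pipeline only once across
# generations by caching scores under a string key. Not TPOT's actual code.
from typing import Callable, Dict, List


def evaluate_unique(pipelines: List[str],
                    evaluate: Callable[[str], float],
                    cache: Dict[str, float]) -> Dict[str, float]:
    """Score each pipeline, calling the (expensive) `evaluate` function only
    for pipelines not already present in `cache`. The cache is updated in
    place so it can be reused by later generations."""
    for pipeline in pipelines:
        if pipeline not in cache:
            cache[pipeline] = evaluate(pipeline)
    return {p: cache[p] for p in pipelines}
```

Passing the same cache dictionary to every generation means a pipeline that reappears later (for example, after mutation happens to reproduce it) is looked up instead of re-evaluated.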
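The periodic_checkpoint_folder option writes the best pipeline found so far to disk while optimization is still running. The sketch below only shows the general shape of such a checkpoint. The save_checkpoint helper and its timestamped file-naming scheme are hypothetical and do not reflect the file names TPOT actually produces.

```python
# Hypothetical checkpoint helper; the file naming is an illustrative
# assumption, not TPOT's real scheme.
import time
from pathlib import Path


def save_checkpoint(folder: str, pipeline_code: str, score: float) -> Path:
    """Write the current best pipeline's exported code to a timestamped
    .py file inside `folder`, creating the folder if needed."""
    directory = Path(folder)
    directory.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d_%H%M%S")
    target = directory / f"pipeline_{stamp}_{score:.4f}.py"
    target.write_text(pipeline_code)  # overwrites if the name already exists
    return target
```

Including the score in the file name makes it easy to see which checkpoint is best without opening the files. Calling this after each generation gives a crash-safe record of progress.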
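The imputation fix concerns the axis along which missing values are filled. Imputing along columns means each missing entry is replaced with a statistic of its own feature, not of its sample's other features. The sketch below uses NumPy with a column median. The median strategy and the impute_column_medians name are assumptions for illustration.

```python
# Hypothetical sketch of column-wise (per-feature) imputation; the median
# strategy is an assumption, not necessarily what TPOT uses.
import numpy as np


def impute_column_medians(X) -> np.ndarray:
    """Return a copy of X with each NaN replaced by the median of its
    column, computed while ignoring NaNs."""
    X = np.array(X, dtype=float)          # copy so the caller's data is untouched
    medians = np.nanmedian(X, axis=0)     # one median per column (feature)
    rows, cols = np.where(np.isnan(X))    # positions of missing values
    X[rows, cols] = medians[cols]         # fill from the matching column
    return X
```

For X = [[1, nan], [3, 4], [nan, 8]], the result is [[1, 6], [3, 4], [2, 8]]. Using axis=1 instead would fill each gap from the other features in the same row, which mixes unrelated units and is the kind of error the fix addresses.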

Files

rhiever/tpot-v0.9.zip | 2.5 MB | md5:e76727b785d262f126adc346a29a5a5f
