Summary Report


This report presents the analysis for model_comparison: a comparison of the results of the rsmtool sample experiment, the rsmeval sample experiment, and the rsmtool sample experiment once again.

Thu Aug 18 16:32:28 2016

The report compares the following models:

ASAP2: A model that uses all features with a LinearRegression learner.

ASAP2_evaluation: Evaluation of the scores generated using rsmtool.

ASAP2: A model that uses all features with a LinearRegression learner.

Model

The table below shows the main model parameters for each experiment.

Model summary

model   N features   N negative   learner            train_label
ASAP2   4            0            LinearRegression   score
ASAP2   4            0            LinearRegression   score

Model fit

model   N responses   N features   R2      R2_adjusted
ASAP2   500           4            0.485   0.481
ASAP2   500           4            0.485   0.481
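The R2_adjusted column is consistent with the standard adjusted-R² formula, which penalizes R² for the number of features relative to the number of responses. A minimal sketch, assuming that standard formula (the exact computation inside rsmtool may differ in detail):

```python
# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1),
# where n is the number of responses and p the number of features.
def adjusted_r2(r2, n_responses, n_features):
    return 1 - (1 - r2) * (n_responses - 1) / (n_responses - n_features - 1)

# Values taken from the model-fit table above:
print(round(adjusted_r2(0.485, 500, 4), 3))  # 0.481
```

With 500 responses and only 4 features the penalty is small, which is why R2 and R2_adjusted differ by just 0.004 here.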

Evaluation results

Overall association statistics

The tables in this section show the standard association metrics between human scores and different types of machine scores. These results are computed on the evaluation set. The scores for each model have been truncated to [min − 0.4998, max + 0.4998]. Where indicated, scaled scores are computed by re-scaling the predicted scores using the mean and standard deviation of human scores as observed on the training data and the mean and standard deviation of machine scores as predicted for the training set.
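The scaling and truncation steps described above can be sketched as follows. This is an illustrative implementation under the stated assumptions, not rsmtool's internal code; the function name and example arrays are hypothetical:

```python
import numpy as np

def scale_and_truncate(raw_pred, train_pred, train_human, score_min, score_max):
    """Re-scale raw predictions using training-set machine and human
    means/SDs, then truncate to [score_min - 0.4998, score_max + 0.4998].
    Sketch of the procedure described in the text; hypothetical helper."""
    scaled = ((raw_pred - np.mean(train_pred)) / np.std(train_pred)
              * np.std(train_human) + np.mean(train_human))
    return np.clip(scaled, score_min - 0.4998, score_max + 0.4998)

# Hypothetical example: predictions outside the score range get truncated.
train_pred = np.array([1.0, 2.0, 3.0, 4.0])
train_human = np.array([1.0, 2.0, 3.0, 4.0])
print(scale_and_truncate(np.array([0.0, 2.0, 5.0]), train_pred, train_human, 1, 4))
```

Because the training machine and human scores are identical in this toy example, scaling is a no-op and only the truncation to [0.5002, 4.4998] takes effect.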

Descriptive holistic score statistics

The table shows distributional properties of human and system scores. SMD values lower than -0.15 or higher than 0.15 are highlighted.

model              N     system score type   h_mean   h_sd    sys_mean   sys_sd   SMD
ASAP2              200   scale               3.500    0.924   3.553      0.934    0.057
ASAP2_evaluation   200   scale               3.500    0.924   3.553      0.934    0.057
ASAP2              200   scale               3.500    0.924   3.553      0.934    0.057
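The SMD column is a standardized mean difference between system and human scores. A minimal sketch, assuming the mean difference is divided by the human-score SD (a pooled SD, sqrt((h_sd² + sys_sd²)/2), is another common choice and yields the same value to three decimals for the numbers in this table):

```python
def smd(h_mean, h_sd, sys_mean, sys_sd):
    """Standardized mean difference of system vs. human scores.
    Assumption: standardizing by the human-score SD; rsmtool's exact
    choice of denominator may differ."""
    return (sys_mean - h_mean) / h_sd

# Values from the descriptive-statistics table above:
print(round(smd(3.500, 0.924, 3.553, 0.934), 3))  # 0.057
```

The value 0.057 is well inside the highlighted band of ±0.15, indicating no meaningful mean shift between system and human scores.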

Association statistics

The table shows the standard association metrics between human scores and machine scores. Note that some evaluations are based on rounded (Trim-round) scores computed by first truncating and then rounding the predicted score.

model              N     system score type   corr    R2      RMSE    wtkappa (rounded)   kappa (rounded)   exact_agr (rounded)   adj_agr (rounded)
ASAP2              200   scale               0.798   0.589   0.591   0.814               0.494             64.500                100
ASAP2_evaluation   200   scale               0.798   0.589   0.591   0.814               0.494             64.500                100
ASAP2              200   scale               0.798   0.589   0.591   0.814               0.494             64.500                100
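The trim-round procedure behind the "(rounded)" columns, and the exact/adjacent agreement computed on its output, can be sketched as follows. The function name and example data are hypothetical; this illustrates the procedure described above rather than rsmtool's internal code:

```python
import numpy as np

def trim_round_agreement(human, predicted, score_min, score_max):
    """Exact and adjacent agreement (in percent) on trim-rounded scores:
    predictions are first truncated to [min - 0.4998, max + 0.4998]
    and then rounded to the nearest integer, as described in the text."""
    trimmed = np.clip(predicted, score_min - 0.4998, score_max + 0.4998)
    rounded = np.rint(trimmed).astype(int)
    exact = np.mean(rounded == human) * 100
    adjacent = np.mean(np.abs(rounded - human) <= 1) * 100
    return exact, adjacent

# Hypothetical example on a 1-6 score scale:
human = np.array([3, 4, 2, 5])
pred = np.array([3.2, 4.6, 2.4, 6.9])
print(trim_round_agreement(human, pred, 1, 6))  # (50.0, 100.0)
```

Note that the out-of-range prediction 6.9 is first pulled back to 6.4998 by truncation, so it rounds to 6 rather than 7.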

System information

This report was generated using rsmtool v5.1 on a 64-bit computer running Linux.

Python packages

alabaster==0.7.8
alignment==1.0.9
almisc==2.0
babel==2.3.3
backports-abc==0.4
beautifulsoup4==4.3.2
cycler==0.10.0
decorator==4.0.9
docutils==0.12
imagesize==0.7.1
ipykernel==4.3.1
ipython-genutils==0.1.0
ipython==4.1.2
ipywidgets==4.1.1
jinja2==2.8
joblib==0.9.4
jsonschema==2.4.0
jupyter-client==4.2.2
jupyter-console==4.1.1
jupyter-core==4.1.0
jupyter==1.0.0
markupsafe==0.23
matplotlib==1.5.1
mistune==0.7.2
nbconvert==4.1.0
nbformat==4.0.1
nose==1.3.7
notebook==4.1.0
numpy==1.10.4
pandas==0.18.0
path.py==0.0.0
patsy==0.4.1
pexpect==4.0.1
pickleshare==0.5
pip==8.1.1
prettytable==0.7.2
ptyprocess==0.5
pyflakes==1.2.3
pygments==2.1.1
pyparsing==2.0.3
python-dateutil==2.5.2
pytz==2016.3
pyyaml==3.11
pyzmq==15.2.0
qtconsole==4.2.1
rsmextra==1.0.0
rsmtool==5.1
scikit-learn==0.17.1
scipy==0.17.0
seaborn==0.7.0
setuptools==20.3
simplegeneric==0.8.1
six==1.10.0
skll==1.2
snowballstemmer==1.2.1
sphinx-rtd-theme==0.1.9
sphinx==1.4.1
srparse==1.0
statsmodels==0.6.1
terminado==0.5
tornado==4.3
traitlets==4.2.1
wheel==0.29.0