Published October 26, 2019 | Version v1
Working paper | Open Access

Testing prediction algorithms as null hypotheses: Application to assessing the performance of deep neural networks

Description

Bayesian models use posterior predictive distributions to quantify the uncertainty of their predictions. Similarly, the point predictions of neural networks and other machine learning algorithms may be converted to predictive distributions by various bootstrap methods. The predictive performance of each algorithm can then be assessed by quantifying the performance of its predictive distribution. Previous methods for assessing such performance are relative, indicating only whether certain algorithms perform better than others. This paper proposes performance measures that are absolute in the sense that they indicate whether an algorithm performs adequately, without requiring comparisons to other algorithms. The first proposed performance measure is a predictive p value that generalizes a prior predictive p value, with the prior distribution set equal to the posterior distribution given previous data. The other proposed performance measures use the generalized predictive p value of each prediction to estimate the proportion of target values that are compatible with the predictive distribution. The new performance measures are illustrated by using them to evaluate the predictive performance of deep neural networks applied to a large housing-price data set that serves as a standard benchmark in machine learning.

Files

assessment-preprint191026.pdf (432.1 kB)
md5:319bd8a9a81dd881dd7e3f698a457838