A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection
Description
We review accuracy estimation methods and compare the two most common methods: cross-validation and bootstrap. Recent experimental results on artificial data and theoretical results in restricted settings have shown that for selecting a good classifier from a set of classifiers (model selection), ten-fold cross-validation may be better than the more expensive leave-one-out cross-validation. We report on a large-scale experiment (over half a million runs of C4.5 and a Naive-Bayes algorithm) to estimate, on real-world datasets, the effects of different parameters on these algorithms. For cross-validation, we vary the number of folds and whether the folds are stratified or not; for bootstrap, we vary the number of bootstrap samples. Our results indicate that for real-world datasets similar to ours, the best method to use for model selection is ten-fold stratified cross-validation, even if computation power allows using more folds.
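To make the two estimators concrete, here is a minimal sketch in plain Python. It assumes a hypothetical `learner(X_train, y_train)` callable that returns a predict function; C4.5 and Naive-Bayes are not implemented here. Stratified k-fold cross-validation deals each class's instances round-robin across the folds so every fold keeps roughly the class proportions of the full dataset, and the accuracy estimate is the total number of correct test predictions divided by the number of instances. For bootstrap, the sketch uses the .632 bootstrap, one common formulation that mixes the accuracy on instances left out of each bootstrap sample with the resubstitution accuracy. It is an illustration under these assumptions, not the study's own code.

```python
import random
from collections import Counter, defaultdict

# Hypothetical interface: learner(X_train, y_train) -> predict(x) -> label.


def stratified_folds(y, k, seed=0):
    """Split indices 0..n-1 into k folds that roughly preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    pos = 0
    for idx in by_class.values():
        rng.shuffle(idx)
        for i in idx:
            folds[pos % k].append(i)  # deal instances round-robin across folds
            pos += 1
    return folds


def cv_accuracy(X, y, learner, k=10, seed=0):
    """Stratified k-fold CV: total correct test predictions divided by n."""
    correct = 0
    for fold in stratified_folds(y, k, seed):
        test = set(fold)
        train = [i for i in range(len(y)) if i not in test]
        predict = learner([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(X[i]) == y[i] for i in fold)
    return correct / len(y)


def bootstrap_632_accuracy(X, y, learner, b=50, seed=0):
    """.632 bootstrap: average of 0.632 * eps0_i + 0.368 * acc_s over b samples."""
    rng = random.Random(seed)
    n = len(y)
    full = learner(X, y)
    acc_s = sum(full(X[i]) == y[i] for i in range(n)) / n  # resubstitution accuracy
    total, used = 0.0, 0
    for _ in range(b):
        sample = [rng.randrange(n) for _ in range(n)]  # draw n instances with replacement
        in_sample = set(sample)
        out = [i for i in range(n) if i not in in_sample]
        if not out:
            continue  # no held-out instances for this sample; skip it
        predict = learner([X[i] for i in sample], [y[i] for i in sample])
        eps0 = sum(predict(X[i]) == y[i] for i in out) / len(out)
        total += 0.632 * eps0 + 0.368 * acc_s
        used += 1
    if used == 0:
        raise ValueError("every bootstrap sample contained all instances")
    return total / used


def majority_learner(X_train, y_train):
    """Hypothetical baseline learner: always predicts the most frequent training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return lambda x: label
```

For example, with `y = ['a'] * 15 + ['b'] * 5`, every training set in 10-fold stratified cross-validation has a majority of `'a'`, so `cv_accuracy(list(range(20)), y, majority_learner)` returns 0.75. Passing the fold count and the number of bootstrap samples as parameters mirrors the two knobs the study varies.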
Files
A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection.pdf
Size: 216.4 kB; md5: 33852bd6d7e7b3ec6863b94629b681b7