To get an unbiased estimate of the performance on new data, it is generally not enough to simply use repeated cross-validations for a given set of hyperparameters and methods (see tuning). The same data would then be used both to select the hyperparameters and to evaluate them, which might produce an overly optimistic result.
A better (although more time-consuming) approach is to nest two resampling methods. To keep the explanation simple, let's use cross-validation for both, which in this case is also called "double cross-validation". In the so-called "outer" cross-validation the data is split repeatedly into a (larger) training set and a (smaller) test set in the usual way. In every outer iteration the learner is then tuned on the training set by performing an "inner" cross-validation. The best hyperparameters found are selected, the learner is fitted with them to the complete "outer" training set, and the resulting model is evaluated on the (outer) test set. This results in much more reliable estimates of the true performance distribution of the learner on unseen data. These estimates can then be used to estimate location parameters (e.g. the mean or median performance value) and to compare learning methods in a fair way.
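The procedure described above can be sketched in plain base R, independent of mlr. This is only an illustration under stated assumptions: the helpers `knn_predict` and `cv_error` are hypothetical names written for this sketch, the learner is a small hand-written k-nearest-neighbor classifier, and the grid of `k` values is arbitrary.

```r
# Nested ("double") cross-validation sketch in base R.
# knn_predict and cv_error are hypothetical helpers, not part of mlr.
set.seed(1)

# Majority-vote k-nearest-neighbor prediction on numeric matrices.
knn_predict <- function(train_x, train_y, test_x, k) {
  apply(test_x, 1, function(row) {
    d <- colSums((t(train_x) - row)^2)            # squared distances to all training rows
    votes <- table(train_y[order(d)[seq_len(k)]]) # class counts among the k nearest
    names(votes)[which.max(votes)]
  })
}

# Mean misclassification error of kNN over the given fold assignment.
cv_error <- function(x, y, folds, k) {
  mean(sapply(sort(unique(folds)), function(f) {
    test <- folds == f
    pred <- knn_predict(x[!test, , drop = FALSE], y[!test],
                        x[test, , drop = FALSE], k)
    mean(pred != as.character(y[test]))
  }))
}

x <- as.matrix(iris[, 1:4])
y <- iris$Species
k_grid <- c(1, 3, 5, 7)                           # arbitrary tuning grid (assumption)

outer_folds <- sample(rep(1:5, length.out = nrow(x)))
best_k <- tune_perf <- test_perf <- numeric(5)

for (f in 1:5) {
  test <- outer_folds == f
  x_train <- x[!test, , drop = FALSE]
  y_train <- y[!test]

  # Inner cross-validation: uses the outer training set only.
  inner_folds <- sample(rep(1:3, length.out = nrow(x_train)))
  inner_err <- sapply(k_grid, function(k) cv_error(x_train, y_train, inner_folds, k))

  best_k[f] <- k_grid[which.min(inner_err)]
  tune_perf[f] <- min(inner_err)

  # Use the complete outer training set with the selected k
  # and evaluate on the untouched outer test set.
  pred <- knn_predict(x_train, y_train, x[test, , drop = FALSE], best_k[f])
  test_perf[f] <- mean(pred != as.character(y[test]))
}

results <- data.frame(k = best_k, tune.perf = tune_perf, test.perf = test_perf)
print(results)
print(mean(results$test.perf))                    # estimate of performance on unseen data
```

The key point of the design is that the outer test set never takes part in choosing `k`; it is only touched once per outer iteration, after tuning has finished.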
Using mlr, setting up such an experiment becomes very easy:
# Classification task with iris data set
ct <- make.classif.task(data = iris, target = "Species")

# Very small grid for svm hyperparameters
r <- list(C = 2^seq(-1,1), sigma = 2^seq(-1,1))

# Define "inner" cross-validation indices
inner.res <- make.res.desc("cv", iters = 3)

# Tune an SVM
svm.tuner <- make.tune.wrapper("kernlab.svm.classif", method = "grid", resampling = inner.res, control = grid.control(ranges=r))

# Three learners to be compared
learners <- c("lda", "qda", svm.tuner)

# Define "outer" cross-validation indices
res <- make.res.desc("cv", iters = 5)

# Merge it to a benchmark experiment
result <- bench.exp(learners, ct, res)

Benchmark result
                mean         sd
LDA       0.02000000 0.02981424
qda       0.02000000 0.02981424
tuned-svm 0.05333333 0.03800585
The above code should be largely self-explanatory. In the result, every row corresponds to one learner. The entries show the mean test error and its standard deviation across the outer resampling iterations.
The benchmark result contains much more information, which you can access if you want to see the details. Let's have a look at the benchmark result from the example above:
# Access further information

# The single performances of the outer crossvalidation
result["perf"]

         LDA        qda  tuned-svm
1 0.03333333 0.10000000 0.06666667
2 0.00000000 0.00000000 0.00000000
3 0.00000000 0.00000000 0.06666667
4 0.06666667 0.06666667 0.10000000
5 0.00000000 0.00000000 0.03333333

# A list of the tuned parameters with tune- and test-performance
result["tuned.pars"]

[[1]]$LDA
[1] NA

[[1]]$qda
[1] NA

[[1]]$`tuned-svm`
    C sigma  tune.perf  test.perf
1 1.0   0.5 0.02500000 0.06666667
2 2.0   0.5 0.03333333 0.00000000
3 1.0   0.5 0.04166667 0.06666667
4 0.5   0.5 0.01666667 0.10000000
5 0.5   0.5 0.04166667 0.03333333

# Confusion matrices - one for each learner
result["conf.mats"]

[[1]]$LDA
            predicted
true         setosa versicolor virginica -SUM-
  setosa         50          0         0     0
  versicolor      0         48         2     2
  virginica       0          1        49     1
  -SUM-           0          1         2     3

[[1]]$qda
            predicted
true         setosa versicolor virginica -SUM-
  setosa         50          0         0     0
  versicolor      0         46         4     4
  virginica       0          1        49     1
  -SUM-           0          1         4     5

[[1]]$`tuned-svm`
            predicted
true         setosa versicolor virginica -SUM-
  setosa         50          0         0     0
  versicolor      0         46         4     4
  virginica       0          4        46     4
  -SUM-           0          4         4     8
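The connection between the per-iteration performances and the summary table can be checked by hand. The following base-R sketch copies the LDA and tuned-svm columns from the output above and computes their mean and sample standard deviation (the `sd` function, which divides by n - 1); under that assumption the values should agree with the corresponding rows of the benchmark summary up to rounding. The object names `perf` and `summary_tab` are chosen only for this illustration.

```r
# Per-iteration test errors copied from the "perf" output above.
perf <- data.frame(
  LDA         = c(0.03333333, 0, 0, 0.06666667, 0),
  `tuned-svm` = c(0.06666667, 0, 0.06666667, 0.10000000, 0.03333333),
  check.names = FALSE   # keep the hyphen in the column name
)

# Mean and sample standard deviation per learner, one row per learner.
summary_tab <- cbind(mean = sapply(perf, mean), sd = sapply(perf, sd))
print(summary_tab)
```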
Of course everything works the same way if you exchange the resampling strategy either in the outer or inner run.
They can be freely mixed.
The following example uses an outer bootstrap and an inner cross-validation; the learner is k-nearest-neighbor.
# Classification task with iris data set
ct <- make.classif.task(data = iris, target = "Species")

# Range of hyperparameter k
r <- list(k = 1:5)

# Define "inner" cross-validation indices
inner.res <- make.res.desc("cv", iters = 3)

# Tune a k-nearest-neighbor learner
knn.tuner <- make.tune.wrapper("kknn.classif", method = "grid", resampling = inner.res, control = grid.control(ranges=r))

# Define "outer" bootstrap indices
res <- make.res.desc("bs", iters = 5)

# Merge it to a benchmark experiment
result <- bench.exp(knn.tuner, ct, res)

Benchmark result
           mean         sd
[1,] 0.05747409 0.02422707

# Which performances did we get in the single runs?
result["perf"]

   tuned-knn
1 0.07272727
2 0.08000000
3 0.05263158
4 0.06382979
5 0.01818182

# Which parameters belong to the performances?
result["tuned.pars"]

$`tuned-knn`
  k   tune.perf  test.perf
1 2 0.013333333 0.07272727
2 4 0.006666667 0.08000000
3 1 0.026666667 0.05263158
4 1 0.013333333 0.06382979
5 5 0.026666667 0.01818182

# What does the confusion matrix look like?
result["conf.mats"]

$`tuned-knn`
            predicted
true         setosa versicolor virginica -SUM-
  setosa         87          1         0     1
  versicolor      0         89         5     5
  virginica       0          9        73     9
  -SUM-           0         10         5    15
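The confusion matrix above contains 264 test predictions over five iterations rather than a multiple of 150, and the single test errors are consistent with test sets of different sizes (for example 4/55 and 4/50). This is what one would expect if each bootstrap iteration trains on a sample drawn with replacement and tests on the observations that were not drawn. The sketch below shows such a split in base R; it is an assumption about how the outer bootstrap works, not a description of mlr internals, and the name `outer` is chosen only for this illustration.

```r
# Outer bootstrap splits with out-of-bag test sets (illustrative assumption).
set.seed(123)
n <- nrow(iris)
iters <- 5

outer <- lapply(seq_len(iters), function(i) {
  train <- sample(n, size = n, replace = TRUE)       # bootstrap sample, duplicates allowed
  list(train = train,
       test  = setdiff(seq_len(n), train))           # observations never drawn
})

# Test set sizes vary from iteration to iteration.
test_sizes <- sapply(outer, function(s) length(s$test))
print(test_sizes)
```

An inner cross-validation for tuning would then be run on `iris[s$train, ]` only, exactly as in the cross-validation case.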
Adding another learner to an existing benchmark experiment is easy in mlr.
The big advantage is that the new learner is evaluated on the same resampling splits as the other learners, so the results remain paired.
Let's take Example 1 and add another learner - Naive Bayes.
new.result <- bench.add(learner = "naiveBayes", task = ct, result = result)
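Why reusing the same splits matters can be illustrated without mlr. In the base-R sketch below, the fold assignment is drawn once and then reused for two simple hand-written classifiers, so their errors can be compared iteration by iteration. The functions `centroid_predict`, `nn1_predict` and `fold_error` are hypothetical helpers for this illustration and say nothing about how `bench.add` is implemented.

```r
# Paired comparison: both learners see exactly the same train/test splits.
set.seed(42)
x <- as.matrix(iris[, 1:4])
y <- iris$Species
folds <- sample(rep(1:5, length.out = nrow(x)))   # drawn once, reused for every learner

# Learner A: assign each observation to the class with the nearest centroid.
centroid_predict <- function(train_x, train_y, test_x) {
  centers <- apply(train_x, 2, function(col) tapply(col, train_y, mean))  # classes x features
  apply(test_x, 1, function(row) {
    d <- rowSums(sweep(centers, 2, row)^2)
    rownames(centers)[which.min(d)]
  })
}

# Learner B: 1-nearest-neighbor.
nn1_predict <- function(train_x, train_y, test_x) {
  apply(test_x, 1, function(row) {
    d <- colSums((t(train_x) - row)^2)
    as.character(train_y[which.min(d)])
  })
}

# Test error of one learner on each of the fixed folds.
fold_error <- function(predict_fun) {
  sapply(1:5, function(f) {
    test <- folds == f
    pred <- predict_fun(x[!test, , drop = FALSE], y[!test], x[test, , drop = FALSE])
    mean(pred != as.character(y[test]))
  })
}

perf <- data.frame(centroid = fold_error(centroid_predict),
                   nn1      = fold_error(nn1_predict))
perf$diff <- perf$centroid - perf$nn1             # paired differences per fold
print(perf)
```

Because both columns come from identical splits, the per-fold differences are not affected by differences between splits, which is what makes a later-added learner directly comparable to the earlier ones.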