Tuning of hyperparameters

Many classification and regression algorithms have a set of hyperparameters that must either be set by the user or selected through resampling, e.g. cross-validation. Setting them by hand was already covered in the section about training and resampling: simply use the "parset" argument of the "train" or "resample.fit" methods.

Assuming you have understood how resampling works, it is quite simple to implement a grid search, which is one of the standard (albeit slow) ways to choose an appropriate set of parameters from a given range of values.

Classification example

We again use the iris data set and now want to tune an SVM with a Gaussian (RBF) kernel, i.e. its cost parameter C and its kernel width sigma.

	# Classification task
	ct <- make.classif.task(data = iris, target = "Species")

	# Range of the values
	r <- list(C = 2^(-1:1), sigma = 2^(-1:1))

	# Evaluation with 3-fold cross-validation
	res <- make.res.desc("cv", iters = 3)

	# Tune SVM
	tune("kernlab.svm.classif", ct, res, control = grid.control(ranges=r))
	
	$par
	$par$C
	[1] 0.5

	$par$sigma
	[1] 0.5


	$perf
	[1] 0.04

	$all.perfs
	    C sigma       mean         sd time
	1 0.5   0.5 0.04000000 0.02000000 0.25
	2   1   0.5 0.04666667 0.05033223 0.26
	3   2   0.5 0.05333333 0.01154701 0.25
	4 0.5     1 0.06666667 0.04618802 0.27
	5   1     1 0.06666667 0.04618802 0.25
	6   2     1 0.05333333 0.02309401 0.25
	7 0.5     2 0.06666667 0.03055050 0.27
	8   1     2 0.06666667 0.03055050 0.26
	9   2     2 0.06666667 0.04618802 0.25
	

Let's take a look at the above code.

The parameter grid has to be a named list, where every entry has to be named according to the corresponding parameter of the underlying R function (in this case "ksvm" from the kernlab package, see the help page of "kernlab.svm.classif"). Its value is a vector of feasible values for this hyperparameter. The complete grid is just the cross-product of all feasible values.
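
The cross-product can be inspected directly in base R. The following minimal sketch uses "expand.grid" (a base R function, not part of mlr) on the ranges from the example above. It only illustrates which combinations a grid search has to visit; that mlr builds its grid in exactly this way is an assumption.

```r
# Ranges from the example above
r <- list(C = 2^(-1:1), sigma = 2^(-1:1))

# expand.grid builds one row per combination of the listed values
# (illustration only; mlr's internal grid construction may differ)
grid <- expand.grid(r)

print(nrow(grid))  # 3 values of C times 3 values of sigma
print(grid)
```

Every row of "grid" is one candidate parameter setting, so the cost of a grid search grows multiplicatively with each additional hyperparameter.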

(Please note that ksvm is a somewhat special case, as its kernel parameters are normally passed through the "kernel" and "kpar" arguments. To make this simpler, t.svm allows passing them directly. Again, see the documentation.)

"tune" then simply performs the cross-validation for every element of the cross-product and selects the one with the best mean performance measure (here, the lowest mean error).
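
To make this procedure concrete, here is a hand-written sketch of a grid search with 3-fold cross-validation, calling "ksvm" from kernlab directly. It is not mlr's implementation. The fold assignment, the seed, the helper name "cv_error" and the use of the misclassification rate as performance measure are all assumptions made for illustration. Note how sigma has to be wrapped in "kpar" when "ksvm" is called without the mlr wrapper.

```r
library(kernlab)

set.seed(1)  # arbitrary seed, only for reproducible folds

# All combinations of the candidate values
grid <- expand.grid(C = 2^(-1:1), sigma = 2^(-1:1))

# Assign every observation to one of 3 folds; the same folds are
# reused for every grid point so that the settings are comparable
n_folds <- 3
folds <- sample(rep(seq_len(n_folds), length.out = nrow(iris)))

# Hypothetical helper: mean and sd of the misclassification rate over folds
cv_error <- function(C, sigma) {
  errs <- sapply(seq_len(n_folds), function(i) {
    train <- iris[folds != i, ]
    test <- iris[folds == i, ]
    model <- ksvm(Species ~ ., data = train, kernel = "rbfdot",
                  kpar = list(sigma = sigma), C = C)
    mean(predict(model, test) != test$Species)
  })
  c(mean = mean(errs), sd = sd(errs))
}

# Evaluate every grid point and pick the one with the lowest mean error
perfs <- t(mapply(cv_error, grid$C, grid$sigma))
result <- cbind(grid, perfs)
best <- result[which.min(result$mean), ]

print(result)
print(best)
```

The numbers will not match the output above exactly, since the folds are drawn differently.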


SVMs are another special case with regard to tuning: when using different kernels with different kernel parameters, one generally does not want to optimize over the complete cross-product. mlr therefore allows "ranges" to be a list of ranges.
Let's tune SVMs with polynomial and Gaussian kernels on iris:

	# Classification task
	ct <- make.classif.task(data = iris, target = "Species")

	# Different kernels with different kernel parameters 
	r1 <- list(C = c(0.5, 1, 2), kernel = "polydot", degree = 1:3)
	r2 <- list(C = c(0.5, 1, 2), kernel = "rbfdot", sigma = c(0.1, 0.2, 0.3))

	# Evaluation with 5-fold cross-validation
	res <- make.res.desc("cv", iters = 5)
	
	# Combine grids
	ranges <- list(r1, r2)

	# Tune SVMs
	tune("kernlab.svm.classif", ct, res, control = grid.control(ranges = ranges))

	
	$best.parameters
	$best.parameters$C
	[1] 0.5

	$best.parameters$kernel
	[1] "polydot"

	$best.parameters$degree
	[1] 1

	$best.parameters$sigma
	[1] NA


	$best.performance
	[1] 0.03333333

	$best.sd
	[1] 0

	$performances
	     C  kernel degree sigma       mean         sd
	1  0.5 polydot      1    NA 0.03333333 0.00000000
	2  1.0 polydot      1    NA 0.04000000 0.02788867
	3  2.0 polydot      1    NA 0.04666667 0.01825742
	4  0.5 polydot      2    NA 0.03333333 0.00000000
	5  1.0 polydot      2    NA 0.03333333 0.02357023
	6  2.0 polydot      2    NA 0.04000000 0.02788867
	7  0.5 polydot      3    NA 0.04000000 0.02788867
	8  1.0 polydot      3    NA 0.04000000 0.02788867
	9  2.0 polydot      3    NA 0.04666667 0.01825742
	10 0.5  rbfdot     NA   0.1 0.04000000 0.03651484
	11 1.0  rbfdot     NA   0.1 0.03333333 0.02357023
	12 2.0  rbfdot     NA   0.1 0.03333333 0.02357023
	13 0.5  rbfdot     NA   0.2 0.03333333 0.04082483
	14 1.0  rbfdot     NA   0.2 0.04000000 0.02788867
	15 2.0  rbfdot     NA   0.2 0.03333333 0.02357023
	16 0.5  rbfdot     NA   0.3 0.04000000 0.02788867
	17 1.0  rbfdot     NA   0.3 0.04000000 0.02788867
	18 2.0  rbfdot     NA   0.3 0.04000000 0.02788867
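
The effect of passing a list of ranges can be reproduced in base R. The sketch below expands each kernel's grid separately and stacks the results, filling parameters that do not apply with NA. This mirrors the layout of the performance table above; that mlr combines the grids in this way internally is an assumption.

```r
# Per-kernel grids from the example above
r1 <- list(C = c(0.5, 1, 2), kernel = "polydot", degree = 1:3)
r2 <- list(C = c(0.5, 1, 2), kernel = "rbfdot", sigma = c(0.1, 0.2, 0.3))

# Expand each grid separately, so degree is never crossed with sigma
g1 <- expand.grid(r1, stringsAsFactors = FALSE)
g2 <- expand.grid(r2, stringsAsFactors = FALSE)

# Fill parameters that do not apply to a kernel with NA, then stack the grids
g1$sigma <- NA_real_
g2$degree <- NA_integer_
cols <- c("C", "kernel", "degree", "sigma")
combined <- rbind(g1[, cols], g2[, cols])

# For comparison: the full cross-product over all four parameters
full <- expand.grid(C = c(0.5, 1, 2), kernel = c("polydot", "rbfdot"),
                    degree = 1:3, sigma = c(0.1, 0.2, 0.3))

print(nrow(combined))  # 9 + 9 candidate settings
print(nrow(full))      # 3 * 2 * 3 * 3 settings, many of them redundant
```

The full cross-product would evaluate many settings that differ only in a parameter the chosen kernel ignores, which is why the per-kernel grids are kept apart.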
	

Regression example

Let's tune a k-nearest-neighbor regression from the kknn package on the BostonHousing data set.

	# Regression task (BostonHousing is shipped with the mlbench package)
	library(mlbench)
	data(BostonHousing)
	rt <- make.regr.task(data = BostonHousing, formula = medv ~ .)

	# Candidate values for k
	r <- list(k = 1:7)

	# Evaluate with 5-fold cross-validation
	res <- make.res.desc("cv", iters = 5)
	
	# Tune k-nearest-neighbor regression with squared loss
	tune("kknn.regr", rt, res, control = grid.control(ranges = r), loss = "squared")

	
	$par
	$par$k
	[1] 5


	$perf
	[1] 16.01436

	$all.perfs
	  k     mean       sd time
	1 1 21.14526 5.111700 0.50
	2 2 21.25564 8.288623 0.48
	3 3 16.13007 8.476711 0.50
	4 4 16.41579 5.511934 0.50
	5 5 16.01436 4.695017 0.50
	6 6 17.24456 2.548165 0.50
	7 7 17.40761 1.587839 0.50
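
The same idea can be sketched for regression without any packages. The example below tunes k for a plain, unweighted k-nearest-neighbor regression written by hand and evaluates it by mean squared error in a 5-fold cross-validation. It is a simplified stand-in, not the kknn algorithm (which uses kernel-weighted neighbors), and it uses the built-in mtcars data instead of BostonHousing so that it runs in base R. The helper names "knn_predict" and "cv_mse" are hypothetical.

```r
set.seed(1)  # arbitrary seed, only for reproducible folds

# Standardized features and target (mpg) from the built-in mtcars data
x <- scale(as.matrix(mtcars[, -1]))
y <- mtcars$mpg

# Hypothetical helper: predict by averaging the targets of the k nearest
# training points under Euclidean distance
knn_predict <- function(train_x, train_y, test_x, k) {
  apply(test_x, 1, function(row) {
    d <- sqrt(colSums((t(train_x) - row)^2))
    mean(train_y[order(d)[seq_len(k)]])
  })
}

# 5 folds, shared by all candidate values of k
n_folds <- 5
folds <- sample(rep(seq_len(n_folds), length.out = nrow(x)))

# Hypothetical helper: mean and sd of the per-fold mean squared error
cv_mse <- function(k) {
  fold_mse <- sapply(seq_len(n_folds), function(i) {
    test <- folds == i
    pred <- knn_predict(x[!test, , drop = FALSE], y[!test],
                        x[test, , drop = FALSE], k)
    mean((pred - y[test])^2)
  })
  c(mean = mean(fold_mse), sd = sd(fold_mse))
}

# Evaluate k = 1, ..., 7 and select the value with the lowest mean MSE
all_perfs <- data.frame(k = 1:7, t(sapply(1:7, cv_mse)))
best_k <- all_perfs$k[which.min(all_perfs$mean)]

print(all_perfs)
print(best_k)
```

As in the output above, the table holds one row per candidate value, and the selected k is the one with the smallest mean squared loss.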