The quality of predictions is normally measured with respect to some loss function (which is to be minimized), or sometimes with respect to some positive performance measure (which is to be maximized). Typical loss functions are misclassification error or deviance for classification, and sum of squared errors (SSE) or absolute deviations for regression.
At the moment only the most common measures are implemented, but it's easy to write your own. For classification, the package ROCR is also a good starting point: it provides measures connected to ROC curves and nice plots. I will probably integrate this more tightly at some point in the future.
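As a rough illustration of what ROCR offers on its own, the following sketch computes a ROC curve and the AUC for the example data set `ROCR.simple` that ships with ROCR. It is independent of the task and learner functions described here. Note that ROCR also exports a function called `performance`, so the calls are written with the `ROCR::` prefix to avoid a name clash.

```r
# Minimal ROCR sketch, using ROCR's bundled example data (not this package's API)
library(ROCR)
data(ROCR.simple)

# ROCR needs numeric scores and the true binary labels
pred.obj <- ROCR::prediction(ROCR.simple$predictions, ROCR.simple$labels)

# ROC curve: true positive rate against false positive rate
roc <- ROCR::performance(pred.obj, measure = "tpr", x.measure = "fpr")
plot(roc)

# Area under the ROC curve, stored in the y.values slot
auc <- ROCR::performance(pred.obj, measure = "auc")@y.values[[1]]
print(auc)
```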
| Short | Name | Description | Min/Max | Aggregate |
|---|---|---|---|---|
| zero-one | Mean misclassification error | Counts misclassification errors, divided by number of observations in test set | minimize | mean |
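The zero-one loss from the table can be written down in a few lines of base R. This is only a sketch on made-up toy labels (the vectors `truth` and `pred` are hypothetical), meant to show how the per-observation values and the mean aggregate relate.

```r
# Toy labels, invented for illustration
truth <- c("a", "b", "a", "c", "b")
pred  <- c("a", "b", "c", "c", "a")

# Per-observation loss: 1 for a misclassification, 0 otherwise
vals <- as.numeric(truth != pred)

# Aggregate: number of errors divided by number of observations
aggr <- mean(vals)

print(vals)  # 0 0 1 0 1
print(aggr)  # 0.4
```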
| Short | Name | Description | Min/Max | Aggregate |
|---|---|---|---|---|
| squared | Mean squared error | $\frac{1}{n} \sum_{i=1}^{n} (y_i - \mathrm{pred}_i)^2$ | minimize | sum |
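The squared loss can be sketched in base R in the same way. The numbers below are invented toy values (`y` and `pred` are hypothetical); the sketch shows the per-observation squared errors and both a sum and a mean over them.

```r
# Toy regression targets and predictions, invented for illustration
y    <- c(3.0, 5.0, 2.5, 7.0)
pred <- c(2.5, 5.0, 4.0, 8.0)

# Per-observation squared error
vals <- (y - pred)^2

# Two possible aggregates over the per-observation values
sse <- sum(vals)   # sum of squared errors
mse <- mean(vals)  # mean squared error, i.e. sse / n

print(vals)  # 0.25 0.00 2.25 1.00
print(sse)   # 3.5
print(mse)   # 0.875
```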
```r
# Classification task with iris data set
ct <- make.classif.task(data = iris, formula = Species ~ .)

# Training and test set indices
train.set <- seq(from = 1, to = 150, by = 2)
test.set <- seq(from = 2, to = 150, by = 2)

# Decision tree on training set
model <- train("rpart.classif", ct, subset = train.set)

# Prediction on test set data
preds <- predict(model, newdata = iris[test.set, ])

# Compare predicted and true label with default loss-function "zero-one"
performance(true.y = iris[test.set, "Species"], pred.y = preds)
```

```
$aggr
[1] 0.05333333

$vals
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[39] 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0
```

TODO (still to be completed):

```r
# Compare predicted and true label with loss-function "----"
performance(true.y = iris[test.set, "Species"], pred.y = preds, loss = "-----")
```
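To make clear what the zero-one performance value measures, here is roughly the same experiment written against the rpart package directly, without the task and learner wrappers. This is a sketch, not the package's internal code, and the fitted tree (and hence the error rate) is not guaranteed to match the output above.

```r
# Same train/test split on iris, using rpart directly (assumption: plain
# rpart with default settings stands in for the "rpart.classif" learner)
library(rpart)

train.set <- seq(from = 1, to = 150, by = 2)
test.set  <- seq(from = 2, to = 150, by = 2)

# Fit a decision tree on the training rows only
fit <- rpart(Species ~ ., data = iris, subset = train.set)

# Predict class labels for the test rows
preds <- predict(fit, newdata = iris[test.set, ], type = "class")
truth <- iris[test.set, "Species"]

# Zero-one loss per observation and its mean
vals <- as.numeric(as.character(preds) != as.character(truth))
aggr <- mean(vals)
print(aggr)

# Confusion matrix as an additional check
print(table(truth = truth, predicted = preds))
```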
The regression case is analogous to the classification example above.
```r
# BostonHousing ships with the mlbench package
library(mlbench)
data(BostonHousing)

# Regression task with BostonHousing data set
rt <- make.regr.task(data = BostonHousing, formula = medv ~ .)

# Training and test set indices
train.set <- seq(from = 1, to = 506, by = 2)
test.set <- seq(from = 2, to = 506, by = 2)

# Gradient Boosting Machine on training set
model <- train("gbm.regr", rt, subset = train.set, parset = list(n.trees = 10000))

# Prediction on test set data
preds <- predict(model, newdata = BostonHousing[test.set, ])
preds
```

```
[1] 22.395020 36.117794 25.031917 16.369151 17.798636 20.611659 22.606421
[8] 22.671347 20.158904 20.559315 20.347968 15.692052 16.319054 16.349748
...
[253] 23.055107
```

```r
# Compare predicted and true values with default loss-function "squared"
performance(true.y = BostonHousing[test.set, "medv"], pred.y = preds)
```

```
$aggr
[1] 17.03216

$vals
[1] 6.186657e-01 8.751260e+00 1.189271e+01 1.169115e+02 1.462081e+00
[6] 2.720436e+00 5.587959e+00 8.576545e+00 7.208765e+00 5.800308e+00
...
[251] 3.202702e-03 8.105602e+00 1.274056e+02
```
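The squared-error evaluation can also be sketched without the wrapper functions. The sketch below makes two substitutions, both assumptions for the sake of a dependency-light example: it uses the `Boston` data from the MASS package instead of `BostonHousing`, and a plain linear model instead of gradient boosting. The resulting numbers will therefore differ from the output above.

```r
# Squared-error evaluation by hand (assumptions: MASS::Boston and lm() are
# stand-ins for BostonHousing and the "gbm.regr" learner)
library(MASS)

train.set <- seq(from = 1, to = 506, by = 2)
test.set  <- seq(from = 2, to = 506, by = 2)

# Fit on the training rows only
fit <- lm(medv ~ ., data = Boston, subset = train.set)

# Predict the test rows
preds <- predict(fit, newdata = Boston[test.set, ])

# Per-observation squared errors and their mean
vals <- (Boston[test.set, "medv"] - preds)^2
aggr <- mean(vals)
print(aggr)
```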