Learning tasks are the basic elements of the package: they encapsulate the data set together with all relevant information about the purpose of the task. This is at least the target variable, but it may also include information about excluded (ID) variables or misclassification costs.
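To make the idea of "bundling" concrete, here is a minimal sketch of what such a task object could hold, written as a plain R list. The constructor `make_task_sketch` and the class name `task_sketch` are hypothetical and only illustrate the concept; they are not the package's implementation. Since iris has no ID column, one is added so the example mirrors the BreastCancer case below.

```r
# Hypothetical sketch of a learning task: data plus the information about its purpose.
# Not the package's implementation; names are made up for illustration.
make_task_sketch <- function(data, target, excluded = character(0), costs = NULL) {
  # basic sanity checks a task constructor would plausibly perform
  stopifnot(is.data.frame(data),
            target %in% names(data),
            all(excluded %in% names(data)),
            !(target %in% excluded))
  # features are all columns except the target and the excluded ones
  features <- setdiff(names(data), c(target, excluded))
  structure(list(data = data, target = target, excluded = excluded,
                 features = features, costs = costs),
            class = "task_sketch")
}

# add an ID column to iris so that it can be excluded, as with BreastCancer
df <- data.frame(Id = seq_len(nrow(iris)), iris)
task <- make_task_sketch(df, target = "Species", excluded = "Id")
task$features
```

The ID column and the target are both kept in `data` but left out of `features`, which is the list a model would actually be fitted on.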
The following example defines a classification task for the data set BreastCancer (from package mlbench) and excludes the ID variable from all further model fitting.
library(mlbench)
data(BreastCancer)
ct <- make.classif.task(data = BreastCancer, target = "Class", excluded = "Id")
Instead of specifying the target, we could also have used the formula interface:
make.classif.task(data = BreastCancer, formula = Class ~ ., excluded = "Id")
make.classif.task(data = iris, formula = Species ~ Sepal.Length + I(Sepal.Width^2))
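To check which columns a formula actually selects, base R's `terms()` and `model.frame()` can be used independently of the package. The sketch below shows that `.` expands to every column except the response, and that the square must sit inside `I()`: in formula syntax, `I(Sepal.Width)^2` means "all interactions up to order 2" of a single term, which is just `I(Sepal.Width)` and squares nothing.

```r
# Inspect what a formula selects, using base R only.
f_all <- Species ~ .
f_sq  <- Species ~ Sepal.Length + I(Sepal.Width^2)  # I() makes ^ arithmetic
f_bad <- Species ~ Sepal.Length + I(Sepal.Width)^2  # ^ here is formula crossing

# "." expands to all columns of iris except the response
all_terms <- attr(terms(f_all, data = iris), "term.labels")

# compare the terms of the two squared variants
sq_terms  <- attr(terms(f_sq), "term.labels")
bad_terms <- attr(terms(f_bad), "term.labels")

# the model frame contains the response and the transformed feature
mf <- model.frame(f_sq, data = iris)
names(mf)
```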
The task we just defined also provides some convenience methods for accessing properties of the data set:
# before we start, let's examine ct
# print useful info about the data set
ct
Classification problem BreastCancer
Features Nums:0 Ints:0 Factors:9 Chars:0
Observations: 699
Missings: TRUE in 16 observations and 1 features
Classes:2
   benign malignant
      458       241
# access more information
# get target name and column index - same for regression
ct["target.name"]
ct["target.col"]
# get target values for all / some observations - same for regression
ct["targets"]
ct["targets", c(1:5, 100:15)]
# get all possible classes - classification specific
ct["class.levels"]
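If you want to cross-check such a summary without the package, the same quantities can be computed in plain R. The sketch below uses iris so it runs without mlbench; the variable names are illustrative only, and it is an assumption that the package computes its summary in a comparable way. Applied to BreastCancer (after `library(mlbench); data(BreastCancer)`), the same calls should reproduce the 699 observations, the 16 incomplete observations and the 458/241 class counts printed above.

```r
# Plain-R equivalents of the task summary and accessors (sketch, not the package's code).
data(iris)
target <- "Species"

target_col <- which(names(iris) == target)   # column index of the target
targets    <- iris[[target]]                 # target values for all observations
class_lvls <- levels(targets)                # all possible classes
class_cnt  <- table(targets)                 # observations per class

n_obs        <- nrow(iris)
n_incomplete <- sum(!complete.cases(iris))   # observations with at least one NA

# feature type counts, excluding the target column
feature_cols <- setdiff(names(iris), target)
n_factors <- sum(vapply(iris[feature_cols], is.factor, logical(1)))
n_nums    <- sum(vapply(iris[feature_cols], is.double, logical(1)))
```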
We can optionally include further information such as costs, weights or the type of prediction:
# non-default costs for wrong predictions:
costs <- matrix(c(0, 1, 2, 0), 2, 2)
costs
     [,1] [,2]
[1,]    0    2
[2,]    1    0
ct <- make.classif.task(data = BreastCancer, target = "Class", costs = costs)
# weights for the cases
ct <- make.classif.task(data = BreastCancer, target = "Class", weights = 1:699)
# when we are interested in predicting probabilities
ct <- make.classif.task(data = BreastCancer, target = "Class", type = "prob")
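To see why a cost matrix matters, the following sketch computes expected misclassification costs from predicted probabilities and picks the cheapest class. It assumes rows index the true class and columns the predicted class; that orientation is an assumption here, not something stated above, and the probabilities are invented for illustration. With these costs, wrongly predicting "malignant" costs 2 while wrongly predicting "benign" costs 1, so the second case flips to "benign" even though "malignant" is more probable.

```r
# Assumed convention (not taken from the package): rows = true class, columns = predicted class.
classes <- c("benign", "malignant")
costs <- matrix(c(0, 1, 2, 0), 2, 2,
                dimnames = list(true = classes, predicted = classes))

# hypothetical predicted probabilities for three cases (columns follow `classes`)
probs <- rbind(c(0.9, 0.1),
               c(0.4, 0.6),
               c(0.2, 0.8))
colnames(probs) <- classes

# expected cost of predicting class j: sum over i of P(true = i) * costs[i, j]
exp_cost <- probs %*% costs

cost_pred  <- classes[apply(exp_cost, 1, which.min)]  # cost-sensitive decision
plain_pred <- classes[apply(probs, 1, which.max)]     # most probable class
```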
From this classification task we can now train various models, e.g. by training one model on different subsets of BreastCancer or by training different models. This is covered in the subsequent section Training. Before that, let's look at a very similar way to set up a regression experiment:
# We will generally use the Boston Housing data set as the regression example
library(mlbench)
data(BostonHousing)
# and this time we use a formula instead of target:
rt <- make.regr.task("BostonHousing", data = BostonHousing, formula = medv ~ .)
rt
Regression problem BostonHousing
Features Nums:12 Ints:0 Factors:1 Chars:0
Observations: 506
Missings: FALSE
# the further steps work analogously to a classification task