"Training" a learner just means fitting model to a given data set. We are not concerned with the specifics of the fitting process as such here - this will be taken care of by the underlying R methods that this package employs. Rather more important is, that training and all subsequent operations can be performed by using a unified interface.
This is in this case achieved by calling the method "train" on the classification task. The most important parameters are the indices of the subset which are used for training and a list of named elements which specify the hyperparameters of the learner. The parameters will generally be named the same way as in the underlying R method, if not, differences are documented on the R help for the learning method. The return value is always an object of class "model" which wraps the concrete model of the used R classification or regression method. It can subsequently be used to perform prediction for new observations.
Let's have a look at the iris dataset:
# Classification task:
ct <- make.classif.task(data = iris, target = "Species")

# Let's train some Decision Trees:
# on whole data set
m1 <- train("rpart.classif", ct)

# on a subset (every second observation)
m2 <- train("rpart.classif", ct, subset = seq(from = 1, to = 150, by = 2))

# with hyperparameters
m3 <- train("rpart.classif", ct, subset = seq(from = 1, to = 150, by = 2), parset = list(minsplit = 7, cp = 0.03))

# You can print some basic information of the model to the console
m3

Learner model for RPART
Hyperparameters: minsplit=7 cp=0.03
Trained on obs: 75

# Some accessor
m3["parset"]

$minsplit
[1] 7

$cp
[1] 0.03

m3["subset"]

 [1]   1   3   5   7   9  11  13  15  17  19  21  23  25  27  29  31  33  35  37
[20]  39  41  43  45  47  49  51  53  55  57  59  61  63  65  67  69  71  73  75
[39]  77  79  81  83  85  87  89  91  93  95  97  99 101 103 105 107 109 111 113
[58] 115 117 119 121 123 125 127 129 131 133 135 137 139 141 143 145 147 149

# access the wrapped rpart model - in most cases you won't need to...
m3["learner.model"]

n= 75

node), split, n, loss, yval, (yprob)
      * denotes terminal node

1) root 75 50 setosa (0.3333333 0.3333333 0.3333333)
  2) Petal.Length< 2.45 25  0 setosa (1.0000000 0.0000000 0.0000000) *
  3) Petal.Length>=2.45 50 25 versicolor (0.0000000 0.5000000 0.5000000)
    6) Petal.Width< 1.65 25  1 versicolor (0.0000000 0.9600000 0.0400000) *
    7) Petal.Width>=1.65 25  1 virginica (0.0000000 0.0400000 0.9600000) *
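To make the wrapped model less of a black box, the following sketch fits a tree on the same subset by calling rpart directly, without the unified interface. It assumes that the wrapper simply forwards the entries of parset to rpart; that is an assumption for illustration, not a description of the package internals. It also shows why the root node above lists equal class proportions: every second row of iris contains 25 observations of each species.

```r
library(rpart)

# Same subset as in the example above: every second observation of iris
idx <- seq(from = 1, to = 150, by = 2)
length(idx)               # 75 observations
table(iris$Species[idx])  # 25 observations per species

# Assumption: the hyperparameters in parset correspond one-to-one
# to the arguments of rpart.control
fit <- rpart(Species ~ ., data = iris[idx, ],
             control = rpart.control(minsplit = 7, cp = 0.03))

# Print the fitted tree to compare it with m3["learner.model"]
print(fit)
```

The advantage of going through train instead is that the same call pattern works for every learner, so switching the method does not require learning a new fitting interface.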
As a regression example, we use the BostonHousing data set:
# The BostonHousing data set ships with the mlbench package
library(mlbench)
data(BostonHousing)

# Regression task:
rt <- make.regr.task(data = BostonHousing, formula = medv ~ .)

# Let's train some Gradient Boosting Machines:
# on whole data set
m1 <- train("gbm.regr", rt)

# on a subset (every second observation)
m2 <- train("gbm.regr", rt, subset = seq(1, 506, 2))

# with a set of hyperparameters
m3 <- train("gbm.regr", rt, subset = seq(1, 506, 2), parset = list(n.trees = 500, distribution = "laplace", interaction.depth = 3))

# You can print some basic information of the model to the console
m3

Learner model for Gradient Boosting Machine
Trained on obs: 253
Hyperparameters: n.trees=500 distribution=laplace interaction.depth=3

# rest is analogous to example above
As described in the section Wrapped learners, there is another way to access the learning algorithm. We again take the regression example from above and show how to easily build two models with different hyperparameter sets:
# Regression task:
rt <- make.regr.task(data = BostonHousing, formula = medv ~ .)

# Construct the wrapped learner
wl <- make.learner("gbm.regr")

# First setting of hyperparameters
wl_1 <- set.train.par(wl, n.trees = 500, distribution = "laplace", interaction.depth = 3)

# Second setting of hyperparameters
wl_2 <- set.train.par(wl, n.trees = 250, distribution = "laplace", interaction.depth = 5)

# And merge the information in each case
model_1 <- train(wl_1, rt)
model_2 <- train(wl_2, rt)
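The idea behind a wrapped learner, configuring hyperparameters once and training later, can be sketched in plain R with a closure. The helper make_tree_learner below is hypothetical and not part of the package; it uses rpart on iris instead of gbm on BostonHousing only to keep the example free of additional packages.

```r
library(rpart)

# Hypothetical helper (not part of the package): store the
# hyperparameters now and return a function that fits a tree later.
make_tree_learner <- function(minsplit = 20, cp = 0.01) {
  control <- rpart.control(minsplit = minsplit, cp = cp)
  function(data) {
    rpart(Species ~ ., data = data, control = control)
  }
}

# Two learners that differ only in their hyperparameter settings
learner_a <- make_tree_learner(minsplit = 7, cp = 0.03)
learner_b <- make_tree_learner(minsplit = 20, cp = 0.01)

# Training is a separate step that only needs the data
model_a <- learner_a(iris)
model_b <- learner_b(iris)
```

Separating configuration from training in this way makes it straightforward to apply the same configured learner to several data sets or subsets.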
Normally, you should define your hyperparameters in parset. Use the set.train.par function for technical parameters that you do not intend to change afterwards. See ---INSERT LINK--- for details.