Learning task

A learning task is the basic building block of the package: it encapsulates the data set together with all information relevant to the purpose of the task. At a minimum this is the target variable, but it may also include excluded (ID) variables or misclassification costs.

Classification example

The following example defines a classification task for the data set BreastCancer (from package mlbench) and excludes an ID variable from all further model fitting.

	data(BreastCancer, package = "mlbench")
	ct <- make.classif.task(data = BreastCancer, target = "Class", excluded = "Id")

Instead of specifying the target, we could have also used the formula interface:

	make.classif.task(data = BreastCancer, formula = Class ~ ., excluded = "Id")
	make.classif.task(data = iris, formula = Species ~ Sepal.Length + I(Sepal.Width^2))
  • The second example uses the iris data set, which (unlike BreastCancer) contains numeric variables, and therefore illustrates how to include only a subset of the variables as well as transformations of them. This construction builds a new internal data frame by calling model.frame.
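
As a rough illustration of what such a model.frame call produces, the following base-R sketch builds the frame for the iris formula directly. It makes no calls into the package; how the task stores the result internally is not shown here, and the name mf is only illustrative.

```r
# Sketch only: call model.frame directly on the iris formula.
# `mf` is an illustrative name, not something the package defines.
mf <- model.frame(Species ~ Sepal.Length + I(Sepal.Width^2), data = iris)

# one column per term of the formula, target first
names(mf)
dim(mf)

# the transformed column holds the squared sepal widths
all.equal(as.numeric(mf[[3]]), iris$Sepal.Width^2)
```

Note that the square has to be written inside I(); outside of it, ^ is interpreted as a formula operator rather than as arithmetic.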

    The task defined above also provides some convenience methods to access properties of the data set:

    	# before we start, let's examine ct
    	# print useful info about the data set
    	ct
    	
    	Classification problem BreastCancer
    	Features Nums:0 Ints:0 Factors:9 Chars:0
    	Observations: 699
    	Missings: TRUE
    	in 16 observations and 1 features
    	Classes:2
    	   benign malignant 
    	      458       241 
    		
    	
    	# access more information 
    	# get target name and column index - same for regression
    	ct["target.name"]
    	ct["target.col"]
    
    	# get target values for all / some observations  - same for regression 
    	ct["targets"]
    	ct["targets", c(1:5, 100:105)]
    
    	# get all possible classes - classification specific		 
    	ct["class.levels"]
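
The accessors above return simple properties of the underlying data, so they can be cross-checked with base R. The sketch below does this for the built-in iris data with Species as target. It makes no calls into the package, and the correspondence to each accessor noted in the comments is an assumption based on the accessor names.

```r
# Sketch only: base-R counterparts, using iris with target "Species".
target.name <- "Species"

# column index of the target (compare: ct["target.col"])
target.col <- which(names(iris) == target.name)
target.col

# target values for some observations (compare: ct["targets", 1:5])
iris[1:5, target.name]

# all possible classes (compare: ct["class.levels"])
levels(iris[[target.name]])

# class distribution and missing values, as in the printed task summary
table(iris[[target.name]])
anyNA(iris)
```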
    

    Optionally, we can include further information such as costs, weights or the type of prediction:

    	# non-default costs for wrong predictions:
    	costs <- matrix(c(0, 1, 2, 0), 2, 2)
    	costs
    	
    	     [,1] [,2]
    	[1,]    0    2
    	[2,]    1    0
    	
    	ct <- make.classif.task(data = BreastCancer, target = "Class", costs = costs)
    	
    	# weights for the cases
    	ct <- make.classif.task(data = BreastCancer, target = "Class", weights = 1:699)
    	
    	# when we are interested in predicting probabilities
    	ct <- make.classif.task(data = BreastCancer, target = "Class", type = "prob")
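
Because matrix() fills column by column, it is easy to misread which cell of the cost matrix holds which value. A small base-R sketch that labels the matrix makes the layout explicit. Whether rows stand for true classes and columns for predicted classes (or the reverse) is not stated above, so the dimnames here are purely for inspection and are an assumption, not something the package is known to require.

```r
# Sketch only: label the cost matrix with the two class levels.
# The dimnames are an illustrative assumption for readability.
lev <- c("benign", "malignant")
costs <- matrix(c(0, 1, 2, 0), nrow = 2, ncol = 2,
                dimnames = list(lev, lev))
costs

# the two off-diagonal cells carry the different misclassification costs
costs["benign", "malignant"]
costs["malignant", "benign"]
```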
    

    From this classification task we can now train various models, e.g. by training one model on different subsets of BreastCancer or by training different models. This is covered in the subsequent section Training. Before that, let's look at a very similar way to set up a regression experiment:

    Regression example

    	# We will generally use the Boston Housing data set as the regression example
    	library(mlbench)
    	data(BostonHousing)
    
    	# and this time we use formula instead of target:
    	rt <- make.regr.task("BostonHousing", data = BostonHousing, formula = medv ~ .)
    	rt
    	
    	Regression problem BostonHousing
    	Features Nums:12 Ints:0 Factors:1 Chars:0
    	Observations: 506
    	Missings: FALSE
    		
    	
    	# the further steps work analogously to a classification task