-
Notifications
You must be signed in to change notification settings - Fork 3
/
vip.Rmd
92 lines (63 loc) · 2.4 KB
/
vip.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
---
title: "The vip R package"
date: "`r format(Sys.time(), '%d-%m-%Y')`"
output:
html_document:
toc: true
toc_float: true
---
This report aims to present the capabilities of the package `vip`.
The document is a part of the paper "Landscape of R packages for eXplainable Machine Learning", S. Maksymiuk, A. Gosiewska, and P. Biecek.
(https://arxiv.org/abs/2009.13248). It contains a real life use-case with a hand of [titanic_imputed](https://modeloriented.github.io/DALEX/reference/titanic.html) data set described in Section *Example gallery for XAI packages* of the article.
We did our best to show the entire range of the implemented explanations. Please note that the examples may be incomplete. If you think something is missing, feel free to make a pull request at the GitHub repository [MI2DataLab/XAI-tools](https://github.com/MI2DataLab/XAI-tools).
The list of use-cases for all packages included in the article is [here](http://xai-tools.drwhy.ai/).
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
```
Load [`titanic_imputed`](https://modeloriented.github.io/DALEX/reference/titanic.html) data set.
```{r}
data(titanic_imputed, package = "DALEX")
head(titanic_imputed)
```
```{r}
library(vip)
library(randomForest)
set.seed(123)
```
Fit a random forest and logistic regression to the titanic imputed data.
```{r}
rf_model <- ranger::ranger(survived~., data = titanic_imputed, classification = TRUE, probability = TRUE)
rf_model_2 <- randomForest(factor(survived)~., data = titanic_imputed)
glm_model <- glm(survived~., data = titanic_imputed)
```
# Model parts
## Variance-based Variable Importance
```{r}
vip(object = rf_model, method = "firm")
```
## Model-specific Variable Importance
```{r}
vip(object = rf_model_2, method = "model")
```
## Permutation-based Variable Importance
```{r}
pred_fun = function(X.model, newdata) {
predict(X.model, newdata)$predictions[,2]
}
vip(object = rf_model, method = "permute", target = "survived", metric = "auc", pred_wrapper = pred_fun, reference_class = 1)
```
## Shapley-based Variable Importance
```{r}
pred_fun = function(X.model, newdata) {
predict(X.model, newdata)$predictions[,2]
}
vip(object = rf_model, method = "shap", pred_wrapper = pred_fun)
```
## Interaction strength for pairs of features
```{r}
vint(rf_model, feature_names = c("age", "fare"))
```
# Session info
```{r}
sessionInfo()
```