Chapter 9: Exercise 6
========================================================
### a
We randomly generate 1000 points scattered around the line $x = y$ with a wide margin. We also create 100 noisy points along the line $5x - 4y - 50 = 0$. These noisy points make the classes barely separable and shift the maximum-margin classifier.
```{r 6a}
set.seed(3154)
# Class one
x.one = runif(500, 0, 90)
y.one = runif(500, x.one + 10, 100)
x.one.noise = runif(50, 20, 80)
y.one.noise = 5 / 4 * (x.one.noise - 10) + 0.1
# Class zero
x.zero = runif(500, 10, 100)
y.zero = runif(500, 0, x.zero - 10)
x.zero.noise = runif(50, 20, 80)
y.zero.noise = 5 / 4 * (x.zero.noise - 10) - 0.1
# Combine all
class.one = seq(1, 550)
x = c(x.one, x.one.noise, x.zero, x.zero.noise)
y = c(y.one, y.one.noise, y.zero, y.zero.noise)
plot(x[class.one], y[class.one], col="blue", pch="+", ylim=c(0, 100))
points(x[-class.one], y[-class.one], col="red", pch=4)
```
The plot shows that the classes are barely separable. The noisy points create a fictitious boundary $5x - 4y - 50 = 0$.
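To make this fictitious boundary visible, the sketch below redraws the training scatter and overlays the true boundary $y = x$ and the noisy-point line $5x - 4y - 50 = 0$ (rewritten as $y = 1.25x - 12.5$); the chunk label and legend text are our own additions.
```{r 6a-boundaries}
# Redraw the training scatter and overlay both boundaries
plot(x[class.one], y[class.one], col = "blue", pch = "+", ylim = c(0, 100),
     xlab = "x", ylab = "y")
points(x[-class.one], y[-class.one], col = "red", pch = 4)
abline(a = 0, b = 1, lty = 1)         # true boundary: y = x
abline(a = -12.5, b = 1.25, lty = 2)  # fictitious boundary: 5x - 4y - 50 = 0
legend("bottomright", legend = c("true boundary", "fictitious boundary"), lty = c(1, 2))
```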
### b
We create a class variable $z$ from the class labels and use cross-validation to choose the cost of a linear SVM.
```{r}
library(e1071)
set.seed(555)
z = rep(0, 1100)
z[class.one] = 1
data = data.frame(x=x, y=y, z=z)
tune.out = tune(svm, as.factor(z)~., data=data, kernel="linear", ranges=list(cost=c(0.01, 0.1, 1, 5, 10, 100, 1000, 10000)))
summary(tune.out)
data.frame(cost=tune.out$performances$cost, misclass=tune.out$performances$error * 1100)
```
The table above shows the cross-validation misclassification count for each cost. A cost of 10000 seems to classify all points correctly, corresponding to a cross-validation error of 0.
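Rather than reading the best cost off the table, we can also query the `tune` object directly; as a minimal sketch, `best.parameters` holds the selected cost and `best.model` the model refit at that cost:
```{r}
# Cost selected by 10-fold cross-validation and the corresponding refitted model
tune.out$best.parameters
summary(tune.out$best.model)
```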
### c
We now generate a random test set of 1000 points. This test set respects the true decision boundary $x = y$.
```{r 6c}
set.seed(1111)
x.test = runif(1000, 0, 100)
class.one = sample(1000, 500)
y.test = rep(NA, 1000)
# Set y > x for class.one
for (i in class.one) {
  y.test[i] = runif(1, x.test[i], 100)
}
# Set y < x for class.zero
for (i in setdiff(1:1000, class.one)) {
  y.test[i] = runif(1, 0, x.test[i])
}
plot(x.test[class.one], y.test[class.one], col="blue", pch="+")
points(x.test[-class.one], y.test[-class.one], col="red", pch=4)
```
We now fit linear SVMs on the training data at each of the costs used in the previous part and compute the test misclassification errors.
```{r}
set.seed(30012)
z.test = rep(0, 1000)
z.test[class.one] = 1
all.costs = c(0.01, 0.1, 1, 5, 10, 100, 1000, 10000)
test.errors = rep(NA, 8)
data.test = data.frame(x=x.test, y=y.test, z=z.test)
for (i in 1:length(all.costs)) {
  svm.fit = svm(as.factor(z)~., data=data, kernel="linear", cost=all.costs[i])
  svm.predict = predict(svm.fit, data.test)
  test.errors[i] = sum(svm.predict != data.test$z)
}
data.frame(cost=all.costs, "test misclass"=test.errors)
```
$\tt{cost} = 10$ performs best on the test data, making the fewest classification errors. This is much smaller than the cost of 10000 that was optimal on the training data.
### d
We again see an overfitting phenomenon for the linear kernel. A large cost tries to correctly classify the noisy training points and hence overfits the training data. A small cost, however, tolerates a few errors on those noisy points and performs better on the test data.
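As a rough illustration of this trade-off (reusing `all.costs`, `data`, `data.test`, and `test.errors` from the chunks above; the chunk label and plotting choices are our own), we can compute the training misclassification count at each cost and plot it alongside the test misclassification count:
```{r 6d}
# Training misclassification count for each cost
train.errors = rep(NA, length(all.costs))
for (i in 1:length(all.costs)) {
  svm.fit = svm(as.factor(z) ~ ., data = data, kernel = "linear", cost = all.costs[i])
  train.errors[i] = sum(predict(svm.fit, data) != data$z)
}
# Compare training and test misclassification counts across costs (log-scaled cost axis)
plot(all.costs, train.errors, type = "b", log = "x", col = "blue",
     ylim = range(c(train.errors, test.errors)),
     xlab = "cost (log scale)", ylab = "misclassified points")
lines(all.costs, test.errors, type = "b", col = "red")
legend("topright", legend = c("train", "test"), col = c("blue", "red"), lty = 1)
```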