forked from statOmics/HDDA
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Lab1-Intro-SVD.Rmd
executable file
·674 lines (488 loc) · 18.6 KB
/
Lab1-Intro-SVD.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
---
title: "Lab 1: Introduction and Singular Value Decomposition"
subtitle: "High Dimensional Data Analysis practicals"
author: "Milan Malfait"
date: "7 Feb 2022 <br/> (Last updated: 2022-02-07)"
---
```{r setup, include=FALSE, cache=FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
options(width = 80)
```
### [Change log](https://github.com/statOmics/HDDA/commits/master/Lab1-Intro-SVD.Rmd) {-}
***
# Introduction
The purpose of the following exercises is mainly to get more familiar with SVD and its applications.
It is recommended to perform the exercises in an `RMarkdown` document.
For a brief introduction to RMarkdown, see [Introduction to RMarkdown](./Introduction-RMarkdown.html).
For an introduction to working with matrices in R, see [Working with Matrices in R](./Introduction-Matrices-R.html).
## Libraries {-}
Packages used in this document.
Installation code is commented, uncomment and paste this code in an R console to install the packages.
```{r libraries, message=FALSE, warning=FALSE}
## Install necessary packages with:
# install.packages("tidyverse")
# if (!requireNamespace("remotes", quietly = TRUE)) {
# install.packages("remotes")
# }
# remotes::install_github("statOmics/HDDAData")
library(tidyverse)
library(HDDAData)
```
# Exercises
## Cheese data
### Data prep {-}
Load in the `cheese` data, which characterizes 30 cheeses on various metrics.
More information can be found at `?cheese`
```{r}
## Load the 'cheese' dataset from the HDDAData package
data("cheese")
cheese
```
>__Q__: what is the *dimensionality* of this data?
<details><summary>Answer</summary>
__A__: 4, since we have 4 features (the first column is just an identifier for the observation, so we don't regard this as a feature).
```{r}
ncol(cheese) - 1
```
</details>
Convert the `cheese` table to a matrix for easier calculations.
We will drop the first column (as it's not a feature) and instead use it to create `rownames`.
```{r}
cheese_mx <- as.matrix(cheese[, -1])
rownames(cheese_mx) <- paste("case", cheese$Case, sep = "_")
cheese_mx
```
Check rank of `cheese_mx`.
```{r}
qr(cheese_mx)$rank
```
We will now **center** the data matrix through the $\mathbf{H}$ matrix:
$$ \mathbf{H} = \mathbf{I}_{n \times n} - \frac{1}{n} \mathbf{1}_n\mathbf{1}_n^T $$
```{r cheese-H_matrix}
n <- nrow(cheese_mx)
## 11^T
## Alternatively: one_mat <- rep(1, n) %o% rep(1, n)
(one_mat <- matrix(rep(1, n * n), ncol = n, nrow = n))
## Calculate H, diag(n) is the nxn identity matrix
cheese_H <- diag(n) - (1/n) * one_mat
cheese_H[1:8, 1:8] # showing subset of H
# Centering the data matrix
cheese_centered <- cheese_H %*% cheese_mx
## Note that using `scale(X, center = TRUE, scale = FALSE)` is much more
## efficient to center a matrix
## Verify colMeans are 0
round(colMeans(cheese_centered), 14)
```
### Tasks {-}
*Note*: no need for mathematical derivations, just verify code-wise in R.
We obtained the column-centered data matrix $\mathbf{X}$ after multiplying the original matrix with
$$
\mathbf{H} = \mathbf{I} - \frac{1}{n} \mathbf{1}\mathbf{1}^T
$$
##### 1. Show that $\mathbf{X}$ (here: `cheese_centered`) is indeed column-centered (and not row-centered) {-}
<details><summary>Solution</summary>
```{r}
# X is indeed column-centered:
round(colMeans(cheese_centered), 14) ## practically zero
# but it is not row-centered:
rowMeans(cheese_centered)
```
</details>
##### 2. Verify that whenever $\mathbf{X}$ is column-centered, the equality $\mathbf{HX = X}$ holds {-}
<details><summary>Solution</summary>
```{r}
# verifying that HX = X
all.equal(cheese_H %*% cheese_centered, cheese_centered)
```
</details>
##### 3. Perform an SVD on `cheese_centered`, and store the matrices $\mathbf{U}$, $\mathbf{V}$ and $\mathbf{\Delta}$ as separate objects {-}
<details><summary>Solution</summary>
```{r cheese_svd}
cheese_svd <- svd(cheese_centered)
str(cheese_svd)
U <- cheese_svd$u
V <- cheese_svd$v
D <- diag(cheese_svd$d)
```
</details>
##### 4. Show that $\mathbf{u_1}$ is a normalized vector; show the same for $\mathbf{u_2}$. Show that $\mathbf{u_1}$ and $\mathbf{u_2}$ are orthogonal vectors. Then show the orthonormality of all vectors $\mathbf{u_j}$ in a single calculation (using the matrix $\mathbf{U}$). Similarly, show the orthonormality of all vectors $\mathbf{v_j}$ in a single calculation (using the matrix $\mathbf{V}$). {-}
<details><summary>Solution</summary>
```{r}
# Verifying orthonormality
# ------------------------
# The vectors u1 and u2 are orthonormal
t(U[, 1]) %*% U[, 1]
t(U[, 2]) %*% U[, 2]
t(U[, 1]) %*% U[, 2]
# Verifying that U forms an orthonormal basis in one step:
t(U) %*% U # computational imperfections
round(t(U) %*% U, digits = 15)
# Verifying that V forms an orthonormal basis:
t(V) %*% V # computational imperfections
round(t(V) %*% V, digits = 15)
```
</details>
##### 5. Check that the SVD was performed correctly, i.e. calculate the matrix $\mathbf{X}$ from the elements of the SVD. {-}
<details><summary>Solution</summary>
There are 2 ways to do this
- Using the sum definition of the SVD
$\mathbf{X} = \sum_{j=1}^r \delta_j \mathbf{u}_j\mathbf{v}_j^T$
```{r}
# Calculating X via the sum definition of the SVD:
# ------------------------------------------------
## Initialize empty matrix
X_sum <- matrix(0, nrow = nrow(U), ncol = ncol(V))
## Compute sum by looping over columns
for (j in 1:ncol(U)) {
X_sum <- X_sum + (diag(D)[j] * U[, j] %*% t(V[, j]))
}
```
- using the matrix notation of the SVD
$\mathbf{X}=\mathbf{U}_{n\times n}\boldsymbol{\Delta}_{n\times p}\mathbf{V}^T_{p \times p}$
```{r}
# Calculating X via the SVD matrix multiplication:
# ------------------------------------------------
X_mult <- U %*% D %*% t(V)
```
- Verify that the obtained results are identical to the matrix $\mathbf{X}$.
```{r}
## Remove dimnames with unname for comparison
all.equal(X_sum, unname(cheese_centered))
all.equal(X_mult, unname(cheese_centered))
```
</details>
##### 6. Approximate the matrix $\mathbf{\tilde{\mathbf{X}}}$, for $k = 2$ using the truncated SVD. {-}
<details><summary>Solution</summary>
Using the matrix notation of the SVD $\tilde{\mathbf{X}}=\mathbf{U}_{n\times k}\boldsymbol{\Delta}_{k\times k}\mathbf{V}_{p \times k}^T$
```{r}
k <- 2
X_tilde <- U[, 1:k] %*% D[1:k,1:k] %*% t(V[,1:k])
X_tilde
```
- Compare the obtained results with the matrix $\mathbf{X}$ (`cheese_centered`).
Just at a first glance, does it seem that $\mathbf{\tilde{X}}$ is a good approximation of $\mathbf{X}$?
##### 7. SVD and linear regression: perform a linear regression using SVD to estimate the effects of the `Acetic`, `H2S` and `Lactic` variables on the `taste`. {-}
Note: we cannot use the SVD from before as this was calculated from the complete `cheese` table, also including the `taste` column, which is the response variable of interest here.
Instead, we need to create a new design matrix $\mathbf{X}$ containing the predictors and a separate vector $\mathbf{y}$ containing the response.
```{r}
cheese_y <- cheese$taste
cheese_design <- cbind(Intercept = 1, cheese[c("Acetic", "H2S", "Lactic")])
```
Also perform the regression with `lm` and compare the results.
<details><summary>Solution</summary>
```{r}
## Fit with lm
lm_fit <- lm(taste ~ Acetic + H2S + Lactic, data = cheese)
## Fit with SVD
design_svd <- svd(cheese_design)
svd_coef <- design_svd$v %*% diag(1/design_svd$d) %*% t(design_svd$u) %*% cheese_y
## Compare
cbind(
"lm" = coef(lm_fit),
"svd" = drop(svd_coef)
)
```
</details>
## Exercise: employment by industry in European countries
In this exercise we will focus on the interpretation of the *biplot*.
### Data prep {-}
The `"industries"` dataset contains data on the distribution of employment between 9 industrial
sectors, in 26 European countries. The dataset stems from the Cold-War era; the data are expressed
as percentages. Load the data and explore its contents.
```{r read-industries-data}
## Load 'industries' data from the HDDAData package
data("industries")
# Explore contents
industries
dim(industries)
summary(industries)
```
Create data matrix $\mathbf{X}$.
```{r}
# Create matrix without first column ("country", which will be used for rownames)
indus_X <- as.matrix(industries[, -1])
rownames(indus_X) <- industries$country
# Check the dimensionality
dim(indus_X)
# and the rank
qr(indus_X)$rank
# n will be used subsequently
n <- nrow(indus_X)
```
### Tasks {-}
##### 1. Perform a truncated SVD for $k=2$, and construct the biplot accordingly. {-}
<details><summary>Solution</summary>
```{r}
# Centering the data matrix first
# H <- diag(n) - 1 / n * matrix(1, ncol = n, nrow = n)
# indus_centered <- H %*% as.matrix(indus_X)
indus_centered <- scale(indus_X, scale = FALSE)
```
```{r}
# Perform SVD
indus_svd <- svd(indus_centered)
str(indus_svd)
# Extract singular vectors for k = 2 and calculate k=2 projection Zk
k <- 2
Uk <- indus_svd$u[ , 1:k]
Dk <- diag(indus_svd$d[1:k])
Vk <- indus_svd$v[, 1:k]
rownames(Vk) <- colnames(indus_X)
colnames(Vk) <- c("V1", "V2")
Zk <- Uk %*% Dk
rownames(Zk) <- industries$country
colnames(Zk) <- c("Z1", "Z2")
Zk
```
Biplot with *ggplot2*:
```{r, fig.width=8, fig.height=6}
## Scale factor to draw Vk arrows (can be set arbitrarily)
scale_factor <- 20
## Create tibble with rownames in "country" column
as_tibble(Zk, rownames = "country") %>%
ggplot(aes(Z1, Z2)) +
geom_point() +
geom_text(aes(label = country), size = 3, nudge_y = 0.5) +
## Plot Singular vectors Vk
geom_segment(
data = as_tibble(Vk, rownames = "sector"),
aes(x = 0, y = 0, xend = V1 * scale_factor, yend = V2 * scale_factor),
arrow = arrow(length = unit(0.4, "cm")),
color = "firebrick"
) +
geom_text(
data = as_tibble(Vk, rownames = "sector"),
aes(V1 * scale_factor, V2 * scale_factor, label = sector),
nudge_x = 0.5, nudge_y = ifelse(Vk[, 2] >= 0, 0.5, -0.5),
color = "firebrick", size = 3
) +
theme_minimal()
```
Using base R:
```{r, fig.width=8, fig.height=6}
# # Constructing the biplot for Z1 and Z2
# # -------------------------------------
plot(Zk[, 1:2],
type = "n", xlim = c(-30, 60), ylim = c(-15, 15),
xlab = "Z1", ylab = "Z2"
)
text(Zk[, 1:2], rownames(Zk), cex = 0.9)
# alpha <- 1
alpha <- 20 # rescaling to get better visualisation
for (i in 1:9) {
arrows(0, 0, alpha * Vk[i, 1], alpha * Vk[i, 2], length = 0.2, col = 2)
text(alpha * Vk[i, 1], alpha * Vk[i, 2], rownames(Vk)[i], col = 2)
}
```
</details>
##### 2. To see if we can learn more when retaining more dimensions, repeat the truncated SVD for $k=3$. Construct two-dimensional biplots for: {-}
- Z1 and Z3
- Z2 and Z3
<details><summary>Solution</summary>
No need to re-do SVD, just extract singular vectors for $k=3$ from previous SVD.
```{r}
# Extract singular vectors for k = 3 and calculate projection Zk
k <- 3
Uk <- indus_svd$u[ , 1:k]
Dk <- diag(indus_svd$d[1:k])
Vk <- indus_svd$v[, 1:k]
rownames(Vk) <- colnames(indus_X)
colnames(Vk) <- c("V1", "V2", "V3")
Zk <- Uk %*% Dk
rownames(Zk) <- industries$country
colnames(Zk) <- c("Z1", "Z2", "Z3")
Zk
```
Create biplot as before.
- Z1 *vs.* Z3
```{r, fig.width=8, fig.height=6}
## Scale factor to draw Vk arrows (can be set arbitrarily)
scale_factor <- 20
## Create tibble with rownames in "country" column
as_tibble(Zk, rownames = "country") %>%
ggplot(aes(Z1, Z3)) +
geom_point() +
geom_text(aes(label = country), size = 3, nudge_y = 0.5) +
## Plot Singular vectors Vk
geom_segment(
data = as_tibble(Vk, rownames = "sector"),
aes(x = 0, y = 0, xend = V1 * scale_factor, yend = V3 * scale_factor),
arrow = arrow(length = unit(0.4, "cm")),
color = "firebrick"
) +
geom_text(
data = as_tibble(Vk, rownames = "sector"),
aes(V1 * scale_factor, V3 * scale_factor, label = sector),
nudge_x = 0.5, nudge_y = ifelse(Vk[, 3] >= 0, 0.5, -0.5),
color = "firebrick", size = 3
) +
theme_minimal()
```
- Z2 *vs.* Z3
```{r, fig.width=8, fig.height=6}
# Scale factor to draw Vk arrows (can be set arbitrarily)
scale_factor <- 20
## Create tibble with rownames in "country" column
as_tibble(Zk, rownames = "country") %>%
ggplot(aes(Z2, Z3)) +
geom_point() +
geom_text(aes(label = country), size = 3, nudge_y = 0.5) +
## Plot Singular vectors Vk
geom_segment(
data = as_tibble(Vk, rownames = "sector"),
aes(x = 0, y = 0, xend = V2 * scale_factor, yend = V3 * scale_factor),
arrow = arrow(length = unit(0.4, "cm")),
color = "firebrick"
) +
geom_text(
data = as_tibble(Vk, rownames = "sector"),
aes(V2 * scale_factor, V3 * scale_factor, label = sector),
nudge_x = 0.5, nudge_y = ifelse(Vk[, 3] >= 0, 0.5, -0.5),
color = "firebrick", size = 3
) +
theme_minimal()
```
##### 3. Can you give a meaningful interpretation to each dimension? {-}
# Multidimensional Scaling (MDS) demonstration
See [course notes](https://statomics.github.io/HDDA/svd.html#7_SVD_and_Multi-Dimensional_Scaling_(MDS)) for background.
* We will use `UScitiesD` data as an example
* Our goal is to use the distance matrix $\mathbf D_X$ without knowledge of $\mathbf X$ to represent the rows of $\mathbf X$ in a low dimensional space, say 2D or 3D.
* We search for $\mathbf V_k$ that orthogonally projects the rows of $\mathbf X$ ($\mathbf x^T_i$) onto a $k$-dimensional space spanned by the columns of $\mathbf V_k$. In fact we are looking for $\mathbf Z_k$, such that $\mathbf Z_k=\mathbf X \mathbf V_k$
* But we do not know $\mathbf X$, so how do we get $\mathbf Z_k$? We will use the $\mathbf G_X$ (gram matrix) trick, mentioned in the course notes
## Example: Distances between US cities
As an example, we will use the `UScitiesD` data set, which is part of base R.
This data gives "straight line" distances (in km) between 10 cities in the US.
```{r, R.options=list(width=100)}
UScitiesD
class(UScitiesD)
```
Note that the `UScitiesD` object is of class `"dist"`, which is a special type of object to represent that it is a __distance matrix__ (we'll denote this as $\mathbf{D}_X$), i.e. the result from computing distances from an original matrix $\mathbf{X}$.
In this case, the original matrix $\mathbf{X}$ was likely a matrix with a row for every city and columns specifying its coordinates.
Note though that we don't know $\mathbf{X}$ exactly.
Still, we can use the distance matrix and MDS to approximate a low-dimensional representation of $\mathbf{X}$.
### Exploring the distance matrix
We first convert the `UScitiesD` to a matrix for easier manipulation and calculation.
Note that this creates a "symmetrical" matrix, with 0s on the diagonal (distance of a city to itself).
```{r}
(dist_mx <- as.matrix(UScitiesD))
```
The dimensions of `dist_mx`:
```{r}
# 10 x 10 square matrix
dim(dist_mx)
```
And the rank of `dist_mx`
```{r}
qr(dist_mx)$rank
```
>Q: is this matrix of full rank?
<details><summary>Answer</summary>
A: Yes, it is.
```{r}
qr(dist_mx)$rank == min(dim(dist_mx))
```
</details>
### $\mathbf{H}$ and $\mathbf{G}_X$ matrices
Now let's create the $\mathbf H$ matrix.
$$ \mathbf{H} = \mathbf{I}_{n \times n} - \frac{1}{n} \mathbf{1}_n\mathbf{1}_n^T $$
```{r H_matrix}
n <- nrow(dist_mx)
## 11^T
## Alternatively: one_mat <- rep(1, n) %o% rep(1, n)
(one_mat <- matrix(rep(1, n * n), ncol = n, nrow = n))
## Calculate H, diag(n) is the nxn identity matrix
(H <- diag(n) - (1/n) * one_mat)
```
We can use $\mathbf{H}$ to center our distance matrix:
```{r dist_mx_centered}
(dist_mx_centered <- H %*% dist_mx)
## Verify colMeans are 0
round(colMeans(dist_mx_centered), 8)
## Note that using `scale(X, center = TRUE, scale = FALSE)` is much more efficient
## to center a matrix
## Here we use the approach with H because we need it further on
```
We will use this matrix to calculate $\mathbf{G}_X$ (Gram matrix of $\mathbf{X}$).
$$
\mathbf{G}_X = -\frac{1}{2}\mathbf{H}\mathbf{D}_X\mathbf{H} = \mathbf{X}\mathbf{X}^T
$$
Where $\mathbf{D}_X$ is the matrix of __*squared* distances__.
So we will first have to square our `dist_mx`.
```{r Gram_matrix}
## D_X = squared distance matrix
D_X <- dist_mx ^ 2
## Gram matrix
(G_X <- -1/2 * H %*% (D_X) %*% H)
```
### The SVD
We can now compute the SVD of the Gram matrix and use it to project our original matrix $\mathbf{X}$ (which is still unknown to us!) into a lower dimensional space while preserving the Euclidean distances as well as possible.
This is the essence of MDS.
```{r GX matrix}
## singular value decomposition on gram matrix
Gx_svd <- svd(G_X)
## Use `str` to explore structure of the SVD object
str(Gx_svd)
```
Components of the `Gx_svd` object:
- `Gx_svd$d`: diagonal elements of the $\mathbf{\Delta}$ matrix, to recreate the matrix, use the `diag()` function
- `Gx_svd$u`: the matrix $\mathbf{U}$ of left singular vectors
- `Gx_svd$v`: the matrix $\mathbf{V}$ of right singular vectors
### Truncated SVD and projection into lower dimensional space
The truncated SVD from the Gram matrix can be used to find projections $Z_k$ of $\mathbf{X}$ in a lower dimensional space.
Here we will use $k = 2$.
```{r}
# k=2 approximation
k <- 2
Uk <- Gx_svd$u[, 1:k]
delta_k <- diag(Gx_svd$d[1:k])
Zk <- Uk %*% sqrt(delta_k)
rownames(Zk) <- colnames(D_X)
colnames(Zk) <- c("Z1", "Z2")
Zk
# Plotting Zk in 2-D
## Using base R
# plot(Zk, type = "n", xlab = "Z1", ylab = "Z2", xlim = c(-1500, 1500))
# text(Zk, rownames(Zk), cex = 1.25)
## Using ggplot, by first converting Zk to a tibble
Zk %>%
# create tibble and change rownames to column named "city"
as_tibble(rownames = "city") %>%
ggplot(aes(Z1, Z2, label = city)) +
geom_point() +
# adding the city names as label
geom_text(nudge_y = 50) +
# setting limits of the x-axis to center the plot around 0
xlim(c(-1500, 1500)) +
ggtitle("MDS plot of the UScitiesD data") +
theme_minimal()
```
What can you say about the plot?
Think about the real locations of these cities on a map of the US.
<details><summary>Answer</summary>
$Z_1$ can be interpreted as the *longitude*, i.e. the East-West position.
$Z_2$ reflects the *latitude*, or the North-South position.
</details>
## The short way
The calculations above demonstrate how MDS works and what the underlying components are.
However, in a real data analysis, one would typically not go through all the hassle of calculating all the intermediate steps.
Fortunately, the MDS is already implemented in base R (in the `stats` package).
So the whole derivation we did above can be reproduced with a single line of code, using the `cmdscale` function (see `?cmdscale` for details).
```{r}
## Calculate MDS in 2 dimensions from distance matrix
(us_mds <- cmdscale(UScitiesD, k = 2))
colnames(us_mds) <- c("Z1", "Z2")
## Plot MDS
us_mds %>%
as_tibble(rownames = "city") %>%
ggplot(aes(Z1, Z2, label = city)) +
geom_point() +
geom_text(nudge_y = 50) +
xlim(c(-1500, 1500)) +
theme_minimal()
```
Which gives us the same result as before (which is a good check that we didn't make mistakes!).
```{r, child="_session-info.Rmd"}
```