forked from mpeeples2008/ArchNetSci
-
Notifications
You must be signed in to change notification settings - Fork 0
/
03-exploratory-analysis.Rmd
619 lines (441 loc) · 34.3 KB
/
03-exploratory-analysis.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
# Exploratory Network Analysis{#Exploratory}
![](images/image_break.png){width=100%}
Exploratory network analysis is simply exploratory data analysis applied to network data. This covers a range of statistical and visual techniques designed to explore the structure of networks as well as the relative positions of nodes and edges. These methods can be used to look for particular structures or patterning of interest, such as the most central nodes, or to summarize and describe the structure of the network to paint a general picture of it before further analysis. This section serves as a companion to Chapter 4 in the Brughmans and Peeples book (2023) and provides basic examples of the exploratory network analysis methods outlined in the book as well as a few others.
Note that we have created a distinct section on [exponential random graph models (ERGM)](#ERGM) in the "Going Beyond the Book" section of this document as that approach necessitates extended discussion. We replicate the boxed example from Chapter 4 of the book in that section.
## Example Network Objects{#ExampleNetworkObjects}
In order to facilitate the exploratory analysis examples in this section, we want to first create a set of `igraph` and `statnet` network objects that will serve our purposes across all of the analyses below. Specifically, we will generate and define:
* **`simple_net`** - A simple undirected binary network with isolates
* **`simple_net_noiso`** - A simple undirected binary network without isolates
* **`directed_net`** - A directed binary network
* **`weighted_net`** - An undirected weighted network
* **`sim_net_i`** - A similarity network with edges weighted by similarity in the `igraph` format
* **`sim_net`** - A similarity network with edges weighted by similarity in the `network` format
* **`sim_mat`** - A data frame object containing a weighted similarity matrix
Each of these will be used as appropriate to illustrate particular methods.
In the following chunk of code we initialize all of the packages that we will use in this section and define all of the network objects that we will use (using the object names above). In these examples we will once again use the [Cibola technological similarity data](#Cibola) we used in the [Network Data Formats](#NetworkData) section previously.
```{r, warning=F, message=F, results=F}
# initialize packages
library(igraph)
library(statnet)
library(intergraph)
library(vegan)
# read in csv data
cibola_edgelist <-
read.csv(file = "data/Cibola_edgelist.csv", header = TRUE)
cibola_adj_mat <- read.csv(file = "data/Cibola_adj.csv",
header = T,
row.names = 1)
# Simple network with isolates
simple_net <-
igraph::graph_from_adjacency_matrix(as.matrix(cibola_adj_mat),
mode = "undirected")
# Simple network with no isolates
simple_net_noiso <-
igraph::graph_from_edgelist(as.matrix(cibola_edgelist),
directed = FALSE)
#Create a directed network by sub-sampling edge list
set.seed(45325)
el2 <- cibola_edgelist[sample(seq(1, nrow(cibola_edgelist)), 125,
replace = FALSE), ]
directed_net <- igraph::graph_from_edgelist(as.matrix(el2),
directed = TRUE)
# Create a weighted undirected network by adding column of random
# weights to edge list
cibola_edgelist$weight <- sample(seq(1, 4), nrow(cibola_edgelist),
replace = TRUE)
weighted_net <-
igraph::graph_from_edgelist(as.matrix(cibola_edgelist[, 1:2]),
directed = FALSE)
E(weighted_net)$weight <- cibola_edgelist$weight
# Create a similarity network using the Brainerd-Robinson metric
cibola_clust <-
read.csv(file = "data/Cibola_clust.csv",
header = TRUE,
row.names = 1)
clust_p <- prop.table(as.matrix(cibola_clust), margin = 1)
sim_mat <-
(2 - as.matrix(vegan::vegdist(clust_p, method = "manhattan"))) / 2
sim_net <- network(
sim_mat,
directed = FALSE,
ignore.eval = FALSE,
names.eval = "weight"
)
sim_net_i <- asIgraph(sim_net)
```
## Calculating Network Metrics in R {#CalcMetric}
Although the calculations behind the scenes for centrality metrics, clustering algorithms, and other network measures may be somewhat complicated, calculating these measures in R using network objects is usually quite straight forward and typically only involves a single function and a couple of arguments within it. There are, however, some things that need to be kept in mind when applying these methods to network data. In this document, we provide examples of some of the most common functions you may use as well as a few caveats and potential problems.
Certain network metrics require networks with specific properties and may produce unexpected results if the wrong kind of network is used. For example, closeness centrality is only well defined for binary networks that have no isolates. If you were to use the `igraph::closeness` command to calculate closeness centrality on a network with isolates, you would get results but you would also get a warning telling you "closeness centrality is not well-defined for disconnected graphs." For other functions if you provide data that does not meet the criteria required by that function you my instead get an error and have no results returned. In some cases, however, a function may simply return results and not provide any warning so it is important that you are careful when selecting methods to avoid providing data that violates assumptions of the method provided. Remember, that if you have questions about how a function works or what it requires you can type `?function_name` at the console with the function in question and you will get the help document that should provide more information. You can also include package names in the help call to ensure you get the correct function (i.e., `?igraph::degree`)
## Centrality{#Centrality}
One of the most common kinds of exploratory network analysis involves calculating basic network centrality and centralization statistics. There are a wide array of methods available in R through the `igraph` and `statnet` packages. In this section we highlight a few examples as well as a few caveats to keep in mind.
### Degree Centrality{#Degree}
Degree centrality can be calculated using the `igraph::degree` function for simple networks with or without isolates as well as simple directed networks. This method is not, however, appropriate for weighted networks or similarity networks (because it expects binary values). If you apply the `igraph::degree` function to a weighted network object you will simply get the binary network degree centrality values. The alternative for calculating weighted degree for weighted and similarity networks is to simply calculate the row sums of the underlying similarity matrix (minus 1 to account for self loops) or adjacency matrix. For the degree function the returned output is a vector of values representing degree centrality which can further be assigned to an R object, plotted, or otherwise used. We provide a few examples here to illustrate. Note that for directed graphs you can also specify the mode as `in` for in-degree or `out` for out-degree or `all` for the sum of both.
Graph level degree centralization is equally simple to call using the `centr_degree` function. This function returns an object with multiple parts including a vector of degree centrality scores, the graph level centralization metric, and the theoretical maximum number of edges (n \* [n-1]). This metric can be normalized such that the maximum centralization value would be 1 using the `normalize = TRUE` argument as we demonstrate below. See the comments in the code chunk below to follow along with which type of network object is used in each call. In most cases we only display the first 5 values to prevent long lists of output (using the `[1:5]` command).
```{r, warning=F, message=F}
# simple network with isolates
igraph::degree(simple_net)[1:5]
# simple network no isolates
igraph::degree(simple_net_noiso)[1:5]
# directed network
igraph::degree(directed_net, mode = "in")[1:5] # in-degree
igraph::degree(directed_net, mode = "out")[1:5] # out-degree
# weighted network - rowSums of adjacency matrix
(rowSums(as.matrix(
as_adjacency_matrix(weighted_net,
attr = "weight")
)) - 1)[1:5]
# similarity network. Note we use the similarity matrix here and
# not the network object
(rowSums(sim_mat) - 1)[1:5]
# If you want to normalize your degree centrality metric by the
# number of nodes present you can do that by adding the normalize=TRUE
# command to the function calls above. For weighted and similarity
# networks you can simply divide by the number of nodes minus 1.
igraph::degree(simple_net, normalize = T)[1:5]
# it is also possible to directly plot the degree distribution for
# a given network using the degree.distribution function.
# Here we embed that call directly in a call for a histogram plot
# using the "hist" function
hist(igraph::degree.distribution(simple_net))
# graph level centralization
igraph::centr_degree(simple_net)
# To calculate centralization score for a similarity matrix, use the
# sna::centralization function
sna::centralization(sim_mat, normalize = TRUE, sna::degree)
```
If you are interested in calculating graph level density you can do this using the `edge_density` function. Note that just like the degree function above, this only works for binary networks and if you submit a weighted network object you will simply get the binary edge density value.
```{r, warning=F, message=F}
edge_density(simple_net_noiso)
edge_density(weighted_net)
```
### Betweenness Centrality{#Betweenness}
The betweenness functions work very much like the degree function calls above. Betweenness centrality in `igraph` can be calculated for simple networks with and without isolates, directed networks, and weighted networks. In the case of weighted networks or similarity networks, the shortest paths between sets of nodes are calculated such that the path of greatest weight is taken at each juncture. You can normalize your results by using `normalize = TRUE` just like you could for degree. The `igraph::betweenness` function will automatically detect if a graph is directed or weighted and use the appropriate method but you can also specify a particular edge attribute to use for weight if you perhaps have more than one weighting scheme.
```{r}
# calculate betweenness for simple network
igraph::betweenness(simple_net)[1:5]
# calculate betweenness for weighted network
igraph::betweenness(weighted_net, directed = FALSE)[1:5]
# calculate betweenness for weighted network specifying weight attribute
igraph::betweenness(weighted_net, weights = E(weighted_net)$weight)[1:5]
# calculate graph level centralization
centr_betw(simple_net)
```
### Eigenvector Centrality{#Eigenvector}
The `igraph::eigen_centrality` function can be calculated for simple networks with and without isolates, directed networks, and weighted networks. By default scores are scaled such that the maximum score of 1. You can turn this scaling of by using the `scale = FALSE` argument. This function automatically detects whether a network object is directed or weighted but you can also call edge attributes to specify a particular weight attribute. By default this function outputs many other features of the analysis such as the number of steps toward convergence and the number of iterations but if you just want the centrality results you can use the atomic vector call to \$vector.
```{r}
eigen_centrality(simple_net,
scale = TRUE)$vector[1:5]
eigen_centrality(
weighted_net,
weights = E(weighted_net)$weight,
directed = FALSE,
scale = FALSE
)$vector[1:5]
```
### Page Rank Centrality{#PageRank}
The `igraph::page_rank` function can be calculated for simple networks with and without isolates, directed networks, and weighted networks. By default scores are scaled such that the maximum score is 1. You can turn this scaling off by using the `scale = FALSE` argument. This function automatically detects whether a network object is directed or weighted but you can also call edge attributes to specify a particular weight attribute. You can change the algorithm used to implement the page rank algorithm (see help for details) and can also change the damping factor if desired.
```{r, warning=F, message=F}
page_rank(directed_net,
directed = TRUE)$vector[1:5]
page_rank(
weighted_net,
weights = E(weighted_net)$weight,
directed = FALSE,
algo = "prpack"
)$vector[1:5]
```
### Closeness Centrality {#Closeness}
The `igraph::closeness` function calculates closeness centrality and can be calculated for directed and undirected simple or weighted networks with no isolates. This function can also be used for networks with isolates, but you may receive an additional message suggesting that closeness is undefined for networks that are not fully connected. For very large networks you can use the `igraph::estimate_closeness` function with a cutoff setting that will consider paths of length up to cutoff to calculate closeness scores. For directed networks you can also specify whether connections in, out, or in both directions should be used.
```{block, type="rmdwarning"}
Note that the function `igraph::closeness()` should not normally be used with networks with multiple components. Depending on your settings, however, the function call may not return an error so be careful.
```
Let's take a look at some examples:
```{r}
igraph::closeness(simple_net)[1:5]
igraph::closeness(simple_net_noiso)[1:5]
igraph::closeness(weighted_net, weights = E(weighted_net)$weight)[1:5]
igraph::closeness(directed_net, mode = "in")[1:5]
```
### Hubs and Authorities {#HubsAndAuthorities}
In directed networks it is possible to calculate hub and authority scores to identify nodes that are characterized by high in-degree and high out-degree in particular. Because this is a measure that depends on direction it is only appropriate for directed network objects. If you run this function for an undirected network hub scores and authority scores will be identical. These functions can also be applied networks that are both directed and weighted. If you do not want all options printed you can use the atomic vector \$vector call as well.
```{r}
igraph::hub_score(directed_net)$vector[1:5]
igraph::authority_score(directed_net)$vector[1:5]
```
## Triads and clustering{#TriadsAndClustering}
Another important topic in network science concerns considerations of the overall structure and clustering of connections across a network as a whole. There are a variety of methods which have been developed to characterize the overall degree of clustering and closure in networks, many of which are based on counting triads of various configurations. In this section, we briefly outline approaches toward evaluating triads, transitivity, and clustering in R.
### Triads{#Triads}
A triad is simply a set of three nodes and a description of the configuration of edges among them. For undirected graphs, there are four possibilities for describing the connections among those nodes (empty graph, 1 connection, 2 connections, 3 connections). For directed graphs the situation is considerably more complicated because ties can be considered in both directions and an edge in one direction isn't necessarily reciprocated. Thus there are 16 different configurations that can exist (see Brughmans and Peeples 2023: Figure 4.4).
One common method for outlining the overall structural properties of a network is to conduct a "triad census" which counts each of the 4 or 16 possible triads for a given network. Although a triad census can be conducted on an undirected network using the `igraph::triad_census` function, a warning will be returned along with 0 results for all impossible triad configurations so be aware. The results are returned as a vector of counts of each possible node configuration in an order outlined in the help document associated with the function (see `?triad_census` for more).
```{r}
igraph::triad_census(directed_net)
igraph::triad_census(simple_net)
```
Often can be useful to visualize the motifs defined for each entry in the triad census and this can be done using the `graph_from_isomorphism_class()` function which outputs every possible combination of nodes of a given size you specify (3 in this case). We can plot these configurations in a single plot using the `ggraph` and `ggpubr` packages. These packages are described in more detail in the visualization section of this document. We label each configuration using the "isomporhism code" that are frequently used to describe triads.
```{r, warning=F, message=F, fig.height=6, fig.width=6}
library(ggraph)
library(ggpubr)
g <- list()
xy <-
as.data.frame(matrix(
c(0.1, 0.1, 0.9, 0.1, 0.5, 0.45),
nrow = 3,
ncol = 2,
byrow = TRUE
))
names <- c("003", "012", "102", "021D", "021U", "021C",
"111D", "111U", "030T", "030C", "201", "120D",
"120U", "120C", "210", "300")
for (i in 0:15) {
g_temp <- graph_from_isomorphism_class(size = 3,
number = i,
directed = TRUE)
g[[i + 1]] <- ggraph(g_temp,
layout = "manual",
x = xy[, 1],
y = xy[, 2]) +
xlim(0, 1) +
ylim(0, 0.5) +
geom_node_point(size = 6, col = "purple") +
geom_edge_fan(
arrow = arrow(length = unit(4, "mm"),
type = "closed"),
end_cap = circle(6, "mm"),
start_cap = circle(6, "mm"),
edge_colour = "black"
) +
theme_graph(
plot_margin =
margin(2, 2, 2, 2)
) +
ggtitle(names[i + 1]) +
theme(plot.title = element_text(hjust = 0.5))
}
# motifs ordered by order in triad_census function
ggarrange(
g[[1]], g[[2]], g[[4]], g[[7]],
g[[3]], g[[5]], g[[6]], g[[10]],
g[[8]], g[[12]], g[[11]], g[[9]],
g[[14]], g[[13]], g[[15]], g[[16]],
nrow = 4,
ncol = 4
)
```
### Transitivity and Clustering{#Transitivity}
A network's global average transitivity (or clustering coefficient) is three times the number of closed triads over the total number of triads in a network. This measure can be calculated using `igraph::transitivity` for simple networks with or without isolates, directed networks, and weighted networks. There are options within the function to determine the specific type of transitivity (global transitivity is the default) and for how to treat isolates. See the help document (`?igraph::transitivity`) for more details. If you want to calculate local transitivity for a particular node you can use the `type = "local"` argument. This will return a `NA` value for nodes that are not part of any triads (isolates and nodes with a single connection).
```{r}
igraph::transitivity(simple_net, type = "global")
igraph::transitivity(simple_net, type = "local")
```
## Walks, Paths, and Distance{#WalksPathsDistance}
There are a variety of network metrics which rely on distance and paths across networks that can be calculated in R. There are a great many functions available and we highlight just a few here.
### Distance{#Distance}
In some cases, you may simply want information about the graph distance between nodes in general or perhaps the average distance. There are a variety of functions that can help with this including `igraph::distances` and `igraph::mean_distance`. These work on simple networks, directed networks, and weighted networks.
```{r}
# Create matrix of all distances among nodes and view the first
# few rows and columns
igraph::distances(simple_net)[1:4, 1:4]
# Calculate the mean distance for a network
igraph::mean_distance(simple_net)
```
### Shortest Paths{#ShortestPaths}
If you want to identify particular shortest paths to or from nodes in a network you can use the `igraph::shortest_paths` function or alternatively the `igraph::all_shortest_paths` if you want all shortest paths originating at a particular node. To call this function you simply need to provide a network object and an id for the origin and destination of the path. The simplest solution is just to call the node number. This function works with directed and undirected networks with or without weights. Although it can be applied to networks with isolates, the isolates themselves will produce `NA` results.
```{r}
# track shortest path from Apache Creek to Pueblo de los Muertos
igraph::shortest_paths(simple_net, from = 1, to = 21)
```
The output provides the ids for all nodes crossed in the path from origin to destination.
### Diameter{#Diameter}
The `igraph::diameter` function calculates the diameter of a network (the longest shortest path) and you can also use the `farthest_vertices` function to get the ids of the nodes that form the ends of that longest shortest path. This metric can be calculated for directed and undirected, weighted and unweighted networks, with or without isolates.
```{r}
igraph::diameter(directed_net, directed = TRUE)
igraph::farthest_vertices(directed_net, directed = TRUE)
```
## Components and Bridges{#ComponentsAndBridges}
Identifying fully connected subgraphs within a large network is a common analytical procedure and is quite straight forward in R using the igraph package. If you first want to know whether or not a given network is fully connected you can use the `igraph::is_connected` function to check.
```{r}
igraph::is_connected(simple_net)
igraph::is_connected(simple_net_noiso)
```
You can also count components using the `count_components` function.
```{r}
igraph::count_components(simple_net)
```
### Identifying Components{#IdentifyingComponents}
If you want to decompose a network object into its distinct components you can use the `igraph::decompose` function which outputs a list object with each entry representing a distinct component. Each object in the list can then be called using `[[k]]` where `k` is the number of the item in the list.
```{r}
components <- igraph::decompose(simple_net, min.vertices = 1)
components
V(components[[2]])$name
```
In the example here this network is fully connected with the exception of 1 node (WS Ranch). When you run the decompose function it separates WS ranch into a component as an isolate with no edges.
### Cutpoints{#Cutpoints}
A cutpoint is a node, the removal which creates a network with a higher number of components. There is not a convenient igraph function for identifying cutpoints but there is a function in the `sna` package within the `statnet` suite. Using the `intergraph` package we can easily convert an `igraph` object to a `network` object (using the `asNetwork` function) within the call to use this function.
The `sna::cutpoint` function returns the node id for any cutpoints detected. We can use the numbers returned to find the name of the node in question.
```{r}
cut_p <- cutpoints(asNetwork(simple_net))
cut_p
V(simple_net)$name[cut_p]
set.seed(4536)
plot(simple_net)
```
The example here reveals that Ojo Bonito is a cutpoint and if we look at the figure we can see that it is the sole connection with Baca Pueblo which would otherwise become and isolate and distinct component if Ojo Bonito were removed.
### Bridges{#Bridges}
A bridge is an edge, the removal of which results in a network with a higher number of components. The function igraph::min_cut finds bridges in network objects for sets of nodes or for the graph as a whole. The output of this function includes a vector called \$cut which provides the edges representing bridges. By default this function only outputs the cut value but you can use the argument `value.only = FALSE` to get the full output.
```{r}
min_cut(simple_net_noiso, value.only = FALSE)
```
As this example illustrates the edge between Ojo Bonito and Baca Pueblo is a bridge (perhaps not surprising as Ojo Bonito was a cut point).
## Cliques and Communities{#CliquesAndCommunities}
Another very common task in network analysis involves creating cohesive sub-groups of nodes in a larger network. There are wide variety of methods available for defining such groups and we highlight a few of the most common here.
### Cliques{#Cliques}
A clique as a network science concept is arguably the strictest method of defining a cohesive subgroup. It is any set of three or more nodes in which each node is directly connected to all other nodes. It can be alternatively defined as a completely connected subnetwork, or a subnetwork with maximum density. The function `igraph::max_cliques` finds all maximal cliques in a network and outputs a list object with nodes in each set indicated. For the sake of space here we only output one clique of the 24 that were defined by this function call.
```{r}
max_cliques(simple_net, min = 1)[[24]]
```
Note in this list that the same node can appear in more than one maximal clique.
### K-cores{#KCores}
A k-core is a maximal subnetwork in which each vertex has at least degree k within the subnetwork. In R this can be obtained using the `igraph::coreness` function and the filtering by value as appropriate. This function creates a vector of k values which can then be used to remove nodes as appropriate or symbolize them in plots.
```{r}
# Define coreness of each node
kcore <- coreness(simple_net)
kcore[1:6]
# set up color scale
col_set <- heat.colors(max(kcore), rev = TRUE)
set.seed(2509)
plot(simple_net, vertex.color = col_set[kcore])
```
In the plot shown here the darker read colors represent higher maximal k-core values.
### Cluster Detection Algorithms{#ClusterDetection}
R allows you to use a variety of common cluster detection algorithms to define groups of nodes in a network using a variety of different assumptions. We highlight a few of the most common here.
#### Girvan-Newman Clustering{#GirvanNewman}
Girvan-Newman clustering is a divisive algorithm based on betweenness that defines a partition of network that maximizes modularity by removing nodes with high betweenness iteratively (see discussion in Brughmans and Peeples 2023 Chapter 4.6). In R this is referred to as the `igraph::edge.betweenness.community` function. This function can be used on directed or undirected networks with or without edge weights. This function outputs a variety of information including individual edge betweenness scores, modularity information, and partition membership. See the help documents for more information
```{r}
gn <- igraph::edge.betweenness.community(simple_net)
set.seed(4353)
plot(simple_net, vertex.color = gn$membership)
```
#### Walktrap Algorithm{#Walktrap}
The walktrap algorithm is designed to work for either binary or weighted networks and defines communities by generating a large number of short random walks and determining which sets of nodes consistently fall along the same short random walks. This can called using the `igraph::cluster_walktrap` function. The "steps" argument determines the length of the short walks and is set to 4 by default.
```{r}
wt <- igraph::cluster_walktrap(simple_net, steps = 4)
set.seed(4353)
plot(simple_net, vertex.color = wt$membership)
```
#### Louvain Modularity{#Louvain}
Louvain modularity is a cluster detection algorithm based on modularity. The algorithm iteratively moves nodes among community definitions in a way that optimizes modularity. This measure can be calculated on simple networks, directed networks, and weighted networks and is implemented in R through the `igraph::cluster_louvain` function.
```{r}
lv <- igraph::cluster_louvain(simple_net)
set.seed(4353)
plot(simple_net, vertex.color = lv$membership)
```
#### Calculating Modularity for Partitions{#Modularity}
If you would like to compare modularity scores among partitions of the same graph, this can be achieved using the `igraph::modularity` function. In the modularity call you simply supply an argument indicating the partition membership for each node. Note that this can also be used for attribute data such as regional designations. In the following chunk of code we will compare modularity for each of the clustering methods described above as well using subregion designations [from the original Cibola region attribute data](data/Cibola_attr.csv)
```{r}
# Modularity for Girvan-Newman
modularity(simple_net, membership = membership(gn))
# Modularity for walktrap
modularity(simple_net, membership = membership(wt))
# Modularity for Louvain clustering
modularity(simple_net, membership = membership(lv))
# Modularity for subregion
cibola_attr <- read.csv("data/Cibola_attr.csv")
modularity(simple_net, membership = as.factor(cibola_attr$Region))
```
Note that although modularity can be useful in comparing among partitions like this approach has been shown to be poor at detecting small communities within a network so will not always be appropriate.
#### Finding Edges Within and Between Communities{#FindingEdgesBetween}
In many cases you may be interested in identifying edges that remain within or extend between some network partition. This can be done using the `igraph::crossing` function. This function expects a igraph cluster definition object and an igraph network and will return a list of `TRUE` and `FALSE` values for each edge where true indicates an edge that extends beyond the cluster assigned to the nodes. Let's take a look at the first 10 edges in our simple_net object based on the Louvain cluster definition.
```{r}
igraph::crossing(lv, simple_net)[1:6]
```
Beyond this, if you plot an igraph object and add a cluster definition to the call it will produce a network graph with the clusters outlined and with nodes that extend between clusters shown in red.
```{r}
set.seed(54)
plot(lv, simple_net)
```
## Case Study: Roman Roads{#ExploratoryRomanRoads}
In the case study provided at the end of Chapter 4 of Brughmans and Peeples (2023) we take a simple network based on [Roman era roads](#RomanRoads) and spatial proximity of settlements in the Iberian Peninsula and calculate some basic exploratory network statistics. As described in the book, we can create different definitions and criteria for network edges and these can have impacts on the network and node level properties. In this case, we define three different networks as follows:
* **`road_net`** - A basic network where every road connecting two settlements is an edge
* **`road_net2`** - A network that retains all of the ties of the above network but also connects isolated nodes that are within 50 Kms of one of the road network settlements
* **`road_net3`** - A network that retains all of the ties of the first road network but connects each isolate to its nearest neighbor among the road network settlements
First let's read in the [data file](data/road_networks.RData) that contains all three networks and start by plotting them in turn on a map. We are using a custom network map function here that is save in a file called [map_net.R](scripts/map_net.R) that takes locations with decimal degrees locations and plots a network directly on a map. We will go over the specifics of the function in more detail in the [Network Visualization](#Visualization) and [Spatial Networks](#SpatialNetworks) sections but here we simply call the script directly from the .R file. Make sure you have the libraries initialized below to replicate this map.
```{r iberian_roads, warning=F, message=F, cache=T}
library(igraph)
library(ggmap)
library(sf)
library(dplyr)
# Read in required data
load("data/road_networks.RData")
source("scripts/map_net.R")
# Create Basic network map
map_net(
nodes = nodes,
net = road_net,
bounds = c(-9.5, 36, 3, 43.8),
gg_maptype = "watercolor",
zoom_lev = 6,
map_title = "Basic Network"
)
# Create Basic network map
map_net(
nodes = nodes,
net = road_net2,
bounds = c(-9.5, 36, 3, 43.8),
gg_maptype = "watercolor",
zoom_lev = 6,
map_title = "Basic Network+ 50Km Buffer"
)
# Create Basic network map
map_net(
nodes = nodes,
net = road_net3,
bounds = c(-9.5, 36, 3, 43.8),
gg_maptype = "watercolor",
zoom_lev = 6,
map_title = "Basic Network + Nearest Neighbor Isolates"
)
```
Now that we've replicated the visuals, we want to replicate network statistics. Since we're going to calculate several of the same network statistics for the networks in question, we can wrap this all into a function to save a bit of time. The following function expects an `igraph` network object and calculates each of the 10 variables show in the example in the book and returns them as a matrix.
Although the function below is somewhat long, it is very simple. It defines a function with a single argument `net` which is an `igraph` network object. It then creates an output matrix called `out` with the appropriate number of rows and columns and then populates the first column with the name of each measure. Next each network measure is evaluated in turn and assigned to the appropriate row in column 2 of the `out` matrix. Finally, the full matrix is returned: `return(out)`.
```{r}
library(igraph)
net_stats <- function(net) {
out <- matrix(NA, 10, 2)
out[, 1] <- c("Nodes", "Edges", "Isolates", "Density", "Average Degree",
"Average Shortest Path", "Diamater",
"Clustering Coefficient", "Closed Triad Count",
"Open Triad Count")
# number of nodes
out[1, 2] <- vcount(net)
# number of edges
out[2, 2] <- ecount(net)
# number of isolates
out[3, 2] <- sum(igraph::degree(net) == 0)
# network density rounding to the third digit
out[4, 2] <- round(edge_density(net), 3)
# mean degree rounding to the third digit
out[5, 2] <- round(mean(igraph::degree(net)), 3)
# mean shortest path length rounding to the third digit
out[6, 2] <- round(igraph::mean_distance(net), 3)
# network diameter
out[7, 2] <- igraph::diameter(net)
# average global transitivity rounding to the third digit
out[8, 2] <- round(igraph::transitivity(net, type = "average"), 3)
# closed triads in triad_census
out[9, 2] <- igraph::triad_census(net)[16]
# open triads in triad_census
out[10, 2] <- igraph::triad_census(net)[11]
return(out)
}
```
Now let's run it for each of the three networks in turn to reproduce the results in the book. We then combine the results into a single table that is nicely formatted using the `kable` function. If you'd prefer you can simply view the results of `net_stats()` right at the console.
```{r, warning=F, message=F}
ns1 <- net_stats(road_net)
ns2 <- net_stats(road_net2)
ns3 <- net_stats(road_net3)
ns_res <- cbind(ns1, ns2[, 2], ns3[, 2])
colnames(ns_res) <- c("Measure", "Basic Network", "50 Km Buffer",
"Nearest Neighbors")
knitr::kable(ns_res, format = "html")
```
For an extended discussion of the Cranborne Chase case study and exponential random graph models [click here](#ERGM)