diff --git a/01-introduction.Rmd b/01-introduction.Rmd index 8a9869a46..f6726c8dc 100644 --- a/01-introduction.Rmd +++ b/01-introduction.Rmd @@ -21,7 +21,7 @@ Over the last few decades free and open source software for geospatial (FOSS4G\i Thanks to organizations such as OSGeo, advanced geographic techniques are no longer the preserve of those with expensive hardware and software: anyone can now download and run high-performance software for geocomputation. Open source Geographic Information Systems (GIS\index{GIS}), such as [QGIS](https://qgis.org/en/site/)\index{QGIS}, have made geographic analysis accessible worldwide. GIS software products are powerful, but tend to emphasize a graphical user interface\index{graphical user interface} (GUI) approach over the command-line interface (CLI) approach advocated in this book. -The 'GUI-focus' of many GIS products has unintended consequence of disabling many users from making their work full reproducible\index{reproducibility}, a problem that can be overcome by calling 'geoalgorithms' contained in GIS software from the command line, as we'll see in Chapter \@ref(gis)). +The 'GUI-focus' of many GIS products has the unintended consequence of preventing many users from making their work fully reproducible\index{reproducibility}, a problem that can be overcome by calling 'geoalgorithms' contained in GIS software from the command line, as we'll see in Chapter \@ref(gis). A simplistic comparison between the different approaches is illustrated in Table \@ref(tab:gdsl). ```{r gdsl, echo=FALSE, message=FALSE} @@ -35,7 +35,7 @@ knitr::kable(x = d, ``` R is not the only language providing a CLI for geocomputation. -Other command environments with powerful geographic capabilities exist, including Python, Julia, and JavaScript. +Other command environments with powerful geographic capabilities exist, including Python\index{Python}, Julia, and JavaScript. 
We encourage curious readers to give them a try and to reproduce the examples in this book in other languages, perhaps with reference to the book [Geocomputation with Python](https://py.geocompx.org/), the open access version of which is also hosted on [geocompx.org](https://geocompx.org/). However, R has some advantages that make it well-suited to geocomputation. The 'R-spatial stack' is easy to install, has comprehensive and well-maintained core packages meaning less 'context switching' and more focus on techniques rather than getting the code working. @@ -57,7 +57,7 @@ This may sound simple and easy to achieve (which it is if you carefully maintain ## What is geocomputation? -Geocomputation is the application and development of computational methods for geographic data processing, analysis, modeling and visualization with command-line tools and scripts, focused on performance, reproducibility and modularity. +Geocomputation\index{geocomputation} is the application and development of computational methods for geographic data processing, analysis, modeling and visualization with command-line tools and scripts, focused on performance, reproducibility and modularity. This definition encapsulates many of the key ideas in this book, building on the short history of the word, dating back to the first conference on the subject in 1996 when it entered the lexicon.^[ The first 'GeoComputation' conference took place at the University of Leeds, where one of the authors (Robin) is currently based. In 2017 the GeoComputation conference returned to University of Leeds, providing a chance for us to work on and present the book (see www.geocomputation.org for more on the conference series, and papers/presentations spanning more than two decades). 
@@ -67,7 +67,7 @@ What distinguished geocomputation from the (at the time) commonly used term 'qua In the words of Stan Openshaw, a pioneer in the field who was an advocate (and possibly originator) of the term, "GeoComputation is about using the various different types of geodata and about developing relevant geo-tools within the overall context of a 'scientific' approach" [@openshaw_geocomputation_2000]. Building on this early definition, *Geocomputation with R* goes beyond data analysis and modeling to include the development of new tools and methods for work that is not just interesting academically but beneficial. -Our approach differs from early definitions of geocomputation in one important way, however: in its emphasis on reproducibility and collaboration. +Our approach differs from early definitions of geocomputation in one important way, however: in its emphasis on reproducibility\index{reproducibility} and collaboration. At the turn of the 21^st^ Century, it was unrealistic to expect readers to be able to reproduce code examples, due to barriers preventing access to the necessary hardware, software and data. Fast-forward two decades and things have progressed rapidly. Anyone with access to a laptop with sufficient RAM (at least 8 GB recommended) can install and run software for geocomputation, and reproduce the contents of this book. @@ -89,7 +89,7 @@ Geocomputation is a recent term but is influenced by old ideas. It can be seen as a part of Geography\index{Geography}, which has a 2000+ year history [@talbert_ancient_2014]; and an extension of *Geographic Information Systems* (GIS\index{GIS}) [@neteler_open_2008], which emerged in the 1960s [@coppock_history_1991]. -Geography\index{Geography} has played an important role in explaining and influencing humanity's relationship with the natural world long before the invention of the computer, however. 
+Geography\index{geography} has played an important role in explaining and influencing humanity's relationship with the natural world long before the invention of the computer, however. Alexander von Humboldt's\index{von Humboldt} travels to South America in the early 1800s illustrate this role: not only did the resulting observations lay the foundations for the traditions of physical and plant geography, they also paved the way towards policies to protect the natural world [@wulf_invention_2015]. This book aims to contribute to the 'Geographic Tradition' [@livingstone_geographical_1992] by harnessing the power of modern computers and open source software. @@ -246,8 +246,6 @@ knitr::include_graphics("figures/01-cranlogs.png") It is noteworthy that shifts in the wider R community, as exemplified by the data processing package **dplyr** (released in [2014](https://cran.r-project.org/src/contrib/Archive/dplyr/)), influenced shifts in R's spatial ecosystem. Alongside other packages that have a shared style and emphasis on 'tidy data' (including, e.g., **ggplot2**), **dplyr** was placed in the **tidyverse** 'metapackage'\index{tidyverse (package)} in late [2016](https://cran.r-project.org/src/contrib/Archive/tidyverse/). - - The **tidyverse**\index{tidyverse (package)} approach, with its focus on long-form data and fast, intuitively named functions, has become immensely popular. This has led to a demand for 'tidy geographic data', which has been partly met by **sf**. An obvious feature of the **tidyverse** is the tendency for packages to work in harmony. @@ -307,6 +305,7 @@ interfaces to GDAL\index{GDAL} and PROJ\index{PROJ}, for example, still power R' The initial release supported only raster drivers but subsequent enhancements provided support for coordinate reference systems (via the PROJ library), reprojections and import of vector file formats (see Chapter \@ref(read-write) for more on file formats). 
Many of these additional capabilities were developed by Barry Rowlingson and released in the **rgdal** codebase in 2006 (see @rowlingson_rasp:_2003 and the [R-help](https://stat.ethz.ch/pipermail/r-help/2003-January/028413.html) email list for context). +\index{sp (package)} **sp**, released in 2005, overcame R's inability to distinguish spatial and non-spatial objects [@pebesma_classes_2005]. **sp** grew from a workshop in Vienna in 2003 and was hosted at SourceForge before migrating to R-Forge, and then to GitHub. Prior to 2005, geographic coordinates were generally treated like any other number. @@ -330,6 +329,7 @@ length(revdep_sf) # 739 # 2023-11-16 While **rgdal** and **sp** solved many spatial issues, it was not until **rgeos** was developed during a Google Summer of Code project in 2010 [@R-rgeos] that geometry operations could be undertaken on **sp** objects. Functions such as `gIntersection()` enabled users to find spatial relationships between geographic objects and to modify their geometries (see Chapter \@ref(geometry-operations) for details on geometric operations with **sf**). +\index{raster (package)} A limitation of the **sp** ecosystem was its limited support for raster data. This was overcome by **raster**\index{raster}, first released in 2010 [@R-raster]. **raster**'s class system and functions enabled a range of raster operations, capabilities now implemented in the **terra** package, which supersedes **raster**, as outlined in Section \@ref(raster-data). @@ -351,7 +351,7 @@ Although geographic visualization tended to focus on vector data, raster visuali Since then map making in R has become a hot topic, with dedicated packages such as **tmap**, **leaflet**, **rayshader** and **mapview** gaining popularity, as highlighted in Chapter \@ref(adv-map). Since 2018, when the First Edition of Geocomputation with R was published, the development of geographic R packages has accelerated. 
-\index{terra (package)} +\index{terra (package)}\index{raster (package)} **terra**, a successor of the **raster** package, was first released in 2020 [@R-terra], bringing several benefits to R users working with raster datasets: it is faster and has a more straightforward user interface than its predecessor, as described in Section \@ref(raster-data). In mid-2021, a substantial (and in some cases breaking) change was made to the **sf** package by incorporating spherical geometry calculations. diff --git a/02-spatial-data.Rmd b/02-spatial-data.Rmd index 4037ac309..4ac661772 100644 --- a/02-spatial-data.Rmd +++ b/02-spatial-data.Rmd @@ -14,7 +14,7 @@ We recommend not only reading the prose but also *running the code* in each chap To keep track of your learning journey, it may be worth starting by creating a new folder on your computer to save your R scripts, outputs and other things related to Geocomputation with R as you go. You can also [download](https://github.com/geocompx/geocompr/archive/refs/heads/main.zip) or [clone](https://docs.github.com/en/repositories/creating-and-managing-repositories/cloning-a-repository) the [source code](https://github.com/geocompx/geocompr) underlying the book to support your learning. 
-We strongly recommend installing an integrated development environment (IDE) such as [RStudio](https://posit.co/download/rstudio-desktop/#download) (recommended for most people) or [VS Code](https://github.com/REditorSupport/vscode-R) when writing/running/testing R code.^[ +We strongly recommend installing an integrated development environment (IDE) such as [RStudio](https://posit.co/download/rstudio-desktop/#download)\index{RStudio} (recommended for most people) or [VS Code](https://github.com/REditorSupport/vscode-R)\index{VS Code} when writing/running/testing R code.^[ We recommend using [RStudio projects](https://r4ds.had.co.nz/workflow-projects.html), [VS Code workspaces](https://code.visualstudio.com/docs/editor/workspaces) or similar system to manage your projects. A quick way to do this with RStudio is via the **rstudioapi** package. Open a new project called 'geocompr-learning' in your home directory with the following command from the R console in RStudio, for example: `rstudioapi::openProject("~/geocompr-learning")`. 
@@ -36,8 +36,6 @@ install.packages("spData") install.packages("spDataLarge", repos = "https://nowosad.r-universe.dev") ``` - - ```{r} #| echo: FALSE #| message: FALSE @@ -139,13 +137,12 @@ The **sf** package provides classes for geographic vector data and a consistent - [GDAL](https://gdal.org/)\index{GDAL}, for reading, writing and manipulating a wide range of geographic data formats, covered in Chapter \@ref(read-write) - [PROJ](https://proj.org/), a powerful library for coordinate system transformations, which underlies the content covered in Chapter \@ref(reproj-geo-data) - [GEOS](https://libgeos.org/)\index{GEOS}, a planar geometry engine for operations such as calculating buffers and centroids on data with a projected CRS, covered in Chapter \@ref(geometry-operations) -- [S2](https://s2geometry.io/), a spherical geometry engine written in C++ developed by Google, via the [**s2**](https://r-spatial.github.io/s2/) package, covered in Section \@ref(s2) below and in Chapter \@ref(reproj-geo-data) - +- [S2](https://s2geometry.io/)\index{S2}, a spherical geometry engine written in C++ developed by Google, via the [**s2**](https://r-spatial.github.io/s2/) package, covered in Section \@ref(s2) below and in Chapter \@ref(reproj-geo-data) -Information about these interfaces is printed by **sf** the first time the package is loaded: the message `r print(capture.output(sf:::.onAttach(), type = "message"))` that appears below the `library(sf)` command at the beginning of this chapter tells us the versions of linked GEOS, GDAL and PROJ libraries (these vary between computers and over time) and whether or not the S2 interface is turned on. 
+Information about these interfaces is printed by **sf** the first time the package is loaded: the message `r print(capture.output(sf:::.onAttach(), type = "message"))` that appears below the `library(sf)` command at the beginning of this chapter tells us the versions of linked GEOS, GDAL and PROJ libraries (these vary between computers and over time) and whether or not the S2\index{S2} interface is turned on. Nowadays we take this for granted; however, it is only the tight integration with these geographic libraries that makes reproducible geocomputation possible in the first place. -A neat feature of **sf** is that you can change the default geometry engine used on unprojected data: 'switching off' S2 can be done with the command `sf::sf_use_s2(FALSE)`, meaning that the planar geometry engine GEOS will be used by default for all geometry operations, including geometry operations on unprojected data. +A neat feature of **sf** is that you can change the default geometry engine used on unprojected data: 'switching off' S2\index{S2} can be done with the command `sf::sf_use_s2(FALSE)`, meaning that the planar geometry engine GEOS\index{GEOS} will be used by default for all geometry operations, including geometry operations on unprojected data. As we will see in Section \@ref(s2), planar geometry is based on two-dimensional space. Planar geometry engines such as GEOS assume 'flat' (projected) coordinates while spherical geometry engines such as S2 assume unprojected (lon/lat) coordinates. 
+**sf** provides the same functionality (and more) previously provided in three packages --- **sp**\index{sp (package)} for data classes [@R-sp], **rgdal** for data read/write via an interface to GDAL and PROJ [@R-rgdal] and **rgeos** for spatial operations via an interface to GEOS [@R-rgeos]. To re-iterate the message from Chapter 1, geographic R packages have a long history of interfacing with lower level libraries, and **sf** continues this tradition with a unified interface to recent versions of GEOS for geometry operations, the GDAL library for reading and writing geographic data files, and the PROJ library for representing and transforming projected coordinate reference systems. -Through **s2**, an R interface to Google's spherical geometry library, [`s2`](https://s2geometry.io/), **sf** also has access to fast and accurate "measurements and operations on non-planar geometries" [@bivand_progress_2021]. +Through **s2**\index{S2}, an R interface to Google's spherical geometry library, [`s2`](https://s2geometry.io/), **sf** also has access to fast and accurate "measurements and operations on non-planar geometries" [@bivand_progress_2021]. Since **sf** version 1.0.0, launched in [June 2021](https://cran.r-project.org/src/contrib/Archive/sf/), **s2** functionality has been used by [default](https://r-spatial.org/r/2020/06/17/s2.html) on geometries with geographic (longitude/latitude) coordinate systems, a unique feature of **sf** that differs from spatial libraries that only support GEOS for geometry operations such as the Python package [GeoPandas](https://github.com/geopandas/geopandas/issues/2098). We will discuss **s2** in subsequent chapters. @@ -197,7 +194,7 @@ vignette("sf6") # miscellaneous long-form documentation vignette("sf7") # spherical geometry operations ``` -As the first vignette explains, simple feature objects in R are stored in a data frame, with geographic data occupying a special column, usually named 'geom' or 'geometry'. 
+As the first vignette explains, simple feature objects in R are stored in a data frame, with geographic data occupying a special column, usually named 'geom' or 'geometry'\index{geometry}. We will use the `world` dataset provided by **spData** [@R-spData], loaded at the beginning of this chapter, to show what `sf` objects are and how they work. `world` is an '`sf` data frame' containing spatial and attribute columns, the names of which are returned by the function `names()` (the last column in this example contains the geographic information). @@ -358,7 +355,8 @@ source("https://github.com/geocompx/geocompr/raw/main/code/02-contpop.R") The code above uses the function `st_centroid()` to convert one geometry type (polygons) to another (points) (see Chapter \@ref(geometry-operations)), the aesthetics of which are varied with the `cex` argument. \index{bounding box} -**sf**'s plot method also has arguments specific to geographic data. `expandBB`, for example, can be used to plot an `sf` object in context: +**sf**'s plot method also has arguments specific to geographic data. +`expandBB`, for example, can be used to plot an `sf` object in context: it takes a numeric vector of length four that expands the bounding box of the plot relative to zero in the following order: bottom, left, top, right. This is used to plot India in the context of its giant Asian neighbors, with an emphasis on China to the east, in the following code chunk, which generates Figure \@ref(fig:china) (see exercises below on adding text to plots):^[ Note the use of `st_geometry(india)` to return only the geometry associated with the object to prevent attributes being plotted in a simple feature column (`sfc`) object. 
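The `expandBB` behavior described above can be sketched as follows (a minimal illustration assuming the `world` dataset from **spData** is attached; the expansion values are arbitrary):

```r
# Sketch of expandBB: plot India with extra space around its bounding box,
# expanded in the order bottom, left, top, right (values are illustrative)
library(sf)
library(spData)
india = world[world$name_long == "India", ]
plot(st_geometry(india), expandBB = c(0, 0.2, 0.1, 1),
     col = "gray", lwd = 3)
# the right-hand expansion (1) is the largest, leaving room for China;
# st_geometry() returns only the geometry, so attributes are not plotted
plot(st_geometry(world[world$continent == "Asia", ]), add = TRUE)
```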
@@ -402,6 +400,7 @@ Generally, well-known binary (WKB) or well-known text (WKT) are the standard enc \index{well-known text} \index{WKT|see {well-known text}} \index{well-known binary} +\index{WKB|see {well-known binary}} WKB representations are usually hexadecimal strings easily readable for computers. This is why GIS and spatial databases use WKB to transfer and store geometry objects. WKT, on the other hand, is a human-readable text markup description of simple features. @@ -801,7 +800,8 @@ Benchmarks, in the package's [documentation](https://dcooley.github.io/sfheaders ### Spherical geometry operations with S2 {#s2} Spherical geometry engines are based on the fact that the world is round while simple mathematical procedures for geocomputation, such as calculating a straight line between two points or the area enclosed by a polygon, assume planar (projected) geometries. -Since **sf** version 1.0.0, R supports spherical geometry operations 'out of the box' (and by default), thanks to its interface to Google's S2 spherical geometry engine via the **s2** interface package. +Since **sf** version 1.0.0, R supports spherical geometry operations 'out of the box' (and by default), thanks to its interface to Google's S2 spherical geometry engine via the **s2** interface package\index{S2}. S2 is perhaps best known as an example of a Discrete Global Grid System (DGGS). Another example is the [H3](https://h3geo.org/) global hexagonal hierarchical spatial index [@bondaruk_assessing_2020]. @@ -860,7 +860,7 @@ To turn off S2 for the entirety of a project you can create a file called .Rprofile ## Raster data -The spatial raster data model represents the world with the continuous grid of cells (often also called pixels; Figure \@ref(fig:raster-intro-plot):A). +The spatial raster data model represents the world with a continuous grid of cells (often also called pixels; Figure \@ref(fig:raster-intro-plot):A)\index{raster data model}. 
This data model often refers to so-called regular grids, in which each cell has the same, constant size -- and we will focus only on regular grids in this book. However, several other types of grids exist, including rotated, sheared, rectilinear, and curvilinear grids (see Chapter 1 of @pebesma_spatial_2022 or Chapter 2 of @tennekes_elegant_2022). @@ -904,6 +904,7 @@ source("code/02-raster-intro-plot2.R", print.eval = TRUE) ### R packages for working with raster data Over the last two decades, several packages for reading and processing raster datasets have been developed. +\index{raster (package)}\index{terra (package)}\index{stars (package)} As outlined in Section \@ref(the-history-of-r-spatial), chief among them was **raster**, which led to a step change in R's raster capabilities when it was launched in 2010 and was the premier package in the space until the development of **terra** and **stars**. Both more recently developed packages provide powerful and performant functions for working with raster datasets and there is substantial overlap between their possible use cases. In this book we focus on **terra**, which replaces the older and (in most cases) slower **raster**. @@ -926,17 +927,13 @@ We also encourage you to read @pebesma_spatial_2022 for the most comprehensive i ### An introduction to terra +\index{terra (package)} The **terra** package supports raster objects in R. It provides an extensive set of functions to create, read, export, manipulate and process raster datasets. **terra**'s functionality is largely the same as the more mature **raster** package, but there are some differences: **terra** functions are usually more computationally efficient than **raster** equivalents. On the other hand, the **raster** class system is popular and used by many other packages. 
You can seamlessly translate between the two types of object to ensure backwards compatibility with older scripts and packages, for example, with the functions [`raster()`](https://rspatial.github.io/raster/reference/raster.html), [`stack()`](https://rspatial.github.io/raster/reference/stack.html), and `brick()` in the **raster** package (see the previous chapter for more on the evolution of R packages for working with geographic data). -```{r, echo=FALSE, eval=FALSE} -# # test raster/terra conversions -# See https://github.com/rspatial/terra/issues/399 -``` - In addition to functions for raster data manipulation, **terra** provides many low-level functions that can form a foundation for developing new tools for working with raster datasets. \index{terra (package)|see {terra}} **terra** also lets you work with raster datasets that are too large to fit into main memory. @@ -983,6 +980,7 @@ There are several other approaches for plotting raster data in R that are outsid ### Raster classes {#raster-classes} +\index{terra (package)} The `SpatRaster` class represents raster objects in **terra**. The easiest way to create a raster object in R is to read in a raster file from disk or from a server (Section \@ref(raster-data-read)). \index{raster!class} diff --git a/04-spatial-operations.Rmd b/04-spatial-operations.Rmd index 7f565ad6a..2d396c3ba 100644 --- a/04-spatial-operations.Rmd +++ b/04-spatial-operations.Rmd @@ -85,7 +85,7 @@ Like attribute subsetting, the command `x[y, ]` (equivalent to `nz_height[canterbury, ]`, the subsetting operation used in the previous code chunk) subsets features of a target `x` using the contents of a `y` object. Instead of `y` being a vector of class `logical` or `integer`, however, for spatial subsetting both `x` and `y` must be geographic objects. 
Specifically, objects used for spatial subsetting in this way must have the class `sf` or `sfc`: both `nz` and `nz_height` are geographic vector data frames and have the class `sf`, and the result of the operation returns another `sf` object representing the features in the target `nz_height` object that intersect with (in this case high points that are located within) the `canterbury` region. -Various *topological relations* can be used for spatial subsetting which determine the type of spatial relationship that features in the target object must have with the subsetting object to be selected. +Various *topological relations*\index{topological relations} can be used for spatial subsetting which determine the type of spatial relationship that features in the target object must have with the subsetting object to be selected. These include *touches*, *crosses* or *within*, as we will see shortly in Section \@ref(topological-relations). The default setting `st_intersects` is a 'catch all' topological relation that will return features in the target that *touch*, *cross* or are *within* the source 'subsetting' object. Alternative spatial operators can be specified with the `op =` argument, as demonstrated in the following command which returns the opposite of `st_intersects()`, points that do not intersect with Canterbury (see Section \@ref(topological-relations)). @@ -116,7 +116,7 @@ canterbury_height2 = nz_height[sel_logical, ] ``` The above code chunk creates an object of class `sgbp` (a sparse geometry binary predicate, a list of length `x` in the spatial operation) and then converts it into a logical vector `sel_logical` (containing only `TRUE` and `FALSE` values, something that can also be used by **dplyr**'s filter function). -\index{binary predicate|seealso {topological relations}} +\index{binary predicate|see also {topological relations}} The function `lengths()` identifies which features in `nz_height` intersect with *any* objects in `y`. 
In this case 1 is the greatest possible value but for more complex operations one could use the method to subset only features that intersect with, for example, 2 or more features from the source object. @@ -152,7 +152,7 @@ The next section explores different types of spatial relation, also known as bin ### Topological relations -Topological relations describe the spatial relationships between objects. +Topological relations\index{topological relations} describe the spatial relationships between objects. "Binary topological relationships", to give them their full name, are logical statements (in that the answer can only be `TRUE` or `FALSE`) about the spatial relationships between two objects defined by ordered sets of points (typically forming points, lines and polygons) in two or more dimensions [@egenhofer_mathematical_1990]. That may sound rather abstract and, indeed, the definition and classification of topological relations is based on mathematical foundations first published in book form in 1966 [@spanier_algebraic_1995], with the field of algebraic topology continuing into the 21^st^ century [@dieck_algebraic_2008]. @@ -242,7 +242,7 @@ st_intersects(point_sf, polygon_sfc, sparse = FALSE) In the above output each row represents a feature in the target (argument `x`) object and each column represents a feature in the selecting object (`y`). In this case, there is only one feature in the `y` object `polygon_sfc` so the result, which can be used for subsetting as we saw in Section \@ref(spatial-subsetting), has only one column. -`st_intersects()` returns `TRUE` even in cases where the features just touch: *intersects* is a 'catch-all' topological operation which identifies many types of spatial relation, as illustrated in Figure \@ref(fig:relations). 
+`st_intersects()` returns `TRUE` even in cases where the features just touch: *intersects*\index{intersects} is a 'catch-all' topological operation which identifies many types of spatial relation, as illustrated in Figure \@ref(fig:relations). More restrictive questions include: which points lie within the polygon, and which features are on or contain a shared boundary with `y`? These can be answered as follows (results not shown): @@ -321,7 +321,7 @@ plot(p, add = TRUE) ### Distance relations -While the topological relations presented in the previous section are binary (a feature either intersects with another or does not) distance relations are continuous. +While the topological relations presented in the previous section are binary (a feature either intersects with another or does not), distance relations are continuous\index{distance relations}. The distance between two `sf` objects is calculated with `st_distance()`, which is also used behind the scenes in Section \@ref(non-overlapping-joins) for distance-based joins. This is illustrated in the code chunk below, which finds the distance between the highest point in New Zealand and the geographic centroid of the Canterbury region, created in Section \@ref(spatial-subsetting): \index{sf!distance relations} @@ -356,7 +356,7 @@ plot(st_geometry(nz_height)[2:3], add = TRUE) ### DE-9IM strings {#DE-9IM-strings} -Underlying the binary predicates demonstrated in the previous section is the Dimensionally Extended 9-Intersection Model (DE-9IM). +Underlying the binary predicates demonstrated in the previous section is the Dimensionally Extended 9-Intersection Model (DE-9IM)\index{topological relations!DE-9IM}. As the cryptic name suggests, this is not an easy topic to understand, but it is worth knowing about because it underlies many spatial operations and enables the creation of custom spatial predicates. 
The model was originally labelled "DE + 9IM" by its inventors, referring to the "dimension of the intersections of boundaries, interiors, and exteriors of two features" [@clementini_comparison_1995], but is now referred to as DE-9IM [@shen_classification_2018]. DE-9IM is applicable to 2-dimensional objects (points, lines and polygons) in Euclidean space, meaning that the model (and software implementing it such as GEOS) assumes you are working with data in a projected coordinate reference system, described in Chapter \@ref(reproj-geo-data). diff --git a/05-geometry-operations.Rmd b/05-geometry-operations.Rmd index 949d768b7..96b6f49d2 100644 --- a/05-geometry-operations.Rmd +++ b/05-geometry-operations.Rmd @@ -204,9 +204,9 @@ tmap_arrange(p_buffs1, p_buffs2, ncol = 2) The `st_buffer()` function has a few additional arguments. The most important ones are: -- `nQuadSegs` (when the GEOS engine is used), which means 'number of segments per quadrant' and is set by default to 30 (meaning circles created by buffers are composed of $4 \times 30 = 120$ lines). +- `nQuadSegs` (when the GEOS\index{GEOS} engine is used), which means 'number of segments per quadrant' and is set by default to 30 (meaning circles created by buffers are composed of $4 \times 30 = 120$ lines). 
It may be useful to change this default in unusual cases, for example when the memory consumed by the output of a buffer operation is a major concern (in which case it should be reduced) or when very high precision is needed (in which case it should be increased) -- `max_cells` (when the S2\index{S2} engine is used), the larger the value, the smoother the buffer will be, but the calculations will take longer - `endCapStyle` and `joinStyle` (when the GEOS engine is used), which control the appearance of the buffer's edges - `singleSide` (when the GEOS engine is used), which controls whether the buffer is created on one or both sides of the input geometry ``` diff --git a/06-raster-vector.Rmd b/06-raster-vector.Rmd index ebfad2453..a9f4ebced 100644 --- a/06-raster-vector.Rmd +++ b/06-raster-vector.Rmd @@ -460,7 +460,6 @@ Contours can also be added to existing plots with functions such as `contour()`, As illustrated in Figure \@ref(fig:contour-tmap), isolines can be labelled. \index{hillshade} - ```{r contour-tmap, echo=FALSE, message=FALSE, fig.cap="DEM with hillshading, showing the southern flank of Mt. Mongón overlaid with contour lines.", fig.scap="DEM with hillshading.", warning=FALSE, fig.asp=0.56, fig.width=3.5} # hs = shade(slope = terrain(dem, "slope", unit = "radians"), # aspect = terrain(dem, "aspect", unit = "radians")) diff --git a/07-reproj.Rmd b/07-reproj.Rmd index e6581e5ab..5344cb526 100644 --- a/07-reproj.Rmd +++ b/07-reproj.Rmd @@ -207,7 +207,7 @@ st_is_longlat(london_geo) ## Geometry operations on projected and unprojected data {#geom-proj} Since **sf** version 1.0.0, R's ability to work with geographic vector datasets that have lon/lat CRSs has improved substantially, thanks to its integration with the S2 *spherical geometry engine* introduced in Section \@ref(s2). 
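The GEOS/S2 switch described in this section can be sketched as follows (a minimal illustration assuming only the **sf** package; the point coordinates approximate London):

```r
# Sketch of the engine switch: with a lon/lat CRS, S2 is used by default
# and buffer distances are interpreted in meters
library(sf)
london = st_sfc(st_point(c(-0.1, 51.5)), crs = "EPSG:4326")
buff_s2 = st_buffer(london, dist = 100000)  # 100 km, spherical engine

# Disabling S2 falls back to planar GEOS, which treats lon/lat coordinates
# as if they were projected -- distances become 'degrees', a poor proxy
sf_use_s2(FALSE)
buff_geos = st_buffer(london, dist = 1)     # 1 'degree'
sf_use_s2(TRUE)                             # restore the default
```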
-As shown in Figure \@ref(fig:s2geos), **sf** uses either GEOS or the S2 depending on the type of CRS and whether S2 has been disabled (it is enabled by default).
+As shown in Figure \@ref(fig:s2geos), **sf** uses either GEOS\index{GEOS} or S2\index{S2}, depending on the type of CRS and whether S2 has been disabled (it is enabled by default).
 GEOS is always used for projected data and data with no CRS; for geographic data S2 is used by default but can be disabled with `sf::sf_use_s2(FALSE)`.

 ```{r s2geos, fig.cap="The behavior of the geometry operations in the sf package depending on the input data's CRS.", echo=FALSE}
@@ -268,7 +268,7 @@ london_buff_s2_100_cells = st_buffer(london_geo, dist = 100000, max_cells = 100)
 In the first line above, **sf** assumes that the input is projected and generates a result that has a buffer in units of degrees, which is problematic, as we will see.
 In the second line, **sf** silently uses the spherical geometry engine S2, introduced in Chapter \@ref(spatial-class), to calculate the extent of the buffer using the default value of `max_cells = 1000` --- set to `100` in line three --- the consequences of which will become apparent shortly.
-To highlight the impact of **sf**'s use of the S2 geometry engine for unprojected (geographic) coordinate systems, we will temporarily disable it with the command `sf_use_s2()` (which is on, `TRUE`, by default), in the code chunk below.
+To highlight the impact of **sf**'s use of the S2\index{S2} geometry engine for unprojected (geographic) coordinate systems, we will temporarily disable it with the command `sf_use_s2()` (which is on, `TRUE`, by default), in the code chunk below.
Like `london_buff_no_crs`, the new `london_geo` object is a geographic abomination: it has units of degrees, which makes no sense in the vast majority of cases: ```{r 06-reproj-4-2} @@ -281,7 +281,7 @@ The warning message above hints at issues with performing planar geometry operat When spherical geometry operations are turned off, with the command `sf::sf_use_s2(FALSE)`, buffers (and other geometric operations) may result in worthless outputs because they use units of latitude and longitude, a poor substitute for proper units of distances such as meters. ```{block2 06-reproj-5, type="rmdnote"} -The distance between two lines of longitude, called meridians, is around 111 km at the equator (execute `geosphere::distGeo(c(0, 0), c(1, 0))` to find the precise distance). +The distance between two lines of longitude, called meridians\index{meridians}, is around 111 km at the equator (execute `geosphere::distGeo(c(0, 0), c(1, 0))` to find the precise distance). This shrinks to zero at the poles. At the latitude of London, for example, meridians are less than 70 km apart (challenge: execute code that verifies this). @@ -375,13 +375,10 @@ tmap_arrange(tm1, tm2, tm3, nrow = 1) It is clear from Figure \@ref(fig:crs-buf) that buffers based on `s2` and properly projected CRSs are not 'squashed', meaning that every part of the buffer boundary is equidistant to London. The results that are generated from lon/lat CRSs when `s2` is *not* used, either because the input lacks a CRS or because `sf_use_s2()` is turned off, are heavily distorted, with the result elongated in the north-south axis, highlighting the dangers of using algorithms that assume projected data on lon/lat inputs (as GEOS does). -The results generated using S2 are also distorted, however, although less dramatically. +The results generated using S2\index{S2} are also distorted, however, although less dramatically. 
Both buffer boundaries in Figure \@ref(fig:crs-buf) (left) are jagged, although this may only be apparent or relevant for the thick boundary representing a buffer created with the `s2` argument `max_cells` set to 100.
-
-
-
 The lesson is that results obtained from lon/lat data via S2 will be different from results obtained from using projected data.
-The difference between S2 derived buffers and GEOS derived buffers on projected data reduce as the value of `max_cells` increases: the 'right' value for this argument may depend on many factors and the default value 1000 is often a reasonable default.
+The difference between S2\index{S2} derived buffers and GEOS\index{GEOS} derived buffers on projected data reduces as the value of `max_cells` increases: the 'right' value for this argument may depend on many factors, and the default value of 1000 is often reasonable.
 When choosing `max_cells` values, speed of computation should be balanced against resolution of results.
 In situations where smooth curved boundaries are advantageous, transforming to a projected CRS before buffering (or performing other geometry operations) may be appropriate.
@@ -424,13 +421,13 @@ Additionally, you should not be attached just to one projection for every task.
 It is possible to use one projection for some part of the analysis, another projection for a different part, and yet another for visualization.
 Always try to pick the CRS that serves your goal best!
-When selecting **geographic CRSs**, the answer is often [WGS84](https://en.wikipedia.org/wiki/World_Geodetic_System#A_new_World_Geodetic_System:_WGS_84).
+When selecting **geographic CRSs**\index{geographic CRS}, the answer is often [WGS84](https://en.wikipedia.org/wiki/World_Geodetic_System#A_new_World_Geodetic_System:_WGS_84).
 It is used not only for web mapping, but also because GPS datasets and thousands of raster and vector datasets are provided in this CRS by default.
WGS84 is the most common CRS in the world, so it is worth knowing its EPSG code: 4326.^[ Instead of `"EPSG:4326"`, you may also use `"OGC:CRS84"`. The former assumes that latitude is always ordered before longitude, while the latter is the standard representation used by GeoJSON, with coordinates ordered longitude before latitude.] This 'magic number' can be used to convert objects with unusual projected CRSs into something that is widely understood. -What about when a **projected CRS** is required? +What about when a **projected CRS**\index{projected CRS} is required? In some cases, it is not something that we are free to decide: "often the choice of projection is made by a public mapping agency" [@bivand_applied_2013]. This means that when working with local data sources, it is likely preferable to work with the CRS in which the data was provided, to ensure compatibility, even if the official CRS is not the most accurate. @@ -507,7 +504,6 @@ We will cover the particularities of vector data transformation in Section \@ref Next, Section \@ref(mapproj), shows how to create custom map projections. ## Reprojecting vector geometries {#reproj-vec-geom} - \index{CRS!reprojection} \index{vector!reprojection} @@ -613,6 +609,7 @@ Reprojection of the regular rasters is also known as warping. Additionally, there is a second similar operation called "transformation". Instead of resampling all of the values, it leaves all values intact but recomputes new coordinates for every raster cell, changing the grid geometry. For example, it could convert the input raster (a regular grid) into a curvilinear grid. +\index{stars (package)} The transformation operation can be performed in R using [the **stars** package](https://r-spatial.github.io/stars/articles/stars5.html). 
```
@@ -740,6 +737,7 @@ For instance, if we are interested in a density (points per grid cell or inhabit
 ## Custom map projections {#mapproj}
+\index{CRS!custom projections}
 Established CRSs captured by `AUTHORITY:CODE` identifiers such as `EPSG:4326` are well suited for many applications.
 However, it is desirable to use alternative projections or to create custom CRSs in some cases.
 Section \@ref(which-crs) mentioned reasons for using custom CRSs, and provided several possible approaches.
diff --git a/08-read-write-plot.Rmd b/08-read-write-plot.Rmd
index 537e9a8a5..6d47160f2 100644
--- a/08-read-write-plot.Rmd
+++ b/08-read-write-plot.Rmd
@@ -299,6 +299,7 @@ snow = rast(myurl)
 snow
 ```
+\index{COG}
 Because the input data is a COG, we are not reading this file into RAM, but rather creating a connection to it without obtaining any values.
 Its values will be read if we apply any value-based operation (e.g., `crop()` or `extract()`).
 This also allows us to read just a tiny portion of the data without downloading the entire file.
@@ -426,6 +427,7 @@ writeRaster(x = single_layer, filename = "my_raster.tif",
             gdal = c("COMPRESS=NONE"), overwrite = TRUE)
 ```
+\index{COG}
 Additionally, we can save our raster object as a COG (*Cloud Optimized GeoTIFF*, Section \@ref(file-formats)) with the `filetype = "COG"` option.
```{r 07-read-write-plot-35b, eval=FALSE}
@@ -582,7 +584,7 @@ Geographic data can also be imported into R from various 'bridges' to geographic
 ## Geographic metadata
-Geographic metadata are a cornerstone of geographic information management, used to describe datasets, data structures and services.
+Geographic metadata\index{geographic metadata} are a cornerstone of geographic information management, used to describe datasets, data structures and services.
 They help make datasets FAIR (Findable, Accessible, Interoperable, Reusable) and are defined by the ISO/OGC standards, in particular the ISO 19115 standard and underlying schemas.
These standards are widely used within spatial data infrastructures, handled through metadata catalogs. diff --git a/09-mapping.Rmd b/09-mapping.Rmd index 305bee83f..f4eb36cdb 100644 --- a/09-mapping.Rmd +++ b/09-mapping.Rmd @@ -44,7 +44,7 @@ nz_elev = rast(system.file("raster/nz_elev.tif", package = "spDataLarge")) ## Introduction A satisfying and important aspect of geographic research is communicating the results. -Map making --- the art of cartography --- is an ancient skill involving communication, attention to detail, and an element of creativity. +Map making\index{map making} --- the art of cartography --- is an ancient skill involving communication, attention to detail, and an element of creativity. Static mapping in R is straightforward with the `plot()` function, as we saw in Section \@ref(basic-map). It is possible to create advanced maps using base R methods [@murrell_r_2016]. The focus of this chapter, however, is cartography with dedicated map-making packages. @@ -58,7 +58,7 @@ Furthermore, poor map making can hinder the communication of results [@brewer_de > Amateur-looking maps can undermine your audience’s ability to understand important information and weaken the presentation of a professional data investigation. Maps have been used for several thousand years for a wide variety of purposes. -Historic examples include maps of buildings and land ownership in the Old Babylonian dynasty more than 3000 years ago and Ptolemy's world map in his masterpiece *Geography* nearly 2000 years ago [@talbert_ancient_2014]. +Historic examples include maps of buildings and land ownership in the Old Babylonian dynasty more than 3000 years ago and Ptolemy's world map in his masterpiece *Geography*\index{geography} nearly 2000 years ago [@talbert_ancient_2014]. Map making has historically been an activity undertaken only by, or on behalf of, the elite. 
This has changed with the emergence of open source mapping software such as the R package **tmap** and the 'print layout' in QGIS\index{QGIS}, which enable anyone to make high-quality maps, supporting 'citizen science'.
@@ -119,6 +119,7 @@ source("https://github.com/geocompx/geocompr/raw/main/code/09-tmshape.R", print.
 The object passed to `tm_shape()` in this case is `nz`, an `sf` object representing the regions of New Zealand (see Section \@ref(intro-sf) for more on `sf` objects).
 Layers are added to represent `nz` visually, with `tm_fill()` and `tm_borders()` creating shaded areas (left panel) and border outlines (middle panel) in Figure \@ref(fig:tmshape), respectively.
+\index{map making!layers}
 This is an intuitive approach to map making: the common task of *adding* new layers is undertaken by the addition operator `+`, followed by `tm_*()`.
 The asterisk (\*) refers to a wide range of layer types which have self-explanatory names including:
@@ -183,6 +184,7 @@ map_nz3 = map_nz2 +
   tm_shape(nz_height) + tm_symbols()
 ```
+\index{map making!metaplot}
 A useful and little-known feature of **tmap** is that multiple map objects can be arranged in a single 'metaplot' with `tmap_arrange()`.
 This is demonstrated in the code chunk below which plots `map_nz1` to `map_nz3`, resulting in Figure \@ref(fig:tmlayers).
@@ -195,8 +197,8 @@ Aesthetic settings, however, are controlled by arguments to layer functions.
 ### Visual variables
-\index{tmap (package)!aesthetics}
-\index{tmap (package)!visual variables}
+\index{map making!aesthetics}
+\index{map making!visual variables}
 The plots in the previous section demonstrate **tmap**'s default aesthetic settings.
 Gray shades are used for `tm_fill()` and `tm_symbols()` layers and a continuous black line is used to represent lines created with `tm_lines()`.
 Of course, these default values and other aesthetics can be overridden.
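The layered grammar described above can be sketched in a few lines. This is a minimal illustration, not the chapter's own (unshown) figure code, and assumes the **tmap** and **spData** packages are installed:

```r
library(tmap)   # map-making package used throughout this chapter
library(spData) # provides nz, the New Zealand regions used above

# every map starts with tm_shape(), which defines the input data,
# followed by one or more tm_*() layer functions added with +
map_fill    = tm_shape(nz) + tm_fill()                 # shaded areas
map_borders = tm_shape(nz) + tm_borders()              # outlines only
map_both    = tm_shape(nz) + tm_fill() + tm_borders()  # both layers

# multiple map objects can be combined into a single 'metaplot'
tmap_arrange(map_fill, map_borders, map_both, ncol = 3)
```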
@@ -361,6 +363,7 @@ All of the default `values` of the visual variables, such as default color palet For example run `tmap_options()$values.var`. ``` +\index{color palettes} There are three main groups of color palettes\index{map making!color palettes}: categorical, sequential and diverging (Figure \@ref(fig:colpal)), and each of them serves a different purpose.^[ A fourth group of color palettes, called bivariate, also exists. They are used when we want to represent relations between two variables on one map. diff --git a/10-gis.Rmd b/10-gis.Rmd index 001fe743c..067dbe59f 100644 --- a/10-gis.Rmd +++ b/10-gis.Rmd @@ -62,7 +62,7 @@ However, interactive command line interfaces have several important advantages i - Developing future-proof and efficient programming skills which are in high demand - Improving touch typing, a key skill in the digital age -On the other hand, good GUIs also have advantages, including: +On the other hand, good GUIs\index{graphical user interface} also have advantages, including: - 'Shallow' learning curves meaning geographic data can be explored and visualized without hours of learning a new language - Support for 'digitizing' (creating new vector datasets), including trace, snap and topological tools^[The **mapedit** R package allows the quick editing of a few spatial features in a browser window opened from R but not professional, large-scale cartographic digitizing.] 
@@ -92,7 +92,6 @@ This chapter focuses on 'bridges' to three mature open source GIS products, summ - SAGA\index{SAGA}, via **Rsagacmd**\index{Rsagacmd (package)} (Section \@ref(saga)) - GRASS GIS\index{GRASS GIS}, via **rgrass**\index{rgrass (package)} (Section \@ref(grass)) - There have also been major developments in enabling open source GIS software to write and execute R scripts inside QGIS\index{QGIS} (see [docs.qgis.org](https://docs.qgis.org/3.28/en/docs/training_manual/processing/r_intro.html)) and GRASS GIS (see [grasswiki.osgeo.org](https://grasswiki.osgeo.org/wiki/R_statistics/rgrass#R_within_GRASS)). ```{r gis-comp, echo=FALSE, message=FALSE} @@ -384,7 +383,7 @@ dem = system.file("raster/dem.tif", package = "spDataLarge") The **terra** package's `terrain()` command already allows the calculation of several fundamental topographic characteristics such as slope, aspect, TPI (*Topographic Position Index*), TRI (*Topographic Ruggedness Index*), roughness, and flow directions. However, GIS programs offer many more terrain characteristics, some of which can be more suitable in certain contexts. -For example, the topographic wetness index (TWI) was found useful in studying hydrological and biological processes [@sorensen_calculation_2006]. +For example, the topographic wetness index (TWI)\index{topographic wetness index} was found useful in studying hydrological and biological processes [@sorensen_calculation_2006]. Let's search the algorithm list for this index using `"wetness"` as keyword. ```{r, eval=has_qgis_plugins} @@ -479,7 +478,7 @@ The System for Automated Geoscientific Analyses (SAGA\index{SAGA}; Table \@ref(t In addition, there is a Python interface (SAGA Python API\index{API}). **Rsagacmd**\index{Rsagacmd (package)} uses the former to run SAGA\index{SAGA} from within R. 
-We will use **Rsagacmd** in this section to delineate areas with similar values of the normalized difference vegetation index (NDVI) of the Mongón study area in Peru from the 22nd of September 2000 (Figure \@ref(fig:sagasegments), left panel) by using a seeded region growing algorithm from SAGA.^[Read Section \@ref(local-operations) on details of how to calculate NDVI from a remote sensing image.]
+We will use **Rsagacmd** in this section to delineate areas with similar values of the normalized difference vegetation index (NDVI) of the Mongón study area in Peru from the 22nd of September 2000 (Figure \@ref(fig:sagasegments), left panel) by using a seeded region growing algorithm from SAGA\index{segmentation}.^[Read Section \@ref(local-operations) for details on how to calculate NDVI from a remote sensing image.]

```{r, eval=FALSE}
ndvi = rast(system.file("raster/ndvi.tif", package = "spDataLarge"))
@@ -555,7 +554,7 @@ They can also be further aggregated into larger polygons using various technique
 You can try to do this in the Exercises.
 R also has other tools to achieve the goal of creating polygons with similar values (so-called segments).
-It includes the **SegOptim** package [@goncalves_segoptim_2019] that allows running several image segmentation algorithms and **supercells** [@nowosad_extended_2022] that implements superpixels algorithm SLIC to work with geospatial data.
+These include the **SegOptim** package [@goncalves_segoptim_2019], which allows running several image segmentation algorithms, and **supercells** [@nowosad_extended_2022], which implements the SLIC superpixels\index{segmentation!superpixels} algorithm to work with geospatial data.
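As a point of comparison with the SAGA workflow above, a pure-R segmentation with **supercells** might look as follows. This is a hedged sketch: it assumes **supercells**, **terra**, **sf** and **spDataLarge** are installed, and the values of `k` and `compactness` are illustrative, not recommendations:

```r
library(terra)
library(supercells)

# the same NDVI raster of the Mongón study area used in this section
ndvi = rast(system.file("raster/ndvi.tif", package = "spDataLarge"))

# SLIC superpixels: k sets the approximate number of segments;
# compactness trades spatial regularity against value homogeneity
ndvi_sp = supercells(ndvi, k = 500, compactness = 1)

# overlay the segment boundaries (an sf object) on the input raster
plot(ndvi)
plot(sf::st_geometry(ndvi_sp), add = TRUE, border = "red")
```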
## GRASS GIS {#grass}
diff --git a/11-algorithms.Rmd b/11-algorithms.Rmd
index a7003698d..fefb7d605 100644
--- a/11-algorithms.Rmd
+++ b/11-algorithms.Rmd
@@ -42,7 +42,7 @@ The example also reflects a secondary aim of the chapter which, following @xiao_
 ## Scripts
 If functions distributed in packages are the building blocks of R code, scripts are the glue that holds them together.
-Scripts should be stored and executed in a logical order to create reproducible workflows, manually or with workflow automation tools such as **targets** [@landau_targets_2021].
+Scripts should be stored and executed in a logical order to create reproducible workflows\index{reproducibility}, manually or with workflow automation tools such as **targets** [@landau_targets_2021].
 If you are new to programming, scripts may seem intimidating when you first encounter them, but they are simply plain text files.
 Scripts are usually saved as a file with an extension representing the language they contain, such as `.py` for scripts written in Python or `.rs` for scripts written in Rust.
 R scripts should be saved with a `.R` extension and named to reflect what they do.
@@ -199,7 +199,6 @@ C1 = (T1[1,] + T1[2,] + T1[3,]) / 3
 C1_alternative = (T1[1, , drop = FALSE] + T1[2, , drop = FALSE] + T1[3, , drop = FALSE]) / 3
 ```
-
 ```{r polymat, echo=FALSE, fig.cap="Illustration of polygon centroid calculation problem.", fig.height="100", warning=FALSE}
 # initial plot: can probably delete this:
 old_par = par(pty = "s")
diff --git a/12-spatial-cv.Rmd b/12-spatial-cv.Rmd
index 901b7c4c6..a5501253c 100644
--- a/12-spatial-cv.Rmd
+++ b/12-spatial-cv.Rmd
@@ -286,8 +286,8 @@ Third, the **resampling** approach assesses the predictive performance of the mo
 ### Generalized linear model {#glm}
 To use a GLM\index{GLM} in **mlr3**\index{mlr3 (package)}, we must create a **task** containing the landslide data.
-Since the response is binary (two-category variable) and has a spatial dimension, we create a classification\index{classification} task with `TaskClassifST$new()` of the **mlr3spatiotempcv** package [@schratz_mlr3spatiotempcv_2021, for non-spatial tasks, use `mlr3::TaskClassif$new()` or `mlr3::TaskRegr$new()` for regression\index{regression} tasks, see `?Task` for other task types].^[The **mlr3** ecosystem makes heavily use of **data.table** and **R6** classes. And though you might use **mlr3** without knowing the specifics of **data.table** or **R6**, it might be rather helpful. To learn more about **data.table**, please refer to https://rdatatable.gitlab.io/data.table/. To learn more about **R6**, we recommend [Chapter 14](https://adv-r.hadley.nz/fp.html) of the Advanced R book [@wickham_advanced_2019].]
-The first essential argument of these `Task*$new()` functions is `backend`.
+Since the response is binary (a two-category variable) and has a spatial dimension, we create a classification\index{classification} task with `as_task_classif_st()` of the **mlr3spatiotempcv** package [@schratz_mlr3spatiotempcv_2021, for non-spatial tasks, use `mlr3::as_task_classif()` or `mlr3::as_task_regr()` for regression\index{regression} tasks, see `?Task` for other task types].^[The **mlr3** ecosystem makes heavy use of **data.table** and **R6** classes. Though you can use **mlr3** without knowing the specifics of **data.table** or **R6**, learning them can be rather helpful. To learn more about **data.table**, please refer to https://rdatatable.gitlab.io/data.table/. To learn more about **R6**, we recommend [Chapter 14](https://adv-r.hadley.nz/fp.html) of the Advanced R book [@wickham_advanced_2019].]
+The first essential argument of these `as_task_` functions is `backend`.
 `backend` expects that the input data includes the response and predictor variables.
The `target` argument indicates the name of a response variable (in our case this is `lslpts`) and `positive` determines which of the two factor levels of the response variable indicates the landslide initiation point (in our case this is `TRUE`).
All other variables of the `lsl` dataset will serve as predictors.
@@ -297,7 +297,7 @@ Additionally, we should decide if we want to use the coordinates as predictors i
 ```{r 12-spatial-cv-11, eval=FALSE}
 # 1. create task
-task = mlr3spatiotempcv::TaskClassifST$new(
+task = mlr3spatiotempcv::as_task_classif_st(
   id = "ecuador_lsl",
   backend = mlr3::as_data_backend(lsl),
   target = "lslpts",
@@ -310,7 +310,7 @@ task = mlr3spatiotempcv::as_task_classif_st(
 Note that `mlr3spatiotempcv::as_task_classif_st()` also accepts an `sf`-object as input for the `backend` parameter.
 In this case, you might only want to additionally specify the `coords_as_features` argument.
-We did not convert `lsl` into an `sf`-object because `TaskClassifST$new()` would just turn it back into a non-spatial `data.table` object in the background.
+We did not convert `lsl` into an `sf`-object because `as_task_classif_st()` would just turn it back into a non-spatial `data.table` object in the background.
 For a short data exploration, the `autoplot()` function of the **mlr3viz** package might come in handy since it plots the response against all predictors and all predictors against all predictors (not shown).
@@ -508,9 +508,6 @@ lrn_ksvm$fallback = lrn("classif.featureless", predict_type = "prob")
 The next stage is to specify a resampling strategy.
 Again we will use a 100-repeated 5-fold spatial CV\index{cross-validation!spatial CV}.
- - ```{r 12-spatial-cv-25} # performance estimation level perf_level = mlr3::rsmp("repeated_spcv_coords", folds = 5, repeats = 100) @@ -545,10 +542,10 @@ terminator = mlr3tuning::trm("evals", n_evals = 50) tuner = mlr3tuning::tnr("random_search") ``` -The next stage is to modify the learner `lrn_ksvm` in accordance with all the characteristics defining the hyperparameter tuning with `AutoTuner$new()`. +The next stage is to modify the learner `lrn_ksvm` in accordance with all the characteristics defining the hyperparameter tuning with `auto_tuner()`. ```{r 12-spatial-cv-27, eval=FALSE} -at_ksvm = mlr3tuning::AutoTuner$new( +at_ksvm = mlr3tuning::auto_tuner( learner = lrn_ksvm, resampling = tune_level, measure = mlr3::msr("classif.auc"), diff --git a/13-transport.Rmd b/13-transport.Rmd index ad980800b..f93a26d4a 100644 --- a/13-transport.Rmd +++ b/13-transport.Rmd @@ -42,10 +42,10 @@ It is therefore unsurprising that transport researchers have long turned to geog This chapter introduces the geographic analysis of transport systems at different geographic levels: -- **Areal units**: transport patterns can be understood with reference to zonal aggregates, such as the main mode of travel (by car, bike or foot, for example), and average distance of trips made by people living in a particular zone, covered in Section \@ref(transport-zones) +- **Areal units**\index{areal units}: transport patterns can be understood with reference to zonal aggregates, such as the main mode of travel (by car, bike or foot, for example), and average distance of trips made by people living in a particular zone, covered in Section \@ref(transport-zones) - **Desire lines**\index{desire lines}: straight lines that represent 'origin-destination' data that records how many people travel (or could travel) between places (points or zones) in geographic space, the topic of Section \@ref(desire-lines) - **Nodes**\index{node}: these are points in the transport system that can represent common origins 
 and destinations and public transport stations such as bus stops and rail stations, the topic of Section \@ref(nodes)
-- **Routes**: these are lines representing a path along the route network along the desire lines and between nodes.
+- **Routes**\index{routes}: these are lines representing a path along the route network between nodes, following the desire lines defined above.
 Routes (which can be represented as single linestrings or multiple short *segments*) and the *routing engines* that generate them are covered in Section \@ref(routes)
- **Route networks**\index{network}: these represent the system of roads, paths and other linear features in an area and are covered in Section \@ref(route-networks).
They can be represented as geographic features (typically short segments of road that add up to create a full network) or structured as an interconnected graph, with the level of traffic on different segments referred to as 'flow' by transport modelers [@hollander_transport_2016]
@@ -248,9 +248,6 @@ zones_od = inner_join(zones_joined, zones_destinations, by = "geo_code")
 A simplified version of Figure \@ref(fig:zones) is created with the code below (see `13-zones.R` in the [`code`](https://github.com/geocompx/geocompr/tree/main/code) folder of the book's GitHub repository to reproduce the figure and Section \@ref(faceted-maps) for details on faceted maps with **tmap**\index{tmap (package)}):
-
-
-
 ```{r 13-transport-11, eval=FALSE}
 qtm(zones_od, c("all", "all_dest")) +
   tm_layout(panel.labels = c("Origin", "Destination"))
@@ -427,6 +424,7 @@ tm_shape(zones_od) +
 ## Routes
+\index{routes}
 From a geographical perspective, routes are desire lines\index{desire lines} that are no longer straight: the origin and destination points are the same as in the desire line representation of travel, but the pathway to get from A to B is more complex.
 The geometries of routes are typically (but not always) determined by the transport network.
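The distinction drawn above between straight desire lines and network-constrained routes can be illustrated with a toy example. The sketch below uses only **sf**, with made-up coordinates for two points near Bristol (they are not taken from the `bristol_*` datasets used in this chapter):

```r
library(sf)

# hypothetical origin and destination coordinates (lon, lat)
od_coords = rbind(c(-2.60, 51.45),
                  c(-2.55, 51.47))

# a desire line is simply the straight segment between the two points;
# a route would instead follow the transport network between them
desire_line = st_sfc(st_linestring(od_coords), crs = "EPSG:4326")

# with lon/lat input, st_length() returns the great-circle length in meters
st_length(desire_line)
```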
@@ -592,6 +590,7 @@ Route segment results can enable the prioritization of investment where it is mo ## Route networks +\index{network} While routes generally contain data on travel *behavior*, at the same geographic level as desire lines and OD pairs, route network datasets usually represent the physical transport network. Each *segment* in a route network roughly corresponds to a continuous section of street between junctions and appears only once, although the average length of segments depends on the data source (segments in the OSM-derived `bristol_ways` dataset used in this section have an average length of just over 200 m, with a standard deviation of nearly 500 m). Variability in segment lengths can be explained by the fact that in some rural locations junctions are far apart while in dense urban areas there are crossings and other segment breaks every few meters. diff --git a/14-location.Rmd b/14-location.Rmd index b1f1d3cc2..288739d7e 100644 --- a/14-location.Rmd +++ b/14-location.Rmd @@ -142,7 +142,7 @@ knitr::kable(tab, ``` ## Create census rasters - + After the preprocessing, the data can be converted into a `SpatRaster` object (see Sections \@ref(raster-classes) and \@ref(raster-subsetting)) with the help of the `rast()` function. When setting its `type` argument to `xyz`, the `x` and `y` columns of the input data frame should correspond to coordinates on a regular grid. All the remaining columns (here: `pop`, `women`, `mean_age`, `hh_size`) will serve as values of the raster layers (Figure \@ref(fig:census-stack); see also `code/14-location-figures.R` in our GitHub repository). @@ -212,7 +212,6 @@ reclass # full output not shown #> max values : 8000, 3, 3, 3 ``` - ## Define metropolitan areas We deliberately define metropolitan areas as pixels of 20 km^2^ inhabited by more than 500,000 people. 
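The definition above (pixels inhabited by more than 500,000 people) amounts to an aggregate-then-threshold operation, which can be sketched with **terra** on a toy population raster; the grid dimensions, cell values and CRS below are made up for illustration and do not come from the chapter's census data:

```r
library(terra)

# toy 1 km-resolution population raster standing in for the census data
pop = rast(nrows = 40, ncols = 40, xmin = 0, xmax = 40000,
           ymin = 0, ymax = 40000, crs = "EPSG:32632")
values(pop) = runif(ncell(pop), 0, 3000)

# aggregate by a factor of 20, summing inhabitants per coarse cell
pop_agg = aggregate(pop, fact = 20, fun = sum)

# metropolitan pixels: aggregated cells with > 500,000 inhabitants
metro = pop_agg > 500000
plot(metro)
```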
diff --git a/15-eco.Rmd b/15-eco.Rmd
index 9b31493c5..958a8244e 100644
--- a/15-eco.Rmd
+++ b/15-eco.Rmd
@@ -249,7 +249,7 @@ Hence, we need to dismiss all sites in which no species were found.
 pa = vegan::decostand(comm, "pa") # 100 rows (sites), 69 columns (species)
 # keep only sites in which at least one species was found
 pa = pa[rowSums(pa) != 0, ] # 84 rows, 69 columns
-``` 
+```
 The resulting matrix serves as input for the NMDS\index{NMDS}.
 `k` specifies the number of output axes, here set to 4.^[
@@ -402,11 +402,8 @@ knitr::include_graphics("figures/15_tree.png")
 ```
 The resulting tree consists of three internal nodes and four terminal nodes (Figure \@ref(fig:tree)).
-The first internal node at the top of the tree assigns all observations which are below
-
-328.5 m to the left and all other observations to the right branch.
-The observations falling into the left branch have a mean NMDS\index{NMDS} score of
--1.198.
+The first internal node at the top of the tree assigns all observations which are below 328.5 m to the left and all other observations to the right branch.
+The observations falling into the left branch have a mean NMDS\index{NMDS} score of -1.198.
 Overall, we can interpret the tree as follows: the higher the elevation, the higher the NMDS\index{NMDS} score becomes.
 This means that the simple decision tree has already revealed four distinct floristic assemblages.
 For a more in-depth interpretation, please refer to Section \@ref(predictive-mapping).
@@ -493,7 +490,7 @@ In each of these spatial partitions, we run 50 models (`trm()`) while using rand
 The performance measure is the root mean squared error (RMSE\index{RMSE}).
```{r 15-eco-23}
-autotuner_rf = mlr3tuning::AutoTuner$new(
+autotuner_rf = mlr3tuning::auto_tuner(
   learner = lrn_rf,
   resampling = mlr3::rsmp("spcv_coords", folds = 5), # spatial partitioning
   measure = mlr3::msr("regr.rmse"), # performance measure
@@ -523,12 +520,6 @@ autotuner_rf = readRDS("extdata/15-tune.rds")
 autotuner_rf$tuning_result
 ```
-
-
 ### Predictive mapping
 The tuned hyperparameters\index{hyperparameter} can now be used for the prediction.
diff --git a/16-synthesis.Rmd b/16-synthesis.Rmd
index 3047de4bd..5eec3be7c 100644
--- a/16-synthesis.Rmd
+++ b/16-synthesis.Rmd
@@ -46,7 +46,7 @@ It depends: the former only processes the geometry data contained in `nz` so is
 Whether to use the base R function `aggregate()` or the **dplyr** function `summarise()` is a matter of preference, with the latter being more readable for many.
 The wider point is that there are often multiple options to choose from when working with geographic data in R, even within a single package.
-The range of options grows further when more R packages are considered: you could achieve the same result using the older **sp** package, for example.
+The range of options grows further when more R packages are considered: you could achieve the same result using the older **sp** package\index{sp (package)}, for example.
 However, based on our goal of providing good advice, we recommend using the more recent, more performant and future-proof **sf** package.
 The same applies to all packages showcased in this book, although it can be helpful (when not distracting) to be aware of alternatives and to be able to justify your choice of software.
@@ -113,22 +113,22 @@ length(spdeps) # 463
 ## Gaps and overlaps {#gaps}
-Geocomputation is a big area, so there are inevitably gaps in this book.
+Geocomputation\index{geocomputation} is a big area, so there are inevitably gaps in this book.
 We have been selective, deliberately highlighting certain topics, techniques and packages, while omitting others.
 We have tried to emphasize topics that are most commonly needed in real-world applications such as geographic data operations, basics of coordinate reference systems, read/write data operations and visualization techniques.
 Some topics and themes appear repeatedly, with the aim of building essential skills for geocomputation, and showing you how to go further, into more advanced topics and specific applications.
 
 We deliberately omitted some topics that are covered in-depth elsewhere.
-Statistical modeling of spatial data such as point pattern analysis, spatial interpolation (kriging) and spatial regression, for example, are mentioned in the context of machine learning in Chapter \@ref(spatial-cv) but not covered in detail.
+Statistical modeling of spatial data such as point pattern analysis\index{point pattern analysis}, spatial interpolation\index{spatial interpolation} (e.g., kriging) and spatial regression\index{spatial regression}, for example, are mentioned in the context of machine learning in Chapter \@ref(spatial-cv) but not covered in detail.
 There are already excellent resources on these methods, including statistically orientated chapters in @pebesma_spatial_2023 and books on point pattern analysis [@baddeley_spatial_2015], Bayesian techniques applied to spatial data [@gomez-rubio_bayesian_2020; @moraga_spatial_2023], and books focused on particular applications such as health [@moraga_geospatial_2019] and [wildfire severity analysis](https://bookdown.org/mcwimberly/gdswr-book/application---wildfire-severity-analysis.html) [@wimberly_geographic_2023].
 Other topics which received limited attention were remote sensing and using R alongside (rather than as a bridge to) dedicated GIS software.
 There are many resources on these topics, including a [discussion on remote sensing in R](https://github.com/r-spatial/discuss/issues/56), @wegmann_remote_2016 and the GIS-related teaching materials available from [Marburg University](https://geomoer.github.io/moer-info-page/courses.html).
-We focused on machine learning rather than spatial statistical inference in Chapters \@ref(spatial-cv) and \@ref(eco) because of the abundance of quality resources on the topic.
-These resources include @zuur_mixed_2009, @zuur_beginners_2017 which focus on ecological use cases, and freely available teaching material and code on *Geostatistics & Open-source Statistical Computing* hosted at [css.cornell.edu/faculty/dgr2](https://css.cornell.edu/faculty/dgr2/teach/).
+We focused on machine learning rather than spatial statistical inference\index{statistical inference} in Chapters \@ref(spatial-cv) and \@ref(eco) because of the abundance of quality resources on the topic.
+These resources include @zuur_mixed_2009 and @zuur_beginners_2017, which focus on ecological use cases, and freely available teaching material and code on *Geostatistics & Open-source Statistical Computing* hosted at [css.cornell.edu/faculty/dgr2](https://css.cornell.edu/faculty/dgr2/teach/).
 [*R for Geographic Data Science*](https://sdesabbata.github.io/r-for-geographic-data-science/) provides an introduction to R for geographic data science and modelling.
 
-We have largely omitted geocomputation on 'big data' by which we mean datasets that do not fit on a high-spec laptop.
+We have largely omitted geocomputation on 'big data'\index{big data}, by which we mean datasets that do not fit on a high-spec laptop.
 This decision is justified by the fact that the majority of geographic datasets that are needed for common research or policy applications *do* fit on consumer hardware, large high resolution remote sensing datasets being a notable exception (see Section \@ref(cloud)).
-It is possible to get more RAM on your computer or to temporarily 'renting' compute power available on platforms such as [GitHub Codespaces, which can be used to run the code in this book](https://github.com/codespaces/new?hide_repo_select=true&ref=main&repo=84222786&machine=basicLinux32gb&devcontainer_path=.devcontainer.json&location=WestEurope).
-Furthermore, learning to solve problems on small datasets is a prerequisite to solving problems on huge datasets and the emphasis in this book is getting started, and the skills you learn here will be useful when you move to bigger datasets.
+It is possible to get more RAM on your computer or to temporarily 'rent' compute power available on platforms such as [GitHub Codespaces, which can be used to run the code in this book](https://github.com/codespaces/new?hide_repo_select=true&ref=main&repo=84222786&machine=basicLinux32gb&devcontainer_path=.devcontainer.json&location=WestEurope).
+Furthermore, learning to solve problems on small datasets is a prerequisite to solving problems on huge datasets; the emphasis in this book is on getting started, and the skills you learn here will be useful when you move to bigger datasets.
@@ -187,13 +187,13 @@ There are many forums where you can do this, including:
 - The large and general purpose programming Q&A site [stackoverflow.com](https://stackoverflow.com/)
 - Online forums associated with a particular entity, such as the [RStudio Community](https://community.rstudio.com/), the [rOpenSci Discuss](https://discuss.ropensci.org/) web forum and forums associated with particular software tools such as the [Stan](https://discourse.mc-stan.org/) forum
 - Software development platforms such as GitHub, which hosts issue trackers for the majority of R-spatial packages and also, increasingly, built-in discussion pages such as that created to encourage discussion (not just bug reporting) around the **sfnetworks** package (see [luukvdmeer/sfnetworks/discussions](https://github.com/luukvdmeer/sfnetworks/discussions/))
-- Online chat rooms and forums associated with communities such as the [rOpenSci](https://ropensci.org/blog/2022/09/13/contributing-ropensci/) and the [geocompx](https://geocompx.org) community (which has a [Discord server](https://discord.com/invite/PMztXYgNxp) where you can ask questions), of which this book is a part.
+- Online chat rooms and forums associated with communities such as the [rOpenSci](https://ropensci.org/blog/2022/09/13/contributing-ropensci/) and the [geocompx](https://geocompx.org)\index{geocompx} community (which has a [Discord server](https://discord.com/invite/PMztXYgNxp) where you can ask questions), of which this book is a part
 
 ### Reproducible examples with **reprex** {#reprex}
 
-In terms of asking a good question, a clearly stated questions supported by an accessible and fully reproducible example is key (see also https://r4ds.hadley.nz/workflow-help.html).
+In terms of asking a good question, a clearly stated question supported by an accessible and fully reproducible example is key (see also https://r4ds.hadley.nz/workflow-help.html).
 It is also helpful, after showing the code that 'did not work' from the user's perspective, to explain what you would like to see.
-A very useful tool for creating reproducible examples is the **reprex** package.
+A very useful tool for creating reproducible examples is the **reprex** package\index{reproducibility}.
 To highlight unexpected behaviour, you can write completely reproducible code that demonstrates the issue and then use the `reprex()` function to create a copy of your code that can be pasted into a forum or other online space.
 
 Imagine you are trying to create a map of the world with blue sea and green land.
@@ -217,7 +217,6 @@ library(spData)
 plot(st_geometry(world), col = "green", bg = "lightblue")
 ```
 
-
 ```{r 16-synthesis-reprex, out.width="49%", fig.show="hold", fig.cap="A map of the world with green land, illustrating a question with a reproducible example (left) and the solution (right).", echo=FALSE, message=FALSE, warning=FALSE}
 library(sf)
 library(spData)
@@ -254,7 +253,7 @@ There are good reasons for learning R as a language for geocomputation, as descr
 R's strengths are particularly relevant to our definition of geocomputation due to its emphasis on scientific reproducibility, widespread use in academic research and unparalleled support for statistical modeling of geographic data.
 Furthermore, we advocate learning one language for geocomputation in depth before delving into other languages/frameworks because of the costs associated with context switching, and R is an excellent starting point on your geocomputational journey.
 ]
-It would be possible to study *Geocomputation with: Python*, *C++*, *JavaScript*, *Scala*\index{Scala} or *Rust*\index{Rust} in equal depth.
+It would be possible to study *Geocomputation with: Python*\index{Python}, *C++*, *JavaScript*, *Scala*\index{Scala} or *Rust*\index{Rust} in equal depth.
 Each has evolving geospatial capabilities.
-[**rasterio**](https://github.com/rasterio/rasterio), for example, is a Python package with similar functionality as the **terra** package used in this book.
-See [*Geocomputation with Python*](https://py.geocompx.org/), for an introduction to geocomputation with Python.
+[**rasterio**](https://github.com/rasterio/rasterio), for example, is a Python package with similar functionality to the **terra** package used in this book.
+See [*Geocomputation with Python*](https://py.geocompx.org/) for an introduction to geocomputation with Python.
@@ -295,9 +294,9 @@ Both capture the essence of working with geographic data, but geocomputation has
 We added the final ingredient: reproducibility was barely mentioned in early work on geocomputation, yet a strong case can be made for it being a vital component of the first two ingredients.
 Reproducibility\index{reproducibility}:
-- encourages *creativity* by shifting the focus away from the basics (which are readily available through shared code) and towards applications
-- discourages people from 'reinventing the wheel': there is no need to re-do what others have done if their methods can be used by others
-- makes research more conducive to real world applications, by enabling anyone in any sector to apply your methods in new areas
+- Encourages *creativity* by shifting the focus away from the basics (which are readily available through shared code) and towards applications
+- Discourages people from 'reinventing the wheel': there is no need to re-do what others have done if their methods can be used by others
+- Makes research more conducive to real-world applications, by enabling anyone in any sector to apply your methods in new areas
 
-If reproducibility is the defining asset of geocomputation (or command-line GIS) it is worth considering what makes it reproducible.
+If reproducibility is the defining asset of geocomputation (or command-line GIS), it is worth considering what makes it reproducible.
 This brings us to the 'open source approach', which has three main components:
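A note for this patch: the reprex workflow that 16-synthesis.Rmd describes could be made concrete for readers with a minimal sketch along the following lines. It reuses the world-map example already in the chapter and assumes the **reprex**, **sf** and **spData** packages are installed; this is an illustrative sketch, not code from the book itself.

```r
# Minimal sketch of the reprex workflow described in the reprex section;
# assumes the reprex, sf and spData packages are installed
library(reprex)

reprex({
  library(sf)
  library(spData)
  # the map that 'did not work' from the user's perspective:
  # land should be green, sea light blue
  plot(st_geometry(world), col = "green", bg = "lightblue")
})
```

Calling `reprex()` on the braced code renders it in a clean session, captures the output and any figures, and produces a Markdown version ready to paste into a forum post or GitHub issue.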