# Function documentation {#sec-man}
```{r, echo = FALSE}
source("common.R")
```
In this chapter, you'll learn about function documentation, which users access with `?somefunction` or `help("somefunction")`.
Base R provides a standard way of documenting a package where each function is documented in a **topic,** an `.Rd` file ("R documentation") in the `man/` directory.
`.Rd` files use a custom syntax, loosely based on LaTeX, and can be rendered to HTML, plain text, or PDF, as needed, for viewing in different contexts.
In the devtools ecosystem, we don't edit `.Rd` files directly with our bare hands.
Instead, we include specially formatted "roxygen comments" above the source code for each function[^man-1].
Then we use the [roxygen2 package](https://roxygen2.r-lib.org/index.html) to generate the `.Rd` files from these special comments[^man-2].
There are a few advantages to using roxygen2:
[^man-1]: The name "roxygen" is a nod to the Doxygen documentation generator, which inspired the development of an R package named roxygen.
Then that original concept was rebooted as roxygen2, similar to ggplot2.
[^man-2]: The NAMESPACE file is also generated from these roxygen comments.
Or, rather, it *can* be and that is the preferred devtools workflow (@sec-dependencies-NAMESPACE-workflow).
- Code and documentation are co-located.
When you modify your code, it's easy to remember to also update your documentation.
- You can use markdown, rather than having to learn a one-off markup language that only applies to `.Rd` files.
In addition to formatting, the automatic hyperlinking functionality makes it much, much easier to create richly linked documentation.
- There's a lot of `.Rd` boilerplate that's automated away.
- roxygen2 provides a number of tools for sharing content across documentation topics and even between topics and vignettes.
In this chapter we'll focus on documenting functions, but the same ideas apply to documenting datasets (@sec-documenting-data), classes and generics, and packages.
You can learn more about those important topics in `vignette("rd-other", package = "roxygen2")`.
## roxygen2 basics
To get started, we'll work through the basic roxygen2 workflow and discuss the overall structure of roxygen2 comments, which are organised into blocks and tags.
We also highlight the biggest wins of using markdown with roxygen2.
### The documentation workflow {#sec-man-workflow}
Unlike with testthat, there's no obvious opening move to declare that you're going to use roxygen2 for documentation.
That's because the use of roxygen2 is purely a matter of your development workflow.
It has no effect on, e.g., how a package gets checked or built.
We think the roxygen approach is the best way to generate your `.Rd` files, but officially R only cares about the files themselves, not how they came to be.
Your documentation workflow truly begins when you start to add roxygen comments above your functions.
Roxygen comment lines always start with `#'`, the usual `#` for a comment, followed immediately by a single quote `'`:
```{r}
#' Add together two numbers
#'
#' @param x A number.
#' @param y A number.
#' @returns A numeric vector.
#' @examples
#' add(1, 1)
#' add(10, 1)
add <- function(x, y) {
  x + y
}
```
::: callout-tip
## RStudio
Usually you write your function first, then its documentation.
Once the function definition exists, put your cursor somewhere in it and do *Code \> Insert Roxygen Skeleton* to get a great head start on the roxygen comment.
:::
Once you have at least one roxygen comment, run `devtools::document()` to generate (or update) your package's `.Rd` files[^man-3].
Under the hood, this ultimately calls `roxygen2::roxygenise()`. The roxygen block above generates a `man/add.Rd` file that looks like this:
[^man-3]: Running `devtools::document()` also affects another field in `DESCRIPTION`, which looks like this: `RoxygenNote: 7.2.1`.
This records which version of roxygen2 was last used in a package, which makes it easier for devtools (and its underlying packages) to make an intelligent guess about when to re-`document()` a package and when to leave well enough alone.
In a collaborative setting, this also reduces nuisance changes to the `.Rd` files, by making the relevant roxygen2 version highly visible.
``` text
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/add.R
\name{add}
\alias{add}
\title{Add together two numbers}
\usage{
add(x, y)
}
\arguments{
\item{x}{A number.}
\item{y}{A number.}
}
\value{
A numeric vector.
}
\description{
Add together two numbers
}
\examples{
add(1, 1)
add(10, 1)
}
```
::: callout-tip
## RStudio
You can also run `devtools::document()` with the keyboard shortcut Ctrl/Cmd + Shift + D or via the *Build* menu or pane.
:::
If you've used LaTeX before, this should look vaguely familiar since the `.Rd` format is loosely based on LaTeX.
If you are interested in the `.Rd` format, you can read more in [Writing R Extensions](https://cran.r-project.org/doc/manuals/R-exts.html#Rd-format).
But generally you'll never need to look at `.Rd` files, except to commit them to your package's Git repository.
How does this `.Rd` file correspond to the documentation you see in R?
When you run `?add`, `help("add")`, or `example("add")`, R looks for an `.Rd` file containing `\alias{add}`.
It then parses the file, converts it into HTML, and displays it.
@fig-rendered-help-topic-add shows how this help topic would look in RStudio:
```{r}
#| label: fig-rendered-help-topic-add
#| echo: false
#| out-width: ~
#| fig-cap: >
#| Help topic rendered to HTML.
#| fig-alt: >
#| Screenshot of the help topic for the add() function.
knitr::include_graphics("images/man-add.png", dpi = 220)
```
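If you'd like to watch that parse-and-render step outside of the help system, the sketch below uses `tools::parse_Rd()` and `tools::Rd2HTML()` from base R.
It recreates the `add.Rd` content shown earlier in a temporary file; the variable names are ours, and this only approximates what `?add` does, since the real help system must first locate the topic in a package.

```r
# Recreate the content of man/add.Rd in a temporary file.
rd_lines <- c(
  "\\name{add}",
  "\\alias{add}",
  "\\title{Add together two numbers}",
  "\\usage{",
  "add(x, y)",
  "}",
  "\\arguments{",
  "\\item{x}{A number.}",
  "\\item{y}{A number.}",
  "}",
  "\\value{",
  "A numeric vector.",
  "}",
  "\\description{",
  "Add together two numbers",
  "}",
  "\\examples{",
  "add(1, 1)",
  "add(10, 1)",
  "}"
)
rd_path <- tempfile(fileext = ".Rd")
writeLines(rd_lines, rd_path)

# Parse the file into an object of class "Rd" ...
rd <- tools::parse_Rd(rd_path)

# ... and render that object to an HTML file.
html_path <- tempfile(fileext = ".html")
tools::Rd2HTML(rd, out = html_path)
html <- readLines(html_path)

# Show the lines of HTML that mention the topic title.
grep("Add together two numbers", html, fixed = TRUE, value = TRUE)
```

Opening `html_path` in a browser should show a plainer version of the help page in the screenshot.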
::: callout-warning
## `R CMD check` warning
You should document all exported functions and datasets.
Otherwise, you'll get this warning from `R CMD check`:
```
W checking for missing documentation entries (614ms)
Undocumented code objects:
‘somefunction’
Undocumented data sets:
‘somedata’
All user-level objects in a package should have documentation entries.
```
Conversely, you probably don't want to document unexported functions.
If you want to use roxygen comments for internal documentation, include the `@noRd` tag to suppress the creation of the `.Rd` file.
:::
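As a small illustration of that last point, here is a hypothetical internal helper (the name `clamp()` and its arguments are invented for this sketch).
The roxygen block documents the function for fellow developers, while `@noRd` tells roxygen2 not to write a `man/clamp.Rd` file:

```r
#' Restrict values to a closed interval
#'
#' Internal helper: it is not exported, so it gets no help topic.
#'
#' @param x A numeric vector.
#' @param lower,upper Single numbers giving the bounds of the interval.
#' @returns A numeric vector the same length as `x`.
#' @noRd
clamp <- function(x, lower, upper) {
  # Raise values below `lower`, then lower values above `upper`.
  pmin(pmax(x, lower), upper)
}

clamp(c(-1, 0.5, 2), lower = 0, upper = 1)
```

Note the absence of `@export`: the function stays internal to the package, and the comments serve only people reading the source.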
This is also a good time to explain something you may have noticed in your `DESCRIPTION` file:
```
Roxygen: list(markdown = TRUE)
```
devtools/usethis includes this by default when initiating a `DESCRIPTION` file and it gives roxygen2 a heads-up that your package uses markdown syntax in its roxygen comments.[^man-4]
[^man-4]: This is part of the explanation promised in @sec-description-custom-fields, where we also clarify that, with our current conventions, this field should really be called `Config/Needs/roxygen`, instead of `Roxygen`.
We highly recommend that you use markdown in all new packages and that you migrate older-but-actively maintained packages to markdown syntax.
In this case, you can call `usethis::use_roxygen_md()` to update `DESCRIPTION` and get a reminder about the roxygen2md package, which can help with conversion.
The default help-seeking process looks inside **installed** packages, so to see your package's documentation during development, devtools overrides the usual help functions with modified versions that know to consult the current **source** package.
To activate these overrides, you'll need to run `devtools::load_all()` at least once.
If it feels like your edits to the roxygen comments aren't having an effect, double check that you have actually regenerated the `.Rd` files with `devtools::document()` and that you've loaded your package.
When you call `?function`, you should see "Rendering development documentation ...".
To summarize, there are four steps in the basic roxygen2 workflow:
1. Add roxygen2 comments to your `.R` files.
2. Run `devtools::document()` or press Ctrl/Cmd + Shift + D to convert roxygen2 comments to `.Rd` files.
3. Preview documentation with `?function`.
4. Rinse and repeat until the documentation looks the way you want.
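If you want to try these steps without touching a real project, the following sketch runs them against a throwaway package in a temporary directory.
It is an experiment under stated assumptions rather than our recommended setup: the package name `demopkg` and its `DESCRIPTION` fields are invented, it needs roxygen2 and pkgload to be installed, and it calls `roxygen2::roxygenise()` directly (with only the `"rd"` roclet) instead of `devtools::document()`.

```r
# Requires the roxygen2 and pkgload packages to be installed.

# Set up a throwaway source package (hypothetical name: demopkg).
pkg <- file.path(tempdir(), "demopkg")
dir.create(file.path(pkg, "R"), recursive = TRUE, showWarnings = FALSE)
writeLines(
  c(
    "Package: demopkg",
    "Title: Throwaway Package for Trying Out roxygen2",
    "Version: 0.0.0.9000",
    "Description: Exists only to demonstrate the documentation workflow.",
    "License: GPL-3",
    "Encoding: UTF-8",
    "Roxygen: list(markdown = TRUE)"
  ),
  file.path(pkg, "DESCRIPTION")
)

# Step 1: add roxygen comments to an .R file.
writeLines(
  c(
    "#' Add together two numbers",
    "#'",
    "#' @param x A number.",
    "#' @param y A number.",
    "#' @returns A numeric vector.",
    "#' @examples",
    "#' add(1, 1)",
    "add <- function(x, y) {",
    "  x + y",
    "}"
  ),
  file.path(pkg, "R", "add.R")
)

# Step 2: convert the roxygen comments to .Rd files. Only the "rd" roclet
# is requested, so this sketch does not generate a NAMESPACE file.
roxygen2::roxygenise(pkg, roclets = "rd")

# Step 3: as a stand-in for `?add`, inspect the generated file directly.
rd_path <- file.path(pkg, "man", "add.Rd")
rd <- readLines(rd_path)
cat(rd, sep = "\n")
```

In a real package you would preview with `?add` after `devtools::load_all()`; reading the `.Rd` file is just the simplest check that works in a script.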
### roxygen2 comments, blocks, and tags {#sec-man-roxygen-comments}
Now that you understand the basic workflow, we'll go into more detail about the syntax.
roxygen2 comments start with `#'` and all the roxygen2 comments preceding a function are collectively called a **block**.
Blocks are broken up by **tags**, which look like `@tagName tagValue`, and the content of a tag extends from the end of the tag name to the start of the next tag[^man-5].
A block can contain text before the first tag, which is called the **introduction**.
By default, each block generates a single documentation **topic**, i.e. a single `.Rd` file[^man-6] in the `man/` directory.
[^man-5]: Or the end of the block, if it's the last tag.
[^man-6]: The name of the file is automatically derived from the object you're documenting.
Throughout this chapter we'll show you roxygen2 comments from real tidyverse packages, focusing on [stringr](https://stringr.tidyverse.org), since the functions there tend to be fairly straightforward, leading to documentation that's understandable with relatively little context.
We attach stringr here so that its functions are hyperlinked in the rendered book (more on that in section @sec-man-key-md-features).
```{r}
library(stringr)
```
Here's a simple first example: the documentation for `str_unique()`.
```{r}
#' Remove duplicated strings
#'
#' `str_unique()` removes duplicated values, with optional control over
#' how duplication is measured.
#'
#' @param string Input vector. Either a character vector, or something
#'   coercible to one.
#' @param ... Other options used to control matching behavior between duplicate
#'   strings. Passed on to [stringi::stri_opts_collator()].
#' @returns A character vector, usually shorter than `string`.
#' @seealso [unique()], [stringi::stri_unique()] which this function wraps.
#' @examples
#' str_unique(c("a", "b", "c", "b", "a"))
#'
#' # Use ... to pass additional arguments to stri_unique()
#' str_unique(c("motley", "mötley", "pinguino", "pingüino"))
#' str_unique(c("motley", "mötley", "pinguino", "pingüino"), strength = 1)
#' @export
str_unique <- function(string, ...) {
  ...
}
```
Here the introduction includes the title ("Remove duplicated strings") and a basic description of what the function does.
The introduction is followed by six tags: two `@param`s, one `@returns`, one `@seealso`, one `@examples`, and one `@export`.
Note that the block has an intentional line length (typically the same as that used for the surrounding R code) and the second and subsequent lines of the long `@param` tag are indented, which makes the entire block easier to scan.
You can get more roxygen2 style advice in the [tidyverse style guide](https://style.tidyverse.org/documentation.html).
::: callout-tip
## RStudio
It can be aggravating to manually manage the line length of roxygen comments, so be sure to try out *Code \> Reflow Comment* (Ctrl/Cmd+Shift+/).
:::
Note also that the order in which tags appear in your roxygen comments (or even in handwritten `.Rd` files) does not dictate the order in rendered documentation.
The order of presentation is determined by tooling within base R.
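One way to convince yourself of this is to render a deliberately scrambled `.Rd` file with base R's `tools::Rd2txt()`.
The topic here is made up for this sketch, and the exact text layout may vary between R versions, so treat it as an experiment to run rather than a specification:

```r
# A hand-written topic whose sections are deliberately out of order:
# examples first, then the return value, then the description.
rd_lines <- c(
  "\\name{demo}",
  "\\alias{demo}",
  "\\title{A demo topic}",
  "\\examples{",
  "demo()",
  "}",
  "\\value{",
  "Nothing useful.",
  "}",
  "\\description{",
  "Sections are deliberately written out of order.",
  "}"
)
rd_path <- tempfile(fileext = ".Rd")
writeLines(rd_lines, rd_path)

# Render to plain text, without the underlining used for terminal display.
txt_path <- tempfile(fileext = ".txt")
tools::Rd2txt(rd_path, out = txt_path, options = list(underline_titles = FALSE))
txt <- readLines(txt_path)

# Pull out the section headings in the order they were rendered.
headings <- trimws(grep("^(Description|Value|Examples):", txt, value = TRUE))
headings
```

If base R behaves as we expect, the description is rendered first and the examples last, whatever their order in the file.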
The following sections go into more depth for the most important tags.
We start with the introduction, which provides the title, description, and details.
Then we cover the inputs (the function arguments), outputs (the return value), and examples.
Next we discuss links and cross-references, then finish off with techniques for sharing documentation between topics.
### Key markdown features {#sec-man-key-md-features}
For the most part, general markdown and R Markdown knowledge suffice for taking advantage of markdown in roxygen2.
But there are a few pieces of syntax that are so important we want to highlight them here.
You'll see these in many of the examples in this chapter.
**Backticks for inline code**: Use backticks to format a piece of text as code, i.e. in a fixed width font.
Example:
```{r}
#' I like `thisfunction()`, because it's great.
```
**Square brackets for an auto-linked function**: Enclose text like `somefunction()` and `somepackage::somefunction()` in square brackets to get an automatic link to that function's documentation.
Be sure to include the trailing parentheses, because it's good style and it causes the function to be formatted as code, i.e. you don't need to add backticks.
Example:
```{r}
#' It's obvious that `thisfunction()` is better than [otherpkg::otherfunction()]
#' or even our own [olderfunction()].
```
**Vignettes**: If you refer to a vignette with an inline call to `vignette("some-topic")`, it serves a dual purpose.
First, this is literally the R code you would execute to view a vignette locally.
But wait, there's more!
In many rendered contexts, this automatically becomes a hyperlink to that same vignette in a pkgdown website.
Here we use that to link to some very relevant vignettes[^man-7]:
[^man-7]: These calls include an explicit specification of `package = "somepackage"`, since it can't be inferred from context, i.e. the context is a Quarto book, not package documentation.
- `vignette("rd-formatting", package = "roxygen2")`
- `vignette("reuse", package = "roxygen2")`
- `vignette("linking", package = "pkgdown")`
**Lists**: Bullet lists break up the dreaded "wall of text" and can make your documentation easier to scan.
You can use them in the description of the function or of an argument and also for the return value.
It is not necessary to include a blank line before the list, but that is also allowed.
```{r}
#' Best features of `thisfunction()`:
#' * Smells nice
#' * Has good vibes
```
## Title, description, details
The introduction provides a title, description, and, optionally, details, for the function.
While it's possible to use explicit tags in the introduction, we usually rely on implicit tags when possible:
- The **title** is taken from the first sentence.
It should be written in sentence case, not end in a full stop, and be followed by a blank line.
The title is shown in various function indexes (e.g. `help(package = "somepackage")`) and is what the user will usually see when browsing multiple functions.
- The **description** is taken from the next paragraph.
It's shown at the top of documentation and should briefly describe the most important features of the function.
- Additional **details** are anything after the description.
Details are optional, but can be any length so are useful if you want to dig deep into some important aspect of the function.
Note that, even though the details come right after the description in the introduction, they appear much later in rendered documentation.
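To make the implicit tags concrete, here's a minimal sketch using a hypothetical function, `rescale01()`, that we invented for this purpose.
The first line becomes the title, the next paragraph the description, and the paragraph after that the details, with no explicit `@title`, `@description`, or `@details` tags:

```r
#' Rescale a numeric vector to the unit interval
#'
#' `rescale01()` linearly maps the values of `x` so that the smallest becomes
#' 0 and the largest becomes 1.
#'
#' Missing values are ignored when computing the range and are returned
#' unchanged. If all values of `x` are equal, the result is `NaN`, because
#' the range has zero width.
#'
#' @param x A numeric vector.
#' @returns A numeric vector the same length as `x`.
#' @examples
#' rescale01(c(0, 5, 10))
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}
```

The blank `#'` lines do the work here: each one ends a paragraph, and roxygen2 assigns the paragraphs to title, description, and details by position.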
The following sections describe each component in more detail, and then discuss a few useful related tags.
### Title
When writing the title, it's useful to think about how it will appear in the reference index.
When a user skims the index, how will they know which functions will solve their current problem?
This requires thinking about what your functions have in common (which doesn't need to be repeated in every title) and what is unique to that function (which should be highlighted in the title).
When we wrote this chapter, we found the function titles for stringr to be somewhat disappointing.
But they provide a useful negative case study:
- `str_detect()`: Detect the presence or absence of a pattern in a string
- `str_extract()`: Extract matching patterns from a string
- `str_locate()`: Locate the position of patterns in a string
- `str_match()`: Extract matched groups from a string
There's a lot of repetition ("pattern", "from a string") and the verb used for the function name is repeated in the title, so if you don't understand the function already, the title seems unlikely to help much.
Hopefully we'll have improved those titles by the time you read this!
In contrast, these titles from dplyr are much better[^man-8]:
[^man-8]: Like all the examples, these might have changed a bit since we wrote this book, because we're constantly striving to do better.
You might compare what's in the book to what we now use, and consider whether you think it's an improvement.
- `mutate()`: Create, modify, and delete columns
- `summarise()`: Summarise each group down to one row
- `filter()`: Keep rows that match a condition
- `select()`: Keep or drop columns using their names and types
- `arrange()`: Order rows using column values
Here we try to succinctly describe what the function does, making sure to describe whether it affects rows, columns, or groups.
We do our best to use synonyms, instead of repeating the function name, to hopefully give folks another chance to understand the intent of the function.
### Description
The purpose of the description is to summarize the goal of the function, usually in a single paragraph.
This can be challenging for simple functions, because it can feel like you're just repeating the title of the function.
Try to find a slightly different wording, if you can.
It's okay if this feels a little repetitive; it's often useful for users to see the same thing expressed in two different ways.
It's a little extra work, but the extra effort is often worth it.
Here's the description for `str_detect()`:
```{r}
#' Detect the presence/absence of a match
#'
#' `str_detect()` returns a logical vector with `TRUE` for each element of
#' `string` that matches `pattern` and `FALSE` otherwise. It's equivalent to
#' `grepl(pattern, string)`.
```
If you want more than one paragraph, you must use an explicit `@description` tag to prevent the second (and subsequent) paragraphs from being turned into the `@details`.
Here's a two-paragraph `@description` from `str_view()`:
```{r}
#' View strings and matches
#'
#' @description
#' `str_view()` is used to print the underlying representation of a string and
#' to see how a `pattern` matches.
#'
#' Matches are surrounded by `<>` and unusual whitespace (i.e. all whitespace
#' apart from `" "` and `"\n"`) are surrounded by `{}` and escaped. Where
#' possible, matches and unusual whitespace are coloured blue and `NA`s red.
```
Here's another example from `str_like()`, which has a bullet list in `@description`:
```{r}
#' Detect a pattern in the same way as `SQL`'s `LIKE` operator
#'
#' @description
#' `str_like()` follows the conventions of the SQL `LIKE` operator:
#'
#' * Must match the entire string.
#' * `_` matches a single character (like `.`).
#' * `%` matches any number of characters (like `.*`).
#' * `\%` and `\_` match literal `%` and `_`.
#' * The match is case insensitive by default.
```
Basically, if you're going to include an empty line in your description, you'll need to use an explicit `@description` tag.
Finally, it's often particularly hard to write a good description if you've just written the function, because the purpose often seems very obvious.
Do your best, and then come back later, when you've forgotten exactly what the function does.
Once you've re-derived what the function does, you'll be able to write a better description.
### Details
The `@details` are just any additional details or explanation that you think your function needs.
Most functions don't need details, but some functions need a lot.
If you have a lot of information to convey, it's a good idea to use informative markdown headings to break the details up into manageable sections[^man-9].
Here's an example from `dplyr::mutate()`, which creates its headings with the older `@section` syntax.
We've elided some of the details to keep this example short, but you should still get a sense of how we used headings to break up the content into skimmable chunks:
[^man-9]: In older code, you might see the use of `@section title:` which was used to create sections before roxygen2 had full markdown support.
If you've used these in the past, you can now turn them into markdown headings.
```{r}
#' Create, modify, and delete columns
#'
#' `mutate()` creates new columns that are functions of existing variables.
#' It can also modify (if the name is the same as an existing
#' column) and delete columns (by setting their value to `NULL`).
#'
#' @section Useful mutate functions:
#'
#' * [`+`], [`-`], [log()], etc., for their usual mathematical meanings
#'
#' ...
#'
#' @section Grouped tibbles:
#'
#' Because mutating expressions are computed within groups, they may
#' yield different results on grouped tibbles. This will be the case
#' as soon as an aggregating, lagging, or ranking function is
#' involved. Compare this ungrouped mutate:
#'
#' ...
```
This is a good time to remind ourselves that, even though a heading like `Useful mutate functions` in the example above comes immediately after the description in the roxygen block, the content appears much later in the rendered documentation.
The details (whether they use section headings or not) appear after the function usage, arguments, and return value.
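For comparison, here's a sketch of the same idea written with markdown headings rather than `@section`.
The function `trimmed_mean()` is hypothetical, and the layout reflects our understanding of roxygen2's markdown support (see `vignette("rd-formatting", package = "roxygen2")`): a level-one heading placed after an explicit `@details` tag starts a new section in the rendered topic.

```r
#' Compute a trimmed mean
#'
#' `trimmed_mean()` drops a fraction of the smallest and largest values of
#' `x` before averaging what remains.
#'
#' @details
#' # Missing values
#'
#' Missing values are removed before trimming, so they never count
#' towards the trimmed fraction.
#'
#' # Choosing a trim fraction
#'
#' A `trim` of 0 gives the ordinary mean. Larger values make the result
#' less sensitive to outliers.
#'
#' @param x A numeric vector.
#' @param trim Fraction of observations to drop from each end of `x`.
#'   Defaults to `0.1`.
#' @returns A single number.
#' @examples
#' trimmed_mean(c(1, 2, 3, 4, 100), trim = 0.2)
trimmed_mean <- function(x, trim = 0.1) {
  mean(x, trim = trim, na.rm = TRUE)
}
```

As with `@section`, these sections come right after the description in the source but are rendered after the usage, arguments, and return value.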
## Arguments
For most functions, the bulk of your work will go towards documenting how each argument affects the output of the function.
For this purpose, you'll use `@param` (short for parameter, a synonym of argument) followed by the argument name and a description of its action.
The highest priority is to provide a succinct summary of the allowed inputs and what the parameter does.
For example, here's how `str_detect()` documents `string`:
```{r}
#' @param string Input vector. Either a character vector, or something
#'   coercible to one.
```
And here are three of the arguments to `str_flatten()`:
```{r}
#' @param collapse String to insert between each piece. Defaults to `""`.
#' @param last Optional string to use in place of the final separator.
#' @param na.rm Remove missing values? If `FALSE` (the default), the result
#'   will be `NA` if any element of `string` is `NA`.
```
Note that `@param collapse` and `@param na.rm` describe their default values.
This is often a good practice because the function usage (which shows the default values) and the argument description are often quite far apart in the rendered documentation.
But there are downsides.
The main one is that this duplication means you'll need to make updates in two places if you change the default value; we believe this small amount of extra work is worth it to make the life of the user easier.
If an argument has a fixed set of possible values, you should list them.
If they're simple, you can just list them in a sentence, like in `str_trim()`:
```{r}
#' @param side Side on which to remove whitespace: `"left"`, `"right"`, or
#'   `"both"` (the default).
```
If they need more explanation, you might use a bulleted list, as in `str_wrap()`:
```{r}
#' @param whitespace_only A boolean.
#'   * `TRUE` (the default): wrapping will only occur at whitespace.
#'   * `FALSE`: can break on any non-word character (e.g. `/`, `-`).
```
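To connect this kind of documentation to the code, here's a hypothetical function (both `pad()` and its arguments are invented for this sketch) whose `side` argument accepts a fixed set of values.
The choices live in the function signature, where `match.arg()` validates them, and are repeated in the `@param` text so readers don't have to hunt through the usage:

```r
#' Pad strings to a minimum width
#'
#' @param x A character vector.
#' @param width Minimum width of the result, a single integer.
#' @param side Side on which to add spaces: `"left"` (the default),
#'   `"right"`, or `"both"`.
#' @returns A character vector the same length as `x`.
#' @examples
#' pad("a", 3)
#' pad("a", 3, side = "both")
pad <- function(x, width, side = c("left", "right", "both")) {
  # match.arg() picks the first choice by default and rejects anything
  # that is not one of the listed values.
  side <- match.arg(side)

  n <- pmax(width - nchar(x), 0)
  left <- switch(side, left = n, right = 0 * n, both = n %/% 2)
  right <- n - left

  paste0(strrep(" ", left), x, strrep(" ", right))
}
```

A call such as `pad("a", 3, side = "top")` fails with an error that lists the allowed values, so the documentation and the validation stay in step as long as both are updated together.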
The documentation for most arguments will be relatively short, often one or two sentences.
But you should take as much space as you need, and you'll see some examples of multi-paragraph argument docs shortly.
### Multiple arguments
If the behavior of multiple arguments is tightly coupled, you can document them together by separating the names with commas (with no spaces).
For example, `x` and `y` are interchangeable in `str_equal()`, so they're documented together:
```{r}
#' @param x,y A pair of character vectors.
```
In `str_sub()`, `start` and `end` define the range of characters to extract.
But instead of supplying both, you can use just `start` if you pass in a two-column matrix.
So it makes sense to document them together:
```{r}
#' @param start,end A pair of integer vectors defining the range of characters
#'   to extract (inclusive).
#'
#'   Alternatively, instead of a pair of vectors, you can pass a matrix to
#'   `start`. The matrix should have two columns, either labelled `start`
#'   and `end`, or `start` and `length`.
```
In `str_wrap()`, `indent` and `exdent` define the indentation for the first line and all subsequent lines, respectively:
```{r}
#' @param indent,exdent A non-negative integer giving the indent for the
#'   first line (`indent`) and all subsequent lines (`exdent`).
```
### Inheriting arguments
If your package contains many closely related functions, it's common for them to have arguments that share the same name and meaning.
It would be both annoying and error prone to copy and paste the same `@param` documentation to every function, so roxygen2 provides `@inheritParams` which allows you to inherit argument documentation from another function, possibly even in another package.
stringr uses `@inheritParams` extensively because most functions have `string` and `pattern` arguments.
The detailed and definitive documentation belongs to `str_detect()`:
```{r}
#' @param string Input vector. Either a character vector, or something
#'   coercible to one.
#' @param pattern Pattern to look for.
#'
#'   The default interpretation is a regular expression, as described in
#'   `vignette("regular-expressions")`. Use [regex()] for finer control of the
#'   matching behaviour.
#'
#'   Match a fixed string (i.e. by comparing only bytes), using
#'   [fixed()]. This is fast, but approximate. Generally,
#'   for matching human text, you'll want [coll()] which
#'   respects character matching rules for the specified locale.
#'
#'   Match character, word, line and sentence boundaries with
#'   [boundary()]. An empty pattern, "", is equivalent to
#'   `boundary("character")`.
```
Then the other stringr functions use `@inheritParams str_detect` to get this detailed documentation for `string` and `pattern` without having to duplicate that text.
`@inheritParams` only inherits docs for arguments that the function actually uses and that aren't already documented, so you can document some arguments locally and inherit others.
`str_match()` uses this to inherit `str_detect()`'s standard documentation for the `string` argument, while providing its own specialized documentation for `pattern`:
```{r}
#' @inheritParams str_detect
#' @param pattern Unlike other stringr functions, `str_match()` only supports
#'   regular expressions, as described `vignette("regular-expressions")`.
#'   The pattern should contain at least one capturing group.
```
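Here's the same pattern in a self-contained sketch, using two hypothetical functions (`count_words()` and `longest_word()` are invented for illustration).
The second function inherits the documentation for `text` and overrides the documentation for `sep`:

```r
#' Count the words in each string
#'
#' @param text A character vector, with one or more words per element.
#' @param sep A single string that separates words. Defaults to `" "`.
#' @returns An integer vector the same length as `text`.
count_words <- function(text, sep = " ") {
  lengths(strsplit(text, sep, fixed = TRUE))
}

#' Find the longest word in each string
#'
#' @inheritParams count_words
#' @param sep A single string that separates words. Unlike in
#'   `count_words()`, there is no default, so you must always supply it.
#' @returns A character vector the same length as `text`, with `NA` for
#'   elements that contain no words.
longest_word <- function(text, sep) {
  words <- strsplit(text, sep, fixed = TRUE)
  vapply(
    words,
    function(w) {
      # which.max() returns the first longest word, or nothing at all if
      # there are no words to compare.
      idx <- which.max(nchar(w))
      if (length(idx) == 0) NA_character_ else w[idx]
    },
    character(1)
  )
}
```

When roxygen2 processes the second block, we'd expect the `text` entry to be copied from `count_words()`, while the locally written `sep` entry takes precedence over the inherited one.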
Now that we've discussed default values and inheritance we can bring up one more dilemma.
Sometimes there's tension between giving detailed information on an argument (acceptable values, default value, how the argument is used, etc.) and making the documentation amenable to reuse in other functions (which might differ in some specifics).
This can motivate you to assess whether it's truly worth it for related functions to handle the same input in different ways or if standardization would be beneficial.
You can inherit documentation from a function in another package by using the standard `::` notation, i.e. `@inheritParams anotherpackage::function`.
This does introduce one small annoyance: now the documentation for your package is no longer self-contained and the version of `anotherpackage` can affect the generated docs.
Beware of spurious diffs introduced by contributors who run `devtools::document()` with a different installed version of the inherited-from package.
## Return value {#sec-man-returns}
A function's output is as important as its inputs.
Documenting the output is the job of the `@returns`[^man-10] tag.
Here the priority is to describe the overall "shape" of the output, i.e. what sort of object it is, and its dimensions (if that makes sense).
For example, if your function returns a vector you might describe its type and length, or if your function returns a data frame you might describe the names and types of the columns and the expected number of rows.
[^man-10]: For historical reasons, you can also use `@return`, but we now favor `@returns` because it reads more naturally.
The `@returns` documentation for functions in stringr is straightforward because almost all functions return some type of vector with the same length as one of the inputs.
For example, here's how `str_like()` describes its output:
```{r}
#' @returns A logical vector the same length as `string`.
```
A more complicated case is the joint documentation for `str_locate()` and `str_locate_all()`[^man-11].
`str_locate()` returns an integer matrix, and `str_locate_all()` returns a list of matrices, so the text needs to describe what determines the rows and columns.
[^man-11]: We'll come back to how to document multiple functions in one topic in @sec-man-multiple-functions.
```{r}
#' @returns
#' * `str_locate()` returns an integer matrix with two columns and
#' one row for each element of `string`. The first column, `start`,
#' gives the position at the start of the match, and the second column, `end`,
#' gives the position of the end.
#'
#' * `str_locate_all()` returns a list of integer matrices with the same
#' length as `string`/`pattern`. The matrices have columns `start` and `end`
#' as above, and one row for each match.
#' @seealso
#' [str_extract()] for a convenient way of extracting matches,
#' [stringi::stri_locate()] for the underlying implementation.
```
In other cases it can be easier to figure out what to highlight by thinking about the set of functions and how they differ.
For example, most dplyr functions return a data frame, so just saying `@returns A data frame` is not very useful.
Instead, we tried to identify exactly what makes each function different.
We decided it makes sense to describe each function in terms of how it affects the rows, the columns, the groups, and the attributes.
For example, this describes the return value of `dplyr::filter()`:
```{r}
#' @returns
#' An object of the same type as `.data`. The output has the following properties:
#'
#' * Rows are a subset of the input, but appear in the same order.
#' * Columns are not modified.
#' * The number of groups may be reduced (if `.preserve` is not `TRUE`).
#' * Data frame attributes are preserved.
```
`@returns` is also a good place to describe any important warnings or errors that the user might see.
For example `readr::read_csv()` mentions what happens if there are any parsing problems:
```{r}
#' @returns A [tibble()]. If there are parsing problems, a warning will alert you.
#' You can retrieve the full details by calling [problems()] on your dataset.
```
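To make this concrete, here is a sketch that documents both the shape of the output and the warning in `@returns`. `parse_numbers()` is a hypothetical function invented for illustration and is not taken from readr or any other package:

```{r}
#' Parse strings as numbers
#'
#' `parse_numbers()` is a hypothetical function, used only for illustration.
#'
#' @param x A character vector.
#' @returns A double vector the same length as `x`. Elements that can't be
#'   parsed become `NA` and trigger a warning that reports how many failed.
parse_numbers <- function(x) {
  # Silence base R's generic coercion warning so we can issue our own
  out <- suppressWarnings(as.numeric(x))

  # Count new NAs, i.e. elements that were not already missing in the input
  n_failed <- sum(is.na(out) & !is.na(x))
  if (n_failed > 0) {
    warning(n_failed, " element(s) could not be parsed.", call. = FALSE)
  }
  out
}
```

The `@returns` text covers the type, the length, and the one condition the user is likely to encounter, so they know what to expect before calling the function.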
::: callout-warning
## Submitting to CRAN
For your initial CRAN submission, all functions must document their return value.
While this may not be scrutinized in subsequent submissions, it's still a good practice.
There's currently no way to check that you've documented the return value of every function (we're [working on it](https://github.com/r-lib/roxygen2/issues/1334)) which is why you'll notice some tidyverse functions lack output documentation.
But we certainly aspire to provide this information across the board.
:::
## Examples {#sec-man-examples}
Describing what a function does is great, but *showing* how it works is even better.
That's the role of the `@examples` tag, which uses executable R code to demonstrate what a function can do.
Unlike other parts of the documentation where we've focused mainly on what you should write, here we'll briefly give some content advice and then focus mainly on the mechanics.
The main dilemma with examples is that you must jointly satisfy two requirements:
- Your example code should be readable and realistic.
Examples are documentation that you provide for the benefit of the user, i.e. a real human, working interactively, trying to get their actual work done with your package.
- Your example code must run without error and with no side effects in many non-interactive contexts over which you have limited or no control, such as when CRAN runs `R CMD check` or when your package website is built via GitHub Actions.
It turns out that there is often tension between these goals and you'll need to find a way to make your examples as useful as you can for users, while also satisfying the requirements of CRAN (if that's your goal) or other automated infrastructure.
The mechanics of examples are complex because they must never error and they're executed in four different situations:
- Interactively using the `example()` function.
- During `R CMD check` on your computer, or another computer you control (e.g. in GitHub Actions).
- During `R CMD check` run by CRAN.
- When your pkgdown website is being built, often via GitHub Actions or similar.
After discussing what to put in your examples, we'll talk about keeping your examples self-contained, how to display errors if needed, handling dependencies, running examples conditionally, and alternatives to the `@examples` tag for including example code.
::: callout-tip
## RStudio
When preparing `.R` scripts or `.Rmd` / `.qmd` reports, it's handy to use Ctrl/Cmd + Enter or the *Run* button to send a line of R code to the console for execution.
Happily, you can use the same workflow for executing and developing the `@examples` in your roxygen comments.
Remember to do `devtools::load_all()` often, to stay synced with the package source.
:::
### Contents
Use examples to first show the basic operation of the function, then to highlight any particularly important properties.
For example, `str_detect()` starts by showing a few simple variations and then highlights a feature that's easy to miss: as well as passing a vector of strings and one pattern, you can also pass one string and a vector of patterns.
```{r}
#' @examples
#' fruit <- c("apple", "banana", "pear", "pineapple")
#' str_detect(fruit, "a")
#' str_detect(fruit, "^a")
#' str_detect(fruit, "a$")
#'
#' # Also vectorised over pattern
#' str_detect("aecfg", letters)
```
Try to stay focused on the most important features without getting into the weeds of every last edge case: if you make the examples too long, it becomes hard for the user to find the key application that they're looking for.
If you find yourself writing very long examples, it may be a sign that you should write a vignette instead.
There aren't any formal ways to break up your examples into sections but you can use sectioning comments that use many `---` to create a visual breakdown.
Here's an example from `tidyr::chop()`:
```{r}
#' @examples
#' # Chop ----------------------------------------------------------------------
#' df <- tibble(x = c(1, 1, 1, 2, 2, 3), y = 1:6, z = 6:1)
#' # Note that we get one row of output for each unique combination of
#' # non-chopped variables
#' df %>% chop(c(y, z))
#' # cf nest
#' df %>% nest(data = c(y, z))
#'
#' # Unchop --------------------------------------------------------------------
#' df <- tibble(x = 1:4, y = list(integer(), 1L, 1:2, 1:3))
#' df %>% unchop(y)
#' df %>% unchop(y, keep_empty = TRUE)
```
Strive to keep the examples focused on the specific function that you're documenting.
If you can make the point with a familiar built-in dataset, like `mtcars`, do so.
If you find yourself needing to do a bunch of setup to create a dataset or object to use in the example, it may be a sign that you need to create a package dataset or even a helper function.
See @sec-data, @sec-data-example-path-helper, and @sec-testing-advanced-fixture-helper for ideas.
Making it easy to write (and read) examples will greatly improve the quality of your documentation.
Also, remember that examples are not tests.
Examples should be focused on the authentic and typical usage you've designed for and that you want to encourage.
The test suite is the more appropriate place to exhaustively exercise all of the arguments and to explore weird, pathological edge cases.
### Leave the world as you found it
Your examples should be self-contained.
For example, this means:
- If you modify `options()`, reset them at the end of the example.
- If you create a file, create it somewhere in `tempdir()`, and make sure to delete it at the end of the example.
- Don't change the working directory.
- Don't write to the clipboard (unless a user is present to provide some form of consent).
This has a lot of overlap with our recommendations for tests (see section @sec-testing-design-self-contained) and even for the R functions in your package (see section @sec-code-r-landscape).
However, due to the way that examples are run during `R CMD check`, the tools available for making examples self-contained are much more limited.
Unfortunately, you can't use the withr package or even `on.exit()` to schedule clean up, like restoring options or deleting a file.
Instead, you'll need to do it by hand.
If you can avoid doing something that must then be undone, that is the best way to go and this is especially true for examples.
These constraints are often in tension with good documentation, if you're trying to document a function that somehow changes the state of the world.
For example, you have to "show your work", i.e. all of your code, which means that your users will see all of the setup and teardown, even if it is not typical for authentic usage.
If you're finding it hard to follow the rules, this might be another sign to switch to a vignette (see @sec-vignettes).
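Cleaning up by hand might look like the following sketch. `write_greeting()` is a hypothetical function invented for illustration. The example writes only inside the session's temporary directory, and restores the option and deletes the file explicitly:

```{r}
#' Write a greeting to a file
#'
#' `write_greeting()` is a hypothetical function, used only for illustration.
#'
#' @param path Path of the file to write.
#' @returns `path`, invisibly.
#' @examples
#' # Write to the session's temporary directory, not the working directory
#' path <- tempfile(fileext = ".txt")
#' write_greeting(path)
#' readLines(path)
#'
#' # Change an option, keeping the old value so it can be restored
#' old <- options(digits = 3)
#' pi
#'
#' # Clean up by hand: restore the option and delete the file
#' options(old)
#' unlink(path)
write_greeting <- function(path) {
  writeLines("Hello!", path)
  invisible(path)
}
```

Note that the reader sees every line of setup and teardown, which is exactly the cost described above.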
::: callout-warning
## Submitting to CRAN
Many of these constraints are also mentioned in the [CRAN repository policy](https://cran.r-project.org/web/packages/policies.html), which you must adhere to when submitting to CRAN.
Use find in page to locate "malicious or anti-social" to see the details.
:::
Additionally, you want your examples to send the user on a short walk, not a long hike.
Examples need to execute relatively quickly, so that users see the results promptly, your website doesn't take ages to build, automated checks finish quickly, and you don't consume excessive computing resources when submitting to CRAN.
::: callout-warning
## Submitting to CRAN
All examples must run in under 10 minutes.
:::
### Errors {#sec-man-examples-errors}
Your examples cannot throw any errors, so don't include flaky code that can fail for reasons beyond your control.
In particular, it's best to avoid accessing websites, because `R CMD check` will fail whenever the website is down.
What can you do if you want to include code that causes an error for the purposes of teaching?
There are two basic options:
- You can wrap the code in `try()` so that the error is shown, but doesn't stop execution of the examples.
For example, `dplyr::bind_cols()` uses `try()` to show you what happens if you attempt to column-bind two data frames with different numbers of rows:
```{r}
#' @examples
#' ...
#' # Row sizes must be compatible when column-binding
#' try(bind_cols(tibble(x = 1:3), tibble(y = 1:2)))
```
- You can wrap the code in `\dontrun{}`[^man-12], so it is never run by `example()`. The example above would look like this if you used `\dontrun{}` instead of `try()`.
```{r}
#' # Row sizes must be compatible when column-binding
#' \dontrun{
#'   bind_cols(tibble(x = 1:3), tibble(y = 1:2))
#' }
```
[^man-12]: You used to be able to use `\donttest{}` for a similar purpose, but we no longer recommend it because CRAN sets a special flag that causes the code to be executed anyway.
We generally recommend using `try()` so that the reader can see an example of the error in action.
::: callout-warning
## Submitting to CRAN
For the initial CRAN submission of your package, all functions must have at least one example and the example code can't all be wrapped inside `\dontrun{}`.
If the code can only be run under specific conditions, use the techniques below to express those pre-conditions.
:::
### Dependencies and conditional execution {#sec-man-examples-dependencies-conditional-execution}
An additional source of errors in examples is the use of external dependencies: you can only use packages in your examples that your package formally depends on (i.e. that appear in `Imports` or `Suggests`).
Furthermore, example code is run in the user's environment, not the package environment, so you'll have to either explicitly attach the dependency with `library()` or refer to each function with `::`.
For example, dbplyr is a dplyr extension package, so all of its examples start with `library(dplyr)`:
```{r}
#' @examples
#' library(dplyr)
#' df <- data.frame(x = 1, y = 2)
#'
#' df_sqlite <- tbl_lazy(df, con = simulate_sqlite())
#' df_sqlite %>% summarise(x = sd(x, na.rm = TRUE)) %>% show_query()
```
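If you would rather not attach a package, the `::` form works as well. In this sketch, `count_rows()` is a hypothetical function invented for illustration, and tibble is assumed to be listed in `Suggests`:

```{r}
#' Count the rows of a data frame
#'
#' `count_rows()` is a hypothetical function, used only for illustration.
#'
#' @param df A data frame.
#' @returns An integer vector of length 1.
#' @examples
#' # tibble is assumed to be listed in Suggests; it is never attached
#' df <- tibble::tibble(x = 1:3, y = letters[1:3])
#' count_rows(df)
count_rows <- function(df) {
  nrow(df)
}
```

Using `::` keeps the user's search path unchanged, at the cost of slightly noisier code when many functions from the same package are involved.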
In the past, we recommended only using code from suggested packages inside a block like this:
```{r}
#' @examples
#' if (requireNamespace("suggestedpackage", quietly = TRUE)) {
#' # some example code
#' }
```
We no longer believe that approach is a good idea, because:
- Our policy is to expect that suggested packages are installed when running `R CMD check`[^man-13] and this informs what we do in examples, tests, and vignettes.
- The cost of putting example code inside `{ … }` is high: you can no longer see intermediate results, such as when the examples are rendered in the package's website. The cost of a package not being installed is low: users can usually recognize the associated error and resolve it themselves, i.e. by installing the missing package.
[^man-13]: This is certainly true for CRAN and is true in most other automated checking scenarios, such as our GitHub Actions workflows.
In other cases, your example code may depend on something other than a package.
For example, if your examples talk to a web API, you probably only want to run them for an authenticated user, and never want such code to run on CRAN.
In this case, you really do need conditional execution.
The entry-level solution is to express this explicitly:
```{r}
#' @examples
#' if (some_condition()) {
#' # some example code
#' }
```
The condition could be quite general, such as `interactive()`, or very specific, such as a custom predicate function provided by your package.
But this use of `if()` still suffers from the downside highlighted above, where the rendered examples don't clearly show what's going on inside the `{ … }` block.
The `@examplesIf` tag is a great alternative to `@examples` in this case:
```{r}
#' @examplesIf some_condition()
#' some_other_function()
#' some_more_functions()
```
This looks almost like the snippet just above, but has several advantages:
- Users won't actually see the `if() { … }` machinery when they are reading your documentation from within R or on a pkgdown website.
Users only see realistic code.
- The example code renders fully in pkgdown.
- The example code runs when it should and does not run when it should not.
- This doesn't run afoul of CRAN's prohibition of putting all your example code inside `\dontrun{}`.
For example, [googledrive](https://googledrive.tidyverse.org/reference/index.html) uses `@examplesIf` in almost every function, guarded by `googledrive::drive_has_token()`.
Here's how the examples for `googledrive::drive_publish()` begin:
```{r}
#' @examplesIf drive_has_token()
#' # Create a file to publish
#' file <- drive_example_remote("chicken_sheet") %>%
#' drive_cp()
#'
#' # Publish file
#' file <- drive_publish(file)
#' ...
```
The example code doesn't run on CRAN, because there's no token.
It does run when the pkgdown site is built, because we can set up a token securely.
And, if a normal user executes this code, they'll be prompted to sign in to Google, if they haven't already.
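A custom predicate of this kind can be very small. The sketch below is not how googledrive implements `drive_has_token()`. It assumes a hypothetical package that reads its credentials from an invented environment variable, `MYPKG_API_KEY`, and both function names are made up for illustration:

```{r}
#' Is an API key available?
#'
#' A hypothetical predicate. `MYPKG_API_KEY` is an invented environment
#' variable name.
#'
#' @returns `TRUE` or `FALSE`.
#' @export
mypkg_has_key <- function() {
  # Sys.getenv() returns "" when the variable is unset
  nzchar(Sys.getenv("MYPKG_API_KEY"))
}

#' Describe the current credentials
#'
#' `mypkg_whoami()` is a hypothetical function, used only for illustration.
#'
#' @returns A string.
#' @export
#' @examplesIf mypkg_has_key()
#' mypkg_whoami()
mypkg_whoami <- function() {
  if (!mypkg_has_key()) {
    stop("No API key found. Set the MYPKG_API_KEY environment variable.")
  }
  "Authenticated via MYPKG_API_KEY"
}
```

Exporting the predicate matters here: the condition in `@examplesIf` is evaluated wherever the examples run, so it has to be callable from outside the package.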
### Intermixing examples and text
An alternative to examples is to use R Markdown code blocks elsewhere in your roxygen comments, either ```` ```R ```` if you just want to show some code, or ```` ```{r} ```` if you want the code to be run.
These can be effective techniques but there are downsides to each:
- The code in ```` ```R ```` blocks is never run; this means it's easy to accidentally introduce syntax errors or to forget to update it when your package changes.
- The code in ```` ```{r} ```` blocks is run every time you document the package. This has the nice advantage of including the output in the documentation (unlike examples), but the code can't take very long to run or your iterative documentation workflow will become quite painful.
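Here is a sketch of the second form, assuming Markdown support is turned on for the package. `add()` is a hypothetical function invented for illustration, and the chunk deliberately runs only base R code:

```{r}
#' Add two numbers
#'
#' `add()` is a hypothetical function, used only for illustration.
#'
#' @details
#' The chunk below is evaluated each time the package is documented, and
#' both the code and its output appear in the help topic:
#'
#' ```{r}
#' 1 + 2
#' ```
#'
#' @param x,y Numeric vectors.
#' @returns A numeric vector.
add <- function(x, y) {
  x + y
}
```

Keeping the chunk this cheap is what makes the approach workable, since it is re-executed on every call to `devtools::document()`.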
## Re-using documentation
roxygen2 provides a number of features that allow you to reuse documentation across topics.
They are documented in `vignette("reuse", package = "roxygen2")`, so here we'll focus on the three most important:
- Documenting multiple functions in one topic.
- Inheriting documentation from another topic.
- Using child documents to share prose between topics, or to share between documentation topics and vignettes.
### Multiple functions in one topic {#sec-man-multiple-functions}
By default, each function gets its own documentation topic, but if two functions are very closely connected, you can combine the documentation for multiple functions into a single topic.
For example, take `str_length()` and `str_width()`, which provide two different ways of computing the size of a string.
As you can see from the description, both functions are documented together, because this makes it easier to see how they differ:
```{r}
#' The length/width of a string
#'
#' @description
#' `str_length()` returns the number of codepoints in a string. These are
#' the individual elements (which are often, but not always letters) that
#' can be extracted with [str_sub()].
#'
#' `str_width()` returns how much space the string will occupy when printed
#' in a fixed width font (i.e. when printed in the console).
#'
#' ...
str_length <- function(string) {
...
}
```
To merge the two topics, `str_width()` uses `@rdname str_length` to add its documentation to an existing topic:
```{r}
#' @rdname str_length
str_width <- function(string) {
...
}
```
This technique works best for functions that have a lot in common, i.e. similar return values and examples, in addition to similar arguments.
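A complete sketch of a shared topic, using two hypothetical functions that are not part of any real package, shows how the arguments, return value, and examples can all live in the first block:

```{r}
#' Convert between temperature scales
#'
#' @description
#' `f_to_c()` converts degrees Fahrenheit to degrees Celsius.
#'
#' `c_to_f()` converts degrees Celsius to degrees Fahrenheit.
#'
#' @param x A numeric vector of temperatures.
#' @returns A numeric vector the same length as `x`.
#' @examples
#' f_to_c(c(32, 212))
#' c_to_f(c(0, 100))
f_to_c <- function(x) {
  (x - 32) * 5 / 9
}

#' @rdname f_to_c
c_to_f <- function(x) {
  x * 9 / 5 + 32
}
```

Both functions take the same argument and return the same kind of object, so a single `@param`, `@returns`, and `@examples` serves the whole topic.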
### Inheriting documentation
In other cases, functions in a package might share many related behaviors, but aren't closely enough connected that you want to document them together.
We've discussed `@inheritParams` above, but there are three variations that allow you to inherit other things:
- `@inherit source_function` will inherit all supported components from `source_function()`.
- `@inheritSection source_function Section title` will inherit the single section with title "Section title" from `source_function()`.
- `@inheritDotParams` automatically generates parameter documentation for `...` for the common case where you pass `...` on to another function.
See <https://roxygen2.r-lib.org/articles/reuse.html#inheriting-documentation> for more details.
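The three tags can be sketched together as follows. All three functions are hypothetical and exist only to show where each tag goes; the optional component names after `@inherit` restrict what is inherited:

```{r}
#' Mean of a numeric vector
#'
#' @param x A numeric vector.
#' @param na.rm If `TRUE`, remove missing values first.
#' @returns A numeric vector of length 1.
#'
#' @section Missing values:
#' By default, the result is `NA` if `x` contains any missing values.
my_mean <- function(x, na.rm = FALSE) {
  mean(x, na.rm = na.rm)
}

#' Median of a numeric vector
#'
#' @inherit my_mean params return
#' @inheritSection my_mean Missing values
my_median <- function(x, na.rm = FALSE) {
  stats::median(x, na.rm = na.rm)
}

#' Mean of rounded values
#'
#' @param x A numeric vector.
#' @param digits Number of decimal places to round to.
#' @inheritDotParams my_mean -x
#' @returns A numeric vector of length 1.
my_rounded_mean <- function(x, digits = 0, ...) {
  # Everything in `...` (here, only `na.rm`) is passed on to my_mean()
  my_mean(round(x, digits), ...)
}
```

In `my_rounded_mean()`, `-x` excludes `x` from the generated `...` documentation, because that argument is already documented locally.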
### Child documents
Finally, you can reuse the same `.Rmd` or `.md` document in the function documentation, `README.Rmd`, and vignettes by using R Markdown child documents.
The syntax looks like this:
```{r}
#' ```{r child = "man/rmd/filename.Rmd"}
#' ```
```
This is a feature we use very sparingly in the tidyverse, but one place we do use it is in dplyr, because a number of functions use the same syntax as `select()` and we want to provide all the info in one place:
```{r}
#' # Overview of selection features
#'
#' ```{r, child = "man/rmd/overview.Rmd"}
#' ```
```
Then `man/rmd/overview.Rmd` contains the repeated markdown:
``` md
Tidyverse selections implement a dialect of R where operators make
it easy to select variables:
- `:` for selecting a range of consecutive variables.
- `!` for taking the complement of a set of variables.
- `&` and `|` for selecting the intersection or the union of two
sets of variables.
- `c()` for combining selections.
...
```
If the Rmd file contains roxygen (Markdown-style) links to other help topics, then some care is needed.
See <https://roxygen2.r-lib.org/dev/articles/reuse.html#child-documents> for details.
## Help topic for the package {#sec-man-package-doc}
This chapter focuses on function documentation, but remember you can document other things, as detailed in `vignette("rd-other", package = "roxygen2")`.
In particular, you can create a help topic for the package itself by documenting the special sentinel `"_PACKAGE"`.
The resulting `.Rd` file automatically pulls in information parsed from the `DESCRIPTION`, including title, description, list of authors, and useful URLs.
This help topic appears alongside all your other topics and can also be accessed with `package?pkgname`, e.g. `package?usethis`, or even just `?usethis`.
We recommend calling `usethis::use_package_doc()` to set up this package-level documentation in a dummy file `R/{pkgname}-package.R`, whose contents will look something like this:
```{r, eval = FALSE}
#' @keywords internal
"_PACKAGE"
```
The `R/{pkgname}-package.R` file is the main reason we wanted to mention `use_package_doc()` and package-level documentation here.
It turns out there are a few other package-wide housekeeping tasks for which this file is a very natural home.
For example, it's a sensible, central location for import directives, i.e. for importing individual functions from your dependencies or even entire namespaces.
In @sec-dependencies-in-imports-r-code, we recommend importing specific functions via `usethis::use_import_from()` and this function is designed to write the associated roxygen tags into the `R/{pkgname}-package.R` file created by `use_package_doc()`.
So, putting it all together, this is a minimal example of how the `R/{pkgname}-package.R` file might look:
```{r, eval = FALSE}
#' @keywords internal
"_PACKAGE"
# The following block is used by usethis to automatically manage
# roxygen namespace tags. Modify with care!
## usethis namespace: start
#' @importFrom glue glue_collapse
## usethis namespace: end
NULL
```