Packages are the way to share R code in a structured, reproducible and traceable way. Even if the intent is not to disseminate your code, packaging it is worth it. Packages provide a mechanism for loading optional code and its attached documentation as needed. Packaging your code lets you:
- logically group your own functions
- keep code and documentation together and consistent
- keep code and data together
- keep track of changes in code
- summarise all packages used for an analysis (see sessionInfo())
- make a reproducible research compendium (container for code, text, data as a means for distributing, managing and updating)
- optionally test your code
- ... project management
References:
- R packages, by Hadley Wickham
- R Installation and Administration [R-admin], R Core team
- Writing R Extensions [R-ext], R Core team
Use help.start() to access the R Core manuals from your local installation, or http://cran.r-project.org/manuals.html from the web.
A package is loaded from a library by the function library(). A library is thus a directory containing installed packages. Calling
library("foo", lib.loc = "/path/to/bar")
loads the package (book) foo from the library bar located at /path/to/bar.
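The library/package relationship can be observed with a few lines of base R. This sketch assumes only a standard R installation: .Library is the library that ships with R, and stats is used merely because it is always installed there.

```r
## Libraries (directories of installed packages) searched by library(),
## in order of precedence
.libPaths()

## .Library is the library that ships with R; "stats" is one of its packages
"stats" %in% list.files(.Library)

## An installed package is itself a directory inside a library
file.exists(file.path(.Library, "stats", "DESCRIPTION"))

## Same as library("stats"), but restricted to that one library
library("stats", lib.loc = .Library)
```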
The examples below use the devtools and roxygen2 packages:
library("devtools")
## Loading required package: usethis
library("roxygen2")
Basic workflow
1. Prepare R code (see tip below)
2. Create package directory: myRpackage
3. Build the package tarball
4. Check the package
5. Install the package
Step 2 is done only once. Package development cycles through steps 3 to 5.
Other aspects of package development:
- Writing package documentation
- Vignettes
- Testing packages
- Compiled code
Tip: do not wait until there is too much code to write a package. The ideal project starts with planning, not with a dive straight into coding. In the context of this course, this translates into creating a package that will, as the project evolves, be populated with new functions, classes, methods, data, ... and their documentation.
fn <- function()
message("I love R packages")
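To create the package directory, we can:
- use package.skeleton("myRpackage", list = "fn"), which creates a package skeleton containing the listed R objects (here fn), with template DESCRIPTION, NAMESPACE and documentation files;
- use devtools::create("myRpackage"), which creates a fully functional package with an .Rproj file;
- use the RStudio wizard: New Project > New Directory > R Package.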
myRpackage/
|-- DESCRIPTION
|-- NAMESPACE
|-- man
| `-- fun.Rd
`-- R
`-- fun.R
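A skeleton like the one above can be generated with package.skeleton(). This sketch writes it to a temporary directory (an assumption made to avoid cluttering the working directory). The exact file set may vary between R versions: the function files are named after the object (fn.R and fn.Rd rather than fun.R and fun.Rd), and recent versions also add a package-level Rd page and a Read-and-delete-me file.

```r
## The function to be packaged, as defined in the text
fn <- function() message("I love R packages")

## Create the skeleton under tempdir() rather than the working directory;
## environment() is where package.skeleton() looks up the objects in 'list'
root <- tempfile("skeleton")
dir.create(root)
utils::package.skeleton("myRpackage", list = "fn",
                        environment = environment(), path = root)

## List the generated files, relative to the package directory
pkg <- file.path(root, "myRpackage")
list.files(pkg, recursive = TRUE)
```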
This is the source package. From this, we need to create the package tarball (or package bundle), i.e. a compressed archive of the source. We can also create binary packages for Windows and Mac.
Tip: While not directly related to package development, it is also useful to think about collaboration and dissemination at this stage already. A good solution is to use GitHub, Bitbucket, GitLab or similar online code versioning infrastructures. Such an online repository (which can be private at first, if necessary) will enable collaboration (through issues), tracking (code versioning) and, once public, installation using devtools::install_github(). When using such a repository, it is also very useful to add a README.md file, which will be used as the default landing page for public repositories.
In the shell:
R CMD build myRpackage                ## creates myRpackage_1.0.tar.gz
R CMD check myRpackage_1.0.tar.gz     ## creates myRpackage.Rcheck
R CMD INSTALL myRpackage_1.0.tar.gz   ## installs in the default library
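The same build and install steps can also be driven from within R, which makes the role of lib.loc explicit. This is a minimal sketch under stated assumptions: it writes a hypothetical toy package by hand (the DESCRIPTION values and e-mail address are placeholders), runs R CMD build and R CMD INSTALL through system2(), and installs into a private temporary library instead of the default one. R CMD check is left out because it takes longer, but would be called the same way.

```r
## Write a hypothetical toy package by hand (placeholder metadata)
make_toy_package <- function(root) {
  pkg <- file.path(root, "myRpackage")
  dir.create(file.path(pkg, "R"), recursive = TRUE)
  writeLines(c("Package: myRpackage",
               "Title: A Toy Package",
               "Version: 1.0",
               "Author: A Developer",
               "Maintainer: A Developer <a.developer@example.org>",
               "Description: Illustrates building and installing a package.",
               "License: GPL-3"),
             file.path(pkg, "DESCRIPTION"))
  writeLines("export(fn)", file.path(pkg, "NAMESPACE"))
  writeLines('fn <- function() message("I love R packages")',
             file.path(pkg, "R", "fn.R"))
  pkg
}

## Run "R CMD ..." with the R installation currently in use
r_cmd <- function(...) {
  system2(file.path(R.home("bin"), "R"), c("CMD", ...),
          stdout = TRUE, stderr = TRUE)
}

root <- tempfile("pkgdev")
dir.create(root)
make_toy_package(root)

## R CMD build writes the tarball into the current working directory
owd <- setwd(root)
build_log <- r_cmd("build", "myRpackage")
setwd(owd)
tarball <- file.path(root, "myRpackage_1.0.tar.gz")

## R CMD INSTALL -l installs into the given library, not the default one
lib <- file.path(root, "lib")
dir.create(lib)
install_log <- r_cmd("INSTALL", "-l", shQuote(lib), shQuote(tarball))

## Load from that library, as with library("foo", lib.loc = "/path/to/bar")
library("myRpackage", lib.loc = lib)
fn()
```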
Useful RStudio keyboard shortcuts for package authoring:
- Build and Reload Package: Ctrl + Shift + B
- Check Package: Ctrl + Shift + E
- Test Package: Ctrl + Shift + T
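Using devtools:
devtools::build()                ## source package tarball
devtools::build(binary = TRUE)   ## binary package for the current platform
devtools::check()                ## builds and checks the package (R CMD check)
devtools::install()              ## installs the package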
A shortcut when developing:
devtools::load_all()
that will load/update all the code in the R package into the package's namespace (environment).
The DESCRIPTION file
Package: myRpackage ## mandatory (*)
Type: Package ## optional, 'Package' is default type
Title: What the package does (short line) ## *
Version: 1.0 ## *
Date: 2013-05-10 ## release date of the current version
Author: Who wrote it ## *
Maintainer: Who to complain to <yourfault@somewhere.net> ## *
Description: More about what it does (maybe more than one line) ## *
License: What license is it under? ## *
Depends: methods, Biostrings ## for ex
Imports: evd ## for ex
Suggests: BSgenome.Hsapiens.UCSC.hg19 ## for ex
Collate: 'DataClasses.R' 'read.R' ## for ex
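The DESCRIPTION file is written in the Debian Control File (DCF) format, so it can be read back with read.dcf(). The sketch below uses placeholder values and, unlike the annotated listing above, has no trailing ## annotations, which would otherwise simply become part of the field values. The list of required fields mirrors the fields marked (*) above.

```r
## A minimal DESCRIPTION with placeholder values (no "##" annotations)
desc_lines <- c(
  "Package: myRpackage",
  "Type: Package",
  "Title: What the Package Does",
  "Version: 1.0",
  "Author: Who wrote it",
  "Maintainer: Who to complain to <yourfault@somewhere.net>",
  "Description: More about what it does, possibly",
  "    spanning several (indented) continuation lines.",
  "License: GPL-3",
  "Imports: stats"
)
desc_file <- tempfile("DESCRIPTION")
writeLines(desc_lines, desc_file)

## read.dcf() returns a character matrix with one column per field
desc <- read.dcf(desc_file)
colnames(desc)

## Fields marked (*) above; character(0) means none is missing
required <- c("Package", "Title", "Version", "Author", "Maintainer",
              "Description", "License")
setdiff(required, colnames(desc))

## Versions are compared with package_version(), not as plain strings
package_version(desc[, "Version"]) >= "1.0"

## The same information is available for any installed package
packageDescription("stats")$Package
```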
Package dependencies:
- Depends: A comma-separated list of package names (optionally with versions) which this package depends on.
- Suggests: Packages that are not necessarily needed: used only in examples, tests or vignettes, or loaded in the body of functions.
- Imports: Packages whose namespaces are imported from (as specified in the NAMESPACE file) but which do not need to be attached to the search path.
- Collate: Controls the collation order for the R code files in a package. If the field is present, all source files must be listed.
Packages are attached to the search path with library() or require().
Tip: In your scripts, always use library() rather than require(). The former stops immediately with an error in case of a missing package. The latter only issues a warning (and returns FALSE), which will lead to issues later on. require() is generally used inside functions, with an if condition:
try(library("not_a_package"))
## Error in library("not_a_package") :
## there is no package called 'not_a_package'
require("not_a_package")
## Loading required package: not_a_package
## Warning in library(package, lib.loc = lib.loc, character.only = TRUE,
## logical.return = TRUE, : there is no package called 'not_a_package'
if (!require("not_a_package")) {
message("Package not available")
## possibly do something here
stop("Evacuate ship!")
}
## Loading required package: not_a_package
## Warning in library(package, lib.loc = lib.loc, character.only = TRUE,
## logical.return = TRUE, : there is no package called 'not_a_package'
## Package not available
## Error in eval(expr, envir, enclos): Evacuate ship!
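Inside functions, an alternative commonly used in package code is requireNamespace(), which loads a package without attaching it and returns FALSE instead of failing when the package is missing; the package's functions are then called with the :: operator. This is a different technique from the require() pattern above. In this sketch, has_package() and safe_median() are hypothetical names, and stats merely stands in for an optional package that would normally be listed under Suggests.

```r
## Hypothetical helper: TRUE if the namespace of 'pkg' can be loaded.
## requireNamespace() loads without attaching and returns FALSE
## (instead of throwing an error) when the package is not installed.
has_package <- function(pkg) requireNamespace(pkg, quietly = TRUE)

## Hypothetical function relying on an optional package
safe_median <- function(x) {
  if (!has_package("stats"))
    stop("Package 'stats' is required for safe_median(); please install it.")
  stats::median(x)  # call through the namespace; nothing is attached
}

has_package("not_a_package")  # returns FALSE
safe_median(c(3, 1, 2))       # returns 2
```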
- Attach: When a package is attached, all of its dependencies (see the Depends field in its DESCRIPTION file) are also attached. Such packages are part of the evaluation environment and will be searched.
- Load: One can also use the Imports field in the DESCRIPTION file (together with the corresponding import directives in the NAMESPACE file). Imported packages are loaded but are not attached: they do not appear on the search path and are available only to the package that imported them.
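The difference between loading and attaching can be observed directly. This base-R sketch uses the tools package, an arbitrary choice: it ships with R and is not attached in a default session.

```r
## Load the tools namespace without attaching it
ns <- loadNamespace("tools")
"tools" %in% loadedNamespaces()   # TRUE: the namespace is loaded
"package:tools" %in% search()     # usually FALSE: not on the search path

## Functions of a loaded but unattached package are reached with ::
tools::file_ext("myRpackage_1.0.tar.gz")   # "gz"

## Attaching (here with library()) adds the package to the search path
library("tools")
"package:tools" %in% search()     # TRUE
detach("package:tools")
```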
The NAMESPACE file restricts the symbols that are exported and imports functionality from other packages. Only the exported symbols have to be documented.
export(f, g)               ## exports f and g
exportPattern("^[^\\.]")   ## exports all symbols that do not start with a dot
import(foo)                ## imports all symbols from package foo
importFrom(foo, f, g)      ## imports f and g from foo
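To make the exportPattern() directive concrete, this sketch applies the same regular expression to a few hypothetical object names, then lists the exports of an installed package with getNamespaceExports():

```r
## Hypothetical object names defined in a package's R code
objs <- c("fn", "readFasta", ".helper", ".onLoad")

## Names matched by exportPattern("^[^\\.]"), i.e. not starting with a dot
objs[grepl("^[^\\.]", objs)]   # "fn" "readFasta"

## Exports of an installed package, read from its namespace
stats_exports <- getNamespaceExports("stats")
"median" %in% stats_exports    # TRUE
```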
It is possible to explicitly use symbol s from package foo with foo::s, or with foo:::s if s is not exported.
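A short sketch of both operators, using stats for illustration. Because ::: also works for exported symbols, median is used so that the example runs on any installation; the last lines show that a namespace contains more symbols than it exports.

```r
## Exported symbol: foo::s
stats::median(c(5, 1, 3))   # 3

## foo:::s looks the name up in the namespace, exported or not
identical(stats:::median, get("median", envir = asNamespace("stats")))  # TRUE

## Symbols defined in the namespace but not exported (internal objects)
internal <- setdiff(ls(asNamespace("stats"), all.names = TRUE),
                    getNamespaceExports("stats"))
length(internal) > 0        # TRUE
```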
The roxygen2 package (see below) can also be used to manage the namespace.
The R directory contains source()-able R source code to be installed. Files must start with an ASCII (lower or upper case) letter or digit and have one of the extensions .R, .S, .q, .r, or .s (use .R or .r).
- General style guidelines and best practice apply.
- Any number of files in R.
- Any number of functions (methods, classes) in each source file.
- Order matters (to some extent), as the files will be sourced in alphanumeric order. If that doesn't fit, use the Collate field in the DESCRIPTION file.
Example
## works fine without Collate field
AllGenerics.R DataClasses.R
methods-ClassA.R methods-ClassB.R
functions-ClassA.R ...
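The file naming rule and the default ordering can be checked with a few lines of base R. is_valid_r_file() is a hypothetical helper encoding the rule stated above. The ordering uses sort(method = "radix"), which sorts strings in C-locale (byte) order; treating this as R's collation order for package code is an assumption for illustration (Writing R Extensions has the exact rules). Note how AllGenerics.R and DataClasses.R, starting with upper-case letters, come first.

```r
## Hypothetical helper: name starts with an ASCII letter or digit and
## ends in .R, .S, .q, .r or .s
is_valid_r_file <- function(files) grepl("^[A-Za-z0-9].*\\.[RSqrs]$", files)

is_valid_r_file(c("fn.R", "zzz.R", ".hidden.R", "_utils.R", "notes.txt"))
## TRUE TRUE FALSE FALSE FALSE

## The example files above, sorted in byte order (upper case before lower case)
files <- c("zzz.R", "methods-ClassA.R", "AllGenerics.R", "DataClasses.R",
           "functions-ClassA.R", "methods-ClassB.R")
sort(files, method = "radix")
```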
zzz.R is generally used to define special functions used to initialize (called after a package is loaded and attached) and clean up (just before the package is detached). See help(".onLoad") for more details; that page also documents .onAttach(), .onDetach(), .onUnload() and .Last.lib().
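A minimal, hypothetical zzz.R could look as follows. The option name myRpackage.verbose is an assumption made for illustration; the hook names and arguments are those documented in help(".onLoad"). Outside a package the hooks are never called automatically, so the last lines call them by hand only to show their effect.

```r
## zzz.R (hypothetical) for a package called myRpackage

## Called when the namespace is loaded: set a default option without
## overriding a value the user may already have set
.onLoad <- function(libname, pkgname) {
  if (is.null(getOption("myRpackage.verbose")))
    options(myRpackage.verbose = FALSE)
  invisible()
}

## Called when the package is attached, e.g. by library(); startup messages
## should use packageStartupMessage() so that users can silence them with
## suppressPackageStartupMessages()
.onAttach <- function(libname, pkgname) {
  packageStartupMessage("Welcome to ", pkgname)
}

## Called when the namespace is unloaded: remove the option again
.onUnload <- function(libpath) {
  options(myRpackage.verbose = NULL)
  invisible()
}

## Outside a package, call the hooks by hand to see their effect
.onLoad(libname = tempdir(), pkgname = "myRpackage")
getOption("myRpackage.verbose")   # FALSE
.onAttach(libname = tempdir(), pkgname = "myRpackage")
.onUnload(libpath = tempdir())
```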
- vignettes directory for vignettes in Sweave or R markdown format.
- data for R code, compressed tables (.tab, .txt, or .csv, see ?data for the file formats) and binary R objects. Available with data().
- inst/doc for additional documentation. That's also where the vignettes will be installed after compilation.
- inst/extdata directory for other data files, not belonging in data.
- tests code for unit tests.
- src for compiled code (see the Rcpp material).
- demo for demo code (see ?demo).
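Files shipped in inst/extdata and datasets in data are typically accessed as in this sketch. The package myRpackage and its file example.fasta are hypothetical, so system.file() returns an empty string for them; the datasets package, which ships with R, is used to demonstrate data().

```r
## Files from inst/extdata end up in extdata/ of the installed package;
## system.file() returns "" when the package or the file cannot be found
fasta <- system.file("extdata", "example.fasta", package = "myRpackage")
if (nzchar(fasta)) {
  readLines(fasta, n = 2)
} else {
  message("example.fasta not found (myRpackage is not installed)")
}

## The same mechanism locates any file of an installed package
system.file("DESCRIPTION", package = "stats")

## Datasets from data/ are loaded with data(); loading them into their own
## environment keeps the global workspace clean
e <- new.env()
data("mtcars", package = "datasets", envir = e)
ls(e)   # "mtcars"
```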
Tip: While nothing prevents you from creating the directories above manually (when they are needed), it is also possible to use usethis, a workflow package that automates some of the repetitive tasks that arise during project setup and development of R packages (and non-package projects).
Package functions, datasets, methods and classes are documented in Rd, a LaTeX-like format. Below is an excerpt of the Rd file documenting the base load() function:
% File src/library/base/man/load.Rd
% Part of the R package, http://www.R-project.org
% Copyright 1995-2014 R Core Team
% Distributed under GPL 2 or later
\name{load}
\alias{load}
\title{Reload Saved Datasets}
\description{
Reload datasets written with the function \code{save}.
}
\usage{
load(file, envir = parent.frame(), verbose = FALSE)
}
\arguments{
\item{file}{a (readable binary-mode) \link{connection} or a character string
giving the name of the file to load (when \link{tilde expansion}
is done).}
\item{envir}{the environment where the data should be loaded.}
\item{verbose}{should item names be printed during loading?}
}
\details{
\code{load} can load \R objects saved in the current or any earlier
format. It can read a compressed file (see \code{\link{save}})
directly from a file or from a suitable connection (including a call
to \code{\link{url}}).
[...]
\value{
A character vector of the names of objects created, invisibly.
}
\section{Warning}{
Saved \R objects are binary files, even those saved with
\code{ascii = TRUE}, so ensure that they are transferred without
conversion of end of line markers. \code{load} tries to detect such a
conversion and gives an informative error message.
[...]
\examples{
## save all data
xx <- pi # to ensure there is some data
save(list = ls(all = TRUE), file= "all.RData")
rm(xx)
## restore the saved values to the current environment
local({
load("all.RData")
ls()
})
xx <- exp(1:3)
## restore the saved values to the user's workspace
load("all.RData") ## which is here *equivalent* to
## load("all.RData", .GlobalEnv)
## This however annihilates all objects in .GlobalEnv with the same names !
xx # no longer exp(1:3)
rm(xx)
attach("all.RData") # safer and will warn about masked objects w/ same name in .GlobalEnv
ls(pos = 2)
## also typically need to cleanup the search path:
detach("file:all.RData")
## clean up (the example):
unlink("all.RData")
\dontrun{
con <- url("http://some.where.net/R/data/example.rda")
## print the value to see what objects were created.
print(load(con))
close(con) # url() always opens the connection
}}
\keyword{file}
These R documentation files can then be converted into text, pdf or html:
help("load")
help("load", help_type = "html")
help("load", help_type = "pdf")
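Rd files can also be parsed and rendered programmatically with the tools package. This sketch writes a small, hypothetical Rd file for the fn() function, parses it with tools::parse_Rd() and renders it as plain text with tools::Rd2txt(); tools::Rd2HTML() and tools::Rd2latex() are the html and LaTeX counterparts.

```r
## A minimal, hypothetical Rd file documenting fn()
rd_lines <- c(
  "\\name{fn}",
  "\\alias{fn}",
  "\\title{Print a Message}",
  "\\description{Prints a friendly message using \\code{message}.}",
  "\\usage{fn()}",
  "\\value{Called for its side effect.}",
  "\\examples{fn()}"
)
rd_file <- tempfile(fileext = ".Rd")
writeLines(rd_lines, rd_file)

## Parse the Rd source and report any problems found in it
rd <- tools::parse_Rd(rd_file)
tools::checkRd(rd)

## Render as plain text, without terminal underlining of titles
txt_file <- tempfile(fileext = ".txt")
tools::Rd2txt(rd, out = txt_file, options = list(underline_titles = FALSE))
cat(readLines(txt_file), sep = "\n")
```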
One can use prompt, promptClass, promptMethods, promptPackage and promptData to generate Rd templates for functions, classes, methods, packages and data respectively.
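For example, prompt() can generate the template for a function. The function scale_by() below is hypothetical and the file is written to a temporary location; the generated template still contains placeholder text that has to be edited by hand.

```r
## A hypothetical function to document
scale_by <- function(x, factor = 2) x * factor

## prompt() fills in \name, \alias, \usage and \arguments from the function
## definition; the other sections contain placeholders to edit
rd_file <- tempfile(fileext = ".Rd")
prompt(scale_by, filename = rd_file)

template <- readLines(rd_file)
head(template, 12)
```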
The way documentation is managed in R packages separates the code from the documentation, which makes it harder to keep the latter up to date when the code is updated. Enter roxygen2, which allows developers to write their documentation on top of their functions:
#' Reads sequence data in fasta format and creates \code{DnaSeq}
#' and \code{RnaSeq} instances.
#'
#' This function reads DNA and RNA fasta files and generates
#' valid \code{"DnaSeq"} and \code{"RnaSeq"} instances.
#'
#' @title Read fasta files.
#' @param infile the name of the fasta file which the data are to be read from.
#' @return an instance of \code{DnaSeq} or \code{RnaSeq}.
#' @seealso \code{\linkS4class{GenericSeq}}, \code{\linkS4class{DnaSeq}} and \code{\linkS4class{RnaSeq}}.
#' @examples
#' f <- dir(system.file("extdata",package="sequences"),pattern="fasta",full.names=TRUE)
#' f
#' aa <- readFasta(f)
#' aa
#' @author Laurent Gatto \email{laurent.gatto@@uclouvain.be}
#' @export
readFasta <- function(infile){
lines <- readLines(infile)
header <- grep("^>", lines)
if (length(header)>1) {
warning("Reading first sequence only.")
lines <- lines[header[1]:(header[2]-1)]
header <- header[1]
}
##### (code cut for space reasons) #####
if (validObject(newseq))
return(newseq)
}
The roxygen code can then be parsed and converted to Rd using the roxygen2::roxygenise or devtools::document functions.
Note that the roxygenise function does more than produce documentation (that part is handled by the rd roclet, set with roclets = "rd"). It can also manage your NAMESPACE file and Collate field.
Tip: The Rd2roxygen package can help switch from existing Rd manual pages to roxygen.
Tip: roxygen now also supports markdown: one can use `foo()` instead of \code{foo()}. Markdown is enabled package-wide with Roxygen: list(markdown = TRUE) in the DESCRIPTION file, or per block with the @md tag. The roxygen2md package can help switch existing roxygen headers to markdown syntax.
While manual pages are meant to be specific and technical, vignettes are workflow-type documentation files that provide an overview and/or a use-case demonstrating the package's functionality.
Vignettes can be written in Sweave format (.Rnw extension), supporting R code chunks in LaTeX documents, or in R markdown format (.Rmd extension), combining R code and markdown.
The source document (.Rnw or .Rmd) can be
- woven into tex or md files respectively, using the utils::Sweave (Rnw only) or knitr::knit functions,
- and then converted into pdf or html (Rnw to pdf only, Rmd to either).
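The weaving step can be tried out with base R alone. This sketch writes a tiny, hypothetical .Rnw document to a temporary directory and weaves it with utils::Sweave(), producing a .tex file into which the chunk and its output have been inserted. Turning that .tex file into a pdf (for instance with tools::texi2pdf()) additionally requires a LaTeX installation and is not shown.

```r
## A tiny Sweave document: LaTeX text with one R code chunk
rnw_lines <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "The answer is computed below.",
  "<<>>=",
  "x <- 1 + 1",
  "x",
  "@",
  "\\end{document}"
)

## Weave in a temporary directory; Sweave() writes demo.tex next to demo.Rnw
weave_demo <- function(dir = tempfile("sweave")) {
  dir.create(dir)
  owd <- setwd(dir)
  on.exit(setwd(owd))
  writeLines(rnw_lines, "demo.Rnw")
  utils::Sweave("demo.Rnw", quiet = TRUE)
  readLines("demo.tex")
}

tex <- weave_demo()
tex[grepl("Schunk|\\[1\\]", tex)]   # chunk environment and printed output
```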
RStudio makes it very easy to write and compile Rmd documents (independently of R packages). Within a package, the documents are stored in the vignettes directory and compiled/converted automatically when the package is built.
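Note: if you use knitr and rmarkdown for your vignettes, you'll have to add these dependencies to the Suggests field and specify VignetteBuilder: knitr in the DESCRIPTION file. Each vignette must also declare its engine in its header, e.g. %\VignetteEngine{knitr::rmarkdown}.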
Other useful files:
- .Rbuildignore, with a list of regular expressions matching files/dirs to ignore when building, for example the .Rproj file.
- .Rinstignore, with a list of regular expressions matching files/dirs to ignore when installing.
- CITATION file (see the citation() function).
- README.Rmd/README.md files if you use GitHub (see also the tip above).
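Entries in .Rbuildignore are Perl-like regular expressions, one per line, matched case-insensitively against file and directory paths relative to the package root. This sketch mimics that matching in base R for a few hypothetical files; the three patterns are typical examples (ignoring RStudio project files and a README.Rmd source while keeping the rendered README.md), not a required set.

```r
## Hypothetical .Rbuildignore entries
rbuildignore <- c("^.*\\.Rproj$", "^\\.Rproj\\.user$", "^README\\.Rmd$")

## Paths relative to the package root
files <- c("myRpackage.Rproj", ".Rproj.user", "R/fn.R",
           "README.Rmd", "README.md", "DESCRIPTION")

## A path is excluded from the tarball if any pattern matches it
ignored <- Reduce(`|`, lapply(rbuildignore, grepl, x = files,
                              perl = TRUE, ignore.case = TRUE))
files[ignored]   # "myRpackage.Rproj" ".Rproj.user" "README.Rmd"
```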
Package distribution:
- CRAN: Read the CRAN Repository Policy (http://cran.r-project.org/web/packages/policies.html). Upload your R CMD check --as-cran checked myRpackage_x.y.z.tar.gz to ftp://cran.R-project.org/incoming or using http://CRAN.R-project.org/submit.html. Your package will then be installable with install.packages("myRpackage").
- R-forge: Log in, register a project and wait for acceptance. Then commit your code to the svn repository. Your package will be installable with install.packages using repos = "http://R-Forge.R-project.org" (not recommended).
- GitHub (and Bitbucket): Great for development and for promoting interaction and contributions. Unofficial. Automatic checking is possible through continuous integration services such as travis-ci. Packages can be installed with devtools::install_github (devtools::install_bitbucket).
- Bioconductor: Make sure to satisfy the submission criteria (pass check (and BiocCheck), have a vignette, use S4 if OO, make use of appropriate existing infrastructure, include a NEWS file, must not already be on CRAN, ...). Your package will then be reviewed publicly on GitHub before acceptance. An svn (git very soon) account will then be created. The package will be installable with biocLite("myRpackage") (BiocManager::install("myRpackage") in current Bioconductor releases).
The pkgdown package makes creating package websites straightforward. Highly recommended!