Update changelog, and add remote CLI documentation

benda1997 · Sep 13, 2017 · f96b877 · f96b877
1 parent 049947b
commit f96b877
Show file tree

Hide file tree

Showing 3 changed files with 227 additions and 61 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -22,8 +22,7 @@ For more information about this file see also [Keep a Changelog](http://keepacha
 - SIPNET output netcdf now includes LAI; some variable names changed to match standard
 - Cleanup of leap year logic, using new `PEcAn.utils::days_in_year(year)` function (#801).
 - Replace many hard-coded unit conversions with `udunits2::ud.convert` for consistency, readability, and clarity
-- Bugfixes to remote:
-    - Check that `qsub` step works, and fail loudly if it doesn't
+- Remote execution is more robust to errors in the submission process, not just the actual model execution
 
 ### Added
 - Expanded initial conditions workflow for pool-based models, including PEcAn.data.land::prepare_pools to calculate pools from IC file (to be coupled with write.configs)
@@ -42,8 +41,10 @@ For more information about this file see also [Keep a Changelog](http://keepacha
 ### Changed
 - Clean up directory structure:
     * Move `base` packages (`utils`, `settings`, `db`, `visualizaton`) to a `base` directory, for consistency with `modules` and `models`
-    * Move `logger.*` functions out of the `PEcAn.utils` package and into the `pecan.logger` package
+    * Move `logger.*` functions out of the `PEcAn.utils` package and into the `PEcAn.logger` package
+    * More `remote` functions out of the `PEcAn.utils` package and into their own `PEcAn.remote` package.
 - #1594 shiny/workflowPlot Refactoring of code. `get_workflow_ids` in db/R/query.dplyr.R changed with `ensemble = FALSE`. Also allowing to load all workflow IDs. `load_data_single_run` and `var_names_all` also moved from shiny/workflowPlot/server.R to query.dplyr.R
+- `PEcAn.remote::start.model.runs` has been significantly refactored to be less redundant and more robust
 
 ## [1.5.0] - 2017-07-13
 ### Added

diff --git a/book_source/adv_user_guide_cmd/Enabling-Remote-Execution.Rmd b/book_source/adv_user_guide_cmd/Enabling-Remote-Execution.Rmd
diff --git a/book_source/adv_user_guide_cmd/Remote-execution.Rmd b/book_source/adv_user_guide_cmd/Remote-execution.Rmd
@@ -0,0 +1,223 @@
+## Remote execution
+
+### Introduction
+
+Remote execution allows the user to leverage the power and storage of high performance computing clusters, AWS instances, or specially configured virtual machines, but without leaving their local working environment.
+PEcAn uses remote execution primarily to run ecosystem models.
+
+The infrastructure for remote execution lives in the `PEcAn.remote` package (`base/remote` in the PEcAn repository).
+
+### Basics of SSH
+
+All of the PEcAn remote infrastructure depends on the system `ssh` utility, so it's important to make sure this works before attempting the advanced remote execution functionality in PEcAn.
+
+To connect to a remote server interactively, the command is simply:
+
+```sh
+ssh <username>@<hostname>
+```
+
+For instance, my connection to the BU shared computing cluster looks like:
+
+```sh
+ssh [email protected]
+```
+
+...which will prompt me for my BU password, and, if successful, will drop me into a login shell on the remote machine.
+
+Alternatively to the login shell, `ssh` can be used to execute arbitrary code, whose output will be returned exactly as it would if you ran the command locally.
+For example, the following:
+
+```sh
+ssh [email protected] pwd
+```
+
+...will run the `pwd` command, and return the path to my home directory on the BU SCC.
+The more advanced example below will run some simple R code on the BU SCC and return the output as if it was run locally.
+
+```sh
+ssh [email protected] Rscript -e "seq(1, 10)"
+```
+
+### SSH authentication -- password vs. SSH key
+
+Because this server uses passwords for authentication, this command will then prompt me for my password.
+
+An alternative to password authentication is using SSH keys.
+Under this system, the host machine (say, your laptop, or the PEcAn VM) has to generate a public and private key pair (using the `ssh-keygen` command).
+The private key (by default, a file in `~/.ssh/id_rsa`) lives on the host machine, and should **never** be shared with anyone.
+The public key will be distributed to any remote machines to which you want the host to be able to connect.
+On each remote machine, the public key should be added to a list of authorized keys located in the `~/.ssh/authorized_keys` file (on the remote machine).
+The authorized keys list indicates which machines (technically, which keys -- a single machine, and even a single user, can have many keys) are allowed to connect to it.
+This is the system used by all of the PEcAn servers (`pecan1`, `pecan2`, `test-pecan`).
+
+### SSH tunneling
+
+SSH authentication can be more advanced than indicated above, especially on systems that require dual authentication.
+Even simple password-protection can be tricky in scripts, since (by design) it is fairly difficult to get SSH to accept a password from anything other than the raw keyboard input (i.e. SSH doesn't let you pass passwords as input or arguments, because this exposes your password as plain text).
+
+A convenient and secure way to follow SSH security protocol, but prevent having to go through the full authentication process every time, is to use SSH tunnels (or "sockets", which are effectively synonymous).
+Essentially, an SSH socket is a read- and write-protectected file that contains all of the information about an SSH connection.
+
+To create an SSH tunnel, use a command like the following:
+
+```
+ssh -n -N -f -o ControlMaster=yes -S /path/to/socket/file <username>@<hostname>
+```
+
+If appropriate, this will prompt you for your password (if using password authentication), and then will drop you back to the command line (thanks to the `-N` flag, which runs SSH without executing a command, the `-f` flag, which pushes SSH into the background, and the `-n` flag, which prevents ssh from reading any input).
+It will also create the file `/path/to/socket/file`.
+
+To use this socket with another command, use the `-S /path/to/file` flag, pointing to the same tunnel file you just created.
+
+```
+ssh -S /path/to/socket/file <hostname> <optional command>
+```
+
+This will let you access the server without any sort of authentication step.
+As before, if `<optional command>` is blank, you will be dropped into an interactive shell on the remote, or if it's a command, that command will be executed and the output returned.
+
+To close a socket, use the following:
+
+```
+ssh -S /path/to/socket/file <hostname> -O exit
+```
+
+This will delete the socket file and close the connection.
+Alternatively, a scorched earth approach to closing the SSH tunnel if you don't remember where you put the socket file is something like the following:
+
+```
+pgrep ssh   # See which processes will be killed
+pkill ssh   # Kill those processes
+```
+
+...which will kill all user processes called `ssh`.
+
+### SSH tunnels and PEcAn
+
+Many of the `PEcAn.remote` functions assume that a tunnel is already open.
+If working from the web interface, the tunnel will be opened for you by some under-the-hood PHP and Bash code, but if debugging or working locally, you will have to create the tunnel yourself.
+The best way to do this is to create the tunnel first, outside of R, as described above.
+(In the following examples, I'll use my username `ashiklom` connecting to the `test-pecan` server with a socket stored in `/tmp/testpecan`.
+To follow along, replace these with your own username and designated server, respectively).
+
+```{sh}
+ssh -nNf -o ControlMaster=yes -S /tmp/testpecan [email protected]
+```
+
+Then, in R, create a `host` object, which is just a list containing the elements `name` (hostname) and `tunnel` (path to tunnel file).
+
+```{r}
+my_host <- list(name = "test-pecan.bu.edu", tunnel = "/tmp/testpecan")
+```
+
+This host object can then be used in any of the remote execution functions.
+
+
+## Basic remote execute functions
+
+The `PEcAn.remote::remote.execute.cmd` function runs a system command on a remote server (or on the local server, if `host$name == "localhost"`).
+
+```{r}
+x <- PEcAn.remote::remote.execute.cmd(host = my_host, cmd = "echo", args = "Hello world")
+x
+```
+
+Note that `remote.execute.cmd` is similar to base R's `system2`, in that the base command (in this case, `echo`) is passed separately from its arguments (`"Hello world"`).
+Note also that the output of the remote command is returned as a character.
+
+For R code, there is a special wrapper around `remote.execute.cmd` -- `PEcAn.remote::remote.execute.R`, which runs R code on a remote and returns the output.
+R code can be passed in as a list of strings...
+
+```{r}
+code <- c("x <- 2:4", "y <- 3:1", "dput(x ^ y)")
+```
+
+...as a single string...
+
+```{r}
+code <- "
+    x <- 2:4
+    y <- 3:1
+    dput(x ^ y)
+"
+```
+
+...or as an unevaluated R expression (generated by the base R `quote` function).
+
+```{r}
+code <- quote({
+    x <- 2:4
+    y <- 3:1
+    dput(x ^ y)
+})
+out <- PEcAn.remote::remote.execute.R(code = code, host = my_host)
+out
+```
+
+Note that the `dput()` statement at the end is required to properly return output.
+
+For additional functions related to file operations and other stuff, see the `PEcAn.remote` package documentation.
+
+
+## Remote model execution with PEcAn
+
+The workhorse of remote model execution is the `PEcAn.remote::start.model.runs` function, which distributes execution of each run in a list of runs (e.g. multiple runs in an ensemble) to the local machine or a remote based on the configuration in the PEcAn settings.
+
+Broadly, there are three major types of model execution:
+
+- Serialized (`PEcAn.remote::start_serial`) -- This runs models one at a time, directly on the local machine or remote (i.e. same as calling the executables one at a time for each run).
+- Via a queue system, (`PEcAn.remote::start_qsub`) -- This uses a queue management system, such as SGE (e.g. `qsub`, `qstat`) found on the BU SCC machines, to submit jobs.
+  For computationally intensive tasks, this is the recommended way to go.
+- Via a model launcher script (`PEcAn.remote::setup_modellauncher`) -- This is a highly customizable approach where task submission is controlled by a user-provided script (`launcher.sh`).
+
+### XML configuration
+
+The relevant section of the PEcAn XML file is the `<host>` block.
+Here is a minimal example from one of my recent runs:
+
+```
+<host>
+    <name>geo.bu.edu</name>
+    <user>ashiklom</user>
+    <tunnel>/home/carya/output//PEcAn_99000000008/tunnel/tunnel</tunnel>
+</host>
+```
+
+Breaking this down:
+
+- `name` -- The hostname of the machine where the runs will be performed. 
+  Set it to `localhost` to run on the local machine.
+- `user` -- Your username on the remote machine (note that this may be different from the username on your local machine).
+- `tunnel` -- This is the tunnel file for the connection used by all remote execution files. 
+  The tunnel is created automatically by the web interface, but must be created by the user for command line execution.
+
+This configuration will run in serialized mode.
+To use `qsub`, the configuration is slightly more involved:
+
+```
+<host>
+  <name>geo.bu.edu</name>
+  <user>ashiklom</user>
+  <qsub>qsub -V -N @NAME@ -o @STDOUT@ -e @STDERR@ -S /bin/bash</qsub>
+  <qsub.jobid>Your job ([0-9]+) .*</qsub.jobid>
+  <qstat>qstat -j @JOBID@ || echo DONE</qstat>
+  <tunnel>/home/carya/output//PEcAn_99000000008/tunnel/tunnel</tunnel>
+</host>
+```
+
+The additional fields are as follows:
+
+- `qsub` -- The command used to submit jobs to the queue system.
+  Despite the name, this can be any command used for any queue system.
+  The following variables are available to be set here:
+  - `@NAME@` -- Job name to display
+  - `@STDOUT@` -- File to which `stdout` will be redirected
+  - `@STDERR@` -- File to which `stderr` will be redirected
+- `qsub.jobid` -- A regular expression, from which the job ID will be determined.
+  This string will be parsed by R as `jobid <- gsub(qsub.jobid, "\\1", output)` -- note that the first pattern match is taken as the job ID.
+- `qstat` -- The command used to check the status of a job.
+  Internally, PEcAn will look for the `DONE` string at the end, so a structure like `<some command indicating if any jobs are still running> || echo DONE` is required.
+  The `@JOBID@` here is the job ID determined from the `qsub.jobid` parsing.
+
+Documentation for using the model launcher is currently unavailable.