Started vignette on techniques for computation and numerical accuracy…

… in the package.
ianmtaylor1 · Feb 28, 2021 · fb9940c · fb9940c
1 parent 92d27e8
commit fb9940c
Showing 1 changed file with 61 additions and 0 deletions.
diff --git a/vignettes/computational-considerations.Rmd b/vignettes/computational-considerations.Rmd
@@ -0,0 +1,61 @@
+---
+title: "Computational Considerations for Accuracy"
+output: 
+  rmarkdown::html_vignette:
+    toc: true
+    number_sections: true
+vignette: >
+  %\VignetteIndexEntry{Computational Considerations for Accuracy}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r, include = FALSE}
+knitr::opts_chunk$set(
+  collapse = TRUE,
+  comment = "#>"
+)
+```
+
+```{r setup}
+library(gammacount)
+```
+
+
+# General Guidelines
+
+In this package, the guiding principles for ensuring numerical accuracy of computations are
+1. Offload as much as possible to base `R` functions such as `pgamma`.
+2. Work with probabilities and densities in the log scale, and exponentiate as a last step if required.
+3. When given the choice, work with the complement of a probability if the complement is smaller.
+
+This vignette describes how these principles are applied to some specific computations in this package.
+
+# Log-scale helper functions
+
+This package contains two helper functions (not exported), `logsumexp` and `logdiffexp` which can perform operations without leaving the log scale:
+\begin{align*}
+\mathrm{logsumexp}(a, b) &= \log(e^a + e^b) \\
+\mathrm{logdiffexp}(a, b) &= \log(e^a - e^b)
+\end{align*}
+
+Each works by factoring out the largest value ($\max(a, b)$ in the `logsumexp` case, and $a$ in the `logdiffexp` case).
+\begin{align*}
+\mathrm{logsumexp}(a, b) &= \max(a,b) + \log(1 + e^{-|a-b|}) \\
+\mathrm{logdiffexp}(a, b) &= a + \log(1 - e^{b - a})
+\end{align*}
+
+In this form, `logsumexp` can take advantage of `R`'s `log1p` function, and `logdiffexp` uses either `log1p(-exp(b-a))` or `log(-expm1(b-a))` depending on the magnitude of $b-a$. Doing this minimizes the loss of accuracy relative to a naive implementation of this operation.
+
+# Arrival Time Densities
+
+Recall that the density of the $n^{th}$ arrival time after a random start time has the density
+$$f_{\tau_n}(t) = Q(n\alpha, \alpha t) - Q((n-1)\alpha, \alpha t).$$
+For $n \geq 2$, $\lim_{t\to 0} f_{\tau_n}(t) = \lim_{t\to \infty} f_{\tau_n}(t) = 0$. On the left tail this is because both $Q$ functions approach one, while on the right tail this is because both $Q$ functions approach zero.
+
+On the log scale, finding the difference between two probabilities $p$ and $p+\epsilon$ is subject to less loss of accuracy when $p$ is small than large. So the density as written will be more accurate for large $t$ than for small $t$. To improve accuracy for small $t$, we notice that the density can also be written in terms of regularized lower incomplete gamma functions
+$$f_{\tau_n}(t) = P((n-1)\alpha, \alpha t) - P(n\alpha, \alpha t).$$
+This form of the density has the opposite behavior for small $t$, as both $P$ functions approach zero.
+
+We choose the distribution's mode as a convenient cutoff point to switch between using each representation and use the $P$ form for $t$ less than the mode, and use the $Q$ form for $t$ greater than the mode.
+