Skip to content

Commit

Permalink
Started vignette on techniques for computation and numerical accuracy…
Browse files Browse the repository at this point in the history
… in the package.
  • Loading branch information
Ian Taylor committed Feb 28, 2021
1 parent 92d27e8 commit fb9940c
Showing 1 changed file with 61 additions and 0 deletions.
61 changes: 61 additions & 0 deletions vignettes/computational-considerations.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
---
title: "Computational Considerations for Accuracy"
output:
rmarkdown::html_vignette:
toc: true
number_sections: true
vignette: >
%\VignetteIndexEntry{Computational Considerations for Accuracy}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```

```{r setup}
library(gammacount)
```


# General Guidelines

In this package, the guiding principles for ensuring numerical accuracy of computations are
1. Offload as much as possible to base `R` functions such as `pgamma`.
2. Work with probabilities and densities in the log scale, and exponentiate as a last step if required.
3. When given the choice, work with the complement of a probability if the complement is smaller.

This vignette describes how these principles are applied to some specific computations in this package.

# Log-scale helper functions

This package contains two helper functions (not exported), `logsumexp` and `logdiffexp` which can perform operations without leaving the log scale:
\begin{align*}
\mathrm{logsumexp}(a, b) &= \log(e^a + e^b) \\
\mathrm{logdiffexp}(a, b) &= \log(e^a - e^b)
\end{align*}

Each works by factoring out the largest value ($\max(a, b)$ in the `logsumexp` case, and $a$ in the `logdiffexp` case).
\begin{align*}
\mathrm{logsumexp}(a, b) &= \max(a,b) + \log(1 + e^{-|a-b|}) \\
\mathrm{logdiffexp}(a, b) &= a + \log(1 - e^{b - a})
\end{align*}

In this form, `logsumexp` can take advantage of `R`'s `log1p` function, and `logdiffexp` uses either `log1p(-exp(b-a))` or `log(-expm1(b-a))` depending on the magnitude of $b-a$. Doing this minimizes the loss of accuracy relative to a naive implementation of this operation.

# Arrival Time Densities

Recall that the density of the $n^{th}$ arrival time after a random start time has the density
$$f_{\tau_n}(t) = Q(n\alpha, \alpha t) - Q((n-1)\alpha, \alpha t).$$
For $n \geq 2$, $\lim_{t\to 0} f_{\tau_n}(t) = \lim_{t\to \infty} f_{\tau_n}(t) = 0$. On the left tail this is because both $Q$ functions approach one, while on the right tail this is because both $Q$ functions approach zero.

On the log scale, finding the difference between two probabilities $p$ and $p+\epsilon$ is subject to less loss of accuracy when $p$ is small than large. So the density as written will be more accurate for large $t$ than for small $t$. To improve accuracy for small $t$, we notice that the density can also be written in terms of regularized lower incomplete gamma functions
$$f_{\tau_n}(t) = P((n-1)\alpha, \alpha t) - P(n\alpha, \alpha t).$$
This form of the density has the opposite behavior for small $t$, as both $P$ functions approach zero.

We choose the distribution's mode as a convenient cutoff point to switch between using each representation and use the $P$ form for $t$ less than the mode, and use the $Q$ form for $t$ greater than the mode.

0 comments on commit fb9940c

Please sign in to comment.