Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documentation on Running Simple Program using AD #24

Open
barak opened this issue Jan 30, 2018 · 10 comments
Open

Documentation on Running Simple Program using AD #24

barak opened this issue Jan 30, 2018 · 10 comments

Comments

@barak
Copy link

barak commented Jan 30, 2018

I'm trying to play with the AD facilities, so decided to start with a minimal example.

import ConCat.ADFun

d :: (Double -> a) -> Double -> a
d f x = derF f x 1

main = do
  print $ d id 0                     -- should be 1
  print $ d (\x -> x*(d (x*) 2)) 1   -- should be 2

Placing this as nest.hs in the top-level concat directory and running stack ghc nest.hs generates nest but

$ ./nest
nest: Oops: toCcc' called

I tried to turn on options etc, see below, to get this migrated to compile-time, but must not have tickled the right ones. This is a beautiful system, and I'd like to play with it using the nice exported API without delving down into the internals. A few tiny examples of how to do this would really help.

{-# OPTIONS_GHC -fplugin=ConCat.Plugin #-}
{-# OPTIONS_GHC -Wall #-}
{-# OPTIONS_GHC -O2 #-}

import ConCat.ADFun

-- Define a restricted form of the d operator.
d :: (Double -> a) -> Double -> a
d f x = derF f x 1
{-# INLINE d #-}

unnested :: Double
unnested = d id 0               -- should be 1
{-# INLINE unnested #-}

nested :: Double
nested = d (\x -> x*(d (x*) 2)) 1 -- should be 2
{-# INLINE nested #-}

main :: IO ()
main = do
  putStr "unnested, should be 1: "
  print unnested
  putStr "nested, should be 2: "
  print nested
@cfhammill
Copy link

@barak I tried your example as well, the unnested case works fine, the nested case doesn't, I'm getting ghc panic due to boxing. This is true even for simpler examples like d (*2) 1. I wonder if this is related to #8 and #9.

@TimPut
Copy link

TimPut commented Jun 28, 2019

As I see it the immediate reason why this fails is that d needs to be inlined for the plugin to be able to work its magic. However a function can only be inlined if it's (syntactically) fully applied. In its current form, d when fully applied returns a value in the codomain of a differentiable function, but not in general a differentiable function.

I suspect the real reason why this isn't easy to fix is that the category of differentiable functions is not equal to the category of twice-differentiable functions.

@conal
Copy link
Collaborator

conal commented Jun 28, 2019

I remember thinking that the issue here was GHC applying rewrite rules and other optimizations in an unfortunate order. Although one can probably define a category of infinitely differentiable functions (lazily generating as many derivatives as you might want), another option (used above, IIRC) is to differentiate, extract a regular function, and differentiate again. The plugin would then perform two translations from GHC Core to categorical form. I do such cascading translations often, for instance to differentiate and then generate a circuit.

Now I'm wondering if @TimPut is onto a simpler explanation and solution. GHC inlining, however, does not require syntactic full application. Rather there have to be as many explicit arguments as there were parameters on the LHS of a definition. The important difference here is that we can encourage d to inline by moving one or more parameters from LHS to RHS:

d :: (Double -> a) -> Double -> a
d f = \ x -> derF f x 1
{-# INLINE d #-}

@barak
Copy link
Author

barak commented Jul 16, 2019

That's funny. (Logically speaking, the information on the number of parameters that must be supplied in order to trigger inlining belongs in the INLINE directive rather than buried in the parameter count on the LHS of the definition. But I digress.) Will give it a shot.

I was trying to test whether nesting can be expressed, and if so whether it works correctly. For the case here, that pretty much comes down to handing variable scoping correctly. But if differentiation of higher-order functions is supported, there's a trickier case of nesting described in https://arxiv.org/abs/1211.4892

@conal
Copy link
Collaborator

conal commented Jul 16, 2019

I was trying to test whether nesting can be expressed, and if so whether it works correctly. For the case here, that pretty much comes down to handing variable scoping correctly.

All variables are eliminated during the translation to categorical form (described here), and that translation is independent of AD.

I don't think the perturbation confusion issue arises with the categorical approach. The extended essence-of-ad paper contains correctness proofs, which fortunately are fairly simple, since differentiation (really local affine approximation) is compositional over the categorical vocabulary in very simple ways, unlike with the lambda calculus. I'd be interested in exploring the question together, hopefully resolving it to our mutual satisfaction.

But if differentiation of higher-order functions is supported, there's a trickier case of nesting described ....

Thanks for the warning. Which example in the paper? Differentiation and higher-order functions seems to be a tricky combination for the categorical approach as well.

@barak
Copy link
Author

barak commented Jul 17, 2019

Let me explain the simple meat of arXiv:1211.4892. Start with a derivative-taking operator d, restricted to taking a scalar input for clarity. Define a shift function s. Then take the derivative of s and use it in a nested fashion:

d (f :: R -> a) (x :: R) = (derivative of f at x) :: (tangent space of a at f(x))

s (u :: R) (f :: R -> a) (x :: R) = f (x + u)

dHat = d s 0

The functions d and dHat should behave identically. So check if d (d h) y and dHat (dHat h) y yield the same result.

Note that d can take the derivative of higher-order functions. Without that ability, this issue cannot be expressed.

@barak
Copy link
Author

barak commented Jul 17, 2019

differentiation (really local affine approximation) is compositional over the categorical vocabulary in very simple ways

Yes, that is a very nice property. One issue is that even if a defined function is first-order, it may internally use higher-order constructs.

f ys y = foldl (+) 0 (map (^2) (map (y -) ys))

If we have a compositional formulation, and we push it into code like this, we're forced to define derivatives of higher-order functions in a fashion consonant with our first-order derivatives.

I'd very much welcome the chance to work through some of these issues with you. I feel like we each have a good handle on one appendage of the elephant, and if we pull together we might be able to really take it apart. Should we take the discussion to email?

@TimPut
Copy link

TimPut commented Jul 18, 2019

@barak
I for one would appreciate it if the discussion stayed out in the open simply so that I can follow along and learn from you and Conal. Please consider this an expression of appreciation and not a demand.

@barak
Copy link
Author

barak commented Jul 18, 2019

@TimPut I'm fine with that, until I start embarrassing myself!

@silky
Copy link

silky commented Apr 14, 2020

none of these examples work, presently.

> stack ghc nest.hs && ./nest
[1 of 1] Compiling Main             ( nest.hs, nest.o )
Linking nest ...
nest: Oops: toCcc' called

with

-- nest.hs
import ConCat.ADFun


d :: (Double -> a) -> Double -> a
d f = \x -> derF f x 1
{-# INLINE d #-}


main = do
  print $ d id 0                     -- should be 1
  print $ d (\x -> x*(d (x*) 2)) 1   -- should be 2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants