Idea: Backward propagation #23
Thanks for the suggestion. A few thoughts:
Separate responses
Sorry, I accidentally clicked the 'Close and comment' button while writing 😉
You can circumvent this by performing (reverse, in this case) AD on an AST and interpreting/compiling this AST to efficient implementations of the AST's operations (e.g. BLAS & friends on CPU, CUBLAS on GPU).
Clarification: Say you want to map
Thanks for the replies.
Efficient evaluation of gradients and jacobians is important to me as well. By my "current one", do you mean
Just as the derivative of a function gives local linear approximations, I was guessing that you had in mind constructing local linear approximations of the inverses of given functions---equivalently, inverses of local linear approximations of the original functions.
I wouldn't give up too easily, and I wouldn't explore the question in terms of mutation. One of the benefits of compiling to categories (CtC) over shallow and deep embeddings is that sharing is easily observed (as mentioned in section 13 of the CtC paper). Given take your
I think we could indeed use concat/CtC to generate code that works well with optimized linear algebra libraries, whether the generated code is Haskell, C, CUDA, etc.
Agreed. A dense "matrix" (composed functor) representation would be mostly zero. With the function representation in
@dramforever Getting to what I think is the heart of your suggestion, do you have a specification (correctness definition) in mind for conversion from
As an example of what I mean by a specification, consider mapping

```haskell
newtype AD a b = AD (a -> b :* (a :-* b))

ad :: (a -> b) -> AD a b
ad f == AD (f &&& deriv f)
```

Now require that
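A minimal, self-contained sketch of how a representation shaped like this composes, using ordinary pairs in place of `:*` and plain functions standing in for the linear-map type `:-*`. It is only an illustration of the chain-rule structure, not concat's actual definitions:

```haskell
-- Sketch only: ordinary pairs stand in for ':*' and plain functions for ':-*'.
import Prelude hiding (id, (.))
import Control.Category

newtype AD' a b = AD' (a -> (b, a -> b))   -- a value together with a derivative at that point

instance Category AD' where
  id = AD' (\a -> (a, id))
  AD' g . AD' f = AD' (\a ->
    let (b, f') = f a        -- value and derivative of f at a
        (c, g') = g b        -- value and derivative of g at b = f a
    in (c, g' . f'))         -- chain rule: compose the two derivatives
```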
Oh, wait a bit. By 'cartesian' you mean products? In that case we can just define
I hope to get a chance to get down to the code some time...
The idea is: instead of storing the jacobian
I haven't really thought about the 'inverse' thing, but I don't think that's what I have in mind.
My bad; cartesian means 'has products'. In that case it's easy to make it cartesian as described above. Do you have any idea on having observable sharing in a cartesian closed category (w/ exponentials)? I feel like I can't.
Yes, products. In concat, you'd define an instance of
I encourage you to give a specification so that you can prove or derive/calculate your definitions (for all instances, not just
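One familiar shape such a specification can take, shown here only as an illustration of the methodology (preserve the categorical vocabulary, then calculate each instance from the equations); this is not a claim about the text elided above:

```haskell
-- Specification sketch: require the 'ad' mapping from earlier to preserve
-- the categorical structure, so instance definitions can be derived rather
-- than invented.
--   ad id        == id                  -- Category
--   ad (g . f)   == ad g . ad f
--   ad (f &&& g) == ad f &&& ad g       -- products ("cartesian")
```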
In any case, your
In a purely denotative/functional setting, sharing is deeply problematic at run time but trivially easy at compile time. CtC happens at compile time. In the categorical formulation, sharing is especially simple, because it's exactly captured in
Okay! Now if we don't need
The specific version is:
where
For example
Okay. Maybe we're getting closer. Can you make this specification more precise? And as a transformation on functions rather than (what I guess is) some sort of syntactic construct (a lambda expression)? For instance, a derivative of a function (not syntax) of type
It's unclear to me what operation you mean by
Maybe what you're getting at has to do with the inverse of the derivative of the given function at a given point, something like

```haskell
bp f = BP (f &&& inverse . deriv f)
```

Of course, inverses are often undefined, so maybe this isn't it. Rather, it seems to be about duality, starting with the usual coproduct for vector spaces (direct sum, represented by a cartesian product) and then using the opposite category, reversing arrows and swapping products and coproducts. The current concat implementation doesn't have per-category associated products, coproducts, exponentials, booleans, etc., so I can't yet express this construction with the elegance and generality I'd like, but we'll get there.

```haskell
newtype BP s a b = BP (a -> (b, ((b :-* s) -> (a :-* s))))
```

where
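For comparison with the forward-mode sketch earlier, here is a corresponding composition sketch for a BP-shaped representation, again with plain functions standing in for the linear maps `(b :-* s)` and `(a :-* s)`; illustrative only, not concat's code:

```haskell
-- Sketch only: plain functions stand in for the linear maps in BP.
import Prelude hiding (id, (.))
import Control.Category

newtype BP' s a b = BP' (a -> (b, (b -> s) -> (a -> s)))

instance Category (BP' s) where
  id = BP' (\a -> (a, id))
  BP' g . BP' f = BP' (\a ->
    let (b, f') = f a        -- f' pulls a "continuation" on b back to one on a
        (c, g') = g b        -- g' pulls a continuation on c back to one on b
    in (c, f' . g'))         -- note the reversed composition order: backpropagation
```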
I wouldn't say it's an inverse. I think 'dual' is more appropriate. I used lambdas as I was expressing
This is getting tricky, I admit. Do you understand my example?
Let's write
Say we have
Or maybe I could relate my approach to yours. Now suppose
Phew...
Observation: Write
Somewhat related paper: "Backprop as Functor: A compositional perspective on supervised learning"
Note that matrix "multiplication" is just an implementation of linear map composition, so the property is that the derivative of the composition is the composition of the derivatives. Almost. Really it's
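A plausible reading of the "Almost" above, hedged, is about where the derivatives are sampled: the chain rule composes the derivative of g at f a with the derivative of f at a, not two derivatives at the same point.

```haskell
-- Chain rule in the 'deriv' notation used earlier in the thread (illustrative):
--   deriv (g . f) a == deriv g (f a) . deriv f a
```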
I've long thought that the forward and reverse modes of AD (FAD & RAD) are both special cases of the same algorithm (the one in the CtC paper and this repo), corresponding to right- and left-associating function compositions, respectively. Then this discussion suggested to me that maybe RAD is about duality/transposition. Now I see that they're the same thing. Thanks for the stimulating conversation, @dramforever!
I'm glad that we were able to figure it out. And by the way, do you (and does anybody else) think that we can compute, say, the Hessian? I've been trying to devise a method but have had no luck.
@dramforever Check out the talk "The simple essence of automatic differentiation" to see where I went with this exploration. I think the Hessian can be done simply enough, but it's not a gradient computation, so it's not clear to me whether to use forward or reverse or mixed mode. Might be a good application of the optimal re-association mentioned on slide 22.
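One hedged sketch of the shape a Hessian computation could take, purely for illustration: since the Hessian is the Jacobian of the gradient, a mixed "forward-over-reverse" scheme composes a forward-mode Jacobian with a reverse-mode gradient. Both operation names below are hypothetical parameters, not functions from this repository.

```haskell
-- Illustrative only: the two differentiation operators are passed in as
-- assumed (hypothetical) parameters rather than taken from any real API.
hessianSketch
  :: ((a -> Double) -> (a -> a))        -- assumed reverse-mode gradient
  -> ((a -> a) -> (a -> [[Double]]))    -- assumed forward-mode Jacobian
  -> (a -> Double)                      -- scalar-valued function to differentiate twice
  -> (a -> [[Double]])                  -- its Hessian
hessianSketch gradReverse jacobianForward f = jacobianForward (gradReverse f)
```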
I'm very sorry for failing to acknowledge you in the camera-ready version that appears in the ICFP proceedings. Authors were asked to exclude acknowledgments in paper submissions in order to support the anonymous review process. For most of the time between paper acceptance and submission of the final camera-ready version, I was away from home helping to deal with a family medical crisis. During this period, my concentration and clarity of thought were much reduced, and I failed to correct the omission. I just added an acknowledgment on the paper's web page. I would be happy to replace your GitHub handle with your IRL name if you want to share it with me, or I can leave the handle as is. When I hear your preference, I'll also update the extended version of the paper on arXiv.

Again, I'm grateful for this conversation, which spurred me to think more carefully about reverse-mode automatic differentiation (RAD) and its specialization to scalar-valued functions (as in backpropagation). The essential insights turned out to be a pair of isomorphisms for linear maps: (a) the Yoneda embedding and (b) vector space duality (relating a vector space to the space of linear maps from that space to its own scalar field). The Yoneda embedding (
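For readers who have not seen the paper or talk, here is a rough sketch of the two isomorphisms being referred to, specialized to plain functions standing in for linear maps (illustrative; the paper gives the precise, properly linear versions):

```haskell
-- (a) Yoneda-style (continuation) representation: represent a map a -> b by
--     pre-composition with it, for an arbitrary result type r.
newtype Cont r a b = Cont ((b -> r) -> (a -> r))

cont :: (a -> b) -> Cont r a b
cont f = Cont (. f)

-- (b) Duality: taking r to be the scalar field s, (a -> s) plays the role of
--     the dual space, so Cont s a b carries "dual vectors" of b back to dual
--     vectors of a. This backwards flow is where reverse mode comes from.
```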
Thank you for the acknowledgement. As for my name, you can refer to me using my real name
To be honest, I had not expected the idea to come this far. Without you and this communication, the idea would have remained just 'I thought of it on the subway and it's cool'. So I would also like to thank you for taking the time to polish this immature idea, and going so far as to fit it neatly within the larger picture of categorical constructions.
Also, I was dealing with other things at the time of most of this thread, so I did not really get down to putting it into code form. And looking back, I certainly did not communicate what I was thinking well. I was certainly expecting this to be forgotten, but you took it seriously. So thanks a lot for your patience and interest.
Thanks for sharing your name. I've updated the paper's web page and added the acknowledgement to the extended version of the paper on arXiv. I suspect that there are many more variations of AD to be found. Best wishes!
That's nice. Thanks!
We can hopefully get backward propagation using
We can add that to the two existing ADs. I'm not sure how to make it
though; it seems that you can't.