Cool, thanks again for your great post. Big fan of your articles.
> The functional graph becomes more complicated
Indeed. I once tried to come up with a "higher-order function" that takes a feedforward network and produces a separate function computing the backward pass (like Theano's autodiff, but with the abstraction at the layer level rather than at the op level).
Here's a diagram for a simple forward MLP (left to right), with the backward-pass network below it (right to left). I found this hard to work with, because the computational graph explodes in size once you try to decouple the optimization from the function. I notice something similar when trying to unroll an RNN visually across time.
http://imgur.com/ATNwknh
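In case it helps make the idea concrete, here's a minimal sketch of what I mean (all names hypothetical, and the network is just a toy list of dense layers): the "higher-order function" `make_backward` takes the forward network and returns a function that runs the reversed, backward-pass network.

```python
import numpy as np

class Dense:
    """A fully connected layer with a tanh nonlinearity."""
    def __init__(self, n_in, n_out, rng):
        self.W = rng.standard_normal((n_in, n_out)) * 0.1
        self.b = np.zeros(n_out)

    def forward(self, x):
        self.x = x                      # cache input for the backward pass
        self.z = x @ self.W + self.b
        return np.tanh(self.z)

    def backward(self, grad_out):
        grad_z = grad_out * (1 - np.tanh(self.z) ** 2)  # derivative of tanh
        self.dW = self.x.T @ grad_z     # parameter gradients, stored on the layer
        self.db = grad_z.sum(axis=0)
        return grad_z @ self.W.T        # gradient w.r.t. this layer's input

def make_backward(layers):
    """Higher-order function: given the forward network (a list of layers),
    return a function that runs the backward network (same layers, reversed)."""
    def backward_pass(grad_loss):
        g = grad_loss
        for layer in reversed(layers):
            g = layer.backward(g)
        return g
    return backward_pass

rng = np.random.default_rng(0)
net = [Dense(3, 4, rng), Dense(4, 2, rng)]

x = rng.standard_normal((5, 3))
y = x
for layer in net:                       # forward pass, left to right
    y = layer.forward(y)

backward = make_backward(net)           # the derived backward-pass network
grad_x = backward(np.ones_like(y))      # backward pass, right to left
```

The diagram above is basically this `reversed(layers)` chain drawn out as its own graph, which is where the size blow-up comes from.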
Let me know if this is way off base from what you were talking about in your post.