JohannesVoderho's comments

Original creator of the blog here. I originally wrote this for a course at my university, thought it was interesting, and integrated it into my website. So I didn't want to "solve" anything. I just wanted to expand on what Andrej Karpathy mentioned in one of his tweets: https://twitter.com/karpathy/status/1582807367988654081. I think this view of transformers is a very powerful one and does the best job of explaining why the transformer is built the way it is. It also lets us reason about certain improvements to the transformer. This was mostly an educational blog post.


Original creator of the blog here. Yes, you are right. The title of the blog is a bit misleading. I originally wrote this for a course at my university, thought it was interesting, and integrated it into my website. I didn't expect the blog to get that much traffic and didn't intend any clickbait. I just wanted to expand on what Andrej Karpathy mentioned in one of his tweets: https://twitter.com/karpathy/status/1582807367988654081. I think I also linked it in the blog post. I think this view of transformers is a very powerful one and does the best job of explaining why the transformer is built the way it is. It also lets us reason about certain improvements to the transformer, for example weight tying some layers to make it easier to map onto algorithms that have a lot of repeated operations. The goal of the blog was not to use the transformer as a general-purpose differentiable computer but to convince you that the transformer already IS a general-purpose differentiable computer. Maybe I will write another blog post where I train the transformer on simple algorithmic problems like sorting or copying data, just like in the differentiable Turing machine, or even show how you can hard-code the transformer weights to solve them. What do you think would be a better title for the blog?
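To make the weight-tying idea concrete, here is a minimal sketch in plain Python. It is not taken from the blog. The "block" is a toy residual layer (a linear map followed by tanh, with no attention), and all names (`make_layer`, `apply_layer`, `run_stack`, `count_unique_params`) are hypothetical. The only point is that a tied stack reuses one set of weights at every depth, like the body of a loop, while an untied stack learns a separate set per layer.

```python
import math
import random


def make_layer(dim, rng):
    """Random dim x dim weight matrix standing in for one block (toy assumption)."""
    scale = 1.0 / math.sqrt(dim)
    return [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]


def apply_layer(weights, x):
    """Toy block: residual connection plus tanh of a linear map."""
    return [
        xi + math.tanh(sum(w * xj for w, xj in zip(row, x)))
        for xi, row in zip(x, weights)
    ]


def run_stack(layers, x):
    """Apply the layers in order, feeding each output into the next layer."""
    for weights in layers:
        x = apply_layer(weights, x)
    return x


def count_unique_params(layers):
    """Count parameters, counting a weight matrix shared across layers only once."""
    unique = {id(w): w for w in layers}
    return sum(len(row) for w in unique.values() for row in w)


rng = random.Random(0)
dim, depth = 4, 6

# Untied: every layer has its own weights.
untied = [make_layer(dim, rng) for _ in range(depth)]

# Tied: the same weights are reused at every depth, like a repeated loop body.
shared = make_layer(dim, rng)
tied = [shared] * depth

x = [1.0, 0.0, -1.0, 0.5]
y_tied = run_stack(tied, x)

print(count_unique_params(untied))  # 96 (6 layers * 16 weights)
print(count_unique_params(tied))    # 16 (one shared 4 x 4 matrix)
```

In the tied case, depth behaves like an iteration count over a single learned step, which is the sense in which it may map more naturally onto algorithms built from one repeated operation.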


That makes sense. It definitely read like a university project to me.

One idea is "Transformers, analogous to general-purpose computers", although that doesn't mention the meat of the article. "Transformers from scratch, a Turing machine analogy" is another idea.

I'm not sure they're good titles but they don't imply that it's about using a transformer as a computer.

With the goal you state, I understand better how you came to the title.

