
Original creator of the blog here. I originally wrote this for a course at my university, thought it was interesting, and so integrated it into my website. So I didn't want to "solve" anything. I just wanted to expand on what Andrej Karpathy mentioned in one of his tweets: https://twitter.com/karpathy/status/1582807367988654081. I think this view of transformers is a very powerful one, and it best explains why the transformer is built the way it is. It also allows us to reason about certain improvements to the transformer. This was mostly an educational blog post.

