Developing an LLM: Building, Training, Finetuning (A 1h Video Explainer) (youtube.com)
43 points by rasbt on June 14, 2024 | 12 comments


Not Sebastian (who I assume is the OP), but his blog/substack is also a great resource

https://magazine.sebastianraschka.com/


Thanks for mentioning it, that makes me super happy to hear!


Seems very good, thank you.

The channel: https://www.youtube.com/@SebastianRaschka/videos

contains hundreds of video lessons, which seem to originate from Sebastian Raschka's teaching at the University of Wisconsin-Madison (before he became a full-time entrepreneur).


Thanks, glad that this is helpful!


Is anyone training LLMs outside of Meta, OpenAI, etc.?

I don't quite get the point. For huge models, it's impossible to outcompete them. For smaller models, isn't Mistral or LLaMA good enough?

What are other startups finetuning LLMs for?


I find it can be nice to have an academic understanding of things you work with even if you don't have to develop it directly yourself.


Agreed; understanding how a method works and how it's implemented helps with developing an intuition for its limitations: what it can and what it can't do.


Not when the topic under discussion is so incredibly complex that even researchers at the mentioned companies don't fully understand it. This is like saying let's learn how combustion inside airplane engines works to get a better understanding of what LLMs can do.

Is it not better to focus your limited time on things that you can understand?


I disagree here: setting up a large-scale pretraining run is super complex if you have to manage your own distributed computing platform, but looking at what the training data looks like and how it is fed into an LLM is not that complex. If you are developing a product based on or with LLMs, it's worth spending a few hours to understand them at the big-picture level. I mean, look at how many people are confused about why LLMs a) hallucinate facts, b) sometimes copy text passages verbatim, and c) probably shouldn't be used as scientific calculators. All of that would be much clearer if you knew how they are trained.
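
For example, here's a minimal sketch of that data flow (a toy character-level tokenizer for illustration; real models use subword tokenizers like BPE, but the next-token objective is the same):

    # Toy illustration of how pretraining text becomes (input, target) pairs.
    text = "LLMs are trained to predict the next token."

    # 1) Tokenize: map text to integer IDs (character-level here for brevity).
    vocab = sorted(set(text))
    token_to_id = {ch: i for i, ch in enumerate(vocab)}
    ids = [token_to_id[ch] for ch in text]

    # 2) Form training pairs: the target is simply the input shifted by
    #    one position, i.e. plain next-token prediction, no other labels.
    context_length = 8
    for start in range(0, len(ids) - context_length, context_length):
        x = ids[start : start + context_length]          # model input
        y = ids[start + 1 : start + context_length + 1]  # next tokens
        print(x, "->", y)

Once you see that the model is only ever trained to continue text, the hallucination and verbatim-copying behavior above stops being surprising.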


You are probably forgetting that LLMs are not a final "end-of-history" thing, but a stage that calls for improvement, completion, etc.


I wouldn't pretrain from scratch, but continued pretraining is pretty popular for adapting LLMs to recent and/or custom data. (Sometimes this is referred to as 'finetuning', though it's not to be confused with 'instruction finetuning'.)
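
To make the distinction concrete, here's a rough sketch of the two data formats (the sample texts are made up, and the Alpaca-style template is just one common convention; formats vary by model):

    # Continued pretraining: plain domain text, trained with the same
    # next-token objective as the original pretraining run.
    continued_pretraining_sample = (
        "In 2024 the internal API was migrated from v1 to v2, which ..."
    )

    # Instruction finetuning: (instruction, response) pairs wrapped in a
    # prompt template; the same next-token loss applies, often computed
    # only on the response tokens.
    instruction_sample = {
        "instruction": "Summarize the v1-to-v2 API migration.",
        "response": "The v2 API replaces the v1 endpoints with ...",
    }
    prompt = (
        "### Instruction:\n{instruction}\n\n### Response:\n{response}"
    ).format(**instruction_sample)
    print(prompt)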


Can someone train an AI to perform all that?



