Developing an LLM: Building, Training, Finetuning (A 1h Video Explainer) (youtube.com)
43 points by rasbt on June 14, 2024 | 12 comments


Not Sebastian (who I assume is the OP), but his blog/substack is also a great resource

https://magazine.sebastianraschka.com/


Thanks for mentioning it, that makes me super happy to hear!


Seems very good, thank you.

The channel: https://www.youtube.com/@SebastianRaschka/videos

contains hundreds of video lessons, which seem to originate from Sebastian Raschka's teaching at the University of Wisconsin-Madison (before he became a full-time entrepreneur).


Thanks, glad that this is helpful!


Is anyone training LLMs outside of Meta, OpenAI, etc.?

I don't quite get the point. For huge models, it's impossible to outcompete them. For smaller models, isn't Mistral or LLaMA good enough?

What are other startups finetuning LLMs for?


I find it can be nice to have an academic understanding of things you work with even if you don't have to develop it directly yourself.


Agreed; understanding how a method works and how it's implemented helps with developing an intuition for its limitations: what it can and what it can't do.


Not when the topic under discussion is so incredibly complex that even researchers at the mentioned companies don't fully understand it. This is like saying let's learn how combustion inside airplane engines works to get a better understanding of what LLMs can do.

Is it not better to focus your limited time on things that you can understand?


I disagree here: setting up a large-scale pretraining run is super complex if you have to manage your own distributed computing platform, but looking at what the training data looks like and how it is fed into an LLM is not that complex. If you are developing a product based on or with LLMs, it's worth spending a few hours to understand them at the big-picture level. I mean, look at how many people are confused about why LLMs a) hallucinate facts, b) sometimes copy text passages verbatim, and c) probably shouldn't be used as scientific calculators. All of that would be much clearer if you knew how they are trained.
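
For example, here's a minimal sketch of that data flow (a toy character-level tokenizer for illustration; real models use subword tokenizers like BPE, but the next-token objective is the same):

    # Toy illustration of how pretraining text becomes (input, target) pairs.
    text = "LLMs are trained to predict the next token."

    # 1) Tokenize: map text to integer IDs (character-level here for brevity).
    vocab = sorted(set(text))
    token_to_id = {ch: i for i, ch in enumerate(vocab)}
    ids = [token_to_id[ch] for ch in text]

    # 2) Form training pairs: the target is simply the input shifted by
    #    one position, i.e. plain next-token prediction, no other labels.
    context_length = 8
    for start in range(0, len(ids) - context_length, context_length):
        x = ids[start : start + context_length]          # model input
        y = ids[start + 1 : start + context_length + 1]  # next tokens
        print(x, "->", y)

Once you see that the model is only ever trained to continue text, the hallucination and verbatim-copying behavior above stops being surprising.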


You are probably forgetting that LLMs are not a final "end-of-history" thing, but a stage that calls for improvement, completion, etc.


I wouldn't pretrain from scratch, but continued pretraining is pretty popular for adapting LLMs to recent and/or custom data. (Sometimes this is referred to as 'finetuning', though it's not to be confused with 'instruction finetuning'.)
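
To make the distinction concrete, here's a rough sketch of the two data formats (the sample texts are made up, and the Alpaca-style template is just one common convention; formats vary by model):

    # Continued pretraining: plain domain text, trained with the same
    # next-token objective as the original pretraining run.
    continued_pretraining_sample = (
        "In 2024 the internal API was migrated from v1 to v2, which ..."
    )

    # Instruction finetuning: (instruction, response) pairs wrapped in a
    # prompt template; the same next-token loss applies, often computed
    # only on the response tokens.
    instruction_sample = {
        "instruction": "Summarize the v1-to-v2 API migration.",
        "response": "The v2 API replaces the v1 endpoints with ...",
    }
    prompt = (
        "### Instruction:\n{instruction}\n\n### Response:\n{response}"
    ).format(**instruction_sample)
    print(prompt)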


Can someone train an AI to perform all that?



