I don't know about that. LLMs have been trained mostly on text. If you add photos, audio, and video, and later even 3D games or 3D video, you get massively more data than plain text alone, maybe by orders of magnitude. And that is certainly something that could improve cognition in general. Getting to AGI without audio, video, and 3D perception seems like a non-starter. And even if we think AGI is not the goal, further improvements from these new training datasets are certainly conceivable.
That's been done already for years. OpenAI were training on bulk AI-transcribed YouTube videos back in the GPT-4 era. Modern models are all multimodal, co-trained on audio and image tokens together with text.
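For what it's worth, here's a minimal sketch of what co-training on image and text tokens means mechanically: both modalities get projected into the same embedding space and concatenated into one sequence before the transformer. All the names and dimensions below are made up for illustration, not any particular model's internals:

    import torch
    import torch.nn as nn

    d_model, vocab_size, patch_dim = 512, 32000, 16 * 16 * 3

    # Each modality gets its own projection into the shared embedding space.
    text_embed = nn.Embedding(vocab_size, d_model)  # discrete text tokens -> vectors
    patch_embed = nn.Linear(patch_dim, d_model)     # flattened 16x16 RGB patches -> vectors

    text_ids = torch.randint(0, vocab_size, (1, 20))  # a 20-token text snippet
    patches = torch.randn(1, 64, patch_dim)           # 64 image patches

    # One interleaved sequence; a decoder-only transformer consumes it as a
    # single stream. (In practice images/audio are often discretized into
    # codebook tokens first so the usual next-token loss applies everywhere.)
    sequence = torch.cat([text_embed(text_ids), patch_embed(patches)], dim=1)
    print(sequence.shape)  # torch.Size([1, 84, 512])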
The AI companies are not only out of such data; their access to it is shrinking as the platforms hosting it (YouTube, for one) wall them off.
Also, even if we lacked the data for Chinchilla-optimal scaling, that wouldn't be the same as being unable to scale at all; it would just require larger models and more flops than we would prefer.
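To put rough numbers on that, here's a toy calculation using the parametric loss fit from Hoffmann et al. (2022), L(N, D) = E + A/N^alpha + B/D^beta with compute C ~ 6ND. The constants are the published fits, but treat the exact figures as illustrative:

    # Chinchilla parametric loss fit (Hoffmann et al. 2022, Approach 3).
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

    def loss(N, D):
        return E + A / N**alpha + B / D**beta

    def flops(N, D):
        return 6 * N * D  # standard approximation: ~6 flops per param per token

    # The compute-optimal Chinchilla point: 70B params on 1.4T tokens.
    N_opt, D_opt = 70e9, 1.4e12
    target = loss(N_opt, D_opt)

    # Cap the data at half the optimal amount and solve
    # A / N**alpha = target - E - B / D_cap**beta
    # for the N that still reaches the same loss.
    D_cap = D_opt / 2
    N_needed = (A / (target - E - B / D_cap**beta)) ** (1 / alpha)

    print(f"optimal:     N={N_opt:.2e}, D={D_opt:.2e}, flops={flops(N_opt, D_opt):.2e}")
    print(f"data-capped: N={N_needed:.2e}, D={D_cap:.2e}, flops={flops(N_needed, D_cap):.2e}")

In this toy run, halving the data still reaches the same loss, but it takes a ~345B-parameter model and roughly 2.5x the flops. Which is exactly the point: a data shortfall raises the price of further scaling rather than capping it.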