There is no actual distinction between the "real" thing and "mimicking".
The datasets behemoth LLMs are trained on include a lot of noise that derails progress. They also contain a lot of irrelevant knowledge that the LLM has to learn or memorize, which is why an obscene number of parameters is required.
When you're not trying to teach a language model the sum total of human knowledge and you provide a high-quality, curated dataset, the scale barrier is much lower.
It has to learn the meaning of words, including implicit associations, and to do that it needs to see approximately all the English text ever. We don't know how to balance this with feeding it only useful knowledge.
It doesn't necessarily have to see approximately all the English text ever. Real people don't learn English like that, for example.
It's just that, given what we know about neural networks, it's often easier, simpler, and more effective to increase the amount of training data than to change anything else.
Yes, LLMs and human brains share at most some faint similarities.
Nevertheless, human feats can act as an existence proof of what is possible. Including of what might be possible for a neural network.
(I'm not sure whether a large language model necessarily needs to be a neural network in the sense of a stack of linear transformations interleaved with simple non-linear activation functions. But for the sake of strengthening your argument, let's adopt that restrictive definition of an LLM.)
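For concreteness, the restrictive definition above can be sketched in a few lines: a "neural network" in that sense is just matrix multiplies with a simple nonlinearity between them. The layer sizes and weights here are arbitrary, chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy network matching the restrictive definition: linear transformations
# (matrix multiplies) interleaved with a simple non-linear activation (ReLU).
# Shapes are hypothetical, picked only to make the example run.
W1 = rng.normal(size=(8, 4))
W2 = rng.normal(size=(4, 2))

def relu(x):
    return np.maximum(x, 0.0)

def forward(x):
    # linear -> non-linearity -> linear
    return relu(x @ W1) @ W2

x = rng.normal(size=(1, 8))
print(forward(x).shape)  # (1, 2)
```

Transformer-based LLMs add attention and normalization on top of this, but the bulk of the computation is still of this linear-plus-activation form.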
Unfortunately models aren't always good at knowing what they don't know ("out of distribution data") so it could lead to confidently wrong answers if you leave something out.
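One mechanical reason for that confident wrongness: a softmax output head always produces a normalized probability distribution, so even an out-of-distribution input can get a sharply peaked, "confident"-looking answer. A minimal sketch with made-up logits for a hypothetical 3-class classifier:

```python
import numpy as np

def softmax(logits):
    # Standard numerically-stable softmax.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

# Made-up logits the classifier might emit on some out-of-distribution input.
# Nothing in the math flags the input as unfamiliar; the output is still a
# valid, peaked distribution.
ood_logits = np.array([5.0, 0.1, -2.0])
probs = softmax(ood_logits)
print(probs.max() > 0.9)  # looks very sure, regardless of input validity
```

The point is that high output probability measures relative preference among the known classes, not whether the input resembles the training data at all.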
And if you want it to be superhuman then you're by definition not capable of knowing what's important, I guess.
That's what the fine-tuning is about. The model learns the language, concepts, etc. from the main dataset and is then tweaked by continuing to train on a smaller, high-quality, hand-curated dataset. That's how it learns to generate conversational responses by default instead of needing a complicated prompt.
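The two-stage recipe can be sketched with a deliberately tiny stand-in model: "pre-train" on a large noisy dataset, then continue gradient descent from those weights on a small clean one. This is a toy logistic-regression analogy, not a real LLM pipeline; all datasets and hyperparameters are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def train(w, X, y, lr, steps):
    # Plain gradient descent on logistic loss; the same routine serves as
    # both the "pre-training" and the "fine-tuning" stage.
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w = w - lr * X.T @ (p - y) / len(y)
    return w

true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])

# Stage 1: large, noisy corpus (20% of the labels are flipped).
X_big = rng.normal(size=(2000, 5))
y_big = (X_big @ true_w > 0).astype(float)
flip = rng.random(2000) < 0.2
y_big[flip] = 1 - y_big[flip]

# Stage 2: small, clean, hand-curated dataset.
X_small = rng.normal(size=(100, 5))
y_small = (X_small @ true_w > 0).astype(float)

w = train(np.zeros(5), X_big, y_big, lr=0.5, steps=200)   # pre-train
w = train(w, X_small, y_small, lr=0.1, steps=100)         # fine-tune
```

Note that fine-tuning is not a different algorithm, just continued training from the pre-trained weights on different data, which is exactly the point: the cheap curated pass steers a model whose basic competence came from the expensive noisy pass.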
https://arxiv.org/abs/2305.07759 (TinyStories: small models trained on a curated synthetic dataset can still produce coherent English)