
> Is the number of parameters to be read as the indicator of how "advanced" the training has gotten, or the accuracy of the output?

Accuracy, of course.

> As in, this dataset/training has gotten to the point that it understands the 160 billionth small exception to the general rules of how language should be interpreted, or constructed, to be considered believable?

It memorized a lot of facts, but it is also better at figuring out rules than its predecessor.

> Sometimes (as a layman) I look at this and think instead, wow, how slow these ML algorithms must be that they need 160 billion parameters to predict correctly.

There are more specialized models which are trained on much smaller datasets. They are usually built for a specific task, such as classification. GPT-3 is trained on a very large dataset in an unsupervised way. As a result, it can handle a very wide variety of tasks without re-training. If you tell it to do math, it will do math. If you tell it to translate between different languages, it will do translation. If you tell it to write JS code, it will write JS code. If you ask it to write a Harry Potter parody as if it were written by Hemingway, it will do that.

So the whole point is that it can do pretty much any imaginable task involving text given only a few examples, with no task-specific training.
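To make the "few examples, no training" idea concrete, here is a minimal sketch of few-shot prompting: the examples are embedded directly in the text sent to the model, and the model infers the task from the pattern. The `build_prompt` helper is hypothetical, purely for illustration; no actual model call is shown.

```python
def build_prompt(task_description, examples, query):
    """Assemble a few-shot prompt: an instruction, a handful of
    worked examples, then the new input the model should complete."""
    lines = [task_description, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

# A translation task specified entirely in the prompt, with two examples.
prompt = build_prompt(
    "Translate English to French.",
    [("cheese", "fromage"), ("dog", "chien")],
    "cat",
)
print(prompt)
```

Swap the instruction and examples and the very same mechanism does arithmetic, code generation, or style parody; that is what makes a single large model so general.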



Thanks!



