On the other hand, I used ChatGPT to try to find the name of a book I'd forgotten, and it straight up lied (it listed the synopses of a few similar books in the genre, with the plot points I'd asked it to search for inserted into the middle of each synopsis).
I haven't tried it with code yet, but I have heard of similar things happening where it fabricates method signatures, packages to import, and so on, wholesale.
It's nice if it has helped you, but I remain distrustful of LLMs.
Yeah, for searching movies, for example, it is terrible. I also don't really see how it is going to get much better than it is, since it inevitably seems to steer its way toward non-obscure stuff even when you are explicitly asking for something obscure.
Just as an example: I was trying to remember a film that is basically a knock-off of WarGames, involving two teenagers using a scientist's small helicopter to do stuff. I typed "film with small helicopter" into Google and it was not only the first result, it was in a little cut-out section highlighted at the top. The film is called Defense Play and it has a grand total of 144 votes on IMDb. Incredibly obscure.
I thought it would be fun to ask ChatGPT roughly the same question, and at first it just gave me a list of very popular movies with large helicopters. So I said, no, this has a small helicopter or a toy helicopter in it. And it gave me a list of films with helicopters, but threw in Toy Soldiers, Small Soldiers (which probably does have a small helicopter in it) and GI Joe (which is based on a toy and probably has a helicopter).
No matter how much I insisted that this movie is obscure, it can't quite figure out how to return things that match something as vague as "obscurity", even though there are measures available (e.g. IMDb vote counts) that could reasonably give it such a notion. It probably does know about the movie; when I ask about obscure movies it can often give me the year and the first few actors, but it does seem to make up the plot even when asked to just paraphrase whatever plot synopsis it has available.
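To make "a measure of obscurity" concrete, here's a toy sketch (the titles and vote counts are made up, except Defense Play's roughly 144 IMDb votes; a real system would pull counts from IMDb's published datasets):

    # Toy sketch: rank candidate answers by obscurity, using vote
    # counts as a proxy. Numbers are illustrative, not real IMDb data.
    candidates = {
        "Blue Thunder": 30_000,   # hypothetical vote count
        "Fire Birds": 8_000,      # hypothetical vote count
        "Defense Play": 144,      # roughly its real IMDb vote count
    }

    # Fewer votes = more obscure, so sort ascending by votes.
    for title, votes in sorted(candidates.items(), key=lambda kv: kv[1]):
        print(f"{title}: {votes} votes")

If the model (or the tooling around it) could rank its candidate matches this way, "give me the obscure one" would be an easy request to satisfy.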
I do see it improving, but I also wonder if a model which is heavily weighted towards guessing the next section of text will have difficulty with the unlikely scenario of someone being specifically interested in an obscure film that only a handful of people have ever bothered to watch. It seems really useful for searches where a person doesn't really know what they want, and potentially not very useful for searches where the person knows they need a specific and obscure piece of information.
Well, I heard this argument on ThisWeekInTech recently, and someone made a good point: we will learn that LLMs, just like search engines, are NOT a repository of truth.
Do you expect Google to return only "truthful" and "factual" results? I hope not! Do you want to live in a one-sided world where you can only hear things that have been vetted for approval by some government/org?
By now we are pretty used to Google returning a bunch of stuff; we pick the things we find useful and have to choose for ourselves which sources are reliable, and so on.
I think part of the distrust here may come from the fact that they designed this as a human-like chat interface and made it seem like ChatGPT is a person. That sets the wrong expectations. So I'd agree with your sentiment that the providers of these LLMs need to stop trying to make them look like reasonable people.
Unfortunately, right now it's really exciting tech, and the media especially is guilty of passing it off as what it isn't and trying to get all the clicks with their doomsday articles about AI taking over all our jobs, or AI teaching us "false" things and whatnot, when LLMs are really just a sort of giant matrix multiplication that can find data points in a huge data set and interpolate between them in a useful way.
I don't understand the insistence on anthropomorphizing ChatGPT. It didn't "lie"; the algorithm just didn't translate the prompt you gave it into the result you desired. Doesn't that happen all the time with Google searches too, or with regular expression patterns that don't quite turn out to match what you wanted?
ChatGPT, Google's search engine, and regular expression engines are tools. Assessing them in terms of "trust" seems strange to me. I think it is better to think of them as being useful (or not) for particular types of problems.
This probably can be solved with better tooling. ChatGPT is probably "less confident" when it comes to these kinds of false positives and could say so explicitly.
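To sketch what "less confident" could mean mechanically (this assumes a locally runnable model via Hugging Face transformers; the model choice and the -3.0 threshold are arbitrary illustrations, and ChatGPT itself doesn't expose anything like this in its interface):

    # Toy sketch: flag low-confidence output via average token log-probability.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tok("The movie Defense Play is about", return_tensors="pt")
    out = model.generate(
        **inputs,
        max_new_tokens=30,
        do_sample=False,              # greedy, for reproducibility
        return_dict_in_generate=True,
        output_scores=True,           # keep per-step logits
    )

    # Log-probability the model assigned to each token it generated.
    new_tokens = out.sequences[0][inputs["input_ids"].shape[1]:]
    logprobs = [
        torch.log_softmax(step[0], dim=-1)[tok_id].item()
        for step, tok_id in zip(out.scores, new_tokens)
    ]

    avg = sum(logprobs) / len(logprobs)
    print(tok.decode(new_tokens))
    if avg < -3.0:  # arbitrary cutoff for "shaky" answers
        print(f"(low confidence: avg logprob {avg:.2f})")

Whether averaged logprobs actually track factuality is an open question, but it's the kind of signal the interface could surface instead of presenting everything in the same even tone.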
What I think is happening with people, though, is that there is missing context. With a Google search that doesn't return the expected results, you can clearly see it: the results are for unrelated things, and you know you need to amend your approach and try something else. When you get linked to some kind of misinformation article ... people do complain and are upset. With ChatGPT, you are sometimes missing the feedback needed to know that further work is required to get the desired outcome.
I don't disagree about context being helpful but I'm not going to get upset at ChatGPT as if a person had "lied" to me by withholding information that clearly would have helped. I have different expectations for a person than for an algorithm.
Because it's not a search engine, and it is intentionally lossy. Things that happen statistically more often end up with higher weights in the neural net; things that occur less often get lower weights.
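A toy bigram model makes the frequency effect concrete (this is not how a transformer computes anything, just an illustration of how raw frequency drowns out the rare case):

    # Toy bigram "model": probabilities just track frequency, so the
    # rare continuation is nearly invisible next to the common one.
    from collections import Counter

    corpus = (
        "the film features a large helicopter . " * 50
        + "the film features a small helicopter . "
    ).split()

    # Count which words follow "a" in the corpus.
    follows_a = Counter(b for a, b in zip(corpus, corpus[1:]) if a == "a")
    total = sum(follows_a.values())
    for word, n in follows_a.most_common():
        print(f"P({word!r} | 'a') = {n / total:.3f}")
    # 'large' gets ~0.980, 'small' ~0.020: the obscure case loses.

An LLM is vastly more sophisticated, but the same pull toward the statistically common answer is baked in.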