Why ChatGPT doesn't understand

Try using Bard or Bing Chat. Both have connections to the internet, can look up information, and will cite their sources.

ChatGPT with plugins will do that and much more. For example, you could tell it, “Find me a vacation deal for someplace warm in January, but I don’t want to connect through Toronto Pearson because it’s a hell hole” and it will use Kayak, Travelocity and other services to find what you want. Doing that search yourself would be tedious as hell.

There are a lot of queries like that: queries that require sub-queries, multiple searches, etc. Here’s one I tried a while ago: find the Canadian city that was the largest coal mining city. Bing got it right away. I then tried to do it the ‘traditional’ way, and it led me through a whole bunch of websites on the history of Canadian coal mining, none of which really answered the question.

This is partially true, but more broadly the method of training the current slate of models is intensely expensive and computation-heavy, far more so than responding to prompts is. There simply isn’t enough time or processing power to keep training a model like this in real time on user prompts. That said, they are absolutely storing user prompts and model responses and using them to train future generations of models.

Well, the context is a form of ‘training’. It’s just not very large, it only lasts for the session, and then it’s reset. When models are ‘fine tuned’, it’s a similar idea, but the changes are permanent. For example, you could fine-tune a model on all of your corporate operations manuals, then use it for checking ISO 9001 compliance.
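To make that a bit more concrete: with an open model you can literally continue training it on your own documents, and the weight changes stick around afterwards, unlike chat context. Here’s a minimal sketch using the Hugging Face transformers library; the base model ("gpt2"), the file name "manuals.txt", and the training settings are all placeholder assumptions for illustration, not anything ChatGPT itself exposes.

```python
# Rough sketch of fine-tuning a small causal LM on in-house documents
# (e.g. operations manuals). Model name, file path, and settings are
# placeholders; a real run needs careful data prep, eval, and a decent GPU.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # stand-in for whatever base model you actually use
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# One plain-text file with the manuals, one passage per line.
dataset = load_dataset("text", data_files={"train": "manuals.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="manuals-model", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # the weight updates persist, unlike chat context
```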

The ‘training’ that is expensive for LLMs is the initial run on a huge dataset, which is what makes them able to do general tasks, understand natural language, etc. From there, ‘fine tuning’ isn’t very expensive. And there’s an open source LLM out there now that is roughly GPT-3 level and was fully trained for $30,000 on completely open data sets. And Alpaca, an LLM that can run on a laptop, was trained with a different technique: they had a larger OpenAI model (text-davinci-003, a GPT-3.5-class model) generate its training data. The result is a model with only 7B parameters that comes surprisingly close to GPT-3.5 on instruction-following tasks.
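For what that Alpaca-style approach looks like in practice: the big ‘teacher’ model is asked to write instruction/answer pairs, and the small ‘student’ model is then fine-tuned on the resulting file, much like the manuals example above. This is only a hand-wavy sketch of the data-generation half; the prompt wording, topics, and teacher model name are illustrative assumptions, not the actual Alpaca recipe.

```python
# Hand-wavy sketch of Alpaca-style data generation: ask a strong hosted
# model to write instruction/response pairs, save them as JSONL, then
# fine-tune a small open model on that file. Prompt text, topics, and the
# teacher model name are assumptions made up for illustration.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
seed_topics = ["explain ISO 9001 audits", "summarize a shift-handover report"]

with open("distilled_train.jsonl", "w") as out:
    for topic in seed_topics:
        resp = client.chat.completions.create(
            model="gpt-3.5-turbo",  # the "teacher"; name is an assumption
            messages=[{
                "role": "user",
                "content": f"Write one instruction about '{topic}' and a good "
                           f"answer to it, as JSON with keys 'instruction' "
                           f"and 'response'.",
            }],
        )
        line = resp.choices[0].message.content.strip()
        try:
            json.loads(line)          # keep only well-formed JSON lines
            out.write(line + "\n")
        except json.JSONDecodeError:
            pass
```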

True, the fine-tuning training is not as expensive as the initial training. It’s usually a trade-off: you make the model more specialized toward the output you want, but responses start to drift toward the fine-tuning data no matter what you ask for. For example, I’ve seen Stable Diffusion models trained to produce a very specific kind of painterly anime art, but they struggle to do anything other than that kind of image.

The “context” you get in the chat-like environment of ChatGPT is even simpler: all the previous interactions you’ve had in that chat are fed back into GPT with every prompt, up to whatever the current token limit is. You can tell ChatGPT something and it will “remember” it only because it gets re-told everything again each time you submit a prompt.
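You can see that mechanism in a plain API chat loop: nothing is stored server-side between calls, so the client appends every turn to a list and re-sends the whole thing. A rough sketch, assuming OpenAI’s public chat-completions SDK and a placeholder model name (this is not ChatGPT’s actual internals):

```python
# Minimal sketch of how a chat UI keeps "memory": the full message history
# is re-sent with every request. Model name and API details are assumptions
# based on OpenAI's public chat-completions SDK, not ChatGPT's internals.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

history = [{"role": "system", "content": "You are a helpful assistant."}]

while True:
    user_text = input("You: ")
    history.append({"role": "user", "content": user_text})

    # The entire conversation so far goes into every single request.
    # The model itself is stateless; the "memory" is just this growing list.
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",   # assumed model name for illustration
        messages=history,
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    print("Bot:", reply)
```

Once the list grows past the token limit you have to start dropping or summarizing the oldest turns, which is exactly when the bot “forgets” the beginning of the conversation.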

Just like granny!