Part of the reason why ChatGPT does this is that (at least in earlier versions) it does not perform any, or enough, look-ahead on the task. If you think about how a human would answer this question…
- First, silently inspect the word ‘strawberries’, letter by letter
- Count the instances of the letter R
- State the answer (there's a rough code version of these steps just below)
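For concreteness, here's that procedure as a minimal Python sketch (the word and the letter are hard-coded, since the point is just the order of the steps - inspect, count, and only then speak):

```python
word = "strawberries"

# Steps 1 and 2: silently inspect the word letter by letter, counting the Rs.
count = sum(1 for letter in word if letter.lower() == "r")

# Step 3: only now state the answer.
print(f"There are {count} Rs in '{word}'.")  # prints: There are 3 Rs in 'strawberries'.
```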
What ChatGPT tends to do is more like ‘start talking and hope it all makes sense at the end’ - this works pretty well for conversational responses, but not always for responses that need a series of steps, each dependent on the previous one. For the same reason, ChatGPT struggles to write a correct and truthful sentence that ends with the words ‘and that’s why this sentence contains 26 words’ - it gets the word count wrong, and it also has trouble making the first part of the sentence actually explain why there are 26 words:
Me: Please can you write a sentence that is truthful and logically self-consistent and ends with the phrase ‘and that’s why this sentence contains 26 words.’
ChatGPT: The Earth revolves around the Sun in a roughly elliptical orbit, influenced by gravitational forces, and that’s why this sentence contains 26 words.
Me: Couple of tiny problems… that sentence only contains 23 words and I’m not convinced that it’s true that the properties of the solar system are really the reason for the word count in the sentence.
ChatGPT: My apologies for the oversight. Here’s a corrected sentence:
Water boils at 100 degrees Celsius under standard atmospheric pressure, and that’s why this sentence contains 26 words.
Me: Hmmm. 18 words, and how did the boiling point of water cause the sentence to be the length it is? The sentence needs to explain WHY it has 26 words. Also it needs to contain 26 words.
ChatGPT: Apologies for the oversight. Here’s a revised sentence:
The Fibonacci sequence, a series of numbers where each is the sum of the two preceding ones, correlates with patterns in nature, and that’s why this sentence contains 26 words.
Me: That’s 38 words, and I’d be a little more convinced if 26 was in the Fibonacci sequence (it’s not)
ChatGPT: My apologies for the mistake. Here’s another attempt:
This sentence was deliberately crafted with 26 words to fulfill a linguistic challenge, demonstrating the flexibility and precision of language, and that’s why this sentence contains 26 words.
Me: That’s more like it, but it’s two words over length.
ChatGPT: I appreciate your patience. Here’s a revised version:
Crafting a sentence with exactly 26 words showcases the intricacies of language, fulfills a specific challenge, and highlights the adaptability of expression, and that’s why this sentence contains 26 words.
Me: Good in terms of consistency, but now it’s 30 words long - can you shorten it to 26 words (but still make it coherent)
ChatGPT: Creating a precise 26-word sentence demonstrates language’s nuances, fulfills a unique challenge, and underscores its adaptability, showcasing the intricacies of expression. And that’s why this sentence contains 26 words.
Me: That’s 29 words. Can you shorten it to 26?
ChatGPT: Crafting a 26-word sentence showcases language nuances, fulfills a unique challenge, and underscores adaptability, highlighting expression intricacies. That’s why this sentence contains 26 words.
Me: Oof. Now that’s 24 words. Can you just slip in two extra words somewhere?
ChatGPT: Crafting precisely a 26-word sentence skillfully showcases language nuances, fulfills a unique challenge, and underscores adaptability, thereby highlighting expression intricacies. That’s why this sentence intentionally contains 26 words.
Me: That’s 28 words. Can you remove two superfluous words?
ChatGPT: Crafting a 26-word sentence skillfully showcases language nuances, fulfills a unique challenge, and underscores adaptability, highlighting expression intricacies. That’s why this sentence intentionally contains 26 words.
Me: Hoorah! You did it!
I think it got there mostly by luck (although the fact that it can still see the earlier parts of the conversation while it’s writing the later parts might also have helped).
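For what it’s worth, the final attempt really does land on 26 - here’s a quick sanity check, counting whitespace-separated words (which is presumably what we both meant by ‘words’):

```python
final_attempt = (
    "Crafting a 26-word sentence skillfully showcases language nuances, "
    "fulfills a unique challenge, and underscores adaptability, highlighting "
    "expression intricacies. That's why this sentence intentionally contains 26 words."
)

# Split on whitespace and count the pieces.
print(len(final_attempt.split()))  # prints: 26
```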
Humans solve this sort of problem by breaking it into pieces, solving the pieces, revising earlier tentative answers and then composing the final answer; ChatGPT jumps more or less straight to the ‘composing the answer’ part.
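To make that concrete, the ‘break it into pieces’ approach looks roughly like this in code - a sketch of the human strategy, not a claim about how ChatGPT works internally: fix the ending, work out how many words are left to write, draft and revise a preamble of exactly that length, then check the whole thing before stating it.

```python
ENDING = "and that's why this sentence contains 26 words."
TARGET = 26

def word_count(text: str) -> int:
    return len(text.split())

# Piece 1: the ending is fixed, so work out how many words it already uses.
words_needed = TARGET - word_count(ENDING)  # 26 - 8 = 18 words still to write

# Piece 2: draft a preamble, then revise it until it has exactly that many
# words (and actually explains the word count, so the sentence stays truthful).
preamble = ("This sentence was written piece by piece, carefully checking "
            "the word count after every revision until it fit,")
assert word_count(preamble) == words_needed

# Piece 3: only now compose the answer, and check it again before stating it.
sentence = f"{preamble} {ENDING}"
assert word_count(sentence) == TARGET
print(sentence)
```

The point isn’t the code itself; it’s that each piece gets checked before anything is ‘said’, which is exactly the look-ahead the model wasn’t doing.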
This is one area where we can imagine the difference between human-level intelligence and a hypothetical super-human intelligence. Perhaps a super-human intelligence would look at this problem and say "What do you mean, ‘break it into pieces’? The solution is just obvious in one step" (and be able to do that in one step, perhaps because the pieces just naturally get processed in parallel in this hypothetical super-human mind).