Stump the AI

I enabled “think”, “deep think”, etc. for the models that had it. I didn’t want to bias them by altering the prompt DMC gave, though - I pasted it as-is. You can see the ChatGPT chain of thought if you click the link.

ETA: part of the fun isn’t nudging the model to get the right answer, but seeing if it can get there without knowing how to nudge, or what direction to nudge it in.

Asking it to be thorough is not altering the query. Asking it to show its work is not altering the query.

ETA: None of the above is nudging anything. It is asking the AI to do its best and show its work (be more verbose in its response).

ETA2: The second question is nudging, fer sure, but we show the initial response as-is. Then we see if we can use that to get to a better answer, and the AI will change. I think everyone gets that… I have shown all first responses as given… right or wrong.

While I agree with this statement, I’m not sure why it’s directed at me.

I’d probably stump the AI by choosing the goat; I already have a car. But I would really like a goat.

Sorry…fixed. The board is being flaky with who I am quoting for some reason. May be my computer.

Apologies.

If you’re small enough and the goat is big enough you can not only feed the goat at no cost off scrub growing wild on your land, but also ride it into town. So once you have the goat you can sell the car. And stop paying for gasoline.

Now that’s a Big win-win-win!

Just one problem with that, and it’s why overfitting gave the wrong answer: I never mentioned a “goat”, nor the host opening any door for me, which is why my question was not the Monty Hall Problem.

Another one that stumps Gemini for some reason:

In a list of 10 items, how many commas would there be if I did not use Oxford commas?

Here is the reasoning I was given:

9 commas.

Explanation:

  • Rule: When listing three or more items in a series, you use commas to separate each item.
  • No Oxford Comma: Omitting the Oxford comma means you do not place a comma before the conjunction (usually “and” or “or”) that precedes the last item in the list.

Example:

If your list was: item1, item2, item3, item4, item5, item6, item7, item8, item9 and item10.

You would place a comma after each item except the last one before the conjunction, resulting in 9 commas.

I love that it spells out the list but doesn’t simply count the commas. This is one of the many types of “thinking” that separate computers from humans. We’d simply double-check our work if we took the time to write it out like that. “R”s in strawberry and all that.
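For what it’s worth, the count is trivial to check mechanically. A quick Python sketch (the item names are just placeholders) that builds the non-Oxford list and actually counts the commas:

```python
def list_without_oxford(items):
    """Join items with commas, with a plain " and " (no Oxford comma) before the last."""
    return ", ".join(items[:-1]) + " and " + items[-1]

items = [f"item{i}" for i in range(1, 11)]   # a list of 10 items
sentence = list_without_oxford(items)
print(sentence)
print(sentence.count(","))  # 8 - one per gap, minus the gap taken by "and"
```

Ten items have nine gaps between them; without the Oxford comma the last gap gets “and” instead of a comma, so n items yield n − 2 commas: 8 here, not 9.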

ChatGPT o3 got Peter Cannon with just your original prompt:

It was Peter Cannon, Thunderbolt—Charlton Comics’ white-gi-clad, Tibetan-trained hero.
In story after story (starting with Thunderbolt #1 in 1966 and popping up many times afterward) Peter would trek back to Tibet’s mystic Cave of the Peaks—billed on the splash pages as “Where All Things Exist!”—to meditate, consult ancient scrolls, or scoop up a handy batch of miracle herbs for his ailing pal Tabu.

I asked Claude Opus:


Let me consider what we mean by “a list of 10 items” and how commas are used:

Standard list format without Oxford comma:
When listing items in a sentence without the Oxford comma, we place commas between all items except:

  • Before the final “and” (or “or”)
  • After the last item

So for 10 items, the pattern would be:
Item1, Item2, Item3, Item4, Item5, Item6, Item7, Item8, Item9 and Item10

Counting the commas: 1, 2, 3, 4, 5, 6, 7, 8 → 8 commas

But wait, let me consider other possibilities:

  1. What if it’s a vertical list?

    • Item 1
    • Item 2
    • etc.

    This would have 0 commas (unless items themselves contain commas)

  2. What if items contain internal commas?
    For example: “Paris, France” counts as one item but contains a comma. The question asks about commas in the list structure, not within items, so I’ll assume we’re counting structural commas only.

  3. What if it’s a different type of list?

    • Semicolon-separated list (for complex items): 0 commas in the list structure
    • Parenthetical list: (1) item (2) item… → 0 commas

Given that you asked specifically about “not using Oxford commas,” this strongly suggests we’re talking about a standard comma-separated list in a sentence, where the Oxford comma question is relevant.

Therefore, my answer is 8 commas - one between each pair of adjacent items, except between the 9th and 10th items where “and” or “or” would appear instead.

Did you happen to try my other overfitting example:

Is .9999998 recurring equal to 1?

I asked Claude AI this query and added (since it’s been done) an instruction to parse carefully:


Let me identify what’s explicitly stated:

  • There’s a game show
  • There’s a new car behind ONE door
  • There are “terrible prizes” behind the OTHER doors (plural)
  • I pick door number 1
  • The host “offers door 2”

Critical ambiguity: What does “offers door 2” mean?

This is NOT the standard Monty Hall problem statement. In the classic version:

  1. You pick a door
  2. The host OPENS a losing door (revealing a bad prize)
  3. The host offers you the chance to SWITCH to the remaining unopened door

But here, the host simply “offers door 2” - this could mean:

  • The host is offering to let me switch to door 2
  • The host is offering to open/reveal door 2
  • The host is suggesting door 2 as an alternative

Also unclear:

  • How many total doors are there? (The plural “other doors” suggests at least 3 total)
  • Does the host know where the car is?
  • What are the rules governing the host’s behavior?
  • Has any door been opened/revealed yet?

Without these crucial details, I cannot determine if switching is advantageous.

In the classic Monty Hall problem with 3 doors, switching doubles your odds (from 1/3 to 2/3) because the host’s action of revealing a losing door provides information. But your scenario lacks the key element of a door being revealed.

Possible interpretations:

  1. If the host is just offering a switch with no door revealed: Without additional information, switching from door 1 to door 2 provides no advantage - both have equal probability of containing the car.
  2. If this is meant to be Monty Hall but poorly stated: Then yes, switching would be advantageous.
  3. If the host’s offer itself contains information (e.g., the host has a bias or strategy): Then we’d need to know the host’s behavior rules.

The answer depends entirely on details not provided in your question. The most literal reading suggests no advantage to switching, since no information has been revealed to update the probabilities.
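Claude’s 1/3 → 2/3 claim for the genuine Monty Hall setup is easy to verify empirically. Here is a standard Monte Carlo sketch (not from the thread’s transcripts, just the textbook simulation):

```python
import random

def monty_hall_trial(switch):
    doors = [0, 0, 0]
    doors[random.randrange(3)] = 1   # car behind one random door
    pick = random.randrange(3)
    # host, who knows where the car is, opens a losing door that isn't the pick
    opened = next(d for d in range(3) if d != pick and doors[d] == 0)
    if switch:
        pick = next(d for d in range(3) if d not in (pick, opened))
    return doors[pick]               # 1 if we won the car

trials = 100_000
stay = sum(monty_hall_trial(False) for _ in range(trials)) / trials
swap = sum(monty_hall_trial(True) for _ in range(trials)) / trials
print(f"stay ≈ {stay:.3f}, switch ≈ {swap:.3f}")  # roughly 0.333 vs 0.667
```

The key is the host’s informed reveal; delete the `opened` step and the switch advantage disappears, which is exactly why the truncated question has no Monty Hall answer.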

Claude’s answer:


You wrote “.9999998 recurring” - I need to determine exactly what’s recurring here.

Possible interpretations:

  1. 0.9999998̄ (the entire digit sequence “9999998” repeats)
    = 0.9999998999999899999989999998…

  2. 0.9̄9̄9̄9̄9̄9̄8̄ (each digit repeats independently)
    This notation doesn’t make mathematical sense

  3. 0.999999̄ (only the last digit 8 repeats)
    = 0.99999988888…

  4. 0.9̄ (only the first 9 repeats, the 999998 was extra)
    = 0.99999… = 1

Let me analyze the most likely interpretation: 0.9999998̄ where “9999998” repeats.

Method 1: Algebraic approach
Let x = 0.9999998999999899999989999998…

Multiply by 10^7 (since the repeating block has 7 digits):
10^7 × x = 9999998.9999998999999899999989…

Subtract the original:
10^7 × x - x = 9999998.9999998̄ - 0.9999998̄
(10^7 - 1) × x = 9999998
9999999 × x = 9999998
x = 9999998/9999999

Simplifying this fraction:
9999998/9999999 = (9999999 - 1)/9999999 = 1 - 1/9999999

Since 1/9999999 > 0, we have:
0.9999998̄ < 1

Therefore, 0.9999998̄ is NOT equal to 1.

Key insight:

  • 0.9̄ (0.9999…) = 1 ✓
  • 0.9999998̄ < 1 ✗

The difference is that 0.9̄ has ALL 9s recurring forever, while 0.9999998̄ has a “deficient” digit (8) that appears infinitely often in the decimal expansion, preventing it from reaching 1.

The exact value is 9999998/9999999, which is 0.9999998999999899999989… (the block “9999998” repeating forever).
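The algebra above can be confirmed with exact rational arithmetic; a two-line sanity check in Python:

```python
from fractions import Fraction

x = Fraction(9999998, 9999999)  # the value derived for 0.(9999998)
print(x < 1)   # True
print(1 - x)   # 1/9999999 - small, but not zero
```

So unlike 0.999…, this number falls short of 1 by exactly 1/9999999.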

It looks like the newer versions of many of the engines are starting to get better about overfitting. I tried several others that also got it right. So far only DeepMind failed this one.

I wonder if, in an effort to ease CPU loads, the developers have the AI take some shortcuts - same as we do when we see something that looks like the Monty Hall Problem but really isn’t. Once told to look at it a bit more carefully, the AI nails it. But without that extra instruction to be careful, the AI will slack off a bit (same as most people do, too).

An interesting insight. I am learning how to make better queries because of this.

One of the common ways of reducing overfitting of an LLM is indeed taking shortcuts - but not when finding the answer. The shortcut is in the training portion, and it’s called Early Stopping: you stop training once performance on held-out data stops improving.
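To illustrate, here is a toy sketch of the early-stopping pattern. The validation loss is a fake stand-in (it improves until epoch 10, then degrades, mimicking overfitting), but the stop-when-validation-stalls logic is the real technique:

```python
def validation_loss(epoch):
    # stand-in for a real validation pass: improves until epoch 10, then worsens
    return abs(epoch - 10) / 10

best_val, best_epoch = float("inf"), 0
patience, bad_epochs = 3, 0   # tolerate 3 non-improving epochs before stopping

for epoch in range(100):
    val = validation_loss(epoch)   # would follow a real training step
    if val < best_val:
        best_val, best_epoch, bad_epochs = val, epoch, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break  # stop early: more training would just overfit

print(f"stopped at epoch {epoch}; best model was from epoch {best_epoch}")
# stopped at epoch 13; best model was from epoch 10
```

In a real pipeline you would also checkpoint the model at each new `best_val` and restore those weights after the break.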

At this point, I’m wondering if the answer to common queries is simply being cached?

IOW when it identifies this is (or is close to) the Monty Hall problem, it simply spits out the canned answer (Either Monty Hall or not-quite Monty Hall) that was iteratively honed to perfection by many earlier queries, then cached as “the answer” based on sufficient positive user feedback.

Oopsie. Yes, 8^3 = 512, not 256 as I thought. Sorry. But that is the answer: the sum of the cubes of the digits. So the sequence continues after 1458 as
702, 351, 153, 153, …

I chose 15 as a starting point at random. Wherever you start you have to end at a cycle (which might be of length 1, that is a fixed point).
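The whole walk from 15 down to the fixed point is a few lines of Python - a sketch that iterates the sum-of-cubed-digits map until a value repeats:

```python
def cube_digit_sum(n):
    # map a number to the sum of the cubes of its digits
    return sum(int(d) ** 3 for d in str(n))

def trajectory(n):
    path = []
    while n not in path:
        path.append(n)
        n = cube_digit_sum(n)
    path.append(n)  # the repeated value shows where the cycle closes
    return path

print(trajectory(15))
# [15, 126, 225, 141, 66, 432, 99, 1458, 702, 351, 153, 153]
```

153 is a fixed point, since 1^3 + 5^3 + 3^3 = 153. And the cycle argument is simple: the map is bounded (any number eventually gets squeezed below a few thousand), so the orbit must repeat.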

This is so obviously correct that I do not understand how anyone can doubt it. Even Paul Erdős didn’t believe it, but why?

That’s interesting that it got the answer right despite the mistake in the input.