One of the things I’ve repeatedly found with ChatGPT is that although it’s usually pretty good at solving problems like this – and often amazingly good at solving much more complicated ones – it usually misses obvious shortcuts, like, in this case, the fact that the room dimensions were an integral number of yards.
Here is a more involved example. To solve the problem, ChatGPT sets up a system of equations, solves them, and arrives at the correct answer. But as I realized when looking at the puzzle more closely later, it is actually simpler than it appears, and the AI couldn’t see the shortcut.
We get people in Factual Questions asking the equivalent question. 4% is way lower than I would have guessed, but I’m also sure the “real number” if not 4% is way lower than I would like.
Of course it depends on what AI model one is talking about. In terms of Large Language Models specifically, absolutely, yes. They used to be notoriously bad at simple arithmetic which was in stark contrast to their impressive general problem-solving skills, but from what I’m seeing even ChatGPT 3.5 has improved and GPT-4 is supposedly much better. They are absolutely not designed for arithmetic, but seem to evolve an alien sort of number-crunching capability over time as an emergent property of scale, which yields correct results in many simple cases (even with large numbers) but often results in approximations.
I really think a lot of people would fail it because they go in with certain expectations and assumptions, and those expectations/assumptions make them make the obvious, foolish mistake.
There’s a riddle I ask kids, goes something like this:
You’re driving a bus, and there’s 32 people on the bus. At the first stop, two people get on, and one person gets off. At the second stop, 3 people get on, and 9 people get off. At the third stop, fifteen people get on, and nobody gets off, and at the fourth stop, one person gets on, and seventeen people get off. The question is, how old is the bus driver?
It stumps every child, even when I tell them the riddle a half-dozen times. The information they need is right at the beginning of the riddle, but these kids–fluent speakers of English–miss it, because the “genre” looks to them like a simple question of addition and subtraction, and they ignore what they consider irrelevant information.
Imagine the story problem were phrased this way:
You’re remodeling your room. You measure the living room’s dimensions as 9 by 12 feet. The carpet store sells carpet by the square yard. It’s on sale for $9.49 per square yard. How much will it cost to cover the living room floor exactly?
This rephrasing removes the elements that are tricky, emphasizing the unit change by placing the units at the end of two consecutive sentences. If the literacy test is designed to reveal who notices tricks, nifty–but I don’t think that’s what people consider to be literacy, usually.
TBH I haven’t had to calculate the slope of a line in more than a third of a century. I could probably figure it out if I ever needed to do it, but it’d take me some time to reconstruct the method.
I think when we teach mathematics (or any other subject) the goal is always to be able to “reconstruct the method” (which requires a basic understanding of geometry, algebra, whatever). Photographic memory or not, you will not get very far by memorizing a book of formulae, nor is there much point in it.
It is true that the link between preschool, early reading, and IQ is a correlation, not a causation; it is hard to double-blind, and a controlled study might be unethical, depending on its design, if the intervention is actually beneficial. The conclusion that these interventions seem to boost IQ is problematic for some reasons, but better than the inaccurate media-popularized version of more limited studies showing benefits for video games or Mozart (which have been largely disproven).
(Possible concerns include increases in IQ similar to standard error or reportedly too large as a % of standard deviation to be credible, confounders, media misrepresenting the study or not reporting caveats, others not able to verify results independently, small underpowered study, benefits mainly in the same areas trained so “teaching to the test” instead of a general boost in areas unrelated to the test, etc.).
But it seems to be more robust and repeatable in the case of preschool and early reading. Good data is good data regardless of our shared history. So I would personally choose a yellow flag. I take your point that the issue is probably involvement in the story and not its form as written or oral, though these days maximizing intelligence probably requires both, and more modalities are probably better than few. That said, the author does not believe any specific intervention has been definitively proven to boost intelligence. (Though marry someone smarter than you, and neglect can lower intelligence.)
My own problem with real-world math has been an inability to frame the problem correctly. I’m a whiz at simple operations, and since Mr. Legend (the math/engineer of the family) taught me to estimate in order to do a quick check on my results, I do just fine on problems like the carpet cost or credit card comparison. My issue is not always knowing how to approach a situation in order to solve it.
I’m only halfway through my first cup of coffee, so I’m struggling to come up with an example, but I’ve seen even pretty bright kids have the same issue: there’s a situation that needs a practical solution, and they have a good grasp of the middle-school math they’re being taught, but they can’t get a handle on how to translate the variables of the situation into the formulas they know. Kids might be better served by being taught to write word problems along with learning how to solve them. The best teachers incorporate this kind of creative problem solving in their lesson plans, but it’s impossible to give every kid the best teacher.
A lot of intermediate math seems (to me) to be worthless in and of itself, but its purpose is to give you the grounding to understand and be able to do calculus, which is the heavy lifter of real world math in things like engineering.
I’d actually be quite surprised if most university degree engineers couldn’t calculate the slope of a line, and would ask for evidence. Comes down to very basic formulae fairly deeply ingrained at an impressionable age.
I got $113.88 without a calculator. Here’s how I did it. The room is 3 yds x 4 yds so 12 sq yds. Had it been $10/ sq yd, that would be $120. Subtract $6 leaving $114 for the price at $9.50 and another 12c. for 9.49. Easy.
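For anyone who wants to check that mental arithmetic, here is a minimal sketch in Python. It assumes the 9 by 12 foot room and the $9.49 per square yard price from the rephrased problem; the function names `carpet_cost` and `mental_shortcut` are made up for illustration.

```python
from decimal import Decimal

SQ_FEET_PER_SQ_YARD = 9  # 3 ft x 3 ft


def carpet_cost(length_ft, width_ft, price_per_sq_yd):
    """Direct route: convert square feet to square yards, then multiply by price."""
    area_sq_yd = Decimal(length_ft * width_ft) / SQ_FEET_PER_SQ_YARD
    return area_sq_yd * Decimal(price_per_sq_yd)


def mental_shortcut(area_sq_yd):
    """The shortcut from the post: price at $10, minus 50c, minus 1c per square yard."""
    at_ten_dollars = area_sq_yd * Decimal("10")
    at_nine_fifty = at_ten_dollars - area_sq_yd * Decimal("0.50")
    return at_nine_fifty - area_sq_yd * Decimal("0.01")


direct = carpet_cost(9, 12, "9.49")   # 12 sq yd at $9.49
shortcut = mental_shortcut(12)        # 120 - 6 - 0.12
print(direct, shortcut)
```

Both routes land on the same figure, which is the point: the shortcut only works because the room happens to be a whole number of yards on each side.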
I always blank out when I try to remember the formula for calculating the area of a circle. Is it 2 π(r)^2, π(r)^2, or ½ π(r)^2? I have to resort to performing the integration in my head.
Now what would be hard for me is to figure out how to format this post so that those formulas looked good. So I am not so literate in that regard.
Easier done than said, though. Imagine you have a pizza, and you unroll it so the slices form a series of spikes. The base of that shape is 2πr (the circumference), and the height is r. Multiplied together we get 2πr², but since the area of a triangle is half base times height, the area ends up being πr².
Alternately, imagine a unit circle inscribed in a square. The square clearly has area 4. The circle has an area of roughly 3. So the area must be πr², since r=1. Any other trivial constants (e.g., 2, ½, etc.) wouldn’t work.
You might think of a circle inscribed in (surrounded by) as small a square as possible. This square has an area just a bit bigger than the circle. The side of the square is 2r so the right answer from your choices is obvious. At least your dimensional analysis is good.
Added: And of course, now I read the previous post…
Guys, I know how to do the math. I’m just amused that I needed to resort to integrating r² dθ in order to be sure. This happened for the first time early in my career. I didn’t have a table of formulas handy (unlike when you are in college and every textbook has them). This was before the internet, and I sure as hell wasn’t going to ask a coworker (and when you are an intern, every coworker is your superior) and embarrass myself. Calculus to the rescue! But I still question my memory all these years later, so I still do the math in my head. Just my quirk, I guess.
BTW, when you use the integrating r² dθ method, there is both a 2π and a ½ that show up in the math, and that is the source of my uncertainty.
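For reference, here is one way that integral can be written out. This is a sketch of the standard polar-coordinate argument, assuming each thin wedge of angle dθ is treated as a triangle of area ½ r² dθ; it is not necessarily the exact sequence of steps described above.

```latex
A = \int_{0}^{2\pi} \tfrac{1}{2}\, r^{2} \, d\theta
  = \tfrac{1}{2}\, r^{2} \cdot 2\pi
  = \pi r^{2}
```

The ½ comes from the wedge area and the 2π from the limits of integration, and the two combine to leave plain πr².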
That kind of thing happens to me all the time–maybe not with that one in particular, but other formulae. So I keep around a collection of mental helpers so I can keep the memorization to a bare minimum. That includes some tricks for easy integration and looking for upper/lower bounds on things. I know I always like to learn more of these tricks.
Being able to rederive things on demand is key to being numerate. It’s also what’s so great about math compared to many other subjects. I can forget almost all of it and still fill in the gaps with what I do know.