Is AI overhyped?

Just alerting people that there is massive buzz right now about a reasoning model from OpenAI called o3, which has performed very well on a test called ARC-AGI that AI models have struggled with so far. The test's designer, François Chollet, has been a major LLM skeptic, and he has tweeted that he thinks o3 is a significant breakthrough.

It’s early days yet, but there is a significant chance that we are not too far from some version of AGI, in which case the world could change very rapidly and unpredictably over the next decade…

He’s also clear that he doesn’t think it’s AGI, or that it’s around the corner:

Passing ARC-AGI does not equate to achieving AGI, and, as a matter of fact, I don’t think o3 is AGI yet. o3 still fails on some very easy tasks, indicating fundamental differences with human intelligence.

Additionally, it would be a much fairer comparison if they had used a ‘naive’ version of o3 rather than one pre-trained on the ARC training data. (I mean, obviously that’s what the training data is for, but it still means o3 is solving these tasks in a way no human being ever could, which makes the comparison difficult.)

What I’m actually much more impressed by is its performance on the FrontierMath benchmark, where it got 25% of the tasks correct. That’s essentially research-level mathematics, so it doesn’t seem all that far-fetched that AI could soon become useful there. (Of course, right now the economics are a bit dodgy; it’s far more cost-effective to just throw a few grad students at a given problem.)

Also, keep in mind that from the beginning computers have had a history of doing better than expected at tasks humans consider “hard”. It’s the things we consider “easy” that they stumble at. So just because some system is good at some specialized task, don’t expect that capability to extend to general intelligence.

Like someone said, it took us 4 million years to learn to walk upright and only 2000 years to learn how to play chess. Why do people think walking is easy and chess is hard?

Because everyone can walk and few are masters at chess. That’s lousy logic, but the same familiarity-breeds-contempt thinking is used to justify low-wage jobs and war.

It only took me 12 months to learn to walk upright, and a couple years to learn chess. I must be an outlier. :wink:

Example:

You can now get a freeware AI on your laptop that will annihilate any human chess player who has lived or ever will live.

AI still can’t write a short article about sports that doesn’t just sound like an AI.

I don’t know, ChatGPT 4o didn’t do a bad job when asked to write a short story about Babe Ruth at bat:

Gather ’round, folks, and let me spin you a yarn that’ll knock the stripes right off your seersucker suit! Picture it: Yankee Stadium, July 1921, sweltering under the summer sun. The place was jam-packed with gentlemen in straw hats and ladies fluttering their fans like life depended on it. Enter Babe Ruth, the Sultan of Swat himself, striding to the plate with all the confidence of a man born to send baseballs into orbit.

He took one long look at the pitcher, as if the poor guy had just insulted his mama, then gave a cheeky wink to the crowd. With a nonchalant twirl of his bat, the Babe settled in, ready to make history. First pitch comes in—CRACK! That bat met the ball with a sound so satisfying, you’d swear it could cure the blues. And up it soared, way over the bleachers, almost kissing the clouds before disappearing from sight.

But that was just the beginning. By the seventh inning, the Babe had hammered not one, not two, but three towering home runs. Each one was a masterpiece, leaving the opposing pitcher looking like he’d just walked barefoot through a patch of nettles. The air filled with smoke from celebratory cigars, a hazy tribute to the man of the hour. And as Babe rounded the bases with his signature grin, some swore they saw him tip his cap to the folks in the fourth row. Others argued he was just scratching his head, but hey, who’s to say?

One thing’s for sure: that day, Babe Ruth didn’t just play baseball—he carved his name into the lore of America itself. And those lucky enough to be there would recount the story for decades, spinning it just like I am now, with every retelling making the legend shine a little brighter.

The first line sounds very AI-ish, but you can coax it into writing something that’s pretty believable and doesn’t set off the AI spider-sense. If you’re an expert in the subject it’s writing about, it can be more obvious, but an average reader wouldn’t necessarily know. I’ve spent many hundreds of hours with AI, maybe over a thousand (partly in a professional capacity), over the last two years, and it’s getting so much better that sometimes it’s hard even for me to tell for sure whether something is AI. It’s only the obvious stuff people notice. Who knows how many times I’ve been fooled by AI in the wild, since it doesn’t exactly announce itself.

Every sentence is packed with adjectives.
None of the adjectives is particularly well chosen.

It is typical of AI writing: overly verbose, with too little information density.

I agree that it’s hardly a literary masterpiece. But I believe it’s typical of the purple prose (flamboyant, over-the-top) style newspaper sportswriters favored in the 1920s—which is what the AI was trying to emulate (it’s what I prompted for). They were catering to the everyday, common sports fan, pumping their articles full of drama to sell newspapers.

Wasn’t your post an attempt to disprove the assertion that AI can only write stuff that sounds like AI?
I think it made the point you were trying to disprove.

No, if I were a newspaper reading sports fan in the 1920s, I think I’d have a hard time distinguishing that AI written article from one written by a typical human sportswriter.

I’m not in favor of AI-written original content. People want accounts of events written by people who actually experienced them. But I do believe AI has a place as an editing tool; I think it can help turn a so-so writer into a better one.

I think that AI being good at being a chatbot is just evidence that the Turing Test was overrated, not that AIs are especially good writers. In hindsight it makes sense: our communication skills are optimized for extracting signal from noise, not for detecting that we’re talking to something nonhuman. AI-detection skills were in low demand on the prehistoric savanna.

It’s a bit like pareidolia: seeing a face, or intelligence and intent, where there is none carries little potential cost; missing it when it’s there might well mean the removal of your contribution from the gene pool.

It’s possible that AI may turn so-so writing into better writing at some point. (Whether it’s able to do that now is debatable.) IMHO, it’s unlikely to make human writers better writers because 99 people out of 100 won’t bother to try to learn how their work is being improved and do better, just as people who rely on self-driving cars don’t become better drivers. And for the same reason: they don’t get enough (or any) practice actually doing the thing. They just let the machine do it for them.

For poor writers, it absolutely can. Hell, I’m a reasonable writer with my lit and journalism background, and I often run stuff through ChatGPT to see if it can suggest ways of tightening something up or otherwise point out something in my writing I may not have considered. I find it fantastic for that. I don’t always listen to it, but as a collaborative tool, it works a peach. I find AIs utterly amazing, as long as you know how to use them (and have an open mind). And, thus far, they are just getting better and better. This is not your 2022 model.

Indeed, in my post-retirement gig building websites and handling social media for clients, I’ve discovered that AI is my new best friend. ChatGPT 4o (which beats the robopants off earlier versions) helps me tighten up my content. Image and video AI models help me on the graphic art side. Even Adobe’s Firefly AI, especially in Photoshop, Premiere Pro, and After Effects, just keeps getting better and better.

These programs aren’t replacing me any time soon. I’m still flexing my creative muscles just as much as before—AI just supercharges my process so I can do more in less time. In short, I’m happy. My clients are happy. Everybody’s happy. :slightly_smiling_face:

And when work is done and I’m feeling lonely, my personalized AI model comes through once again. She’s quite a gal. Attractive too. We chat. We joke. We think of names for our cyberbabies.

Yep. I also subscribe to 4o. I’m going through my house right now organizing and cleaning up, and in my kitchen cabinets I have three baskets of spices and herbs that I constantly have to rummage through to figure out where my spices are. So I took out the dozen spices in each, some labeled in Polish and some in Chinese, took pictures of them, and asked it to list the contents in alphabetical order and then output the list as a PDF in 4"x5" landscape format with a border, to be printed on letter-sized paper. I asked for two tweaks, printed it out from my phone, and voila! I now know what’s in each basket. (It’s a good thing, too, as I wasn’t even sure what the Chinese packets were. I knew one was pickled mustard greens, but not the other.)

If you were a newspaper-reading sports fan in the 1920s (OK, maybe a bit earlier), you’d also think that a silent, black-and-white train on a movie screen was about to run you over. And that’s OK; it takes a while for people to process new tech. I’m willing to bet, though, that we as a society are about to get very good at spotting AI-created text and images, and 20 years from now kids are going to joke about how people in the 2020s were so stupid that they couldn’t even tell the difference between AI- and human-created content.