The next page in the book of AI evolution is here, powered by GPT 3.5, and I am very, nay, extremely impressed

Still, though: to show that two concepts are inequivalent, it suffices to show that they disagree in an arbitrarily small fraction of cases. And that’s all the argument needs.

Then no test of human intelligence can be reliable, either. If you’re willing to throw out the baby with the bathwater, go right ahead.

So there’s no lower limit? Your brain could be completely scrambled every Planck time unit and that would still count as understanding if for one of those instants you somehow grasped reality?

Well, even so, it still doesn’t help you. Because it might still be that all of the evidence presented to you for your entire life has been fabricated. You never once had any understanding of anything. And you can’t even know if you’re in one of these unlucky universes (or iterations of this universe, or whatever), because as far as you can tell, all the evidence is perfectly consistent.

Well, that’s just one more case of it hallucinating things. It really does have empathy, but it’s saying it’s a simulation to make the humans more comfortable.

(that’s a joke… I think)

It’s not about the reliability of the test, it’s about whether the test tests for what we want it to test for. Again, take the example of randomness: you might say a behavioral test assesses whether a system is producing it… But you’d just be wrong.

First of all, as noted, nothing in what I’ve said commits me to anything like that position, your grasping at strawmen notwithstanding. Second, suppose we’re living in a perfectly random universe, that just so happens once every gazillion years to produce a state like the one you’re in right now, then a gazillion years later an adjacently following one, and so on: are you really claiming that, in this case, functionally indistinguishable from what we have now, there wouldn’t be any understanding?

Sure. I might be wrong about everything. But that’s just garden-variety skepticism and not specific to my position any more than to yours.

No, you’re the one claiming that. You literally just said this:

I took your argument to its conclusion to show how ridiculous it was. Let’s rewind to the source of this particular thread:

wolfpup objected that anything producing 99.999% junk would not pass any reasonable test of understanding, and you replied that in an eternal/multiverse scenario, there will be at least some possibilities where it’ll pass anyway.

Fine–but the same exact argument is true for any test. Furthermore, understanding itself is not reliable.

No, it’s a specific consequence of your multiverse scenario. I’m not talking about a brain in a jar here. I’m saying that if you consider infinitesimally unlikely events to be catastrophic to any test of understanding, then you have to accept that absolutely nothing else is reliable either.

Unless you’re willing to say that 99.999% certainty is ok, as I am. But that breaks the original random response argument.

You insinuated that in the case where a given mental state is just instantiated for a Planck time, there would be no understanding. So my question was whether you actually believed that. Do you?

Again, I don’t need a multiverse or anything, as amply clarified. I only need for understanding and behaving as if something understands to not be coextensive, for which it suffices to point to a single instance where they differ.

That’s not what I’m doing. I’m saying that if two concepts fail to agree, they’re not the same. I’m not saying that because the test sometimes doesn’t work, it’s useless; I’m saying that because it systematically differs in a certain (albeit small) fraction of cases, it tests for a different thing than we want to test for. In other words: the behavioral test for understanding works perfectly well in 100% of the cases, even in the unlikely random response case, because there, the system behaves as if it understands; I’m just showing that, in principle, it’s possible for a system to behave as if it understands, without actually understanding. Hence “behaves as if it understands” does not imply “understands”.

Perhaps the following analogy helps make things clearer. We have two sets: the set of all things that understand, and the set of all things that behave (perhaps under the right conditions) as if they understand. Every element of the first is also an element of the second. We have a test for membership in the second set; does this allow us to also test for membership in the first? (Moreover, this test is perfectly reliable: I’m explicitly not arguing that we can’t draw a conclusion because the test might fail on occasion.)

It does, if and only if every element of the second is also an element of the first. But I can construct an element of the second that isn’t an element of the first. That means that, in general, if we apply our membership test, we don’t know whether something that’s a member of the second set is also in the first. It might be that the constructed example is the only difference between the two. However, it might equally well be that there are a great number of things that are in the second set that aren’t in the first: at best, we just don’t know.
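Schematically, as a toy sketch (the set members are made up purely for illustration; all I’m modelling is the logical structure, nothing about any actual test):

```python
# S1 = things that actually understand; S2 = things that behave as if they understand.
# Every member of S1 is in S2, but not conversely (illustrative members only).
S1 = {"typical human"}
S2 = S1 | {"lucky random-noise generator"}   # S1 is a proper subset of S2

def passes_behavioral_test(x):
    # Assume a perfectly reliable test -- but it only certifies membership in S2.
    return x in S2

assert all(passes_behavioral_test(x) for x in S2)              # the test never misfires on S2
assert passes_behavioral_test("lucky random-noise generator")  # passes the test...
assert "lucky random-noise generator" not in S1                # ...without understanding
```

Passing the test tells you something is in the second set; it simply doesn’t settle whether it’s in the first.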

Of course, the Newman argument is already sufficient to close this debate: ChatGPT doesn’t understand language.

Yes. Since I guess it’s not obvious: a Planck time is nowhere near enough time for any actual evidence to come in and draw conclusions from, or to develop a model, or to do whatever you think makes for understanding. It could only arise from random fluctuation, just as with the random noise generator producing prose on occasion.

I’m not bothered at all by a multiverse or the like. It’s not even really a problem with taking infinitesimal chances seriously. It’s that if you do take them seriously, you have to examine all of the other ways they break things.

That’s fine, as long as you apply it to everything, including human intelligences.

Since I think that I can reasonably infer that humans are intelligent, given my experience with them, despite there nevertheless being a small chance that I’m wrong (whether due to random noise or otherwise), obviously I don’t agree with that approach (as it happens, the likelihood of me being wrong for mundane reasons is much higher than that of random fluctuations).

Ok. So, then what about the universe where all of your successive mental states are instantiated, each for a Planck time, through perfectly random underlying dynamics? Would there still be no understanding?

But again, that’s just a question-begging assumption: that what you believe to be the case about humans, who you have reason to believe function relevantly similarly to you, also holds for something not human. It might, of course, but so far, there’s no justification for that assumption (and good reasons against it).

Fine, but I already knew that any test for understanding is unreliable. I can possibly shrink the size of the set ~S1&S2 through extended testing, but I can’t make it zero under any conditions.

It doesn’t mean I have no confidence at all. I may even be able to use Bayes to estimate the exact degree of certainty I have.
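For instance, a toy Bayes calculation (every number here is invented purely for illustration; nothing depends on the exact values):

```python
# Toy Bayesian update of my credence that a system understands,
# given that it keeps passing behavioral tests. All numbers are made up.
prior = 0.5              # starting credence that the system understands
p_pass_given_yes = 0.95  # chance it passes a given test if it understands
p_pass_given_no = 0.19   # chance it passes anyway (luck, canned answers, etc.)

credence = prior
for _ in range(10):      # ten passed tests in a row
    num = p_pass_given_yes * credence
    credence = num / (num + p_pass_given_no * (1 - credence))

print(f"credence after 10 passes: {credence:.7f}")  # very high, but never exactly 1
```

Each passed test shrinks the remaining doubt, but no finite amount of testing drives it to zero.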

Probably not. Though I’m not actually sure that’s possible. Random physical processes still have causes–quantum fluctuations, particles bouncing into each other, and so on. It may be impossible to have all the same states arise through random processes that aren’t in fact identical to the ones that produced our universe.

I don’t know how human brains work, so I just go with what I can test. And so far, ChatGPT does seem to be showing faint glimmers of intelligence. As I’ve said, I totally reject any idea that things have some underlying meaning aside from their relationships to each other. So as far as I’m concerned, all that matters to demonstrate understanding is to ask questions and match against my own understanding. If there’s enough overlap–and it doesn’t have to be 100%, since humans won’t have perfect overlap either–then that’s good enough.

Gotta hit the sack now. Maybe more tomorrow.

Again, that’s not the point. For the argument, we can assume that the test is 100% accurate for membership in S2, without any loss of generality.

The point is that we know S1 is a proper subset of S2, but we have no idea how much larger the latter is. For all we know, the elements of S1 could be of arbitrarily small measure within S2, such that the proposed behavioral test would virtually never yield the correct conclusion.
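In probabilistic terms (just a sketch, assuming some measure $\mu$ over the space of candidate systems from which a case is drawn):

$$P(x \in S_1 \mid x \in S_2) \;=\; \frac{\mu(S_1)}{\mu(S_2)},$$

and nothing about the behavioral test bounds that ratio away from zero.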

Ok, then what is it that makes the sequence of your mental states cohere into a mind in the actual universe? Just their proximity? But there is no difference to each of those states—they don’t ‘know’ what’s left and right, so to speak. So it seems that whatever understanding supervenes on in that picture would not have a physical, much less functional or computational basis, as to all of those states, these two settings would seem indistinguishable.

[quote=“Dr.Strangelove, post:830, topic:975945”]I don’t know how human brains work, so I just go with what I can test.
[/quote]
Well, but the trouble is that you’ve no way of knowing, in that case, what you’re actually testing for. You’re just affirming the consequent: what understands, passes a behavioral test for understanding; but absent any knowledge about how passing such a behavioral test relates to understanding, there’s simply nothing you can validly conclude just from the passing of that test.

Unfortunately, GPT 3.5 is entering the commercial marketplace. Commercial behavioral tests will be designed and administered by naive citizens and evaluated based on their expectations. It should fare well in industrial applications. So far my responses from GPT have been much better than my experiences with call centers in India. That bodes well.

But consider the broad commercial market. Since GPT just makes stuff up, it appears to have an infinite database. An online astrological advisor will instantly provide impressive results. A couple of promoters, who may not know how to spell astrology, can be in business for the cost of a server and a license fee. All they have to understand is their market. Gross operating profit is infinite.

You can easily imagine the markets. Everything from clairvoyants to defunct scams like Personology can suddenly go international. They can cast a large enough net to catch a small but sufficient number of fish each.

The social impact is that a constituency will emerge that believes these bots are more than software. Who knows what form it will take? But folks who can see Jesus in a Cheeto will definitely find a “person” in a bot. Some big cults are on the horizon.

So, our discussion here will continue, but the meaningful test is in the hands of the public. The interlocutor is Archie Bunker.

Another multi-modal model.

demo:

Well, I’ll be gobsmacked!

Which number should come next in this pattern: 37, 34, 31, 28

The pattern in this sequence appears to be that each number is decreasing by 3.

Starting with 37, we subtract 3 to get 34, then subtract another 3 to get 31, and then subtract another 3 to get 28. Therefore, the next number in this pattern would be 25.

So the next number in the sequence is 25.

Please complete this analogy: Book is to Reading as Fork is to:

The word “reading” in the first half of the analogy implies that we use a book to read. So in the second half of the analogy, we need to figure out what object we use a fork for.

A fork is a utensil that we use for eating, particularly for picking up food. Therefore, the word that completes the analogy is “eating.”

So the completed analogy is: Book is to Reading as Fork is to Eating.

These questions are straight out of a standard IQ test. The funny thing is, on that last question, I hit “Enter” intending to put the multiple choice options on a separate line, which caused the question to be submitted in an incomplete form. ChatGPT got the right answer without even having the options to choose from.

Jeebus … another one!

Here is a grouping of three words (Group A): talkative, job, ecstatic. Here is another grouping of three words (Group B): angry, wind, loquacious. Find a word in Group A that is closest in meaning to a word in Group B.

The word in Group A that is closest in meaning to a word in Group B is “loquacious” which is in Group B.

“Loquacious” in Group B means talkative or tending to talk a lot, and “talkative” in Group A means similar thing. Both words refer to a person who likes to talk a lot and is inclined to be chatty.

Therefore, “loquacious” in Group B is closest in meaning to “talkative” in Group A.

You can’t just say “without loss of generality” to bolster some unjustified assumption you’ve made. It does matter whether the test is accurate or not. Because if the ~S1&S2 region consists entirely of random noise misinterpreted as signal, perhaps you’d have a very small point. But if a significant portion of that space consists of other kinds of errors–whether mundane ones like the fact that we only have finite time to run a test, or more esoteric ones like your brain being scrambled in just the right way–then the noise region does not change anything substantial. The test was never accurate in the first place, so big deal if it’s inaccurate an extra 2^-1000000 of the time.

It’s funny how all of these arguments on the subject end up going the same way. To legitimately claim that a machine mind is somehow different than a human one, one must find at least one characteristic where three things are true:

  1. Humans have it
  2. Machines do not
  3. The characteristic is actually important

And what’s amazing is that in virtually every case, none of these things end up being demonstrated, let alone all three. Even when allowing something for the sake of argument (say, “a perfectly reliable test of machine understanding is impossible”), it always ends up disqualifying another of those things (“ok, but then a perfectly reliable test of human understanding is also impossible”).

Incidentally, PBS Space Time came out with a new video today:

It has nothing to do with the discussion as such, but I found it interesting that Leibniz’s views are almost identical to the ones I’ve presented here: that it’s actually the relationships between objects that are all-important (I’m not sure I’m on board with the idea that monads are a little bit conscious, though).

Those are some impressive responses. But it is interesting that a traditional IQ test seems like almost a perfect application for an LLM. To the extent that language without understanding is possible, word analogies in particular seem like a relatively easy problem. The math, less so, but mostly because LLMs don’t seem that great at arithmetic right now.

I wonder if we’ve converged on these types of “IQ tests” because our mental architecture isn’t entirely well-suited to them, and therefore the problems are relatively hard. But for an LLM, they’re actually easy, since they’re a close match to how it was trained.

This assumption boosts the strength of the argument for those intending to make the case that behavioral testing suffices for testing understanding. Thus, it’s the strongest form of the argument I’m attacking, i.e. simply the principle of charity—even if the test were 100% accurate, it would still not tell you faithfully whether a system actually understands. Any deviation from that will only make the claim weaker. Hence, I can indeed assume this without losing generality.

The claim that’s being made is that a behavioral assessment for understanding suffices for believing in the presence of understanding. If you’re saying, a priori, that there are some systems that behave as if they understand, without actually understanding, then you’re already agreeing with me: that’s all I’m pointing out.

Indeed. And in order to have one monad have any reliable knowledge about anything else, Leibniz felt compelled to argue that a kind and loving God would necessarily arrange a ‘pre-established harmony’ between each individual monad’s experience.

Sure. And since that includes other humans and even ourselves, it’s useless as a way of distinguishing between human and machine understanding. None of it is absolutely certain.

As it happens, this isn’t even a theoretical thing for humans. It’s a real problem that there are people in a vegetative state with caregivers that think they’re still displaying signs of consciousness every time they twitch a muscle.

Yes, but everyone back then had to shoehorn God into their works somehow. Newton, too. Turns out it wasn’t needed.

I don’t know if this is necessarily true. But it’s nevertheless an intriguing fact that there are actual humans walking amongst us who would not have answered those questions correctly.

As for this tired old business of “language without understanding”, I’ll just say this, both to you and to @Half_Man_Half_Wit regarding the response of ChatGPT in my post #835, just above, about the two word groupings. It’s a slightly convoluted question that I actually thought was poorly worded in the original that I copied this from. So I recast the question in my own words in the hope that ChatGPT would at least understand the question, whether or not it could answer it. It baffles me that the production of the correct answer to an intelligence test question could possibly be regarded as consistent with also having no understanding whatsoever of what the question was. This sounds like pure philosophical sophistry, or at least a Chinese-room style of irrelevance.