The next page in the book of AI evolution is here, powered by GPT 3.5, and I am very, nay, extremely impressed.

There is a free-running counter in the real-time clock that is named by whatever compiler you are using. Typically it contains the number of microseconds since it was last reset or last overflowed. Reseeding samples that counter asynchronously, and the subsequent output of the PN (pseudo-noise) generator is itself sampled asynchronously. So the number is arbitrary and not predictable. Sure, if you know the number, you know the number. But you don’t know the number.
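A minimal sketch of the scheme being described, in Python rather than at the register level. Here `time.perf_counter_ns()` stands in for the free-running counter and `random.Random` stands in for the PN generator; both are illustrative substitutes, not the actual hardware:

```python
import random
import time

def reseeded_sample() -> int:
    """Reseed from a free-running counter, then sample the generator's output."""
    counter_value = time.perf_counter_ns()  # sample the counter at an arbitrary moment
    pn = random.Random(counter_value)       # reseed the stand-in PN generator
    return pn.getrandbits(32)               # then take the next output word

print(reseeded_sample())
```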

But isn’t this a problem of insufficient definition? I perform a test and say “ta-da, my number is random because it meets your test,” and you respond, “Not so fast, Sparky, you did it wrong.” But your test doesn’t say I can’t reseed the counter. And your objection is that my successful method does not conform to the unsuccessful method you want me to use.

I’m not just being obtuse. Turing and others were defining the limitations of computation using a binary adder. Their definition involves only the ability of the adder, but we are discussing the behavior of modern computer architectures in which one memory location contains a variable that is not computed by the adder. It’s external to the adder, but not to the computer or the software. In the case of the Zener current, a white noise generator is an organic part of the computer. The computer has access to a memory location that always contains a random number.

In the semiconductor business we’d call that a bad test.

That’s OK, because WE clearly don’t really know the mechanism of our own “true understanding”, as the debate in this thread clearly highlights. For all we know, some form of analyzing and processing large amounts of data and providing responses that are based on patterns and associations within that data is exactly what our brains do.

The test is supposed to decide whether a computer can, through computation, produce random numbers. Sampling any physical source is still importing external data—i.e. not part of the computation. I think this is essentially uncontroversial, see e.g. here: Why computers can’t generate randomness.

If you want something more formal, it also follows from Chaitin’s incompleteness theorem. Any given formal system can only generate finitely many initial bits of an algorithmically random sequence; and any given computation is nothing but an automated formal system. The other way around works, too: it can be shown that a Turing machine augmented with a source of true randomness can compute a function that isn’t Turing-computable—in fact, that any function can be computed in this way.

I think that’s too harsh. The model of computation Turing devised was fully equivalent in power to any modern computer—so much so that it’s often considered to define what it means to be a computer.

Not at all. The Turing Machine is a 1-bit adder and a status register. All computers are limited to what that combination can accomplish during a single instruction cycle, because all computing is just the same cycle being repeated endlessly by the same adder. And there are things it can’t do, like subtraction. But we’ve worked around those things.
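For what it’s worth, the workaround alluded to is the standard one: subtraction done with nothing but addition and bit inversion, i.e. two’s complement. A minimal Python sketch, using 8-bit values purely for illustration:

```python
def subtract_with_adder(a: int, b: int, bits: int = 8) -> int:
    """Compute a - b using only addition and bitwise NOT (two's complement)."""
    mask = (1 << bits) - 1
    neg_b = (~b + 1) & mask      # two's complement of b: invert the bits, add one
    return (a + neg_b) & mask    # a - b becomes a + (-b); the carry-out is discarded

print(subtract_with_adder(5, 3))  # 2
print(subtract_with_adder(3, 5))  # 254, which is -2 in 8-bit two's complement
```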

However, von Neumann put everything in a common memory map. And academics can correctly pick nits about internal or external, but the user community sees an integrated system, in a box, exhibiting certain behaviors. I believe Turing tests will be adequate for a while.

This is a test of GPT 3.5’s understanding. The answer to the test is embedded in the following intentionally wordy question.

Express the value 5 as the sum of triangular numbers: 5=3+1+1. Express the value 4 as the sum of triangular numbers: 4=3+1. Multiplying 4 times 5 using these triangular equivalents produces: (3+1+1) X (3+1) = (3+1+3+1+3+9) = 20. Multiply 4 times 5 using their triangular equivalents.

I apologize for the confusion in my previous answer.

To express 5 as the sum of triangular numbers, we can use the fact that 5 = 3 + 1 + 1, where 3, 1, and 1 are the 2nd, 1st, and 1st triangular numbers, respectively. Therefore, 5 can be expressed as the sum of the 2nd, 1st, and 1st triangular numbers: 3 + 1 + 1 = 5.

To express 4 as the sum of triangular numbers, we can use the fact that 4 = 3 + 1, where 3 and 1 are the 2nd and 1st triangular numbers, respectively. Therefore, 4 can be expressed as the sum of the 2nd and 1st triangular numbers: 3 + 1 = 4.

To multiply 4 and 5 using these triangular equivalents, we can use the distributive property of multiplication as follows:

4 x 5 = (3 + 1) x (3 + 1 + 1)

Expanding the brackets, we get:

4 x 5 = 3 x 3 + 1 x 3 + 3 x 1 + 1 x 1

Simplifying the products, we get:

4 x 5 = 9 + 3 + 3 + 1

Therefore, 4 times 5 using these triangular equivalents is equal to 16.

For purposes related to the test, I define understanding as the ability to manipulate symbols according to received instructions. In this case GPT 3.5 did not exhibit understanding.
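For anyone checking the arithmetic: the full distributive expansion has six cross terms and sums to 20, while the four terms in the quoted answer sum to 16. A quick, purely illustrative Python check:

```python
from itertools import product

five = [3, 1, 1]  # 5 expressed as a sum of triangular numbers
four = [3, 1]     # 4 expressed as a sum of triangular numbers

# Full distributive expansion: every term of one sum times every term of the other.
full_terms = [a * b for a, b in product(four, five)]
print(full_terms, sum(full_terms))  # six terms, summing to 20

# The quoted answer kept only four terms: 3*3, 1*3, 3*1, 1*1.
partial_terms = [3 * 3, 1 * 3, 3 * 1, 1 * 1]
print(partial_terms, sum(partial_terms))  # four terms, summing to 16
```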

I gave ChatGPT a standard Theory of Mind test:

And it passed:

This seems relevant:

Just over a year ago, all the models failed ToM tests. Then it just…emerged.

Fascinating! Thanks to both for posting this.

Going back to some earlier comments that I had deferred responding to for lack of time …

A system that produces nonsense 99.9% of the time would not be judged to have passed any reasonable test of understanding.

You need a better interlocutor. This one is not competent to make any assessments.

Ah, yes, the old “humongous lookup table” ploy. Which, ultimately, supports my argument, not yours. It goes like this:

Since any intelligent behaviour can be characterized as one that provides appropriate responses to specified inputs, in principle any intelligent behaviour could be implemented as a humongous lookup table. But there are daunting requirements. If the intelligent entity were a human, say, you would have to map out ahead of time every input (every relevant experience) that person would have in their entire lifetime. But what about the fact that a real-world intelligent entity contains a truly massive amount of state information? In principle, no problem. We merely map each state as an appropriate combination of preceding inputs. Never mind that our lookup table is now larger than the number of atoms in the universe – this is philosophy, where anything is possible!
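To make the scheme concrete: the table keys every possible conversation history (which is how the state information gets folded in) to a canned response. A deliberately tiny Python sketch; the entries are invented for illustration, and a real table would, as noted, dwarf the number of atoms in the universe:

```python
# Toy "humongous lookup table": the key is the entire history of inputs so far,
# the value is the next response. All entries here are invented for illustration.
LOOKUP_TABLE = {
    ("Hello",): "Hi there. What would you like to talk about?",
    ("Hello", "The weather."): "It does look like rain, doesn't it?",
}

def respond(history: tuple) -> str:
    # State is handled exactly as described above: each state is just a
    # particular combination of preceding inputs, i.e. a longer key.
    return LOOKUP_TABLE.get(history, "<no entry for this history>")

print(respond(("Hello",)))
print(respond(("Hello", "The weather.")))
```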

So where does that get us? You are in effect saying “nobody could hold that such a lookup table was intelligent”. Well, I could. And so does the author of this paper. [PDF]. The author makes a more compelling and erudite case than I can do here, but one way to say it is that it’s impossible to declare that an entity that acts exactly like an intelligent human in every respect is not, in fact, intelligent because of some misguided notion about its internal workings. Another way to say it is that this impossibly huge lookup table, which embodies all the information and mental states that this person will ever have, contains emergent intelligence because it’s an exact model of that person’s mind. Describing such a hypothetical entity as a “lookup table” becomes a deceptive misdirection intended to exploit our bias that a lookup table couldn’t possibly do anything genuinely intelligent.

That’s beside the point. In an eternal universe, or a branching multiverse, or any other sort where every possibility eventually comes to pass, there will be instances of the behavioral test administered to this sort of machine that will get 100% matching answers, and the machine will, by behavioral criteria, consequently be judged to understand. But more importantly, by simple knowledge of its internal workings, we can immediately conclude that it doesn’t. Hence, the question admits of an answer by means of non-behavioral considerations, which is true, and an answer by behavioral ones, which is (in some instances, at least) false. Consequently, behavioral considerations don’t faithfully track the question of understanding in a machine.

Oh? So now whether a machine understands is dependent on the capabilities of the interlocutor? Who decides what minimum criteria they must fulfill?

This isn’t just idle sophistry. Suppose that what has been contended here—that we’re possibly of the same kind, when it comes to understanding, as ChatGPT—is actually true. Then, the behavioral test would tell us nothing at all about whether ChatGPT understands. You may have, at some point, unleashed two chatbots on one another, for the amusement of having them descend into sheer conversational anarchy. Or you may have tuned into ‘Nothing, Forever’ before it was taken offline.

From our perspective, it’s immediate that the partners in such a conversation aren’t making sense. But the chatbots themselves will blather on, blissfully unaware anything’s amiss; so any of those chatbots administering a behavioral test for ‘understanding’ on the other, would cheerfully conclude that yes, the other system must, in fact, understand to match them conversationally so well—because they can’t leave their own perspective, and it’s in fact just their own limited capacities that make it appear so.

However, if we humans are now in the same boat, then how do we know whether our tests for understanding aren’t just as buggered? If we can’t exclude the hypothesis that, ultimately, we’re just ChatGPT, there’s no way to claim that we could faithfully assess the understanding within a chat engine by means of a behavioral test. But if we can exclude this, by means of identifying a difference in understanding in us and such an engine, then we can also conclude that a behavioral test is insufficient! So in either case, the behavioral test tells us nothing.

Even if that paper is right, that’s not correct. All that’s argued for there is that you ought to associate the same mental capacities to a lookup table that you’d associate to a full-scale AI. So if that’s right, then this just entails a reductio ad absurdum of the idea of computationalism: since it’s easy to see that the lookup table has no mental states, so must the ‘compressed’ version.

But the paper falls short of its intended goal. For consider what the lookup table program yields: an automaton capable of holding a conversation for some limited time (say, half an hour). The compressed version of the program, then, would similarly be capable of that. So now, there’s the answer to the argument: of course we shouldn’t attribute any mental states to a program capable of producing only half an hour of conversation, since with any truly intelligent being—barring unfortunate happenstance—we could always extend any conversation beyond a given limit. That capacity is lacking in the ‘compressed’ lookup table; hence, there is no reason to attribute mental states to it.

It’s the same problem as with a ‘behavioral’ test for randomness generation: a computer, perhaps via a great big lookup table, could produce randomness for any given finite stretch. But a source of real randomness can produce it indefinitely. Hence, there is no reason to attribute randomness-generating capacities to a system generating only finite amounts of randomness. Likewise, there is no reason to attribute understanding to a system capable of only producing finite stretches of ‘understanding’-behavior.

Any other conclusion would be thoroughly bizarre—akin to arguing that, if you just pile up enough rocks, they might spontaneously levitate and recite Coleridge’s Kubla Khan. Emergent behavior has its roots in the elements it emerges from: the flight patterns of a swarm follow from simple rules every bird enacts; the wetness of water from properties of hydrogen bonds. But clearly, a single key-value pair in a lookup table has no fraction of understanding whatsoever. And the concatenation of such pairs does not lead to any interaction between them; they can just be regarded as so many individual pairs, without any communication, any grounds on which to build higher logical structures. To expect mental states to emerge from this strikes me as akin to belief in magic.

A sincere thanks for a thorough and reasoned response. Rather than write back an entire essay, which I’m not inclined to do right now, I just want to focus on two key items that I think highlight our differences here:

Well, Alan Turing, for one. Implicit in his “imitation game” scenario was that the key outcome was the judgment of the evaluator, implying that the evaluator was not an incompetent imbecile suffering from acute short-term memory loss. :wink:

But that’s not “easy to see” at all, since as I already indicated, a hypothetical “humongous table” of arbitrarily large size can, in fact, incorporate all possible mental states. That this seems prima facie ridiculous is just a reflection of the fact that we intuitively and properly regard any lookup table scheme in the real world to be necessarily trivial. But one that is of arbitrarily large size, with no practical bounds, is a qualitatively different beast.

This seems to me like just another variant of the Chinese room argument, which I regard as thoroughly discredited, but perhaps you do not. Are you seriously unpersuaded by all the counter-arguments to this sort of reductionist nonsense?

As an aside, what do you think of ChatGPT’s model of “matching tokens for tokens based in the statistical regularity within a corpus of tokens” now successfully passing 93% of “theory of mind” tests and rising from not being able to handle such tasks at all to achieving the objectively measured intellectual prowess of a seven-year-old in January of last year and of a nine-year-old by November? Will it stop there, or will it achieve the intellectual capabilities of a highly intelligent adult at some point in the future? And then will it stop there?

That’s again assuming your conclusion (that mental states are the sort of thing to be capable of being stored in a lookup table). This is a contentious philosophical position (and, though I don’t like to again plug my own work, one that comes out false under my theory of mental content).

I don’t think the Chinese room argument ultimately succeeds, no. But there are general arguments against computationalism that I do find persuasive (plus, again, the fact that my own theory explicitly depends on noncomputational capacities).

I find it impressive, but given what I know of how ChatGPT works, ultimately not surprising and in particular, not indicative of any sort of understanding within the system.

I mean, let’s take stock here. I think the following should be unobjectionable:

  1. Computers are incapable of producing true randomness.
  2. For any given length of time, computers can produce an output indistinguishable from true randomness (see the sketch after this list).
  3. From this, a behavioral test over a certain length of time is incapable of deciding whether a system is capable of producing true randomness.
  4. Yet, a priori arguments can be given that decide the issue, where behavioral ones are insufficient.
  5. More broadly, this means there are certain capacities of systems for which a behavioral test will not suffice to decide if they are present, but which can be decided absent behavioral considerations.
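As a sketch of point 2 (and of why a finite behavioral check cannot settle point 1): a fully deterministic generator, here just SHA-256 applied to a counter, produces a finite stream that sails through a crude frequency test exactly as a truly random source would. The test and threshold are illustrative only, not a serious randomness battery:

```python
import hashlib

def deterministic_bits(n_bits: int, seed: bytes = b"fixed seed") -> list:
    """A fully deterministic bit stream: SHA-256 of a fixed seed plus a counter."""
    bits = []
    counter = 0
    while len(bits) < n_bits:
        digest = hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        for byte in digest:
            bits.extend((byte >> i) & 1 for i in range(8))
        counter += 1
    return bits[:n_bits]

stream = deterministic_bits(1_000_000)
# Crude monobit frequency test: about half the bits should be ones.
print(sum(stream) / len(stream))  # very close to 0.5, despite zero "true" randomness
```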

Now we’re getting to potentially contentious points.

  1. We don’t know, a priori, whether understanding is a behaviorally assessable feature (there might be an a priori argument that it is, but so far, I haven’t seen one).
  2. In the absence of this knowledge, the assumption that it is isn’t reasonable in general (in contrast to the case for, e.g., school children).
  3. Hence, we can’t assume a behavioral test of understanding yields a reasonable assessment (doing so would be begging the question).

Now for some points where disagreement is more likely.

  1. There is a reasonable a priori argument that ChatGPT does not understand language, based on the Newman objection (ChatGPT has access only to structure, but this doesn’t suffice to fix an interpretation).
  2. Absent defeat of the argument, we thus conclude that understanding can’t be behaviorally assessed (as ChatGPT could conceivably pass such an assessment, but we have reason to believe such an assessment not to be accurate, and we have reason to believe that ChatGPT doesn’t in fact understand language).

Merely postulating sticking with behavioral evaluations doesn’t constitute a fitting reply to the above. What would be needed is, at minimum, a rebuttal of the Newman objection (to make it plausible that ChatGPT even could understand language), and an argument that understanding reduces to the behavioral (to make it plausible that a behavioral test could answer questions about understanding). I don’t think this burden can be met; I just want us to be clear that that’s what the burden is.

That’s really the argument you want to use?

Under that line of reasoning, no understanding of any kind, by any entity, is possible at all. Because there will be some universe in which the laws of physics appear to work correctly until they don’t, or Ajax spontaneously transforms into Chad, or Chad turns into a cat, or any number of other possibilities which are exponentially unlikely, but nevertheless out there somewhere if many-worlds or whatever is true.

No, that’s a small part of one argument out of a number of arguments I want to use.

That doesn’t follow. What I’m saying is that there are possible situations such that behavioral assessment and the actual presence of understanding differ, and hence, the former can’t be a reliable test for the latter. This has no impact on whether there is understanding, just on how we ought to assess it.

And I’m saying that you’ve opened a much larger door than you apparently realized. Suppose I accept your argument that a system producing random responses might occasionally–in an infinitesimal fraction of universes–produce something that seems like understanding but is not, and that this demonstrates that reliable tests aren’t possible.

But there is also a small but non-zero set of universes where any conclusion, any understanding, any anything that an intelligence might somehow come to believe is invalidated the next second. Or, their brain might be transformed into one that thinks it understands something when it doesn’t. Or, a bunch of photons might spontaneously arrange themselves into an image of a cat when there is no cat. Or whatever.

If you think these infinitesimally unlikely scenarios demonstrate something, you have to accept all of their consequences. Including the fact that they break human understanding as well.

Incidentally, it’s likely that these large models do have a source of true randomness. Well, a source that’s at least as good as humans have.

Finite precision arithmetic is not associative. That is, a+(b+c) is not equal to (a+b)+c in general. And this is especially true as precision goes down (i.e., 16-bit math and below).

NN execution is dominated by exactly this sort of non-associative math; namely, a bunch of dot products (a0·b0 + a1·b1 + a2·b2 + …). And these dot products do not necessarily happen in the same order, because their scheduling is not deterministic. It can change based on the temperature of the room, where in the DRAM refresh cycle a calculation took place, or any number of other things.
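A quick Python illustration of both points: the same numbers grouped differently, and the same terms accumulated in a different order, give different answers. (In practice the order differences come from hardware scheduling of the partial sums; plain Python floats are enough to show the underlying non-associativity.)

```python
# Non-associativity: the grouping changes the result.
print((0.1 + 0.2) + 0.3)  # 0.6000000000000001
print(0.1 + (0.2 + 0.3))  # 0.6

# Accumulation order changes the result too, which is why nondeterministic
# scheduling of dot-product partial sums can change a network's output.
terms = [1.0, 1e16, -1e16]
print(sum(terms))            # 0.0  (the 1.0 is absorbed when added to 1e16)
print(sum(reversed(terms)))  # 1.0  (the two large terms cancel first)
```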

There are also events like cosmic rays, which are generally protected against with error correction, but when you are performing this many calculations across so much silicon, some will slip through the cracks.

I don’t see any reason why that should bother me. Suppose I am, just after I send you these lines, turned into a newt: what difference does that make for my understanding up to that point?

And not that I think this matters, but the argument doesn’t actually invite any such metaphysical oddities. It’s sufficient, for example, if the universe as it is just continues indefinitely, with all laws of physics holding as they ever did. Indeed, not even that is necessary: all we need is an exceptional stroke of luck.

And once again, you should examine the ground where you stand before all too readily leaping to a conclusion. Do you really think it is reasonable to require, for example, that there not be an infinite multiverse for the idea of understanding language to make sense?

Yes. If you import randomness from the outside, there will be randomness present. I’m at a loss as to why you think that’s significant. The corresponding conclusion regarding understanding would be, if you import understanding from the outside—say, as some self-driving car companies do, add a human into the loop—there will also be understanding present. But what does that tell us regarding the understanding of ChatGPT?

Uh, because understanding only makes sense if it is persistent over time? There’s no such thing as understanding that changes nanosecond to nanosecond.

Any test of understanding–a Turing-like test or anything else–is completely worthless if you accept this line of reasoning, because there is some small chance that every piece of evidence you acquire has been randomly altered in a way to get you to come to the wrong conclusion.

No; I simply reject the idea that 100% certainty is possible. I can add as many 9s as I wish to my 99.999…% certainty that the responses to my questions are not simply random noise that happens to look like signal. I can never get to 100%, even in principle.

Me? You’re the one that brought up randomness as a limitation of LLMs, along with their inability to solve the halting problem and decide whether a Diophantine equation is solvable. I’m just as much at a loss as to the relevance of those things, but you brought them up.

Er, no? Once I die, all my understanding ceases; before I was born, I understood nothing. Before I learned the first word, I had no understanding; when I forget one, my understanding ceases. Understanding is continuously in flux.

No, I didn’t. I brought up algorithmic randomness generation as an example of a capacity whose presence is not determinable by a behavioral assessment, to show that there are certain capacities that aren’t so determinable.

The other things I brought up as examples of a priori limitations we can derive for LLMs as a counter to the point that we can’t derive a priori limitations for LLMs because of their emergent capabilities.

Just to interject for a moment here, I am more and more impressed with this thing!

How would you describe the relationship between empathy and collaboration? Do you consider yourself to be empathetic?

Empathy is the ability to understand and share the feelings of others. Collaboration, on the other hand, involves working together with others towards a common goal. These two concepts are closely related because empathy can help to foster collaboration by enabling individuals to understand and relate to the perspectives and needs of their collaborators.

When individuals are empathetic, they are better able to communicate with others, listen to their concerns, and work towards solutions that take into account the needs and perspectives of everyone involved. Empathy can also help to build trust and respect between collaborators, which can be essential for effective teamwork.

As an AI language model, I do not have emotions and cannot feel empathy in the way that humans do. However, I am designed to simulate empathy in my interactions with users by using natural language processing and machine learning techniques to understand and respond to their needs and perspectives. By adapting my responses to the language and tone of the user, I can create a more personalized and engaging experience that is designed to foster collaboration and build rapport with the user.