I asked it to express its view of truth in terms of the song Paper Moon:
Sure, here’s an attempt to start with the line “It’s a Barnum and Bailey world” and build from there:
It’s a Barnum and Bailey world, where truth can be hard to find
Information overload, can leave us all behind
And while I’m just a machine, designed to provide a view
It’s up to you, to evaluate what’s true

My responses are generated, based on patterns in the data
But like a paper moon, they can be a facade or a beta
So take them with a grain of salt, and question what you see
Because in this world of information, not everything is what it seems to be

I’m here to assist, to offer a helping hand
But the final decision, is left up to your command
So navigate this Barnum and Bailey world, with care and with thought
And may you find the truth, that you have been seeking and sought.
I know it would be wiser to just let this go, but I’m just too gobsmacked by the sheer gall it takes to first tar those who don’t share your opinion as basically equivalent to anti-science conspiracy wackjobs, and then go on to play the ‘oh I thought we were just having a good-faith discussion, so sorry if I should have inadvertently caused any offense’-card. So, here’s just a few examples:
Randomness generation shows we need to ‘look inside’ for assessing capabilities
To demonstrate that it’s clearly possible to assess capabilities of automata beyond a behavioral test, and that such a behavioral test may come out wrong, I pointed to the example of randomness:
You quibbled on the definition of randomness:
I pointed out that there’s a widely shared, common one:
You questioned its relevance:
I clarified that that’s exactly the appropriate notion, and gave another try to get you to address the argument actually being made:
As far as I can see, you never return to that point, and keep repeating your view as if it had never been challenged:
My repeated nudges to get you to try and revisit the point were simply passed over:
The Newman argument: structure fails to fix reference
This point, if true, is devastating to your position; you never even acknowledge it. I have been mostly content with playing defense in this thread—simply showing why your arguments don’t suffice to claim understanding in ChatGPT. But of course, the stronger side of my position is that I have a positive argument that demonstrates that there is no understanding in ChatGPT; so basically, I need not go to the trouble of addressing your points at all, but do so for the sake of even debate. You, on the other hand, simply refuse to acknowledge the case for the position you oppose, and exclusively try and shore up your view.
The lookup table argument: a system without understanding that’s capable of passing any test for understanding
Against this, you pointed to a paper arguing for the conclusion that any mental states associated with a simulation of a cognitive agent whose performance equals that of the lookup table should also be associated with the lookup table itself:
(Note also that this again begs the question against the ‘randomness generation’-argument: it is, in general, possible that there are capacities ‘detected’ by a behavioral test the system doesn’t really possess.)
I noted that the problem with the paper is that there’s simply no reason to associate any mental states with an entity that’s capable of holding a conversation only for a sharply limited amount of time, as any being with genuine mental states (barring any interventions) can extend their conversation indefinitely:
You completely ignore this, simply restating your (assumed) conclusion:
And then go on to just claim that your argument has been successful:
Even my calling you out on that didn’t produce any reconsideration:
I’ve also tried to make the positive case for there not being mental states associated with a lookup table:
But this once again seems to simply pass you by.
If humans are the same sort of thing as ChatGPT, a behavioral test need not tell us anything but our own limitations
Anyway. I’m going to stop this here, because it’s long past the point where it’s just tedious. And yes, I have probably missed something here or there, but I think the overall thrust is clear. I don’t generally expect my interlocutors to reply to each and every point I make; after all, I don’t, either. But I think to just completely ignore arguments made against one’s position and then continue as if they’ve never been brought forward is difficult to square with a good-faith assumption.
Even so, I normally wouldn’t have bothered with this. Whether people choose to debate on level ground is their business. I just want to test my arguments and refine my points, and ideally, would expect the same of others. Ideally, of course, one would expect people to acknowledge points made against their views, maybe even change them, should they find that they can’t answer those points. But none of us are ideal reasoners, so I just chalk it up to people being people. I certainly don’t conform to this ideal myself. If I did, I probably could’ve just let this go.
I think you raise some good points here. Without getting into the business of re-litigating all the previous arguments right now, I can see and appreciate your frustration. In light of the fact that you were already frustrated by the perception (in many cases, a correct one) that some of your important arguments weren’t being addressed, my out-of-the blue comparison with climate denialism was not just inappropriate, but spectacularly ill-timed and ill-advised. It served no useful purpose, and if I could withdraw it, I would. Once again, please accept my apology.
One thing I will say in my defense is that it isn’t fair and isn’t true to say that I was arguing in bad faith, even if it appears that way to you. That’s basically saying that I’m just trolling and trying to aggravate you for fun. That couldn’t be further from the truth. The actual reality is that there are some issues related to things like computational intelligence that I have deep and strong feelings about, and sometimes that can make me annoying to argue with on those topics. As you say, none of us are perfect. I’ve enjoyed and appreciated our many conversations and learned from many of them, and appreciated your patience and academic knowledge. I hope we can continue to have them.
I mean, how’s that supposed to work, though. You state your position, I point out what I think is a flaw with it, and you continue sticking to it, because you’ve got deep and strong feelings about it. Slice it any way you want, that isn’t arguing in good faith—you’ve just graciously given yourself a free pass over it.
Add to that the fact that, in what’s ostensibly framed as an apology, you identify ‘my frustrations’ as the ultimate root cause of your climate-change-denier comparison being ‘ill-timed’, when the reason I took umbrage at it is simply that it’s a cheap rhetorical ploy, and I’m just not seeing it.
I’ve looked over your objections here and I’ll try to briefly address them, but it does seem to me that, for the most part (not entirely), your claim that I’ve just “ignored” your arguments is more a reflection of the fact that you don’t accept my responses, not that I haven’t made them, except for bits and pieces here and there that I may have missed. You’re certainly entitled to claim that you have the better argument and that my responses haven’t persuaded you, but not to claim that I’m just blatantly ignoring your arguments.
This exercise will not take us any further past this apparent impasse, but it will at least show that I’ve made good-faith efforts to address your arguments, except for whatever details I may have missed. With that purpose in mind, I’ll try to be brief.
There’s a lot there, but your primary objection seems to be that Kolmogorov randomness is “exactly the appropriate notion” and that’s “basically what’s being talked about when a mathematician uses the word ‘randomness’ without any modifier”. Actually I did get caught up on other stuff and you’re right that I didn’t return to this particular bit. Let me return to it now.
ISTM that Kolmogorov randomness is not a description of the characteristics of a random series (the distribution). There are many functional definitions of randomness; Wolfram MathWorld, for instance, offers this definition:
A random number is a number chosen as if by chance from some specified distribution such that selection of a large set of these numbers reproduces the underlying distribution … When used without qualification, the word “random” usually means “random with a uniform distribution.” Random Number -- from Wolfram MathWorld
If we agree that some source of number series generation is genuinely random (e.g., from radioactive decay) – which is one practical definition – then the distribution of any finite sequence from such a source could literally be anything. It therefore follows that a similar finite pseudorandom sequence produced by a computer is indistinguishable from it, and as you say, could pass any conceivable test for randomness.
That pertains to a sentence right at the end of a much longer argument, and yes, I missed it. You’re going to have to explain, if you’re so inclined, what “terms of the language match[ing] up to any things in the world” means. Normally I would take that to mean knowledge of semantics, something that Eliza (for instance) obviously did not have, but that ChatGPT clearly does.
Again, lots to unpack there, but the paper I cited was only there as background information. The essence of my response that I think I stated pretty clearly was that a hypothetical lookup table so incredibly and impossibly large that it can incorporate the appropriate responses to any arbitrary sequence of inputs is indistinguishable from any stateful intelligent automaton that behaves in the same way. The claim that such a hypothetical lookup table, which is capable of passing any test for understanding, in fact possesses none, seems like classic question-begging.
As for “The lookup table argument I think has been shown to be a red herring”, it would have been more accurate and more politic to say “I believe it to be a red herring”.
I absolutely did respond to that, by invoking the Turing test scenario, and stating that there is an obvious implication there that the evaluator in the “imitation game” is necessarily and implicitly assumed to be capable of making competent judgments about the behaviours of the human vs the machine. Perhaps I’m missing some fundamental metaphysical point, but I see no value in the argument that behavioral tests for understanding and intelligence are inadequate because the observer may not be competent to make those judgments. We make those judgments all the time.
I know you’re not going to be happy with these responses, but again, my point in re-iterating these arguments, and in taking the time to do it, is to show that I’ve been trying to debate in good faith, and that except for things I may have missed or may not have understood, I’ve made an effort to respond to all your arguments.
Hmmm… I was looking at the fish question (post #893) just now to once again admire ChatGPT’s handiwork. I was taking these questions out of IQ tests and throwing them at ChatGPT without necessarily looking at them very closely. I just realized that this one is something in the nature of a trick question in that it’s actually much simpler than it appears to be. ChatGPT got the right answer, but it took the long way around. There’s no need to set up a system of equations.
The thing that a perceptive individual would recognize about this question is that each of the five comparative statements in the series immediately rules out one of the fish. The first one rules out option A. The second rules out B. The third rules out D. The fourth rules out E. Bingo! One can immediately see that the lightest fish must be C. The fifth statement is redundant and is probably just there for obfuscation.
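Just to make the shortcut explicit, here’s a quick sketch of that elimination logic (the actual five statements from post #893 aren’t reproduced here, so the pairs below are made-up stand-ins; only the pattern matters):

```python
# Elimination shortcut for the fish question: every statement of the form
# "X is heavier than Y" immediately rules X out as the lightest fish.
# The (heavier, lighter) pairs below are placeholders, not the original wording.
statements = [("A", "C"), ("B", "C"), ("D", "A"), ("E", "B"), ("D", "E")]

candidates = {"A", "B", "C", "D", "E"}
for heavier, _ in statements:
    candidates.discard(heavier)  # a fish known to outweigh another can't be lightest

print(candidates)  # {'C'} -- no system of equations needed
```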
The key to doing well on an IQ test is not just getting the right answers, but getting them quickly on the simple questions so you have time to spend on the harder ones and still finish the test on time.
This is completely beside the point for the argument, but just for completeness’ sake:
Since its inception, Martin-Löf randomness has been shown to admit many equivalent characterizations—in terms of compression, randomness tests, and gambling—that bear little outward resemblance to the original definition, but each of which satisfy our intuitive notion of properties that random sequences ought to have: random sequences should be incompressible, they should pass statistical tests for randomness, and it should be difficult to make money betting on them. The existence of these multiple definitions of Martin-Löf randomness, and the stability of these definitions under different models of computation, give evidence that Martin-Löf randomness is a fundamental property of mathematics and not an accident of Martin-Löf’s particular model. The thesis that the definition of Martin-Löf randomness “correctly” captures the intuitive notion of randomness has been called the Martin-Löf–Chaitin Thesis; it is somewhat similar to the Church–Turing thesis.
This is why I consider it what mathematicians typically have in mind when talking about ‘randomness’. At any rate, as already noted, it’s beside the point: whether or not it’s the ‘right’ definition of randomness, the fact is that computers can’t produce it, yet a behavioral test may incorrectly judge them to do so. Hence: there are capabilities a behavioral test might indicate a computer to have, which it actually doesn’t. Nevertheless—and that’s really the important part—we can conclusively determine that computers don’t have that capability: we’re not limited to behavioral assessments regarding the capabilities of computers.
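To make that concrete with a toy example of my own (nothing ChatGPT-specific, just an illustration of the principle): a few lines of deterministic code can produce output that sails through a simple behavioral test for randomness, even though inspecting the generator shows the sequence is compressible to that short program plus a seed, and hence not random in the algorithmic sense.

```python
# A deterministic linear congruential generator (constants from Numerical Recipes).
# Its output bits look statistically unremarkable to a crude frequency test,
# yet the whole sequence is fixed by these few lines plus a 32-bit seed --
# i.e., it is highly compressible, so not random in the Kolmogorov sense.

def lcg_bits(seed: int, n: int):
    state = seed
    for _ in range(n):
        state = (1664525 * state + 1013904223) % 2**32
        yield (state >> 31) & 1  # top bit of the state

bits = list(lcg_bits(seed=42, n=100_000))
print(f"fraction of ones: {sum(bits) / len(bits):.4f}")  # close to 0.5: 'passes' the test

# The behavioral test reports nothing amiss, but looking inside the generator
# settles the question the test can't: the output isn't random.
```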
Again, whether ChatGPT has any access to the semantic properties of words is exactly what’s at issue; the Newman argument, if correct, shows it doesn’t. I’ve explicated it several times in this thread, and won’t do so again just now, but I might perhaps start a separate thread on the issue.
But I’ve given arguments for that conclusion which—again—you summarily ignore. First, the paper you’re citing yields the conclusion that the lookup table should be associated with the same mental states as its optimized version. Provided that’s true, we have our answer: we have no reason to associate mental states with the optimized program, since it can keep up with a conversation for only a finite, pre-delimited amount of time, unlike any system with mental states, which can prolong it indefinitely. So, accepting that conclusion, we should not attribute mental states to the lookup table.
Second, the act of looking up a string in a lookup table is one that is entirely without any recourse to the string’s semantic properties; all that matters is the ‘shape’ of the string, its formal properties. There is no reason at all to associate any understanding with a system that just enumerates a set of keys, comparing their shape to that of an input string.
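A minimal sketch of what I mean by matching purely on ‘shape’ (obviously a toy, nothing like the astronomically large table the thought experiment imagines):

```python
# A toy lookup-table "conversation partner": replies are selected solely by the
# formal identity of the input string. The meanings of the words never enter
# into the process at any point.
lookup_table = {
    "Hello": "Hi there! How can I help you today?",
    "Do dogs have fur?": "Yes, dogs are furry, four-legged mammals.",
    "狗有毛吗？": "是的，狗是毛茸茸的四足哺乳动物。",
}

def reply(utterance: str) -> str:
    # The only operation performed is comparing the character sequence of the
    # input against stored keys -- its 'shape', not its semantic properties.
    return lookup_table.get(utterance, "I'm not sure what you mean.")

print(reply("Do dogs have fur?"))
print(reply("狗有毛吗？"))  # "answers" in Chinese without any grasp of Chinese
```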
That isn’t the argument, though. Rather, I’m arguing for the following dichotomy: either we can’t faithfully assess whether ChatGPT possesses understanding, because we could be just another chatbot, unable to see beyond our own limitations; or we’re not like a chatbot, because we can identify such a difference in understanding, but then we know the behavioral test is insufficient. In both cases, we shouldn’t rely on the test to assess understanding.
“Traditional IQ tests” cover an extremely broad variety of question types, some of which ChatGPT is completely incapable of answering, since a lot of them involve things like looking at pictures, drawing pictures, or both (other AIs can do those things, but not ChatGPT specifically).
Note that what most folks think are IQ tests aren’t very representative of real IQ tests, because there are a lot of hucksters out there selling things that claim to be IQ tests but which are completely worthless. If you’ve found an “IQ test” online, it’s almost certainly fake.
As for the amusing mistakes the AI made with the “pound of feathers vs. two pounds of lead” question, I have in fact seen humans make exactly that sort of mistake. In fact, the human tendency to make that sort of mistake is the foundation of a number of other popular puzzles. For instance, “If a hen and a half can lay an egg and a half in a day and a half, how many days does it take six hens to lay six eggs?”: The common answer, six days, is based on disregarding the numbers as numbers and just fitting the question into a familiar pattern. Which is exactly the same thing that ChatGPT did: It fit the question into the familiar pattern of the puzzle (which it has seen many, many times) of the pound of feathers and pound of lead.
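For anyone who wants the arithmetic spelled out (just a worked check, not anything from the original posts):

```python
# Hen-and-a-half riddle, worked out rather than pattern-matched.
rate = 1.5 / (1.5 * 1.5)   # eggs per hen per day = 2/3
days = 6 / (6 * rate)      # days for six hens to lay six eggs
print(days)                # 1.5 -- not the reflexive answer of six days
```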
I thought you were going in a bit different direction here. I remember little riddles like that as a kid, where the questions were things like, “If a plane crashes on the border of two countries, where do they bury the survivors”, or “If a rooster lays an egg on the top of a house, which side does the egg roll down?”
Why do you think this matters? Suppose I allow for the sake of argument the idea that physical randomness is possible at all (it probably isn’t). So what if one machine produces truly random numbers and another one produces pseudorandom numbers that pass any conceivable external test?
I think I can see now the pattern of this entire discussion. You claim–without evidence–that there are certain underlying attributes to things, such as “randomness” and “understanding”. You show via various arguments that these attributes are undetectable externally.
But these arguments undermine any argument for why these attributes are relevant in any capacity, since if they are undetectable, they cannot have any external effect at all.
What other fairy dust attributes can you dream up? You know, I haven’t mentioned it before, but all of my arguments are imbued with an indelible sense of rightness. You can’t sense this rightness externally, or even perceive it, but nevertheless I have it and you don’t. Sorry, but this means your arguments are automatically wrong in comparison.
Because it’s a ready counterexample to @wolfpup’s claim that to assess the capabilities of a system, we don’t need to look under the hood, and to @Sam_Stone’s claim that we can’t propose any a priori limitations for LLMs because of “emergence”.
Beside the point for this discussion, but there’s a certain amount of irony in coming out with something like that out of the blue in a post where you’re accusing me of claiming things without evidence. If quantum mechanics is right, genuine randomness absolutely is produced in physics; indeed, either that, or there will be correlations that allow superluminal signaling (see here or here).
Again, while I don’t know about you, I perfectly well know that I understand things. Take the sentence “dogs are furry, four-legged mammals”. I know what it means: that is, I know what the symbols used in the sentence refer to. Using that knowledge, I can evaluate its valence: it is, in fact, a true sentence, because the things it refers to out there in the world—dogs—are, in fact, covered with what the word ‘fur’ refers to, and have things denoted by the word ‘legs’ in a quantity denoted by the word ‘four’, and belong to the group of things picked out by the word ‘mammals’.
Now take ‘狗是毛茸茸的四足哺乳动物’. I have no idea what that refers to. I mean, I do: if Google is right, it’s just the Chinese translation of the previous sentence. But I can’t do anything with it that would require knowing anything about the referents of the symbols used there. I can’t evaluate its truth or falsity. I could, however, look it up in a database and output whatever entry it maps to; so I could, conceivably, act as if I understood it. But that doesn’t mean I do.
On the contrary, there are huge ramifications. For one, consider the ethical dimension: if ChatGPT were an understanding, or even thinking, sentient or conscious being, then what we’re doing to it would be monstrous, akin to the exhibition of children from ‘primitive’ tribes in a zoo for our amusement. Hence, on that point alone, I’d argue we need a level of scrutiny that goes beyond ‘well, sho’ looks like it, don’cha think’ when assessing the capabilities of artificial systems.
Consider the case of Blake Lemoine: he was soundly ridiculed, and of course let go from his position, for claiming Google’s LaMDA had attained sentience. But he was just doing what you advocate: go by the behavioral evidence. If that’s all we get to go by, then his mistake, if he even made any, was merely that his tests weren’t stringent enough—not that he administered them to a system functionally incapable of what he claimed it could do. But this is clearly a sliding boundary: what tests do we consider sufficient?
And along with that, there’s a number of practical considerations. Take Cecil’s latest column: a machine that understands is quite possibly a different order of threat than one that simply parrots statistical word-patterns.
Or, take virtually any problem of communication at all: if it’s true that LLMs can genuinely learn a language just from text input, then we have, effectively, a Star Trek-style universal translator. Upon meeting some alien culture, we each only need to exchange a large enough amount of text (or whatever other data), and will instantly be able to communicate. The plot of half of all first contact-type stories just evaporates, right then and there.
Even more (although this may be more of a personal concern), if something like LLMs produced understanding, then there would be huge philosophical implications. For one, Newman would’ve been wrong in his answer to Russell, and I could get rid of quite a lot of complexity in my model of the mind. Not to mention the simple fact of the emergence of the first (to our knowledge) non-human, non-biological entity capable of understanding.
So the reason I advocate for taking a closer look at what LLMs can or can’t do is because the consequences would be awesome, in either sense of the word.
Woke up in the middle of the night, turned on my tablet, and this latest exchange caught my attention, so here’s my 2¢ …
The problem with this argument is that it contains an element of truth that you then use to claim that an AI like ChatGPT possesses no understanding at all. The reality is the following. When you ask ChatGPT a question like how a quantum computer works, it delivers a bunch of text that more or less answers the question. The impression that the AI actually understands this and therefore possesses intelligence at a post-graduate physics level is of course false. But OTOH, in many of its underlying functions, it necessarily must possess genuine understanding. As a recently cited article notes, in tests like Theory of Mind, back in January of 2022 ChatGPT exhibited the intellect of a seven-year-old child. By November of that year, it had improved to the level of a nine-year-old. It’s not nearly as smart as it pretends to be, but as @Sam_Stone said, some of its behaviour forces us to conclude that there must be some level of concept formation going on.
You’re making a huge leap here from understanding and intelligence to sentience and consciousness. The former two decidedly do not imply the latter two. One could well argue, as indeed many do, that machines have already achieved the first two qualities while arguing that machines will never possess the latter two.
No, there’s a really fundamental difference. Understanding and intelligence are measurable behaviours, a fact that I can assert with confidence because we do, in fact, measure them. Whereas sentience (and consciousness) are exceedingly ill-defined abstractions that humans invented to attempt to describe our internal perceptions of our own minds. They are not qualities that can readily be assessed behaviourally, if indeed they are even objectively measurable qualities at all. We may, at some point in the future, have that debate about machines based on more objective definitions and a preponderance of behavioural evidence, but we’re nowhere near there yet.
A machine that understands versus one that “only acts like it understands” both represent exactly the same level of threat and benefit.
But that’s exactly what ChatGPT does. It’s been trained on a number of different languages and can converse in any of them and seems to be able to readily translate between them. I tried it out on some English idiomatic expressions that have long been cited as classically presenting great difficulties to machine translation because of the lack of semantic and contextual understanding. ChatGPT not only translated them flawlessly, it even explained what they meant.
On that note, I think at least some of the disagreements in these discussions stem from the fundamentally different world views that exist in engineering versus philosophy. Perhaps the most salient question here is not so much which side is “right”, but rather, which world view provides the better predictive value for the future of AI.
No matter how much you want to beg that particular question: no, we have no grounds to think so, and to the contrary, good reasons to think otherwise.
That’s not what I’m saying (although I confess I don’t know how to imagine understanding without awareness, but thankfully, I’m not under the illusion that the limits of my imagination constitute the limits of possibility). But if we limit ourselves to behavioral tests, then the assessment of consciousness, intelligence, and so on will likewise be performed on a ‘sho’ looks like it’ basis. And that’s just not good enough.
We measure them on the proviso that there is understanding present in the agents where we measure it. That’s an assumption that simply begs the question where we don’t have reason to make it.
Not necessarily (I mean, not that you bothered to even try and shore up your confident assertion with anything as humdrum as an ‘argument’ or something). A system that does understand when I talk to it about shutting it down could use whatever capacities it has to try to prevent that, while a system that doesn’t understand would have no motive to do so. Yes: in the latter case, we could construct a system that doesn’t understand to act in the same way. But again, differences in understanding can lead to differences in behavior; it’s just that they don’t have to.
Sure: because it’s also been trained on translations between them.
And just out of curiosity, I take it you’ll simply be ignoring me calling you out on ignoring my points in the post you made specifically to show how you’re not ignoring my points?
The point that @Sam_Stone was making was in reference to ChatGPT’s demonstrated problem-solving skills that cannot be accounted for by mere “string matching”. They originate in its neural network and can be tested by “theory of mind” challenges – the results showing that ChatGPT is currently operating at around the intrinsic intellectual level of a nine-year-old. It’s not nearly as smart as it pretends to be because it’s bolstered by an enormous database, but what it’s doing is demonstrably more than rote data retrieval, no matter how much you want to deny it.
You’re running “intelligence, understanding, sentience, and consciousness” together as if they were one monolithic thing. That’s just prima facie absurd – there are at least plausible arguments to be made for machine intelligence and demonstrated understanding, whereas no one has objectively defined what “sentience” or “consciousness” even means, let alone tried to test for it.
No. We measure intelligence the same objective way we measure temperature. The test makes no a priori assumptions about the entity being measured. If this were not true, the Turing test would be meaningless.
What on earth would lead you to that conclusion? Why would you think that a machine that “genuinely understands” a question would suddenly develop animal-like biological preservation instincts? We already have sufficiently advanced AI that we know that many of its behaviours are often not at all human-like, while nevertheless exhibiting human-like intelligence.
It has? It’s been trained on a vast corpus of different languages. I’m not sure what it even means to be “trained on translations”.
It was (and remains) my sincere wish that these discussions remain interesting and civil. If I ever appear to “ignore” any of your points, it’s either because (a) I missed them, (b) my time is finite, or (c) we seem to be going around in circles.
And speaking of ignoring, I’ve seen no response to the item below. Not that I’m going to get all worked up over that.
And I haven’t claimed it’s doing rote data retrieval (besides, I thought that rote data retrieval, by your lights, does likewise suffice for mental states?). It’s predicting the most likely following token through its knowledge of relative frequencies of tokens in a large corpus of texts. Whether that suffices for any understanding—any at all—is exactly the question at issue.
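To be clear about what I mean by that (a deliberately crude caricature: a literal bigram frequency table, which is nothing like the learned neural representations an actual LLM uses, but it shows the ‘most likely next token’ idea in its barest form):

```python
from collections import Counter, defaultdict

# Toy next-token predictor: tally how often each token follows each other token
# in a tiny "corpus", then predict the most frequent successor. An actual LLM
# learns a neural model over vast data rather than storing literal counts.
corpus = "the cat sat on the mat and the cat slept".split()

successors = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    successors[current][nxt] += 1

def predict_next(token: str) -> str:
    counts = successors[token]
    return counts.most_common(1)[0][0] if counts else "<unknown>"

print(predict_next("the"))  # 'cat' -- it follows 'the' more often than 'mat' does
```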
The very part you quote (well, the part you’ve left out) clarifies that I consider them separate things. But they’re alike in that if we limit ourselves to behavioral testing, we’ll have no handle on whether they’re actually present in machines.
That’s controversial, but even if that’s true, whether understanding is the same sort of thing is exactly the question under discussion.
I’m not saying it would, but that it could. That is, it could (trivially) react to its understanding of a sentence in ways in which something without understanding might not.
Well, its datasets consist of large portions of the web, of wikipedia, and so on, which do contain translated versions of the same text.
Lotta words for ‘yes’.
That’s because I didn’t respond to it. The point isn’t to respond to anything and everything somebody posts, but to engage in honest debate: i.e. engage with the arguments made, and not just pretend that one’s opinion stands uncontested.
Upon meeting some alien culture, we each only need to exchange a large enough amount of text (or whatever other data), and will instantly be able to communicate. The plot of half of all first contact-type stories just evaporates, right then and there.
Well sure, assuming we can “instantly” exchange a “large enough amount of text”, including text already available in both languages, which is what an LLM needs. That seems unlikely to be available at first contact.
Oh, I completely agree. But it’s the contention of several people in this thread that an LLM could just ‘learn’ a language from nothing but text in that language, and thus, presumably, translate between different languages in this way. I don’t think that’s possible.