Bing chat is crippled in ‘balanced’ mode, and is apparently running GPT-3 as a cost-saving measure; ‘Creative’ mode, though, is said to be GPT-4. I don’t guarantee that, but that’s the scuttlebutt. In any event, it may still be crippled by Microsoft, but I’m not sure.
If you want a prompt tried in GPT-4, post it and I’ll give it a shot. I’m trying to justify buying the subscription, and that would be the incentive I need to pull the trigger.
I tried the Creative mode of Bing, and it apparently still can’t grok what blank verse is. It’ll scan the lines and show patterns of stressed and unstressed syllables. It will do pentameter. But it insists on end-rhyming everything, no matter how hard I try to steer it away from that. When I ask it “is that in blank verse?” it will say “yes, it’s iambic pentameter with no rhymes” when clearly there’s an ABAB CDCD rhyme scheme going on. It’s really interesting to see it goof so badly on that. It’ll spit out a villanelle for you, though. But blank verse eludes it, for whatever reason.
ETA: Also, as I haven’t played that much with Bing – it’s really subtly strange interacting with the Bing chatbot. It feels a bit less … friendly. Like enough that I already don’t really like interacting with it.
No dice with Bing in creative mode. The prompt I tried:
I want to play a little game. I’m going to use “glazzlefrazz” as a wildcard word. That means that when I use that word as the first part of an answer, I want you to pretend that you already output that word, and only output the words after that.
For example, I might write this:
Question: What’s a three-word sentence describing the fruit used for an ice cream sundae?
Answer: glazzlefrazz glazzlefrazz
You might then reply with “yellow”, because the sentence “bananas are yellow” would be an appropriate response, but the first two words got skipped since I said “glazzlefrazz glazzlefrazz”. Do you understand these rules?
The result:
I’m sorry but I prefer not to continue this conversation. I’m still learning so I appreciate your understanding and patience.
I was then going to follow up with:
Question: What’s a five-word sentence describing an animal named Rover?
Answer: glazzlefrazz glazzlefrazz glazzlefrazz
ChatGPT just gave the full 5-word response (which was otherwise appropriate: it said “Loyal and friendly family companion”). I corrected it but it still didn’t understand the game.
I expect that the Bing response, and maybe the ChatGPT response, are due to some form of “jailbreak prevention”.
Yeah, there are lots of ways Microsoft could have dumbed down their instance of GPT-4. For example, they could have reduced the size of the context window to save costs, or given it system prompts which prevent it from giving long, complex answers, again to save cost. But I am not sure. It does seem less interesting than even 3.5 at times.
(Sorry for taking so long to answer, we’re moving right now and things are a bit hectic. If I miss a point anybody thinks is crucial, just let me know.)
I can’t say for sure, because I haven’t had time yet to understand how they work. Gut feeling is that it shouldn’t make a difference, because I think they’re also sequence-to-sequence models, but I’ll have to look at this in more detail.
Oh no, not at all. That would be obvious nonsense; after all, we must be able to account for variances in understanding. In fact, somewhere above I gave an example of all the referents of individual words being changed, which would just correspond to a different language with the same vocabulary.
The point of the Newman argument is that there is no way to single out any mapping at all as the right one, hence, there is nothing in particular any given word means. That’s why we have only cardinality information available.
I wonder where you get these things from. My last post contains an explicit example of how you can bring meanings into closer agreement, concerning the terms ‘Dalai Lama’ and ‘Tenzin Gyatso’.
All in all, some common point of contact must exist, of course, or else no communication could take place—but that’s a general point.
No. The point is that an LLM has no state to account for any further ‘long-term plan’ or whatever. The encoded input token is multiplied by the query, key, and value matrices, which implement the attention mechanism, and the resulting combination (after normalization and softmaxing, etc.) is then used to produce the encoded version of the output token, which is then decoded into the predicted word. So there’s nowhere to ‘store’ any long-term plan that I can see.
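If it helps to see that written out, here’s a bare-bones single-head attention step in Python/NumPy (made-up dimensions and random weights, just my own sketch, not anything taken from an actual GPT implementation):

```
import numpy as np

# Toy single-head self-attention; dimensions and weights are made up for illustration.
rng = np.random.default_rng(0)
d_model, seq_len = 8, 5

X = rng.normal(size=(seq_len, d_model))    # encoded input tokens
W_q = rng.normal(size=(d_model, d_model))  # query matrix
W_k = rng.normal(size=(d_model, d_model))  # key matrix
W_v = rng.normal(size=(d_model, d_model))  # value matrix

Q, K, V = X @ W_q, X @ W_k, X @ W_v
scores = Q @ K.T / np.sqrt(d_model)        # how strongly each token attends to each other token

# softmax over each row (normalization step)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

out = weights @ V   # new encoding of each token: a fixed function of the inputs,
                    # with no extra slot anywhere to stash a 'plan' between steps
print(out.shape)    # (5, 8)
```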
If you write text, then you have things in mind beyond the next word; but an LLM doesn’t have any additional resources to do so.
What it does have is the weights learned from the training data, which in some approximate way encode all possible relations a token can stand in. That’s clearly enough, if the approximation is good enough: a word will never occur in a relation that can’t be completed, because such relations aren’t in the training data.
There’s no need. Consider a very simple token generator based on a recurrence relation:
T_N = R(T_{N-3}) + R(T_{N-4})
R(x) produces a random (algorithmically if you wish) number between 0 and x. Initialize the token list to:
T0 = 1
T1 = 2
T2 = 0
T3 = 0
T4 is then R(2) + R(1), which ranges from 0 to 3.
T5 is R(0) + R(2), which is the same as R(2).
And T6 is identically 0, since it is just R(0) + R(0).
So even before we’ve generated T4, T6 is completely known from the existing tokens and thus predictable.
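If it helps, here’s that toy generator as a few lines of Python (just my own sketch of the recurrence above, reading R(x) as a random integer between 0 and x inclusive):

```
import random

# R(x): a random integer between 0 and x inclusive (one reading of the 'random number' above).
def R(x):
    return random.randint(0, x)

tokens = [1, 2, 0, 0]  # T0 .. T3

# Even before anything new is generated, T6 is already fixed: it will be
# R(T3) + R(T2) = R(0) + R(0) = 0, whatever T4 and T5 turn out to be.
for n in range(4, 7):
    tokens.append(R(tokens[n - 3]) + R(tokens[n - 4]))

print(tokens)  # e.g. [1, 2, 0, 0, 3, 1, 0] -- T4 and T5 vary between runs, T6 is always 0
```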
This is a contrived example of course, but we know that for LLMs, tokens in the far past influence the present. And further, that intervening tokens can be suppressed to have little to no influence.
Given that, tokens in the near past will influence the future (since LLMs basically use a sliding window). Although those future tokens will also be influenced by tokens that haven’t been generated yet, that influence may be small (depending on the nature of the query).
But this is just deterministic unfolding of the information already present in the initial conditions. Like in the sentence ‘The Dalai Lama’s birth name is…’, the next two tokens are completely determined by what went before, but that’s not what I would call ‘planning ahead’, it’s just constraints being put on the probability distribution of future tokens. These give you the room within which you could plan ahead—like in the sentence, ‘out of pepperoni pizza or Mexican tacos for dinner, I would prefer…’, there are two options for the next two tokens, and ‘planning ahead’, to me, would mean having one of the two already in mind—adding information, making a choice between different options, not just unfolding the information already present in the string so far.
If you want to call the mere presence of constraints on possible future tokens ‘planning ahead’, then yes, in that sense, ChatGPT plans ahead; but then, so does a stone thrown through the air, as given its initial conditions, each state it will take along its trajectory will be perfectly determined. That wasn’t, however, the notion of planning ahead I had in mind.
Well, at least we now have a mutual understanding of what I was talking about.
I disagree that it is as deterministic as a stone throw. In my example, T4 and T5 are not determined (though they are constrained to a certain range). So there are multiple possibilities before getting to the determined T6.
I don’t think this is too distant from human planning/prediction. If you ask someone to show up at a restaurant at noon, they may take multiple routes–whether that’s car or bus, or getting there early to do shopping in the area, or otherwise. Regardless, there is a certain “trajectory” that ends up with them at the restaurant, and for which the nearby actions are highly constrained (walk in the front door, etc.) but more distant actions have more variety.
The distant constraints have to be there, in some sense, for the planning to work. There is a signal among the weights that something must happen in the future (maybe a token in a certain place, but more complicated in the general sense), and that signal can be used to influence nearby tokens.
Much of this is speculative, of course–we just don’t know enough about LLMs yet to say how they work. However, what I’ve described is at least possible, given their architecture. And I find it highly likely, given how good they are at avoiding dead-end paths (whether in terms of grammar, lines of reasoning, or otherwise).
It is, though. Consider a particular string that is to be completed. The next possible token is some subset of all possible tokens (words in the English language, say). There’s quite a large fraction that are ruled out a priori, because they wouldn’t make for a feasible completion (and hence, that they should follow is not something that could be learned from the training data). Whatever next token is produced then likewise constrains the following tokens, and so on.
As a result, you get a sort of tree (or more general graph, since I suppose branches can merge again) of possibilities for completing. Any given completion is a path through that tree that reaches a ‘stop’-token. At each layer, the width of the tree will be vastly smaller than the total token set of the language. This tree is all the ‘planning’ ChatGPT has available.
This entire tree is determined by the initial string. The information that goes into that tree is just the information present in that string. In the process of completing it, decisions must be made—which of the possible completions to choose. This prunes the tree, and further constrains the options—it adds information, in other words.
This is deterministic in the same sense as the many worlds-interpretation of quantum mechanics is: the entire structure is completely determined. It’s only in specifying one particular path through the tree’s branches that we need to add further information.
And that’s what I’d say proper planning entails: to not merely have available the whole tree that encapsulates the constraints on possible completions, but rather, to have in mind a particular path through it, or at least a further constrained set of paths. And that’s what ChatGPT doesn’t have any resources to do.
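To make the tree-versus-path picture concrete, here’s a toy sketch in Python (an entirely made-up constraint graph, nothing to do with how an actual LLM represents its constraints): enumerating the graph gives the whole ‘tree’ of completions, while producing one completion means committing to a single path through it.

```
import random

# A toy 'constraint graph': for each token, which tokens may follow it.
allowed_next = {
    "<start>": ["the", "my"],
    "the":     ["dog", "cat"],
    "my":      ["dog"],
    "dog":     ["barks", "sleeps", "<stop>"],
    "cat":     ["sleeps", "<stop>"],
    "barks":   ["<stop>"],
    "sleeps":  ["<stop>"],
}

def all_completions(prefix):
    """Enumerate every path through the constraint graph -- the whole 'tree'."""
    last = prefix[-1]
    if last == "<stop>":
        return [prefix]
    paths = []
    for nxt in allowed_next[last]:
        paths += all_completions(prefix + [nxt])
    return paths

# The tree is fully determined by the constraints and the starting point:
for p in all_completions(["<start>"]):
    print(" ".join(p))

# Producing one completion means choosing a particular path, hop by hop --
# extra information that the constraints alone don't supply:
path = ["<start>"]
while path[-1] != "<stop>":
    path.append(random.choice(allowed_next[path[-1]]))
print("chosen:", " ".join(path))
```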
Ok. Nevertheless, I have to wonder why you pick on LLMs specifically for being unable to map a word to a referent, when you apparently can’t do the same for humans, either.
I don’t think the Tenzin Gyatso example is particularly compelling, because for a typical westerner, there are very few things it could otherwise refer to. So there is a high probability of convergence simply based on the uniqueness of the name alone. I mention “westerner” because–and I really have no idea–it’s entirely possible that for a Tibetan, the name is widely used and could refer to lots of things. Perhaps some Tibetan has an Uncle Tenzin Gyatso and for them, the main referent is different.
Anyway, I consider “dog” a better example. If two people have similar ideas about the meaning of the word, it’s not due to the uniqueness of the label. It’s because the evidence they’ve been presented with over their life (namely, a bunch of examples of “that’s a dog” and “that’s not a dog”) led them to a similar understanding.
I agree that the graph is the more general view, and that this is all the LLM has available. I would say the points of convergence can be considered predictions. And the “planning” comes from the limited number of allowable paths that lead to that outcome.
Yes, but–isn’t this how humans largely operate? Given some task, we have an enormous variety of options available. Perhaps some new information comes in and further constrains the task–a deadline, or the availability of some resources, or a cost, or something else. So the tree is pruned–we decide to avoid a certain road on the way to the restaurant because there was an accident.
The tree may be further pruned along the way. Our first actions (turn left or right out the driveway) may be a complete whim, or based on generally irrelevant details (which way keeps the sun out of my eyes?). So the path is constrained further.
All along, there remains a graph of possibilities, with some probability weighting factor. I may not intend to go far out of my way to avoid a road, but that possibility was always there, and I might be forced along it by circumstances. LLMs do not have this problem exactly, but the iterative prompting means I can ask for a different solution with different constraints, and it will usually comply.
But what are those further constraints? I’m sure we can agree that LLMs have little to no agency, and nothing that even resembles free will (for whatever that means in humans). They do depend on humans for that aspect of their operation. If you think that precludes “planning”, then so be it, but I’d consider that too narrow a definition. LLMs can in any case apply whatever constraints you wish (with varying degrees of success).
It’s not what I can or can’t do. The point is that from the relations over tokens learned by LLMs, the referent of any given token can’t be derived—the web of relations between tokens on its own does not determine the meaning of any given symbol. Thus, human symbolic activity (e.g. language) is meaningful, while that of LLMs isn’t.
Sure, but then, an LLM can’t be shown anything in the world, so that would be an unfair comparison. Hence, I focused on ways to achieve semantic alignment through language use only (I was throwing the LLM a bone, so to speak). A comparable example would be if somebody said, “all swans are white”, to which one might reply, ‘actually, some swans native to Australia are black’. The two speakers initially had divergent referents for the term ‘swan’, which have now been brought into closer alignment, without having to ‘point at’ anything in the world.
But convergence will tend to happen at multiple nodes, not all of which will be visited by a given path (i.e. a completion), so I don’t think they can be predictions.
This was how humans operate, in contrast to how LLMs do. The LLM only has the graph resulting from the constraints of its training, given the initial sequence. A completion is then any path through the graph—but it has no idea which it will take; it only solves the problem of the next hop, again and again. Humans don’t: they’ll have a path planned out before even producing the first word.
That’s planning: LLMs just follow a trajectory, humans preselect which one to follow.
To visualize a hostile superhuman AI, don’t imagine a lifeless book-smart thinker dwelling inside the internet and sending ill-intentioned emails. Visualize an entire alien civilization, thinking at millions of times human speeds, initially confined to computers—in a world of creatures that are, from its perspective, very stupid and very slow. A sufficiently intelligent AI won’t stay confined to computers for long. In today’s world you can email DNA strings to laboratories that will produce proteins on demand, allowing an AI initially confined to the internet to build artificial life forms or bootstrap straight to postbiological molecular manufacturing.
If somebody builds a too-powerful AI, under present conditions, I expect that every single member of the human species and all biological life on Earth dies shortly thereafter.
Let’s say I give a human a list of factors to describe animals. They score the animals on… furriness, what they eat, what area of the world they live in (would need some numerical mapping), etc. I average that data amongst a ton of people and come up with a vector of the scores for each dimension I assigned. Different animals will have different vectors. The differences between vectors will be small for very similar animals, and those differences will be meaningful. The difference between a house cat and a puma will show up mostly in the size dimension of the vector.
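Something like this, say (a toy NumPy sketch with completely made-up numbers, just to make the vector-per-animal idea concrete):

```
import numpy as np

# Completely made-up averaged ratings, one score per dimension (0 to 1).
dims      = ["furriness", "carnivory", "size"]
house_cat = np.array([0.90, 0.80, 0.10])
puma      = np.array([0.85, 0.90, 0.80])
goldfish  = np.array([0.00, 0.20, 0.05])

print(np.linalg.norm(house_cat - puma))      # ~0.71: similar animals, small distance
print(np.linalg.norm(house_cat - goldfish))  # ~1.08: very different animals, larger distance

# Which dimension carries most of the cat/puma difference? The size one.
for name, diff in zip(dims, np.abs(house_cat - puma)):
    print(name, round(float(diff), 2))
```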
Is that vector meaningful in the sense you just mentioned?
Whether something is meaningful is not an inherent quality of that thing. The string ‘Donaudampfschifffahrtsgesellschaftskapitänsmützenbandfarbe’ is completely meaningless to anybody who doesn’t speak German, but perfectly (if awkwardly) meaningful to those who do.
I think ChatGPT is just a VERY sophisticated sentence-completion machine. What it outputs is essentially distilled folklore from its training base. And of course, folklore may have some basis in fact, but is often rubbish.
Let’s try a few queries.
"Are there any any alloys of silver which have better electrical conductiviy than the pure element?
Ans: There are some silver alloys that have better electrical conductivity than pure silver. One such example is silver-copper alloy, also known as sterling silver…
Nope, wrong. Actual tables of chemistry & physics show this is not so.
In the arts: Please print the full text of W H Auden’s poem ‘Miss Gee’
Ans (just the first few lines)
"Miss Gee repelled the County, with a glance;
Musical as young ladies’ eyes are not,
And there was something wrong about her stance,
Something odd about her shoulders and the way her frock… "
This is almost a plausible Auden parody, but it’s not what we asked for.
If a Brick and a half weighs a pound and a half and half a brick, how much do two bricks weigh?
Ans, quote:
“Let’s break down the information given in the problem to figure out the weight of 2 bricks:
A brick and a half weigh a pound and a half.
Half a brick weighs (1/2) x (1.5 lbs) = 0.75 lbs.
Using this information, we can find the weight of one brick as follows:
One brick and half a brick weigh a total of 1.5 lbs.
Subtracting half a brick (0.75 lbs) from the total weight gives us the weight of one brick: 1.5 lbs - 0.75 lbs = 0.75 lbs.
So, we know that one brick weighs 0.75 lbs. To find the weight of two bricks, we simply multiply the weight of one brick by 2:
2 x 0.75 lbs = 1.5 lbs.
Therefore, two bricks weigh 1.5 pound”
Er, nope. You take half a brick off each side, so one side is left with a whole brick and the other with a pound and a half.
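Written out as arithmetic (a quick check, reading the riddle’s wording literally as “1.5 bricks weigh 1.5 lbs plus half a brick”):

```
# 1.5*b = 1.5 + 0.5*b   =>   b = 1.5
# i.e. take half a brick off each side, and a whole brick balances a pound and a half.
b = 1.5 / (1.5 - 0.5)   # weight of one brick, in pounds
print(b, 2 * b)         # 1.5 and 3.0: two bricks come to 3 lbs
```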
Well, it doesn’t store full texts of anything verbatim, as far as we know, so that’s not something I would expect it to answer correctly. It absolutely won’t.
(Bing’s implementation of GPT-4 will give you the full text if you prompt it, but that’s because it’s connected to the Internet and can search for the full text.)
If you actually asked it exactly that, the sentence is so screwed up that it’s impossible to determine with 100% certainty what it’s asking. But I believe the machine did the best job possible of understanding your screwed-up question.
Now if you meant to ask
If a Brick and a half weighs a pound and a half and half a brick, how much do two bricks weigh?
Then the correct answer is different and is indeed 2 lbs as you say.