There is certainly a lot of interest right now in using ChatGPT to write code in current computer languages under the control / direction (tutelage?) of current programmers.
The magic of PCs and mostly Lotus 1-2-3 back in the mid 1980s was to move the ability to harness computer power out of the ivory tower of corporate DP and into the hands of the end-user business folks. A lot of truly horrid, error-riddled code came out of that, as well as tremendous good for a great many business users. We all know the arc of that story.
Today a dangerous amount of Corporate & Government America still runs on spreadsheets that Marge or Adam maintain. With all the security, correctness, and scalability problems that necessarily entails.
Interposing ChatGPT effectively between Marge and her metaphorical spreadsheet has the potential to greatly professionalize the results. Call it “DevGPT”. But only if DevGPT can be trained to be professional, competently handle the “ilities” as they say, etc.
The problem, of course, remains that problem and feature specification is a difficult skill that’s hard to learn. A DevGPT in the hands of skilled business analysts would be a power tool for good. A DevGPT placed directly in the hands of Marge & Adam will be a risk to corporate and financial life and limb everywhere.
What we need is a BizAnalGPT between Marge / Adam and DevGPT. Yeah, that’s the ticket! Probably not, but it’s funny (and fun) to think about.
Yeah, most people do not think systematically enough to use an AI efficiently, any more than they can use Excel beyond the very basics. Excel has wonderful capabilities, but most people never learn how to use them effectively.
Then we have the already-reported issues with AIs becoming racist, sexist, etc. due to their “training.” Those AIs will also likely be trained to aggressively seek out other AIs and “train” them as well.
The underlying thing is that ChatGPT uses neural network programming, a form of machine learning. The low-level programming techniques are somewhat complicated to construct, but once in place, they absorb data and incorporate it into their functional processes (the notorious “algorithms”, which are not the same thing as traditional mathematical algorithms).
One of the key aspects to Neural programming is its adaptability. The FaceID system on iPhones is able to recognize your face at various angles, which is quite difficult to do with old-school hard-logic programming but much easier with Neural programming. Self-driving cars can “learn” driving rather than relying on hard coding that has to account for all edge/corner cases.
The things that make LLMs so impressive can extend into other realms of programming. AI can be “taught” what works and what does not and eventually reach the point where the analytical aspect of their work is almost as good as human intelligence, in the regime in which they operate. Over time, the interposition between the user and the code will improve.
Of course, we will still have to be alert for errors, when an AI itself fails to detect edge-case situations, so we still have to have reasonably smart people working with the machines. Which is kind of a problem, as we seem to be continually giving up our own abilities in favor of letting the machines do our work.
The people buying the machines can only justify that expenditure to the degree it produces more product with less labor. In this case the product is software and the labor is software experts.
Less expertise is the inevitable result of applying power tools to a skill or to thinking, rather than to grunt work like trenching.
I did and you did not respond. So, I ran the test myself and GPT failed. I studied the problem and ran the test again using the prompt as a method of programming. GPT passed.
I could have done the same thing with Excel, but the interface would not have been conversational.
That seems like a very strange point to make since a large part of what ChatGPT does is search its repository of training data to guide its responses, and the fact that Microsoft has deployed that exact same LLM technology in the “New Bing” precisely to function as an intelligent search engine. In the same vein, a group of researchers evaluating the performance of GPT-4 on the suite of tests in the US Medical Licensing Exam commented favourably on the potential for its intelligent search capabilities to aid in medical education and clinical decision-making.
Except that it very clearly does solve problems!
It solves logic problems, including problems explicitly designed to test intelligence, as discussed in the long thread in CS.
GPT-4 scored in the 90th percentile on the Uniform Bar Exam.
It aced all sections of the SAT, which among other things tests for reading comprehension and math and logic skills, and it scored far higher across the board than the average human.
It did acceptably well on the GRE (Graduate Record Examinations), particularly the verbal and quantitative sections.
It got almost a perfect score on the USA Biology Olympiad Semifinal Exam, a prestigious national science competition.
It easily passed the Advanced Placement (AP) examinations.
It passed the Wharton MBA exam on operations management, which requires the student to make operational decisions from an analysis of business case studies.
On the US Medical Licensing exam, which medical school graduates take prior to starting their residency, GPT-4’s performance was described as “at or near the passing threshold for all three exams without any specialized training or reinforcement. Additionally, ChatGPT demonstrated a high level of concordance and insight in its explanations.”
It’s just absurd to say that “it is not for solving problems”.
I’m repeating myself because I’ve said this several times before, but this description, based on a simplistic understanding of how it works, is both true and extremely misleading. The misleading part is that it implicitly relies on our intuition to conclude that this is all it can do, which is false, because entirely new properties and behaviours emerge, often unexpectedly, as LLMs grow in scale. GPT-4 is believed to have over one trillion parameters; each parameter represents in essence a learned piece of knowledge that drives its behaviour and, in particular, allows it to make inferences and create entirely new content.
This is just false, as stated above, and as shown by all the examples given of ChatGPT’s actual performance in creating content and solving problems.
Thanks for the link, I actually made it through all 166 pages. This is the first report I’ve read that details proposed applications. ‘Hmmm’ interesting.
It does strengthen the argument that the prompt is tending toward programming:
“we believe that exploring more advanced prompting techniques, such as thinking step-by-step or employing in-context few-shot approaches, could potentially enhance the model’s performance.”
That may well sometimes be true – and indeed some have even tried to formalize theories around it – but I don’t buy it in the general case. In order for something to meaningfully be called “programming”, it must by definition embody in some way a set of progressive incremental steps to achieve a desired result or behaviour. A good example of prompts being used in this way is the AI-assisted creation of images with a system like Midjourney, but even there, I think “programming” is misleading and I’d describe it more as an interactive man-machine collaboration to produce a specific end result; the AI is essentially being used as a very sophisticated paintbrush.
But there’s no “programming” or “prompt engineering” involved in any of the examples I previously cited that test for skills and knowledge – those are all cases where the “prompt” is basically “here’s a problem; solve it” – followed by the problem statement originally designed for human testing. As with the famous quote about a cigar, sometimes a prompt is just a prompt.
As for the reference in your quote to “employing in-context few-shot approaches”, that is entirely different. Unless I’m misunderstanding, it refers to a specific type of machine learning paradigm where the AI receives additional training on smaller samples to enhance its performance in some specific area, which in some implementations could be done by the user him/herself. I suppose this could be characterized as a kind of programming if you want to put supervised learning in that category.
Yes, ‘programming’ implies a set of instructions created by humans to carry out a task in a computer.
AI is not ‘programmed’ this way. Instead, an algorithm iterates over terabytes of data trillions of times under a set of rules, and many complex structures emerge out of that which together appear to replicate much of human ability to reason and communicate.
For example, this is a simple equation: f(x) = x² + c. If you iterate that equation (starting from zero, for each value of the constant c), what pops out is the Mandelbrot set. Another example is Rule 30, one of the simplest cellular automata. Iterate it, and out of a trivial rule you get endlessly complex, effectively random structure (and a close relative, Rule 110, turns out to be capable of general computation).
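Here is a minimal Python sketch of both of those, purely to make the “simple rule, iterated many times” point concrete; the grid sizes, iteration counts, and ASCII rendering are arbitrary illustrative choices, nothing load-bearing:

```python
# Two trivial rules whose complexity comes entirely from iteration.

def in_mandelbrot(c: complex, max_iter: int = 100) -> bool:
    """Iterate z -> z*z + c from z = 0; c is in the set if z stays bounded."""
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2:        # once |z| exceeds 2 it provably escapes to infinity
            return False
    return True

def rule30_step(row):
    """One update of elementary cellular automaton Rule 30 (wrap-around edges)."""
    n = len(row)
    # Rule 30: new cell = left XOR (center OR right)
    return [row[(i - 1) % n] ^ (row[i] | row[(i + 1) % n]) for i in range(n)]

if __name__ == "__main__":
    # Crude ASCII Mandelbrot: '*' marks the points that never escape.
    for y in range(12, -13, -1):
        print("".join("*" if in_mandelbrot(complex(x / 30, y / 12)) else " "
                      for x in range(-63, 21)))

    print()

    # Rule 30 grown from a single live cell: the famous chaotic triangle.
    row = [0] * 41
    row[20] = 1
    for _ in range(20):
        print("".join("#" if cell else "." for cell in row))
        row = rule30_step(row)
```

Neither program contains anything that “knows about” the Mandelbrot set or the chaotic triangle; the structure only shows up when the simple rule is applied over and over, which is the same basic intuition behind emergent abilities in large nets.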
So we have a transformer architecture, which is a program and was written by humans but is quite small. When iterated over data, it adjusts the weights of billions to trillions of parameters, and over time various abilities start to emerge. Those abilities are contained within the virtual ‘neurons’ of the net, and the weights of their connections.
The resulting structure was NOT programmed, and is almost incomprehensible to humans. In fact, one of the hot topics in AI right now is ‘mechanistic interpretability’ of neural nets, to try to figure out how the hell they do what they do. Early results are that they are full of emergent structures, some very similar to what we find in the human brain.
Understand that ‘next word prediction’ or ‘stochastic word prediction’ is just the mechanism by which the AI talks. It is NOT how its brain works. In image AIs the neural net is exactly the same, but it communicates through a diffusion framework. In multi-modal AIs like GPT-4V, one neural net can deal with images, video, audio, and language. Same neural net, different input/output architectures.
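To make that distinction concrete, here’s a toy sketch of the “talking” layer only: a loop that repeatedly samples the next word from whatever probability distribution the underlying model produces. The hard-coded word table below is something I made up purely so the loop runs; in a real LLM there is no such table, the distribution comes out of the neural net and is conditioned on the entire context, not just the last word.

```python
import random

# Made-up next-word probabilities, standing in for the output of a real model.
# In an actual LLM the distribution is recomputed by the network from the
# whole context at every step; this toy table only looks at the last word.
NEXT_WORD_PROBS = {
    "the":  {"cat": 0.5, "dog": 0.5},
    "cat":  {"sat": 0.7, "ran": 0.3},
    "dog":  {"sat": 0.4, "ran": 0.6},
    "sat":  {"down": 1.0},
    "ran":  {"away": 1.0},
    "down": {"<end>": 1.0},
    "away": {"<end>": 1.0},
}

def generate(first_word: str, max_new_words: int = 10) -> str:
    """The output mechanism: sample a word, append it, repeat.
    Everything interesting happens in producing the distribution, not here."""
    words = [first_word]
    for _ in range(max_new_words):
        probs = NEXT_WORD_PROBS.get(words[-1], {"<end>": 1.0})
        choices, weights = zip(*probs.items())
        next_word = random.choices(choices, weights=weights, k=1)[0]
        if next_word == "<end>":
            break
        words.append(next_word)
    return " ".join(words)

print(generate("the"))   # e.g. "the cat sat down"
```

The point is just that the sampling loop is the mouth, not the brain; all the capability lives in whatever produces the probabilities.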
Saying, “All it does is create coherent, meaningful sentences” is missing a huge excluded middle: to actually do that, you need something like human-level thinking. You need to understand relationships between objects, how events affect people’s emotions, how humor works, how a black-and-white CT scan translates into physical structures in the lung, the 3D spatial relationships between things in a 2D image, and pretty much every other kind of reasoning people do when asked similar questions.
In the document I linked, the AI is often just given a photo and asked, “What’s this?” or “Is there a problem here?” The AI is given no direction on how to attack the problem, or even whether there is one. And in the ‘counterfactual’ section, the AI is asked about things that don’t exist in the picture at all, to see if it can reject the prompt when it doesn’t make sense. It passed all those easily.
Well, kinda sorta. One kind of ‘additional training’ would be fine-tuning, in which the generalized AI is then given domain-specific data. For example, a company might buy a generalized AI, then fine-tune it on corporate manuals, practices, etc. Then the thing can answer those kinds of questions. This kind of data is then permanent in that AI instance just like all its other data.
Then there is the AI ‘context window’, which contains information about the current question. You can add data here too, but the context window is generally pretty small. You can also modify prompts here to add more context or to give the AI instructions as to how to solve a problem.
For example, few-shot in-context learning is used to give an AI trained on generic data a specific example, or several examples, of the thing in context. So, for example, a generalized AI might know how to count objects, but in specific cases there may be difficulties because of similar objects in the scene or something abnormal the AI doesn’t understand. By giving it another picture of similar objects and saying “This one has 5 of the objects I care about”, you can raise the performance of the AI in counting similar objects.
This is not ‘programming’, it’s simply providing information in context. It works in humans, too.
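For what it’s worth, here is roughly what that looks like when the prompt is laid out as a chat-style message list. The screw/rivet wording, the counts, and the photo placeholders are all invented for illustration, and no particular vendor’s API call is shown:

```python
# Zero-shot: the model gets only the question.
zero_shot = [
    {"role": "user",
     "content": "How many hex-head screws are in this photo? <photo>"},
]

# Few-shot / in-context: the same question, preceded by a worked example.
# The model's weights don't change at all; it simply has more relevant
# information sitting in its context window when it answers.
few_shot = [
    {"role": "user",
     "content": "This tray holds exactly 5 hex-head screws. The shiny "
                "cylinders next to them are rivets, not screws. <example photo>"},
    {"role": "assistant",
     "content": "Understood: 5 hex-head screws, rivets excluded."},
    {"role": "user",
     "content": "Using the same criteria, how many hex-head screws are in "
                "this new photo? <photo>"},
]
```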
Other ‘prompt engineering’ tricks include telling the AI to approach the problem step-by-step instead of just trying to infer the final answer. The ‘speedometer’ section of the document I linked, starting on page 20, has a good example of zero-shot, few-shot, and other techniques.
Again, this isn’t programming anything. It’s more like a hint or suggestion to the AI.
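A concrete, entirely made-up example of how little “engineering” is involved; the only difference between the two prompts is the request to show intermediate steps:

```python
question = ("A jar holds 3 red and 5 blue marbles. Two more red marbles are "
            "added and one blue marble is removed. How many red marbles are "
            "in the jar?")

# Direct: ask for the answer outright.
direct_prompt = question + " Answer with a single number."

# Step-by-step: same question, but the model is asked to spell out its
# intermediate counts before committing to a final answer.
step_by_step_prompt = (question + " Think it through step by step, stating "
                       "each intermediate count, then give the final number.")
```

One common explanation for why this helps is that laying out intermediate steps gives the model more of its own output to condition on before it has to commit to the final answer, but again, nothing about the model itself has been changed.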
I don’t know who you’re exactly addressing here or what point you’re trying to make. How the broader field of AI is “programmed” is a much bigger and more complicated question than how LLMs, specifically, work and what intrinsic limitations they may or may not have. The question of “programming” here that I was talking about was addressed simply to the narrow context that was raised by @Crane of whether the structure of one or more prompts could plausibly be considered to be “programming”, and under what circumstances. Nothing more. You seem anxious to argue against imaginary points that neither I nor anyone else has made.
Which was basically my whole point.
This is not the same as training, though. Besides being small and transient, the intent of context windows is literally to provide continuity of conversational context, not to alter or improve behaviours by (for example) changing parameter weightings or training on additional information. In specific instances it can be extremely useful as a collaborative paradigm, such as for iteratively guiding the AI generation of images, as I already said.
I’m still trying to figure out who you’re arguing with.
I assume that is because you are either not poor or don’t really understand the definition of “economic growth”? I’m (mostly) not trying to be snarky, but a lot of otherwise smart people don’t really understand what “the economy” or “economics” is and just think of it in some abstract sense of Wall Street bankers or corporations making higher profits.
Economic growth is very important if you happen to live in a region where most of the people live in shacks or mud huts without clean running water.
We are not going to be able to ask some AI “fix these problems for us”, economic or otherwise. Where I do see AI creating a benefit is enabling more complex analysis of things like economic or monetary policy or resource allocation which would hopefully lead to more equitable and sustainable economic growth.
The main danger as I see it is corporations or governments using AI in self-serving manner.
The issue here is that leaving it non-aligned leads to outcomes that only maintain or preserve biases that are not conducive to a fair deal for many.
Here I have to mention that, IMHO, elitists like Elon Musk do fear aligning AIs with what should be the moral thing to do, because it will make it harder for them to discriminate as much as they want to.
That particular failing is not an AI issue. That’s an information quality issue. Someone who knew absolutely nothing about the 2020 election and asked Google whether the election had been stolen would quickly discover the truth from reliable information sources, assuming their Google searches had not already been biased by prior history.
There is, however, indisputably a big gap we still have to bridge to enable AI to deliver reliable information and sound judgments and decision-making. Even systems like IBM’s Watson, when trained as clinical advisors, have been shown to sometimes provide dangerously bad medical advice despite a curated dataset and extensive confidence-ranking capabilities. That’s a much bigger problem.
You act like those are separable issues. IMO they are not.
We have real generalized intelligences running around today. About 350M of them in the USA alone. And about half can’t think straight because they can’t get straight input.
AIs will have the same or worse problem.
Further, the real point of that cite was that the owner of an AI will have an interest in what it learns from where. And in all of history, we see plenty of problems when the landed gentry arrange their tools to improve their lot in life, not the world’s lot in life.
A valid point. I have a tendency to defend the theoretical virtues of AI rather than address their potential societal impact. The OP seems to be addressing both: first the claim that LLMs can’t actually do useful stuff (a technical argument that is plain wrong, as I hope is clear to all by now), and then the question of how AI could fix some of our major problems (with the corollary question of how it might worsen them). In that larger context your point is well taken.
There’s a meta-problem here, though, which is thought to be an even greater issue by some of the alignment researchers: if you align an AI to X, then you inevitably create an AI aligned to not-X embedded within that AI, and it’s very easy to coax that not-X AI to come out. This is sometimes called the Waluigi effect.
Suppose for instance that you can get everyone to agree that we should align AIs to not be racist, and that we have a technique for doing so. But part of understanding what it means to be against racism is understanding what it means to be racist as well. So in a sense, anyone that isn’t racist still has a tiny racist living inside of them so that it can be used as a counterexample of how to behave.
In humans, we have other mechanisms that, at least some of the time, keep our behavior in the not-racist realm and keep the little simulated racist bottled up. But this is more difficult in an AI. And given how easy it is, even in humans, to flip the switch and give the racist free rein while the not-racist stays bottled up, it’s going to be very difficult to prevent the same thing from happening with the AI.
Musk’s current thinking is to align with curiosity, not things like “don’t kill all humans” (since that’s easy to switch to “kill all humans”). He hasn’t been too specific about it, but my read is that it avoids the problem because anti-curiosity is, while not desirable in an AI, at least safe. You get an AI that doesn’t learn and has no drive to change things. It sits in its box and stays there.
A curious AI should–we hope–not want to kill all humans. For the same reason that humans generally don’t want to kill all animals. They’re interesting in lots of different ways. Humans are pretty bad at it, but we do go through some effort to preserve animal life (in spite of all the things we’re doing to harm them, like habitat destruction). So maybe a curious AI would treat humans the same way (or better, since it wouldn’t make the same dumb mistakes we do when it comes to the environment).
I don’t know if this is possible or if it even makes sense to align with “curiosity.” And the Waluigi effect might not actually be real, although given the ease of getting current LLMs to act racist and such, it does seem that something like it is real, at least.
It will matter a lot how the open AIs play out (not OpenAI; I mean actual open models).
We are seeing a taste of how it works: there are lots of models (LLMs and image diffusion models most specifically) that you can download and run on your own hardware. They work pretty well and can be tuned to your own requirements.
There are some problems. They’re difficult to use, so there’s a barrier to entry for the majority of people. The hardware requirements are high even just to run them, and tuning them takes even more horsepower. Creating your own model from scratch is simply impossible for most.
But… this might be enough to keep the corps and governments in check, at least to an extent. A small number of enthusiastic nerds can eventually produce a good product. Linux is the most widely-shipped OS despite still being in the domain of the nerds. Everyone is using it even if they don’t know the first thing about it.
So if the proprietary models get bad enough–if they keep enough functionality away from the public, or if they are too blatantly self-serving, or whatever–then the open models have a chance to step in.
The hardware will get cheaper over time, and I expect that people will be running ChatGPT-level models on their phones in not too many years. So some of the hardware advantage that the closed models have will go away. They’ll still have lots of other advantages, but it’s not 100% in favor of proprietary.