Just FYI, there’s a long thread on ChatGPT you might be interested in. I’ve tried it from the point of view of an AI scientist, testing it and trying to understand its inner workings. It’s pretty cool, if somewhat overhyped.
I showed it to my coworkers yesterday in a team meeting and asked it to do an example of filling out our weekly reports to make them more interesting (the subject being a very boring cost-reduction activity). The site is down right now, but its response made everyone laugh, and then they admitted there were phrases it came up with that they would actually incorporate into a weekly report.
It’s equally terrible at poetry. It tries, that it does, but it’s terrible. That said, I had absolutely zero expectation of it being good at that. Even its meagre efforts surprised me.
It’s great – for me – at brainstorming. I’ve fed it bits and pieces of stuff I wrote 20 years ago that were sitting in a notebook somewhere (I’m sure plenty of us have nuggets of creative writing we’ve just abandoned), and it’s been fantastic: helping flesh out characters, giving numerous possible scenarios for where a story could go, adding fantastical characters I hadn’t even considered, and keeping things cohesive. As long as you interact with it as a buddy you’re bouncing ideas off of, I think it’s nothing short of magic. There have even been times I’ve fed it a small part of a finished story and asked it how one could proceed, and it had a very good idea of my intent and even played with the same themes and ideas I was developing in the part I fed it. It pretty much came to the same ending, though with much plainer and less interesting language and dialogue.
So that’s what I’ve found fun with it. I’ve spent hours just chatting about characters and various plot twists, asking it to try making more absurd suggestions, and it’s been a great catalyst for my imagination. For me, it’s not overhyped in the least. I’m continually astounded and surprised by it.
My wife watches knitting videos on YouTube. In one of them, the host asked ChatGPT to answer several fairly technical questions sent in by viewers. (“How does the material of the needles affect your knitting?”) I was quite impressed by the replies, even if the host pointed out a few mistakes.
If the program can do that on a fairly obscure subject now, imagine what it might do on a more mainstream topic in a few years’ time.
I tried generating rap lyrics on various science topics. They were not good.
For the TL;DR types, Midjourney is an AI art generator.
The quality of its output highly depends on the quality of the input. There are many tricks you can use to improve the output, which people are learning as they go.
For example, I gave it this prompt: “Please write a scene of humorous dialog between a narcissist and a psychopath, who are arguing over what to do with a cat they just rescued.”
Interesting but fairly pedestrian. The way to improve it is to give more details to ChatGPT, which can be done through conversation. So I asked ChatGPT to describe a narcissist, then a psychopath, then I said, “Given what you just told me, write a scene about two people named Nora and Pete arguing over a stray kitten.”
The first conversation is very on-the-nose, using pretty much textbook definitions of narcissists and psychopaths. But by getting ChatGPT to first generate a list of features for both, the input becomes much more complex, resulting in better output.
Next I asked ChatGPT to give me a description of an apartment a typical poor person might live in in a big city. It did so. Then I said, “Okay, now rewrite the argument as a scene in such an apartment. Write it like you are a great writer.”
ChatGPT can handle 8K tokens in its input. That’s roughly 6,000 words, at the usual rule of thumb of about 0.75 English words per token. The easiest way to generate complex inputs is to start a new session, ask questions that cause ChatGPT to generate lots of details, then feed that in with your prompt by telling ChatGPT to write its output after considering everything it’s told you.
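The token-to-word arithmetic above is just a rule of thumb; here’s a minimal sketch of it, assuming the commonly quoted ~0.75 English words per token (real tokenizers such as OpenAI’s tiktoken give exact counts, and the ratio varies by text):

```python
# Rough back-of-the-envelope estimate of how many words fit in a
# token budget, assuming ~0.75 words per token (a rule of thumb,
# not an exact figure for any particular tokenizer).
def estimate_words(token_limit, words_per_token=0.75):
    """Approximate word capacity for a given token limit."""
    return int(token_limit * words_per_token)

print(estimate_words(8192))  # an 8K-token context is ~6,000 words
```

In practice, dense prose, code, and unusual vocabulary all tokenize less efficiently, so treat this as an upper bound rather than a guarantee.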
I was surprised at how judgmental it could be.
I dabbled with it, and was rather impressed, when I fed it reasonable requests (e.g. give a definition of linguistic typology in the form of a play by Samuel Beckett).
But when I asked it to give me a recipe for Mac & Cheese written by a drunk Shakespeare, I got stuck in a loop where it would basically answer variations of “it is unethical to depict real persons as drunk”.
Darn thing has been “too busy” every time I’ve tried to access it?
I couldn’t get on either. My son asked it a question, “Who discovered the [a mathematical construction]?”, and it totally fanned on it. The construction was perhaps a bit obscure, but there was a specific published paper that described it.
Start with, “There is another system.”
There are already about 5 or 6 LLMs out there doing what ChatGPT does.
I just read that it cost about $50 million in computer time to train ChatGPT. That’s a lot of money for a startup, but not a lot for a large tech company that wants to get into the LLM game. And the cost will come down over time, Moore’s law slowdown notwithstanding.
These things are going to be everywhere. And through APIs we will eventually find an LLM lurking behind a whole bunch of stuff you wouldn’t expect.
OpenAI also announced a paid tier for ChatGPT: $45 per month if you want API access, guaranteed 24/7 availability, and faster output. Otherwise, the free one will remain available but will probably be even harder to get on as they prioritize paid subscribers.
I saw a cool plugin description - it’s a plugin for a word processor or E-mail, where you can right-click on a sentence and ask for five alternatives if you’re not happy with the one you wrote. You can also just right-click wherever you are and choose “please finish this for me,” and ChatGPT will read what you wrote, figure out style and subject, and finish your E-mail or document section for you.
We are going to see stuff like this everywhere. There are a lot of people out there who are very talented in their field but terrible writers due to dyslexia or other issues, and in the digital age this holds them back. For them, ChatGPT will be a great equalizer in E-mail communications, letters, etc.
I’m pretty sure that was a reference to Colossus: The Forbin Project
And of course, even if it’s expensive to train one of these things, it’s cheap to take one that’s already trained and make a bazillion copies of it. It’d pretty much just be the cost of the hard drive space. A quick Google isn’t revealing just how big the trained model is, but the training data was somewhat less than a terabyte, and the trained model is almost certainly smaller than that.
Geez. And what’s even a terabyte these days? (Granted, that’s text.) I don’t know how many terabytes of hard drives I have lying around. At least 40 or 50. (My current main data drive is 4TB, over 50% full, and I just put it in in August, to replace the previously filled 4TB drive).
ChatGPT has about 175 billion parameters. That’s its size. It would be that size whether it read a gigabyte of data or a petabyte. A parameter is just a number in a matrix or a vector. A 32-bit parameter needs four bytes of memory, so 175 billion parameters come to 700 billion bytes, or 700 gigabytes. My understanding is that the entire model has to be in RAM to be efficient. So it’s not going to be running on a PC.
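The memory figure above is just parameter count times bytes per parameter; a quick sketch (the 175B count is the published GPT-3 size, and the precisions listed are common storage choices, not anything specific to ChatGPT’s deployment):

```python
# Back-of-the-envelope weight memory: parameters x bytes per parameter.
def model_memory_gb(n_params, bytes_per_param):
    """Weight storage in (decimal) gigabytes."""
    return n_params * bytes_per_param / 1e9

n = 175e9  # 175 billion parameters
for label, nbytes in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    print(f"{label}: {model_memory_gb(n, nbytes):.0f} GB")
# fp32 gives the 700 GB figure; halving the precision halves the memory
```

Note this counts only the weights; running the model also needs working memory for activations on top of that.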
Smaller models can. Here’s an article describing how to run a 6.7 billion parameter LLM on a PC with 12GB of RAM. This would normally take 54 GB, but the author splits the model to be partly in CPU RAM, partly in GPU RAM, and the rest cached to the hard drive.
I wonder how long it takes to process an input. This stuff is heavily math intensive.
Generally, that’s overkill except during training. 16-bit float formats (either FP16 or BF16) are almost always sufficient for inference, and in many cases 8- or even 4-bit formats are. The latest NVIDIA Hopper chip does 2 petaflops in FP8. Reducing the precision increases both raw performance and the number of weights you can store.
PyTorch has automatic mixed precision (AMP), which allows training in lower precision without gradients underflowing to zero. But you don’t really care about that when just running a model, so yeah, 32-bit is overkill, and it’s very common to “quantize” a model to make it efficient for inference.
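To make the “quantize” step concrete, here’s a minimal sketch of symmetric 8-bit quantization in plain Python: scale each float into the int8 range, store one byte per value, and rescale on the way back out. Real toolchains (e.g. PyTorch’s quantization support) are far more sophisticated, with per-channel scales and calibration; this only shows the basic idea and why it cuts weight memory 4x versus fp32:

```python
# Naive symmetric int8 quantization: one scale for the whole tensor.
def quantize_int8(values):
    """Map floats to integers in [-127, 127] plus a scale factor."""
    scale = max(abs(v) for v in values) / 127 or 1.0
    q = [round(v / scale) for v in values]  # each fits in one byte
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the int8 representation."""
    return [x * scale for x in q]

weights = [0.5, -1.2, 0.03, 0.9]
q, s = quantize_int8(weights)
approx = dequantize(q, s)
# approx is close to weights, but stored in 1 byte per value instead of 4
```

The price is a small rounding error per weight; in practice large models tolerate it well, which is why int8 (and lately 4-bit) inference is so common.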