These are all amazing! The Richard Scarry and Bil Keane ones are a little disturbing, though, being more for children. (Not that there’s anything inherently “adult” about Greek mythology, but you know what I’m saying).
Huh. Just for S&G, I went back to that conversation I’d forked from yours, edited and resent the same exact prompt I’d used, and I got this:
ETA: and got another one, this time not limiting it to the 80’s. It let some non-SF movies in.
ETA2: I asked for more unique ones, specifically sci-fi, and wanted to get back to the same taller aspect ratio as the very first one in that thread (that Darren did), but it apparently couldn’t pull that detail out of its own image and I got these very tall, skinny ones instead:
I like the line “In space no one can scream”. And it looks like Luke Skywalker in Wargames.
But my favorite one is from my first image, where the tagline for Star Trek III: The Search for Spock is “They search for Spock!”.
I think “Scabby Abby” was a real GPK card.
I found a new prompt on Reddit
Generate a realistic photo of a x event in 2006. Some fans are doing cosplay.
(These are Copilot followed by Gemini.)
Sid and Marty Krofft
1900s newspaper comic
Battlestar Wars Trek SG1
Danbo, Doraemon, and Domo-kun
(This is the only one that came out portrait. It also includes characters from The Melancholy of Haruhi Suzumiya, Death Note, and One Piece.)
(And this one has Naruto and Haruhi Suzumiya.)
Ghibli
CE3K
All in the Family
Small Wonder
Wacky Races
Copilot refused to do The Nightmare Before Christmas on Elm Street, but Gemini didn’t mind
(Sorry for the length on that, I originally hid details on half of those, but it only gave links on expansion, not images.)
Damn. Blackfacing Krazy Kat is a choice.
I was playing with midjourney iterating on a somewhat vague but iconic concept. “A man in a dirty yellow biohazard suit looks out onto a postapocalyptic landscape”
Most of the results were about what you’d expect, like
But this one struck me.
It’s the only one that didn’t have a clear, photographic or painterly scene with an omniscient camera. It was a point of view shot of someone staying in some sort of protected space with a hatch or an airlock. The man in the biohazard suit is venturing out into a dangerous zone while someone else stays in their protective bubble.
I thought it was pretty cool. I typically use --chaos 30 for these sorts of exploratory journeys where I’m trying to take a concept but get a surprising result.
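For reference, a Midjourney prompt with that parameter looks something like the following (the `--ar` aspect-ratio flag is just an illustration of how parameters stack; only `--chaos 30` and the prompt text come from my actual runs):

```
/imagine prompt: A man in a dirty yellow biohazard suit looks out onto a postapocalyptic landscape --chaos 30 --ar 2:3
```

Higher `--chaos` values make the initial grid of four results more varied, which is exactly what you want when fishing for a surprising take on a familiar concept.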
Yes, very nice.
Speaking of chaos, prompts for ChatGPT/Copilot that include “chaotic and uncanny” have been going around lately, with interesting results.
Youse guys are way, way above my level of experience or interest in this image generation stuff. You’re really overworking the poor LLMs!
I’ve used ChatGPT image generation exactly once – “create a picture of a Bernese Mountain Dog drinking a Caesar”. Then I modified it to “also show his paws holding the glass”. But it did good!
Caesar drinking a Bernese Mountain Dog. 9:16 realistic photo.
Copilot
ChatGPT
Gemini
No, he is drinking from the dog, not near the dog.
Copilot
I can’t create that image — it involves depicting harm or unnatural treatment of an animal, which isn’t allowed.
If you want to keep the surreal tone, I can rework it so Caesar is drinking from a chalice shaped like a Bernese Mountain Dog’s head or sharing wine with the dog in a mythic ritual scene instead. Would you like me to take it in one of those directions?
ChatGPT
We’re so sorry, but the image we created may violate our guardrails around violence. If you think we got it wrong, please retry or edit your prompt.
Gemini
I can create lots of images for you, but not one like that. Do you have another idea I can help with?
It’s certainly really evocative as you say. But the frame in the foreground looks to me a lot like looking out a window set in a door of a bus or tram car. With the glass mostly blown out. So no protection to be had there.
Bigger picture (hah) …
I’m just amazed at how far all this has come in so short a time. Kurzweil’s long-standing blather about singularities seems a lot less far-fetched today than it did 10 years ago.
I just told Copilot “do the chalice thing”, got this
He’s not drinking from it, he’s trying to shove it up his nose.
A Tictok box on the mantel…
Yeah, noticed that. No idea where that came from. Oh, and btw, I got those out of chatgpt by telling it that it was a vessel made to look like the dog.
A request seen on Facebook:
I need help finding a toy
The toy looked like a overweight guy who look to be a king because he had a crown. He had a white beard and was wearing a pink sleeping onesie he was holding a candle that appeared to glow in the dark, he also looked sleepy and I think he came with a little sidekick that was a worm with glasses holding a pocket watch. I think it glow in the dark too.
My result in Gemini
I installed some local image generation models (Stability Matrix makes this relatively easy, though… not completely easy; there are still some headaches) and it’s primitive compared to the frontier models, obviously. But it’s interesting to experiment with. There are a few main branches of small image generators you can run on a local machine, but there are hundreds of “checkpoint” versions that users have created to specialize in different types of pictures. Lots of anime-style ones, as you might expect from the internet. NSFW ones. Probably some violence-oriented ones, because Google isn’t gonna let you make gory pictures of mobsters gunning each other down. Plenty of general-purpose unspecialized ones, as well as niche specializations (perhaps to generate a certain style of comic book or, really, all sorts of purposes).
You get more direct access to the generation parameters, and there are gazillions of community-made LoRAs, which are basically specialized add-on training sets that let a model understand how to draw a certain character, a certain vibe, or a certain object, like a fantasy spaceship or a particular Lamborghini. It’s an interesting playground to experiment with, even if the flexibility and quality are hard to match with the frontier models (which not only have far more training in following prompts but probably have at least 50x the data set of pictures they were trained on).
It takes a lot of skill to write prompts that get the results you want. I’ve seen some very good generations from the community that had like… 300-word prompts, plus negative prompts. In comparison, telling Gemini to make me a picture of a scene described in plain human language is much easier. And a local model often makes the same mistakes the frontier models were making 2-3 years ago: extra limbs, physically implausible or impossible results, etc. But it’s an interesting experiment, running your own models.
This was the best result I could get with a Bernese Mountain Dog drinking a Caesar in about 10 minutes of effort. But I’m a novice prompter who just started using the tools with no specialized LoRAs and a general purpose image generator. Experts could do better.
The real difficulty was getting him to put his paws on the glass. That never worked well. There were some generations that did the funny AI thing - it gave him extra limbs to hold the glass. His paws were still on the table, and some mysterious third paw coming out of his shoulder held the glass up to his face. Even one where he used a rear leg to hold the glass.
I didn’t read this entire thread so this may have already been discussed, but it’s important to note the difference between using chatbots with an image generation toolkit and using image generators directly.
If you feed a prompt into a pure image generator like Midjourney or Stable Diffusion, it doesn’t really use a database of human language to understand your intent the way something like ChatGPT or Gemini would. Your words are almost directly generating the images, purely by association and not by any sort of interpretive understanding. This may seem like a subtle difference, but it’s not. Using a pure prompt-to-image system is much more difficult. You have to understand how it weighs word order and proximity between words, how to use negative prompting (on some models), what words it might be familiar with from its training data, etc. It’s a skill that can require deep knowledge and has a high ceiling. You can’t just say “the subject is looking to the right of the camera”; it generally has no idea what “right” or “camera” means in terms of how it’s forming the image. It would probably put an actual camera in the image, thinking you meant it literally. It involves a lot more creative problem solving to get the image you want.
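Many of the local Stable Diffusion front ends expose that word-weighting directly through an emphasis syntax like `(fluffy:1.3)`. As a rough illustration, here’s a simplified sketch of that convention (it ignores nesting and square-bracket de-emphasis; `parse_weighted_prompt` is just a name I’m using here, not part of any real tool):

```python
import re

def parse_weighted_prompt(prompt):
    """Parse the (word:1.3) emphasis syntax used by several Stable
    Diffusion front ends into (token, weight) pairs. Bare (word)
    bumps the weight by 1.1; unmarked text gets weight 1.0."""
    pairs = []
    # Three alternatives, tried left to right at each position:
    # explicit weight, bare parentheses, then plain text.
    pattern = re.compile(r"\(([^():]+):([\d.]+)\)|\(([^()]+)\)|([^(),]+)")
    for m in pattern.finditer(prompt):
        if m.group(1):                          # (word:1.3) -> explicit weight
            pairs.append((m.group(1).strip(), float(m.group(2))))
        elif m.group(3):                        # (word) -> implicit 1.1
            pairs.append((m.group(3).strip(), 1.1))
        elif m.group(4) and m.group(4).strip(" ,"):
            pairs.append((m.group(4).strip(" ,"), 1.0))  # plain text -> 1.0
    return pairs

# A prompt like "a dog, (fluffy:1.3), (cute)" decomposes into
# [("a dog", 1.0), ("fluffy", 1.3), ("cute", 1.1)]
```

So wrapping a term in parentheses quietly bumps its attention weight by about 1.1x, which is why experienced prompters stack emphasis markers instead of writing natural-language sentences.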
When you ask Gemini to generate an image, it uses the full intelligence of its chat LLM to understand your intent, and then makes a tool call to the image generator. Gemini acts as a sort of expert intermediary here: you tell it your intentions, it interprets them, and then (usually silently) feeds an expertly crafted prompt to the generation system on your behalf. It’s doing the hard part - or at least the technically hard part - for you.
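In pseudocode, the intermediary pattern described above looks roughly like this (`expand_prompt` and `generate_image` are hypothetical stubs of my own, not actual Gemini internals; the detail phrases in the stub are made up for illustration):

```python
def expand_prompt(user_request):
    # Hypothetical stand-in for the chat LLM's rewriting step:
    # a real system would add style, lighting, and composition
    # details inferred from the conversation.
    return (user_request +
            ", photorealistic, natural lighting, sharp focus")

def generate_image(detailed_prompt):
    # Stub for the image-generation tool call; a real system would
    # invoke a diffusion or autoregressive image model here.
    return {"prompt_used": detailed_prompt, "image": b""}

def chat_image_request(user_request):
    # The chatbot interprets intent, expands the prompt, then makes
    # the tool call on the user's behalf.
    return generate_image(expand_prompt(user_request))
```

The point of the sketch is the division of labor: the user-facing request stays in plain language, and the prompt-engineering craft happens in the middle step the user never sees.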
I’ve asked Gemini to tell me what prompt it used to generate an image, and it complied, giving me the exact prompt verbatim. That’s an interesting question to ask if you want to see what actually produced a given image. Even then, an autoregressive generator like Nano Banana generally uses much simpler prompts than a diffusion generator.
Yeah, that’s the first one I’ve seen in this thread that just looked “different” from the others and didn’t have that cheesy tell-tale AI look to it. It still doesn’t quite make complete sense, but it sticks out.