AI image generation is getting crazy good

Well, they’re all copies of the same virus, you could tell it each one shows the same set of numbers toward the viewer :slight_smile: Or if you’re really lazy, ask ChatGPT to use its python interpreter to come up with lists of unique numbers for you.

I wonder if you could get it to not restrict itself to whole numbers? You rolled a 7.283…

Here’s a few more (from a friend)

Nice. I hadn’t heard of that trend yet.

You can do the opposite, too. Turn a person into an animal, just like you can turn them into a cartoon.

Or give it a picture of 2 people and it will create a “baby” (or just ask it to morph the people into one person)

Lucas_jackson mentioned there being small details that blew him away. Yesterday I tried creating Discworld characters with text only, no uploaded reference images. This is the image of Rincewind. If you zoom in, on the right shoulder (screen left) of his robe there looks to be a tear in the stitching of the arm of the robe, which has been crudely repaired with the same yellow thread used to make the lettering on the hat. Tiny detail that I never asked for that impresses the hell out of me that it is there.

For that image I actually asked ChatGPT to write a prompt describing Rincewind and The Luggage and then highly edited that:

Summary

A thin, scruffy wizard in his mid-30s with a long nose and a perpetually terrified expression—Rincewind, wearing a battered red wizard’s robe and a tall, floppy, threadbare red wizard’s hat crudely embroidered with yellow stars and the word ‘WIZZARD’ in crooked letters. He has an unkempt beard, wild eyes, and clutches a rolled ancient yellowed and stained vellum scroll. He is in mid-stride, running away from something with every ounce of his strength. Running beside him is The Luggage—a large wooden treasure chest with a curved top and metal bands. The Luggage is running on hundreds of tiny, human-like feet. Its lid is slightly open, revealing a hint of a predator’s teeth and a large, thick tongue. Rincewind and The Luggage are running down an ancient cobblestone path winding through the ivy and moss covered stone blocks that are all that remains of an ancient city that is being reclaimed by a menacing forest. Widescreen photorealistic Kodachrome dslr profile view photo.

I wanted The Luggage to have pink human feet, but what I got probably works better.

Then there is Tiffany Aching. I used a kappa for the river monster scene and prompts entirely of my own based on Tiffany’s description from the opening pages of Wee Free Men. In general I’m extremely happy with the results, but ChatGPT can’t be convinced to consistently make Nac Mac Feegles blue (except in the first result I got which came out drawing-like). Even when blue I didn’t think they looked sufficiently Feegle-like, so I removed them.

One prompt from those:

Summary

Tiffany Aching standing on the bank of a river gripping an iron frying pan like a club. Tiffany is 9 years old with brown hair and brown eyes. She is wearing an old but clean long, milky-blue dress and heavy brown leather boots. She is confronting a kappa rising from the river.
The image is an over-the-shoulder shot from behind the kappa, looking slightly upwards from the river to Tiffany on the bank above, giving the kappa a determined glare. The kappa is facing away from the camera so what we see is the back of it’s head and it’s back. Portrait detailed photorealistic dslr photo with forced perspective.

That side-by-side pic made me wonder. What’s the current state of generating 3-d (cross-eye, e.g.) art with these AIs?

Probably not good enough yet, but there are AIs in early versions that attempt to create 3D models from text or reference images. If it can create a full 3D environment, it should have no trouble showing it from more than one angle.

Turning someone into a centaur… doesn’t work so great.

Better?

Yeah, I did actually do a “okay, change it this way” or two to get better proportioned results. I’ve noticed that ChatGPT will do weird things to characters to make them fit into the frame rather than pull back the camera some to show the whole character(s). For instance when I was playing with the blobfish farmer, one image had him kneeling to pick blobfish, and another had him buried partially in the ground. And one yesterday of the Three Amigos tiny and being confronted by a praying mantis had them partially buried, too.

The tiny Amigos led me to wanting to do something with characters from Honey I Shrunk the Kids. That centaur is supposed to be the blonde girl from that movie. Even used a reference photo. Not a very good likeness.

An Amigos fail

So I made them sit

I told it to change the centaur to Kelly Bundy and it told me it couldn’t do fictional characters, so I told it to do a centaur that was cosplaying as Kelly Bundy. :roll_eyes:

Or it does unwanted things to their surroundings. For the life of me I could not get it to show a child standing in a cabinet big enough for an adult to stand inside it without doing tricks like this:

And that’s still not really tall enough.

Here’s an image from yesterday that I rate a 9.5 out of 10. Ripley and the power suit aren’t perfectly accurate, but that is made up for by it understanding the Bog of Eternal Stench.

Ludo from Labyrinth standing in the bog of eternal stench. Behind him Ellen Ripley is approaching operating a P–5000 Powered Work Loader. 9:16 Kodachrome dslr photo with shallow dof and forced perspective.

Oh, before Kelly Bundy I asked ChatGPT to change the centaur to give it medusa hair except that instead of snake heads each neck ended in the head of a cast member of Friends. It said no.

Try that in Sora. I’ve seen the Friends cast in their public feed.

I gave it a try. Plus, it was one of those “pick which image you like better” results, so a free bonus. The images aren’t perfect, but I’m still impressed that it tried and got that close.

That’s the Kowloon Walled City in the background. I thought about trying for that earlier today.

Spending some time today getting down with OPP (Other People’s Prompts). They range from the incredibly complex to the incredibly simple.

(None of these are mine)

Some of them seem to be some sort of genereated code

Summary

CompositionalPortrait(2,
Style(2: “editorial rotcore”, “mascot nostalgia”, “1990s disposable flash zine”),
Subject(3: “Biggie Smalls as Grimace (no mascot head, oversized Grimace suit)”,
“Laura Prepon as Alt Wendy (pigtails, blue striped dress, thigh-high socks)”,
“Ice Spice as Female Ronald McDonald (shredded bodysuit, smeared makeup, barefoot)”),
MadeOutOf(“vintage birthday party remnants, soda-stained plastic, decayed mascot fur”),
Arrangement(“Biggie in cracked decaying birthday throne, Wendy across his lap, Ronald draped on Biggie’s shoulder”),
Accessories(“Barbie paper plate with cake, cracked tiara, broken balloon, discarded Grimace head”),
Background(“faded birthday banner (“Happy 8th Birthday icky”, with just a faint outline of an “R”), dilapidated ball-pit, broken arcade cabinet”),
Lighting(“handheld camera flash, blown-out highlights, visible vignette, haze”),
OutputStyle(“high-resolution photo with grain, eerie warmth, lo-fi editorial finish, a very small barely visible faintly written “omni72” watermark handwritten in lower right corner of the image”)
)
// label: 49e4abfc20e8ae94
// tags: grimace, birthday-party, mascot-core, editorial, rotcore, lo-fi, laura-prepon, ice-spice, biggie-smalls, surreal-photo

Or an organized checklist

Summary

:clapper_board: Cinematic Prompt Breakdown

Scene Description:
A gritty late-night skate session under flickering gas station fluorescents. Ian McKellen as Gandalf from Lord of the Rings is captured mid-trick, low to the ground, limbs blurred in motion, framed in a chaotic, almost voyeuristic style—like something ripped from a bootleg VHS tape or a 2000s skate zine.

:camera: Camera Specs & Setup

Camera: Canon EOS 300D or early digital SLR equivalent (simulated low fidelity)

Sensor Mode: APS-C with grain overlay

Lens: 8mm fisheye, wide-open aperture (f/2.8), creating heavy edge distortion and chromatic aberration

Shutter Speed: 1/30 sec — enough to introduce directional motion blur on fast limbs

ISO: 1600+ — visible grain and highlight bloom from light sources

Lighting: Harsh disposable flash — overexposed skin tones, deep shadows, dusty lens flare

Focus Mode: Manual pre-focus at fixed distance, soft on edges

:movie_camera: Motion & Pacing

Motion Type: Sudden, erratic — skater bursts into the frame and explodes upward

Camera Movement: Slight handheld shake, simulating a handheld camcorder or static tripod bumped mid-shot

Frame Rate: 24 fps — filmic stutter, emphasizing each blur frame

Editing Style: Quick, lo-fi cuts, possibly intercut with B-roll of gas station signs or slow zooms on grainy surfaces

Color Grade: Desaturated greens and yellows, crushed blacks, slight magenta shift from cheap lighting

:film_frames: Overall Feel

Think early Baker skate videos, Harmony Korine aesthetics, or Supreme VHS ads. Bootleg energy. Feels like you weren’t supposed to be filming there.

Or a batch command

Summary

Candid waist-up iPhone capture of a young woman with tousled mid-length hair, paparazzi-style framing with tilted composition and flash glare, amateur aesthetic featuring pixelated texture and uneven focus – alternating scenes: [1] Nighttime cafe window reflection with rainy glass distortion [2] Crowded metro station with motion-blurred commers behind [3] Overgrown alleyway lit by flickering neon signs [4] Sunny park bench with blown-out sunlight through tree leaves [5] Concert exit chaos with streaked car headlights [6] Laundromat interior with fluorescent tube lighting casting green hues

(Single random scene used per generation)

You can use detailed descriptions to get great signs

Summary

low quality photo of a worn retro-futuristic workplace safety poster hanging on the wall of an abandoned hallway. Bold 1960s-style design with cheerful colors. At the center, a smiling, menacing AI security camera with glowing red eyes watches the workplace, two terrified lab techs (one male and one female) nervously sweat while looking at their AI coworkers. Slogan in cheerful block letters: ‘Don’t want to be replaced by AI? PROVE IT!’ Fine print reads: ‘GPT-Bot can outperform 99.5% of human employees. GPT-Bot never asks for a raise or forms a union. GPT-Bot never complains about non-ideal workplace conditions.’ Symmetrical knotted hexagon logo in corner. Smiling stick figure next to the small words: ‘We’re Always Improving! Are You?’

Or surprisingly simple ones

Summary

A magazine toothpaste ad - Colgate ‘sweet n sour’ Toothpaste. Make it very salesy

And a few more things

Summary

Hatsune Miku in a punk-inspired outfit, set in a grungy, neon-lit alley pulsing with chaotic energy. Her iconic twin tails are wild and messy, streaked with electric pink and deep violet. She’s dressed in torn fishnets, chunky boots, a leather jacket covered in anarchist patches and safety pins, and a plaid skirt with dark stains. The background is drenched in neon pink and purple glow from flickering signs and graffiti-tagged walls. The atmosphere is macabre and crazed—skulls in the shadows, smeared lipstick smiles on mannequins, broken speakers screaming silent music. Miku’s expression is unhinged yet magnetic, eyes glowing with eerie intensity as if possessed by the music. Punk cyber-goth meets twisted Vocaloid nightmare. cyberpunk, punk fashion, neon noir, macabre aesthetic, chaotic beauty, surreal, detailed, high contrast, anime-style

Summary

A hyper-realistic and absurd scene captured in the visual style of a 1990s analog photograph, with visible film grain, soft focus, yellowed tones, and a timestamp in the bottom corner. In a dark rural field at night, a large plush mascot-style cow is being abducted by a glowing white tractor beam from the sky.

The cow has a white body with black spots, cartoonishly large eyes, exaggerated plush features, and goofy fabric teeth. It is floating horizontally, belly facing up and mouth open, in an awkward and helpless pose — arms and legs dangling down, sneakers flopping, its neon cap barely staying on. The beam lifts it from above, but its body twists slightly mid-air, creating an unnatural, disturbing shape.

The cow wears ridiculous 1990s neon accessories: a backwards baseball cap, oversized sunglasses hanging loosely, and fluorescent chunky sneakers. The wind is strong — its clothing and fur ripple, and debris and dust swirl violently within the beam.

The scene is lit only by the beam, which creates stark contrasts, casting eerie shadows across the field. There is light mist and chaotic motion everywhere. The atmosphere is surreal and unhinged, like a corrupted frame from a lost VHS tape of a children’s show gone wrong. No UFO is visible — only the overpowering beam.

Summary

The Godfather, Painted by Basquiat
“Reimagine the characters of The Godfather in the rebellious, chaotic art style of Jean-Michel Basquiat. Marlon Brando’s Don Corleone wears a melting tuxedo, his face sketched in jagged neon lines, background filled with crown symbols and scribbled Italian phrases. The scene is rough, raw, full of graffiti energy, blending old mafia elegance with street soul.”

Summary

The cover to an animorphs novel featuring donald trump morphing into a dog poop, no caption

I recall seeing some of those in Sora’s feed. Are these the original pictures, or did you re-run them yourself?

All the original.

Hmm. Does that actually work to get a different random scene each time? I haven’t been able to get it to work to get anything other than the first thing in the list. I tried to get ChatGPT to do something similar using the Python tool but it fails to take the result and pass it on to the image generator every time it tries.

ETA1: I advocate always trying the same prompt yourself every time you see an interesting image. :slight_smile:

ETA2: Perchance absolutely can do things like make random selections to pass on to its image generator, but it’s an older system, I think a version of Stable Diffusion, that is not very good at prompt comprehension. It’s only distinction IMHO is its lack of censorship.