AI image generation is getting crazy good

Midjourney has no idea what that centipede is, but whatever it made is very beautiful

(This would be an example of the 10% of the time I use nano banana instead)

If it did, it’d probably have spelled “GPT” correctly.

Ah, well that’s cool of you. I know there are basically image equivalents to openrouter like repilka and wavedash where you can send in a prompt and get responses from multiple models. Having them self-label would be a nice feature.

I use local image gen with comfyui and you can integrate a captioning system in the workflow – I usually use it to label an image with the prompt that created it but you could put whatever information you wanted there like the model that generated it. But of course I can’t run imagegpt or nano banana locally. I can do flux, z image, sdxl and others though.
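For anyone who wants the same kind of label outside of comfyui, here is a minimal sketch of writing the prompt and model name into a PNG's text metadata using only the Python standard library. The keys `prompt` and `model`, the helper names, and the tiny in-memory test image are all assumptions made for illustration; this is not how comfyui itself does it.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def make_text_chunk(keyword: str, text: str) -> bytes:
    """Build a tEXt chunk (Latin-1 keyword, NUL separator, Latin-1 text)."""
    payload = keyword.encode("latin-1") + b"\x00" + text.encode("latin-1")
    return make_chunk(b"tEXt", payload)


def add_labels(png_bytes: bytes, labels: dict[str, str]) -> bytes:
    """Return a copy of the PNG with one tEXt chunk per label, placed right after IHDR."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    # IHDR is always the first chunk: 4 length + 4 type + 13 data + 4 CRC = 25 bytes.
    ihdr_end = len(PNG_SIGNATURE) + 25
    extra = b"".join(make_text_chunk(k, v) for k, v in labels.items())
    return png_bytes[:ihdr_end] + extra + png_bytes[ihdr_end:]


def read_labels(png_bytes: bytes) -> dict[str, str]:
    """Collect every tEXt chunk in the PNG as a keyword -> text mapping."""
    labels = {}
    pos = len(PNG_SIGNATURE)
    while pos < len(png_bytes):
        (length,) = struct.unpack(">I", png_bytes[pos:pos + 4])
        chunk_type = png_bytes[pos + 4:pos + 8]
        data = png_bytes[pos + 8:pos + 8 + length]
        if chunk_type == b"tEXt":
            keyword, _, text = data.partition(b"\x00")
            labels[keyword.decode("latin-1")] = text.decode("latin-1")
        pos += 12 + length  # length field + type + data + CRC
    return labels


def tiny_png() -> bytes:
    """A 1x1 grayscale PNG built in memory, only for demonstrating the functions."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte + one black pixel
    return (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))


labeled = add_labels(tiny_png(), {"prompt": "a bunny in a hole", "model": "sdxl"})
print(read_labels(labeled))
```

The labels travel inside the file itself, so they survive renaming, though many sites strip metadata on upload.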

A few days ago I was photographing some of my plants and noticed a pattern that looked like a bunny dangling its arms from a hole. I played around with doing conversions of that in Gemini and Copilot.

(None of them were quite the kids-book illustration look I was imagining, but still some interesting stuff.)

So I decided to play around with it a bit more just now. Made a couple of new tries in Gemini with results, then tried in Copilot and got this.

That’s weird… What happened to the bunny’s tentacles, or its eyestalks? Why didn’t it render the slime trail? :smiley:

It does sort of look like a Wolf in Sheep’s Clothing, though.

I’m kind of on Copilot’s side haha. That’s a strange command to say “clean up this image [of something] and the picture isn’t that thing” – like, you could’ve said “clean up this picture of a ferrari” and it would be justified in saying well this isn’t a ferrari so…

If you had said something like “there’s an image in this rock that looks like a small rabbit peeking its head out. can you edit this image to emphasize that?” it may have complied. But I’m not sure it would see it in that case. This is pretty classic pareidolia, and that requires the human brain’s tendency to find patterns in noise, especially faces. LLMs are often shockingly good at evaluating what images are, but things like pareidolia are always going to be very difficult for them because they don’t really have the same processes that identify patterns in noise.

I’m not sure if you used the same prompt with Gemini. If you did, and it identified it, then that’s pretty impressive.

Claude sort of saw it. I did prime it by asking if it saw any sort of patterns in the object that humans might recognize as something else.

“Yes—the tip of the shoot reads as a face. The pointed apex with the curled, peeled-back sheath looks like a hooded figure or a beaked head (a bit like a bird’s profile or a cowled person looking down and to the left), and the darker mottled blotches lower down can resolve into eye-like spots. The strongest single feature is that hook at the top, which the visual system wants to parse as a nose/beak under a hood.”

But, when asking what the face was:

“Looking at it again carefully: it reads more as a creature than a clean human face, and specifically it has a reptilian or avian quality to me.” The thing it’s talking about as the beak is the bunny’s ear.

So it did see the face, though I primed it by asking it to look for patterns in the noise rather than just asking “what do you see here”. It basically identified it, but even then it misinterpreted it. That is still more than I expected. I have given it ambiguous pictures like this before with mixed results. Sometimes it shocks me by nailing it, sometimes it misses it. Priming it by telling it what sort of thing it’s looking for is a meaningful tip-off.

A couple of my earlier prompts

I see what you mean. Gemini did a better job, clearly. But I think it was probably generating what you wanted using the image as a reference image. I’m not sure it actually “saw” the bunny exactly. You kind of described where the bunny should be and that it was a bunny.

But Gemini definitely sees something.

Without giving it instructions about what it sees, it said it saw some sort of small creature in the top of it. I asked it to draw it, and it gave me

So… not quite right, but enough to say that it’s actually seeing something face-like and small mammal like in the image.

FWIW copilot is a sort of top layer on top of various models that are selected based on your prompt, what sort of load its systems are under, and other factors. You might be interacting with gpt 4o - you can’t tell. So it’s hard to say which LLM underneath the copilot orchestrator struggled with your request. If it routed you to gpt 5.3 or 5.5 instead it may have understood it. There are things I like about copilot, but the non-transparency about what model you’re talking to is the most frustrating part.

I asked Gemini to describe the image in detail:

Summary

An extreme, grainy close-up captures a vertical, conical object rising from dark, muddy earth, resembling a bizarre biological structure or a worn, pointed stake.

Structure and Form

  • Shape: The central object is roughly cylindrical at its base, tapering gradually to a dull, slightly asymmetrical point at the top. The apex is darker and appears slightly damaged or split.
  • Texture: The surface is highly textured, covered in a dense, granular pattern that resembles coarse sandpaper, weathered stone, or the bumpy skin of an exotic plant shoot (such as a Konjac or voodoo lily stalk emerging from the ground).
  • Coloring: It features a mottled, camouflage-like pattern of off-white, light gray, and dusty pink undertones, heavily speckled with irregular, dark charcoal-gray and black patches.

Background and Context

  • The Ground: The background and surrounding area consist of dark, damp, and highly disturbed soil or mud. The earth is clumpy, uneven, and out of focus, suggesting a shallow depth of field.
  • Composition: The shot is framed vertically and tightly cropped, keeping the mysterious, textured stalk dead-center. The lighting is overcast and flat, casting soft shadows that blend the object into its muddy environment.

ChatGPT

Summary

The image appears to be a close-up photograph of a strange, elongated object standing upright in dark soil or mud. The object is pale gray to off-white and shaped somewhat like a narrow cone or cocoon, tapering to a pointed tip at the top. Its surface looks thin, translucent, and fibrous, almost like gauze, silk, or weathered organic material stretched tightly over a frame.

Scattered across the surface are multiple dark circular or flower-like spots arranged irregularly. These spots are darker gray or black and stand out against the lighter background. The object’s texture is highly mottled and grainy, suggesting either low image quality, heavy zoom, or dim lighting.

Near the top opening, there is a faint suggestion of a small animal face peeking out — possibly resembling a rabbit, rodent, or cat-like face — though it is ambiguous and could also be a visual illusion caused by shadows and texture patterns. The “face” area is pale with two darker points where eyes might appear.

The background consists entirely of earthy brown dirt with uneven texture, small pebbles, and debris. The lighting is diffuse and muted, with no strong shadows, giving the image an eerie, cryptic atmosphere. The overall composition feels uncanny because the object looks organic yet unfamiliar, almost like a cocoon, fungus, wrapped figure, or naturally formed sheath emerging from the ground.

Copilot (in a new session)

Summary

A single plant shoot is pushing up through dark, moist soil, and the whole scene feels intimate and almost geological in its texture.

:seedling: What the shoot looks like

  • The sprout is conical, tapering to a firm, pointed tip.
  • Its surface is a pale, creamy beige with irregular dark patches scattered across it—almost like mottled scales or freckles.
  • The texture looks thick and fleshy, suggesting a plant that stores water (something succulent‑adjacent or a bulb‑type plant).

:new_moon: The surrounding soil

  • The soil is dark, rich, and damp, with a chunky, organic texture.
  • Small clumps cling to the base of the shoot, emphasizing how recently it emerged.
  • The lighting is soft and close, making the soil look almost like a landscape in miniature.

:sheaf_of_rice: Overall impression

It captures that moment when a plant first breaks the surface—quiet, determined, and full of potential. The close-up framing makes the shoot feel almost monumental, like a tiny geological spire rising from the earth.

If you want, I can also give you a scientific-style description, a poetic version, or help you identify what species it might be.

ChatGPT saw the “animal face”. Gemini correctly guessed that it was a Konjac. Copilot had the least of interest to say.

I’m impressed any of the systems find it at all without prompting / hinting at it. It’s one of those things that seems trivial and obvious to people but really shows how counterintuitive machine vision is, because we have specialized brain centers designed to come up with this stuff and the LLMs have to piece it together purely from data and structure and statistics. It’s a way harder problem and less ‘intuitive’ to them, as much as that word makes sense in this context.

I showed the full original photo to everybody.

Gemini

Summary

An eye-level, medium close-up shot captures a cluster of mottled, cone-shaped plant shoots and mottled stalks emerging from damp, dark brown soil inside a large, weathered green container.

The Shoots and Stalks

  • Central Shoots: Five pointed, spear-like shoots (cataphylls) poke vertically out of the dirt in the center of the frame. They feature a distinct, organic camouflage pattern—a mixture of grayish-tan, pale pinkish-brown, and heavy dark green or black speckling. The tallest shoot stands a few inches high, while a couple of smaller, younger ones are just breaching the soil surface.
  • Mature Stalks: On the left and in the background, thicker, more mature cylindrical stalks rise out of the frame. They share the same distinctive snake-skin or mottled green-and-white pattern characteristic of the genus Amorphophallus.

Soil and Environment

  • The Ground: The soil is dark, moist, and densely packed. It is littered with small bits of organic debris, including tiny twigs, decaying leaf fragments, a few specks of perlite or small pebbles, and a couple of microscopic green weed seedlings just starting to sprout.
  • The Container: In the upper-left and top background, the thick, smooth rim of a dark green plastic planter or raised bed is visible. The inner wall of the container is heavily splattered and caked with dried mud and soil, showing signs of weathering and outdoor exposure.

Lighting and Composition

The scene is illuminated by soft, diffused natural daylight, casting gentle shadows that give a clear sense of the shoots’ round texture and the wet grain of the dirt. The composition is centered on the emerging cluster, highlighting the contrast between the rough, dark earth and the pale, mottled patterns of the new growth.

ChatGPT

Summary

The image shows a shallow rectangular planter or container filled with wet, muddy soil outdoors, likely in a humid or recently watered environment. Rising from the soil are several unusual, spear-shaped growths that look organic and slightly alien or prehistoric in appearance.

The tallest structures are positioned near the back and left side of the container. They are elongated, narrow, and pointed, with mottled surfaces in gray-green and brown tones. Their texture appears rough and damp, blending with the muddy surroundings. Some of them resemble thick leaves or shoots emerging vertically from beneath the soil.

Clustered near the center and right side are smaller pointed spikes emerging in groups. These are pale gray-beige with dark speckling and tapered tips. Their shapes are irregular: some curve slightly, some are thicker at the base, and others are thin like spikes or tusks. A few appear partially wrapped in translucent sheath-like material near the base, suggesting newly emerging growth.

The soil itself is very wet and compacted, with small twigs, bark fragments, and bits of leaf litter scattered across the surface. The container walls are stained with dirt and moisture, giving the whole scene a raw, earthy look. In the blurred background beyond the planter, faint greenery and tree trunks suggest a forest or garden setting.

Overall, the image has a mysterious botanical feel. The emerging structures could be unusual plant shoots, fungal fruiting bodies, or tubers breaking through the soil, and their clustered arrangement gives the scene an almost miniature-jungle or fantasy-creature appearance.

Copilot

Summary

The image shows a potted plant with exposed, upright roots emerging from compact, moist soil. These roots look like pneumatophores — vertical, snorkel‑like structures some plants use to pull oxygen from the air when the soil is dense or waterlogged.

:seedling: What’s happening in the pot

  • Aerial roots rise straight up from the soil, each one smooth, cylindrical, and slightly tapered.
  • The soil is dark, damp, and tightly packed, with bits of organic debris scattered across the surface.
  • The container wall is visible on the left — a greenish, slightly worn surface that suggests a plastic or ceramic pot.
  • The lighting is soft and natural, highlighting the texture of the soil and the pale, woody tone of the roots.

:herb: What this usually means

These structures are typical of plants adapted to low‑oxygen environments, such as mangroves or certain swamp‑tolerant species. They grow upward to access air, functioning like tiny breathing tubes.

If you want, I can also break down root adaptations, identify possible plant species, or help you figure out care tips for something like this.

Again Gemini got the type of plant right. Copilot was very wrong.

This is the creepiest Alien reimagining that ever existed.

So google released a flash omni video model last month. It’s pretty clear that they were forcing it to be released at Google I/O and it wasn’t ready. It came with only a fraction of the features it’s supposed to have. But they put a sale on it in flow (half credit cost) so I’ve been goofing around with it.

I was trying to create a John Wick style action scene. But google heavily moderates violence, so I had to tell it to replace the real guns with nerf guns, and have the stunt men act like the impact from the nerf darts was forceful. And sometimes it gave pretty decent results – not great, but at least reasonable. The library and the lighting are beautiful anyway.

And sometimes it comes up with hilarious nonsense. It tries to make a lot of cinematic style cuts, so that one 10 second video might be 2-5 shorter shots. But it’s very bad at remembering how things are supposed to be oriented when it makes its cuts, so cuts are often a random rescrambling of the scene. But often the complete failures are great in their own way. I love both the way the nerf dart can’t escape the barrel and the completely nonsensical orientation of the combatants in the fight.

Edit: It actually does really well with cartoons. I haven’t experimented with that very much, but I suspect these models are better with cartoons because they sort of follow a much stronger and simpler convention about physics and motion than the real world.

That looks like the training may have included a bunch of game video.

I agree. I’m not saying it looks like a movie clip, just that it’s coherent and the environment at least is aesthetically nice. The choices on how and where the characters move are pretty awkward. On the other hand, the cartoon is genuinely pretty close to being (lower end) professional animation.

I noticed in copilot under the experiments tab microsoft is letting people use the MAI image generator they’re working on. There’s other cool stuff in there too like the council of AI but obviously the image generator is the relevant one to this thread.

Artificialanalysis puts it at the 14th best text to image generator, but the arena.ai leaderboard, which uses a blinded A/B ELO comparison, puts it at #4. I haven’t used it enough to guess where it really should be, but I figured you guys might be interested in trying it. I’m not sure if it’s available to all copilot users, and you do need a microsoft account.

Edit: Correction, that part of artificial analysis is also an ELO blind A/B test leaderboard. So it’s apples to apples, but the ranks across the sites are significantly different.
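For anyone curious how a blind A/B Elo leaderboard turns votes into a ranking, here is a rough sketch of the textbook Elo update. The starting rating of 1000, the K-factor of 32, and the function names are assumptions for illustration; neither site's actual rating method is shown here, and real leaderboards may fit ratings differently.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one blind A/B vote."""
    delta = k * ((1.0 if a_won else 0.0) - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


def rank_models(votes, start: float = 1000.0, k: float = 32.0):
    """Replay (winner, loser) votes in order and return model names, best first."""
    ratings = {}
    for winner, loser in votes:
        rw = ratings.get(winner, start)
        rl = ratings.get(loser, start)
        ratings[winner], ratings[loser] = update(rw, rl, True, k)
    return sorted(ratings, key=ratings.get, reverse=True)


# Hypothetical votes between three placeholder models.
print(rank_models([("A", "B"), ("A", "C"), ("B", "C")]))
```

Since the ratings depend entirely on which votes were collected, two sites with different voters and prompts can rank the same model quite differently even when both use this style of comparison.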

I could link it directly but it may be better if you just use it from within the copilot experiments tab – it may have to be activated in some way there.

Create a photo that has the disturbing vibe of an old photo that is the basis of a strange cryptid creature urban legend.

Gemini

Copilot

The real winner was from ChatGPT