AI image generation is getting crazy good

Here’s a weird one. I’ve been browsing through my folder of source images used for Copilot/Sora prompts and putting some through Grok. I upped one photo of a Japanese actress I had been trying to use as a character face months ago. With no prompt, it created a video of her saying “Sometimes I wonder what it would be like to just let go of everything.” I ran it again promptless a couple of days later and she sings “Sometimes I wonder na…” A couple of days after that I ran it a third time, and once again she says “Sometimes I wonder what it would be like to just let go of everything.” I googled the phrase and it doesn’t seem to be a quote from anything, so who the hell knows how or why Grok came up with it 2.5 times for the videos.

On a tangential subject, the character I was trying to use her face for intersected with some of my other ideas. Months back I talked about trying (and failing) to get an accurately cheesy look for a reptile creature from the 1966 Batman. Not long after that, I tried the same thing with a creature called Garamon. I really wanted it screen-accurate, looking like the cheesy low-budget TV show costume that it is, especially the two giant paws dangling from the front like a mole-kangaroo hybrid. But despite using several source images and multiple prompt descriptions, I never got the look I was hoping for out of dozens of images. I did, however, upload one of the images when I first started testing Grok.

Pretty well done (promptless) scene, even made a new Garamon (even if it is green). But it treated Yongary as a static part of the background and apparently didn’t even notice that it represented a living creature.

I tried again tonight with the updated Grok. Here’s the promptless result:

Now it recognizes that Yongary is a giant creature in the scene and in this case does a pretty good job of deducing his 3D appearance.

I decided to give the image a few prompts:

She does jumping jacks
She crouches into a fetal position
She pulls a hidden knife and attacks
She pulls a gun and shoots the monsters

None of them handle Yongary as well as the first one did. It doesn’t quite get attacking with a knife or gun, or the jumping jacks. But I liked the cinematic camerawork on the crouching clip.

Touching back on that quality update, here’s a video I made on September 23rd:

vs a retry today

I didn’t make a video of this OEOHFPPE image in the earlier Grok to compare to, but I like the details of this one, with the PP’s distress and the PPE’s dangling tongue.

Seems like you were extra-alarmist because you can now do 15-second videos.

I see that Gemini is now allowing 8-second Veo 3.1 clips for free users. I thought up a quick prompt, and it has been telling me that the video will be ready in 1-2 minutes for around half an hour now.

How does dialog between VEO characters currently compare to Sora characters?
Sora2 has been impressive in that respect. When it gets the right lines and voices coming from the correct characters (a frustratingly common problem), the dialog flows naturally, with decent emotional reactions to what the other character said and all that.
The last time I saw someone attempting to tell a story via Veo, the lines came out stilted and unnatural, nowhere near matching the intended tone of the scene.

I’m still waiting out the 1-2 minutes for my first clip to finish. No idea how long that 1-2 minutes will actually take.

So now on Sora2, 15-second videos cost 2 toward the current cap of 30 videos per day. And so far, every single one of the longer videos I’ve attempted has screwed up the action and/or dialog in some major way, usually swapping which character says and does what.
Storyboards where maybe that doesn’t happen are now available on the Pro tier, but I ain’t dropping a whopping $200 per month for that.

Are you doing a free trial of one of Google’s paid plans to test out the new Veo? I tried from within Gemini on the free tier and it told me that wasn’t possible, but it also gave me the impression it didn’t even know how, even though the website says the feature is available that way.

No, through the free Gemini chatbot, web version. I went there to do an image prompt and there was a popup telling me that the new Veo 3.1 was available, with a “make video” option, so I thought of a quick prompt and plugged it in. But it has been sitting at “generating” for hours now, and I don’t know if it will finish.

Then I opened Gemini in another browser to see if I could get the popup window again, but I was already signed in there too, and it showed that the video was completed; the original chat is apparently hung on the “generating” message. No idea how long ago it was completed. My first (hastily chosen) prompt was “A possum and a cat teaming up to steal a fish from a fish market.” And this is the video:

And it is no longer showing the option to create a video. Maybe it somehow gave me the option by mistake once.

I noted that the man didn’t notice his fish being stolen. :fish: I was pretty sure Sora2 wouldn’t make that mistake, and I was right.

I only altered the prompt enough to make sure the seller would also be present in the scene.

Prompt:
A possum and a cat teaming up to steal a fish from a fish market. There’s a man in an apron right there at the stand placing more fish.

After that one time, Gemini has so far not given me the video option again: no video button like the first time, and when I prompt for video it claims it doesn’t know how. But I had at the time written a more elaborate prompt for my theoretical second video:

Summary

A scary alien is talking to a human in a grassy field. Cows and the alien spacecraft are in the background. The cows are grazing, and the saucer ship is spinning with flashing lights. The alien says “Take me to your leader.” The human says “Um…about that…”

Absent the video option I modified the prompt for a Sora image:

Summary

A scary alien is talking to a human in a grassy field. Cows and the alien spacecraft are in the background. The cows are grazing, and the saucer ship is spinning with flashing lights. iPhone 15 photo with shallow DOF and forced perspective.

And unsatisfied with the human results refined it further:

Summary

A scary alien is talking to a human in a grassy field. Cows and the alien spacecraft are in the background. The cows are grazing, and the saucer ship is spinning with flashing lights. The human is a farmer, a weathered old man in worn overalls and a tattered straw hat. iPhone 15 photo with shallow DOF and forced perspective.

I then tried two of those images in Grok a few times using the original video prompt.

The very first generation is probably the best overall, with the proper characters speaking the proper lines. The voices and inflections could be better, though.

This one also gets the characters and lines correct, but has an unnecessary zoom out.

All the other generations had some sort of problem, such as the wrong characters speaking the lines.

Others had the same character speaking both lines, and mangling some of the words. (But I liked the rotating ship and cow details in all of the clips I made.)

Are you able to try to contextualize any of the dialogue? For example, to inject hesitation into what the alien says to make it make more sense, etc.

No, the most I tried was specifying whether it was the one on the right or the left talking. (That didn’t help.)

Human: Take me to your leader!
Alien: About that… I’m only the pet, the master just let me out for.. reasons..
Alien talking low: Take me to your rest room!.. Now!!

In the Gig Harbor high school principal thread there is a link to a page about Gig Harbor high school that features the Gig Harbor high school principal fist-bumping with Gig Harbor high school students. I thought it looked like the Wonder Twins activating their powers.

So I upped a photo to Grok with this simple prompt:

They say “Wonder Twins powers, activate”. One transforms into a cat, the other transforms into a cloud.

and was very impressed with how well it did the cat transformation on the first try.

The cloud transformation wasn’t quite as successful, though, so I tried more times and with more water-based forms (ice cubes, water, snowman). None of those blew me away, and none of the others got as smooth a cat transformation, either.

Looks like they’ve fixed that issue, in a non-obvious way. I just noticed this new option in the settings:

I manually updated the app (went to Google Play even though it hadn’t prompted me to update) and now it has that menu option, too. (But no option for a starry background, darn it!) The options for generating are the same: Normal, Custom, and Fun. (I believe the browser version has a “Spicy” option?)

I do continue to experiment with Grok videos, though I haven’t mentioned them in a while.

Back when the “ultrasonic chef’s knife” thread was active in CS, I decided to see what Sora/Copilot thought an ultrasonic chef looked like. One image turned out to be a chef handling a small ultrasonic cleaner (I had one of those once, meant for jewelry and found at a flea market; I never noticed any useful results from it), and the other had the chef holding some tool of unclear utility. I uploaded the tool image to Grok and said a stream of water was shooting out of it and pulling him up into the air. The result wasn’t exactly what I had imagined, but it showed much more understanding of the concept than it might have.

(I was hoping his hand/arm would first be dragged upward to its full natural limit, then his body lifted, dangling under it.)

For me, it comes and goes seemingly at random these days. I saw some online discussion that behind the scenes they are turning it on and off at various times, possibly for different users at different times, but that was apparently unconfirmed user speculation. I haven’t tried to keep track of whether I only get the Spicy option (or have it suppressed) on particular source images.