I have sound here.
Well, there’s at least one more who can hear it. Thanks.
Source image from ChatGPT, first result from Grok when it won’t let me enter a prompt:
And another one where the sound in Grok didn’t download into the file, so there was no sound to play when I uploaded it.
@Joey_P, The DRMGIRL should have put her car in park before getting out. Amazing what can happen at a Carl’s Jr. next to a TA Express in Blaine, Washington. 
I’ve just discovered a shortcoming of Google Photos’ free Veo3 videos: it only makes 9:16 videos. Grok will automatically make the video in whichever supported aspect ratio most closely fits the ratio of the source image. Here’s a manga panel that I converted in Sora (which I posted earlier in this thread), now animated in Grok. It did a pretty reasonable job of animating spider movement, I think.
And here’s Veo3. As you can see, it horribly bungled the aspect ratio. The spider movements also aren’t bad here, but the crowning really came out of nowhere.
I like how ChatGPT just knew who the statue on the left was supposed to be.
Source:
Prompt to ChatGPT:
Redraw the image, change the statues to realistic people. Do it 1:1 aspect ratio and you can crop out the sides of the image, concentrating on the former statues and their base.
Bonus Grok video (took whatever it gave me on first upload without redoing it with a prompt)
I just discovered something. In the app, at least, there is a mute audio button. It doesn’t just mute the playback of the audio while playing in the app; it also doesn’t download the audio track if you do a download while muted. Unmute, redownload, and you get the audio.
I found that when I googled the issue; there’s online discussion about the problem. It only rarely works for me, and for most others as well. The company has acknowledged the problem and still hasn’t fixed it.
I’ve been digging through my archives for images to convert. In the past two days it has let me do around 50 free videos per day before hitting a limit, which is insanely generous compared to most other AI video sites. They allow like half a dozen videos per month and you tend to have to wait hours in the queue for each one to be generated. Those sites have more control, more features, and probably tend to have better output, but still.
Here’s a fun one. I showed Copilot this comic strip
And told it
“Convert the middle panel to a realistic photo with no text. Iphone 15 photo. 9:16”
And got this, which is pretty funny, but doesn’t quite get it.
I then told Copilot
“Look at the original again. His legs are bare and the front of the worm costume continues downward past his torso towards the floor.”
It started adjusting the image with pretty good results, got down to around his knees, and aborted the image.
I tried again, with the strip cropped to just the center panel
And got this
Meanwhile, I was also trying it in Sora. I accidentally didn’t tell it to use the middle panel so it made its interpretation from the whole strip.
And here’s Sora with the cropped single panel (the first of those I consider a pretty strong success)
Devoid of context, he mainly looks ashamed to have pissed himself
Shofar, sho good.
EDIT:  Dangit, beat by @Maserschmidt .
You were shoclose, but shofar.
Here I am again in this mean old town
And you’re shofar away from me
And where are you when the sun goes down?
You’re shofar away from me.
So far Grok seems to be pretty bad at understanding prompts for image to video. Here’s an example from last night.
For the source image, there are certain creatures and objects that ChatGPT/Copilot/Sora are especially good at for whatever reason, with the Necronomicon as portrayed in The Evil Dead being one of them. Months back I took an image of a girl holding a book (found on the Sora feed) and just told Sora to replace the book she was holding with the Necronomicon. I fed that photo into Grok.
Here’s the video it automatically made
I then prompted a new version: “The book is screaming gibberish while the girl looks on in horror”
Which is almost 100% completely unlike what I wanted.
Did you choose the speech option when you made that second one?
IME, if I choose speech, every character in the image starts speaking the literal thing I put in the field with one voice.
If I don’t choose speech, when the character(s) speak, it comes out sounding like Simglish.
I don’t remember if I picked “speech” or “custom”. Probably “custom”.
And a fresh experiment. This is from a few months back, when we were experimenting with “sisters” prompts and did a set using words for Japanese fashion types. I was looking through those for images to upload to see how well Grok interpreted facial expressions/body language in determining the direction of the video output. It occurred to me to try more than one image at a time to see what it would do with the content and the unusual aspect ratio.
So in this instance, at least, it had all three subjects perform a similar action. And it squeezed the aspect ratio a bit from the original 2:1 to approximately 1.8:1.
Showed this to Copilot, only instruction was “Convert this to a realistic photo with no text.”
The only thing it didn’t understand was the drawing of the upside down goose. It replaced the “Gaggle Maps” screen with an actual map route, but I don’t think that hurts the joke.
Showed this to Copilot
Told it “Create a realistic photo of the animals that are casting these shadows.”
(It was supposed to be donkeys. I cropped out the US Capitol at the top of the original.)
Here’s a prompt that Night Cafe invented for me tonight
Whimsical illustration. A girl with vibrant, flowing hair rides a giant, friendly snail through a fantastical forest filled with glowing mushrooms and oversized flowers. The style is a blend of impressionism and retro-futurism, with a color palette of bright, saturated hues. Soft, dappled sunlight filters through the canopy. A whimsical, dreamlike atmosphere. 10 keywords: vibrant colors, impressionism, retro-futurism, whimsical, fantasy, detailed illustration, magical, enchanted, dreamlike, high detail.
I trimmed it down to this to use in Copilot
A girl with vibrant, flowing hair rides a giant, friendly snail through a fantastical forest filled with iridescent mushrooms and oversized flowers. Soft, dappled sunlight filters through the canopy. Iphone 15 photo with shallow dof and forced perspective. 9:16
And I fed the resulting image to Grok.
This is the automatic video, which is pretty good, other than the shift in colors and progressive loss of details typical of Grok.
I tried a custom prompt: “The snail rears up and she falls off backwards”. She just begins looking like she is falling off by the end of the clip, but the snail never rears up.
Kling understood the directions and the result is pretty hilarious