OpenAI's latest wonder: text-to-video with Sora

Gah, forgot I was in Cafe Society. :frowning:

Or that the phone isn’t the camera that would have been recording the image. There are half a dozen ways to keep cameras out of reflections, so the footage isn’t necessarily unnatural in that regard.

Gemini 1.5 can do video, and it now has a context window of a million tokens. It can ingest an entire movie and make sense of it. In fact, in one of the demos they gave it an entire silent film, with no text, no subtitles, and no description of the film. The AI was able to accurately summarize each scene and the whole film, describe the themes and plot, etc.
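For the curious, here’s roughly what that kind of video ingestion looks like through Google’s Gemini API in Python. This is a minimal sketch, not the actual demo: the file name, prompt, and polling loop are my own placeholders, and it assumes the public gemini-1.5-pro model.

```python
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # assumption: you have an API key

# Upload the film via the File API; large videos are processed asynchronously,
# so poll until the file is ready.
video = genai.upload_file(path="silent_film.mp4")  # hypothetical file name
while video.state.name == "PROCESSING":
    time.sleep(10)
    video = genai.get_file(video.name)

# Ask the long-context model to summarize the whole film.
model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content(
    [video, "Summarize each scene, then describe the film's overall plot and themes."]
)
print(response.text)
```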

Astounding how far video has come in just a few months.

This is the Gemini demo and yeah, it’s quite astonishing. It will probably revolutionize police/security/military video analysis very soon, among other things.

This was state-of-the-art text-to-video in March of 2023. At the time, all the rage was making clips of X eating Y (typified by Will Smith eating spaghetti). These are some of the very few tries I made. (Note that the AI thought all video should have a Shutterstock watermark.)

A cat eating a frog

Taylor Swift eating a cat

Bojack Horseman eating honeydew

Well, that could be Bojack eating a honeydew, filmed with a macro lens…

Yeah, the latest videos are actually production quality for things like commercials and corporate videos. Second unit directors shooting B-roll and commercial directors are probably in trouble.

If you want to try text-to-video that is better than my above examples but not as good as Sora, you get a number of free seconds at Runway.

I used most of my free time on image-to-video, which takes a still image and tries to convert it to video with no descriptive prompt at all. Results can be passable for a moment, but fall apart fast even within the short clip length available. I’d love to see what Sora could do with image-to-video.

Here’s some of my Runway results converting some of my text-to-image images into video.

And here are some using Stable Video Diffusion, which so far does only image-to-video, no text-to-video.
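Since Stable Video Diffusion is an open model, you can also run image-to-video yourself with Hugging Face’s diffusers library. A minimal sketch, assuming the svd-xt checkpoint and a GPU with enough memory; the input image path and seed are placeholders:

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

# Load the image-to-video pipeline (no text prompt: SVD conditions only on the image).
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.enable_model_cpu_offload()  # trade speed for lower VRAM use

# Condition on a still image, resized to the model's expected resolution.
image = load_image("my_still.png").resize((1024, 576))  # hypothetical input

generator = torch.manual_seed(42)
frames = pipe(image, decode_chunk_size=8, generator=generator).frames[0]
export_to_video(frames, "generated.mp4", fps=7)
```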

Some more AI video

Also, this happened recently.

They’re still struggling with hands, but impressive nonetheless.

Awesome

Another day, another mind-blowing AI video demo.

The video embedded in the article:

Ew, no. Those are deep in the uncanny valley. Impressive technically, but not nice to watch.

Same applies to the reference images IMO. Most of them, anyway. Why did the company choose those? Are they supposed to express normality? Or did they choose slightly weird ones to hide the evident weirdness of the generated videos?

Most of those would have fooled me at a casual glance. Hepburn and young DiCaprio were definitely both uncanny valley for me, however. Impressive technology either way.

I don’t think I’ve ever been more impressed with technology in my life. In fact, watching the YouTube videos it creates is causing me to see the real world as if I were watching an AI-created scene. How the fuck does it get the physics 99.9% right?? That’s mind-blowing. I predict that in the not-so-far future, a matter of months, the AI will be able to play our human emotions like a fiddle. And we’ll know it but won’t really be able to do much about it… and then we’ll kind of lose confidence that we know anything is “real”, and realize that we are all inferior to what our computers can create now.

And I’ll just remind everyone here to not put too much stock in released demos. If they made 500 attempts and got one that looked good and 499 eldritch horrors, they’re only going to release the one that looked good, and won’t mention the signal to eldritch ratio.

There’s probably a fair amount of truth to that, and the released demos are undoubtedly the best so far. But it’s also worth noting that the diversity of the different scenes and objects suggests that this isn’t just a one-trick pony, but apparently an extremely versatile tool with a very broad visual knowledge of the world and its physical dynamics.

I also suspect that with videos of weird lengths (27 seconds or 52 seconds, for random made-up examples) instead of the full 1 minute available, they were originally made full length but something went wrong before they finished.

New video generator. Not as good as Sora, and only makes 2 second clips, but it is free to try. Here is my first try, a clip of Bread Climp.