OpenAI's latest wonder: text-to-video with Sora

The latest sensation sweeping the Intertubes.

Some thoughts:
I don’t follow this space closely but I am stunned at the quality of the videos, the level of detail and the ability to modify on the fly. I would have guessed this would be years away.
I think what we have seen is already good enough to be used in short advertising videos. The video ad industry may be revolutionized practically overnight.

I wonder how much compute these videos require; I am guessing an enormous amount compared to a single picture in DALL-E. The full service will probably be expensive, though they may release a lower-quality service with strict quantity limits as part of the ChatGPT subscription.

The scope for mischief, political and otherwise, is obvious. There are no doubt safety features to prevent the use of celebrities in these videos, but smart people may find a way around them, and there are lots of other harmful things people could do. It will be an interesting race between OpenAI and assorted miscreants, not least in a US Presidential election year.

The videos are a pretty huge leap above the next best, both in quality and duration. One of the most impressive ones to me is the cat in the 5th video of the last set.

Another one that especially impressed me was the second video in the 3rd set, the view out the train window that at one point passes through shadow and you can see the reflection of the passenger.

Here’s that cat video alone in a Twitter post:

Bloody hell! This is highly demotivating: why invest any effort at all in painting or drawing when the AI can do it effortlessly? I should learn prompting instead of drawing, where can I learn that?

That is insane.

I know how you feel. Whenever an AI thread like this one appears, you get some posters saying “what’s left for us humans now?” followed by other posters saying that the AI revolution will open up all kinds of new career opportunities. But doing what? Are we all to become mere AI prompt jockeys?

Studio head: This video of T. rexes battling woolly mammoths in an alien gladiatorial arena is amazing, Smith! I like the little touch of the rainy weather and the reflections from the wet floor of the open arena.

AI Prompt Jockey Smith: That was my idea, sir! I typed “the arena is open to the alien planet’s weather, and the floor of the arena is wet and reflects the action like a mirror”.

Studio head: Fantastic work, Smith! I see a promotion to Head AI Prompt Jockey in your future!

People said the same thing when photography became a thing. Why bother drawing or painting when a camera will do it effortlessly?

Wow.

Just so I’m clear, is AI generating these videos from nothing (versus finding it somewhere)?

That is correct, the videos are generated from the text prompt describing it.

But now it is digital, if you get my drift.

There is still a lot that AI can’t do, and for the most part AI will be more for brainstorming and raw material than for directly providing a polished final product. But yes, the AI will get better and humans will have to constantly relearn new skills to fill in the gaps of what AI cannot do yet. As AI fills in those gaps, new gaps will appear and the rewards will go to the workers who can fill them the quickest. That is how work will be in the foreseeable future.

There is a whole new emerging field of “prompt engineering”. You will find some resources on YouTube and the rest you can pick up on your own through trial and error.

My father is a bit nervous that AI videos will come out with a generated AI Biden saying something heinous and 2/3 of America will believe it enough to swing the election. Almost like the email harangues about Hillary’s computer.

That doesn’t seem beyond reason here.

I’ve posted this clip before in threads about generative AI, but this is the future I’ve always wanted. Tell the computer anything you want, the computer makes it.

(And the holodeck didn’t have celebrity “guard rails”, either. You want to date Leah Brahms? Sure, have at it!)

I mean, kind of? Mostly? (NB he mixes up left and right in the original tweet)

Well, not from nothing exactly - it’s producing these based on its training, which consists of ‘looking’ intently at a lot of existing videos.

It’s not copying those videos or making a collage of pieces of them, or anything as simple as that; it’s ‘watched’ a lot of videos and on the basis of that training, ‘knows’ how to make them.

Thanks. That answers my question.

And not pointing her phone at the window, thereby showing off that this video is impossible.

That was so rare for them to showcase the holodeck’s potential like that.

Another favorite holodeck scene in which inspired use of that tech enables an augmented intelligence hard takeoff:

Deep fakes have been a reality for a while unfortunately. Debunking them has become an industry.

Don’t forget the flip side of that, which Trump tries to exploit frequently: the existence of fake videos gives people the chance to claim a real video, audio, or photo is fake.

Extremely impressive alright, and I’m going to have to start doing some reading about it! But in the spirit of “a picture is worth a thousand words” and a video presumably many times more, ISTM that a great deal of the detail must be supplied by the AI unprompted since it would be practically impossible to describe it all. I imagine this must make heavy use of iteration.