OpenAI has been the frontrunner and trailblazer in bringing transformer-based innovations to market. For the past year, OpenAI has been the reigning king of LLMs, with its GPT-series models, especially GPT-4 and GPT-4 Turbo, outperforming the rest of the field on most language-based tasks. These multimodal models accept text and image inputs and produce textual outputs.
On February 15th, 2024, OpenAI announced Sora, a diffusion transformer model capable of generating video from text prompts. The technical report posted in the research section of its website positions this video generation model as a ‘world simulator’. This remarkable technology might find a home in product marketing campaigns, or in rapidly producing backdrops for video-heavy projects such as educational videos.
OpenAI has kept this technology out of the public's hands, for now. It has invited a select few artists to try it out and give feedback on its usefulness in video production. One of those artists, Paul Trillo, was recently interviewed on the Hard Fork podcast; he ran several prompts through Sora to create a video inspired by Carl Sagan's 'Golden Record', a recording meant to tell extraterrestrials about our civilization. Artists and film producers are understandably worried about the impact of AI on the film industry, but Paul sees Sora as yet another tool for creating visual effects that would otherwise be difficult and costly to achieve. An interesting performance tidbit: generating a video from a text prompt took roughly 10 to 15 minutes of rendering. There must be a ton of compute going into putting a full scene together!
I had a quick peek at some of these videos on OpenAI's YouTube channel. Clearly, Sora is not ready for prime time in the artistic world. Even in the short clips, you can see portions of scenes change unexpectedly; in one video, a person vanishes after walking a few short steps. The figures, or 'actors', can take on a surreal appearance. Sora seems to do best with a fairly static backdrop and a few moving characters, as in the woolly mammoth clip.