Google’s new video generation AI model, Lumiere, uses a new diffusion model called Space-Time-U-Net, or STUNet, that figures out where things are in a video (space) and how they simultaneously move and change (time). As Ars Technica notes, this method lets Lumiere create the video in a single process rather than assembling it from smaller still frames.
Lumiere begins by creating a base frame from the prompt. It then uses the STUNet framework to approximate where objects within that frame will move, generating more frames that flow into each other to create the appearance of seamless motion. Lumiere also produces 80 frames, compared to 25 frames from Stable Video Diffusion.
It’s true that I am a text journalist, not a video person. But the sizzle reel Google released, along with a scientific paper in preprint, shows how AI video generation and editing has gone from uncanny to near-realistic in just a few short years. It also establishes Google’s tech in a space already occupied by competitors like Runway, Stable Video Diffusion, and Meta’s Emu. Runway, one of the first mass-market text-to-video platforms, released Runway Gen-2 in March of last year and has begun offering more realistic-looking videos, though its videos still have a difficult time depicting movement.
Google was kind enough to post clips and prompts on the Lumiere site, which allowed me to run the same prompts through Runway for comparison. Here are the results.
Some of the clips look a bit artificial, especially if you pay attention to skin texture or the atmosphere of the scene. But look at that turtle! It moves like a turtle actually would in water! It looks like a real turtle! I sent a Lumiere video to a professional video editor friend. While she pointed out that “you can clearly tell it’s not entirely real,” she found it impressive that, had I not told her it was AI, she would have assumed it was CGI. (She also said: “It’s going to take my job, isn’t it?”)
Other models stitch together videos from generated keyframes where the movement has already happened (think of drawings in a flip book), while STUNet lets Lumiere focus on the movement itself, based on where the generated content should be at any given moment in the video.
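To make that distinction concrete, here is a minimal sketch in Python. This is not Google’s code or the actual STUNet architecture; the arrays stand in for frames and the "denoising" is a placeholder. It only illustrates the difference in shape of the two approaches: the keyframe method generates a few frames and interpolates the motion between them, while the space-time method operates on the entire clip at once.

```python
import numpy as np

def keyframe_pipeline(seed, n_frames=80, n_keyframes=8, h=16, w=16):
    """Conventional approach (sketch): generate sparse keyframes first,
    then fill in the frames between them by temporal interpolation."""
    rng = np.random.default_rng(seed)
    keyframes = rng.standard_normal((n_keyframes, h, w))  # stand-in for generated frames
    # Temporal super-resolution: blend linearly between neighboring keyframes.
    idx = np.linspace(0, n_keyframes - 1, n_frames)
    lo = np.floor(idx).astype(int)
    hi = np.minimum(lo + 1, n_keyframes - 1)
    weight = (idx - lo)[:, None, None]
    return (1 - weight) * keyframes[lo] + weight * keyframes[hi]

def spacetime_pipeline(seed, n_frames=80, n_steps=20, h=16, w=16):
    """Lumiere-style idea (sketch): treat the whole clip as one
    space-time volume and refine every frame jointly in one process."""
    rng = np.random.default_rng(seed)
    video = rng.standard_normal((n_frames, h, w))  # start from noise over the full clip
    for _ in range(n_steps):
        # Stand-in for a learned denoising step that sees space *and* time at once.
        video = video - 0.1 * video
    return video

# Both produce an 80-frame clip, but only the second ever "sees" all
# frames together, so motion is decided globally rather than in-betweened.
clip_a = keyframe_pipeline(seed=0)
clip_b = spacetime_pipeline(seed=0)
```

The point of the contrast is that in the first function, motion between keyframes can only ever be a blend of two already-finished images, whereas the second refines all 80 frames as a single volume, which is roughly the intuition behind generating video “in a single process.”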
Google has not been a major player in the text-to-video category, but it has been releasing more advanced AI models over time and shifting its focus toward multimodality. Its Gemini large language model will eventually bring image generation to Bard. Lumiere is not yet available for testing, but it shows Google’s capability to develop an AI video platform that is comparable to — and arguably a bit better than — generally available AI video generators like Runway and Pika. And as a reminder, this was where Google was with AI video just two years ago.
Beyond text-to-video generation, Lumiere lets users create video in specific styles (stylized generation), animate only a small portion of a video (cinemagraphs), and mask out an area of a video to change its color or pattern (inpainting).
Google’s Lumiere paper, though, noted that “there is a risk of misuse for creating fake or harmful content with our technology, and we believe that it is crucial to develop and apply tools for detecting biases and malicious use cases in order to ensure a safe and fair use.” The paper’s authors didn’t explain how that could be achieved.