Gemini Omni launches for video, avatars and AI editing

Google on Friday launched Gemini Omni, a multimodal model built to generate and edit video, avatars and other media from conversational prompts, pushing Gemini further beyond search features and coding agents. In a company blog post, Google said the first release, Gemini Omni Flash, can take text, images, audio and video as input and return edited clips or newly generated scenes in the same workflow.

The launch gives Google a more concrete product pitch after a keynote week crowded with model names and demos. Rather than another chatbot update, the company is presenting Omni as a media tool that can move from prompting to editing and rendering inside one interface. That would push Gemini closer to production tasks such as cutting clips, changing backgrounds or generating an on-screen presenter from a short prompt.

Google said users can start with a prompt, a reference image, a rough audio track or an existing clip, then ask for a new scene, a talking avatar or a revised cut without moving to another tool. That editing focus moves the product away from one-off demo videos and towards the iterative work video teams actually do.

The company said it will first roll Omni out globally to AI Plus, Pro and Ultra subscribers through the Gemini app and Google Flow, then add it to YouTube Shorts and YouTube Create. Developer and enterprise API access will follow in the coming weeks, Google said.

Koray Kavukcuoglu, Google DeepMind’s chief technology officer, said in the announcement that Gemini Omni Flash “can create anything from any input, starting with video”.

Google is also using Omni to argue that one model should carry context across different media types, so a prompt, reference image and rough audio track can shape the same output. That is a broader claim than the standalone video generators already on the market, and it helps explain why Google tied the launch to Flow and its Gemini subscription tiers instead of spinning it out as a separate lab product.

From demo feature to platform push

Ars Technica said the bigger signal was Google’s attempt to collapse several generative workflows into one conversational interface, rather than asking users to jump between separate image, video and editing models. Ars quoted Tulsee Doshi, Gemini’s senior director of product management, as saying “Omni is a step toward that vision”.

Starting in consumer products also gives Google a simpler way to test quality and safety before a wider API release. Features such as talking avatars and clip revisions are easier to trial in the Gemini app, Shorts and Create than in enterprise software, and the company can watch how people use them before opening the model to developers.

For Australian developers and creative teams, the launch is another sign that major AI vendors are pushing from chat into production workflows. Whether Omni becomes a daily tool will depend on output quality, cost and how closely Google ties it to products people already use.
