11-07-2025
Google Gemini can now turn your photos into videos with audio: Check how
Google has introduced a new Gemini feature that lets users turn still photos into eight-second videos with sound, powered by its Veo 3 video generation model. The tool, which can add ambient audio, sound effects, or even spoken dialogue, is now rolling out in select regions, including India, for Google AI Ultra and Pro subscribers.
The feature is currently available through the web interface, and Google has announced that mobile support will follow later in the week.
Turning stills into video with sound: How it works
With this new tool, users can upload a photo, describe the desired motion, and optionally include prompts for audio effects or narration. Gemini then generates a short 720p video in MP4 format, using a 16:9 landscape layout.
Josh Woodward, Vice President of the Gemini app and Google Labs, recently demonstrated the feature on X (formerly Twitter), sharing how a child's drawing was turned into a short animated clip with synchronised sound. 'Still experimental, but we wanted our Pro and Ultra members to try it first! It's really fun to take kindergarten artwork and make it come to life with sound,' Woodward wrote.
To maintain transparency, all videos include a visible 'Veo' watermark in the bottom-right corner and a hidden SynthID digital watermark created by Google DeepMind. This invisible signature helps verify that the content was generated by AI.
Here are the steps to use Gemini AI's new photo-to-video feature:
Click on the 'tools' icon in the prompt bar.
Choose the 'video' tool from the list.
Upload a still image you want to animate.
Enter a description of the desired motion.
Add optional audio cues (e.g., sound effects, dialogue, ambient sounds).
Gemini will generate a short 720p MP4 video in 16:9 format.
Audio will automatically sync with the visuals.
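For readers who think in code, the inputs in the steps above can be pictured as a small data structure. The sketch below is illustrative only: `PhotoToVideoRequest`, `prompt`, and `frame_size` are hypothetical names, not part of any Gemini or Veo API, and the way the motion description and audio cues are joined into one prompt is an assumption. It also works out the pixel dimensions that a 720p frame at 16:9 implies.

```python
from dataclasses import dataclass, field
from fractions import Fraction


@dataclass
class PhotoToVideoRequest:
    """Hypothetical container for the inputs the steps describe (not a Gemini API type)."""
    image_path: str                                       # the still image to animate
    motion: str                                           # description of the desired motion
    audio_cues: list[str] = field(default_factory=list)   # optional sound effects, dialogue, ambience

    def prompt(self) -> str:
        """Join the motion description and any audio cues into one prompt string."""
        parts = [self.motion.strip()]
        if self.audio_cues:
            parts.append("Audio: " + "; ".join(cue.strip() for cue in self.audio_cues))
        return " ".join(parts)


def frame_size(height: int = 720, aspect: Fraction = Fraction(16, 9)) -> tuple[int, int]:
    """Return (width, height) in pixels for a frame height and aspect ratio."""
    return int(height * aspect), height


request = PhotoToVideoRequest(
    image_path="drawing.png",  # placeholder file name
    motion="The crayon rocket lifts off and drifts across the page.",
    audio_cues=["a soft whoosh", "children cheering"],
)
print(request.prompt())
print(frame_size())  # (1280, 720)
```

In the Gemini app itself, all of this is typed into the prompt bar after uploading the image; the structure here only makes the three kinds of input (image, motion, optional audio) explicit.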
Google Veo 3: What is new?
First unveiled at Google I/O 2025, Veo 3 is Google's most sophisticated video model to date. It can generate realistic visuals and synchronised sound from either text or image-based prompts.
A Google blog post explains: 'Veo 3 excels from text and image prompting to real-world physics and accurate lip syncing. It's great at understanding; you can tell a short story in your prompt, and the model gives you back a clip that brings it to life.'