Generate Audio for Video

Choose a model and upload a video to generate synchronized audio.

Model: TARO
Best for: Natural, physics-driven impacts such as footsteps, collisions, water, wind, and crackling fire. Excels when the sound is tightly coupled to visible motion and no text description is needed.
Avoid for: Dialogue, music, or complex layered soundscapes where semantic context matters.

Model: MMAudio
Best for: Mixed scenes where you want both visual grounding and semantic control via a text prompt, for example a busy street scene where you want to emphasize the rain rather than the traffic. Well suited to ambient textures and nuanced sound design.
Avoid for: Pure impact/foley shots where TARO's motion-coupling would be sharper, or cinematic music beds.

Model: HunyuanFoley
Best for: Cinematic foley requiring high fidelity and explicit creative direction: dramatic SFX, layered environmental design, or any scene where you have a clear written description of the desired sound palette.
Avoid for: Quick one-shot clips where you don't want to write a prompt, or raw impact sounds where timing precision matters more than richness.
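The guidance above can be read as a small decision rule. The following is a minimal sketch of that reading, not part of the application: the function `suggest_model` and its three boolean parameters are hypothetical names introduced here for illustration, and the ordering of the checks is one plausible interpretation of the "Best for" and "Avoid for" notes.

```python
from enum import Enum


class Model(Enum):
    TARO = "TARO"
    MMAUDIO = "MMAudio"
    HUNYUAN_FOLEY = "HunyuanFoley"


def suggest_model(has_prompt: bool, pure_impact: bool, cinematic: bool) -> Model:
    """Hypothetical helper: map a clip's needs to one of the three models.

    has_prompt  -- you are willing to write a text description of the sound
    pure_impact -- the shot is mainly motion-coupled impacts (footsteps, collisions)
    cinematic   -- you want rich, layered, high-fidelity foley
    """
    # No prompt: TARO is the only model described as working well without text.
    if not has_prompt:
        return Model.TARO
    # Pure impact shots: both other models list these under "Avoid for".
    if pure_impact:
        return Model.TARO
    # Prompt plus cinematic intent: HunyuanFoley is described as the fit.
    if cinematic:
        return Model.HUNYUAN_FOLEY
    # Otherwise: mixed scenes with prompt-based semantic control.
    return Model.MMAUDIO


if __name__ == "__main__":
    # A busy street where the prompt should emphasize rain over traffic.
    print(suggest_model(has_prompt=True, pure_impact=False, cinematic=False).value)
```

Treat the result as a starting point only; the descriptions overlap in places, so trying a second model on the same clip is often worthwhile.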

Generate audio to see waveform.
