Generate Audio for Video

Choose a model and upload a video to generate synchronized audio.

Model Best for Avoid for
TARO Natural, physics-driven impacts — footsteps, collisions, water, wind, crackling fire. Excels when the sound is tightly coupled to visible motion without needing a text description. Dialogue, music, or complex layered soundscapes where semantic context matters.
MMAudio Mixed scenes where you want both visual grounding and semantic control via a text prompt — e.g. a busy street scene where you want to emphasize the rain rather than the traffic. Great for ambient textures and nuanced sound design. Pure impact/foley shots where TARO's motion-coupling would be sharper, or cinematic music beds.
HunyuanFoley Cinematic foley requiring high fidelity and explicit creative direction — dramatic SFX, layered environmental design, or any scene where you have a clear written description of the desired sound palette. Quick one-shot clips where you don't want to write a prompt, or raw impact sounds where timing precision matters more than richness.
1 15
10 50
Sampling Mode
0 8
1 8