How to Write AI Video Prompts: Motion, Camera Movement, Duration, and Stability

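AI video prompts must specify three things that image prompts do not: motion (what moves), camera movement (how the camera travels), and duration (how many seconds). This tutorial gives copy-ready skeletons for 4 common shot types, plus a comparison of mainstream models such as Runway, Pika, Kling, and Seedance.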

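In this article

Why video is harder to write than images
Skeletons for 4 common shot types
Video prompt structure
Bad example vs good example
5 real samples
6 places where video most often goes wrong
Mainstream video model comparison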

Why video is harder to write than images

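An image only needs one still frame; a video has to evolve that frame along the timeline, with every frame staying physically plausible. Beginners tend to fail in three ways. First, the prompt is too abstract: the model cannot tell whether the subject or the camera is moving, and the whole shot drifts. Second, the prompt asks for too much: several actions are stacked into 5 seconds, and the model either picks one or fails at all of them. Third, the prompt omits stability: the model then adds its own push-in, pull-out, or pan, ruining the static shot you wanted.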

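So a video prompt must state explicitly: the subject's action (slow and specific), the camera movement (static, push in, pull out, lateral track, or tilt), the duration (3, 5, or 8 seconds), and stability (hold static / no shake). Leave out any one of the four and the result becomes unpredictable.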
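
As a rough illustration of that checklist, here is a minimal Python sketch that flags which elements a prompt appears to be missing. The function name `missing_elements` and the keyword patterns are assumptions made for this example, not an official vocabulary of any model, and the subject's action is not checked because it cannot be detected reliably by keyword.

```python
import re

# Illustrative keyword patterns (assumptions for this sketch, not a model's vocabulary).
CAMERA = re.compile(r"\b(holds? (completely )?static|dolly|tracks?|pans?|tilts?)\b")
DURATION = re.compile(r"\b\d+[- ]second\b|\bover \d+ seconds\b")
STABILITY = re.compile(
    r"\b(no (camera )?shake|no jitter|steady|holds? (completely )?static)\b"
)


def missing_elements(prompt: str) -> list[str]:
    """Return which of camera / duration / stability the prompt does not mention."""
    text = prompt.lower()
    missing = []
    if not CAMERA.search(text):
        missing.append("camera")
    if not DURATION.search(text):
        missing.append("duration")
    if not STABILITY.search(text):
        missing.append("stability")
    return missing
```

A keyword check like this only catches omissions; it cannot tell whether the wording is specific enough, so it is best treated as a reminder before generating.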

Skeletons for 4 common shot types

Static atmosphere shot (most stable)

[scene + subject] + camera holds static + gentle [micro motion: drift, sway, ripple] + [time of day + light] + [no shake]

Example: a single raindrop falling on a glass window at night, camera holds static, neon lights blurred in background, gentle vertical impact ripple, no camera shake. This shot type is the most stable and has the highest success rate at 3 to 5 seconds.

Slow push-in (dolly in)

[scene] + slow steady dolly in toward [subject] + [motion direction] + over 5 seconds + cinematic pacing + no jitter

The key words are "slow", "steady", and "over X seconds". Without a stated time, the model falls back on its own default push-in speed, which is usually too fast.

Lateral tracking shot (tracking shot)

[subject walking/moving] + camera tracks horizontally to the right at the same pace + [environment scrolling past] + medium shot + steady gimbal feel

The most common tracking failure is a subject that outpaces the camera, so write "at the same pace" explicitly. "steady gimbal feel" keeps the frame from shaking.

Subject action close-up (action close-up)

close-up of [subject performing single action] + [single specific motion verb: pours, lifts, turns] + slow motion 120fps look + shallow depth of field + camera holds static

An action close-up should contain exactly one action (one verb). "pours and stirs and lifts" overloads the model; allow only 1 core action per clip.
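
The four skeletons above can be kept as reusable templates. The sketch below is one possible way to do that in Python; the slot names (`scene`, `subject`, `micro_motion`, and so on) and the helper `fill` are hypothetical labels chosen for this example, and the commas stand in for the "+" separators of the skeletons.

```python
# The four skeletons from this section, with bracketed slots turned into named fields.
# Slot names are hypothetical labels chosen for this sketch.
SKELETONS = {
    "static": (
        "{scene}, camera holds static, gentle {micro_motion}, {light}, no camera shake"
    ),
    "dolly_in": (
        "{scene}, slow steady dolly in toward {subject}, {motion}, "
        "over {seconds} seconds, cinematic pacing, no jitter"
    ),
    "tracking": (
        "{subject}, camera tracks horizontally to the right at the same pace, "
        "{environment}, medium shot, steady gimbal feel"
    ),
    "action_closeup": (
        "close-up of {action}, slow motion 120fps look, "
        "shallow depth of field, camera holds static"
    ),
}


def fill(kind: str, **slots: object) -> str:
    """Fill one skeleton; str.format raises KeyError if a required slot is missing."""
    return SKELETONS[kind].format(**slots)
```

For example, `fill("static", scene="a rain-streaked window at night", micro_motion="ripple", light="neon light")` yields a complete static-shot prompt, while forgetting a slot fails loudly instead of producing a half-filled prompt.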

Video prompt structure

Subject action: a barista slowly pours espresso into a glass cup

Camera movement: camera holds static / slow dolly in / tracks left

Duration: over 5 seconds / 8-second clip

Speed: slow motion 120fps look / real time

Stability: no camera shake / steady gimbal feel

Light / scene: warm window light · cafe interior

Style: cinematic, shallow depth of field
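
The seven slots in this structure map naturally onto a small data type. The following is a minimal sketch, assuming a hypothetical `VideoPrompt` class whose field names mirror the labels above; it simply joins the non-empty slots in the listed order.

```python
from dataclasses import dataclass


@dataclass
class VideoPrompt:
    """One field per slot in the structure above (names are hypothetical)."""

    action: str
    camera: str
    duration: str
    speed: str
    stability: str
    scene: str
    style: str

    def render(self) -> str:
        # Join the slots in the listed order, skipping any left blank.
        parts = [
            self.action, self.camera, self.duration, self.speed,
            self.stability, self.scene, self.style,
        ]
        return ", ".join(p.strip() for p in parts if p.strip())
```

Keeping each slot separate makes it easy to vary one element, such as swapping the camera movement, while holding everything else fixed between attempts.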

Bad example vs good example

✗ Bad example

a beautiful cinematic video of a girl walking in a forest, magical, dreamy, stunning, amazing 4k

No camera movement, no duration, no stability, and the action is a vague "walking". The model improvises, and 10 runs give 10 different results.

✓ Good example

a young woman in a wool coat walks slowly forward through a misty pine forest, camera tracks horizontally to the right at the same walking pace, 5-second shot, soft morning backlight, cinematic shallow depth of field, steady gimbal feel, no camera shake

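The subject's walking speed (slowly), camera movement (tracks horizontally), tracking sync (same pace), duration (5-second), light (backlight), and stability (no shake) are all present. Across ten runs, the results stay highly consistent in style.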

5 real samples

Sample 1 · Rainy-night static shot (Runway Gen-3 / Seedance)

a single raindrop slides down a foggy window at night, camera holds static, neon city lights blurred in the background, slow motion 120fps look, gentle vertical motion only, shallow depth of field, no camera shake, 5-second clip

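The best starting shot for beginners: a static camera plus a single micro-motion, which almost any model can produce reliably.

Sample 2 · Coffee pour (Kling / Pika)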

close-up of a barista's hands slowly pouring espresso into a glass cup, warm cafe interior blurred behind, single action of pouring only, camera holds static, soft side light from the right, real-time pacing, 4-second clip

"single action of pouring only" stops the model from adding extra actions of its own, such as stirring, lifting, or setting the cup down.

Sample 3 · Street tracking shot (Runway Gen-3)

a young woman in a black trench coat walks forward through a rain-soaked Tokyo alley at night, camera tracks horizontally to the right at the same walking pace, neon reflections on wet ground, shallow depth of field, steady gimbal feel, 6-second cinematic shot

Tracking sync is the key. "at the same walking pace" keeps the camera from moving faster or slower than the person.

Sample 4 · Food macro (Seedance / Hailuo)

extreme close-up of melted chocolate slowly dripping onto a glossy croissant, camera holds completely static, single dripping motion, warm side light, shallow depth of field, slow motion 120fps look, 3-second clip

Food video depends on a single action (dripping); slow motion magnifies the texture detail.

Sample 5 · Slow push-in landscape (Kling / Sora)

misty mountain valley at sunrise, slow steady drone dolly forward over the treetops, sunlight breaking through clouds, very gentle pacing over 8 seconds, cinematic wide shot, no jitter, smooth motion

"slow steady drone dolly forward" gives the model an explicit shot type and direction; "over 8 seconds" controls the push-in speed.

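6 places where video most often goes wrong

Pitfall 1: Action verbs that are too abstract

"moving", "walking", and "interacting" are all vague. Replace them with specific verbs: "slowly pours", "lifts the cup", "turns the head to the left".

Pitfall 2: Several actions in one prompt

"she walks in, sits down, picks up the cup, drinks" packs 4 actions into 5 seconds and is bound to fail. One shot gets 1 core action.

Pitfall 3: No duration or speed

Without a duration, the model follows its own default pacing. State "3-second / 5-second / 8-second clip" plus "slowly / steady / real-time".

Pitfall 4: Forgetting stability

Without "no shake / steady", most models default to slight handheld shake, which ruins a static atmosphere shot.

Pitfall 5: Subject and camera both moving

Fast subject motion combined with large camera movement almost always breaks the model. Lock one and let the other move.

Pitfall 6: Reusing an image prompt to generate video

"masterpiece, best quality, 8k" and similar image-quality tags have almost no effect on video and only take up attention. For video, "cinematic, shallow depth of field" is enough.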

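Mainstream video model comparison

Model	Typical duration	Best for	Notes
Runway Gen-3	5-10 s	Cinematic look, character shots, tracking	Good action continuity; sensitive to camera-movement terms
Pika 2.x	3-5 s	Short atmosphere pieces, concept videos	Needs a clear, brief action description
Kling 2.x	5-10 s	Character performance, product ads	Handles Chinese well; Chinese prompts can be used directly
Seedance 2.0	5-8 s	Widescreen cinematic look, camera movement	See this site's dedicated Seedance page
Sora (limited access)	10-60 s	Long-take narrative	Prompts can be written closer to natural language
Hailuo / MiniMax	5-6 s	Real people and landscapes	Handles long Chinese sentences well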
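
To use the table programmatically, the typical durations can be stored as plain data. This is a small sketch that copies the ranges listed above (they are this article's figures, not limits published by the vendors); the name `candidates` is hypothetical.

```python
# Typical clip lengths in seconds, copied from the comparison table above.
TYPICAL_SECONDS = {
    "Runway Gen-3": (5, 10),
    "Pika 2.x": (3, 5),
    "Kling 2.x": (5, 10),
    "Seedance 2.0": (5, 8),
    "Sora": (10, 60),
    "Hailuo / MiniMax": (5, 6),
}


def candidates(seconds: int) -> list[str]:
    """List the models whose typical range covers the requested clip length."""
    return [
        model
        for model, (low, high) in TYPICAL_SECONDS.items()
        if low <= seconds <= high
    ]
```

Calling `candidates(8)` narrows the choice to the models whose typical range includes an 8-second clip, which is a quick sanity check before writing "8-second clip" into a prompt.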

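Practical advice: video generation succeeds less often than still-image generation, and most prompts need 3 to 5 attempts. Keeping the prompt short, the action single, and the stability explicit works far better than piling on adjectives.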
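
Two of the pitfalls described earlier, image-only quality tags and multiple stacked actions, lend themselves to a simple automated check. The sketch below is a heuristic only: the function `lint` and both word lists are assumptions made for this example, and the verb list would need extending for real use.

```python
import re

# Illustrative word lists (assumptions for this sketch); extend them for real use.
IMAGE_QUALITY_TAGS = ("masterpiece", "best quality", "8k")
ACTION_VERBS = ("walks", "sits", "picks", "drinks", "pours", "stirs", "lifts", "turns")


def lint(prompt: str) -> list[str]:
    """Return warnings for image-only quality tags and for more than one action verb."""
    text = prompt.lower()
    warnings = []
    tags = [tag for tag in IMAGE_QUALITY_TAGS if tag in text]
    if tags:
        warnings.append("image-only quality tags: " + ", ".join(tags))
    verbs = [verb for verb in ACTION_VERBS if re.search(rf"\b{verb}\b", text)]
    if len(verbs) > 1:
        warnings.append("multiple actions: " + ", ".join(verbs))
    return warnings
```

An empty list does not mean the prompt is good, only that neither of these two patterns was found.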

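FAQ

How long can an AI video be?

Mainstream commercial models are currently most stable at 3 to 10 seconds; Sora and Kling can reach 1 minute in specific modes. Prompt adherence weakens as duration grows, so longer clips are harder to control precisely.

Do video prompts need negative prompts?

Most video models have weak support for negative prompts. The common practice is to write "no camera shake" and "single action only" directly in the positive prompt, which works better than a separate negative-prompt box.

How do I keep a character consistent across multiple clips?

The most reliable approach today is to drive every clip from the same reference image (image-to-video) and keep the seed. With text alone, character consistency across multiple shots is almost impossible to hold.

Can I write video prompts in Chinese?

Kling and Hailuo support Chinese well; for Runway, Pika, and Sora, English is recommended. This site's editor supports bilingual Chinese/English drafts.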
