

Fudan and Tencent Propose Baton: A First-of-Its-Kind Semantic Blueprint for Precisely Synchronized Audio-Visual Logic




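Given a simple prompt, current audio-video generation models can usually produce audiovisual content of decent quality. Once the prompt becomes complex, however, problems begin to surface.

For example, a user might ask for a scene in which a boy first finishes a dribbling drill and then begins to speak; or two people say different things in turn while interacting; or a sound emerges and gradually intensifies only after a certain action has occurred.

Instructions like these, with multi-stage actions, complex interactions between characters, and explicit temporal relationships, require the model not only to understand "what happens" but also to reason accurately about "when it happens," "who does it," and "how the sound should correspond to the picture."

Unfortunately, most open-source audio-video generation models still perform poorly on complex scenes that require long-range semantic understanding. Their outputs often show actions misaligned with sounds, dialogue attributed to the wrong character in multi-character scenes, and audio and video rhythms that drift out of sync.

The root cause is that most existing methods encode the text prompt into a single global semantic vector and use it as the conditioning signal for both video and audio generation. This provides scene-level semantic guidance, but it cannot break down the temporal relationships among complex events, nor can it specify how different characters, actions, and sounds should correspond.

The research community has explored this problem extensively. Ovi pioneered a native joint video-audio generation framework, using a dual-branch DiT architecture to model visual and auditory signals simultaneously; LTX-2.3 further increased model scale and training-data quality; MOVA and related work focus on stronger training strategies and cross-modal coordination mechanisms. Meanwhile, with the progress of multimodal large language models, some studies have begun to bring in models such as Qwen3, Qwen3-VL, and Qwen3-Omni, hoping to use their stronger semantic understanding and reasoning to expand, rewrite, or enhance user prompts and thus give the generator richer conditioning information.

Most of these methods, however, still follow the same paradigm: compress the complex prompt into one unified global semantic representation, then inject it as a condition into the video and audio diffusion processes.

Such a design can tell the model "what the scene contains," but it struggles to describe "which events should happen first, which should happen later, and how the corresponding visual and audio signals should stay consistent along the timeline." Without an explicit cross-modal semantic planning mechanism, video and audio can only be generated independently from the same vague condition, and in complex scenes their generation trajectories gradually diverge or even conflict.

As a result, when faced with multi-stage action chains, complex interactions between characters, long-range causal relationships, or multi-speaker dialogue, existing methods still struggle to generate audiovisual content that is stable, high-fidelity, and tightly synchronized.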



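To address this, a team from Fudan University and Tencent Hunyuan proposed Baton. Unlike existing methods that drive diffusion generation directly with global text features, Baton's core idea is to decouple semantic reasoning from content generation: the model first builds a semantic blueprint shared across modalities, and then generates video and audio in sync according to that blueprint.

Under this framework, video and audio no longer interpret the user prompt independently. Instead, they share one intermediate planning result that contains events, characters, temporal relationships, and cross-modal correspondences. With this explicit planning step, the model can reason through complex semantic relationships before generation begins, giving the subsequent diffusion process a stable and consistent guidance signal.

To achieve this, Baton introduces two key techniques, VA-Planner and Relative Semantic RoPE. VA-Planner generates the cross-modal semantic blueprint, while Relative Semantic RoPE maps the planning information in the blueprint accurately into the diffusion model's generation space, enabling fine-grained coordinated generation of video and audio.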



  • Paper title: Baton: Explicit Semantic Blueprints for Joint Video-Audio Generation
  • Paper: https://arxiv.org/pdf/2605.25195
  • Project page: https://francis-rings.github.io/Baton/

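Method Overview

As shown in the figure below, Baton explicitly decouples two stages, semantic reasoning and content generation, and builds a modality-aware semantic blueprint mechanism that coordinates the diffusion denoising of video and audio.

Specifically, the user's text prompt is first fed to a multimodal large language model (MLLM) for semantic reasoning, which predicts a pair of planned tokens, one set for the video modality and one for the audio modality. These planned tokens serve as the semantic blueprint shared across modalities and give the subsequent generation process explicit content planning and temporal guidance. To bring the blueprint into generation, the planned tokens are injected into the diffusion Transformer (DiT) through cross-attention. The DiT follows the dual-branch architecture of Ovi, with separate branches responsible for generating and denoising video and audio.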
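The injection step described above can be illustrated with a minimal NumPy sketch of cross-attention in which the diffusion latents are the queries and the planned tokens supply keys and values. This is not Baton's implementation: the single-head form, the residual update, the random projection matrices, and all sizes (`dim`, token counts) are assumptions chosen only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(latents, planned_tokens, w_q, w_k, w_v):
    """Single-head cross-attention: latents query the planned tokens."""
    q = latents @ w_q          # (n_latents, dim)
    k = planned_tokens @ w_k   # (n_planned, dim)
    v = planned_tokens @ w_v   # (n_planned, dim)
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    # Residual update: each latent adds a weighted mix of blueprint values.
    return latents + weights @ v, weights

rng = np.random.default_rng(0)
dim = 16  # hypothetical feature width

def random_proj():
    # Stand-in for learned projection weights.
    return rng.normal(scale=dim ** -0.5, size=(dim, dim))

# Hypothetical token counts; each branch attends to its own planned tokens.
video_latents = rng.normal(size=(12, dim))
audio_latents = rng.normal(size=(20, dim))
video_plan = rng.normal(size=(4, dim))
audio_plan = rng.normal(size=(6, dim))

video_out, video_w = cross_attention(
    video_latents, video_plan, random_proj(), random_proj(), random_proj())
audio_out, audio_w = cross_attention(
    audio_latents, audio_plan, random_proj(), random_proj(), random_proj())
```

Each row of `video_w` and `audio_w` is a distribution over the planned tokens, so every latent position reads from the blueprint at every call.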

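Notably, the planned tokens and the diffusion model's latents lie on different spatiotemporal grids, so their positions do not naturally correspond.

To solve this, Baton proposes Relative Semantic RoPE (RS-RoPE), which builds a unified relative positional encoding space to align planned tokens and diffusion latents precisely, so that the semantic blueprint can effectively guide the joint generation of video and audio.

As shown in the figure below, Baton splits semantic understanding and content generation into two independent stages and keeps video and audio coordinated through a cross-modal planning mechanism.

Specifically, the system first parses the user's text instruction in depth, and the multimodal large language model (MLLM) generates two dedicated sets of Planned Tokens, one for video and one for audio. This planning information can be viewed as a "generation blueprint" that specifies what the content should show and the order of events in time, guiding the generation steps that follow.

During generation, the Planned Tokens are injected into the diffusion Transformer (DiT) through cross-attention, so the generator can consult the blueprint at every denoising step. Baton keeps Ovi's dual-branch design: video and audio each have their own generation path but stay synchronized through the blueprint.

Because the blueprint and the diffusion latents are naturally mismatched in space and time, Baton introduces Relative Semantic RoPE (RS-RoPE) to ensure a precise correspondence. This mechanism establishes a unified relative positional encoding for the planning information and the latents, allowing the generator to map semantic guidance accurately onto each generation unit and thereby achieve tightly synchronized joint audio-video generation.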



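1. VA-Planner: Cross-Modal Semantic Planning

Rather than using a global text embedding directly, VA-Planner first uses a multimodal large language model to reason explicitly about the user prompt and generates Planned Tokens for the video and audio modalities. These tokens no longer represent only the overall scene; they also encode the semantics of local events, including what happens, where it happens, and when it happens.

Specifically, Baton places the video planning region and the audio planning region in the same autoregressive reasoning sequence and uses the MLLM to predict the corresponding semantic representations step by step.

Because the video and audio plans share the same context and sit in one unified reasoning process, the model can establish cross-modal associations before the generation stage. The resulting Planned Tokens can be regarded as a semantic blueprint shared across modalities, providing unified, fine-grained planning information for the subsequent video and audio generation.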
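The shared autoregressive sequence can be pictured with a small causal-mask sketch. It assumes the layout `[prompt | video plan | audio plan]` with the video region first; the region sizes and the helper name `planning_mask` are hypothetical.

```python
import numpy as np

def planning_mask(n_prompt, n_video, n_audio):
    """Causal mask over one sequence laid out as [prompt | video plan | audio plan].

    mask[i, j] is True when position i may attend to position j.
    """
    n = n_prompt + n_video + n_audio
    mask = np.tril(np.ones((n, n), dtype=bool))
    regions = {
        "prompt": slice(0, n_prompt),
        "video": slice(n_prompt, n_prompt + n_video),
        "audio": slice(n_prompt + n_video, n),
    }
    return mask, regions

mask, regions = planning_mask(n_prompt=3, n_video=2, n_audio=2)

# Audio planning positions come later, so they can read the video plan...
audio_sees_video = bool(mask[regions["audio"], regions["video"]].all())
# ...but video planning positions cannot read the audio plan.
video_sees_audio = bool(mask[regions["video"], regions["audio"]].any())
```

Under this layout both planning regions see the full prompt, and the audio region additionally conditions on the video region, which is one way a single reasoning pass can tie the two plans together.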

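2. Dual Semantic Alignment Towers: Building a Semantic Blueprint Shared by Video and Audio

Although VA-Planner can already produce semantic plans for video and audio, these representations still live in the MLLM's language space, which differs markedly from the visual and audio feature spaces that the diffusion model actually uses.

Baton therefore adds Dual Semantic Alignment Towers, which convert the planning results into perceptual semantic representations that the generator can interpret more easily.

Specifically, Baton builds a video tower and an audio tower and uses SigLip2 and WavTokenizer as the perceptual supervision targets for the respective modalities. Each tower contains a set of learnable queries that extract the most important semantic information from the video and audio planning representations.

More importantly, the two towers use bidirectional cross-modal attention. Because the MLLM's autoregressive structure is inherently one-directional, the video plan cannot directly perceive information from the audio plan. To fix this, the video tower absorbs audio information while extracting visual semantics, and the audio tower likewise brings in visual information, giving a two-way semantic exchange. The resulting video and audio Planned Tokens are no longer two independent plans but one unified blueprint that shares a timeline and semantic structure.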
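A rough sketch of the two-way exchange: each tower's learnable queries attend over its own planning states concatenated with the other modality's states. The concatenation, the single attention layer without projections, and all sizes are assumptions for illustration; the source does not specify the towers' internals.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def alignment_tower(queries, own_states, other_states):
    """Learnable queries read from both modalities' planning states."""
    context = np.concatenate([own_states, other_states], axis=0)
    weights = softmax(queries @ context.T / np.sqrt(queries.shape[-1]))
    return weights @ context

rng = np.random.default_rng(0)
dim = 8                                    # hypothetical width
video_states = rng.normal(size=(5, dim))   # planner states, video region
audio_states = rng.normal(size=(7, dim))   # planner states, audio region
video_queries = rng.normal(size=(4, dim))  # stand-ins for learned queries
audio_queries = rng.normal(size=(6, dim))

video_tokens = alignment_tower(video_queries, video_states, audio_states)
audio_tokens = alignment_tower(audio_queries, audio_states, video_states)

# Changing only the audio states changes the video tokens, which is the
# information path the causal planner alone does not provide.
video_tokens_alt = alignment_tower(video_queries, video_states, audio_states + 1.0)
video_depends_on_audio = not np.allclose(video_tokens, video_tokens_alt)
```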

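To further establish cross-modal temporal correspondence, Baton also introduces Timestamp-based RoPE, which maps video keyframes and audio segments into a unified time coordinate system so that the model can accurately understand how events in different modalities correspond in time. For implementation details and the full derivations, see the original paper.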
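The idea of a shared time axis can be sketched with a standard rotary embedding whose positions are timestamps in seconds rather than token indices. The keyframe rate, segment length, and frequency base below are hypothetical, and the paper's exact formulation may differ.

```python
import numpy as np

def rope_rotate(x, positions, base=10000.0):
    """Rotate feature pairs of x by angles proportional to `positions`."""
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)     # (half,)
    angles = positions[:, None] * freqs[None, :]  # (n, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

# Hypothetical sampling: 2 video keyframes per second, 0.25 s audio segments.
video_t = np.arange(4) / 2.0   # 0.0, 0.5, 1.0, 1.5 seconds
audio_t = np.arange(8) * 0.25  # 0.0, 0.25, ..., 1.75 seconds

rng = np.random.default_rng(0)
q = rng.normal(size=(1, 8))    # one video-side query vector
k = rng.normal(size=(1, 8))    # one audio-side key vector

def score(t_query, t_key):
    rq = rope_rotate(q, np.array([t_query]))
    rk = rope_rotate(k, np.array([t_key]))
    return float((rq @ rk.T)[0, 0])

# Both pairs are 0.25 s apart, so their attention scores match even though
# the token indices on the two grids differ.
early = score(video_t[1], audio_t[1])  # 0.50 s vs 0.25 s
late = score(video_t[3], audio_t[5])   # 1.50 s vs 1.25 s
```

Because the rotation depends only on the time offset between query and key, a video keyframe and an audio segment that occur at the same moment are treated as co-located regardless of each modality's sampling rate.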

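3. RS-RoPE: Grounding the Semantic Blueprint in the Generation Process

At generation time, the Planned Tokens and the diffusion model's latents lie on different spatiotemporal grids. The former describe key events and semantic structure, while the latter are the concrete representations of video and audio during diffusion, and there is no natural one-to-one correspondence between the two. If they interact through plain cross-attention, the model has difficulty deciding which part of the semantic plan a given latent should attend to.

To solve this, Baton proposes Relative Semantic RoPE (RS-RoPE). Unlike conventional positional encodings that describe only a token's absolute position, RS-RoPE builds a unified relative semantic coordinate system that maps Planned Tokens and diffusion latents into the same reference space.

With this mechanism, the diffusion model can locate the planning information most relevant to the current spatiotemporal position during denoising, so the semantic blueprint takes part in every generation step.

In other words, RS-RoPE acts as a precise "navigation system" between the blueprint and diffusion generation, ensuring that video and audio evolve together along the pre-planned semantic path.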
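One simple way to picture a shared reference space is to place the cells of two grids of different lengths on a common normalized axis. The sketch below covers only the temporal axis, and the normalization to cell centers is an assumption, not the paper's definition of RS-RoPE; such coordinates could then serve as positions for a rotary encoding.

```python
import numpy as np

def shared_coords(n_cells, span=1.0):
    """Centers of n_cells equal cells laid over the interval [0, span]."""
    return (np.arange(n_cells) + 0.5) / n_cells * span

# Hypothetical sizes: 4 planned tokens and 16 latent frames for one clip.
plan_t = shared_coords(4)
latent_t = shared_coords(16)

# On the shared axis, each latent frame has a well-defined closest plan slot.
nearest_plan = np.abs(latent_t[:, None] - plan_t[None, :]).argmin(axis=1)
```

With raw indices, latent frame 12 and plan token 3 would look far apart; on the shared axis they fall in the same final quarter of the clip.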

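Training Strategy

Baton is trained in three stages:

1. VA-Planner pre-training

In the first stage, the model learns to turn user prompts into cross-modal semantic plans (Planned Tokens). With real video and audio data as supervision, VA-Planner learns to produce continuous features that reflect visual and auditory perceptual structure rather than relying on natural-language embeddings alone, and thus captures richer semantic information.

2. DiT adaptation training

The second stage lets the diffusion model (DiT) learn the distribution of these semantic features. Here the DiT is trained with ground-truth features as conditions, so it can learn the regularities of video and audio generation without being disturbed by VA-Planner's prediction errors.

3. Joint fine-tuning

Finally, VA-Planner and the DiT are combined into the complete system. VA-Planner's parameters are frozen, and the DiT is trained with the planner's predicted Planned Tokens as input. This step closes the gap between ideal features and actual predictions and mitigates exposure bias, making generation more stable and robust.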
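The three stages can be summarized as a small schedule that records which module is updated and what the DiT is conditioned on. The module names and condition labels are hypothetical, and two details are assumptions not stated in the text: that the DiT is not trained in stage 1 and that the planner is not updated in stage 2.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Stage:
    name: str
    trainable: tuple    # modules whose parameters are updated
    dit_condition: str  # what the DiT is conditioned on ("n/a" if unused)

STAGES = (
    # Stage 1: planner learns to predict perceptual features from prompts.
    Stage("planner_pretraining", ("va_planner",), "n/a"),
    # Stage 2: DiT is conditioned on ground-truth features, not predictions.
    Stage("dit_adaptation", ("dit",), "ground_truth_features"),
    # Stage 3: planner is frozen; DiT adapts to the planner's predictions.
    Stage("joint_finetuning", ("dit",), "predicted_planned_tokens"),
)

def is_frozen(stage, module):
    """A module is frozen in a stage when it is not listed as trainable."""
    return module not in stage.trainable

frozen_in_final_stage = [m for m in ("va_planner", "dit") if is_frozen(STAGES[2], m)]
```

The switch in `dit_condition` between stages 2 and 3 is the point of the last stage: the DiT first sees clean conditions and only afterwards the imperfect ones it will receive at inference time.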

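Experiments

For the quantitative comparison, Baton is evaluated against earlier open-source models on Verse-Bench and Sem100. Verse-Bench is an open-source test set for audio-visually consistent generation, while Sem100 is an internally collected set of 100 test video samples. Compared with earlier open-source test sets, Sem100's text prompts are more complex: they describe a character's repeated, consecutive interactions with the surrounding environment, complex interactions among multiple people, and compound actions made up of several consecutive, specifically prescribed steps. The results are shown in the table below:



The evaluation metrics cover video quality (AQ, IQ, DD, ID), audio quality (PQ, CU), audio-video synchronization (Sync-C, Sync-D, DeSync), and prompt-following ability (P-Acc).

The results show that on Verse-Bench, which consists mainly of simple scenes, Baton performs close to LTX-2, the current leading open-source model; on the more challenging Sem100, Baton shows a clear advantage.

Compared with LTX-2, Baton improves prompt-following accuracy (P-Acc) by 32%, multi-speaker word error rate (M-WER) by 76%, and the audio-video desynchronization metric (DeSync) by 30%.

The M-WER gain is especially notable. Multi-speaker scenes require the model not only to understand what is said but also to determine accurately "who said what, and when." This ability depends on the fine-grained temporal semantic planning that Baton builds, whereas a conventional global text embedding can hardly supply such time-aligned information. This further supports the importance of explicit semantic planning for generation from complex instructions.

The team also compared Baton with several closed-source commercial models. Although Baton still trails top commercial systems in visual quality and audio aesthetics, it is already competitive in following complex instructions.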



Generated Results



Video link: https://mp.weixin.qq.com/s/UobG7nWamiWMt45L62tCWA

Video Prompt:On a vast barren beach under a pale overcast sky with haze obscuring the flat horizon, a young man with dark messy hair lies face down on the sand, wearing a thick brown hooded wool coat, sand clinging to his clothes and skin. He props himself on his elbows, looking forward. In the far background, a column of sand explodes upward among silhouetted soldiers. The young man flinches in terror and clutches his head while successive blasts draw closer, sending towering columns of sand into the air. A harsh, aggressive soundscape with deep rumbles and piercing screeches builds as the bombardment rapidly approaches his position. Close-up, low angle, rule-of-thirds composition. Natural diffused overcast daylight with soft shadows. Desaturated, monochromatic palette of beige, tan, and muted olive green with a cool color temperature. Gritty realism, tense tone.

Audio Prompt:On a windswept open beach, continuous artillery explosions rumble and crash, growing progressively louder and closer. The blasts intensify from distant muffled thuds into deafening concussive roars, each one shaking the ground harder than the last. Sand and debris scatter and rain down with increasing violence. Beneath the relentless bombardment, rapid shallow breathing and a choked gasp of terror from a soldier [Speaker A] are barely audible.




Video Prompt:In a indoor martial arts gym with yellow padded bars along the wall, and cool fluorescent overhead lighting, two bald men of Middle Eastern descent stand facing each other. The first, with a short beard, wears a black chest protector over a white t-shirt and remains stationary. The second, with a darker skin tone in a dark polo shirt, stands to his right as the instructor. The instructor delivers a quick punch to the first man's upper body. The camera shifts focus to the first man, who absorbs the hit. The instructor asks him a brief question; the first man nods and responds. The instructor then resumes speaking, using hand gestures to continue his explanation as the instructor looks toward the camera. Medium shot, eye-level, rule-of-thirds composition. Cool-toned overhead lighting with high contrast between brightly lit faces and deep surrounding shadows. Neutral palette of black, grey, and white accented by yellow padded bars and cool blue ambient light. Observational documentary style, focused tone.

Audio Prompt:In a gym with faint ambient echo and a low-level room tone, a mature man [Speaker A] speaks in a steady, instructional tone: "Think about the idea of short distance power. If someone like this suddenly tries to headbutt me, one, I can hit him." A sharp, percussive thud of a fist striking a padded chest protector rings out immediately after. A brief pause, then [Speaker A] asks in a calm, slightly concerned tone: "You okay?" Another man [Speaker B] replies affirmatively: "Yeah." [Speaker A] continues in the same steady, instructional tone: "And he's not headbutting."




Video Prompt:At dusk in a desolate clearing beside a rustic log cabin with a thatched roof, shuttered windows, and barrels leaning against its side, under bare trees and a dim overcast sky, a bearded white man with short dark hair, wearing a loose brown shirt and matching pants, squats before a small crackling campfire within a stone ring on dry grass. To his right stands a slender white teenage boy with curly dark hair in an off-white Henley shirt and light trousers, holding a crumpled grey cloth. The man rises from his squat, reaches out with both hands, and takes the cloth from the boy. He turns toward the fire, steps forward, and bends down to drape the cloth over the burning logs. Smoke rises. The boy stands still, watching intently. The man then picks up a long wooden poker from the ground and pokes the smoldering bundle beneath the cloth.

Audio Prompt:A quiet outdoor dusk atmosphere with faint wind rustling dry grass. A small campfire crackles and pops within a stone ring. The soft rustle of cloth being handled. The crackling intensifies briefly as the cloth is draped over the fire, then dulls into a low, muffled smolder. A wooden poker scrapes against stone and prods the embers, stirring hissing smoke.




Video Prompt:In a dimly lit interior, a close-up shows hands using a knife and fork to slice through a medium-rare steak on a white square plate. After cutting off a piece, the camera tilts upward to reveal a Caucasian woman with long wavy reddish-blonde hair, blue eyes, and fair skin, wearing dark clothing over a collared shirt. She lifts the piece of steak with the fork, brings it to her mouth, and chews slowly. After swallowing, her gaze lowers briefly, then she raises her head and stares intensely forward. Cinematic realism.

Audio Prompt:A knife sawing through steak with a soft, wet slicing sound against the plate. A fork scrapes briefly. Quiet, slow chewing follows. After a pause, a single melancholic piano note rings out.




Video Prompt:Inside an old car, a girl wearing a grey-white t-shirt first looks down, then smiles slightly while steering along a rural road. A small figurine sits on the dashboard. The camera then pans left to reveal a passenger wearing a colorful wrestling mask.

Audio Prompt:A dramatic orchestral score with sweeping strings. The music is layered with the sounds of a vehicle engine starting and revving. A dog barks repeatedly in the background, its voice echoing slightly as if in an open space. A boy [Speaker A] shouts: "Ah"




Video Prompt:On a sunny suburban backyard with green lawn and tall hedges, a woman in a ribbed sweater and black skirt rallies a shuttlecock with a boy across a badminton net. He jogs off-screen to fetch it; she turns toward camera, striding forward with arms raised in playful triumph. A second boy charges in from behind, tackling her into wrestling on the grass — she lifts him, spins him around, both laughing joyously. Handheld tracking shot follows the action, shifting from eye-level to low-angle during the lift.

Audio Prompt:A fast-paced electronic dance music track with a driving beat and synthesized melodies plays throughout the clip. A boy [Speaker A] shouts excitedly in a energetic voice: "Oh no! Ten points! I'm scared! She's the winner!" A girl [Speaker B] shouts back with equal excitement: "We're the winners!"




Video Prompt:On a residential street corner, a young Asian boy in bright blue shorts stands holding a brown Spalding basketball in one hand and a yellow-orange ball in the other. The camera slowly orbits from behind him onto a concrete patio beside a dense green hedge. He drops into a low stance and begins simultaneously dribbling both balls side by side, bouncing them in rhythm as he moves forward step by step along the road. Medium close-up widening to a tracking shot, eye-level shifting to slightly high-angle.

Audio Prompt:Set in an outdoor environment, a young boy [Speaker A] speaks in a clear, instructional tone: "This is two ball basketball drill." Immediately after he finishes speaking, the rhythmic, percussive sound of a basketball being dribbled on a hard surface begins and continues for the rest of the clip.




Video Prompt:A young Caucasian man stands at an outdoor shooting range, holding a scoped AR-15 rifle, he fires several shots at a nearby pine tree, then reloads.

Audio Prompt:In a quiet, open outdoor environment, a sharp gunshot rings out, followed by a male voice [Speaker A] saying "Ah" in a neutral tone. Immediately after, another gunshot is fired. After a brief pause, a mechanical click is heard, as if a weapon is being reloaded.




Video Prompt:On a sunlit outdoor asphalt basketball court, bordered by dense green trees under a clear blue sky, a young man with short brown hair wearing dark sunglasses, a grey baseball cap, a black t-shirt and black Nike athletic shorts. He picks up a red basketball. He stands, turns away from the camera, and walks along the baseline, dribbling the ball between his legs. As he nears the free-throw line he takes a jump shot; the ball arcs over the rim and drops through the net. Medium-to-tracking shot, eye-level, with leading room guiding the viewer toward the basket. Natural high-key golden-hour daylight casts soft shadows across the grey court. Observational tone capturing the action in real time.

Audio Prompt:Set in an outdoor environment with birdsong and faint rustling sounds, a young man [Speaker A] speaks in a calm, encouraging tone: "Easy peasy, baby." The sound of a ball being dribbled on a hard surface is heard, followed by a sharp impact as it hits a backboard or wall. The dribbling resumes, accompanied by the soft thud of the ball bouncing on the ground.

機器之心Pro (professional AI media), 2026-06-17 07:23:00
