[Paper Notes] 2019 - Partners in Crime: Utilizing Arousal-Valence Relationship for Continuous Prediction of Valence in Movies
- Paper Overview
- Paper Content
- Abstract
- 1 Introduction
- 2 Dataset and Features
- 3 Prediction Model
Paper Overview
Original paper: Partners in Crime: Utilizing Arousal-Valence Relationship for Continuous Prediction of Valence in Movies1
Using the arousal-valence relationship for continuous prediction of valence in movies.
What follows are only my notes from reading the paper. My knowledge is limited, so corrections of any errors are welcome.
Paper Content
Abstract
- Background: "The arousal-valence model is often used in characterizing human emotions."
- "Arousal is defined as the intensity of emotion, while valence is defined as the polarity of emotion."
- Application: "Continuous prediction of valence in entertainment media such as movies is important for applications such as ad placement and personalized recommendations."
- Problem: "While arousal can be effectively predicted using audio-visual information in movies, valence is reported to be more difficult to predict as it also involves understanding the semantics of the movie."
- Rationale: "In this paper, for improving valence prediction, we utilize the insight from psychology that valence and arousal are interrelated."
- Method: "We use Long Short Term Memory networks (LSTMs) to model the temporal context in movies using standard audio features as input."
- "We incorporate arousal-valence interdependence in two ways:"
    - as a joint loss function to optimize the prediction network;
    - as a geometric constraint simulating the distribution of arousal-valence observed in psychology literature.
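The two couplings can be sketched as extra loss terms. Below is a minimal sketch, assuming a network that outputs one (arousal, valence) pair per time step; the parabola coefficient `c` and weight `lam` are hypothetical choices for illustration, not values from the paper:

```python
# Sketch of a joint arousal-valence loss (assumed form, not the paper's exact one):
# joint MSE over both dimensions, plus a penalty for predictions that fall below
# an assumed parabolic envelope a >= c * v^2 - 1 seen in arousal-valence scatter plots.

def joint_av_loss(pred, target, c=2.0, lam=0.1):
    """pred/target: lists of (arousal, valence) pairs in [-1, 1]."""
    n = len(pred)
    # joint loss: mean squared error summed over arousal and valence
    mse = sum((pa - ta) ** 2 + (pv - tv) ** 2
              for (pa, pv), (ta, tv) in zip(pred, target)) / n
    # geometric constraint: penalize points lying below the parabola a = c*v^2 - 1
    geo = sum(max(0.0, (c * pv ** 2 - 1.0) - pa) ** 2 for pa, pv in pred) / n
    return mse + lam * geo

# A point inside the assumed parabolic envelope incurs no geometric penalty:
loss_in = joint_av_loss([(0.9, 0.0)], [(0.9, 0.0)])    # high arousal, neutral valence
loss_out = joint_av_loss([(-1.0, 1.0)], [(-1.0, 1.0)])  # low arousal, extreme valence
```

The geometric term only activates for predictions outside the envelope, which matches the idea of constraining outputs to the region where annotations are actually observed.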
- "Using a joint arousal-valence model, we predict continuous valence for a dataset containing Academy Award winning movies."
- Results: "We report a significant improvement over the state-of-the-art results, with an improved Pearson correlation of 0.69 between the annotation and prediction using the joint model, as compared to a baseline prediction of 0.49 using an independent valence model."
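Pearson correlation, the metric reported above, measures linear agreement between the annotated and the predicted valence series. A stdlib-only sketch:

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))   # perfectly linear, positive correlation
print(pearson([1, 2, 3, 4], [4, 3, 2, 1]))   # perfectly linear, negative correlation
```

A value of 0.69 between annotation and prediction therefore indicates a fairly strong linear agreement, compared with 0.49 for the independent valence baseline.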
1 Introduction
- Entertainment media such as movies evoke a range of emotions in viewers, and these emotions vary over time along two dimensions: intensity and polarity.
- Emotional changes are often tied to cinematographic techniques, e.g. music intensity, speech intensity, shot framing, composition, and character movements.
- Static factors such as color tones and ambient sound also influence the emotional polarity of a scene.
- Emotion prediction in movies has broad applications, for example:
    - ad placement: Yadati, K., Katti, H., Kankanhalli, M.: CAVVA: Computational affective video-in-video advertising. IEEE Transactions on Multimedia 16(1), 15–23 (2014)
    - content recommendation: Canini, L., Benini, S., Leonardi, R.: Affective recommendation of movies based on selected connotative features. IEEE Transactions on Circuits and Systems for Video Technology 23(4), 636–647 (2013)
    - content indexing: Zhang, S., Huang, Q., Jiang, S., Gao, W., Tian, Q.: Affective visualization and retrieval for music video. IEEE Transactions on Multimedia 12(6), 510–522 (2010)
- The paper maps movie-induced emotion into the arousal-valence space, where arousal represents emotional intensity and valence represents emotional polarity (positive, negative, or neutral). As shown in the figure below, the emotions evoked by different scenes map to corresponding positions in this 2D space and collectively exhibit a parabolic contour.
- The task is challenging because movies dynamically fuse auditory, visual, and textual (semantic) modalities. Related work includes:
    - kernel methods and deep learning to predict VA values for 30 short films: Baveye, Y., Chamaret, C., Dellandréa, E., Chen, L.: Affective video content analysis: A multidisciplinary insight. IEEE Transactions on Affective Computing (2017)
    - hand-crafted audio-visual features to predict VA values for 30-minute clips from 12 Academy Award winning movies: Malandrakis, N., Potamianos, A., Evangelopoulos, G., Zlatintsi, A.: A supervised approach to movie emotion tracking. In: ICASSP 2011, pp. 2376–2379. IEEE (2011)
    - a Mixture-of-Experts (MoE) model to improve the fusion of audio-visual models: Goyal, A., Kumar, N., Guha, T., Narayanan, S.S.: A multimodal mixture-of-experts model for dynamic emotion prediction in movies. In: ICASSP 2016, pp. 2822–2826. IEEE (2016)
    - Long Short Term Memory networks (LSTMs) to capture audio-visual context: Sivaprasad, S., Joshi, T., Agrawal, R., Pedanekar, N.: Multimodal continuous prediction of emotions in movies using long short-term memory networks. In: ICMR 2018, pp. 413–419. ACM (2018)
- Valence is observed to be harder to predict than arousal, because valence prediction requires more high-level semantic information. For example, a fight scene should carry a negative connotation, yet becomes positive if the protagonist wins; a bright garden scene should be positive, yet its dialogue may lean negative.
- All of the work above models valence and arousal separately. The following work instead proposes modeling them jointly, using an LSTM to predict VA values on 200 short videos of 5–30 s: Zhang, L., Zhang, J.: Synchronous prediction of arousal and valence using LSTM network for affective video content analysis. In: ICNC-FSKD 2017, pp. 727–732. IEEE (2017)
- This paper uses the COGNIMUSE dataset, since it regards emotion prediction in movies as necessary for practical applications. In this dataset the valence and arousal annotations are highly correlated (0.62), so the authors seek to exploit arousal information when predicting valence. If they can additionally exploit the insight from cognitive psychology that arousal-valence points usually lie within a parabolic region (figure (a) above), valence prediction may improve further.
2 Dataset and Features
- Dataset: COGNIMUSE, containing 30-minute clips from 12 Academy Award winning movies.
- Both valence and arousal annotations lie in [-1, 1] (valence = -1 is the most negative emotion and valence = +1 the most positive; arousal = -1 is the lowest emotional intensity and arousal = +1 the highest).
- Emotion labels are taken every 5 s, i.e. the annotations are downsampled so that each non-overlapping 5 s segment carries one label.
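The downsampling can be sketched as averaging the raw annotation stream inside each non-overlapping window; the 25 Hz annotation rate used here is an assumption for illustration, not a figure from the paper:

```python
def downsample(annotations, rate_hz=25, window_s=5):
    """Average an annotation stream into one label per non-overlapping window."""
    step = rate_hz * window_s          # samples per window (125 for 5 s at 25 Hz)
    return [sum(annotations[i:i + step]) / step
            for i in range(0, len(annotations) - step + 1, step)]

# 10 s of dummy valence annotations at the assumed 25 Hz rate
stream = [0.25] * 125 + [0.5] * 125
print(downsample(stream))              # -> [0.25, 0.5]
```

Any trailing samples that do not fill a complete window are dropped, which keeps every label aligned to a full 5 s segment.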
- Only audio features are used as model input, since prior work shows that audio features matter more for valence prediction.
- The extracted audio features include:
    - audio compressibility
    - harmonicity
    - Mel frequency spectral coefficients (MFCC), their derivatives, and statistics over them (e.g. min, max, mean)
    - chroma, its derivatives, and statistics over them (e.g. min, max, mean)
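The derivative-plus-statistics pattern for MFCC and chroma can be sketched as: take frame-to-frame differences as a simple derivative, then summarize each track over a segment with min/max/mean. The values below are dummies; real coefficients would come from an audio analysis library:

```python
def summarize(track):
    """min/max/mean statistics for one feature track over a segment."""
    return {"min": min(track), "max": max(track),
            "mean": sum(track) / len(track)}

def deltas(track):
    """First-order frame-to-frame differences (a simple 'derivative')."""
    return [b - a for a, b in zip(track, track[1:])]

mfcc_c0 = [1.0, 2.0, 4.0, 4.0]          # dummy frames of one MFCC coefficient
features = {**summarize(mfcc_c0),
            **{"d_" + k: v for k, v in summarize(deltas(mfcc_c0)).items()}}
print(features)
# -> {'min': 1.0, 'max': 4.0, 'mean': 2.75, 'd_min': 0.0, 'd_max': 2.0, 'd_mean': 1.0}
```

Applying this to every MFCC and chroma dimension yields one fixed-length feature vector per 5 s segment, matching the label granularity.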
- The input feature set is further reduced using the correlation-based feature selection proposed by Witten et al.
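A much-simplified sketch of correlation-based selection: rank features by absolute correlation with the target and keep the top k. Full CFS also penalizes correlation between the selected features themselves, so this is only illustrative:

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def select_features(features, target, k=1):
    """Keep the k feature names most correlated (in absolute value) with the target."""
    ranked = sorted(features, key=lambda name: -abs(pearson(features[name], target)))
    return ranked[:k]

feats = {"useful": [1, 2, 3, 4], "noise": [0, 5, 1, 4]}
print(select_features(feats, target=[2, 4, 6, 8], k=1))   # -> ['useful']
```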
3 Prediction Model
Joshi T, Sivaprasad S, Pedanekar N. Partners in Crime: Utilizing Arousal-Valence Relationship for Continuous Prediction of Valence in Movies[C]//AffCon@AAAI. 2019.