[Paper Notes] 2019 - Partners in Crime: Utilizing Arousal-Valence Relationship for Continuous Prediction of Valence in Movies
- Paper overview
- Paper content
- Abstract
- 1 Introduction
- 2 Dataset and features
- 3 Prediction model
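Paper overview
Original paper: Partners in Crime: Utilizing Arousal-Valence Relationship for Continuous Prediction of Valence in Movies [1]
In short: using the arousal-valence relationship to predict valence as a continuous value in movies.
The following are only my notes from reading the paper. My knowledge is limited, so corrections are welcome.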
Paper content
Abstract
- Background: The arousal-valence model is often used in characterizing human emotions.
- Arousal is defined as the intensity of emotion, while valence is defined as the polarity of emotion.
- Application: Continuous prediction of valence in entertainment media such as movies is important for applications such as ad placement and personalized recommendations.
- Problem: While arousal can be effectively predicted using audio-visual information in movies, valence is reported to be more difficult to predict as it also involves understanding the semantics of the movie.
- Basis: In this paper, for improving valence prediction, we utilize the insight from psychology that valence and arousal are interrelated.
- Method: We use Long Short Term Memory networks (LSTMs) to model the temporal context in movies using standard audio features as input.
- We incorporate arousal-valence interdependence in two ways:
  - as a joint loss function to optimize the prediction network;
  - as a geometric constraint simulating the distribution of arousal-valence observed in psychology literature.
- Using a joint arousal-valence model, we predict continuous valence for a dataset containing Academy Award winning movies.
- Result: We report a significant improvement over the state-of-the-art results, with an improved Pearson correlation of 0.69 between the annotation and prediction using the joint model, as compared to a baseline prediction of 0.49 using an independent valence model.
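To make the two forms of coupling named in the abstract more concrete, here is a minimal sketch of how a joint loss and a geometric constraint could be combined into one objective. These notes do not record the paper's actual formulas, so everything here is an assumption for illustration only: the function names, the weights `alpha` and `beta`, and the choice of a parabola `arousal = a * valence**2 + c` as the lower boundary of the plausible region.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def parabola_penalty(valence, arousal, a=1.0, c=-1.0):
    """Mean hinge penalty for points below the assumed parabola a*v^2 + c.

    The parabola and its parameters are hypothetical, not taken from the paper.
    """
    gaps = [max(0.0, a * v * v + c - ar) for v, ar in zip(valence, arousal)]
    return sum(gaps) / len(gaps)


def joint_loss(val_pred, aro_pred, val_true, aro_true, alpha=0.5, beta=0.1):
    """Weighted valence MSE + arousal MSE + geometric penalty (weights are assumptions)."""
    return (alpha * mse(val_pred, val_true)
            + (1.0 - alpha) * mse(aro_pred, aro_true)
            + beta * parabola_penalty(val_pred, aro_pred))
```

With this shape, an error in the arousal prediction also changes the gradient that reaches the valence output, which is one plausible reading of how arousal could help valence prediction.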
1 Introduction
- Entertainment media such as movies evoke a range of emotions in viewers, and these emotions vary over time along two dimensions: intensity and polarity.
- These changes in emotion are often tied to filmmaking technique, for example music intensity, speech intensity, shot framing, composition and character movements.
- Static factors such as color tones and ambient sound also affect the emotional polarity of a scene.
- Movie emotion prediction has a wide range of applications, for example:
  - Placing advertisements
    Yadati, K., Katti, H., Kankanhalli, M.: Cavva: Computational affective video-in-video advertising. IEEE Transactions on Multimedia 16(1), 15–23 (2014)
  - Content recommendation
    Canini, L., Benini, S., Leonardi, R.: Affective recommendation of movies based on selected connotative features. IEEE Transactions on Circuits and Systems for Video Technology 23(4), 636–647 (2013)
  - Content indexing
    Zhang, S., Huang, Q., Jiang, S., Gao, W., Tian, Q.: Affective visualization and retrieval for music video. IEEE Transactions on Multimedia 12(6), 510–522 (2010)
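- It has been proposed that the emotion conveyed by film and video can be mapped onto the arousal-valence space, where arousal denotes the intensity of the emotion and valence its polarity (positive, negative or neutral). As shown in the figure below, the emotions evoked by different scenes are mapped to their corresponding positions in this 2D space, and together they trace out a parabolic contour.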
- The task is challenging because movies dynamically combine information from several modalities: audio, visual and text (semantics). Some related work:
  - Kernel methods and deep learning to predict VA values for 30 short films
    Baveye, Y., Chamaret, C., Dellandréa, E., Chen, L.: Affective video content analysis: A multidisciplinary insight. IEEE Transactions on Affective Computing (2017)
  - Hand-crafted audio-visual features to predict VA values for 30-minute clips from 12 Academy Award winning movies
    Malandrakis, N., Potamianos, A., Evangelopoulos, G., Zlatintsi, A.: A supervised approach to movie emotion tracking. In: Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on. pp. 2376–2379. IEEE (2011)
  - A Mixture-of-Experts (MoE) model to improve the fusion of audio and visual models
    Goyal, A., Kumar, N., Guha, T., Narayanan, S.S.: A multimodal mixture-of-experts model for dynamic emotion prediction in movies. In: Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on. pp. 2822–2826. IEEE (2016)
  - Long Short Term Memory networks (LSTMs) to capture the context of audio-visual information
    Sivaprasad, S., Joshi, T., Agrawal, R., Pedanekar, N.: Multimodal continuous prediction of emotions in movies using long short-term memory networks. In: Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval. pp. 413–419. ACM (2018)
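- Valence is observed to be predicted less accurately than arousal, because predicting valence requires more high-level semantic information. For example, a fight scene would normally carry a negative connotation, but if the protagonist wins, the emotion is positive; a bright garden scene would normally carry a positive connotation, but the dialogue in it may lean negative.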
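- All of the work above models valence and arousal separately. The following paper proposes modeling the two jointly, using an LSTM to predict VA values on 200 short videos of 5-30 s each:
  Zhang, L., Zhang, J.: Synchronous prediction of arousal and valence using lstm network for affective video content analysis. In: 2017 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD). pp. 727–732. IEEE (2017)
- This paper uses the COGNIMUSE dataset, because the authors consider emotion prediction on movies to be necessary for practical applications. The valence and arousal annotations in this dataset are fairly strongly correlated (0.62), so the paper aims to use arousal information to predict valence. If the view from cognitive psychology can also be exploited, namely that arousal and valence usually fall within a parabolic region (panel (a) of the figure above), valence prediction may improve further.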
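The correlation figures quoted in these notes can be computed with the standard Pearson formula. The sketch below assumes that the 0.62 between the annotated valence and arousal is also a Pearson coefficient, which the notes do not state explicitly. The function name and the toy values are illustrative only and are not data from COGNIMUSE.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy annotation tracks (hypothetical values in [-1, 1]).
valence = [-0.4, -0.1, 0.2, 0.5, 0.3]
arousal = [-0.2, 0.0, 0.1, 0.6, 0.2]
r = pearson(valence, arousal)
```

The same function applies to a predicted valence track and its annotation, which is how the 0.69 and 0.49 scores in the abstract are defined.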
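2 Dataset and features
- Dataset: COGNIMUSE, which contains 30-minute clips from 12 Academy Award winning movies.
- Valence and arousal annotations both lie in [-1, 1] (valence = -1 is the most negative emotion and valence = +1 the most positive; arousal = -1 is the lowest emotional intensity and arousal = +1 the highest).
- The paper samples emotion once every 5 s, that is, the annotations are downsampled so that each non-overlapping 5 s segment has one annotation.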
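The downsampling step can be sketched as follows. The notes do not say how the annotations inside a 5 s segment are reduced to a single value, so averaging is an assumption here, as are the function name `downsample` and the `rate_hz` parameter for the original annotation rate.

```python
def downsample(values, rate_hz, window_s=5.0):
    """Average an annotation track over non-overlapping windows.

    A trailing partial window is dropped. Averaging is an assumed
    reduction; the paper may use a different one.
    """
    step = int(round(rate_hz * window_s))  # samples per window
    return [
        sum(values[i:i + step]) / step
        for i in range(0, len(values) - step + 1, step)
    ]
```

For example, a track annotated at 25 values per second would give `step = 125`, so a 30-minute clip would yield 360 labels.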
- Only audio features are used as model input, because previous work indicates that audio features matter more for valence prediction.
- The extracted audio features include:
  - Audio compressibility
  - Harmonicity
  - Mel-frequency cepstral coefficients (MFCC), with their derivatives and statistics (e.g. minimum, maximum, mean)
  - Chroma, with its derivatives and statistics (e.g. minimum, maximum, mean)
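As a rough illustration of "derivatives and statistics", the sketch below takes one frame-level feature track within a segment (for instance a single MFCC or chroma coefficient) and turns it into a fixed-length descriptor. The helper names are hypothetical, and the derivative is a plain first-order difference, which is a simplification: the notes do not say how the paper computes derivatives.

```python
def deltas(frames):
    """First-order difference between consecutive frames (a simple derivative)."""
    return [b - a for a, b in zip(frames, frames[1:])]


def segment_stats(frames):
    """Minimum, maximum and mean of one feature track within a segment."""
    return {"min": min(frames), "max": max(frames), "mean": sum(frames) / len(frames)}


def describe(frames):
    """Statistics of the track followed by statistics of its derivative."""
    s = segment_stats(frames)
    ds = segment_stats(deltas(frames))
    return [s["min"], s["max"], s["mean"], ds["min"], ds["max"], ds["mean"]]
```

Applying `describe` to every coefficient in a 5 s segment and concatenating the results gives one input vector per segment, regardless of how many frames the segment contains.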
- The input feature set is then narrowed down using the correlation-based feature selection described by Witten et al.
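Correlation-based feature selection prefers subsets whose features correlate strongly with the target but weakly with each other. A minimal sketch of that idea follows, using the commonly cited merit score `k * r_cf / sqrt(k + k(k-1) * r_ff)` with a greedy forward search. Using Pearson correlation for both terms and stopping as soon as the merit no longer improves are assumptions made here; the notes do not record the exact variant or tool used in the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def merit(subset, features, target):
    """Merit = k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|)."""
    k = len(subset)
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = 0.0
    if pairs:
        r_ff = sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def select_features(features, target):
    """Greedy forward search: add the best feature until merit stops improving."""
    chosen, best = [], 0.0
    remaining = sorted(features)
    while remaining:
        score, name = max((merit(chosen + [f], features, target), f) for f in remaining)
        if score <= best:
            break
        chosen.append(name)
        remaining.remove(name)
        best = score
    return chosen
```

Here `features` is a hypothetical mapping from feature name to its per-segment values, and `target` is the annotation track. A feature that merely duplicates one already chosen raises the inter-feature correlation without raising the merit, so it is left out.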
3 Prediction model
[1] Joshi T, Sivaprasad S, Pedanekar N. Partners in Crime: Utilizing Arousal-Valence Relationship for Continuous Prediction of Valence in Movies[C]//AffCon@AAAI. 2019.