Human communication is inherently multimodal: we combine language, gesture, and voice to convey our intentions. Human multimodal language therefore spans three modalities: language, vision, and acoustics. Multimodal sentiment analysis extends conventional language-based sentiment analysis, which is mostly applied to written text and tweets, to this multimodal setting. Similarly, emotions can be inferred from cues in language, gesture, and voice, and differences in communicative behavior can be mapped to speaker traits such as persuasiveness and presentation skill. All of these analyses are performed by observing the communicative behavior of a speaker, and each of sentiment analysis, emotion recognition, and speaker trait recognition can be carried out at the video, utterance, or sentence level.
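To make this setup concrete, the sketch below shows one common way to combine the three modalities: each modality is encoded separately and the representations are fused before predicting a sentiment intensity score. This is a minimal illustration, not a model from any of the publications listed below; the feature dimensions (word embeddings, facial features, acoustic descriptors) and the concatenation-based fusion are assumptions chosen for the example.

```python
# Minimal sketch of multimodal fusion for sentiment prediction.
# All dimensions and the fusion strategy are illustrative assumptions,
# not the lab's actual architecture.
import torch
import torch.nn as nn

class LateFusionSentiment(nn.Module):
    def __init__(self, lang_dim=300, visual_dim=35, acoustic_dim=74, hidden=64):
        super().__init__()
        # One encoder per modality: language (e.g., word embeddings),
        # vision (e.g., facial gesture features), acoustics (e.g., prosody).
        self.lang = nn.Sequential(nn.Linear(lang_dim, hidden), nn.ReLU())
        self.vis = nn.Sequential(nn.Linear(visual_dim, hidden), nn.ReLU())
        self.aco = nn.Sequential(nn.Linear(acoustic_dim, hidden), nn.ReLU())
        # Fusion head: concatenate the per-modality representations and
        # map them to a single sentiment intensity score.
        self.head = nn.Linear(3 * hidden, 1)

    def forward(self, lang_x, vis_x, aco_x):
        fused = torch.cat(
            [self.lang(lang_x), self.vis(vis_x), self.aco(aco_x)], dim=-1
        )
        return self.head(fused)

# Example: one feature vector per modality for a single utterance.
model = LateFusionSentiment()
score = model(torch.randn(1, 300), torch.randn(1, 35), torch.randn(1, 74))
```

The same kind of model applies at the sentence, utterance, or video level depending on what span each input vector summarizes; video-level predictions are often obtained by aggregating (e.g., averaging) utterance-level scores.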
The MultiComp Lab has developed multiple datasets over the years to support studies of multimodal sentiment analysis, emotion recognition, and speaker trait recognition. CMU-MOSI2 is the largest dataset for sentence-level multimodal sentiment analysis and emotion recognition; it will be publicly available in February 2018. CMU-MOSI is a dataset for opinion-level sentiment intensity analysis. EMO-REACT is a dataset for child emotion recognition. POM is a dataset for multimodal sentiment analysis and speaker trait recognition. ICT-MMMO is a dataset for video-level multimodal sentiment analysis. YouTube is a dataset for utterance-level multimodal sentiment analysis. MOUD is a dataset for utterance-level multimodal sentiment analysis in Spanish.
H. Wang, A. Meghawat, L.-P. Morency and E. Xing. Select-Additive Learning: Improving Generalization in Multimodal Sentiment Analysis. In Proceedings of the IEEE International Conference on Multimedia & Expo (ICME), 2017.
S. Poria, E. Cambria, D. Hazarika, N. Mazumder, A. Zadeh and L.-P. Morency. Context-Dependent Sentiment Analysis in User-Generated Videos. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2017.
A. Zadeh, R. Zellers, E. Pincus and L.-P. Morency. Multimodal Sentiment Intensity Analysis in Videos: Facial Gestures and Verbal Messages. IEEE Intelligent Systems, Volume 31, Issue 6, Nov-Dec 2016.
S. Park, H. Shim, M. Chatterjee, K. Sagae and L.-P. Morency. Multimodal Analysis and Prediction of Persuasiveness in Online Social Multimedia. ACM Transactions on Interactive Intelligent Systems (TiiS), Volume 6, Issue 3, October 2016.
B. Nojavanasghari, D. Gopinath, J. Koushik, T. Baltrušaitis and L.-P. Morency. Deep Multimodal Fusion for Persuasiveness Prediction. In Proceedings of the ACM International Conference on Multimodal Interaction (ICMI), 2016.
H. S. Shim, S. Park, M. Chatterjee, S. Scherer, K. Sagae and L.-P. Morency. Acoustic and Para-verbal Indicators of Persuasiveness in Social Multimedia. In Proceedings of the 40th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015.
M. Chatterjee, S. Park, S. Scherer and L.-P. Morency. Combining Two Perspectives on Classifying Multimodal Data for Recognizing Speaker Traits. In Proceedings of the ACM International Conference on Multimodal Interaction (ICMI), 2015. *** Best Paper Award ***
T. Wörtwein, M. Chollet, B. Schauerte, L.-P. Morency, R. Stiefelhagen and S. Scherer. Multimodal Public Speaking Performance Assessment. In Proceedings of the ACM International Conference on Multimodal Interaction (ICMI), 2015.
M. Chatterjee, H. S. Shim, S. Park, K. Sagae and L.-P. Morency. Verbal Behaviors and Persuasiveness in Online Multimedia Content. In Proceedings of the AFNLP Special Interest Group on Natural Language Processing for Social Media (SocialNLP), 2014.
S. Park, M. Chatterjee, H. S. Shim, K. Sagae and L.-P. Morency. Computational Analysis of Persuasiveness in Social Multimedia: A Novel Dataset and Multimodal Prediction Approach. In Proceedings of the ACM International Conference on Multimodal Interaction (ICMI), 2014.
A. B. Zadeh, K. Sagae and L.-P. Morency. Towards Learning Nonverbal Identities from the Web: Automatically Identifying Visually-Accentuated Words. In Proceedings of the 14th International Conference on Intelligent Virtual Agents (IVA), 2014.
V. P. Rosas, R. Mihalcea and L.-P. Morency. Utterance-Level Multimodal Sentiment Analysis. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2013.