
Time-slice Prediction of Dyadic Human Activities


M. Ziaeefard, R. Bergevin and L.-P. Morency, Time-slice Prediction of Dyadic Human Activities. In Proceedings of the British Machine Vision Conference (BMVC), 2015.

Publications

Typical sequence-modeling techniques rely on well-segmented sequences that have been edited to remove noisy or irrelevant parts. As a result, such methods cannot easily be applied to the noisy sequences expected in real-world applications.

We study sequence modeling that combines RNNs, which capture temporal dependencies, with an attention mechanism that localizes the salient observations relevant to the final decision and ignores the irrelevant (noisy) parts of the input sequence.
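As a rough illustration of this combination, the sketch below runs a small Elman-style recurrence over a sequence and then pools the hidden states with softmax attention, so that time steps with low attention weight contribute little to the summary vector. It is a minimal sketch under stated assumptions, not the model used in this work: the names `rnn_hidden_states`, `attention_pool`, and `query`, the dimensions, and the random (untrained) weights are all hypothetical.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_hidden_states(inputs, w_in, w_rec):
    """Elman-style recurrence: h_t = tanh(W_in x_t + W_rec h_{t-1})."""
    hidden = [0.0] * len(w_rec)
    states = []
    for x in inputs:
        pre = [a + b for a, b in zip(matvec(w_in, x), matvec(w_rec, hidden))]
        hidden = [math.tanh(p) for p in pre]
        states.append(hidden)
    return states


def attention_pool(states, query):
    """Softmax attention over time steps; returns (weights, context)."""
    scores = [sum(q * h for q, h in zip(query, state)) for state in states]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(states[0])
    # Context is the attention-weighted average of the hidden states.
    context = [sum(w * state[i] for w, state in zip(weights, states))
               for i in range(dim)]
    return weights, context


# Hypothetical sizes and random, untrained parameters (illustration only).
rng = random.Random(0)
input_dim, hidden_dim, steps = 3, 4, 6
w_in = [[rng.uniform(-0.5, 0.5) for _ in range(input_dim)] for _ in range(hidden_dim)]
w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)] for _ in range(hidden_dim)]
query = [rng.uniform(-1.0, 1.0) for _ in range(hidden_dim)]
sequence = [[rng.gauss(0.0, 1.0) for _ in range(input_dim)] for _ in range(steps)]

states = rnn_hidden_states(sequence, w_in, w_rec)
weights, context = attention_pool(states, query)
```

In a trained model the query vector and the recurrent weights would be learned jointly, so that noisy time steps receive weights close to zero; here they are random and only show the data flow.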

 

The success of video-sharing and social networking websites has led to a growing volume of online multimedia content, a large proportion of which is human-centric video. The sheer amount of such data motivates research on behavior understanding that can discover the affective and social states within human-centric multimedia content. We can model personality and social interaction via temporal modeling and multimodal fusion.

Rapport: Rapport is a harmonious relationship in which people are coordinated and understand each other. The power of rapport in social interactions has inspired us to develop an intelligent virtual agent that induces the subjective feeling, and many of the behavioral benefits, of the psychological concept of rapport. We also develop a system for automatic rapport detection in remote peer tutoring.

Persuasiveness: Persuasiveness is a high-level personality trait that quantifies the influence a speaker has on the beliefs, attitudes, intentions, motivations, and behavior of the audience. As social multimedia becomes an important channel for propagating ideas and opinions, analyzing persuasiveness grows in importance. Inspired by the success of deep learning techniques, we study persuasiveness prediction with deep multimodal fusion that effectively combines signals from the visual, acoustic, and text modalities.
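One simple way to combine the three modalities is early fusion: concatenate per-modality feature vectors and pass the result through a small feed-forward network that outputs a score. The sketch below illustrates that idea only; it is an assumption for illustration, not the fusion architecture studied here. The function `fuse_and_score`, the feature dimensions, and the random (untrained) parameters are hypothetical.

```python
import math
import random


def dense(inputs, weights, biases):
    """Affine layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def fuse_and_score(visual, acoustic, text, params):
    """Early fusion: concatenate modality features, then apply a small MLP."""
    fused = visual + acoustic + text  # list concatenation, not element-wise sum
    pre_activation = dense(fused, params["w_hidden"], params["b_hidden"])
    hidden = [max(0.0, v) for v in pre_activation]  # ReLU
    logit = dense(hidden, params["w_out"], params["b_out"])[0]
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid maps the logit to (0, 1)


# Hypothetical feature vectors and random, untrained parameters.
rng = random.Random(0)
visual = [rng.gauss(0.0, 1.0) for _ in range(4)]
acoustic = [rng.gauss(0.0, 1.0) for _ in range(3)]
text = [rng.gauss(0.0, 1.0) for _ in range(5)]

hidden_dim = 6
fused_dim = len(visual) + len(acoustic) + len(text)
params = {
    "w_hidden": [[rng.uniform(-0.3, 0.3) for _ in range(fused_dim)]
                 for _ in range(hidden_dim)],
    "b_hidden": [0.0] * hidden_dim,
    "w_out": [[rng.uniform(-0.3, 0.3) for _ in range(hidden_dim)]],
    "b_out": [0.0],
}

score = fuse_and_score(visual, acoustic, text, params)
```

Concatenation is the simplest fusion choice; deeper schemes typically learn a separate encoder per modality before combining them, which this sketch omits.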