3D Iconic Gesture Database

What is an iconic gesture?

Iconic gestures are pictorial gestures that depict entities, objects or actions. Humans perform iconic gestures to refer to entities by embodying their shapes. For instance, people often trace the outline of an object (e.g. a circle for a ball) when referring to it during communication.

What does this dataset contain?

Twenty-nine subjects (20 male and 9 female) were asked to perform gestures to refer to 20 different virtual 3D objects. In total, 1739 gesture performances were captured with an MS Kinect™, each in three formats: color video, depth video, and the motion of the tracked skeleton (3D positions of 20 joints at 30 fps).
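As a rough illustration of what the skeleton format implies, the sketch below treats one gesture performance as a list of frames, each holding 20 joints with (x, y, z) coordinates, sampled at 30 fps. The in-memory layout, the function names and the joint indexing are assumptions made for this example; the description above does not specify the file format or the joint order.

```python
import math
from typing import List, Tuple

Joint = Tuple[float, float, float]   # (x, y, z) position of one joint
Frame = List[Joint]                  # all joints tracked in one frame

NUM_JOINTS = 20   # joints per frame, as stated in the dataset description
FPS = 30          # skeleton capture rate, as stated in the dataset description


def validate(frames: List[Frame]) -> None:
    """Raise ValueError if any frame does not hold 20 three-dimensional joints."""
    for i, frame in enumerate(frames):
        if len(frame) != NUM_JOINTS:
            raise ValueError(f"frame {i} has {len(frame)} joints, expected {NUM_JOINTS}")
        if any(len(joint) != 3 for joint in frame):
            raise ValueError(f"frame {i} contains a joint that is not 3D")


def duration_seconds(frames: List[Frame]) -> float:
    """Length of one gesture performance in seconds at the stated frame rate."""
    return len(frames) / FPS


def path_length(frames: List[Frame], joint_index: int) -> float:
    """Total distance travelled by one joint, summed over consecutive frames."""
    return sum(
        math.dist(prev[joint_index], cur[joint_index])
        for prev, cur in zip(frames, frames[1:])
    )
```

With this layout, a three-second gesture corresponds to 90 frames, and `path_length` gives a simple measure of how far a joint (for example a hand, whichever index that turns out to be) moved during the performance.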

How were the gestures performed and captured?

The subjects stood in front of a large display, mounted at about head height (see figure below). They were asked to refer to the virtual objects by gestures, and to perform the gestures so that people watching the recorded videos could recognize which object each gesture referred to.

During the study, the subjects could see all 20 virtual 3D objects, which were located at the bottom of the display. At each turn, one of the virtual objects was enlarged at the center of the screen and kept rotating so that the subject could see it from different perspectives. The subjects gave a signal (e.g. “ok”) when they were ready to perform a gesture for that object. At that point, the object disappeared and a photo of an attending person was shown. In this way, on the one hand, the subjects performed gestures from their imagination (as is common in human communication). On the other hand, the realistic photo of an attending person was intended to trigger the social cognitive processes that underlie human communication with an interlocutor.

As soon as the subjects retracted their hands to the rest position, or gave a verbal signal, the next object was shown. The virtual objects were shown in random order, each of them three times, for a total of 60 gestures per subject. Body motions were recorded only while the photo of the attending person was shown.
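The presentation order described above can be sketched as a small schedule generator. This is only an illustration under stated assumptions: it shuffles all 60 trials together, whereas the description does not say whether the order was randomized across the whole session or within blocks of 20 objects. The function name `make_schedule` and the use of integer object IDs are hypothetical.

```python
import random
from typing import List, Optional

NUM_OBJECTS = 20   # virtual 3D objects, as stated in the description
REPETITIONS = 3    # each object was shown three times per subject


def make_schedule(seed: Optional[int] = None) -> List[int]:
    """Return one subject's trial order as a list of object IDs (0..19).

    Assumption: a single shuffle over all 60 trials. The original study
    may have randomized differently (e.g. per block of 20 objects).
    """
    rng = random.Random(seed)
    schedule = [obj for obj in range(NUM_OBJECTS) for _ in range(REPETITIONS)]
    rng.shuffle(schedule)
    return schedule
```

Each generated schedule contains 20 × 3 = 60 trials, matching the 60 gestures per subject stated above.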

