Second Grand Challenge and Workshop on Multimodal Language
ACL 2020
Location: ACL 2020 – Seattle, WA, USA
Date: July 10th, 2020


Sponsored by: NSF and Intel


We hope everyone and their loved ones are staying safe during the COVID-19 pandemic.

[Extension] We understand that quarantine is now in effect in many countries around the world, and that it has taken a heavy toll on the pace of scientific research. Much of this period has overlapped with the preparation time for this workshop. For this reason, and to keep submissions fair, we are extending all deadlines by at least 3 weeks (deadlines remain end of the day, anywhere on Earth). We hope this extension makes up for time lost during the pandemic. The new deadlines are listed below.

[Online] For the workshop day (July 10th, 2020), we follow the same policy as ACL 2020: the workshop will be held online, and a Zoom link will be announced here. The workshop is open to everyone, with no conference registration required. The workshop will start at 6:00 a.m. PDT (Seattle time).



[7/10/2020] Thank you everyone for participating! 

[7/10/2020] Best paper award with $1000 reward: “A Transformer-based joint-encoding for Emotion Recognition and Sentiment Analysis”, Jean-Benoit Delbrouck, Noé Tits, Mathilde Brousmiche and Stéphane Dupont

[7/10/2020] Best paper runner-up awards with $250 reward each:

“A Multi-modal Approach to Fine-grained Opinion Mining on Video Reviews”, Edison Marrese-Taylor, Cristian Rodriguez, Jorge Balazs, Stephen Gould and Yutaka Matsuo

“Multilogue-Net: A Context-Aware RNN for Multi-modal Emotion Detection and Sentiment Analysis in Conversation”, Aman Shenoy and Ashish Sardana

[7/10/2020] Workshop now happening on ACL live! 

[6/13/2020] Proceedings of the workshop are now available to download from this link. We have 9 archival papers and 2 non-archival submissions. Zoom link:

[3/10/2020] Submission portal now open for all papers:

[1/07/2020] The deadline for Grand-Challenge papers, originally May 1st, is now May 20th. Workshop papers, originally due April 25th, are now due May 18th.

[1/07/2020] CMU-MOSEI Grand-Challenge test data will be released on Feb 15th. Please check the challenge GitHub link for access.

[1/07/2020] The Grand challenge GitHub page is released. Please check the GitHub link.

[1/05/2020] Workshop page is published. Submission template can be downloaded here (identical to ACL 2020): download zip, access overleaf


Table of Contents

1. Keynotes
2. Scope and Related Areas
3. Workshop Track
4. Grand-Challenge Track
5. Workshop Schedule
6. Organizing Committee


Keynotes

Speaker: Rada Mihalcea – University of Michigan (USA)

Topic: Multimodal Language and Affect

Biography: Rada Mihalcea is a Professor in the Computer Science and Engineering department at the University of Michigan. Her research interests are in computational linguistics, with a focus on lexical semantics, multilingual natural language processing, and computational social sciences. She serves or has served on the editorial boards of the journals Computational Linguistics, Language Resources and Evaluation, Natural Language Engineering, Research on Language and Computation, IEEE Transactions on Affective Computing, and Transactions of the Association for Computational Linguistics. She was a program co-chair for ACL (2011) and EMNLP (2009), and a general chair for NAACL (2015). She is the recipient of a National Science Foundation CAREER award (2008) and a Presidential Early Career Award for Scientists and Engineers (2009).

Speaker: Ruslan Salakhutdinov – Carnegie Mellon University (USA)

Topic: Multimodal Dialogue and RL

Biography: Ruslan Salakhutdinov received his Ph.D. in machine learning (computer science) from the University of Toronto in 2009. In February of 2016, he joined the Machine Learning Department at Carnegie Mellon University as an Associate Professor. Ruslan’s primary interests lie in deep learning, machine learning, and large-scale optimization. He is an action editor of the Journal of Machine Learning Research and served on the senior program committee of several learning conferences including NeurIPS and ICML. He is an Alfred P. Sloan Research Fellow, Microsoft Research Faculty Fellow, Canada Research Chair in Statistical Machine Learning, a recipient of the Early Researcher Award, Connaught New Researcher Award, Google Faculty Award, Nvidia’s Pioneers of AI award, and is a Senior Fellow of the Canadian Institute for Advanced Research.

Speaker: M. Ehsan Hoque – University of Rochester (USA)

Topic: Multimodal Healthcare and Education

Biography: Dr. Hoque is an assistant professor of Computer Science and the Asaro-Biggar (’92) Family Fellow at the University of Rochester. From January 2018 to June 2019, he was the interim Director of the Goergen Institute for Data Science. He co-leads the Rochester Human-Computer Interaction (ROC HCI) Group. He received his PhD from MIT in 2013. His research interests center on developing computational tools that recognize the subtle nuances of human communication, with the direct aim of improving human ability.

Speaker: Yejin Choi – University of Washington (USA)

Topic: Multimodal Commonsense

Biography: Yejin Choi is an associate professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington, an adjunct in the Linguistics department, and an affiliate of the Center for Statistics and the Social Sciences. She is also a senior research manager at the Allen Institute for Artificial Intelligence. She is a co-recipient of the Marr Prize (best paper award) at ICCV 2013, a recipient of the Borg Early Career Award (BECA) in 2018, and was named among IEEE AI’s 10 to Watch in 2016. She received her Ph.D. in Computer Science from Cornell University (advisor: Prof. Claire Cardie) and her BS in Computer Science and Engineering from Seoul National University in Korea.


Scope and Related Areas

Humans use a structured multimodal signal to communicate with each other. This signal combines the language modality (spoken words), the visual modality (gestures and facial expressions), and the acoustic modality (changes in tone of voice). This form of communication is commonly known as multimodal language. Modeling multimodal language is a growing research area in NLP: it pushes the boundaries of multimodal learning and requires advanced neural modeling of all three constituent modalities. Advances in this area let NLP generalize beyond purely textual applications to real-world communication, with downstream impact on applications in fields such as Conversational AI, Virtual Reality, Robotics, HCI, Healthcare, and Education.

In the past few years, there has been a surge of papers on modeling multimodal language at NLP conferences including ACL, EMNLP, and NAACL. Following the first Challenge-HML at ACL 2018, the Second Grand-Challenge and Workshop on Multimodal Language (Challenge-HML) once again brings together researchers from across the globe at the ACL conference to address the following current and future research challenges in multimodal language modeling:

  • Neural Modeling of Multimodal Language
  • Multimodal Dialogue Modeling and Generation
  • Multimodal Sentiment Analysis and Emotion Recognition
  • Language, Vision and Speech
  • Multimodal Artificial Social Intelligence Modeling
  • Multimodal Commonsense Reasoning
  • Multimodal RL and Control (Human-robot communication and multimodal language for robots)
  • Multimodal Healthcare
  • Multimodal Educational Systems
  • Multimodal Affective Computing
  • Multimodal Fusion and Alignment
  • Multimodal Representation Learning
  • Multimodal Sequential Modeling
  • Multimodal Co-learning and Transfer Learning
  • Multimodal Active Learning
  • Multimodal and Multimedia Resources
  • Creative Applications of Multimodal Learning in E-commerce, Art, and other Impactful Areas


Workshop Track


Archival Track: Workshop papers are either full (8 pages) or short (4 pages) papers with unlimited references. The formatting instructions are identical to the ACL 2020 paper format. ACL guidelines suggest that papers should be self-contained, have high presentation quality, and be properly compared to previous work. Submissions related to any of the areas above are welcome. Submitted work should use at least two modalities in its experiments (i.e. it should involve multimodal learning). The workshop track encourages creative applications of multimodal learning, especially on new or understudied datasets.

All accepted papers will be presented during poster sessions at the workshop. Selected papers will also be invited as contributed talks. All papers (full or short) will be indexed as part of the ACL 2020 workshop proceedings. All submitted material should be novel at the time of submission; further results and experiments on existing work are also acceptable.

Submission template: download zip, access overleaf.

Non-archival Track: The workshop also includes a non-archival track to allow submission of previously published papers and double submissions to other conferences or journals. Accepted non-archival papers will still be presented as posters or talks at the workshop. There are no formatting or page restrictions for non-archival submissions. The accepted papers to the non-archival track will be displayed on the workshop website, but will NOT be included in the ACL 2020 Workshop proceedings or otherwise archived.

Grand-Challenge Track



Grand-challenge (shared task) papers are 6 to 8 pages of content with unlimited references. The formatting instructions are identical to the ACL 2020 paper format. ACL guidelines (see here) suggest that papers should be self-contained, have high presentation quality, and be properly compared to previous work. Grand-challenge submissions must report results on the test set. All papers will be indexed as part of the ACL 2020 workshop proceedings.

Submission template: download zip, access overleaf

The Grand Challenge offers two datasets for sentiment analysis and emotion recognition: CMU-MOSEI (certificate and a grand prize valued at over $1000 USD for the winner) and MELD (certificate). Teams can submit to one or both dataset tracks. Further details on how to get started with each dataset are available on their respective Challenge-HML 2020 GitHub pages above. The test set was released on Feb 15th 2020. Authors have until May 18th (extended from April 25th) to submit their test results and until May 20th 2020 (extended from May 1st 2020) to submit final papers. To get started with the datasets, please refer to the GitHub link.


Important Dates

  • Grand-Challenge test evaluation last call: May 18th
  • Paper deadline: May 18th (Workshop; extended from April 25th) and May 20th (Grand-Challenge; extended from May 1st)
  • Grand challenge test data release: Feb 15th
  • Notification of Acceptance: May 26th
  • Camera-ready: June 2nd
  • Workshop day: July 10
  • Workshop location: ACL 2020, Seattle, USA, Remote Broadcasting Link to Be Announced

**All deadlines are at 11:59 pm, Anywhere on Earth, 2020.**


Workshop Schedule

Title | Speakers | Type | Date | Start (PT) | End (PT) | Duration
Opening Remarks | Amir Zadeh | LIVE INTRO | 07/10/2020 | 6:00 | 6:10 | 10 minutes
Keynote for Second Grand-Challenge and Workshop on Multimodal Language | Ruslan Salakhutdinov | LIVE PRESENTATION | 07/10/2020 | 6:10 | 6:50 | 40 minutes
QA For Keynote | Ruslan Salakhutdinov | LIVE DISCUSSION | 07/10/2020 | 6:50 | 7:05 | 15 minutes
AI Sensing for Robotics using Deep Learning based Visual and Language Modeling | Yuvaram Singh | PRE-RECORDED | 07/10/2020 | 7:05 | 7:20 | 15 minutes
Multilogue-Net: A Context-Aware RNN for Multi-modal Emotion Detection and Sentiment Analysis in Conversation | Aman Shenoy | PRE-RECORDED | 07/10/2020 | 7:20 | 7:35 | 15 minutes
A Multi-modal Approach to Fine-grained Opinion Mining on Video Reviews | Edison Marrese-Taylor | PRE-RECORDED | 07/10/2020 | 7:35 | 7:50 | 15 minutes
Leveraging Multimodal Behavioral Analytics for Automated Job Interview Performance Assessment and Feedback | Anumeha Agrawal | PRE-RECORDED | 07/10/2020 | 7:50 | 8:05 | 15 minutes
QA Session 1 – Yuvaram, Aman, Edison, Anumeha | | LIVE DISCUSSION | 07/10/2020 | 8:05 | 8:25 | 20 minutes
Break 1 | | | 07/10/2020 | 8:25 | 8:40 | 15 minutes
Keynote for Second Grand-Challenge and Workshop on Multimodal Language | Rada Mihalcea | LIVE PRESENTATION | 07/10/2020 | 8:40 | 9:20 | 40 minutes
QA For Keynote | Rada Mihalcea | LIVE DISCUSSION | 07/10/2020 | 9:20 | 9:35 | 15 minutes
Exploring Weaknesses of VQA Models through Attribution Driven Insights | Shaunak Halbe | PRE-RECORDED | 07/10/2020 | 9:35 | 9:50 | 15 minutes
Unsupervised Online Grounding of Natural Language during Human-Robot Interactions | Oliver Roesler | PRE-RECORDED | 07/10/2020 | 9:50 | 10:05 | 15 minutes
A Transformer-based joint-encoding for Emotion Recognition and Sentiment Analysis | Jean-Benoit Delbrouck | PRE-RECORDED | 07/10/2020 | 10:05 | 10:20 | 15 minutes
QA Session 2 – Shaunak, Oliver, Jean-Benoit | | LIVE DISCUSSION | 07/10/2020 | 10:20 | 10:35 | 15 minutes
Break 2 | | | 07/10/2020 | 10:35 | 10:50 | 15 minutes
Keynote for Second Grand-Challenge and Workshop on Multimodal Language | Yejin Choi | PRE-RECORDED | 07/10/2020 | 10:50 | 11:30 | 40 minutes
QA For Keynote | Yejin Choi | LIVE DISCUSSION | 07/10/2020 | 11:30 | 11:45 | 15 minutes
Break 3 | | | 07/10/2020 | 11:45 | 12:30 | 45 minutes
Low Rank Fusion based Transformers for Multimodal Sequences | Saurav Sahay | PRE-RECORDED | 07/10/2020 | 12:30 | 12:45 | 15 minutes
Audio-Visual Understanding of Passenger Intents for In-Cabin Conversational Agents | Eda Okur | PRE-RECORDED | 07/10/2020 | 12:45 | 13:00 | 15 minutes
Cross-Modal Data Programming Enables Rapid Medical Machine Learning | Jared Dunnmon | PRE-RECORDED | 07/10/2020 | 13:00 | 13:15 | 15 minutes
What Gives the Answer Away? Question Answering Bias Analysis on Video QA Datasets | Jed Yang | PRE-RECORDED | 07/10/2020 | 13:15 | 13:30 | 15 minutes
QA Session 3 – Saurav, Eda, Jared, Jed | | LIVE DISCUSSION | 07/10/2020 | 13:30 | 13:45 | 15 minutes
Keynote for Second Grand-Challenge and Workshop on Multimodal Language | M. Ehsan Hoque | PRE-RECORDED | 07/10/2020 | 13:45 | 14:25 | 40 minutes
QA For Keynote | Ehsan Hoque | LIVE DISCUSSION | 07/10/2020 | 14:25 | 14:40 | 15 minutes
Closing Remarks | Amir Zadeh | LIVE PRESENTATION | 07/10/2020 | 14:40 | 14:50 | 10 minutes


Organizing Committee

Amir Zadeh – Language Technologies Institute, Carnegie Mellon University
Louis-Philippe Morency – Language Technologies Institute, Carnegie Mellon University
Paul Pu Liang – Machine Learning Department, Carnegie Mellon University
Soujanya Poria – Singapore University of Technology and Design