Research Opportunities

The MultiComp Lab has a great tradition of including undergraduate and professional master's students in our research. The opportunities listed below are single-term, unfunded research appointments that may be extended to additional semesters. Each listing includes a brief description, required skills, and contact information. Please email Nicki Siverling with any questions.

Opportunities include:

Human Communication
Mentor: Chaitanya Ahuja
Description: We are looking for driven students to help curate a test bench for studying human communication, especially nonverbal behaviour (i.e. hand gestures and facial expressions). The main responsibility of the candidate would be to implement a (semi-)automatic approach to find, collect, and process YouTube videos based on a predefined set of constraints; a minimal sketch of such a collection script follows this listing. A good candidate would also show initiative and creativity in designing these constraints. To get an idea of the kind of data we are looking to collect, check out http://chahuja.com/pats/
Skills/Experience: Comfortable with Python and Bash scripting; some knowledge of Docker is a plus.
Contact: Interested students should send an email to Chaitanya Ahuja with their CV.
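
A minimal sketch, assuming the yt-dlp Python package, of the kind of (semi-)automatic collection script this project involves; the search query, duration bounds, and output file below are hypothetical placeholders rather than the project's actual constraints.

    # Hypothetical example: search YouTube and keep only videos that satisfy a
    # simple duration constraint, saving the candidates for later processing.
    import json
    import yt_dlp

    QUERY = "ytsearch25:talk show interview"   # placeholder search constraint
    MIN_SEC, MAX_SEC = 60, 600                 # placeholder duration constraint (seconds)

    def collect_candidates(query):
        opts = {"quiet": True, "skip_download": True}
        with yt_dlp.YoutubeDL(opts) as ydl:
            info = ydl.extract_info(query, download=False)
        kept = []
        for entry in info.get("entries", []):
            duration = entry.get("duration") or 0
            if MIN_SEC <= duration <= MAX_SEC:
                kept.append({
                    "id": entry["id"],
                    "title": entry.get("title"),
                    "duration": duration,
                    "url": entry.get("webpage_url") or "https://www.youtube.com/watch?v=" + entry["id"],
                })
        return kept

    if __name__ == "__main__":
        videos = collect_candidates(QUERY)
        with open("candidates.json", "w") as f:
            json.dump(videos, f, indent=2)
        print("kept", len(videos), "videos")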

Relationships between verbal and nonverbal behavior
Mentor: Chaitanya Ahuja
Description: Human communication is a complex combination of verbal behaviour (i.e. spoken/written language) and nonverbal behaviour (i.e. hand gestures, facial expressions). In this project, we would like to study the relationships between verbal and nonverbal behaviour as a combined communicative entity. Specifically, we are looking for candidates to help us pursue a challenging project on speech generation that accurately represents the prosodic components (i.e. intonation, tone, stress, and rhythm) along with the linguistic content (i.e. words or sentences). A goal here is to use the visual information (i.e. hand gestures and facial expressions) to extract the prosodic components, which will then be incorporated into speech generation; a rough sketch of this conditioning idea follows this listing. While the problem statement is fairly well defined, a good candidate would show initiative in coming up with creative hypotheses and would help design and implement the experiments to validate them. To get an idea of the kind of data and projects we work with, see http://chahuja.com/mix-stage/
Skills/Experience: Comfortable coding in Python with deep learning libraries such as PyTorch/TensorFlow/Keras; strong background in math and statistics; some machine learning experience is a plus; interest in generative modelling, especially for speech, is a plus.
Contact: Interested students should send an email to Chaitanya Ahuja with their CV.
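
A minimal PyTorch sketch of the conditioning idea described above: a prosody embedding predicted from visual features (e.g. pose keypoints) is combined with linguistic content before a toy speech decoder. The module names, dimensions, and architecture are illustrative assumptions, not the lab's actual model.

    import torch
    import torch.nn as nn

    class ProsodyFromVision(nn.Module):
        """Summarises a sequence of visual features into a prosody embedding."""
        def __init__(self, vis_dim=128, prosody_dim=32):
            super().__init__()
            self.encoder = nn.GRU(vis_dim, prosody_dim, batch_first=True)

        def forward(self, vis_seq):                  # (batch, time, vis_dim)
            _, h = self.encoder(vis_seq)             # final hidden state
            return h.squeeze(0)                      # (batch, prosody_dim)

    class ConditionedSpeechDecoder(nn.Module):
        """Decodes mel-spectrogram frames from text embeddings plus prosody."""
        def __init__(self, text_dim=256, prosody_dim=32, mel_dim=80):
            super().__init__()
            self.rnn = nn.GRU(text_dim + prosody_dim, 256, batch_first=True)
            self.out = nn.Linear(256, mel_dim)

        def forward(self, text_seq, prosody):        # (B, T, text_dim), (B, prosody_dim)
            prosody = prosody.unsqueeze(1).expand(-1, text_seq.size(1), -1)
            h, _ = self.rnn(torch.cat([text_seq, prosody], dim=-1))
            return self.out(h)                       # (B, T, mel_dim)

    # Shapes only; real inputs would come from pose/face trackers and a text encoder.
    vis = torch.randn(4, 100, 128)
    text = torch.randn(4, 50, 256)
    mel = ConditionedSpeechDecoder()(text, ProsodyFromVision()(vis))
    print(mel.shape)                                 # torch.Size([4, 50, 80])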

Multimodal Machine Learning
Mentor: Paul Pu Liang
Description: Many real-world agents interact with their environment through a variety of sensors and modalities, such as language, vision (or video), audio, or tactile information. This project will develop models that integrate information from multiple modalities and benchmark their performance in a variety of perfect and imperfect real-world settings. We will explore models that are robust to noisy or missing modalities (a simple fusion sketch follows this listing), experiment with environments that require agents to take actions grounded in language and vision, and study settings in which multiple agents need to communicate in order to complete a task.
Skills/Experience: Prior experience in deep learning is an advantage but not a requirement.
Contact: Interested students should send an email to Paul Pu Liang with their CV and a description of their experience in machine learning.
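
A minimal PyTorch sketch of one ingredient of such robustness experiments: late fusion with a per-example missing-modality mask. The encoders, dimensions, and masking scheme are illustrative assumptions rather than the project's models.

    import torch
    import torch.nn as nn

    class LateFusion(nn.Module):
        def __init__(self, dims, hidden=128, n_classes=2):
            super().__init__()
            self.encoders = nn.ModuleDict({m: nn.Linear(d, hidden) for m, d in dims.items()})
            self.classifier = nn.Linear(hidden, n_classes)

        def forward(self, inputs, present):
            """inputs: dict of (batch, dim) tensors; present: dict of (batch,) 0/1 masks."""
            fused, count = 0.0, 0.0
            for name, enc in self.encoders.items():
                mask = present[name].unsqueeze(-1).float()      # zero out missing modalities
                fused = fused + mask * torch.relu(enc(inputs[name]))
                count = count + mask
            fused = fused / count.clamp(min=1.0)                # average over present modalities
            return self.classifier(fused)

    # Toy batch where the second example is missing its vision features.
    model = LateFusion({"language": 300, "vision": 512, "audio": 74})
    batch = {"language": torch.randn(2, 300), "vision": torch.randn(2, 512), "audio": torch.randn(2, 74)}
    present = {"language": torch.ones(2), "vision": torch.tensor([1.0, 0.0]), "audio": torch.ones(2)}
    print(model(batch, present).shape)                          # torch.Size([2, 2])

Averaging only over the modalities that are actually present keeps the fused representation comparably scaled when a modality is dropped, which is one simple way to probe robustness to missing inputs.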

Meta-learning
Mentor: Paul Pu Liang
Description: Meta-learning aims to design models that can learn new representations, or transfer them to new environments, rapidly from few training examples. This project will explore meta-learning across modalities (transferring quickly from text to images to speech), distributed meta-learning, and unsupervised meta-learning (i.e. without labels). A toy meta-learning sketch follows this listing.
Skills/Experience: Prior experience in deep learning is an advantage but not a requirement.
Contact: Interested students should send an email to Paul Pu Liang with their CV and a description of their experience in machine learning.
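
A toy first-order MAML-style sketch on synthetic sine-regression tasks, meant only to illustrate the inner-loop/outer-loop structure of meta-learning; it is a generic example, not the project's algorithm.

    import copy
    import torch
    import torch.nn as nn

    def inner_adapt(model, x_s, y_s, lr=0.01, steps=1):
        """Return a task-adapted copy of the shared model (first-order approximation)."""
        adapted = copy.deepcopy(model)
        opt = torch.optim.SGD(adapted.parameters(), lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            nn.functional.mse_loss(adapted(x_s), y_s).backward()
            opt.step()
        return adapted

    model = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
    meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    for step in range(100):                          # meta-training on toy sine tasks
        meta_opt.zero_grad()
        for _ in range(4):                           # a batch of tasks
            amp, phase = torch.rand(1) * 4 + 1, torch.rand(1) * 3
            x_s, x_q = torch.rand(10, 1) * 10 - 5, torch.rand(10, 1) * 10 - 5
            y_s, y_q = amp * torch.sin(x_s + phase), amp * torch.sin(x_q + phase)
            adapted = inner_adapt(model, x_s, y_s)   # adapt on the support set
            q_loss = nn.functional.mse_loss(adapted(x_q), y_q)
            # first-order update: apply the adapted copy's query-set gradients
            # back to the shared initialisation
            grads = torch.autograd.grad(q_loss, list(adapted.parameters()))
            for p, g in zip(model.parameters(), grads):
                p.grad = g if p.grad is None else p.grad + g
        meta_opt.step()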

Language Grounding
Mentor: Volkan Cirik
Description: This research has three threads. The first concerns the counting problem in Visual Question Answering (VQA) systems. Typically, models treat counting as just another classification problem; due to this simplification, they are severely limited to the number ranges seen during the training phase. We will address this problem with a simple yet orthogonal approach. The second thread concerns Vision-and-Language Navigation (VLN). Recent studies extend the standard VLN benchmark by adding interaction as part of the navigation problem. We will study the use of modularity to exploit this interaction supervision and address fundamental problems in navigation, such as diverging from the ground-truth trajectory or deciding when to stop. Finally, we recently introduced Refer360, a dataset for studying spatial language understanding in 3D scenes. In this dataset, a speaker refers to the location of a hidden object using 3-5 instruction sentences while observing a partial and dynamic field of view; the follower tries to find the hidden location in the 3D scene starting from a different partial field of view. We argue that this dataset poses several novel challenges to existing vision-and-language systems, since a system must address the grounding problem in addition to the interaction problem (i.e. changing the field of view). In this project, we plan to build a SLAM-like system that constructs a representation of the observed scene.
Contact: Interested students should send an email to Volkan Cirik with their CV.

Audiovisual Reasoning about Unseen Objects
Mentor: Peter Wu, Paul Pu Liang
Description: People can reason about objects in their environment even when only listening to sounds, without seeing the objects directly. Current AI systems cannot do this kind of reasoning about audio and objects; most expect objects to be clearly visible in order to recognize them. We introduce a dataset and an accompanying algorithm to bridge this gap. Given that this project explores a new area, research work will primarily focus on building a strong video processing pipeline and creating new deep learning algorithms. Research mentee(s) will learn the skills essential for building multimodal learning algorithms in a nascent area. An illustrative audio-only classification sketch follows this listing.
Skills/Experience: Candidates should have experience with Python data analysis tools (e.g. pandas) and Python machine learning libraries (e.g. PyTorch).
Contact: Interested students should send an email to Peter Wu with their CV.
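
A minimal sketch of the audio side of such a pipeline: predicting which object produced a sound from the audio track alone, using torchaudio log-mel features and a tiny PyTorch CNN. The number of object classes and the model itself are placeholders, not the project's dataset or algorithm.

    import torch
    import torch.nn as nn
    import torchaudio

    class AudioObjectClassifier(nn.Module):
        """Tiny CNN over log-mel spectrograms; a real model would be much larger."""
        def __init__(self, n_objects=10, sample_rate=16000):
            super().__init__()
            self.mel = torchaudio.transforms.MelSpectrogram(sample_rate=sample_rate, n_mels=64)
            self.conv = nn.Sequential(
                nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.fc = nn.Linear(32, n_objects)

        def forward(self, wav):                                 # (batch, samples)
            spec = torch.log(self.mel(wav) + 1e-6).unsqueeze(1) # (batch, 1, n_mels, frames)
            return self.fc(self.conv(spec).flatten(1))          # (batch, n_objects)

    # wav, sr = torchaudio.load("clip.wav")                     # audio extracted from a video clip
    wav = torch.randn(2, 16000)                                 # stand-in for one second of audio
    print(AudioObjectClassifier()(wav).shape)                   # torch.Size([2, 10])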