Y.-H.H. Tsai*, P.P. Liang*, A. Zadeh, L.-P. Morency, R. Salakhutdinov. Learning Multimodal Representations using Factorized Deep Generative Models. NeurIPS 2018 Workshop on Bayesian Deep Learning

This line of work combines statistical methods with multimodal machine learning problems. The statistical grounding makes the models more interpretable and provides theoretical guarantees. We employ probabilistic graphical models and statistical kernel methods for multimodal generation, multimodal time-series fusion, and modeling uncertainty in multimodal environments.

As an example, we present a model that can (1) learn complex intra-modal and cross-modal interactions for prediction and (2) remain robust to unexpected missing or noisy modalities at test time. The model factorizes representations into two sets of independent factors: multimodal discriminative factors and modality-specific generative factors, trained under a joint generative-discriminative objective over multimodal data and labels.
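The factorization above can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation: the linear encoders/decoders, the two toy modalities, and all weight names are assumptions chosen for brevity. It shows the structural idea of a shared discriminative factor used for prediction alongside modality-specific generative factors used for reconstruction, and how prediction can fall back on one modality when another is missing.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two toy modalities (e.g., 8-dim audio features, 6-dim text features).
x_audio = rng.normal(size=8)
x_text = rng.normal(size=6)

D_Y, D_A, D_T = 4, 3, 3  # dims: discriminative / audio-specific / text-specific factors

# Illustrative linear encoders (random weights stand in for learned ones).
W_y_audio = rng.normal(size=(D_Y, 8))
W_y_text = rng.normal(size=(D_Y, 6))
W_a = rng.normal(size=(D_A, 8))
W_t = rng.normal(size=(D_T, 6))

# Multimodal discriminative factor: fuses both modalities, used for prediction.
z_y = np.tanh(W_y_audio @ x_audio + W_y_text @ x_text)

# Modality-specific generative factors: one per modality, used for reconstruction.
z_a = np.tanh(W_a @ x_audio)
z_t = np.tanh(W_t @ x_text)

# Decoders: each modality is reconstructed from its own generative factor
# together with the shared discriminative factor (the generative side of the objective).
G_audio = rng.normal(size=(8, D_Y + D_A))
G_text = rng.normal(size=(6, D_Y + D_T))
x_audio_hat = G_audio @ np.concatenate([z_y, z_a])
x_text_hat = G_text @ np.concatenate([z_y, z_t])

# Classifier head on the discriminative factor only (the discriminative side).
C = rng.normal(size=(2, D_Y))
logits = C @ z_y

# Robustness to a missing modality at test time: drop x_text and
# predict from the audio contribution to the discriminative factor alone.
z_y_audio_only = np.tanh(W_y_audio @ x_audio)
logits_missing = C @ z_y_audio_only

print(x_audio_hat.shape, x_text_hat.shape, logits.shape, logits_missing.shape)
```

In training, the reconstruction losses on `x_audio_hat` and `x_text_hat` and the classification loss on `logits` would be optimized jointly, which is what "joint generative-discriminative objective" refers to.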