Y. Song, L.-P. Morency and R. Davis. Action Recognition by Hierarchical Sequence Summarization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013
Typical techniques for sequence modeling rely upon well-segmented sequences which have been edited to remove noisy or irrelevant parts. Therefore, we cannot easily apply such methods to noisy sequences expected in real-world applications.
In one of our projects, we study sequence modeling through the combination of RNNs that captures the temporal dependencies and the attention mechanism that localizes the salient observations which are relevant to the final decision and ignore the irrelevant (noisy) parts of the input sequence.
More recent work uses more powerful neural network models such as Transformers to process longer sequences. One of our more recent projects uses a hierarchical architecture to model multiple temporal resolutions of sequences, allowing representations of data with different degrees of granularity and more easily capturing long-range dependencies. This work is being applied to music generation, where modeling hierarchical structure is especially important.