Background: Deep neural network-based methods have achieved great progress in a variety
of computer vision tasks, as described in various patents. However, modeling temporal dependencies
remains challenging in tasks that require recognizing object motion in videos.
Method: In this paper, we propose a multi-timescale gated neural network for encoding the temporal
dependencies in videos. The model stacks multiple gated layers in a recurrent pyramid, which makes
it possible to hierarchically model not only pairwise dependencies but also long-term dependencies
among video frames. Additionally, the model integrates convolutional neural networks (CNNs) into its
structure, which exploits the pictorial nature of the frames and reduces the number of model parameters.
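The hierarchical idea in the Method paragraph can be illustrated with a minimal sketch. It assumes that each gated layer behaves like a factored gated autoencoder, whose mapping units encode the relation between two inputs through an element-wise product of factor projections, m = sigmoid(W((Ux) * (Vy))). The class `GatedLayer`, the function `pyramid_encode`, and all layer sizes are hypothetical; the weights are random rather than trained, and the recurrent prediction and convolutional components of the proposed model are omitted. Stacking layers so that each one relates consecutive outputs of the layer below lets the top representation depend on every frame in the window, not just on one pair.

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a real model would learn these."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class GatedLayer:
    """Hypothetical factored gated layer relating two inputs x and y.

    Mapping units: m = sigmoid(W @ ((U @ x) * (V @ y))), where * is element-wise.
    The multiplicative interaction makes m depend on how x and y relate,
    rather than on either input alone.
    """

    def __init__(self, n_in, n_factors, n_map, seed=0):
        rng = random.Random(seed)
        self.U = rand_matrix(n_factors, n_in, rng)  # factor projection of x
        self.V = rand_matrix(n_factors, n_in, rng)  # factor projection of y
        self.W = rand_matrix(n_map, n_factors, rng)  # factors -> mapping units

    def encode(self, x, y):
        fx = matvec(self.U, x)
        fy = matvec(self.V, y)
        factors = [a * b for a, b in zip(fx, fy)]  # element-wise gating
        return [sigmoid(z) for z in matvec(self.W, factors)]


def pyramid_encode(frames, layers):
    """Apply gated layers hierarchically.

    Layer 1 relates consecutive frames; layer k relates consecutive outputs of
    layer k-1, so each level has one fewer representation than the level below.
    """
    level = frames
    for layer in layers:
        level = [layer.encode(a, b) for a, b in zip(level, level[1:])]
    return level


# Usage: 4 toy "frames" of dimension 6 and a 3-layer pyramid.
rng = random.Random(42)
frames = [[rng.random() for _ in range(6)] for _ in range(4)]
layers = [
    GatedLayer(n_in=6, n_factors=8, n_map=5, seed=1),
    GatedLayer(n_in=5, n_factors=8, n_map=4, seed=2),
    GatedLayer(n_in=4, n_factors=8, n_map=3, seed=3),
]
top = pyramid_encode(frames, layers)
print(len(top), len(top[0]))
```

With four frames and three layers, the pyramid shrinks from 4 to 3 to 2 to 1 representations, so the single top vector is a function of all four frames. The input size of each layer must match the number of mapping units of the layer below it.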
Result: We evaluated the proposed model on the synthetic bouncing-MNIST dataset, the UCF101 action
recognition benchmark, and the CK+ facial expression benchmark. The experimental results show that,
on all tasks, the proposed model outperforms the existing approach to building deep stacked gated
models and achieves superior performance compared with several recent state-of-the-art techniques.
Conclusion: The experimental results indicate that the proposed model can adapt its structure to
different time scales and can be applied to tasks such as motion estimation, action recognition,
and tracking.