The UCF101 dataset is an extension of UCF50 and consists of 13,320 video clips classified into 101 categories. These 101 categories can be grouped into 5 types (Body motion, Human-human interactions, Human-object interactions, Playing musical i
We use a two-stream CNN (a spatial stream and a motion stream) with ResNet101 to model video information in the UCF101 dataset.

Reference Paper
[1] Two-stream convolutional networks for action recognition in videos
[2] Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
...
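
To illustrate how the outputs of a spatial stream and a motion stream might be combined, here is a minimal sketch of late fusion by weighted averaging of per-class softmax scores. This is an assumption for illustration only: the text above does not state how the two streams are fused. The names `fuse_two_stream`, `predict_class`, and `motion_weight` are hypothetical, and the logits stand in for the outputs of the two ResNet101 streams.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_two_stream(spatial_logits, motion_logits, motion_weight=0.5):
    """Weighted average of the two streams' softmax scores (hypothetical helper).

    motion_weight is an assumed hyperparameter, not a value from the source:
    0.0 uses only the spatial stream, 1.0 uses only the motion stream.
    """
    if len(spatial_logits) != len(motion_logits):
        raise ValueError("both streams must score the same number of classes")
    if not 0.0 <= motion_weight <= 1.0:
        raise ValueError("motion_weight must lie in [0, 1]")
    spatial = softmax(spatial_logits)
    motion = softmax(motion_logits)
    return [
        (1.0 - motion_weight) * s + motion_weight * m
        for s, m in zip(spatial, motion)
    ]


def predict_class(spatial_logits, motion_logits, motion_weight=0.5):
    """Return the index of the class with the highest fused score."""
    fused = fuse_two_stream(spatial_logits, motion_logits, motion_weight)
    return max(range(len(fused)), key=fused.__getitem__)
```

For UCF101 each logit list would have 101 entries, one per category. Averaging probabilities rather than raw logits keeps the two streams on a comparable scale even when their logit magnitudes differ.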