Zero-Shot Action Recognition has attracted attention in recent years, and many approaches have been proposed for recognizing objects, events, and actions in images and videos. There is a demand for methods that can classify instances from classes that are not present during model training...
The model was evaluated on 30 CV datasets, with tasks including OCR, action recognition in videos, geo-localization, an...
Recently, with the ever-growing number of action categories, zero-shot action recognition (ZSAR) has been achieved by automatically mining the underlying concepts (e.g., actions, attributes) in videos. However, most existing methods only exploit the visual cues of these concepts but ignore external knowledge...
Then, we propose a method for action recognition by deploying generalized zero-shot learning, which transfers the knowledge of web video to detect the anomalous actions in surveillance videos. To verify the effectiveness of our proposed method, we further construct a new surveillance video dataset ...
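Because this is Zero-Shot Action Recognition (ZSAR), it involves, as in ZSL, three concepts: video, attribute, and label/action. Two common approaches are shown in panels (a) and (b) of the figure above. The first embeds the attributes and the video and then uses knowledge transfer to recognize and classify actions of unseen classes (action-attribute). The other uses semantic representations (such as the common...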
In this paper, we address zero-shot recognition in contemporary video action recognition tasks, using semantic word vector space as the common space to embed videos and category labels. This is more challenging because the mapping between the semantic space and space-time features of videos ...
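The idea of a shared semantic space can be illustrated with a minimal sketch. It assumes a linear projection from video features into the word vector space and nearest-neighbour matching by cosine similarity; the names `project`, `zero_shot_predict`, `WEIGHTS`, and `LABEL_VECTORS`, and all toy values, are hypothetical and not taken from the paper, which may use a different mapping.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def project(features, weights):
    """Map a video feature vector into the semantic space (assumed linear map)."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def zero_shot_predict(video_features, weights, label_vectors):
    """Return the label whose word vector is closest to the projected video."""
    embedded = project(video_features, weights)
    return max(label_vectors, key=lambda name: cosine(embedded, label_vectors[name]))


# Toy values (assumptions): 3-D "word vectors" for two unseen labels and a
# 3x2 projection matrix that would normally be learned on seen classes.
LABEL_VECTORS = {"surfing": [0.9, 0.1, 0.0], "archery": [0.0, 0.2, 0.9]}
WEIGHTS = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]

prediction = zero_shot_predict([0.8, 0.1], WEIGHTS, LABEL_VECTORS)
print(prediction)
```

In practice the projection would be trained on seen classes only, and the label vectors would come from a pretrained word embedding model rather than hand-written values.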
We present a cross-modal Transformer-based framework, which jointly encodes video data and text labels for zero-shot action recognition (ZSAR). Our model employs a conceptually new pipeline by which visual representations are learned in conjunction with visual-semantic associatio...
Zero-shot spatio-temporal action detection involves identifying a person's actions in a video and recognizing the time and place of these actions without prior training on those specific actions. Large-scale pre-trained vision-language models like CLIP exhibit zero-shot recognition capabilities for ...
Zero-shot learning is essential for solving many real-world problems. For example, in computer vision, zero-shot learning is applied to video action recognition [13] and image recognition [14] tasks, which can be performed on unseen videos and images. Therefore, zero-shot learning has...
Zero-shot action recognition aims to classify actions not previously seen during training. This is achieved by learning a visual model for the seen source classes and establishing a semantic relationship to the unseen target classes, e.g., through the action labels. In order to draw a clear ...
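One possible way to link a seen-class visual model to unseen classes through their labels is sketched below. It uses a convex combination of seen-class label embeddings weighted by the classifier's probabilities, which is a swapped-in illustration and not necessarily the method in the snippet above. The names `semantic_estimate` and `classify_unseen` and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def semantic_estimate(seen_probs, seen_vectors):
    """Average the seen-class label vectors, weighted by normalised probabilities."""
    total = sum(seen_probs.values())
    dim = len(next(iter(seen_vectors.values())))
    estimate = [0.0] * dim
    for name, prob in seen_probs.items():
        for i, value in enumerate(seen_vectors[name]):
            estimate[i] += (prob / total) * value
    return estimate


def classify_unseen(seen_probs, seen_vectors, unseen_vectors):
    """Pick the unseen label closest to the estimated semantic vector."""
    estimate = semantic_estimate(seen_probs, seen_vectors)
    return max(unseen_vectors, key=lambda name: cosine(estimate, unseen_vectors[name]))


# Toy label embeddings (assumptions, not real word vectors).
SEEN_VECTORS = {"running": [1.0, 0.0], "swimming": [0.0, 1.0]}
UNSEEN_VECTORS = {"jogging": [0.9, 0.1], "diving": [0.2, 0.8]}

# Hypothetical output of a visual model trained only on the seen classes.
result = classify_unseen({"running": 0.7, "swimming": 0.3}, SEEN_VECTORS, UNSEEN_VECTORS)
print(result)
```

The visual model itself never sees the target classes; only the label embeddings connect the two sets.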