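Existing temporal action detection (TAD) methods rely on large amounts of training data with segment-level annotations and can recognize only previously seen classes at inference time. Collecting and annotating a large training set for every class of interest is expensive and therefore does not scale. Zero-shot TAD (ZS-TAD) removes this obstacle by enabling a pre-trained model to recognize unseen action classes. At the same time, ZS-TAD is also considerably more challenging...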
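2. Main method. Following classic TAD, the authors recast multi-label TAD as an instance-level detection task in which each action prediction is represented by its start and end times and its action class, which improves localization and yields more complete action prediction boxes. However, directly transferring TAD methods to the multi-label setting is not enough: most TAD methods rely on segment-based representations and build action representations by uniform sampling, which causes the generated...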
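As a minimal sketch of the instance-level representation described above, each prediction can be stored as a start time, an end time, and an action class. The class name `ActionInstance`, the field names, and the example labels are illustrative assumptions, not taken from the paper. The temporal IoU helper is a common way to compare two such spans; it is included only for illustration and is not mentioned in the excerpt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionInstance:
    """One instance-level prediction: a time span plus an action class."""
    start: float  # start time in seconds (hypothetical unit)
    end: float    # end time in seconds
    label: str    # action category


def temporal_iou(a: ActionInstance, b: ActionInstance) -> float:
    """Length of the overlap of two spans divided by the length of their union."""
    intersection = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - intersection
    return intersection / union if union > 0 else 0.0


# Multi-label case: two actions of different classes overlap in time.
predictions = [
    ActionInstance(start=2.0, end=6.0, label="run"),
    ActionInstance(start=4.0, end=8.0, label="jump"),
]
overlap = temporal_iou(predictions[0], predictions[1])
```

Keeping the class label on each instance, rather than one label per time step, is what lets overlapping actions of different classes coexist in the same video.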
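CVPR 2020 paper walkthrough: G-TAD: Sub-Graph Localization for Temporal Action Detection. The authors have released their code at https://github.com/frostinassiky/gtad. Context in video is a key issue, but existing work focuses mainly on temporal context and overlooks semantic context. This paper proposes a GCN model that adaptively fuses multi-level semantic context into the video features, and...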
Existing temporal action detection (TAD) methods rely on large training data including segment-level annotations, limited to recognizing previously seen classes alone during inference. Collecting and annotating a large training set for each class of interest is costly and hence unscalable. Zero-shot ...
TAD head: function. Snippet classification stream: pipeline, formulas. Temporal mask stream: pipeline, formulas. Boundary refinement (interaction mechanism): function, pipeline. Hard snippets: definition, detection. Model training: self-supervised pre-training, semi-supervised fine-tuning, overall loss. [Paper] Semi-Supervised Temporal Action Detection with Pr...
1. Introduction With the development of information technology, the numbers of videos generated and accessed are rapidly increasing, underscoring the need for automatic video understanding, such as human action recognition and temporal action detection (TAD). Action recogni...
TriDet: Temporal Action Detection with Relative Boundary Modeling. Dingfeng Shi, Yujie Zhong, Qiong Cao, Lin Ma, Jia Li, Dacheng Tao. Temporal Action Detection (TAD): detect all action boundaries and categories from an untrimmed video. [Figure: video frames along the time axis T, with example actions "duet dance" and "group dance"] Temporal Action Detection ...
Temporal Action Detection (TAD) is fundamental yet challenging for real-world video applications. Leveraging the unique benefits of transformers, various DETR-based approaches have been adopted in TAD. However, it has recently been identified that the attention collapse in self-attention causes the per...
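(1) Investigates how to leverage large pre-trained ViL models for zero-shot temporal action detection (ZS-TAD) in untrimmed videos. (2) Proposes STALE, a novel one-stage classification and localization model that, alongside its parallel classification and localization design, introduces a learnable class-agnostic mask component to enable zero-shot transfer to unseen classes. To strengthen adaptation across the cross-modal tasks, within the Transformer framework it introduces...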
• TadTR maps the learned action embeddings to the corresponding action predictions through a Transformer encoder-decoder architecture. G-TAD: Sub-Graph Localization for Temporal Action Detection, CVPR 2020; Learning Salient Boundary Feature for Anchor-free Temporal Action Localization, CVPR 2021; End-to-end Temporal Action Detection with ...