We introduce the new setting of open-vocabulary object 6D pose estimation, in which a textual prompt is used to specify the object of interest. In contrast to existing approaches, in our setting (i) the object of interest is specified solely through the textual prompt, (ii) no object model...
* Title: On the Importance of Large Objects in CNN Based Object Detection Algorithms
* PDF: arxiv.org/abs/2311.1171
* Authors: Ahmed Ben Saad, Gabriele Facciolo, Axel Davy
* Title: CastDet: Toward Open Vocabulary Aerial Object Detection with CLIP-Activated Student-Teacher Learning
* PDF: arxiv.org/abs/...
Official implementation of OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion open-world object-detection zero-shot-object-detection open-vocabulary-detection open-vocabulary-segmentation fundation-models ov-dino Updated Sep 15, 2024 Python Charles-Xie / awesome-described-obj...
computer-vision deep-learning pytorch object-detection zero-shot-object-detection open-set-object-detection novel-objects open-vocabulary-detection Updated Oct 29, 2024 Python feifeiobama / OrthogonalDet Star 40 [CVPR 2024] Exploring Orthogonality in Open World Object Detec...
Visual Object Tracking and Segmentation Challenge 2023 11th Workshop on Assistive Computer Vision and Robotics 1st Workshop on Open-Vocabulary 3D Scene Understanding Recovering 6D Object Pose Visual Perception for Navigation in Human Environments: The JackRabbot Human Motion Forecasting Dataset and Benchmar...
Open Vocabulary Scene Parsing Hang Zhao, Xavier Puig, Bolei Zhou, Sanja Fidler, Antonio Torralba [pdf] [supp] [arXiv] [bibtex] @InProceedings{Zhao_2017_ICCV, author = {Zhao, Hang and Puig, Xavier and Zhou, Bolei and Fidler, Sanja and Torralba, Antonio}, title = {Open Vocabu...
Open-Vocabulary Object Detection Universal Semantic Segmentation [Semantic-SAM]|ECCV'24| Semantic-SAM: Segment and Recognize Anything at Any Granularity |[pdf]|[code] [Open-Vocabulary SAM]|ECCV'24| Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively |[pdf]|[code] ...
Using a pre-built vocabulary tree, we first assign each 3D average descriptor to its corresponding K-means cell (visual word). At search time, each query image descriptor is assigned to its closest visual word w. We then select every 3D point in the scene that has the same vis...
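
The visual-word lookup described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a flat set of K-means centroids in place of a hierarchical vocabulary tree, uses brute-force nearest-centroid assignment, and the names `assign_words`, `build_inverted_index`, and `candidate_points` are hypothetical.

```python
import numpy as np
from collections import defaultdict


def assign_words(descriptors, centroids):
    """Return, for each descriptor, the index of its nearest centroid (visual word).

    Assumption: a flat codebook searched exhaustively; a real vocabulary tree
    would descend a hierarchy of centroids instead.
    """
    # Pairwise squared Euclidean distances, shape (n_descriptors, n_words).
    diff = descriptors[:, None, :] - centroids[None, :, :]
    d2 = (diff ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def build_inverted_index(point_descriptors, centroids):
    """Map each visual word to the ids of the 3D points assigned to it."""
    index = defaultdict(list)
    words = assign_words(point_descriptors, centroids)
    for point_id, word in enumerate(words):
        index[int(word)].append(point_id)
    return index


def candidate_points(query_descriptor, centroids, index):
    """Return ids of 3D points that share the query descriptor's visual word."""
    word = int(assign_words(query_descriptor[None, :], centroids)[0])
    return index.get(word, [])


# Toy data (made up for illustration): two visual words, three 3D points.
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
point_descriptors = np.array([[0.1, 0.2], [9.8, 10.1], [0.3, -0.1]])
index = build_inverted_index(point_descriptors, centroids)
matches = candidate_points(np.array([9.0, 9.0]), centroids, index)
```

Here `matches` holds the candidate 3D points for a single query descriptor. The inverted index keeps the per-query cost proportional to the number of visual words rather than the number of 3D points.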