We propose feeding auxiliary perception modalities as control inputs through an additional vision encoder, which we term Versatile visual enCoders (VCoder). In this work, we focus on the task of object perception and leverage a segmentation map, a depth map, or both as the control input...
LiDAR localization is a fundamental task in robotics and computer vision, which estimates the pose of a LiDAR point cloud within a global map. Scene coordinate regression (SCR) has demonstrated state-of-the-art performance in this task. In SCR, a scene ...
Random Entangled Tokens for Adversarially Robust Vision Transformer; Soften to Defend: Towards Adversarial Robustness via Self-Guided Label Refinement; Robust Distillation via Untargeted and Targeted Intermediate Adversarial Samples; MMCert: Provable Defense against Adversarial Attacks to Multi-modal Models; One Prom...
@inproceedings{cui2024anyskill, title={Anyskill: Learning Open-Vocabulary Physical Skill for Interactive Agents}, author={Cui, Jieming and Liu, Tengyu and Liu, Nian and Yang, Yaodong and Zhu, Yixin and Huang, Siyuan}, booktitle={Conference on Computer Vision and Pattern Recognition (CVPR)}, year...
We integrate it with LISA, a vision-language object segmentation framework. Through this, we can further unlock the rich semantics inherent in SAM for interactive universal object segmentation with Event data. There are some visualizations. Acknowledgments: Thanks to VisEvent, COESOT, MVSEC, DDD17,...
212b CellTypeGraph: A New Geometric Computer Vision Benchmark. Lorenzo Cerrone; Athul Vijayan; Tejasvinee Mody; Kay Schneitz; Fred A. Hamprecht
213b ContIG: Self-Supervised Multimodal Contrastive Learning for Medical Imaging With Genetics. Aiham Taleb; Matthias Kirchler; Remo Monti; Christoph Lippert...