In this paper, we investigate how to learn cross-modal scene representations that transfer across modalities. To study this problem, we introduce a new cross-modal scene dataset. While convolutional neural networks can categorize scenes well, they also learn an intermediate representation not aligned ...
In this paper, we investigate how to learn cross-modal scene representations that transfer across modalities. To study this problem, we introduce a new cross-modal scene dataset. While convolutional neural networks can categorize scenes well, they also learn an intermediate representation not aligned ...
However, for a variety of task-driven robots or 3D scene understanding, RGB semantic segmentation cannot provide 3D geometric information, which limits the practical application of semantic segmentation [13]. To address the issue, many works try to fuse depth information during semantic segmentation....
Scene graph generationImage-to-text translationVisual relationship detectionComputer visionUnderstanding a visual scene requires not only identifying single objects ... Y Wang,Y Gao,SHB Yu - Applied Intelligence: The International Journal of Artificial Intelligence, Neural Networks, and Complex Problem-Solv...
We study this hypothesis under the framework of multiview contrastive learning, where we learn a representation that aims to maximize mutual information between different views of the same scene but is otherwise compact. Our approach scales to any number of views, and is view-agnostic. We analyze...
the practical elements and suppress the invalid ones. The method has been validated on the Cornell and Jacquard grasping datasets with cross-validation accuracies of 99.3% and 94.6%. The single-object and multi-object scene grasping success rates on the robot platform are 95.625% and 87.5%, ...
Earlier, sound engineers could program all the reverberation parameters in advance for a scene or if the audience was in a fixed position. However, adjusting the reverberation parameters using conventional methods is difficult because all such parameters cannot be programmed for AR applications. ...
deep-learning optical-flow autonomous-driving mobile-robotics motion-segmentation scene-flow cross-modal-learning 4d-radar automotive-radar ego-motion-estimation Updated Jul 17, 2023 Python RunpeiDong / ACT Star 100 Code Issues Pull requests [ICLR 2023] Autoencoders as Cross-Modal Teachers...
Patch-Based Semantic Labeling of Road Scene Using Colorized Mobile LiDAR Point Clouds Semantic labeling of road scenes using colorized m Wang,Hanyun,Yongtao,... - 《IEEE Transactions on Intelligent Transportation Systems》 被引量: 21发表: 2016年 Laser range data based semantic labeling of places ...
Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval, WACV 2020. [paper] Multi-Modality Cross Attention Network for Image and Sentence Matching, CVPR 2020. [paper] IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval, CVPR 2020. [pap...