Accurate motion estimation is essential for reliable visual perception tasks, enabling, among other things, collision avoidance and the inference of 3D geometry from monocular vision. Hence, the investigation and estimation of motion cues is a central and long-studied problem in computer vision...
To address these issues, we propose a Multi-modal Scale-aware Attention Network (MSAN) that fuses RGB and depth data effectively via a novel transformer-based cross-attention module, the Multi-modal Scale-aware Transformer (MST), which fuses RGB-D features from a global perspective across ...
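As a rough illustration of the general idea of cross-attention fusion, the sketch below lets RGB tokens attend to depth tokens with a single attention head. It is a generic example, not the MST module described above: the function names, the single-head design, the residual connection, and the random projection matrices `w_q`, `w_k`, `w_v` are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(rgb_tokens, depth_tokens, w_q, w_k, w_v):
    """Single-head cross-attention (hypothetical helper).

    rgb_tokens:   (n_rgb, c)   queries come from the RGB stream
    depth_tokens: (n_depth, c) keys and values come from the depth stream
    w_q, w_k:     (c, d) projections; w_v: (c, c) so the residual add is valid
    """
    q = rgb_tokens @ w_q                      # (n_rgb, d)
    k = depth_tokens @ w_k                    # (n_depth, d)
    v = depth_tokens @ w_v                    # (n_depth, c)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # scaled dot-product scores
    weights = softmax(scores, axis=-1)        # each RGB token attends over all depth tokens
    fused = rgb_tokens + weights @ v          # residual fusion (an assumption)
    return fused, weights

# Toy usage with random features and projections.
rng = np.random.default_rng(0)
rgb = rng.standard_normal((6, 8))
depth = rng.standard_normal((4, 8))
fused, weights = cross_attention(
    rgb, depth,
    rng.standard_normal((8, 4)),
    rng.standard_normal((8, 4)),
    rng.standard_normal((8, 8)),
)
print(fused.shape, weights.shape)
```

Because every RGB token attends over every depth token, the fusion is global rather than restricted to a local window, which is the property the excerpt highlights.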
However, recent studies also combine CNNs with Transformers. Li et al. [23] proposed an X-shaped network with a dual encoding-decoding structure that integrates the characteristics of both CNNs and Transformers and achieves good segmentation results.
RGB-D semantic segmentation
The depth image can be ...
leading to precise classification. The training process of the proposed method is stable, and the quality of the generated 3D objects significantly exceeds that of other methods in both subjective visual assessment and objective evaluation metrics. This method can facilitate...
Transformer neural network for protein-specific de novo drug generation as a machine translation problem. Sci. Rep. 11, 321 (2021).
Li, C. et al. Geometry-based molecular generation with deep constrained variational autoencoder. In IEEE 21st International Conference on ...
IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS
Multiscale Mesh Deformation Component Analysis with Attention-based Autoencoders
Jie Yang, Lin Gao∗, Qingyang Tan, Yi-Hua Huang, Shihong Xia and Yu-Kun Lai
Abstract—Deformation component analysis is a fundamental problem in geometry ...
Multi-view Stereo (MVS) [30] is a classical task in 3D computer vision, with the goal of reconstructing 3D geometry from multi-view images. Conventional methods [31, 46, 77, 114] reconstruct 3D geometry by finding the patches matched with different images and estim...
We exploit scene super-patches with consistent geometry information instead of discrete point clouds to overcome these challenges. However, the super-patch representation may lose local geometric details, especially for complex 3D shapes. To counter this potential drawback, we employ a transformer-based...
loss ensures that the generated point cloud closely aligns with the target surface, Chamfer Distance minimizes the overall discrepancy between two point clouds, and Normal Difference helps maintain consistent and accurate surface normals, thereby enhancing the fidelity of the reconstructed 3D geometry. ...
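To make the two named terms concrete, here is a minimal NumPy sketch of a symmetric Chamfer Distance and a nearest-neighbour normal difference. The excerpt does not give the exact formulas, so the definitions below are assumptions: squared Euclidean distances averaged in both directions for Chamfer Distance, and one minus the absolute cosine between unit normals of nearest neighbours for the normal term. The function names are hypothetical.

```python
import numpy as np

def pairwise_sq_dists(a, b):
    # Squared Euclidean distance between every point in a (N, 3) and b (M, 3).
    diff = a[:, None, :] - b[None, :, :]
    return np.sum(diff ** 2, axis=-1)          # (N, M)

def chamfer_distance(a, b):
    # Mean nearest-neighbour distance from a to b, plus the same from b to a.
    d2 = pairwise_sq_dists(a, b)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

def normal_difference(a, normals_a, b, normals_b):
    # For each point in a, compare its unit normal with the normal of its
    # nearest neighbour in b. The absolute value ignores normal orientation
    # (an assumption; other formulations keep the sign).
    nearest = pairwise_sq_dists(a, b).argmin(axis=1)
    cos = np.sum(normals_a * normals_b[nearest], axis=1)
    return np.mean(1.0 - np.abs(cos))

# Toy usage: the target is the source shifted by one unit along z.
src = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
tgt = src + np.array([0.0, 0.0, 1.0])
up = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
print(chamfer_distance(src, tgt))              # 2.0
print(normal_difference(src, up, tgt, up))     # 0.0
```

The dense pairwise matrix costs O(NM) memory, which is fine for small clouds; larger clouds would typically use a spatial index such as a k-d tree for the nearest-neighbour lookup.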
The groundbreaking work of PointNet [1] processes point cloud data based on MLPs, but it lacks consideration of local geometry and interrelationships between points. Subsequently, numerous methods [26,27,28,29,30,31,32,33,34] have emerged to overcome these problems, although fully supervised ...