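Introduction For image-text embedding learning, the authors propose a cross-modal projection matching (CMPM) loss and a cross-modal projection classification (CMPC) loss. The former minimizes the KL divergence between the projection distributions of the two modalities' features; the latter, built on the norm-softmax loss, classifies the projection of modality A's features onto modality B, further strengthening the compatibility between the modalities. The Proposed Algor...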
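The CMPM idea summarized above can be illustrated with a small sketch. It assumes one common formulation: each image feature is projected onto every L2-normalized text feature in the mini-batch, a softmax over those scalar projections gives a predicted matching distribution p, and the loss is KL(p || q) against a label-based distribution q that is uniform over texts sharing the image's identity. The function name `cmpm_image_to_text`, the `eps` smoothing term, and the assumption that image i and text i share `labels[i]` are illustrative choices, not details taken from the snippet.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cmpm_image_to_text(image_feats, text_feats, labels, eps=1e-8):
    """Mean KL(p_i || q_i) over the images of one mini-batch.

    Assumption: image i and text i both carry identity labels[i].
    """
    text_unit = [l2_normalize(z) for z in text_feats]
    n = len(image_feats)
    loss = 0.0
    for i, x in enumerate(image_feats):
        # Predicted matching distribution: softmax over the scalar
        # projections of image i onto each normalized text feature.
        p = softmax([dot(x, z) for z in text_unit])
        # Label matching distribution: uniform over texts with the same identity.
        match = [1.0 if labels[j] == labels[i] else 0.0 for j in range(n)]
        total = sum(match)
        q = [m / total for m in match]
        # eps keeps the logarithm finite where q is zero (hypothetical value).
        loss += sum(pj * math.log(pj / (qj + eps))
                    for pj, qj in zip(p, q) if pj > 0.0)
    return loss / n
```

Calling the function on a batch where each image points toward its own text should give a much smaller value than on a batch where the texts are swapped, which is the behaviour the loss is meant to reward.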
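Compared with the original softmax loss, the norm-softmax loss normalizes all weight vectors to the same length, reducing the influence of weight magnitude when distinguishing different samples. As shown in the figure above, the classification result of the softmax loss depends on \[\left\| W_k \right\| \left\| x \right\| \cos\left(\theta_k\right), \quad k = 1, 2\]. For norm-softmax, ...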
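A toy two-class example may make the effect of weight normalization concrete. The sketch below compares plain logits W_k · x with logits computed after scaling every W_k to unit length; the weight vectors and the input are made-up numbers chosen so that the class with the larger weight norm wins under plain softmax even though the other class is much closer in angle.

```python
import math


def logits(weights, x, normalize=False):
    """Return W_k . x for each class, optionally with unit-length W_k."""
    out = []
    for w in weights:
        if normalize:
            norm = math.sqrt(sum(c * c for c in w))
            w = [c / norm for c in w]
        out.append(sum(wc * xc for wc, xc in zip(w, x)))
    return out


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical weights: class 0 has norm 3, class 1 has norm 1.
W = [[3.0, 0.0], [0.6, 0.8]]
x = [1.0, 2.0]

plain = logits(W, x)                   # driven by ||W_k|| * ||x|| * cos(theta_k)
normed = logits(W, x, normalize=True)  # driven by ||x|| * cos(theta_k) only

print("plain logits:", plain, "-> class", argmax(plain))
print("normalized logits:", normed, "-> class", argmax(normed))
```

With these numbers the plain logits favour class 0 because of its larger weight norm, while the normalized logits favour class 1, whose weight vector forms the smaller angle with x.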
6 Conclusions In this paper, we proposed a novel cross-modal projection matching (CMPM) loss and a cross-modal projection classification (CMPC) loss for learning deep discriminative image-text embeddings. The CMPM loss utilizes the KL divergence to minimize the compatibility score of the ...
Various methods have been proposed to reduce the cross-domain discrepancy by using an adversarial loss, sharing a projection network, using a triplet loss with pairs/triplets of different modalities, or maximizing cross-modal pairwise item correlation [29, 42, 34, 20, 10]. Even though the existing...
On this basis, a cross-modal projection matching (CMPM) constraint is introduced, which minimizes the Kullback-Leibler divergence between the feature projection matching distributions and the label projection matching distributions; label information is used to align similarities between low-dimensional features of ...
Considering the similarity between the modalities, an autoencoder is used to associate the feature projection with the semantic code vector. In addition, regularization and sparsity constraints are applied to the low-dimensional matrices to balance reconstruction errors. The high-dimensional data is ...
Both of them were able to perform cross-modal matching using the air-filled PVC targets with high matching accuracy, but only Ginsan was trained for matching using targets with different material compositions. High-frequency hearing loss may be common in older dolphins [25], but Ginsan was about 10...
Zhang, Y., Lu, H.: Deep cross-modal projection learning for image-text matching. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 707–723. Munich, Germany (2018) Li, S., Cao, M....
Additional PCA projection plots for random pairs of classes in ImageNet [15]. Adding one-shot text as training samples can oftentimes aggressively shift the decision boundary. whereby visual and text examples lie in slightly different parts of the embedding spac...
For set prediction, bipartite matching is applied for one-to-one assignment between predictions and ground-truths. We adopt the focal loss for classification and the L1 loss for 3D bounding box regression: L(y, ŷ) = ω1 Lcls(c, ĉ) + ω2 Lreg(b, ...
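The one-to-one assignment and the weighted loss described above can be sketched as follows. Several pieces are assumptions rather than details from the snippet: the matching cost is taken to be the same weighted sum as the training loss, the weights `w_cls` and `w_reg` and the focal-loss parameters `alpha` and `gamma` are placeholder values, and boxes are plain coordinate lists. The search over permutations is a brute-force stand-in that only suits tiny sets; in practice a Hungarian-algorithm solver such as `scipy.optimize.linear_sum_assignment` would normally be used instead.

```python
import math
from itertools import permutations


def focal_loss(p_true, alpha=0.25, gamma=2.0):
    """Focal loss for the probability assigned to the true class."""
    p_true = min(max(p_true, 1e-12), 1.0)  # clamp to keep log finite
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)


def l1_loss(box, target):
    """Sum of absolute coordinate differences between two boxes."""
    return sum(abs(a - b) for a, b in zip(box, target))


def pair_cost(pred, gt, w_cls=2.0, w_reg=0.25):
    """Weighted classification + regression cost (weights are hypothetical)."""
    cls = focal_loss(pred["probs"][gt["label"]])
    reg = l1_loss(pred["box"], gt["box"])
    return w_cls * cls + w_reg * reg


def match(preds, gts):
    """Brute-force one-to-one assignment; requires len(preds) >= len(gts).

    Returns (assignment, total_cost), where assignment[j] is the index of
    the prediction matched to gts[j].
    """
    best = None
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(pair_cost(preds[i], gts[j]) for j, i in enumerate(perm))
        if best is None or cost < best[1]:
            best = (perm, cost)
    return best
```

For example, with two predictions and two ground-truth boxes, `match(preds, gts)` returns the permutation with the lowest summed cost, and that summed cost is what would be averaged into the training loss under the assumptions stated above.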