Regularizing Class-wise Predictions via Self-knowledge Distillation
In the pre-training stage, the visual encoder and the linguistic encoder are trained with $L_{ccl}$ and $L_{dis}$, so the pre-training loss is $L_{pre} = \lambda L_{ccl} + (1-\lambda) L_{dis}$, where $\lambda \in [0,1]$ is a hyperparameter that balances $L_{ccl}$ and $L_{dis}$. For knowledge distillation, a pre-trained CLIP is used as an auxiliary teacher; it carries richer information, so that CVLP can retain as much ...
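To make the weighting concrete, here is a minimal sketch of the combined objective, assuming the contrastive loss $L_{ccl}$ and the distillation loss $L_{dis}$ have already been computed as scalar tensors; the function name `pretraining_loss` and the default `lam` value are illustrative, not taken from the paper:

```python
import torch

def pretraining_loss(l_ccl: torch.Tensor, l_dis: torch.Tensor,
                     lam: float = 0.5) -> torch.Tensor:
    """Combine the two pre-training terms as
    L_pre = lam * L_ccl + (1 - lam) * L_dis,
    where lam in [0, 1] balances the contrastive and distillation losses."""
    assert 0.0 <= lam <= 1.0, "lambda must lie in [0, 1]"
    return lam * l_ccl + (1.0 - lam) * l_dis

# Usage with dummy scalar losses:
l_ccl = torch.tensor(0.8)   # e.g. cross-modal contrastive loss
l_dis = torch.tensor(1.2)   # e.g. distillation loss against a frozen CLIP teacher
l_pre = pretraining_loss(l_ccl, l_dis, lam=0.5)
```

In practice both terms would come out of the forward pass of the student model (with the CLIP teacher frozen), and `l_pre.backward()` would drive the update.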
Table 8: Comparison of proxy-novel supervision with knowledge distillation from CLIP.

Supervision                      | APr
BCE loss (base classes)          | 24.6 (+0.0)
BCE + distillation (ViLD)        | 25.3 (+0.7)
BCE + proxy loss (ours, Eq. 4)   | 26.2 (+1.6)

Figure 6: ... [only the column/axis labels are recoverable: Proxy-novel, Novel category prior, Mixing, Class selection, APr]