SUR [7] and URT [26] rely on attention mechanisms to select appropriate domain-specific representations for a given few-shot learning task. While these methods perform well, they require forward passes through multiple networks at inference time. In contrast, Li ...