We build a theoretical framework for designing and understanding practical meta-learning methods that integrates sophisticated formalizations of task-similarity with the extensive literature on online convex optimization and sequential prediction algorithms. Our approach enables the task-similarity to be ...
It optimizes the initial parameters of the learner to warm-start the gradient-descent updates, so that new tasks can be solved with a small number of examples. In this paper we elaborate on gradient-based meta-learning, developing two new schemes. First, we present a feedforward ...
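To make the warm-start idea concrete, here is a minimal MAML-style sketch in PyTorch. The model, the task batch format, and the hyperparameters are illustrative placeholders, not the schemes developed in the paper; the sketch only shows how a shared initialization is adapted per task and then updated across tasks.

```python
# Minimal MAML-style warm-start sketch (illustrative; the task format and
# cross-entropy objective are assumptions, not the paper's actual setup).
import torch
import torch.nn as nn

def inner_adapt(model, support_x, support_y, inner_lr=0.01, steps=1):
    """Adapt the shared initialization to one task with a few gradient steps."""
    params = {name: p for name, p in model.named_parameters()}
    for _ in range(steps):
        preds = torch.func.functional_call(model, params, (support_x,))
        loss = nn.functional.cross_entropy(preds, support_y)
        grads = torch.autograd.grad(loss, list(params.values()), create_graph=True)
        params = {name: p - inner_lr * g
                  for (name, p), g in zip(params.items(), grads)}
    return params

def meta_step(model, meta_opt, task_batch):
    """Outer update: improve the shared initialization across a batch of tasks."""
    meta_opt.zero_grad()
    meta_loss = 0.0
    for support_x, support_y, query_x, query_y in task_batch:
        adapted = inner_adapt(model, support_x, support_y)
        preds = torch.func.functional_call(model, adapted, (query_x,))
        meta_loss = meta_loss + nn.functional.cross_entropy(preds, query_y)
    meta_loss.backward()   # gradients flow back to the initialization
    meta_opt.step()
```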
Paper: Adaptive Meta-learner via Gradient Similarity for Few-shot Text Classification. Abstract and Introduction: Many works apply optimization-based meta-learning to few-shot text classification, because it yields a good initialization model and thereby implicitly captures the task distribution. However, these works overlook some issues, and the authors propose the Adaptiv...
Upgrade meta-learner with AMGS: The authors judge whether the gradient obtained from the query set is useful via the cosine similarity between gradients. If this similarity is greater than 0, the gradient is treated as a positive gradient, indicating that the query set helps improve the model's generalization; the query-set sample is therefore kept and used in the outer-update training objective. If the similarity is less than 0, the gradient is treated as a negative gradient, and we remove...
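A hedged sketch of this filtering rule is given below. It assumes the similarity is computed between the flattened support-set gradient and each flattened query-sample gradient, which may not match AMGS's exact pairing or its outer-update objective; the helper names are hypothetical.

```python
# Sketch of the gradient cosine-similarity filter described above.
# Assumption: similarity is taken between the flattened support-set gradient
# and each flattened query-sample gradient; AMGS's exact formulation may differ.
import torch

def flat_grad(loss, params):
    """Return the gradient of `loss` w.r.t. `params`, flattened into one vector."""
    grads = torch.autograd.grad(loss, params, retain_graph=True)
    return torch.cat([g.reshape(-1) for g in grads])

def filter_query_batch(model, support_loss, query_losses):
    """Keep only query samples whose gradient aligns with the support gradient."""
    params = [p for p in model.parameters() if p.requires_grad]
    g_support = flat_grad(support_loss, params)
    kept = []
    for i, q_loss in enumerate(query_losses):
        g_query = flat_grad(q_loss, params)
        sim = torch.nn.functional.cosine_similarity(g_support, g_query, dim=0)
        if sim > 0:          # positive gradient: helps generalization, keep it
            kept.append(i)   # negative gradient: dropped from the outer update
    return kept
```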
Then an adaptive gradient-descent strategy, which exploits the per-dimension update history of the pheromone values to achieve intelligent convergence, is integrated into the ACO algorithm, yielding ADACO. A parallel computation process is also implemented in the algorithm. Finally, ADACO was trialed on ...
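The excerpt does not give ADACO's exact update rule. Below is a hedged, Adagrad-style sketch of how per-dimension update history could scale a pheromone step, purely to illustrate the idea; the function name, constants, and rule are assumptions, not ADACO's actual formulation.

```python
# Hedged sketch: an Adagrad-style per-dimension adaptive step applied to ACO
# pheromone values. Not ADACO's actual rule; only illustrates using update
# history to modulate convergence per dimension.
import numpy as np

def adaptive_pheromone_update(tau, delta_tau, hist, lr=0.1, eps=1e-8):
    """tau: pheromone matrix; delta_tau: raw deposit this iteration;
    hist: running sum of squared deposits (same shape as tau)."""
    hist = hist + delta_tau ** 2                      # accumulate per-dimension history
    step = lr * delta_tau / (np.sqrt(hist) + eps)     # heavily-updated dimensions move less
    return tau + step, hist
```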
It is also worth mentioning that, despite increasing the number of hidden layers and neurons, the performance of the ANN did not improve, possibly due to overfitting or vanishing-gradient problems [71]. Based on the evaluation metrics employed in this study, the optimal model is characterised by minimal ...
Both collaborative and local updating use the stochastic gradient descent (SGD) optimizer with a batch size of 32 and a learning rate of 0.01. Datasets. Experiments are conducted on two scenarios [27, 23], including Label Shift: CIFAR-100 [18], and Tiny-ImageNet, an...
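For reference, the quoted training settings correspond to the following PyTorch configuration sketch; the `model` and `train_dataset` objects are placeholders, not the paper's actual code.

```python
# Sketch of the quoted settings: SGD, batch size 32, learning rate 0.01.
# `model` and `train_dataset` are assumed to be defined elsewhere.
import torch
from torch.utils.data import DataLoader

loader = DataLoader(train_dataset, batch_size=32, shuffle=True)   # batch size 32
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)          # learning rate 0.01
```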
Controlling aerodynamic forces in turbulent conditions is crucial for UAV operation. Traditional reactive methods often struggle due to unpredictable flow and sensor noise. We present FALCON (Fourier Adaptive Learning and Control), a model-based reinforcement learning ...