Then, an adaptive gradient descent strategy, which exploits the update history of per-dimension pheromones to achieve intelligent convergence, is integrated into the ACO algorithm, yielding ADACO. A parallel computation scheme is also implemented to improve computational efficiency. Finally, ...
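A minimal sketch of the kind of per-dimension adaptive update this snippet describes, assuming an Adam-style accumulation of past pheromone updates; the names `adaco_update`, `pheromone`, and `grad` are illustrative and not the paper's actual implementation.

```python
import numpy as np

def adaco_update(pheromone, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style adaptive step on a per-dimension pheromone vector.

    `grad` is a per-entry improvement signal; `m` and `v` store running first
    and second moments of the update history, giving each dimension its own
    effective step size.
    """
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)                     # bias-corrected first moment
    v_hat = v / (1 - beta2**t)                     # bias-corrected second moment
    pheromone = pheromone - lr * m_hat / (np.sqrt(v_hat) + eps)
    return np.clip(pheromone, 1e-6, None), m, v    # keep pheromones strictly positive
```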
A few efforts have been made to solve decentralized nonconvex strongly-concave (NCSC) minimax-structured optimization; however, all of them focus on smooth problems with at most a constraint on the maximization variable. In this paper, we make the first attempt at solving composite NCSC minimax ...
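For orientation, a single-node sketch of proximal gradient descent ascent on a composite NCSC problem min_x max_y f(x, y) + λ‖x‖₁; this only illustrates the problem structure, not the decentralized method the snippet refers to, and all names here are hypothetical.

```python
import numpy as np

def soft_threshold(x, tau):
    # prox of tau * ||x||_1, the nonsmooth composite term on the minimization variable
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def prox_gda(grad_x, grad_y, x, y, steps=1000, eta_x=1e-2, eta_y=1e-1, lam=1e-3):
    """Proximal gradient descent ascent for min_x max_y f(x, y) + lam * ||x||_1."""
    for _ in range(steps):
        y = y + eta_y * grad_y(x, y)                                # ascent on the strongly concave variable
        x = soft_threshold(x - eta_x * grad_x(x, y), eta_x * lam)   # prox-gradient descent on x
    return x, y
```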
Convergence results are also derived in the particular case in which the problem is unconstrained, and even when inexact directions are used as descent directions. Furthermore, we investigate the application of the proposed method to optimization models where the domain of the variable order map ...
Stochastic Gradient Descent Jittering for Inverse Problems: Alleviating the Accuracy-Robustness Tradeoff. Peimeng Guan, Mark A. Davenport, Georgia Institute of Technology, Atlanta, GA 30332 USA, {pguan6, mdav}@gatech.edu. Abstract: Inverse problems aim to reconstruct unseen data from corrupted or ...
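A minimal sketch of what an input-jittering training step for an inverse-problem reconstruction network could look like, assuming Gaussian perturbations of the corrupted measurements; this is an illustration of the general idea, not the paper's actual algorithm, and `model`, `y`, and `x_true` are placeholders.

```python
import torch

def jittered_sgd_step(model, optimizer, y, x_true, sigma=0.05):
    """One SGD step where the measurement y is perturbed ("jittered") before
    reconstruction, trading a small amount of accuracy for robustness."""
    optimizer.zero_grad()
    y_jit = y + sigma * torch.randn_like(y)        # Gaussian jitter on the corrupted data
    x_hat = model(y_jit)                           # reconstruction from the jittered input
    loss = torch.nn.functional.mse_loss(x_hat, x_true)
    loss.backward()
    optimizer.step()
    return loss.item()
```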
Stochastic Recursive Gradient Descent Ascent for Stochastic Nonconvex-Strongly-Concave Minimax Problems
Some Open Problems: Although classic machine learning theory would predict that deep learning should suffer from severe overfitting, in practice deep learning usually performs very well, which suggests some underlying regularizing effect, such as SGD itself. However, value-based methods are in fact not gradient descent, so one possible research direction is whether this kind of "magic" also ...
interaction data. It uses Bayesian Personalized Ranking (BPR) and a variant of Weighted Approximate-Rank Pairwise (WARP) loss to learn model weights via Stochastic Gradient Descent (SGD). It can optionally incorporate sample weights and user/item auxiliary features to augment the main interaction data...
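To make the BPR-via-SGD part concrete, a toy NumPy sketch of one pairwise update on matrix-factorization embeddings; this is a generic illustration of the BPR gradient step, not the library's actual implementation, and `U`, `V`, `bpr_sgd_step` are made-up names.

```python
import numpy as np

def bpr_sgd_step(U, V, u, i, j, lr=0.05, reg=1e-4):
    """One BPR update: user u should rank observed item i above sampled negative j.

    U and V are user and item embedding matrices; gradients follow from
    maximizing log sigmoid(score(u, i) - score(u, j)).
    """
    u_f = U[u].copy()
    x_uij = u_f @ (V[i] - V[j])              # score difference between positive and negative item
    g = 1.0 / (1.0 + np.exp(x_uij))          # derivative of -log(sigmoid(x_uij))
    U[u] += lr * (g * (V[i] - V[j]) - reg * u_f)
    V[i] += lr * (g * u_f - reg * V[i])
    V[j] += lr * (-g * u_f - reg * V[j])
```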
The minimization is typically performed by means of a gradient descent algorithm [58], such as the L-BFGS method [59] (a quasi-Newton optimizer) or ADAM [60]. The computational cost of training is strongly tied to the complexity of the neural network, which depends on its depth (...
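As a hedged sketch of the two optimizer choices named above, assuming a PyTorch model and a user-supplied `loss_fn`; the training loop itself is hypothetical, only `torch.optim.LBFGS` and `torch.optim.Adam` are the actual library optimizers.

```python
import torch

def train(model, loss_fn, use_lbfgs=True, steps=200):
    """Minimize loss_fn(model) with either L-BFGS (quasi-Newton) or Adam."""
    if use_lbfgs:
        opt = torch.optim.LBFGS(model.parameters(), lr=1.0, max_iter=20)

        def closure():
            opt.zero_grad()
            loss = loss_fn(model)
            loss.backward()
            return loss

        for _ in range(steps):
            opt.step(closure)      # L-BFGS re-evaluates the loss internally via the closure
    else:
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        for _ in range(steps):
            opt.zero_grad()
            loss_fn(model).backward()
            opt.step()             # first-order adaptive gradient step
```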
Building upon the previously established equivalence of the basic scheme of Moulinec–Suquet's FFT-based computational homogenization method with a gradient descent method, this work concerns the impact of the fast gradient method of Nesterov in the context of computational homogenization. Nesterov's...
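A small sketch of Nesterov's fast gradient update rule that the snippet refers to, for a smooth objective with Lipschitz gradient constant L; this illustrates the acceleration scheme only, not the FFT-based homogenization solver itself.

```python
import numpy as np

def nesterov_gd(grad, x0, L, steps=500):
    """Nesterov's fast gradient method: plain gradient steps taken at an
    extrapolated (momentum) point instead of the current iterate."""
    x_prev = x0.copy()
    x = x0.copy()
    for k in range(1, steps + 1):
        y = x + (k - 1) / (k + 2) * (x - x_prev)   # extrapolation / momentum step
        x_prev, x = x, y - grad(y) / L             # gradient step with step size 1/L
    return x
```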
(cf. Eq. 3) for the correct fluid-flow physics, yielded by the PDE, and the approximate physics, yielded by the trained FNO. The results of these inversions after 100 iterations of gradient descent with backtracking line search [87] are plotted in Fig. 8a and b. From these plots, we observe that ...
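For reference, a generic sketch of gradient descent with a backtracking (Armijo) line search of the kind cited above; `f` and `grad` are placeholders for the actual inversion objective and its gradient, and the step-size constants are illustrative defaults.

```python
import numpy as np

def gd_backtracking(f, grad, x, iters=100, alpha0=1.0, c=1e-4, rho=0.5):
    """Gradient descent where each step size is found by backtracking until the
    Armijo sufficient-decrease condition holds."""
    for _ in range(iters):
        g = grad(x)
        alpha = alpha0
        # shrink the step until f decreases by at least c * alpha * ||g||^2
        while f(x - alpha * g) > f(x) - c * alpha * np.dot(g, g):
            alpha *= rho
        x = x - alpha * g
    return x
```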