Q. Meng, W. Chen, Y. Wang, Z.-M. Ma, and T.-Y. Liu, "Convergence analysis of distributed stochastic gradient descent with shuffling," NIPS, 2017.
Variance reduction (VR) methods boost the performance of stochastic gradient descent (SGD) by enabling the use of larger stepsizes and preserving linear convergence rates. However, current variance-reduced SGD methods require either high memory usage or a full pass over the (large) data ...
Unlike existing distributed stochastic gradient schemes, CentralVR exhibits linear performance gains up to thousands of cores for massive datasets. S. De, G. Taylor, and T. Goldstein, "Scaling up distributed stochastic gradient descent using variance ...
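To make the variance-reduction idea concrete, here is a minimal SVRG-style sketch (a close relative of the centralized VR schemes discussed above, not CentralVR itself); `grad_fn`, the stepsize, and the loop lengths are illustrative assumptions, not values from the cited paper.

```python
import numpy as np

def svrg(grad_fn, data, w0, step=0.1, outer_iters=10, inner_iters=50, seed=0):
    """Minimal SVRG-style variance-reduced SGD sketch (single node).

    grad_fn(w, x) returns the gradient of one sample's loss at w.
    The full-gradient snapshot at an anchor point lets the inner steps
    use a larger fixed stepsize than plain SGD would tolerate.
    """
    rng = np.random.default_rng(seed)
    w = w0.copy()
    for _ in range(outer_iters):
        w_anchor = w.copy()
        # Full-gradient snapshot at the anchor point.
        full_grad = np.mean([grad_fn(w_anchor, x) for x in data], axis=0)
        for _ in range(inner_iters):
            x = data[rng.integers(len(data))]
            # Control-variate correction keeps the update unbiased
            # while shrinking its variance.
            g = grad_fn(w, x) - grad_fn(w_anchor, x) + full_grad
            w -= step * g
    return w
```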
The distributed stochastic gradient descent method is widely used for training large-scale machine learning models. However, communication latency can slow down its convergence. Thus, [25] proposed a distributed stochastic gradient descent method with delayed updates to mitigate this issue...
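As a rough illustration of delayed updates, the sketch below applies each stochastic gradient a fixed number of steps after it was computed, mimicking communication latency; the function names and the fixed delay are assumptions, not the scheme of [25].

```python
from collections import deque
import numpy as np

def delayed_sgd(grad_fn, sample_stream, w0, step=0.05, delay=4):
    """Sketch of SGD with delayed updates: the gradient applied at step t
    was computed at the parameters from step t - delay, mimicking the
    latency between workers and a parameter server."""
    w = w0.copy()
    buffer = deque()  # gradients waiting to be applied
    for x in sample_stream:
        buffer.append(grad_fn(w, x))      # gradient at current (soon stale) params
        if len(buffer) > delay:
            w -= step * buffer.popleft()  # apply the gradient delayed by `delay` steps
    while buffer:                         # flush remaining stale gradients
        w -= step * buffer.popleft()
    return w
```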
Distributed stochastic gradient descent (SGD) algorithms are becoming popular in speeding up deep learning model training by employing multiple computational devices (named workers) in parallel. Top-k sparsification, a mechanism where each worker communicates only a small number of the largest gradient entries (by ab...
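A minimal sketch of top-k gradient sparsification is given below, here paired with error feedback, a common companion technique that is not necessarily the one analyzed in the work above; all identifiers are illustrative.

```python
import numpy as np

def topk_sparsify(grad, k, residual):
    """Top-k sparsification of a flat gradient vector with error feedback:
    only the k largest-magnitude entries are communicated; the rest are
    kept locally and folded into the next gradient."""
    acc = grad + residual                        # fold in leftover error
    idx = np.argpartition(np.abs(acc), -k)[-k:]  # indices of the k largest |values|
    sparse = np.zeros_like(acc)
    sparse[idx] = acc[idx]                       # values actually communicated
    new_residual = acc - sparse                  # values withheld this round
    return sparse, new_residual
```

In practice each worker would transmit only the index/value pairs, so the per-round communication scales with k rather than with the model size.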
Distributed machine learning (ML) has triggered tremendous research interest in recent years. Stochastic gradient descent (SGD) is one of the most popular algorithms for training ML models, and has been implemented in almost all distributed ML systems, such as Spark MLlib, ...
Distributed Stochastic Gradient Descent with Cost-Sensitive and Strategic Agents. Abdullah Basar Akbay, Cihan Tepedelenlioglu, School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, Arizona. aakbay@asu.edu, cihan@asu.edu. Abstract—This study considers a federated learning setup where cost...
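For orientation, the sketch below shows a generic federated SGD round in which a server averages local updates weighted by each agent's data size; the cost-sensitive and strategic aspects studied in the paper are not modeled here, and all identifiers and hyperparameters are assumptions.

```python
import numpy as np

def federated_round(global_w, agents, step=0.05, local_steps=5, seed=0):
    """One generic federated SGD round: each agent runs a few local SGD
    steps from the current global model, and the server averages the
    resulting models weighted by local data size. How much data or effort
    an agent chooses to contribute (the strategic part) is taken as given."""
    rng = np.random.default_rng(seed)
    updates, weights = [], []
    for data, grad_fn in agents:              # agent = (local data, per-sample grad fn)
        w = global_w.copy()
        for _ in range(local_steps):
            x = data[rng.integers(len(data))]
            w -= step * grad_fn(w, x)
        updates.append(w)
        weights.append(len(data))
    weights = np.asarray(weights, dtype=float) / sum(weights)
    return sum(wt * u for wt, u in zip(weights, updates))
```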
The low-rank matrix factorization models effectively reduce the size of the parameter space, while the asynchronous distributed stochastic gradient descent algorithms enable fast completion of the adjacency matrix. We validate the proposed algorithms using two real-world datasets on a distributed shared-...
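The sketch below shows the per-entry SGD update for low-rank completion of an adjacency matrix A ≈ UVᵀ, written serially; an asynchronous deployment would run this loop concurrently across workers in a lock-free (Hogwild-style) fashion. The identifiers and hyperparameters are assumptions, not those of the cited algorithms.

```python
import numpy as np

def mf_sgd_epoch(entries, U, V, step=0.01, reg=0.05):
    """One SGD epoch for low-rank matrix completion, A[i, j] ~ U[i] . V[j].
    Asynchronous workers would run this loop concurrently on shards of
    `entries`, updating the shared factors without locks; shown serially here."""
    for i, j, a_ij in entries:                 # observed adjacency entries
        err = a_ij - U[i] @ V[j]
        # Each entry touches only two factor rows, which is what keeps
        # lock-free parallel updates mostly collision-free.
        U[i] += step * (err * V[j] - reg * U[i])
        V[j] += step * (err * U[i] - reg * V[j])
    return U, V
```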
Distributed Stochastic Gradient Descent with Event-Triggered Communication: asymptotic mean-square convergence to a critical point is established, and the convergence rate of the proposed algorithm is provided. The developed algorithm is applied to a distributed supervised learning problem in which a set of networked agents jointly train their individual neural networks to perform image classification. The results show that the distributedly trained networks are able to produce ... a distributed event-triggered stochastic gradient desc...
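To illustrate event-triggered communication, the sketch below has a worker broadcast its state only when it has drifted from the last broadcast value by more than a threshold; the trigger rule, class, and names are illustrative assumptions, not the exact condition analyzed in the cited work.

```python
import numpy as np

class EventTriggeredWorker:
    """Sketch of event-triggered communication in distributed SGD: a worker
    broadcasts its parameters only when they deviate from the last broadcast
    by more than a threshold, saving communication between trigger events."""

    def __init__(self, w0, threshold=1e-2):
        self.w = w0.copy()
        self.last_sent = w0.copy()
        self.threshold = threshold

    def local_step(self, grad, step=0.05):
        # Plain local SGD step on this worker's data.
        self.w -= step * grad

    def maybe_send(self):
        # Trigger condition: deviation from the last broadcast exceeds the threshold.
        if np.linalg.norm(self.w - self.last_sent) > self.threshold:
            self.last_sent = self.w.copy()
            return self.w.copy()   # message to neighbours
        return None                # stay silent this round
```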
Besides the developments in SGD, a couple of papers have also appeared on stochastic coordinate descent. As for distributed optimization, this area...