The Three Weight Algorithm: A Method for Large-Scale Distributed Optimization
ABSTRACT: The alternating direction method of multipliers (ADMM) originated in the 1970s as a decomposition-coordination method for solving optimization problems. It has recently had a revival of interest because of its highly distributed and parallel nature. It is typically applied when the objective function can be decomposed into a sum of functions over a set of shared variables that can separately be solved efficiently. ADMM iteratively optimizes each function independently and uses update rules that enforce a consensus among the values of the shared variables. These iterations are specified up to the value of a penalty parameter. In the first part of this talk we revisit ADMM and explain how it can be interpreted as a message-passing algorithm on a bipartite graph. In this interpretation, each message has its own penalty parameter, now called a “message-weight.” Informally, these can be seen as measuring the reliability or confidence of the messages. From the intuition gained from our message-passing interpretation, we introduce the Three Weight Algorithm (TWA), a set of rules for dynamically updating the message-weights. The message- weights are restricted to three different values: infinity for absolute confidence, zero for complete lack of confidence and 1 as a default confidence level. This simple change in the algorithm can have an enormous impact on the speed with which it finds solutions for non- convex problems. We demonstrate that TWA can surprisingly quickly solve difficult non-convex problems with a variety of examples. In a first example, TWA is used to solve very large Sudoku puzzles. Here, the infinite weights play a major role and allow the propagation of information through hard constraints. In the second example, TWA is used to pack a very large number of hard disks in a box. Now the zero weights play the major role and speed up convergence several orders of magnitude compared to standard ADMM. In a third example, we show that TWA can solve difficult multi-robot trajectory planning problems. Finally we conclude by showing that the TWA naturally enables the integration of higher-order knowledge into a lower-level parallel optimization algorithm.
BIO: José Bento is a postdoctoral researcher at the Disney Research, Boston lab and part of the research group led by Jonathan Yedidia (Senior Research Scientist). He completed his PhD in Electrical Engineering at Stanford University where he worked with Professor Andrea Montanari on statistical inference and structural learning of graphical models. Currently he is working on proximal algorithms for distributed optimization. In 2011 he received the SIGWEB DocEng Best paper award and won the RecSys-CAMRa2011 Challenge on context-aware recommendation systems. In 2014 he received a Disney Inventor Award for his work on distributed optimization.