We propose and analyze a new dynamical system with \textit{a closed-loop control law} in a Hilbert space $\mathcal{H}$, aiming to shed light on the acceleration phenomenon for \textit{monotone inclusion} problems, which unify a broad class of optimization, saddle point and variational inequality (VI) problems under a single framework. Given an operator $A: \mathcal{H} \rightrightarrows \mathcal{H}$ that is maximal monotone, we study a closed-loop control system that is governed by the operator $I - (I + \lambda(t)A)^{-1}$ where $\lambda(\cdot)$ is tuned by the resolution of the algebraic equation $\lambda(t)\|(I + \lambda(t)A)^{-1}x(t) - x(t)\|^{p-1} = \theta$ for some $\theta \in (0, 1)$. Our first contribution is to prove the existence and uniqueness of a global solution via the Cauchy-Lipschitz theorem. We present a Lyapunov function that allows for establishing the weak convergence of trajectories and strong convergence results under additional conditions. We establish a global ergodic rate of $O(t^{-(p+1)/2})$ in terms of a gap function and a global pointwise rate of $O(t^{-p/2})$ in terms of a residue function. Local linear convergence is established in terms of a distance function under an error bound condition. Further, we provide an algorithmic framework based on implicit discretization of our system in a Euclidean setting, generalizing the large-step HPE framework of~\citet{Monteiro-2012-Iteration}. While the discrete-time analysis is a simplification and generalization of the previous analysis for bounded domains, it is motivated by the aforementioned continuous-time analysis, illustrating the fundamental role that the closed-loop control plays in acceleration in monotone inclusion. A highlight of our analysis is a set of new results concerning $p$-th order tensor algorithms for monotone inclusion problems, which complement the recent analysis for saddle point and VI problems.
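As an illustration of the closed-loop law, the following minimal sketch solves the algebraic equation for $\lambda$ by bisection and integrates the resulting system with explicit Euler steps. Everything specific here is an assumption made for illustration and is not taken from the paper: the operator is the scalar linear map $Ax = ax$ with $a > 0$, the dynamics are taken to be $\dot{x}(t) = (I + \lambda(t)A)^{-1}x(t) - x(t)$, the integrator is explicit (the paper's framework uses implicit discretization), and the names `resolvent_residual`, `solve_lambda` and `simulate` are hypothetical.

```python
def resolvent_residual(x, lam, a):
    """|(I + lam*A)^{-1} x - x| for the scalar operator A x = a * x (assumed)."""
    return abs(x) * lam * a / (1.0 + lam * a)


def solve_lambda(x, a, p, theta, iters=200):
    """Solve lam * |(I + lam*A)^{-1} x - x|^(p-1) = theta for lam by bisection."""
    if x == 0.0:
        raise ValueError("x = 0 already solves the inclusion; lambda is undefined")

    def phi(lam):
        return lam * resolvent_residual(x, lam, a) ** (p - 1)

    lo, hi = 0.0, 1.0
    # phi is continuous and increasing in lam, so doubling hi brackets the root.
    while phi(hi) < theta:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if phi(mid) < theta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulate(x0, a=1.0, p=2, theta=0.5, dt=0.01, steps=500):
    """Explicit Euler on the assumed dynamics x' = (I + lam*A)^{-1} x - x."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        lam = solve_lambda(x, a, p, theta)   # feedback: lam depends on the state
        x = x + dt * (x / (1.0 + lam * a) - x)
        xs.append(x)
    return xs
```

In this toy setting, $\lambda$ grows as the residual shrinks, which is the feedback mechanism the abstract associates with acceleration; the sketch makes no claim about the rates proved in the paper.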
This paper considers the problem of real-time control and learning in dynamic systems subjected to parametric uncertainties and proposes a controller that combines Adaptive Control (AC) in the inner loop and a Reinforcement Learning (RL) based policy in the outer loop. Two classes of nonlinear dynamic systems are considered, both of which are control-affine. The first class of dynamic systems utilizes equilibrium points with expansion forms around these points and employs a Lyapunov approach. The second class of nonlinear systems uses contraction theory as the underlying framework. For both classes of systems, the AC-RL controller is shown to lead to online policies that guarantee stability, and leverage accelerated convergence properties using a high-order tuner. Additionally, for the second class of systems, the AC-RL controller is shown to lead to parameter learning with persistent excitation. Numerical validations of all algorithms are carried out using a quadrotor landing task on a moving platform and other academic examples. All results clearly point out the advantage of the proposed integrative AC-RL approach.
Network management is a fundamental ingredient for efficient operation of wireless networks. With increasing bandwidth, number of antennas and number of users, the amount of information required for network management increases significantly. Therefore, distributed network management is a key to efficient operation of future networks. This paper focuses on the problem of distributed joint beamforming control and power allocation in ad-hoc mmWave networks. Over the shared spectrum, a number of multi-input-multi-output links attempt to minimize their supply power by simultaneously finding the locally optimal power allocation and beamformers in a self-organized manner. Our design considers a family of non-convex quality-of-service constraints and utility functions characterized by monotonicity in the strategies of the various users. We propose a two-stage, decentralized optimization scheme, where the adaptation of power levels and beamformer coefficients is iteratively performed by each link. We first prove that given a set of receive beamformers, the power allocation stage converges to an optimal generalized Nash equilibrium of the generalized power allocation game. Then we prove that iterative minimum-mean-square-error adaptation of the receive beamformer results in an overall converging scheme. Several transmit beamforming schemes requiring different levels of information exchange are also compared in the proposed allocation framework. Our simulation results show that allowing each link to optimize its transmit filters using the direct channel results in near-optimal performance with very low computational complexity, even though the problem is highly non-convex.
In this work, we investigate stochastic quasi-Newton methods for minimizing a finite sum of cost functions over a decentralized network. In Part I, we develop a general algorithmic framework that incorporates stochastic quasi-Newton approximations with variance reduction so as to achieve fast convergence. At each time step, each node constructs a local, inexact quasi-Newton direction that asymptotically approaches the global, exact one. To be specific, (i) a local gradient approximation is constructed by using dynamic average consensus to track the average of variance-reduced local stochastic gradients over the entire network; (ii) a local Hessian inverse approximation is assumed to be positive definite with bounded eigenvalues, and Part II describes how to construct it so that these assumptions hold. Compared to the existing decentralized stochastic first-order methods, the proposed general framework introduces second-order curvature information without incurring extra sampling or communication. With a fixed step size, we establish the conditions under which the proposed general framework converges linearly to an exact optimal solution.
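The gradient-tracking step in (i) relies on dynamic average consensus. A minimal sketch of that generic recursion is given below, assuming a doubly stochastic mixing matrix `W` and scalar local signals standing in for the variance-reduced gradients; the function name `track_average` and the data layout are hypothetical, and the sketch omits the quasi-Newton and variance-reduction parts of the framework.

```python
def track_average(W, signals):
    """Dynamic average consensus on scalar signals.

    W        -- n x n doubly stochastic mixing matrix (list of lists), assumed
    signals  -- list over time of length-n lists; signals[k][i] is node i's
                local signal at time k (a stand-in for a local gradient)
    Returns the list over time of the trackers y[k][i].
    """
    n = len(W)
    y = list(signals[0])            # each node starts from its own signal
    history = [list(y)]
    for prev, curr in zip(signals, signals[1:]):
        # Mix with neighbours, then add the local change in the signal.
        mixed = [sum(W[i][j] * y[j] for j in range(n)) for i in range(n)]
        y = [mixed[i] + curr[i] - prev[i] for i in range(n)]
        history.append(list(y))
    return history
```

Because the columns of `W` sum to one, the network average of the trackers equals the network average of the signals at every step, so each node's tracker can only differ from the global average by a consensus error that the mixing step contracts.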
We determine the exact minimax rate of a Gaussian sequence model under bounded convex constraints, purely in terms of the local geometry of the given constraint set $K$. Our main result shows that the minimax risk (up to constant factors) under the squared $L_2$ loss is given by $\epsilon^{*2} \wedge \operatorname{diam}(K)^2$ with \begin{align*} \epsilon^* = \sup \bigg\{\epsilon : \frac{\epsilon^2}{\sigma^2} \leq \log M^{\operatorname{loc}}(\epsilon)\bigg\}, \end{align*} where $\log M^{\operatorname{loc}}(\epsilon)$ denotes the local entropy of the set $K$, and $\sigma^2$ is the variance of the noise. We utilize our abstract result to re-derive known minimax rates for some special sets $K$ such as hyperrectangles, ellipses, and more generally quadratically convex orthosymmetric sets. Finally, we extend our results to the unbounded case with known $\sigma^2$ to show that the minimax rate in that case is $\epsilon^{*2}$.
In this paper, we propose a deep learning based numerical scheme for strongly coupled FBSDEs stemming from stochastic control. It is a modification of the deep BSDE method in which the initial value of the backward equation is not a free parameter, and in which the new loss function is the weighted sum of the cost of the control problem and a variance term that coincides with the mean squared error in the terminal condition. We show by a numerical example that a direct extension of the classical deep BSDE method to FBSDEs fails for a simple linear-quadratic control problem, and explain why the new method works. Under regularity and boundedness assumptions on the exact controls of the continuous-time and discrete-time control problems, we provide an error analysis for our method. We show empirically that the method converges for three different problems, one of which is the problem for which the direct extension of the deep BSDE method failed.
The short-time Fourier transform (STFT) is the most common window-based approach for analyzing the spectrotemporal dynamics of time series. To mitigate the effects of high variance on the spectral estimates due to finite-length, independent STFT windows, the state-space multitaper (SSMT) method uses a state-space framework to introduce dependency among the spectral estimates. However, the assumed time-invariance of the state-space parameters makes the spectral dynamics difficult to capture when the time series is highly nonstationary. We propose an adaptive SSMT (ASSMT) method as a time-varying extension of SSMT. ASSMT tracks highly nonstationary dynamics by adaptively updating the state parameters and Kalman gains using a heuristic, computationally efficient exponential smoothing technique. In analyses of simulated data and real human electroencephalogram (EEG) recordings, ASSMT showed improved denoising and smoothing properties relative to standard multitaper and SSMT approaches.
In this paper, we study a non-local approximation of the time-dependent (local) Eikonal equation with Dirichlet-type boundary conditions, where the kernel in the non-local problem is properly scaled. Based on the theory of viscosity solutions, we prove existence and uniqueness of the viscosity solutions of both the local and non-local problems, as well as regularity properties of these solutions in time and space. We then derive error bounds between the solution to the non-local problem and that of the local one, both in continuous time and under backward Euler time discretization. We then turn to studying continuum limits of non-local problems defined on random weighted graphs with $n$ vertices. In particular, we establish that if the kernel scale parameter decreases at an appropriate rate as $n$ grows, then almost surely, the solution of the problem on graphs converges uniformly to the viscosity solution of the local problem as the time step vanishes and the number of vertices $n$ grows large.
The principle of majorization-minimization (MM) provides a general framework for eliciting effective algorithms to solve optimization problems. However, MM algorithms often suffer from slow convergence, especially in large-scale and high-dimensional data settings. This has drawn attention to acceleration schemes designed exclusively for MM algorithms, but many existing designs are either problem-specific or rely on approximations and heuristics loosely inspired by the optimization literature. We propose a novel, rigorous quasi-Newton method for accelerating any valid MM algorithm, cast as seeking a fixed point of the MM \textit{algorithm map}. The method does not require specific information or computation from the objective function or its gradient and enjoys a limited-memory variant amenable to efficient computation in high-dimensional settings. By connecting our approach to Broyden's classical root-finding methods, we establish convergence guarantees and identify conditions for linear and super-linear convergence. These results are validated numerically and compared to peer methods in a thorough empirical study, showing that the method achieves state-of-the-art performance across a diverse range of problems.
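To make the fixed-point view concrete, the sketch below applies Broyden's classical "good" method (not the paper's proposed scheme) to the residual $G(x) = F(x) - x$ of a generic map $F$, using only evaluations of $F$. The function name `broyden_fixed_point`, the dense inverse-Jacobian storage, and the initialization $H_0 = -I$ are illustrative assumptions; with this initialization the first step is a plain application of $F$.

```python
import math


def broyden_fixed_point(F, x0, tol=1e-10, max_iter=100):
    """Seek x with F(x) = x via Broyden's 'good' method on G(x) = F(x) - x."""
    n = len(x0)
    x = list(x0)
    g = [fi - xi for fi, xi in zip(F(x), x)]
    # H approximates the inverse Jacobian of G; H = -I gives x1 = F(x0).
    H = [[-1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_iter):
        if math.sqrt(sum(gi * gi for gi in g)) < tol:
            break
        # Quasi-Newton step s = -H g.
        s = [-sum(H[i][j] * g[j] for j in range(n)) for i in range(n)]
        x_new = [xi + si for xi, si in zip(x, s)]
        g_new = [fi - xi for fi, xi in zip(F(x_new), x_new)]
        y = [a - b for a, b in zip(g_new, g)]
        # Sherman-Morrison form of the update:
        # H <- H + (s - H y) (s^T H) / (s^T H y)
        Hy = [sum(H[i][j] * y[j] for j in range(n)) for i in range(n)]
        sTH = [sum(s[i] * H[i][j] for i in range(n)) for j in range(n)]
        denom = sum(si * hyi for si, hyi in zip(s, Hy))
        if denom != 0.0:
            u = [(si - hyi) / denom for si, hyi in zip(s, Hy)]
            H = [[H[i][j] + u[i] * sTH[j] for j in range(n)] for i in range(n)]
        x, g = x_new, g_new
    return x
```

In an MM setting, `F` would be one pass of the MM update; a limited-memory variant would store the pairs `(s, y)` instead of the dense matrix `H`.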
We consider the problem of in-order packet transmission over a cascade of packet-erasure links with acknowledgment (ACK) signals, interconnected by relays. We treat first the case of transmitting a single packet, in which ACKs are unnecessary, over links with independent identically distributed erasures. For this case, we derive tight upper and lower bounds on the probability of arrive failure within an allowed end-to-end communication delay over a given number of links. When the number of links is commensurate with the allowed delay, we determine the maximal ratio between the two -- coined information velocity -- for which the arrive-failure probability decays to zero; we further derive bounds on the arrive-failure probability when the ratio is below the information velocity, determine the exponential arrive-failure decay rate, and extend the treatment to links with different erasure probabilities. We then extend all these results to a stream of packets with independent geometrically distributed interarrival times, and prove that the information velocity and the exponential decay rate remain the same for any stationary ergodic arrival process and for deterministic interarrival times. We demonstrate the significance of the derived fundamental limits -- the information velocity and the arrive-failure exponential decay rate -- by comparing them to simulation results.
In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm called multi-step primal-dual (MSPD) and its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in $O(1/\sqrt{t})$, the structure of the communication network only impacts a second-order term in $O(1/t)$, where $t$ is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.
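The smoothing primitive behind DRS can be illustrated in isolation. The sketch below assumes Gaussian smoothing, $f_\gamma(x) = \mathbb{E}[f(x + \gamma Z)]$ with $Z \sim \mathcal{N}(0, I)$, and estimates $\nabla f_\gamma(x)$ by averaging subgradients of $f$ at randomly perturbed points. The names `smoothed_gradient` and `l1_subgrad`, the choice of the $\ell_1$ norm as test function, and the sample sizes are hypothetical; the decentralized communication and acceleration steps of DRS are not shown.

```python
import random


def smoothed_gradient(subgrad, x, gamma, num_samples, rng):
    """Monte Carlo estimate of the gradient of f_gamma(x) = E[f(x + gamma*Z)]."""
    d = len(x)
    total = [0.0] * d
    for _ in range(num_samples):
        # Perturb the query point with isotropic Gaussian noise of scale gamma.
        z = [x[i] + gamma * rng.gauss(0.0, 1.0) for i in range(d)]
        g = subgrad(z)
        for i in range(d):
            total[i] += g[i]
    return [t / num_samples for t in total]


def l1_subgrad(x):
    """A subgradient of f(x) = ||x||_1: the sign vector, with 0 at the kink."""
    return [(v > 0) - (v < 0) for v in x]
```

For example, `smoothed_gradient(l1_subgrad, [0.0, 5.0], 0.1, 20000, random.Random(0))` averages out the kink at the first coordinate while leaving the second coordinate, which is far from the kink, essentially unchanged. A larger `gamma` gives a smoother surrogate at the cost of a larger approximation error.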