This work proposes a universal and adaptive second-order method for minimizing second-order smooth, convex functions. Our algorithm achieves $O(\sigma / \sqrt{T})$ convergence when the oracle feedback is stochastic with variance $\sigma^2$, and improves to $O(1 / T^3)$ convergence with deterministic oracles, where $T$ is the number of iterations. Our method interpolates between these rates without knowing the nature of the oracle a priori, enabled by a parameter-free adaptive step size that requires no knowledge of the smoothness modulus, the variance bound, or the diameter of the constraint set. To our knowledge, this is the first universal algorithm with such global guarantees in the second-order optimization literature.
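To illustrate the flavor of such parameter-freeness (a generic AdaGrad-norm-style rule, shown for intuition only and not necessarily the authors' exact choice), a step size can be built purely from observed feedback,
$$\eta_t = \Big(1 + \sum_{s=1}^{t-1} \|g_s\|^2\Big)^{-1/2},$$
so that neither the smoothness modulus, the variance $\sigma^2$, nor the set diameter appears anywhere in the update.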
We develop the theory and algorithmic toolbox for networked federated learning in decentralized collections of local datasets with an intrinsic network structure. This network structure arises from domain-specific notions of similarity between local datasets; different notions of similarity are induced by spatio-temporal proximity, statistical dependencies, or functional relations. Our main conceptual contribution is to formulate networked federated learning as generalized total variation minimization. This formulation unifies and considerably extends existing federated multi-task learning methods. It is highly flexible and can be combined with a broad range of parametric models, including the Lasso and deep neural networks. Our main algorithmic contribution is a novel networked federated learning algorithm that is well suited for distributed computing environments such as edge computing over wireless networks. This algorithm is robust against inexact computations arising from limited computational resources. For local models resulting in convex problems, we derive precise conditions on the local models and their network structure under which our algorithm learns nearly optimal local models. Our analysis reveals an interesting interplay between the convex geometry of the local models and the (cluster) geometry of their network structure.
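Schematically, and in notation of my own choosing, a generalized total variation (GTV) formulation couples local empirical risks $L_i$ through a penalty on edge-wise model differences,
$$\min_{\{w_i\}} \; \sum_{i \in \mathcal{V}} L_i(w_i) \;+\; \lambda \sum_{(i,j) \in \mathcal{E}} A_{ij}\, \phi(w_i - w_j),$$
where $A_{ij}$ are the network's edge weights and the choice of penalty $\phi$ (e.g., a norm) is what generalizes classical total variation; the paper's precise formulation may differ in its details.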
Beam parameter optimization in accelerators involves multiple, sometimes competing objectives. Condensing these individual objectives into a single figure of merit unavoidably biases the outcome, in the absence of prior knowledge often in an undesired way. Finding an optimal objective definition then requires operators to iterate over many possible objective weights and definitions, a process that can take many times longer than the optimization itself. A more versatile approach is multi-objective optimization, which establishes the trade-off curve, or Pareto front, between objectives. Here we present the first results on multi-objective Bayesian optimization of a simulated laser-plasma accelerator. We find that multi-objective optimization reaches performance comparable to its single-objective counterparts while allowing for instant evaluation of entirely new objectives. This dramatically reduces the time required to find appropriate objective definitions for new problems. Additionally, our multi-objective, multi-fidelity method reduces the time required for an optimization run by an order of magnitude. It does so by dynamically choosing the simulation resolution and box size, requiring fewer slow and expensive simulations as it learns about the Pareto-optimal solutions from fast low-resolution runs. The techniques demonstrated in this paper can easily be translated to many computational and experimental use cases beyond accelerator optimization.
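For concreteness, the sketch below shows a minimal scalarization-based multi-objective BO loop in Python; the toy `simulate` function, the random candidate pool, and the LCB acquisition are all illustrative stand-ins, and the paper's multi-fidelity acquisition (which also chooses resolution and box size) is considerably more sophisticated.

```python
# Minimal ParEGO-style multi-objective BO sketch (illustrative only).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

def simulate(x):
    # Toy stand-in for an expensive accelerator simulation returning
    # two competing objectives (e.g., beam energy vs. energy spread).
    return np.array([np.sum((x - 0.3) ** 2), np.sum((x - 0.7) ** 2)])

X = rng.uniform(size=(5, 2))                  # initial design
Y = np.array([simulate(x) for x in X])

for _ in range(20):
    w = rng.dirichlet(np.ones(Y.shape[1]))    # random scalarization weights
    s = (w * Y).max(axis=1)                   # Chebyshev scalarization
    gp = GaussianProcessRegressor().fit(X, s)
    cand = rng.uniform(size=(256, 2))         # random candidate pool
    mu, sd = gp.predict(cand, return_std=True)
    x_next = cand[np.argmin(mu - 2.0 * sd)]   # simple LCB acquisition
    X = np.vstack([X, x_next])
    Y = np.vstack([Y, simulate(x_next)])

# Non-dominated filter to extract the empirical Pareto front.
pareto = Y[[not any((Y <= y).all(1) & (Y < y).any(1)) for y in Y]]
```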
Many applications of representation learning, such as privacy preservation, algorithmic fairness, and domain adaptation, call for explicit control over the semantic information that is discarded. This goal is formulated as satisfying two objectives: maximizing utility for predicting a target attribute while simultaneously being invariant (independent) to a known semantic attribute. Solutions to invariant representation learning (IRepL) problems lead to a trade-off between utility and invariance when the two objectives compete. While existing works study bounds on this trade-off, two questions remain outstanding: 1) What is the exact trade-off between utility and invariance? 2) What are the encoders (mappings from the data to a representation) that achieve this trade-off, and how can we estimate them from training data? This paper addresses these questions for IRepL in reproducing kernel Hilbert spaces (RKHSs). Under the assumption that the distribution of a low-dimensional projection of high-dimensional data is approximately normal, we derive a closed-form solution for the global optima of the underlying optimization problem for encoders in RKHSs. This yields closed formulae for a near-optimal trade-off, the corresponding optimal representation dimensionality, and the corresponding encoder(s). We also numerically quantify the trade-off on representative problems and compare it to the trade-offs achieved by baseline IRepL algorithms.
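In schematic form (notation mine, not the paper's exact objective), the IRepL trade-off can be posed as
$$\sup_{f \in \mathcal{H}} \; (1 - \lambda)\, \mathrm{Dep}\big(f(X), Y\big) \;-\; \lambda\, \mathrm{Dep}\big(f(X), S\big), \qquad \lambda \in [0, 1],$$
where $\mathcal{H}$ is the RKHS of encoders, $\mathrm{Dep}$ is a kernel measure of statistical dependence, $Y$ is the target attribute, and $S$ is the semantic attribute; sweeping $\lambda$ traces out the utility-invariance trade-off curve.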
We present the Recurrent Interface Network (RIN), a neural network architecture that allocates computation adaptively to the input according to the distribution of information, allowing it to scale to iterative generation of high-dimensional data. The hidden units of RINs are partitioned into the interface, which is locally connected to inputs, and latents, which are decoupled from inputs and can exchange information globally. The RIN block selectively reads from the interface into the latents for high-capacity processing, with incremental updates written back to the interface. Stacking multiple blocks enables effective routing across local and global levels. While routing adds overhead, the cost can be amortized in recurrent computation settings where inputs change gradually while more global context persists, such as iterative generation with diffusion models. To this end, we propose a latent self-conditioning technique that "warm-starts" the latents at each iteration of the generation process. When applied to diffusion models operating directly on pixels, RINs yield state-of-the-art image and video generation without cascades or guidance, while being domain-agnostic and up to 10$\times$ more efficient than specialized 2D and 3D U-Nets.
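A condensed PyTorch sketch of the read/compute/write pattern described above; this is my paraphrase of the block structure, not the paper's exact layer composition.

```python
# Schematic RIN block: latents read from the interface, process globally,
# and write incremental updates back (illustrative paraphrase).
import torch
import torch.nn as nn

class RINBlock(nn.Module):
    def __init__(self, dim, heads=8):
        super().__init__()
        self.read = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.compute = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))
        self.write = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, interface, latents):
        # Read: latents attend to the (locally connected) interface tokens.
        latents = latents + self.read(latents, interface, interface)[0]
        # Compute: high-capacity processing on the small latent set.
        latents = latents + self.compute(latents, latents, latents)[0]
        latents = latents + self.mlp(latents)
        # Write: incremental update flows back to the interface.
        interface = interface + self.write(interface, latents, latents)[0]
        return interface, latents

# Latent self-conditioning amounts to warm-starting `latents` from the
# previous diffusion step rather than re-initializing them each iteration.
block = RINBlock(dim=64)
interface = torch.randn(2, 256, 64)   # e.g., patch tokens
latents = torch.zeros(2, 32, 64)      # carried across iterations
interface, latents = block(interface, latents)
```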
In today's online advertising markets, an important requirement for an advertiser (buyer) is to keep her total expenditure over a time span within a given budget. Among budget control approaches, throttling stands out as a popular one, in which the buyer participates in only a fraction of the auctions. This paper gives a comprehensive theoretical treatment of a single buyer's dynamic budget throttling process in repeated second-price auctions, which has been lacking in the literature. We first establish, for any throttling algorithm, a lower bound on the regret when the buyer's values are stochastic and an upper bound on the asymptotic competitive ratio when they are adversarial. Second, on the algorithmic side, we consider two information structures of increasing difficulty for learning the distribution of the highest competing bid. We then propose the OGD-CB algorithm, which is oblivious to whether values are stochastic or adversarial and attains asymptotically identical guarantees under the two information structures. Specifically, with stochastic values, we show that this algorithm guarantees near-optimal expected regret. When values are adversarial, we prove that it attains the upper bound on the asymptotic competitive ratio. Finally, we compare throttling with pacing, another widely adopted budget control method, in repeated second-price auctions. In the stochastic case, we show that pacing is generally better than throttling for the buyer, extending the known result that pacing is asymptotically optimal in this setting. In the adversarial case, however, we obtain the perhaps surprising result that throttling is the asymptotically optimal dynamic bidding strategy. Our results fill gaps in the theoretical study of throttling in repeated auctions and comprehensively reveal the power of this popular budget-smoothing strategy.
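As a rough illustration of throttling driven by online gradient descent (a generic dual-variable loop of my own construction; the paper's OGD-CB additionally maintains confidence-bound estimates of the highest-competing-bid distribution), consider:

```python
# Illustrative dual-OGD throttling loop for repeated second-price auctions
# (schematic; not the paper's OGD-CB algorithm).
import numpy as np

rng = np.random.default_rng(0)
T, B = 10_000, 1_000.0
rho = B / T                      # target per-round spend
mu, eta = 0.0, 1.0 / np.sqrt(T)  # dual variable and OGD step size
budget = B

for t in range(T):
    v = rng.uniform()            # buyer's value this round
    d = rng.uniform()            # highest competing bid
    # Throttle: enter the auction only when the value justifies the
    # current shadow price of the budget (bid truthfully when entering).
    participate = budget >= 1.0 and v >= mu
    spend = d if (participate and v > d) else 0.0  # second-price payment
    budget -= spend
    # OGD on the dual variable of the budget constraint.
    mu = max(0.0, mu + eta * (spend - rho))
```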
Designing a local planner to control tractor-trailer vehicles in forward and backward maneuvering is a challenging control problem in autonomous driving research. Motivated by the stability-critical nature of tractor-trailer systems, we present a practical and novel approach to designing a non-linear MPC (NMPC) local planner for tractor-trailer autonomous vehicles in both forward and backward maneuvering. The tractor velocity and steering angle are taken as the control variables. The proposed NMPC local planner is designed to handle jackknife situations, avoid multiple static obstacles, and follow a reference path in both forward and backward maneuvering. These challenges are cast as a constrained problem that the proposed NMPC local planner handles simultaneously. The direct multiple shooting approach is used to convert the optimal control problem (OCP) into a non-linear programming problem (NLP), which is solved with the IPOPT solver in CasADi. The controller's performance is evaluated in real time across different backward and forward maneuvering scenarios in the Gazebo simulation environment. It achieves asymptotic stability while avoiding static obstacles and accurate tracking performance while respecting path constraints. Finally, the proposed NMPC local planner is integrated with an open-source autonomous driving software stack called AutowareAi.
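The multiple-shooting construction can be illustrated with a toy CasADi script; the unicycle dynamics below are a stand-in for the actual tractor-trailer kinematics, and the obstacle and jackknife constraints are omitted for brevity.

```python
# Minimal direct multiple shooting NLP in CasADi (structure only; toy
# unicycle model instead of the tractor-trailer kinematics).
import casadi as ca

N, dt = 20, 0.1                       # horizon length and step
x = ca.SX.sym('x', 3)                 # state: px, py, heading
u = ca.SX.sym('u', 2)                 # controls: velocity, steering rate
xdot = ca.vertcat(u[0] * ca.cos(x[2]), u[0] * ca.sin(x[2]), u[1])
f = ca.Function('f', [x, u], [x + dt * xdot])   # explicit Euler step

X = ca.SX.sym('X', 3, N + 1)          # states at the shooting nodes
U = ca.SX.sym('U', 2, N)              # piecewise-constant controls
g, cost = [], 0
g.append(X[:, 0] - ca.DM([0, 0, 0]))  # initial condition
for k in range(N):
    g.append(X[:, k + 1] - f(X[:, k], U[:, k]))   # continuity constraints
    cost += ca.sumsqr(X[:2, k] - ca.DM([1, 1])) + 0.1 * ca.sumsqr(U[:, k])

nlp = {'x': ca.vertcat(ca.vec(X), ca.vec(U)),
       'f': cost, 'g': ca.vertcat(*g)}
solver = ca.nlpsol('solver', 'ipopt', nlp)       # hand the NLP to IPOPT
sol = solver(lbg=0, ubg=0)            # shooting constraints enforced as g = 0
```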
Convex function constrained optimization has received growing research interest recently. For a special class of convex problems with strongly convex function constraints, we develop a new accelerated primal-dual first-order method that attains an $\mathcal{O}(1/\sqrt{\varepsilon})$ complexity bound, improving on the $\mathcal{O}(1/\varepsilon)$ result for the state-of-the-art first-order methods. The key ingredient in our development is a set of novel techniques for progressively estimating the strong convexity of the Lagrangian function, which enables adaptive step-size selection and faster convergence. In addition, we show that the complexity can be further improved in its dependence on certain problem parameters via a restart scheme that calls the accelerated method repeatedly. As an application, we consider sparsity-inducing constrained optimization with a separable convex objective and a strongly convex loss constraint. Beyond achieving fast convergence, we show that the restarted method can effectively identify the sparsity pattern (active set) of the optimal solution within finitely many steps. To the best of our knowledge, this is the first active-set identification result for sparsity-inducing constrained optimization.
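The restart scheme has a simple generic skeleton, sketched below with hypothetical names (`accel_method`, `restarted_solve`); the paper's actual restart condition is derived from its strong-convexity estimates rather than a fixed halving schedule.

```python
# Schematic restart wrapper: repeatedly call an accelerated method,
# halving the target accuracy and warm-starting from the last iterate.
def restarted_solve(accel_method, x0, eps_target, eps0=1.0):
    x, eps = x0, eps0
    while eps > eps_target:
        x = accel_method(x, eps)   # one accelerated run to accuracy eps
        eps /= 2                   # tighten the accuracy and restart
    return x
```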
We study numerical methods for solving a system of quasilinear stochastic partial differential equations known as the stochastic Landau-Lifshitz-Bloch (LLB) equation on a bounded domain in $\mathbb{R}^d$ for $d=1,2$. Our main results are estimates of the rate of convergence of the Finite Element Method to the solutions of the stochastic LLB equation. To overcome the lack of regularity of the solution in the case $d=2$, we propose a Finite Element scheme for a regularised version of the equation. We then obtain error estimates for the numerical solutions of the regularised equation, as well as the rate of convergence of its solution to the solution of the stochastic LLB equation. As a consequence, we derive the convergence in probability of the approximate solutions to the solution of the stochastic LLB equation. To the best of our knowledge, this is the first result on error estimates for a system of stochastic quasilinear partial differential equations. A stronger result is obtained in the case $d=1$, thanks to a new regularity result for the LLB equation that allows us to avoid regularisation.
Variance reduction techniques such as SPIDER/SARAH/STORM have been extensively studied to improve the convergence rates of stochastic non-convex optimization; they typically maintain and update a sequence of estimators for a single function across iterations. What if we need to track multiple functional mappings across iterations but have access to stochastic samples of only $\mathcal{O}(1)$ of them at each iteration? This question has an important application in solving an emerging family of coupled compositional optimization problems of the form $\sum_{i=1}^m f_i(g_i(\mathbf{w}))$, where $g_i$ is accessible through a stochastic oracle. The key issue is to track and estimate the sequence $\mathbf{g}(\mathbf{w})=(g_1(\mathbf{w}), \ldots, g_m(\mathbf{w}))$ across iterations, where $\mathbf{g}(\mathbf{w})$ has $m$ blocks and only $\mathcal{O}(1)$ blocks may be probed for their stochastic values and Jacobians. To improve the complexity of solving these problems, we propose a novel stochastic estimator, named Multi-block-Single-probe Variance Reduced (MSVR), to track the sequence $\mathbf{g}(\mathbf{w})$. It is inspired by STORM but introduces a customized error-correction term that mitigates the noise not only in the stochastic samples of the selected blocks but also in the blocks that are not sampled. With the help of the MSVR estimator, we develop several algorithms for solving the aforementioned compositional problems with improved complexities across a spectrum of settings with non-convex/convex/strongly convex/Polyak-{\L}ojasiewicz (PL) objectives. Our results improve upon prior ones in several aspects, including the order of sample complexities and the dependence on the strong convexity parameter. Empirical studies on multi-task deep AUC maximization demonstrate the superior performance of the new estimator.
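As a reference point, the sketch below shows plain blockwise STORM-style tracking with a toy oracle; MSVR replaces this plain update with a customized error-correction coefficient so that the error of the unsampled blocks also shrinks (see the paper for the exact rule, which this sketch does not reproduce).

```python
# Blockwise STORM-style tracking of g_1(w),...,g_m(w) with O(1) probes
# per iteration (schematic reference point, not the MSVR update).
import numpy as np

rng = np.random.default_rng(0)
m, d, beta = 100, 10, 0.1
u = np.zeros((m, d))                  # running estimates of the m blocks

def g_pair(i, w_new, w_old):
    # One stochastic sample of block i, evaluated at both iterates
    # (a STORM-style update needs the *same* sample at w_new and w_old).
    noise = 0.01 * rng.standard_normal(d)
    return w_new + noise + i, w_old + noise + i   # toy linear g_i

w_old = np.zeros(d)
for t in range(1000):
    w_new = w_old + 0.001 * rng.standard_normal(d)  # iterate drifts slowly
    i = rng.integers(m)                             # probe O(1) blocks
    g_new, g_old = g_pair(i, w_new, w_old)
    u[i] = g_new + (1 - beta) * (u[i] - g_old)      # update probed block only
    w_old = w_new           # all other blocks keep their stale estimates,
                            # which is the error MSVR's correction targets
```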
In many important graph data processing applications, the acquired information includes both node features and observations of the graph topology. Graph neural networks (GNNs) are designed to exploit both sources of evidence, but they do not optimally trade off their utility or integrate them in a manner that is also universal. Here, universality refers to independence from homophily or heterophily assumptions on the graph. We address these issues by introducing a new Generalized PageRank (GPR) GNN architecture that adaptively learns the GPR weights so as to jointly optimize node feature and topological information extraction, regardless of the extent to which the node labels are homophilic or heterophilic. The learned GPR weights automatically adjust to the node label pattern, irrespective of initialization, and thereby guarantee excellent learning performance for label patterns that are usually hard to handle. Furthermore, they allow one to avoid feature over-smoothing, a process that renders feature information nondiscriminative, without requiring the network to be shallow. Our accompanying theoretical analysis of the GPR-GNN method is facilitated by novel synthetic benchmark datasets generated by the so-called contextual stochastic block model. We also compare the performance of our GNN architecture with that of several state-of-the-art GNNs on node classification, using well-known homophilic and heterophilic benchmark datasets. The results demonstrate that GPR-GNN offers significant performance improvements over existing techniques on both synthetic and benchmark data.
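A condensed PyTorch sketch of the GPR propagation idea as described above (my reading of the architecture: an MLP on node features followed by a learned weighted sum of propagation steps; initialization and training details are omitted).

```python
# Schematic GPR-GNN forward pass: Z = sum_k gamma_k * A_hat^k * MLP(X),
# with the GPR weights gamma_k learned jointly with the MLP.
import torch
import torch.nn as nn

class GPRProp(nn.Module):
    def __init__(self, in_dim, num_classes, K=10, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, num_classes))
        # Learnable GPR weights, here initialized uniformly for simplicity.
        self.gamma = nn.Parameter(torch.full((K + 1,), 1.0 / (K + 1)))

    def forward(self, X, A_hat):
        # A_hat: normalized adjacency matrix (dense here for brevity).
        H = self.mlp(X)                 # hidden features H^(0)
        Z = self.gamma[0] * H
        for k in range(1, len(self.gamma)):
            H = A_hat @ H               # one propagation step: A_hat^k H^(0)
            Z = Z + self.gamma[k] * H   # weighted by the learned gamma_k
        return Z
```

Because the $\gamma_k$ can take arbitrary signs and magnitudes after training, the model can emphasize either low-pass (homophilic) or high-pass (heterophilic) filtering, which is the adaptivity the abstract refers to.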