In this paper, we study a class of constrained minimax problems. In particular, we propose a first-order augmented Lagrangian method for solving them, whose subproblems turn out to be much simpler structured minimax problems and are suitably solved by a first-order method recently developed by the authors in [26]. Under some suitable assumptions, an \emph{operation complexity} of ${\cal O}(\varepsilon^{-4}\log\varepsilon^{-1})$, measured by the number of fundamental operations, is established for the first-order augmented Lagrangian method to find an $\varepsilon$-KKT solution of the constrained minimax problems.
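To make the outer structure of such a method concrete, the following is a minimal sketch of a first-order augmented Lagrangian outer loop for a minimax problem with inequality constraints $g(x) \le 0$; the constraint form, the multiplier update, and the inner solver interface are illustrative assumptions, not the method of [26].

```python
import numpy as np

def al_outer_loop(inner_minimax_solver, g, x0, y0, rho=1.0, gamma=2.0,
                  tol=1e-4, max_outer=50):
    """Hedged sketch of a first-order augmented Lagrangian outer loop for
    min_x max_y f(x, y) subject to g(x) <= 0 (an assumed constraint form).

    `inner_minimax_solver` stands in for the first-order subproblem solver;
    its interface here is purely illustrative.
    """
    x, y = x0, y0
    lam = np.zeros_like(g(x0))          # multiplier estimates
    for _ in range(max_outer):
        # Inner step: approximately solve the AL subproblem
        #   min_x max_y  f(x, y) + (rho/2) * ||max(0, lam/rho + g(x))||^2
        x, y = inner_minimax_solver(x, y, lam, rho)
        # First-order multiplier update for inequality constraints
        lam = np.maximum(0.0, lam + rho * g(x))
        if np.linalg.norm(np.maximum(0.0, g(x))) <= tol:
            break                        # approximate feasibility reached
        rho *= gamma                     # increase penalty if still infeasible
    return x, y, lam
```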
In stochastic contextual bandits, an agent sequentially chooses actions from a time-dependent action set based on past experience to minimize the cumulative regret. As with many other machine learning algorithms, the performance of bandit algorithms depends heavily on multiple hyperparameters, and theoretically derived parameter values may lead to unsatisfactory results in practice. Moreover, it is infeasible to use offline tuning methods such as cross-validation to choose hyperparameters in the bandit environment, as decisions must be made in real time. To address this challenge, we propose the first online continuous hyperparameter tuning framework for contextual bandits, which learns the optimal parameter configuration within a search space on the fly. Specifically, we use a double-layer bandit framework named CDT (Continuous Dynamic Tuning) and formulate the hyperparameter optimization as a non-stationary continuum-armed bandit, where each arm represents a combination of hyperparameters and the corresponding reward is the algorithmic result obtained under that configuration. For the top layer, we propose the Zooming TS algorithm, which utilizes Thompson Sampling (TS) for exploration and a restart technique to handle the switching environment. The proposed CDT framework can be easily used to tune contextual bandit algorithms without any pre-specified candidate set of hyperparameters. We further show that CDT achieves sublinear regret in theory and consistently performs better on both synthetic and real datasets in practice.
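The double-layer interaction described above can be summarized in a small sketch: a top-layer tuner proposes a hyperparameter configuration, the contextual bandit acts with it, and the observed reward updates both layers. The interfaces below (including the restart schedule) are illustrative assumptions, not the authors' implementation of CDT or Zooming TS.

```python
import numpy as np

def cdt_tuning_loop(bandit_algo, hyper_tuner, env, horizon, restart_every=None):
    """Hedged sketch of a double-layer tuning loop.

    `hyper_tuner` plays the role of the top-layer learner over the continuous
    hyperparameter space; `bandit_algo` is the contextual bandit being tuned.
    Both interfaces are assumptions for illustration only.
    """
    rewards = []
    for t in range(horizon):
        if restart_every and t % restart_every == 0:
            hyper_tuner.restart()                   # restarts handle non-stationarity
        theta = hyper_tuner.sample_configuration()  # top layer: pick hyperparameters
        context, actions = env.observe(t)
        a = bandit_algo.select_action(context, actions, hyperparams=theta)
        r = env.play(t, a)
        bandit_algo.update(context, a, r)           # bottom layer: usual bandit update
        hyper_tuner.update(theta, r)                # top layer: reward of the chosen config
        rewards.append(r)
    return np.cumsum(rewards)
```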
Maximising a cumulative reward function that is Markov and stationary, i.e., defined over state-action pairs and independent of time, is sufficient to capture many kinds of goals in a Markov decision process (MDP). However, not all goals can be captured in this manner. In this paper we study convex MDPs in which goals are expressed as convex functions of the stationary distribution and show that they cannot be formulated using stationary reward functions. Convex MDPs generalize the standard reinforcement learning (RL) problem formulation to a larger framework that includes many supervised and unsupervised RL problems, such as apprenticeship learning, constrained MDPs, and so-called `pure exploration'. Our approach is to reformulate the convex MDP problem as a min-max game involving policy and cost (negative reward) `players', using Fenchel duality. We propose a meta-algorithm for solving this problem and show that it unifies many existing algorithms in the literature.
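For concreteness, the Fenchel-dual reformulation alluded to above can be written as follows, where $d_\pi$ denotes the stationary state-action distribution induced by policy $\pi$, $f$ is the convex objective, and $f^*$ its convex conjugate; the sign and constraint conventions here are assumptions for illustration. The inner variable $\lambda$ plays the role of the cost (negative reward) player, so for a fixed $\lambda$ the policy player faces a standard MDP.
\[
\min_{\pi} f(d_\pi) \;=\; \min_{\pi} \max_{\lambda} \; \langle \lambda, d_\pi \rangle - f^*(\lambda).
\]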
In this paper, we introduce the Maximum Matrix Contraction problem, where we aim to contract as much as possible a binary matrix in order to maximize its density. We study the complexity and the polynomial approximability of the problem. Especially, we prove this problem to be NP-Complete and that every algorithm solving this problem is at most a $2\sqrt{n}$-approximation algorithm where n is the number of ones in the matrix. We then focus on efficient algorithms to solve the problem: an integer linear program and three heuristics.
Classical analysis of convex and non-convex optimization methods often requires the Lipschitzness of the gradient, which limits the analysis to functions bounded by quadratics. Recent work relaxed this requirement to a non-uniform smoothness condition with the Hessian norm bounded by an affine function of the gradient norm, and proved convergence in the non-convex setting via gradient clipping, assuming bounded noise. In this paper, we further generalize this non-uniform smoothness condition and develop a simple, yet powerful analysis technique that bounds the gradients along the trajectory, thereby leading to stronger results for both convex and non-convex optimization problems. In particular, we obtain the classical convergence rates for (stochastic) gradient descent and Nesterov's accelerated gradient method in the convex and/or non-convex setting under this general smoothness condition. The new analysis approach does not require gradient clipping and allows heavy-tailed noise with bounded variance in the stochastic setting.
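Written out, the affine condition referenced above (sometimes called $(L_0, L_1)$-smoothness) reads as follows; the paper's further generalization of this condition is not reproduced here.
\[
\big\| \nabla^2 f(x) \big\| \;\le\; L_0 + L_1 \big\| \nabla f(x) \big\| \qquad \text{for all } x.
\]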
As a computational alternative to Markov chain Monte Carlo approaches, variational inference (VI) is becoming increasingly popular for approximating intractable posterior distributions in large-scale Bayesian models due to its comparable efficacy and superior efficiency. Several recent works provide theoretical justifications of VI by proving its statistical optimality for parameter estimation under various settings; meanwhile, formal analysis of the algorithmic convergence aspects of VI is still largely lacking. In this paper, we consider the common coordinate ascent variational inference (CAVI) algorithm for implementing mean-field (MF) VI, which optimizes a Kullback--Leibler divergence objective functional over the space of all factorized distributions. Focusing on the two-block case, we analyze the convergence of CAVI by leveraging the extensive toolbox from functional analysis and optimization. We provide general conditions for certifying global or local exponential convergence of CAVI. Specifically, we introduce a new notion of generalized correlation that characterizes the interaction between the constituent blocks in influencing the VI objective functional and, according to our theory, quantifies the algorithmic contraction rate of two-block CAVI. As illustrations, we apply the developed theory to a number of examples and derive explicit problem-dependent upper bounds on the algorithmic contraction rate.
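As a point of reference, the standard two-block CAVI updates for a mean-field family $q(z_1, z_2) = q_1(z_1)\, q_2(z_2)$ take the following form; the notation is generic and not taken from the paper.
\[
q_1^{(t+1)}(z_1) \;\propto\; \exp\!\Big(\mathbb{E}_{q_2^{(t)}}\big[\log p(z_1, z_2, x)\big]\Big),
\qquad
q_2^{(t+1)}(z_2) \;\propto\; \exp\!\Big(\mathbb{E}_{q_1^{(t+1)}}\big[\log p(z_1, z_2, x)\big]\Big).
\]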
We consider gradient-related methods for low-rank matrix optimization with a smooth cost function. The methods operate on single factors of the low-rank factorization and share aspects of both alternating and Riemannian optimization. We compare two possible choices of search direction based on Gauss-Southwell type selection rules: one uses the gradient of a factorized non-convex formulation, the other the Riemannian gradient. While both methods provide gradient convergence guarantees similar to the unconstrained case, the version based on the Riemannian gradient is significantly more robust with respect to small singular values and the condition number of the cost function, as illustrated by numerical experiments. As a side result of our approach, we also obtain new convergence results for the alternating least squares method.
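A minimal sketch of a Gauss-Southwell-type factor step for the factorized (non-Riemannian) variant is given below, assuming a rank-$r$ factorization $X = L R^\top$ and a Euclidean gradient oracle; it updates only the factor whose partial gradient is currently larger. This illustrates the selection rule under assumed notation, not the authors' implementation.

```python
import numpy as np

def gauss_southwell_factor_step(grad_f, L, R, step=1e-2):
    """Hedged sketch of a Gauss-Southwell-type factor update for minimizing
    f(L R^T) over a rank-r factorization X = L R^T.

    `grad_f(X)` returns the Euclidean gradient of the cost at X; the rule
    compares the norms of the two partial gradients and updates only the
    factor with the larger one.
    """
    G = grad_f(L @ R.T)          # ambient gradient, shape (m, n)
    grad_L = G @ R               # partial gradient w.r.t. L
    grad_R = G.T @ L             # partial gradient w.r.t. R
    if np.linalg.norm(grad_L) >= np.linalg.norm(grad_R):
        L = L - step * grad_L    # update the factor with the larger gradient
    else:
        R = R - step * grad_R
    return L, R
```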
This paper reports an experimental study of two genetic algorithms hybridized with local search for solving the uncapacitated examination timetabling problem. The two proposed hybrid algorithms use partition-based and priority-based solution representations, inspired by successful genetic algorithms for graph coloring and project scheduling problems, respectively. Both algorithms use a crossover scheme hybridized with a parametrized saturation degree heuristic. In the experiments, the algorithms are first calibrated with a Design of Experiments approach and then tested on the well-known Toronto benchmark instances. The calibration shows that the hybridization prefers an intensive local search method. The experiments indicate the vital role of local search in the proposed genetic algorithms; however, they also show that the hybridization benefits local search as well. Interestingly, although the structures of the two algorithms are not alike, their performances are quite similar to each other and to those of other state-of-the-art genetic-type algorithms proposed in the literature.
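The saturation degree heuristic mentioned above is the standard graph-colouring notion (the number of distinct timeslots already used by conflicting exams); the sketch below shows how it might order unscheduled exams inside a hybrid crossover. The data structures are illustrative assumptions, not the paper's representation.

```python
def saturation_degree_order(exams, conflicts, assignment):
    """Hedged sketch of a saturation-degree ordering for exam timetabling.

    `conflicts[e]` is the set of exams sharing students with exam `e`;
    `assignment` maps already-scheduled exams to timeslots.
    """
    def saturation(e):
        # number of distinct timeslots already occupied by conflicting exams
        return len({assignment[c] for c in conflicts[e] if c in assignment})

    unscheduled = [e for e in exams if e not in assignment]
    # schedule the most constrained exam first: highest saturation degree,
    # breaking ties by the number of conflicts (ordinary degree)
    return sorted(unscheduled,
                  key=lambda e: (saturation(e), len(conflicts[e])),
                  reverse=True)
```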
Given $n$ samples of a function $f\colon D\to\mathbb C$ in random points drawn with respect to a measure $\varrho_S$, we develop a theoretical analysis of the $L_2(D, \varrho_T)$-approximation error. For a particular choice of $\varrho_S$ depending on $\varrho_T$, it is known that the weighted least squares method from finite dimensional function spaces $V_m$, $\dim(V_m) = m < \infty$, has the same error as the best approximation in $V_m$ up to a multiplicative constant when given exact samples with logarithmic oversampling. If the source measure $\varrho_S$ and the target measure $\varrho_T$ differ, we are in the domain adaptation setting, a subfield of transfer learning. We model the resulting deterioration of the error in our bounds. Further, for noisy samples, our bounds describe the bias-variance trade-off depending on the dimension $m$ of the approximation space $V_m$. All results hold with high probability. For demonstration, we consider functions defined on the $d$-dimensional cube given in uniform random samples. We analyze polynomials, the half-period cosine, and a bounded orthonormal basis of the non-periodic Sobolev space $H_{\mathrm{mix}}^2$. By overcoming numerical issues of this $H_{\mathrm{mix}}^2$ basis, we obtain a novel stable approximation method with quadratic error decay. Numerical experiments indicate the applicability of our results.
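A minimal sketch of the weighted least squares fit referenced above: given basis functions spanning $V_m$, sample points drawn from $\varrho_S$, and per-sample weights (e.g. a density ratio between target and source measures, which is an assumption here), the coefficients solve a reweighted ordinary least squares problem.

```python
import numpy as np

def weighted_least_squares_fit(basis, x_samples, y_samples, weights):
    """Hedged sketch of a weighted least squares method: find coefficients c
    minimizing sum_i w_i |sum_k c_k phi_k(x_i) - y_i|^2 over the space V_m
    spanned by `basis`. The weight choice is an illustrative assumption.
    """
    # design matrix A[i, k] = phi_k(x_i) evaluated at the random sample points
    A = np.column_stack([phi(x_samples) for phi in basis])
    sw = np.sqrt(np.asarray(weights))
    # reweight the rows and solve the resulting ordinary least squares problem
    coeffs, *_ = np.linalg.lstsq(A * sw[:, None], y_samples * sw, rcond=None)
    return coeffs
```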
This manuscript portrays optimization as a process. In many practical applications the environment is so complex that it is infeasible to lay out a comprehensive theoretical model and use classical algorithmic theory and mathematical optimization. It is necessary as well as beneficial to take a robust approach, by applying an optimization method that learns as one goes along, learning from experience as more aspects of the problem are observed. This view of optimization as a process has become prominent in varied fields and has led to some spectacular success in modeling and systems that are now part of our daily lives.
Recommender systems exploit interaction history to estimate user preference and have been heavily used in a wide range of industry applications. However, static recommendation models struggle to answer two important questions well due to inherent shortcomings: (a) What exactly does a user like? (b) Why does a user like an item? The shortcomings are due to the way that static models learn user preference, i.e., without explicit instructions and active feedback from users. The recent rise of conversational recommender systems (CRSs) changes this situation fundamentally. In a CRS, users and the system can dynamically communicate through natural language interactions, which provides unprecedented opportunities to explicitly obtain the exact preference of users. Considerable efforts, spread across disparate settings and applications, have been put into developing CRSs, yet existing models, technologies, and evaluation methods for CRSs are far from mature. In this paper, we provide a systematic review of the techniques used in current CRSs. We summarize the key challenges of developing CRSs into five directions: (1) question-based user preference elicitation; (2) multi-turn conversational recommendation strategies; (3) dialogue understanding and generation; (4) exploitation-exploration trade-offs; (5) evaluation and user simulation. These research directions involve multiple research fields such as information retrieval (IR), natural language processing (NLP), and human-computer interaction (HCI). Based on these research directions, we discuss future challenges and opportunities, and provide a road map for researchers from multiple communities to get started in this area. We hope this survey helps to identify and address challenges in CRSs and inspires future research.