We consider the linear contextual multi-class multi-period packing problem~(LMMP), where the goal is to pack items such that the total vector of consumption stays below a given budget vector while the total value is as large as possible. We consider the setting where the reward and the consumption vector associated with each action are class-dependent linear functions of the context, and the decision-maker receives bandit feedback. LMMP includes linear contextual bandits with knapsacks and online revenue management as special cases. We establish a new, more efficient estimator that guarantees a faster convergence rate and, consequently, a lower regret in such problems. We propose a bandit policy that is a closed-form function of these estimated parameters. When the contexts are non-degenerate, the regret of the proposed policy is sublinear in the context dimension, the number of classes, and the time horizon~$T$ whenever the budget grows at least as $\sqrt{T}$. We also resolve an open problem posed in Agrawal & Devanur (2016) and extend the result to a multi-class setting. Our numerical experiments demonstrate that the proposed policy outperforms other benchmarks in the literature.
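To fix ideas, a schematic formulation of the packing problem described above is given below; the symbols $\theta_k$, $W_k$, and $B$ are notation introduced here for illustration, and the paper's precise model (e.g., normalization and how classes arrive) may differ:
\[
\max_{\{a_t\}}\; \sum_{t=1}^{T} \theta_{k_t}(a_t)^{\top} x_t
\qquad \text{s.t.} \qquad
\sum_{t=1}^{T} W_{k_t}(a_t)\, x_t \;\le\; B,
\]
where $x_t$ is the context at time $t$, $k_t$ the class of the arriving item, $a_t$ the chosen action, $\theta_k(a)$ and $W_k(a)$ the class- and action-dependent linear reward and consumption parameters, and $B$ the budget vector.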
Correlated outcomes are common in many practical problems. In some settings, one outcome is of particular interest, and the others are auxiliary. To leverage information shared by all the outcomes, traditional multi-task learning (MTL) minimizes an averaged loss function over all the outcomes, which may lead to biased estimation for the target outcome, especially when the MTL model is misspecified. In this work, based on a decomposition of the estimation bias into two types, within-subspace and against-subspace, we develop a robust transfer learning approach to estimating a high-dimensional linear decision rule for the outcome of interest in the presence of auxiliary outcomes. The proposed method includes an MTL step using all outcomes to gain efficiency, and a subsequent calibration step using only the outcome of interest to correct both types of bias. We show that the final estimator can achieve a lower estimation error than an estimator that uses only the single outcome of interest. Simulations and real data analysis are conducted to demonstrate the superiority of the proposed method.
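As a concrete (and heavily simplified) illustration of the two-step structure, the sketch below pools all outcomes in an MTL-style fit and then calibrates on the target outcome alone. The function name `mtl_then_calibrate` and the specific estimators (ridge for the pooled fit, ridge on target residuals for calibration) are illustrative assumptions, not the paper's exact procedure, which uses the within-/against-subspace bias decomposition.

```python
import numpy as np
from sklearn.linear_model import Ridge

def mtl_then_calibrate(X, Y, target_col, alpha_pool=1.0, alpha_cal=1.0):
    """Two-step estimator: (1) pool all outcomes to borrow strength,
    (2) calibrate on the target outcome to correct the pooling bias.
    X: (n, p) design matrix; Y: (n, q) matrix of correlated outcomes."""
    # Step 1 (MTL-style pooling): fit one coefficient vector to the
    # average of all outcomes -- efficient but possibly biased for the target.
    y_pooled = Y.mean(axis=1)
    pool = Ridge(alpha=alpha_pool).fit(X, y_pooled)
    beta_pool = pool.coef_

    # Step 2 (calibration): regress the target-outcome residuals on X,
    # so the correction term absorbs the bias of the pooled fit.
    y_target = Y[:, target_col]
    residual = y_target - X @ beta_pool - pool.intercept_
    cal = Ridge(alpha=alpha_cal).fit(X, residual)

    beta_final = beta_pool + cal.coef_
    intercept_final = pool.intercept_ + cal.intercept_
    return beta_final, intercept_final
```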
This paper deals with speeding up the convergence of a class of two-step iterative methods for solving linear systems of equations. To implement the acceleration technique, the residual norm associated with the computed approximation at each sub-iterate is minimized over a certain two-dimensional subspace. Convergence properties of the proposed method are studied in detail. The approach is further developed to solve (regularized) normal equations arising from the discretization of ill-posed problems. The results of numerical experiments are reported to illustrate the performance of exact and inexact variants of the method on some test problems.
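A minimal sketch of the acceleration idea: given the current iterate and two candidate directions, the residual norm is minimized exactly over their span by solving a small least-squares problem. Here the two directions `p1, p2` are assumed to be the corrections produced by the two sub-iterates of the two-step scheme; the paper's specific choice of subspace may differ.

```python
import numpy as np

def accelerate_2d(A, b, x, p1, p2):
    """Minimize ||b - A(x + a1*p1 + a2*p2)||_2 over the scalars (a1, a2).
    p1, p2 span the two-dimensional acceleration subspace."""
    r = b - A @ x                            # current residual
    V = np.column_stack((A @ p1, A @ p2))    # images of the directions under A
    # Least-squares problem of size (n x 2): coefficients minimizing the
    # residual norm over span{p1, p2}.
    coeffs, *_ = np.linalg.lstsq(V, r, rcond=None)
    return x + coeffs[0] * p1 + coeffs[1] * p2
```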
We study the problem of zeroth-order (black-box) optimization of a Lipschitz function $f$ defined on a compact subset $\mathcal X$ of $\mathbb R^d$, with the additional constraint that algorithms must certify the accuracy of their recommendations. We characterize the optimal number of evaluations of any Lipschitz function $f$ to find and certify an approximate maximizer of $f$ at accuracy $\varepsilon$. Under a weak assumption on $\mathcal X$, this optimal sample complexity is shown to be nearly proportional to the integral $\int_{\mathcal X} \mathrm{d}\boldsymbol x/( \max(f) - f(\boldsymbol x) + \varepsilon )^d$. This result, which was only (and partially) known in dimension $d=1$, solves an open problem dating back to 1991. In terms of techniques, our upper bound relies on a packing bound by Bouttier et al. (2020) for the Piyavskii-Shubert algorithm, which we link to the above integral. We also show that a certified version of the computationally tractable DOO algorithm matches these packing and integral bounds. Our instance-dependent lower bound differs from traditional worst-case lower bounds in the Lipschitz setting and relies on a local worst-case analysis that could likely prove useful for other learning tasks.
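For intuition, here is a simplified, grid-based 1D variant of the certified Piyavskii-Shubert idea: the Lipschitz upper envelope of the evaluations both drives the next query and certifies optimality once its maximum is within $\varepsilon$ of the best observed value. The grid search over the envelope is a simplification introduced here; the actual algorithm maximizes the piecewise-linear envelope exactly.

```python
import numpy as np

def certified_piyavskii_1d(f, a, b, L, eps, grid_size=10_001):
    """Maximize a Lipschitz function f on [a, b] (constant L) with a
    certificate: stop when the Lipschitz upper bound over the domain is
    within eps of the best evaluated value."""
    grid = np.linspace(a, b, grid_size)
    xs, fs = [a], [f(a)]
    while True:
        # Upper envelope U(x) = min_i (f(x_i) + L * |x - x_i|).
        U = np.min(np.array(fs)[:, None]
                   + L * np.abs(grid[None, :] - np.array(xs)[:, None]), axis=0)
        best = max(fs)
        if U.max() <= best + eps:          # certificate: no point exceeds best + eps
            return xs[int(np.argmax(fs))], best, len(xs)
        x_next = grid[int(np.argmax(U))]   # query where the upper bound is largest
        xs.append(x_next)
        fs.append(f(x_next))
```

For example, `certified_piyavskii_1d(lambda x: -abs(x - 0.3), 0.0, 1.0, L=1.0, eps=1e-2)` returns an $\varepsilon$-certified maximizer near $0.3$ together with the number of evaluations used.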
We study the computational complexity of multi-stage robust optimization problems. Such problems are formulated with alternating min/max quantifiers and therefore naturally fall into a higher stage of the polynomial hierarchy. Despite this, almost no hardness results with respect to the polynomial hierarchy are known. In this work, we examine the hardness of robust two-stage adjustable and robust recoverable optimization with budgeted uncertainty sets. Our main technical contribution is the introduction of a technique tailored to prove $\Sigma^p_3$-hardness of such problems. We highlight a difference between continuous and discrete budgeted uncertainty: in the discrete case, a wide range of problems indeed becomes complete for the third stage of the polynomial hierarchy; in particular, this applies to the TSP, independent set, and vertex cover problems. In the continuous case, however, this does not happen, and the problems remain in the first stage of the hierarchy. Finally, if we allow the uncertainty to affect not only the objective but also multiple constraints, then this distinction disappears and even the continuous case exhibits hardness for the third stage of the hierarchy. This shows that even robust problems that are already NP-complete can still exhibit a significant computational difference between column-wise and row-wise uncertainty.
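Schematically, the two-stage adjustable problems with objective uncertainty studied here have the nested min-max-min form below (notation introduced here, following the standard budgeted-uncertainty model):
\[
\min_{x \in X}\; \max_{c \in U}\; \min_{y \in Y(x)}\; c^{\top} y,
\qquad
U = \Big\{ c : c_i = \underline{c}_i + \delta_i d_i,\ \textstyle\sum_{i} \delta_i \le \Gamma,\ \delta \in [0,1]^n \Big\},
\]
with the discrete variant requiring $\delta \in \{0,1\}^n$. The three alternating quantifiers are what place such problems naturally in the third stage of the polynomial hierarchy.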
In recent years, the development of quantum annealers has enabled experimental demonstrations and has increased research interest in applications of quantum annealing, such as quantum machine learning and, in particular, the popular quantum SVM. Several versions of the quantum SVM have been proposed, and quantum annealing has been shown to be effective for them. Extensions to multiclass problems have also been made, but they consist of an ensemble of multiple binary classifiers. This work proposes a novel quantum SVM formulation for direct multiclass classification based on quantum annealing, called Quantum Multiclass SVM (QMSVM). The multiclass classification problem is formulated as a single Quadratic Unconstrained Binary Optimization (QUBO) problem solved with quantum annealing. The main objective of this work is to evaluate the feasibility, accuracy, and time performance of this approach. Experiments have been performed on the D-Wave Advantage quantum annealer for a classification problem on remote sensing data. The results indicate that, despite the memory demands of the quantum annealer, QMSVM can achieve accuracy comparable to standard SVM methods and, more importantly, scales much more efficiently with the number of training examples, resulting in nearly constant runtime. This work demonstrates an approach to bringing together classical and quantum computation, solving practical problems in remote sensing with current hardware.
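For reference, a QUBO instance asks for a binary vector minimizing a quadratic form,
\[
\min_{z \in \{0,1\}^n}\; z^{\top} Q z ,
\]
which is the native input format of the annealer. The QMSVM construction thus amounts to encoding the (discretized) multiclass SVM training variables into $z$ and the training objective, with constraints handled through penalty terms, into $Q$; the exact encoding is specified in the paper and not reproduced here.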
Standard bandit algorithms that assume continual reallocation of measurement effort are challenging to implement due to delayed feedback and infrastructural/organizational difficulties. Motivated by practical instances involving a handful of reallocation epochs in which outcomes are measured in batches, we develop a new adaptive experimentation framework that can flexibly handle any batch size. Our main observation is that normal approximations, which are universal in statistical inference, can also guide the design of scalable adaptive experiments. By deriving an asymptotic sequential experiment, we formulate a dynamic program that can leverage prior information on average rewards. State transitions of the dynamic program are differentiable with respect to the sampling allocations, allowing the use of gradient-based methods for planning and policy optimization. We propose a simple iterative planning method, Residual Horizon Optimization, which selects sampling allocations by optimizing a planning objective via stochastic gradient-based methods. Our method significantly improves statistical power over standard adaptive policies, even when compared to Bayesian bandit algorithms (e.g., Thompson sampling) that require full distributional knowledge of individual rewards. Overall, we expand the scope of adaptive experimentation to settings that are difficult for standard adaptive policies, including problems with a small number of reallocation epochs, low signal-to-noise ratio, and unknown reward distributions.
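To see why the state transitions can be differentiable in the allocations, note that under a Gaussian (normal-approximation) model a batch of size $n_t$ split according to fractions $p_{t,k}$ updates each arm's posterior smoothly; the following is a standard conjugate-update sketch with notation introduced here, not necessarily the paper's exact state representation:
\[
\sigma_{t+1,k}^{-2} \;=\; \sigma_{t,k}^{-2} \;+\; \frac{n_t\, p_{t,k}}{s_k^{2}},
\qquad
\mu_{t+1,k} \;=\; \frac{\sigma_{t,k}^{-2}\,\mu_{t,k} + (n_t\, p_{t,k}/s_k^{2})\,\bar y_{t,k}}{\sigma_{t+1,k}^{-2}},
\]
where $s_k^2$ is the measurement variance and $\bar y_{t,k}$ the batch mean for arm $k$. Both updates are differentiable in $p_{t,k}$, which is what enables gradient-based planning and policy optimization over the allocations.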
The multi-armed bandit (MAB) problem is a simple yet powerful framework that has been extensively studied in the context of decision-making under uncertainty. In many real-world applications, such as robotics, selecting an arm corresponds to a physical action that constrains the choices of the next available arms (actions). Motivated by this, we study an extension of MAB called the graph bandit, where an agent travels over a graph to maximize the reward collected from different nodes. The graph defines the agent's freedom in selecting the next available nodes at each step. We assume the graph structure is fully available, but the reward distributions are unknown. Building on an offline graph-based planning algorithm and the principle of optimism, we design a learning algorithm, G-UCB, that balances long-term exploration and exploitation. We show that our proposed algorithm achieves $O(\sqrt{|S|T\log(T)}+D|S|\log T)$ learning regret, where $|S|$ is the number of nodes and $D$ is the diameter of the graph, which matches the theoretical lower bound $\Omega(\sqrt{|S|T})$ up to logarithmic factors. To our knowledge, this is among the first tight regret bounds for non-episodic, undiscounted learning problems with known deterministic transitions. Numerical experiments confirm that our algorithm outperforms several benchmarks.
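A heavily simplified sketch of the optimism-plus-planning idea follows: compute a UCB index for every node, plan a shortest path to the most optimistic node, and update estimates along the way. The names `graph_ucb`, `pull`, and `adj`, and the exploration constant `c`, are assumptions of this sketch; the exact G-UCB algorithm additionally controls how long each planning phase lasts.

```python
import math
from collections import deque

def shortest_path(adj, src, dst):
    """BFS shortest path on an unweighted graph given as {node: [neighbors]}."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    raise ValueError("destination unreachable")

def graph_ucb(adj, pull, start, T, c=2.0):
    """Travel on the graph for T steps, collecting noisy rewards from visited
    nodes; pull(node) returns a stochastic reward. Simplified optimistic planning."""
    nodes = list(adj)
    counts = {v: 0 for v in nodes}
    means = {v: 0.0 for v in nodes}
    current, t, total = start, 0, 0.0
    while t < T:
        # Optimistic index: unvisited nodes get +inf, others mean + bonus.
        ucb = {v: float("inf") if counts[v] == 0
               else means[v] + math.sqrt(c * math.log(t + 1) / counts[v])
               for v in nodes}
        target = max(nodes, key=lambda v: ucb[v])
        path = shortest_path(adj, current, target)
        for v in (path[1:] or [current]):   # traverse (or re-pull if already there)
            if t >= T:
                break
            r = pull(v)
            counts[v] += 1
            means[v] += (r - means[v]) / counts[v]
            total += r
            current, t = v, t + 1
    return total, means
```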
In reward-free reinforcement learning (RL), an agent first explores the environment without any reward information, in order to achieve certain learning goals afterwards for any given reward. In this paper we focus on reward-free RL under low-rank MDP models, in which both the representation and the linear weight vectors are unknown. Although various algorithms have been proposed for reward-free low-rank MDPs, the corresponding sample complexity is still far from satisfactory. In this work, we begin by providing the first known sample complexity lower bound that holds for any algorithm under low-rank MDPs; this lower bound implies that it is strictly harder to find a near-optimal policy under low-rank MDPs than under linear MDPs. We then propose a novel model-based algorithm, coined RAFFLE, and show that it can both find an $\epsilon$-optimal policy and achieve $\epsilon$-accurate system identification via reward-free exploration, with a sample complexity that significantly improves on previous results. This sample complexity matches our lower bound in its dependence on $\epsilon$, as well as on $K$ in the large-$d$ regime, where $d$ and $K$ respectively denote the representation dimension and the action space cardinality. Finally, we provide a planning algorithm (requiring no further interaction with the true environment) for RAFFLE to learn a near-accurate representation, which is the first known representation learning guarantee under this setting.
This paper is dedicated to achieving scalable relative state estimation using inter-robot Euclidean distance measurements. We consider equipping robots with distance sensors and focus on the optimization problem underlying relative state estimation in this setup. We reveal the commonality between this problem and the coordinate realization problem for a sensor network. Based on this insight, we propose an effective unconstrained optimization model to infer the relative states among robots. To solve this model in a distributed manner, we propose an efficient and scalable optimization algorithm with the classical block coordinate descent method as its backbone. This algorithm exactly solves each block-update subproblem in closed form while ensuring convergence. Our results pave the way for distance-measurement-based relative state estimation in large-scale multi-robot systems.
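A toy illustration of the block coordinate descent template on a distance-only objective (each robot's position is one block) is sketched below. Unlike the paper's method, which solves each block subproblem exactly in closed form on its own optimization model, this sketch minimizes the plain squared distance-residual objective and takes a simple gradient step per block; all names and parameters are illustrative assumptions.

```python
import numpy as np

def bcd_relative_states(n, dim, measurements, iters=200, step=0.1, seed=0):
    """Estimate relative positions from pairwise distance measurements.
    measurements: dict {(i, j): d_ij}. Minimizes
        sum_{(i,j)} (||p_i - p_j|| - d_ij)^2
    by cycling over robots (blocks), with a gradient step per block."""
    rng = np.random.default_rng(seed)
    P = rng.standard_normal((n, dim))          # random initial positions
    for _ in range(iters):
        for i in range(n):                     # one block = one robot's position
            grad = np.zeros(dim)
            for (a, b), d in measurements.items():
                if i not in (a, b):
                    continue
                j = b if a == i else a
                diff = P[i] - P[j]
                dist = np.linalg.norm(diff) + 1e-12
                grad += 2.0 * (dist - d) * diff / dist
            P[i] -= step * grad
    return P - P.mean(axis=0)                  # fix the unobservable global frame
```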
This paper develops efficient preconditioned iterative solvers for incompressible flow problems discretised by an enriched Taylor-Hood mixed approximation, in which the usual pressure space is augmented by a piecewise constant pressure to ensure local mass conservation. This enrichment process causes over-specification of the pressure, which complicates the design and implementation of efficient solvers for the resulting linear systems. We first describe the impact of this choice of pressure space on the matrices involved. Next, we show how to recover effective solvers for Stokes problems, with a preconditioner based on the singular pressure mass matrix, and for Oseen systems arising from linearised Navier-Stokes equations, by using a two-stage pressure convection-diffusion strategy. The codes used to generate the numerical results are available online.
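A generic illustration of the Stokes-type preconditioning strategy on a synthetic saddle-point system (not an actual Taylor-Hood discretisation): a block-diagonal preconditioner built from the velocity block and a pressure mass matrix, applied inside MINRES via SciPy. In the enriched setting the pressure mass matrix is singular, which this sketch does not reproduce; the matrices and sizes below are illustrative assumptions.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

rng = np.random.default_rng(0)
nu, np_ = 60, 20                       # toy velocity / pressure dimensions

# Synthetic SPD "velocity" block A, full-rank "divergence" block B,
# and SPD "pressure mass" matrix Mp (stand-ins for the FEM matrices).
A = sp.random(nu, nu, density=0.2, random_state=0).toarray()
A = sp.csr_matrix(A @ A.T + nu * np.eye(nu))
B = sp.csr_matrix(rng.standard_normal((np_, nu)))
Mp = sp.csr_matrix(np.diag(rng.uniform(0.5, 1.5, np_)))

# Symmetric saddle-point (Stokes-type) system [[A, B^T], [B, 0]].
K = sp.bmat([[A, B.T], [B, None]], format="csr")
rhs = rng.standard_normal(nu + np_)

# Block-diagonal preconditioner diag(A, Mp): for a real Stokes problem the
# pressure mass matrix is spectrally equivalent to the Schur complement.
solve_A = spla.factorized(A.tocsc())
solve_Mp = spla.factorized(Mp.tocsc())
def apply_prec(v):
    return np.concatenate([solve_A(v[:nu]), solve_Mp(v[nu:])])
M = spla.LinearOperator(K.shape, matvec=apply_prec)

x, info = spla.minres(K, rhs, M=M)
print("MINRES converged:", info == 0)
```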