Recently, the study of linear misspecified bandits has generated intriguing implications for the hardness of learning in bandits and reinforcement learning (RL). In particular, Du et al. (2020) show that even if a learner is given linear features in $\mathbb{R}^d$ that approximate the rewards in a bandit or RL instance with a uniform error of $\varepsilon$, searching for an $O(\varepsilon)$-optimal action requires at least $\Omega(\exp(d))$ queries. Furthermore, Lattimore et al. (2020) show that a degraded $O(\varepsilon\sqrt{d})$-optimal solution can be learned within $\operatorname{poly}(d/\varepsilon)$ queries. Yet it is unknown whether a structural assumption on the ground-truth parameter, such as sparsity, could break the $\varepsilon\sqrt{d}$ barrier. In this paper, we address this question by showing that algorithms can obtain $O(\varepsilon)$-optimal actions by querying $O(\varepsilon^{-s}d^s)$ actions, where $s$ is the sparsity parameter, removing the $\exp(d)$-dependence. We then establish information-theoretic lower bounds of $\Omega(\exp(s))$ to show that our upper bound on the sample complexity is nearly tight if one demands an error of $O(s^{\delta}\varepsilon)$ for $0<\delta<1$. For $\delta\geq 1$, we further show that $\operatorname{poly}(s/\varepsilon)$ queries are possible when the linear features are "good", and even in general settings. These results provide a nearly complete picture of how sparsity can help in misspecified bandit learning and a deeper understanding of when linear features are "useful" for bandit and reinforcement learning with misspecification.
The goal of multi-objective optimization is to identify a collection of points that describe the best possible trade-offs between the multiple objectives. To solve this vector-valued optimization problem, practitioners often appeal to scalarization functions that transform the multi-objective problem into a collection of single-objective problems, which can then be solved using traditional single-objective optimization techniques. In this work, we formalise this convention into a general mathematical framework. We show how this strategy effectively recasts the original multi-objective optimization problem into a single-objective optimization problem defined over sets. An appropriate class of objective functions for this new problem is the R2 utility function, which is defined as a weighted integral over the scalarized optimization problems. We show that this utility function is a monotone and submodular set function, which can be optimised effectively using greedy optimization algorithms. We analyse the performance of these greedy algorithms both theoretically and empirically. Our analysis largely focusses on Bayesian optimization, which is a popular probabilistic framework for black-box optimization.
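To make the greedy strategy concrete, here is a minimal sketch (our illustration, not the paper's implementation) of greedily maximizing a Monte Carlo estimate of an R2-style utility: the weighted integral over scalarized problems is approximated by an average over sampled weight vectors with a Chebyshev-type scalarization, and the toy objectives, weight distribution, and candidate set are all hypothetical.

```python
import numpy as np

# Hypothetical bi-objective problem on [0, 1]: maximize both components.
def objectives(x):
    return np.array([x, 1.0 - x ** 2])

def chebyshev(y, w, ref=np.zeros(2)):
    # Weighted Chebyshev-type scalarization (to be maximized).
    return np.min(w * (y - ref))

def r2_utility(points, weights):
    # Monte Carlo estimate of the R2 utility: average, over weight
    # vectors, of the best scalarized value attained by the set.
    ys = [objectives(x) for x in points]
    return np.mean([max(chebyshev(y, w) for y in ys) for w in weights])

def greedy(candidates, weights, k):
    # Greedily add the candidate with the largest marginal utility gain.
    chosen = []
    for _ in range(k):
        gains = [r2_utility(chosen + [c], weights) for c in candidates]
        chosen.append(candidates[int(np.argmax(gains))])
    return chosen

rng = np.random.default_rng(0)
weights = rng.dirichlet(np.ones(2), size=64)  # sampled preference weights
print(greedy(list(np.linspace(0.0, 1.0, 101)), weights, k=5))
```

Because the utility is monotone and submodular, this greedy loop inherits the standard $(1-1/e)$ approximation guarantee for cardinality-constrained submodular maximization.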
An $f$-edge fault-tolerant distance sensitive oracle ($f$-DSO) with stretch $\sigma \ge 1$ is a data structure that preprocesses a given undirected, unweighted graph $G$ with $n$ vertices and $m$ edges, and a positive integer $f$. When queried with a pair of vertices $s, t$ and a set $F$ of at most $f$ edges, it returns a $\sigma$-approximation of the $s$-$t$-distance in $G-F$. We study $f$-DSOs that take subquadratic space. Thorup and Zwick [JACM 2015] showed that this is only possible for $\sigma \ge 3$. We present, for any constant $f \ge 1$ and $\alpha \in (0, \frac{1}{2})$, and any $\varepsilon > 0$, an $f$-DSO with stretch $ 3 + \varepsilon$ that takes $\widetilde{O}(n^{2-\frac{\alpha}{f+1}}/\varepsilon) \cdot O(\log n/\varepsilon)^{f+1}$ space and has an $O(n^\alpha/\varepsilon^2)$ query time. We also give an improved construction for graphs with diameter at most $D$. For any constant $k$, we devise an $f$-DSO with stretch $2k-1$ that takes $O(D^{f+o(1)} n^{1+1/k})$ space and has $\widetilde{O}(D^{o(1)})$ query time, with a preprocessing time of $O(D^{f+o(1)} mn^{1/k})$. Chechik, Cohen, Fiat, and Kaplan [SODA 2017] presented an $f$-DSO with stretch $1{+}\varepsilon$ and preprocessing time $O_\varepsilon(n^{5+o(1)})$, albeit with a super-quadratic space requirement. We show how to reduce their preprocessing time to $O_{\varepsilon}(mn^{2+o(1)})$.
Random Search is one of the most widely-used methods for Hyperparameter Optimization, and is critical to the success of deep learning models. Despite its astonishing performance, little non-heuristic theory has been developed to describe its underlying working mechanism. This paper gives a theoretical account of Random Search. We introduce the concept of \emph{scattering dimension}, which describes the landscape of the underlying function and quantifies the performance of random search. We show that, when the environment is noise-free, the output of random search converges to the optimal value in probability at rate $ \widetilde{\mathcal{O}} \left( \left( \frac{1}{T} \right)^{ \frac{1}{d_s} } \right) $, where $ d_s \ge 0 $ is the scattering dimension of the underlying function. When the observed function values are corrupted by bounded i.i.d. noise, the output of random search converges to the optimal value in probability at rate $ \widetilde{\mathcal{O}} \left( \left( \frac{1}{T} \right)^{ \frac{1}{d_s + 1} } \right) $. In addition, based on the principles of random search, we introduce an algorithm, called BLiN-MOS, for Lipschitz bandits in doubling metric spaces that are also endowed with a Borel measure, and show that BLiN-MOS achieves a regret rate of order $ \widetilde{\mathcal{O}} \left( T^{ \frac{d_z}{d_z + 1} } \right) $, where $d_z$ is the zooming dimension of the problem instance. Our results show that in metric spaces with a Borel measure, the classic theory of Lipschitz bandits can be improved. This result suggests an intrinsic axiomatic gap between metric spaces and metric measure spaces from an algorithmic perspective, since the upper bound in a metric measure space breaks the known information-theoretic lower bounds for Lipschitz bandits in a metric space with no measure structure.
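For concreteness, the baseline procedure analyzed here is simply: draw $T$ points independently from the search space, evaluate the (possibly noise-corrupted) function at each, and keep the best. A minimal sketch, with a toy objective and search space of our own choosing (BLiN-MOS itself is more involved; this only illustrates pure random search):

```python
import numpy as np

def random_search(f, sample, T, rng):
    # Pure random search: draw T points, return the best observed value.
    # `sample(rng)` draws one point from the search space.
    best_x, best_y = None, -np.inf
    for _ in range(T):
        x = sample(rng)
        y = f(x)  # possibly a noise-corrupted evaluation
        if y > best_y:
            best_x, best_y = x, y
    return best_x, best_y

rng = np.random.default_rng(0)
f = lambda x: -np.sum((x - 0.3) ** 2)            # toy objective on [0,1]^2
sample = lambda rng: rng.uniform(0.0, 1.0, size=2)
print(random_search(f, sample, T=1000, rng=rng))
```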
Among the great successes of Reinforcement Learning (RL), self-play algorithms play an essential role in solving competitive games. Current self-play algorithms optimize the agent to maximize the expected win-rate against its current or historical copies, which often leaves it stuck in a local optimum with a simple, homogeneous strategy style. A possible solution is to improve the diversity of policies, which helps the agent break the stalemate and enhances its robustness when facing different opponents. However, enhancing diversity in self-play algorithms is not trivial. In this paper, we aim to introduce diversity from the perspective that agents could have diverse risk preferences in the face of uncertainty. Specifically, we design a novel reinforcement learning algorithm called Risk-sensitive Proximal Policy Optimization (RPPO), which smoothly interpolates between worst-case and best-case policy learning and allows for policy learning with desired risk preferences. By seamlessly integrating RPPO with population-based self-play, agents in the population optimize dynamic risk-sensitive objectives with experience gathered from playing against diverse opponents. Empirical results show that our method achieves comparable or superior performance in competitive games and that diverse modes of behavior emerge. Our code is public online at \url{//github.com/Jackory/RPBT}.
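The abstract does not spell out the RPPO objective, but one simple way to see how a single risk parameter can interpolate between worst-case and best-case learning is to blend lower- and upper-tail averages of sampled returns. The toy estimator below is our illustration only, with a hypothetical tail fraction; it is not the RPPO update itself.

```python
import numpy as np

def risk_sensitive_value(returns, phi, tail=0.25):
    # phi = 0: average the worst tail (risk-averse, CVaR-like);
    # phi = 1: average the best tail (risk-seeking); values in between
    # blend the two. `tail` is a hypothetical tail fraction.
    r = np.sort(np.asarray(returns))
    k = max(1, int(tail * len(r)))
    return (1.0 - phi) * r[:k].mean() + phi * r[-k:].mean()

returns = np.random.default_rng(0).normal(0.0, 1.0, size=100)
for phi in (0.0, 0.5, 1.0):
    print(phi, risk_sensitive_value(returns, phi))
```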
We present a highly effective algorithmic approach for generating $\varepsilon$-differentially private synthetic data in a bounded metric space with near-optimal utility guarantees under the 1-Wasserstein distance. In particular, for a dataset $X$ in the hypercube $[0,1]^d$, our algorithm generates a synthetic dataset $Y$ such that the expected 1-Wasserstein distance between the empirical measures of $X$ and $Y$ is $O((\varepsilon n)^{-1/d})$ for $d\geq 2$, and is $O(\log^2(\varepsilon n)(\varepsilon n)^{-1})$ for $d=1$. The accuracy guarantee is optimal up to a constant factor for $d\geq 2$, and up to a logarithmic factor for $d=1$. Our algorithm has a fast running time of $O(\varepsilon dn)$ for all $d\geq 1$ and demonstrates improved accuracy compared to the method in (Boedihardjo et al., 2022) for $d\geq 2$.
Moment restrictions and their conditional counterparts emerge in many areas of machine learning and statistics ranging from causal inference to reinforcement learning. Estimators for these tasks, generally called methods of moments, include the prominent generalized method of moments (GMM) which has recently gained attention in causal inference. GMM is a special case of the broader family of empirical likelihood estimators which are based on approximating a population distribution by means of minimizing a $\varphi$-divergence to an empirical distribution. However, the use of $\varphi$-divergences effectively limits the candidate distributions to reweightings of the data samples. We lift this long-standing limitation and provide a method of moments that goes beyond data reweighting. This is achieved by defining an empirical likelihood estimator based on maximum mean discrepancy which we term the kernel method of moments (KMM). We provide a variant of our estimator for conditional moment restrictions and show that it is asymptotically first-order optimal for such problems. Finally, we show that our method achieves competitive performance on several conditional moment restriction tasks.
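For reference, the discrepancy at the core of KMM is the maximum mean discrepancy (MMD). A minimal (biased) empirical estimate with a Gaussian kernel, our illustration with an arbitrary bandwidth, looks as follows:

```python
import numpy as np

def mmd2(X, Y, sigma=1.0):
    # Biased empirical squared MMD with a Gaussian kernel. KMM searches
    # over candidate distributions that satisfy the moment restrictions
    # while staying close to the data in this discrepancy, rather than
    # restricting candidates to reweightings of the sample.
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

X = np.random.default_rng(0).normal(0.0, 1.0, size=(100, 2))
Y = np.random.default_rng(1).normal(0.5, 1.0, size=(100, 2))
print(mmd2(X, Y))  # grows as the two samples differ more
```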
We consider problems of minimizing functionals $\mathcal{F}$ of probability measures on Euclidean space. To develop an accelerated gradient descent algorithm for such problems, we consider the gradient flow of transport maps that push forward an initial measure. We then propose a deterministic accelerated algorithm by extending Nesterov's acceleration technique with momentum. This algorithm is not based on the Wasserstein geometry. Furthermore, to estimate the convergence rate of the accelerated algorithm, we introduce new notions of convexity and smoothness for $\mathcal{F}$ based on transport maps. As a result, we show that the accelerated algorithm converges faster than a normal gradient descent algorithm. Numerical experiments support this theoretical result.
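As a cartoon of the scheme, one can parameterize the transport map as a simple shift $T_\theta(x) = x + \theta$ and apply Nesterov's momentum to $\theta$. The sketch below is our own minimal construction under that assumption, not the paper's algorithm; it only shows how acceleration acts on transport-map parameters rather than on the measure directly.

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.normal(size=(256, 2))        # particles of the initial measure
target = np.array([2.0, -1.0])

def grad_F(theta):
    # F(theta) = mean squared distance of pushed particles to a target;
    # the push-forward of the initial measure is T_theta # mu_0.
    pushed = x0 + theta
    return 2.0 * np.mean(pushed - target, axis=0)

theta = np.zeros(2)
theta_prev = theta.copy()
lr = 0.1
for t in range(1, 101):
    beta = (t - 1) / (t + 2)          # Nesterov momentum weight
    lookahead = theta + beta * (theta - theta_prev)
    theta_prev = theta
    theta = lookahead - lr * grad_F(lookahead)
print(theta)  # approaches the optimal shift [2, -1]
```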
We present linear-time algorithms for both maximum hidden set and minimum convex cover in funnel polygons. These algorithms show that funnel polygons are "homestead" polygons, i.e., polygons for which the hidden set number and the convex cover number coincide. We extend the algorithm to apply to maximum hidden vertex set and use the result to give a 2-approximation for all three problems in pseudotriangles.
We propose a novel training algorithm called DualFL (Dualized Federated Learning) for solving a distributed optimization problem in federated learning. Our approach is based on a specific dual formulation of the federated learning problem. DualFL achieves communication acceleration under various assumptions on the smoothness and strong convexity of the problem. Moreover, it theoretically guarantees the use of inexact local solvers, preserving its optimal communication complexity even with inexact local solutions. DualFL is the first federated learning algorithm that achieves communication acceleration even when the cost function is either nonsmooth or non-strongly convex. Numerical results demonstrate that the practical performance of DualFL is comparable to that of state-of-the-art federated learning algorithms and that it is robust with respect to hyperparameter tuning.
Polynomial zonotopes, a non-convex set representation, have a wide range of applications, from real-time motion planning and control in robotics to reachability analysis of nonlinear systems and safety shielding in reinforcement learning. Despite this widespread use, a frequently overlooked difficulty associated with polynomial zonotopes is intersection checking. Determining whether the reachable set, represented as a polynomial zonotope, intersects an unsafe set is not straightforward. In fact, we show that this fundamental operation is NP-hard, even for a simple class of polynomial zonotopes. The standard method for intersection checking with polynomial zonotopes is a two-part algorithm that overapproximates a polynomial zonotope with a regular zonotope and then, if the overapproximation error is deemed too large, splits the set and recursively tries again. Beyond the possible need for a large number of splits, we identify two sources of concern related to this algorithm: (1) overapproximating a polynomial zonotope with a zonotope has unbounded error, and (2) after splitting a polynomial zonotope, the overapproximation error can actually increase. Taken together, these issues imply that the algorithm may not always terminate. We perform a rigorous analysis of the method and detail necessary conditions for the union of overapproximations to provably converge to the original polynomial zonotope.
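To illustrate the overapproximate-and-split pattern, the sketch below is a simplified interval-arithmetic analogue of our own, not the zonotope-based method itself: treat the polynomial zonotope as the image of the factor box $[-1,1]^p$ under a polynomial map, bound the image of a sub-box by interval arithmetic, and split the box whenever the test is inconclusive. A depth cap stands in for the termination issue discussed above.

```python
import numpy as np

def pow_iv(lo, hi, e):
    # Interval bound of a**e over [lo, hi] for integer e >= 0.
    if e == 0:
        return 1.0, 1.0
    cands = [lo ** e, hi ** e]
    if e % 2 == 0 and lo < 0.0 < hi:
        cands.append(0.0)  # even powers attain 0 inside the interval
    return min(cands), max(cands)

def monomial_iv(lo, hi, exps):
    # Interval bound of prod_k a_k**e_k over the box [lo, hi].
    iv = (1.0, 1.0)
    for l, h, e in zip(lo, hi, exps):
        p = pow_iv(l, h, int(e))
        prods = [iv[0] * p[0], iv[0] * p[1], iv[1] * p[0], iv[1] * p[1]]
        iv = (min(prods), max(prods))
    return iv

def may_intersect(c, G, E, u_lo, u_hi, lo, hi, depth=8):
    # c: center (d,), G: generators (d, m), E: exponents (m, p);
    # u_lo/u_hi: unsafe axis-aligned box; lo/hi: current factor box.
    img_lo, img_hi = c.astype(float).copy(), c.astype(float).copy()
    for i in range(G.shape[1]):
        m_lo, m_hi = monomial_iv(lo, hi, E[i])
        img_lo += np.minimum(G[:, i] * m_lo, G[:, i] * m_hi)
        img_hi += np.maximum(G[:, i] * m_lo, G[:, i] * m_hi)
    if np.any(img_hi < u_lo) or np.any(img_lo > u_hi):
        return False  # overapproximation misses the unsafe box
    if depth == 0:
        return True   # inconclusive after the depth cap
    k = int(np.argmax(np.array(hi) - np.array(lo)))  # split widest factor
    mid = 0.5 * (lo[k] + hi[k])
    hi1 = list(hi); hi1[k] = mid
    lo2 = list(lo); lo2[k] = mid
    return (may_intersect(c, G, E, u_lo, u_hi, list(lo), hi1, depth - 1)
            or may_intersect(c, G, E, u_lo, u_hi, lo2, list(hi), depth - 1))

# 1-D example: x(a) = a**2 with a in [-1, 1] has image [0, 1], so the
# unsafe box [2, 3] can never be reached.
c = np.array([0.0]); G = np.array([[1.0]]); E = np.array([[2]])
print(may_intersect(c, G, E, np.array([2.0]), np.array([3.0]), [-1.0], [1.0]))
```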