The Sum-of-Squares (SoS) hierarchy of semidefinite programs is a powerful algorithmic paradigm which captures state-of-the-art algorithmic guarantees for a wide array of problems. In the average case setting, SoS lower bounds provide strong evidence of algorithmic hardness or information-computation gaps. Prior to this work, SoS lower bounds have been obtained for problems in the "dense" input regime, where the input is a collection of independent Rademacher or Gaussian random variables, while the sparse regime has remained out of reach. We make the first progress in this direction by obtaining strong SoS lower bounds for the problem of Independent Set on sparse random graphs. We prove that with high probability over an Erdos-Renyi random graph $G\sim G_{n,\frac{d}{n}}$ with average degree $d>\log^2 n$, degree-$D_{SoS}$ SoS fails to refute the existence of an independent set of size $k = \Omega\left(\frac{n}{\sqrt{d}(\log n)(D_{SoS})^{c_0}} \right)$ in $G$ (where $c_0$ is an absolute constant), whereas the true size of the largest independent set in $G$ is $O\left(\frac{n\log d}{d}\right)$. Our proof involves several significant extensions of the techniques used for proving SoS lower bounds in the dense setting. Previous lower bounds are based on the pseudo-calibration heuristic of Barak et al [FOCS 2016] which produces a candidate SoS solution using a planted distribution indistinguishable from the input distribution via low-degree tests. In the sparse case the natural planted distribution does admit low-degree distinguishers, and we show how to adapt the pseudo-calibration heuristic to overcome this. Another notorious technical challenge for the sparse regime is the quest for matrix norm bounds. In this paper, we obtain new norm bounds for graph matrices in the sparse setting.
Incremental graph search algorithms such as D* Lite reuse previous, and perhaps partial, searches to expedite subsequent path planning tasks. In this article, we are interested in developing incremental graph search algorithms for path finding problems to simultaneously optimize multiple objectives such as travel risk, arrival time, etc. This is challenging because in a multi-objective setting, the number of "Pareto-optimal" solutions can grow exponentially with respect to the size of the graph. This article presents a new multi-objective incremental search algorithm called Multi-Objective Path-Based D* Lite (MOPBD*) which leverages a path-based expansion strategy to prune dominated solutions. Additionally, we introduce a sub-optimal variant of MOPBD* to improve search efficiency while approximating the Pareto-optimal front. We numerically evaluate the performance of MOPBD* and its variants in various maps with two and three objectives. Results show that our approach is more efficient than search from scratch, and runs up to an order of magnitude faster than the existing incremental method for multi-objective path planning.
An error estimate for the Gauss-Lobatto quadrature formula for integration over the interval $[-1, 1]$, relative to the Jacobi weight function $w^{\alpha,\beta}(t)=(1-t)^\alpha(1+t)^\beta$, $\alpha,\beta>-1$, is obtained. This estimate holds true for functions belonging to some Sobolev-type subspaces of the weighted space $L_{w^{\alpha,\beta}}^1([-1,1])$.
We give a classical algorithm for linear regression analogous to the quantum matrix inversion algorithm [Harrow, Hassidim, and Lloyd, Physical Review Letters'09, arXiv:0811.3171] for low-rank matrices [Wossnig, Zhao, and Prakash, Physical Review Letters'18, arXiv:1704.06174], when the input matrix $A$ is stored in a data structure applicable for QRAM-based state preparation. Namely, suppose we are given an $A \in \mathbb{C}^{m\times n}$ with minimum non-zero singular value $\sigma$ which supports certain efficient $\ell_2$-norm importance sampling queries, along with a $b \in \mathbb{C}^m$. Then, for some $x \in \mathbb{C}^n$ satisfying $\|x - A^+b\| \leq \varepsilon\|A^+b\|$, we can output a measurement of $|x\rangle$ in the computational basis and output an entry of $x$ with classical algorithms that run in $\tilde{\mathcal{O}}\big(\frac{\|A\|_{\mathrm{F}}^6\|A\|^6}{\sigma^{12}\varepsilon^4}\big)$ and $\tilde{\mathcal{O}}\big(\frac{\|A\|_{\mathrm{F}}^6\|A\|^2}{\sigma^8\varepsilon^4}\big)$ time, respectively. This improves on previous "quantum-inspired" algorithms in this line of research by at least a factor of $\frac{\|A\|^{16}}{\sigma^{16}\varepsilon^2}$ [Chia, Gily\'en, Li, Lin, Tang and Wang, STOC'20, arXiv:1910.06151]. As a consequence, we show that quantum computers can achieve at most a factor-of-12 speedup for linear regression in this QRAM data structure setting and related settings. Our work applies techniques from sketching algorithms and optimization to the quantum-inspired literature. Unlike earlier works, this is a promising avenue that could lead to feasible implementations of classical regression in a quantum-inspired settings, for comparison against future quantum computers.
We prove upper and lower bounds on the minimal spherical dispersion, improving upon previous estimates obtained by Rote and Tichy [Spherical dispersion with an application to polygonal approximation of curves, Anz. \"Osterreich. Akad. Wiss. Math.-Natur. Kl. 132 (1995), 3--10]. In particular, we see that the inverse $N(\varepsilon,d)$ of the minimal spherical dispersion is, for fixed $\varepsilon>0$, linear in the dimension $d$ of the ambient space. We also derive upper and lower bounds on the expected dispersion for points chosen independently and uniformly at random from the Euclidean unit sphere. In terms of the corresponding inverse $\widetilde{N}(\varepsilon,d)$, our bounds are optimal with respect to the dependence on $\varepsilon$.
Logistic Bandits have recently undergone careful scrutiny by virtue of their combined theoretical and practical relevance. This research effort delivered statistically efficient algorithms, improving the regret of previous strategies by exponentially large factors. Such algorithms are however strikingly costly as they require $\Omega(t)$ operations at each round. On the other hand, a different line of research focused on computational efficiency ($\mathcal{O}(1)$ per-round cost), but at the cost of letting go of the aforementioned exponential improvements. Obtaining the best of both world is unfortunately not a matter of marrying both approaches. Instead we introduce a new learning procedure for Logistic Bandits. It yields confidence sets which sufficient statistics can be easily maintained online without sacrificing statistical tightness. Combined with efficient planning mechanisms we design fast algorithms which regret performance still match the problem-dependent lower-bound of Abeille et al. (2021). To the best of our knowledge, those are the first Logistic Bandit algorithms that simultaneously enjoy statistical and computational efficiency.
Graph spanners are well-studied and widely used both in theory and practice. In a recent breakthrough, Chechik and Wulff-Nilsen [CW18] improved the state-of-the-art for light spanners by constructing a $(2k-1)(1+\epsilon)$-spanner with $O(n^{1+1/k})$ edges and $O_\epsilon(n^{1/k})$ lightness. Soon after, Filtser and Solomon [FS19] showed that the classic greedy spanner construction achieves the same bounds The major drawback of the greedy spanner is its running time of $O(mn^{1+1/k})$ (which is faster than [CW16]). This makes the construction impractical even for graphs of moderate size. Much faster spanner constructions do exist but they only achieve lightness $\Omega_\epsilon(kn^{1/k})$, even when randomization is used. The contribution of this paper is deterministic spanner constructions that are fast, and achieve similar bounds as the state-of-the-art slower constructions. Our first result is an $O_\epsilon(n^{2+1/k+\epsilon'})$ time spanner construction which achieves the state-of-the-art bounds. Our second result is an $O_\epsilon(m + n\log n)$ time construction of a spanner with $(2k-1)(1+\epsilon)$ stretch, $O(\log k\cdot n^{1+1/k})$ edges and $O_\epsilon(\log k\cdot n^{1/k})$ lightness. This is an exponential improvement in the dependence on $k$ compared to the previous result with such running time. Finally, for the important special case where $k=\log n$, for every constant $\epsilon>0$, we provide an $O(m+n^{1+\epsilon})$ time construction that produces an $O(\log n)$-spanner with $O(n)$ edges and $O(1)$ lightness which is asymptotically optimal. This is the first known sub-quadratic construction of such a spanner for any $k = \omega(1)$. To achieve our constructions, we show a novel deterministic incremental approximate distance oracle, which may be of independent interest.
We study fairness through the lens of cooperative multi-agent learning. Our work is motivated by empirical evidence that naive maximization of team reward yields unfair outcomes for individual team members. To address fairness in multi-agent contexts, we introduce team fairness, a group-based fairness measure for multi-agent learning. We then prove that it is possible to enforce team fairness during policy optimization by transforming the team's joint policy into an equivariant map. We refer to our multi-agent learning strategy as Fairness through Equivariance (Fair-E) and demonstrate its effectiveness empirically. We then introduce Fairness through Equivariance Regularization (Fair-ER) as a soft-constraint version of Fair-E and show that it reaches higher levels of utility than Fair-E and fairer outcomes than non-equivariant policies. Finally, we present novel findings regarding the fairness-utility trade-off in multi-agent settings; showing that the magnitude of the trade-off is dependent on agent skill.
We couple the L1 discretization for Caputo derivative in time with spectral Galerkin method in space to devise a scheme that solves quasilinear subdiffusion equations. Both the diffusivity and the source are allowed to be nonlinear functions of the solution. We prove method's stability and convergence with spectral accuracy in space. The temporal order depends on solution's regularity in time. Further, we support our results with numerical simulations that utilize parallelism for spatial discretization. Moreover, as a side result we find asymptotic exact values of error constants along with their remainders for discretizations of Caputo derivative and fractional integrals. These constants are the smallest possible which improves the previously established results from the literature.
We consider the exploration-exploitation trade-off in reinforcement learning and we show that an agent imbued with a risk-seeking utility function is able to explore efficiently, as measured by regret. The parameter that controls how risk-seeking the agent is can be optimized exactly, or annealed according to a schedule. We call the resulting algorithm K-learning and show that the corresponding K-values are optimistic for the expected Q-values at each state-action pair. The K-values induce a natural Boltzmann exploration policy for which the `temperature' parameter is equal to the risk-seeking parameter. This policy achieves an expected regret bound of $\tilde O(L^{3/2} \sqrt{S A T})$, where $L$ is the time horizon, $S$ is the number of states, $A$ is the number of actions, and $T$ is the total number of elapsed time-steps. This bound is only a factor of $L$ larger than the established lower bound. K-learning can be interpreted as mirror descent in the policy space, and it is similar to other well-known methods in the literature, including Q-learning, soft-Q-learning, and maximum entropy policy gradient, and is closely related to optimism and count based exploration methods. K-learning is simple to implement, as it only requires adding a bonus to the reward at each state-action and then solving a Bellman equation. We conclude with a numerical example demonstrating that K-learning is competitive with other state-of-the-art algorithms in practice.
This work considers the problem of provably optimal reinforcement learning for episodic finite horizon MDPs, i.e. how an agent learns to maximize his/her long term reward in an uncertain environment. The main contribution is in providing a novel algorithm --- Variance-reduced Upper Confidence Q-learning (vUCQ) --- which enjoys a regret bound of $\widetilde{O}(\sqrt{HSAT} + H^5SA)$, where the $T$ is the number of time steps the agent acts in the MDP, $S$ is the number of states, $A$ is the number of actions, and $H$ is the (episodic) horizon time. This is the first regret bound that is both sub-linear in the model size and asymptotically optimal. The algorithm is sub-linear in that the time to achieve $\epsilon$-average regret for any constant $\epsilon$ is $O(SA)$, which is a number of samples that is far less than that required to learn any non-trivial estimate of the transition model (the transition model is specified by $O(S^2A)$ parameters). The importance of sub-linear algorithms is largely the motivation for algorithms such as $Q$-learning and other "model free" approaches. vUCQ algorithm also enjoys minimax optimal regret in the long run, matching the $\Omega(\sqrt{HSAT})$ lower bound. Variance-reduced Upper Confidence Q-learning (vUCQ) is a successive refinement method in which the algorithm reduces the variance in $Q$-value estimates and couples this estimation scheme with an upper confidence based algorithm. Technically, the coupling of both of these techniques is what leads to the algorithm enjoying both the sub-linear regret property and the asymptotically optimal regret.