Given tensors $\boldsymbol{\mathscr{A}}, \boldsymbol{\mathscr{B}}, \boldsymbol{\mathscr{C}}$ of size $m \times 1 \times n$, $m \times p \times 1$, and $1\times p \times n$, respectively, their Bhattacharya-Mesner (BM) product is a third-order tensor of dimension $m \times p \times n$ and BM-rank 1 (Mesner and Bhattacharya, 1990). Thus, if a third-order tensor can be written as a sum of a small number of such BM-rank-1 terms, this BM-decomposition (BMD) offers an implicitly compressed representation of the tensor. Motivated by this observation, we give a generative model which illustrates that spatio-temporal video data can be expected to have low BM-rank. We then discuss non-uniqueness properties of the BMD and give an improved bound on the BM-rank of a third-order tensor. We present and study properties of an iterative algorithm for computing an approximate BMD, including convergence behavior and appropriate choices of starting guesses that allow for the decomposition of our spatio-temporal data into stationary and non-stationary components. Several numerical experiments show the impressive ability of our BMD algorithm to extract important temporal information from video data while simultaneously compressing the data. In particular, we compare our approach with the dynamic mode decomposition (DMD): first, we show how the matrix-based DMD can be reinterpreted in tensor BM-product form, then we explain why a low-BM-rank decomposition can produce results with superior compression properties while simultaneously providing better separation of stationary and non-stationary features in the data. We conclude with a comparison of our low-BM-rank decomposition to two other tensor decompositions, CP and the t-SVDM.
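For concreteness, the entrywise form of the BM product under one common index convention (a sketch in our own notation; the paper's convention may order the modes differently) is
\[
  (\boldsymbol{\mathscr{A}}\boldsymbol{\mathscr{B}}\boldsymbol{\mathscr{C}})_{ijk} \;=\; \sum_{s=1}^{\ell} a_{isk}\, b_{ijs}\, c_{sjk},
\]
where $a_{isk}$, $b_{ijs}$, $c_{sjk}$ denote the entries of $\boldsymbol{\mathscr{A}} \in \mathbb{R}^{m\times \ell\times n}$, $\boldsymbol{\mathscr{B}} \in \mathbb{R}^{m\times p\times \ell}$, and $\boldsymbol{\mathscr{C}} \in \mathbb{R}^{\ell\times p\times n}$. For $\ell = 1$ each entry reduces to the single product $a_{i1k}\, b_{ij1}\, c_{1jk}$, which is precisely the BM-rank-1 building block summed in the BMD.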
Given a target distribution $\pi$ and an arbitrary Markov infinitesimal generator $L$ on a finite state space $\mathcal{X}$, we develop three structured and inter-related approaches to generate new reversiblizations from $L$. The first approach hinges on a geometric perspective, in which we view reversiblizations as projections onto the space of $\pi$-reversible generators under suitable information divergences such as $f$-divergences. With different choices of functions $f$, we not only recover nearly all established reversiblizations but also unravel and generate new reversiblizations. Along the way, we unveil interesting geometric results such as bisection properties, Pythagorean identities, parallelogram laws and a Markov chain counterpart of the arithmetic-geometric-harmonic mean inequality governing these reversiblizations. This further serves as motivation for introducing the notion of information centroids of a sequence of Markov chains and for giving conditions on their existence and uniqueness. Building upon the first approach, we view reversiblizations as generalized means. In this second approach, we construct new reversiblizations via different natural notions of generalized means such as the Cauchy mean or the dual mean. In the third approach, we combine the recently introduced locally-balanced Markov processes framework and the notion of convex $*$-conjugate in the study of $f$-divergence. The latter offers a rich source of balancing functions to generate new reversiblizations.
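For context, a brief sketch in our own notation (not necessarily the paper's): a generator $L$ is $\pi$-reversible if $\pi(x)L(x,y) = \pi(y)L(y,x)$ for all $x \neq y$, and one classical reversiblization is the additive one,
\[
  \frac{L + L^{*}}{2}, \qquad L^{*}(x,y) \;:=\; \frac{\pi(y)}{\pi(x)}\, L(y,x),
\]
where $L^{*}$ is the adjoint of $L$ in $\ell^{2}(\pi)$ and the average is again a generator whenever $\pi$ is stationary for $L$. Constructions of this kind are among the established reversiblizations that the projection viewpoint above is designed to recover.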
Let $\mathbf{B}_n = \frac{1}{n}(\mathbf{R}_n + \mathbf{T}^{1/2}_n \mathbf{X}_n)(\mathbf{R}_n + \mathbf{T}^{1/2}_n \mathbf{X}_n)^*$, where $\mathbf{X}_n$ is a $p \times n$ matrix with independent standardized random variables, $\mathbf{R}_n$ is a $p \times n$ non-random matrix, representing the information, and $\mathbf{T}_{n}$ is a $p \times p$ non-random nonnegative definite Hermitian matrix. Under some conditions on $\mathbf{R}_n \mathbf{R}_n^*$ and $\mathbf{T}_n$, it has been proved that, for any closed interval outside the support of the limiting spectral distribution, with probability one no eigenvalues fall in this interval for all $p$ sufficiently large. The purpose of this paper is to carry this study of the support of the limiting spectral distribution further: we show that an exact separation phenomenon holds, namely, with probability one, the correct numbers of eigenvalues lie on either side of these intervals.
Reinforcement Learning (RL), bolstered by the expressive capabilities of Deep Neural Networks (DNNs) for function approximation, has demonstrated considerable success in numerous applications. However, its practicality in addressing various real-world scenarios, characterized by diverse and unpredictable dynamics, noisy signals, and large state and action spaces, remains limited. This limitation stems from issues such as poor data efficiency, limited generalization capabilities, a lack of safety guarantees, and the absence of interpretability, among other factors. To overcome these challenges and improve performance across these crucial metrics, one promising avenue is to incorporate additional structural information about the problem into the RL learning process. Various sub-fields of RL have proposed methods for incorporating such inductive biases. We amalgamate these diverse methodologies under a unified framework, shedding light on the role of structure in the learning problem, and classify these methods into distinct patterns of incorporating structure. By leveraging this comprehensive framework, we provide valuable insights into the challenges of structured RL and lay the groundwork for a design pattern perspective on RL research. This novel perspective paves the way for future advancements and aids in developing more effective and efficient RL algorithms that can potentially handle real-world scenarios better.
A closed quasigeodesic is a closed curve on the surface of a polyhedron with at most $180^\circ$ of surface on both sides at all points; such curves can be locally unfolded straight. In 1949, Pogorelov proved that every convex polyhedron has at least three (non-self-intersecting) closed quasigeodesics, but the proof relies on a nonconstructive topological argument. We present the first finite algorithm to find a closed quasigeodesic on a given convex polyhedron, which is the first positive progress on a 1990 open problem by O'Rourke and Wyman. The algorithm establishes for the first time a pseudopolynomial upper bound on the total number of visits to faces (number of line segments), namely, $O\left(\frac{n \, L^3}{\epsilon^2 \, \ell^3}\right)$, where $n$ is the number of vertices of the polyhedron, $\epsilon$ is the minimum curvature of a vertex, $L$ is the length of the longest edge, and $\ell$ is the smallest distance within a face between a vertex and a nonincident edge (the minimum feature size of any face). On the real RAM, the algorithm's running time is also pseudopolynomial, namely $O\left(\frac{n \, L^3}{\epsilon^2 \, \ell^3} \log n\right)$. On a word RAM, the running time grows to $O\left(b^2 \cdot \frac{n^8 \log n}{\epsilon^8} \cdot \frac{L^{21}}{\ell^{21}}\cdot 2^{O(|\Lambda|)}\right)$, where $|\Lambda|$ is the number of distinct edge lengths in the polyhedron, assuming its intrinsic or extrinsic geometry is given by rational coordinates each with at most $b$ bits. This time bound remains pseudopolynomial for polyhedra with $O(\log n)$ distinct edge lengths, but is exponential in the worst case. Along the way, we introduce the expression RAM model of computation, formalizing a connection between the real RAM and word RAM hinted at by past work on exact geometric computation.
Given a set $P \subset \mathbb{R}^d$ of $n$ points, with diameter $\Delta$, and a parameter $\delta \in (0,1)$, it is known that there is a partition of $P$ into sets $P_1, \ldots, P_t$, each of size $O(1/\delta^2)$, such that their convex hulls all intersect a common ball of radius $\delta \Delta$. We prove that a random partition, with a simple alteration step, yields the desired partition, resulting in a (randomized) linear-time algorithm. We also provide a deterministic algorithm with running time $O(dn \log n)$. Previous proofs were either existential (i.e., at least exponential time), or required much bigger sets. In addition, the algorithm and its proof of correctness are significantly simpler than previous work, and the constants are slightly better. We also include a number of applications and extensions using the same central ideas. For example, we provide a linear-time algorithm for computing a ``fuzzy'' centerpoint, and prove a no-dimensional weak $\varepsilon$-net theorem with an improved constant.
For a graph $G = (V, E)$ with vertex set $V$ and edge set $E$, a function $f : V \rightarrow \{0, 1, 2, \ldots, \operatorname{diam}(G)\}$ is called a $\textit{broadcast}$ on $G$. If, for every vertex $u \in V$, there exists a vertex $v$ in $G$ (possibly $u = v$) such that $f(v) > 0$ and $d(u, v) \leq f(v)$, then $f$ is called a $\textit{dominating broadcast}$ on $G$. The $\textit{cost}$ of the dominating broadcast $f$ is the quantity $\sum_{v\in V}f(v)$. The minimum cost of a dominating broadcast is the \textit{broadcast domination number} of $G$, denoted by $\gamma_{b}(G)$. A $\textit{multipacking}$ is a set $S \subseteq V$ in a graph $G = (V, E)$ such that for every vertex $v \in V$ and for every integer $r \geq 1$, the ball of radius $r$ around $v$ contains at most $r$ vertices of $S$, that is, there are at most $r$ vertices in $S$ at distance at most $r$ from $v$ in $G$. The $\textit{multipacking number}$ of $G$ is the maximum cardinality of a multipacking of $G$ and is denoted by $mp(G)$. We show that, for any cactus graph $G$, $\gamma_b(G)\leq \frac{3}{2}mp(G)+\frac{11}{2}$. We also show that $\gamma_b(G)-mp(G)$ can be arbitrarily large for cactus graphs by constructing an infinite family of cactus graphs such that the ratio $\gamma_b(G)/mp(G)=4/3$, with $mp(G)$ arbitrarily large. This result shows that, for cactus graphs, we cannot improve the bound $\gamma_b(G)\leq \frac{3}{2}mp(G)+\frac{11}{2}$ to a bound of the form $\gamma_b(G)\leq c_1\cdot mp(G)+c_2$ for any constants $c_1<4/3$ and $c_2$. Moreover, we provide an $O(n)$-time algorithm to construct a multipacking of $G$ of size at least $\frac{2}{3}mp(G)-\frac{11}{3}$, where $n$ is the number of vertices of the graph $G$.
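To make the two definitions above concrete, here is a minimal, illustrative checker; the graph, the broadcast $f$, and the set $S$ below are our own toy example, not taken from the paper.

```python
# Illustrative check of the broadcast-domination and multipacking definitions.
from collections import deque

def bfs_distances(adj, source):
    """Single-source shortest-path distances in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def is_dominating_broadcast(adj, f):
    """f is dominating if every u has some v with f[v] > 0 and d(u, v) <= f[v]."""
    dist = {v: bfs_distances(adj, v) for v in adj}
    return all(
        any(f[v] > 0 and dist[v].get(u, float("inf")) <= f[v] for v in adj)
        for u in adj
    )

def is_multipacking(adj, S):
    """S is a multipacking if every ball of radius r contains at most r vertices of S."""
    n = len(adj)
    for v in adj:
        dist = bfs_distances(adj, v)
        for r in range(1, n):
            if sum(1 for s in S if dist.get(s, float("inf")) <= r) > r:
                return False
    return True

# A 5-cycle with one pendant vertex attached (a tiny cactus graph).
adj = {0: [1, 4, 5], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0], 5: [0]}
f = {0: 2, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0}   # a broadcast of cost 2
print(is_dominating_broadcast(adj, f))      # True: every vertex is within distance 2 of vertex 0
print(is_multipacking(adj, {2, 5}))         # True for this candidate multipacking
```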
We revisit the main result of Carmosino et al.\ \cite{CILM18}, which shows that an $\Omega(n^{\omega/2+\epsilon})$ lower bound on the noncommutative arithmetic circuit size (where $\omega$ is the matrix multiplication exponent) of a constant-degree $n$-variate polynomial family $(g_n)_n$, where each $g_n$ is a noncommutative polynomial, can be ``lifted'' to an exponential circuit size lower bound for another polynomial family $(f_n)$ obtained from $(g_n)$ by a lifting process. In this paper, we present a simpler and more conceptual automata-theoretic proof of their result.
We study pseudo-polynomial time algorithms for the fundamental \emph{0-1 Knapsack} problem. In terms of $n$ and $w_{\max}$, previous algorithms for 0-1 Knapsack have cubic time complexities: $O(n^2w_{\max})$ (Bellman 1957), $O(nw_{\max}^2)$ (Kellerer and Pferschy 2004), and $O(n + w_{\max}^3)$ (Polak, Rohwedder, and W\k{e}grzycki 2021). On the other hand, fine-grained complexity only rules out $O((n+w_{\max})^{2-\delta})$ running time, and it is an important question in this area whether $\tilde O(n+w_{\max}^2)$ time is achievable. Our main result makes significant progress towards solving this question:
- The 0-1 Knapsack problem has a deterministic algorithm in $\tilde O(n + w_{\max}^{2.5})$ time.
Our techniques also apply to the easier \emph{Subset Sum} problem:
- The Subset Sum problem has a randomized algorithm in $\tilde O(n + w_{\max}^{1.5})$ time. This improves (and simplifies) the previous $\tilde O(n + w_{\max}^{5/3})$-time algorithm by Polak, Rohwedder, and W\k{e}grzycki (2021) (based on Galil and Margalit (1991), and Bringmann and Wellnitz (2021)).
Similar to recent works on Knapsack (and integer programs in general), our algorithms also utilize the \emph{proximity} between optimal integral solutions and fractional solutions. Our new ideas are as follows:
- Previous works used an $O(w_{\max})$ proximity bound in the $\ell_1$-norm. As our main conceptual contribution, we use an additive-combinatorial theorem by Erd\H{o}s and S\'{a}rk\"{o}zy (1990) to derive an $\ell_0$-proximity bound of $\tilde O(\sqrt{w_{\max}})$.
- Then, the main technical component of our Knapsack result is a dynamic programming algorithm that exploits both $\ell_0$- and $\ell_1$-proximity. It is based on a vast extension of the ``witness propagation'' method, originally designed by Deng, Mao, and Zhong (2023) for the easier \emph{unbounded} setting only.
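For context, the $O(n^2 w_{\max})$ bound of Bellman cited above comes from the textbook dynamic program (a sketch of the classical baseline only, not of the new algorithm): with capacity $t$, item weights $w_i$, and profits $p_i$, one fills the table
\[
  D[i][w] \;=\; \max\bigl\{\, D[i-1][w],\; D[i-1][w-w_i] + p_i \,\bigr\}, \qquad 0 \le w \le t
\]
(the second option only when $w \ge w_i$), and since one may assume $t < \sum_i w_i \le n\, w_{\max}$ (otherwise all items fit), the table has $O(n \cdot n w_{\max}) = O(n^{2} w_{\max})$ entries, each computed in constant time.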
We study the tolerant testing problem for high-dimensional samplers. Given as input two samplers $\mathcal{P}$ and $\mathcal{Q}$ over the $n$-dimensional space $\{0,1\}^n$, and two parameters $\varepsilon_2 > \varepsilon_1$, the goal of tolerant testing is to test whether the distributions generated by $\mathcal{P}$ and $\mathcal{Q}$ are $\varepsilon_1$-close or $\varepsilon_2$-far. Since exponential lower bounds (in $n$) are known for the problem in the standard sampling model, research has focused on models where one can draw \textit{conditional} samples. Among these models, \textit{subcube conditioning} ($\mathsf{SUBCOND}$), which allows conditioning on arbitrary subcubes of the domain, holds the promise of widespread adoption in practice owing to its ability to capture the natural behavior of samplers in constrained domains. To translate the promise into practice, we need to overcome two crucial roadblocks for tests based on $\mathsf{SUBCOND}$: the prohibitively large number of queries ($\tilde{\mathcal{O}}(n^5/\varepsilon_2^5)$) and the limitation to non-tolerant testing (i.e., $\varepsilon_1 = 0$). The primary contribution of this work is to overcome the above challenges: we design a new tolerant testing methodology (i.e., $\varepsilon_1 \geq 0$) that allows us to significantly improve the upper bound to $\tilde{\mathcal{O}}(n^3/(\varepsilon_2-\varepsilon_1)^5)$.
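To illustrate the query model (not the paper's tester): a $\mathsf{SUBCOND}$ query returns a draw from a sampler's distribution conditioned on a subcube, i.e., on a partial assignment of coordinates. The rejection-sampling realization below is a sketch with a hypothetical toy sampler, purely to show what such a query provides; efficient testers do not implement conditioning this way.

```python
# Sketch of a subcube-conditional sample obtained by rejection sampling.
import random

def subcube_conditional_sample(sampler, partial_assignment, max_tries=100_000):
    """Draw x ~ sampler(), conditioned on x agreeing with partial_assignment,
    a dict mapping coordinate index -> bit (this fixes a subcube of {0,1}^n)."""
    for _ in range(max_tries):
        x = sampler()
        if all(x[i] == b for i, b in partial_assignment.items()):
            return x
    raise RuntimeError("subcube has negligible mass under the sampler")

# Hypothetical toy sampler over {0,1}^4: independent, biased bits.
def toy_sampler():
    return tuple(int(random.random() < p) for p in (0.9, 0.5, 0.2, 0.5))

sample = subcube_conditional_sample(toy_sampler, {0: 1, 2: 0})
print(sample)  # a 4-bit string with coordinate 0 fixed to 1 and coordinate 2 fixed to 0
```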
In the study of misspecified spectral algorithms, one usually assumes that the underlying true function $f_{\rho}^{*}$ lies in $[\mathcal{H}]^{s}$, a less-smooth interpolation space of a reproducing kernel Hilbert space (RKHS) $\mathcal{H}$, for some $s\in (0,1)$. The existing minimax optimality results require $\|f_{\rho}^{*}\|_{L^{\infty}}<\infty$, which implicitly requires $s > \alpha_{0}$, where $\alpha_{0}\in (0,1)$ is the embedding index, a constant depending on $\mathcal{H}$. Whether spectral algorithms are optimal for all $s\in (0,1)$ has been an open problem for years. In this paper, we show that spectral algorithms are minimax optimal for any $\alpha_{0}-\frac{1}{\beta} < s < 1$, where $\beta$ is the eigenvalue decay rate of $\mathcal{H}$. We also give several classes of RKHSs whose embedding index satisfies $\alpha_0 = \frac{1}{\beta}$; for these RKHSs, spectral algorithms are therefore minimax optimal for all $s\in (0,1)$.
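For reference, one standard construction of the interpolation space appearing above (a sketch; conventions vary slightly across papers in this literature): if $\{(\lambda_i, e_i)\}_{i \ge 1}$ are the eigenvalue-eigenfunction pairs of the kernel integral operator associated with $\mathcal{H}$, then
\[
  [\mathcal{H}]^{s} \;=\; \Bigl\{\, {\textstyle\sum_i} a_i \lambda_i^{s/2} e_i \;:\; {\textstyle\sum_i} a_i^{2} < \infty \Bigr\},
  \qquad
  \Bigl\| {\textstyle\sum_i} a_i \lambda_i^{s/2} e_i \Bigr\|_{[\mathcal{H}]^{s}}^{2} \;=\; {\textstyle\sum_i} a_i^{2},
\]
so that smaller $s$ gives a larger, less smooth space, with $[\mathcal{H}]^{1} = \mathcal{H}$.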