The paper concerns the $d$-dimensional stochastic approximation recursion, $$ \theta_{n+1}= \theta_n + \alpha_{n + 1} f(\theta_n, \Phi_{n+1}) $$ in which $\Phi$ is a geometrically ergodic Markov chain on a general state space $\textsf{X}$ with stationary distribution $\pi$, and $f:\Re^d\times\textsf{X}\to\Re^d$. The main results are established under a version of the Donsker-Varadhan Lyapunov drift condition known as (DV3), together with a stability condition for the mean flow whose vector field is $\bar{f}(\theta)=\textsf{E}[f(\theta,\Phi)]$, where $\Phi\sim\pi$. (i) $\{ \theta_n\}$ is convergent a.s. and in $L_4$ to the unique root $\theta^*$ of $\bar{f}(\theta)$. (ii) A functional CLT is established, as well as the usual one-dimensional CLT for the normalized error. (iii) The CLT holds for the normalized version, $z_n := \sqrt{n} (\theta^{\text{PR}}_n -\theta^*)$, of the averaged parameters, $\theta^{\text{PR}}_n := n^{-1} \sum_{k=1}^n\theta_k$, subject to standard assumptions on the step-size. Moreover, the normalized covariance converges, $$ \lim_{n \to \infty} n \textsf{E} [ \widetilde{\theta}^{\text{PR}}_n (\widetilde{\theta}^{\text{PR}}_n)^T ] = \Sigma_\theta^*,\;\;\;\textit{with $\widetilde{\theta}^{\text{PR}}_n = \theta^{\text{PR}}_n -\theta^*$,} $$ where $\Sigma_\theta^*$ is the minimal covariance of Polyak and Ruppert. (iv) An example is given where $f$ and $\bar{f}$ are linear in $\theta$, and the Markov chain $\Phi$ is geometrically ergodic but does not satisfy (DV3). While the algorithm is convergent, the second moment is unbounded: $ \textsf{E} [ \| \theta_n \|^2 ] \to \infty$ as $n\to\infty$.
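As a rough numerical illustration (a toy case, not the paper's setting or proofs), the sketch below runs the recursion with Polyak-Ruppert averaging on a finite two-state Markov chain. It assumes $f(\theta, x) = c(x) - \theta$, so that $\bar{f}(\theta) = \textsf{E}_\pi[c(\Phi)] - \theta$ and $\theta^* = \textsf{E}_\pi[c(\Phi)]$. The chain, the function $c$, the step-size $\alpha_n = n^{-\rho}$ with $\rho = 0.7$, and the function name are all illustrative assumptions.

```python
import random

def sa_polyak_ruppert(n_steps, rho=0.7, seed=0):
    """Run theta_{n+1} = theta_n + alpha_{n+1} f(theta_n, Phi_{n+1}) and
    return (theta_N, theta^PR_N), where theta^PR_N = N^{-1} sum_{k=1}^N theta_k.

    Toy assumptions: Phi is a two-state chain on {0, 1} with
    P(0 -> 1) = 0.1 and P(1 -> 0) = 0.2, so pi = (2/3, 1/3), and
    f(theta, x) = c(x) - theta with c(0) = 1, c(1) = 4, hence theta* = 2.
    """
    rng = random.Random(seed)
    stay = {0: 0.9, 1: 0.8}      # probability of remaining in each state
    c = {0: 1.0, 1: 4.0}
    phi, theta, running_sum = 0, 0.0, 0.0
    for n in range(n_steps):
        # Advance the Markov chain: Phi_n -> Phi_{n+1}.
        if rng.random() >= stay[phi]:
            phi = 1 - phi
        alpha = (n + 1) ** (-rho)            # alpha_{n+1}, with rho in (1/2, 1)
        theta += alpha * (c[phi] - theta)    # f(theta_n, Phi_{n+1})
        running_sum += theta                 # accumulates theta_1, ..., theta_N
    return theta, running_sum / n_steps

theta_n, theta_pr = sa_polyak_ruppert(100_000)
print(theta_n, theta_pr)
```

The last iterate fluctuates on the scale $\sqrt{\alpha_n}$, while the averaged estimate $\theta^{\text{PR}}_n$ settles near $\theta^* = 2$ at the faster $1/\sqrt{n}$ rate described in (iii).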
For a set of $p$-variate data points $\boldsymbol y_1,\ldots,\boldsymbol y_n$, several versions of the multivariate median and related multivariate sign tests have been proposed and studied in the literature. In this paper we consider the asymptotic properties of the multivariate extension of the Hodges-Lehmann (HL) estimator, the spatial HL-estimator, and the related test statistic. The asymptotic behavior of the spatial HL-estimator and the related test statistic as $n$ tends to infinity is collected, reviewed, and proved; some of these results are proved here for the first time, although they have long been used. We also derive the limiting behavior of the HL-estimator when both the sample size $n$ and the dimension $p$ tend to infinity.
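A minimal sketch of the spatial HL-estimator follows. It assumes the common definition as the spatial median of the pairwise Walsh averages $(\boldsymbol y_i + \boldsymbol y_j)/2$ with $i < j$, and it computes the spatial median with Weiszfeld's iteration. The function names, the iteration limits, and the small guard against zero distances are assumptions made for illustration.

```python
import numpy as np

def spatial_median(points, n_iter=500, tol=1e-10):
    """Weiszfeld iteration for argmin_m sum_i ||x_i - m||."""
    m = points.mean(axis=0)
    for _ in range(n_iter):
        dist = np.linalg.norm(points - m, axis=1)
        dist = np.maximum(dist, 1e-12)   # crude guard when m hits a data point
        w = 1.0 / dist
        m_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) < tol:
            return m_new
        m = m_new
    return m

def spatial_hl(y):
    """Spatial median of the Walsh averages (y_i + y_j) / 2 over pairs i < j."""
    i, j = np.triu_indices(y.shape[0], k=1)
    return spatial_median((y[i] + y[j]) / 2.0)

rng = np.random.default_rng(0)
center = np.array([1.0, -2.0, 0.5])
y = rng.standard_normal((200, 3)) + center
print(spatial_hl(y))   # close to `center` for data symmetric about it
```

The number of Walsh averages grows as $n^2/2$, so for large $n$ one would subsample pairs or use an incremental scheme. The quadratic version above keeps the definition transparent.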
We study the expressive power of first-order logic with counting quantifiers, especially the $k$-variable and quantifier-rank-$q$ fragment $\mathsf{C}^k_q$, using homomorphism indistinguishability. Recently, Dawar, Jakl, and Reggio (2021) proved that two graphs satisfy the same $\mathsf{C}^k_q$-sentences if and only if they are homomorphism indistinguishable over the class $\mathcal{T}^k_q$ of graphs admitting a $k$-pebble forest cover of depth $q$. Their proof builds on the categorical framework of game comonads developed by Abramsky, Dawar, and Wang (2017). We reprove their result using elementary techniques inspired by Dvo\v{r}\'ak (2010). Using these techniques we also give a characterisation of guarded counting logic. Our main focus, however, is a graph-theoretic analysis of the graph class $\mathcal{T}^k_q$. If $q$ is sufficiently larger than $k$, this allows us to separate $\mathcal{T}^k_q$ from the intersection of $\mathcal{TW}_{k-1}$, the class of graphs of treewidth at most $k-1$, and $\mathcal{TD}_q$, the class of graphs of treedepth at most $q$. We are able to lift this separation to a semantic separation of the respective homomorphism indistinguishability relations. Part of this separation consists in proving that the class $\mathcal{TD}_q$ is homomorphism distinguishing closed, as conjectured by Roberson (2022).
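Two graphs $G$ and $H$ are homomorphism indistinguishable over a class $\mathcal{F}$ when $\hom(F,G) = \hom(F,H)$ for every $F \in \mathcal{F}$. The brute-force counter below (exponential in $|V(F)|$, so only for tiny pattern graphs) is an illustrative sketch with a hypothetical graph encoding. It shows that the 6-cycle $C_6$ and two disjoint triangles agree on homomorphism counts from small trees such as $K_2$ and the path $P_3$, while a triangle pattern distinguishes them.

```python
from itertools import product

def hom_count(F_vertices, F_edges, G_vertices, G_edges):
    """Count maps h: V(F) -> V(G) sending every edge of F to an edge of G."""
    adj = set()
    for u, v in G_edges:                 # undirected: store both orientations
        adj.add((u, v))
        adj.add((v, u))
    count = 0
    for image in product(G_vertices, repeat=len(F_vertices)):
        h = dict(zip(F_vertices, image))
        if all((h[a], h[b]) in adj for a, b in F_edges):
            count += 1
    return count

K2 = ([0, 1], [(0, 1)])
P3 = ([0, 1, 2], [(0, 1), (1, 2)])
K3 = ([0, 1, 2], [(0, 1), (1, 2), (0, 2)])
C6 = (list(range(6)), [(i, (i + 1) % 6) for i in range(6)])
two_K3 = (list(range(6)), [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)])

for name, F in [("K2", K2), ("P3", P3), ("K3", K3)]:
    print(name, hom_count(*F, *C6), hom_count(*F, *two_K3))
```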
A parameterized string (p-string) is a string over an alphabet $(\Sigma_{s} \cup \Sigma_{p})$, where $\Sigma_{s}$ and $\Sigma_{p}$ are disjoint alphabets for static symbols (s-symbols) and for parameter symbols (p-symbols), respectively. Two p-strings $x$ and $y$ are said to p-match (parameterized match) if and only if $x$ can be transformed into $y$ by applying a bijection on $\Sigma_{p}$ to every occurrence of p-symbols in $x$. The indexing problem for p-matching is to preprocess a p-string $T$ of length $n$ so that we can efficiently find the occurrences of substrings of $T$ that p-match a given pattern. Extending the Burrows-Wheeler Transform (BWT) based index for exact string pattern matching, Ganguly et al. [SODA 2017] proposed the first compact index (named pBWT) for p-matching, and posed the open problem of constructing it in compact space, i.e., in $O(n \lg |\Sigma_{s} \cup \Sigma_{p}|)$ bits of space. Hashimoto et al. [SPIRE 2022] partially solved this problem by showing how to construct some components of pBWTs for $T$ in $O(n \frac{|\Sigma_{p}| \lg n}{\lg \lg n})$ time in an online manner while reading the symbols of $T$ from right to left. In this paper, we improve the time complexity to $O(n \frac{\lg |\Sigma_{p}| \lg n}{\lg \lg n})$. We remark that removing the multiplicative factor of $|\Sigma_{p}|$ from the complexity is of great interest because this has not been achieved for over a decade in the construction of related data structures such as parameterized suffix arrays, even in the offline setting. We also show that our data structure can support backward search, a core procedure of BWT-based indexes, at any stage of the online construction, making it the first compact index for p-matching that can be constructed in compact space and even in an online manner.
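The p-match relation itself can be tested with the standard prev-encoding: each p-symbol is replaced by the distance to its previous occurrence (0 for a first occurrence), and s-symbols are kept unchanged. Two p-strings p-match exactly when their prev-encodings are equal. The sketch below is for illustration only; it is not the pBWT construction from the paper, and the function names are hypothetical.

```python
def prev_encode(s, p_symbols):
    """Prev-encoding: p-symbol -> distance to its previous occurrence (0 if none)."""
    last = {}
    out = []
    for i, ch in enumerate(s):
        if ch in p_symbols:
            out.append(i - last[ch] if ch in last else 0)
            last[ch] = i
        else:
            out.append(ch)   # s-symbols must match literally
    return out

def p_match(x, y, p_symbols):
    """True iff x maps to y under some bijection on the p-symbols."""
    return len(x) == len(y) and prev_encode(x, p_symbols) == prev_encode(y, p_symbols)

P = {"x", "y", "z"}           # p-symbols; every other character is an s-symbol
print(prev_encode("xaxby", P))        # [0, 'a', 2, 'b', 0]
print(p_match("xaxby", "zazbx", P))   # x->z, y->x is a bijection
```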
In the online facility assignment on a line $\mathrm{OFAL}(S,c)$ with a set $S$ of $k$ servers and a capacity $c:S\to\mathbb{N}$, each server $s\in S$ with capacity $c(s)$ is placed on a line, and requests arrive on the line one by one. The task of an online algorithm is to irrevocably match the current request with one of the servers with vacancies before the next request arrives. An algorithm can match up to $c(s)$ requests to a server $s\in S$. In this paper, we propose a new online algorithm PTCP (Policy Transition at Critical Point) for $\mathrm{OFAL}(S,c)$ and show that PTCP is $(2\alpha(S)+1)$-competitive, where $\alpha(S)$ is informally the ratio of the diameter of $S$ to the maximum distance between two adjacent servers in $S$. Depending on the layout of servers, $\alpha(S)$ ranges from a constant (independent of $k$) to $k-1$. Among all known algorithms for $\mathrm{OFAL}(S,c)$, this upper bound on the competitive ratio is the best when $\alpha(S)$ is small. We also show that the competitive ratio of any MPFS (Most Preferred Free Servers) algorithm is at least $2\alpha(S)+1$. Recall that, for $\mathrm{OFAL}(S,c)$, MPFS is a class of algorithms whose competitive ratio does not depend on the capacity $c$; it includes the natural greedy algorithm and PTCP. This implies that PTCP is optimal for $\mathrm{OFAL}(S,c)$ within the class MPFS.
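To make $\alpha(S)$ and the greedy baseline concrete, here is a small sketch. It computes $\alpha(S)$ from the informal definition above (diameter of $S$ divided by the largest gap between adjacent servers) and runs the natural greedy rule, which matches each request to the nearest server that still has capacity. This is not PTCP; breaking ties by lowest server index is an arbitrary assumption, and the function names are hypothetical.

```python
def alpha(servers):
    """Informal alpha(S): diameter of S over the largest gap between adjacent servers."""
    s = sorted(servers)
    if len(s) < 2:
        raise ValueError("need at least two servers")
    gaps = [b - a for a, b in zip(s, s[1:])]
    return (s[-1] - s[0]) / max(gaps)

def greedy_assign(servers, capacity, requests):
    """Match each request to the nearest server with a free slot; return (matches, cost)."""
    load = [0] * len(servers)
    assignment, total = [], 0.0
    for r in requests:
        free = [i for i in range(len(servers)) if load[i] < capacity[i]]
        if not free:
            raise ValueError("total capacity exceeded")
        i = min(free, key=lambda j: abs(servers[j] - r))   # ties: lowest index
        load[i] += 1
        total += abs(servers[i] - r)
        assignment.append(i)
    return assignment, total

print(alpha([0, 1, 2, 3]))      # evenly spaced k = 4 servers: alpha = k - 1 = 3.0
print(alpha([0, 1, 100]))       # one large gap: alpha stays close to 1
print(greedy_assign([0, 10], [2, 1], [6, 6, 6]))
```

The two `alpha` calls show the two extremes mentioned above: evenly spaced servers reach $k-1$, while a layout dominated by a single gap keeps $\alpha(S)$ near a constant.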
Let $\mathbf{B}_n =\frac{1}{n}(\mathbf{R}_n + \mathbf{T}^{1/2}_n \mathbf{X}_n)(\mathbf{R}_n + \mathbf{T}^{1/2}_n \mathbf{X}_n)^*$, where $\mathbf{X}_n$ is a $p \times n$ matrix with independent standardized random entries, $\mathbf{R}_n$ is a $p \times n$ non-random matrix representing the information, and $\mathbf{T}_n$ is a $p \times p$ non-random nonnegative definite Hermitian matrix. Under some conditions on $\mathbf{R}_n \mathbf{R}_n^*$ and $\mathbf{T}_n$, it has been proved that, for any closed interval outside the support of the limit spectral distribution, with probability one no eigenvalues fall in this interval for all $p$ sufficiently large. The purpose of this paper is to continue the study of the support of the limit spectral distribution, and we show that there is an exact separation phenomenon: with probability one, the proper number of eigenvalues lie on either side of these intervals.
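The separation phenomenon can be observed numerically by sampling the information-plus-noise matrix above and counting eigenvalues on either side of a gap in the limiting support. The sketch assumes real Gaussian entries for $\mathbf{X}_n$. In the usage example it takes $\mathbf{R}_n = 0$, which is the sample-covariance special case, and lets $\mathbf{T}_n$ have eigenvalues 1 and 25 in equal proportion. These choices, the threshold, and the function name are illustrative assumptions.

```python
import numpy as np

def info_plus_noise_eigs(R, T_sqrt, rng):
    """Eigenvalues of B = (1/n) (R + T^{1/2} X)(R + T^{1/2} X)^* with Gaussian X."""
    p, n = R.shape
    X = rng.standard_normal((p, n))     # independent standardized entries
    Y = R + T_sqrt @ X
    B = (Y @ Y.conj().T) / n            # p x p Hermitian, nonnegative definite
    return np.linalg.eigvalsh(B)

rng = np.random.default_rng(1)
p, n = 100, 2000
T_sqrt = np.diag(np.sqrt(np.r_[np.ones(p // 2), 25.0 * np.ones(p // 2)]))
eigs = info_plus_noise_eigs(np.zeros((p, n)), T_sqrt, rng)
print(np.sum(eigs < 4.0), np.sum(eigs > 4.0))
```

With $p/n = 0.05$, the spectrum splits into two clusters, one near 1 and one near 25. The threshold 4 lies in the gap between them, and one expects exactly 50 eigenvalues on each side, matching the 50 eigenvalues of $\mathbf{T}_n$ in each group.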
A closed quasigeodesic is a closed curve on the surface of a polyhedron with at most $180^\circ$ of surface on both sides at all points; such curves can be locally unfolded straight. In 1949, Pogorelov proved that every convex polyhedron has at least three (non-self-intersecting) closed quasigeodesics, but the proof relies on a nonconstructive topological argument. We present the first finite algorithm to find a closed quasigeodesic on a given convex polyhedron, which is the first positive progress on a 1990 open problem by O'Rourke and Wyman. The algorithm establishes for the first time a pseudopolynomial upper bound on the total number of visits to faces (number of line segments), namely, $O\left(\frac{n \, L^3}{\epsilon^2 \, \ell^3}\right)$, where $n$ is the number of vertices of the polyhedron, $\epsilon$ is the minimum curvature of a vertex, $L$ is the length of the longest edge, and $\ell$ is the smallest distance within a face between a vertex and a nonincident edge (minimum feature size of any face). On the real RAM, the algorithm's running time is also pseudopolynomial, namely $O\left(\frac{n \, L^3}{\epsilon^2 \, \ell^3} \log n\right)$. On a word RAM, the running time grows to $O\left(b^2 \cdot \frac{n^8 \log n}{\epsilon^8} \cdot \frac{L^{21}}{\ell^{21}}\cdot 2^{O(|\Lambda|)}\right)$, where $|\Lambda|$ is the number of distinct edge lengths in the polyhedron, assuming its intrinsic or extrinsic geometry is given by rational coordinates each with at most $b$ bits. This time bound remains pseudopolynomial for polyhedra with $O(\log n)$ distinct edge lengths, but is exponential in the worst case. Along the way, we introduce the expression RAM model of computation, formalizing a connection between the real RAM and word RAM hinted at by past work on exact geometric computation.
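The local condition in the definition can be checked at a single vertex. In this sketch the angles are in degrees and the function names are hypothetical. The curvature of a vertex is $360^\circ$ minus the sum of its incident face angles, and a curve passing through the vertex splits the remaining surface angle into two sides that must each be at most $180^\circ$. At a point with zero curvature, this forces both sides to be exactly $180^\circ$, so the curve is straight there.

```python
def vertex_curvature(face_angles_deg):
    """Curvature (angular defect) of a vertex: 360 minus the incident face angles."""
    return 360.0 - sum(face_angles_deg)

def min_vertex_curvature(vertices):
    """epsilon: the minimum curvature over all vertices (each given by its face angles)."""
    return min(vertex_curvature(v) for v in vertices)

def is_quasigeodesic_through(face_angles_deg, left_deg, tol=1e-9):
    """True iff a curve leaving left_deg of surface on one side has <= 180 on both sides."""
    total = sum(face_angles_deg)
    right = total - left_deg
    return 0.0 <= left_deg <= 180.0 + tol and 0.0 <= right <= 180.0 + tol

cube_corner = [90.0, 90.0, 90.0]                     # total surface angle 270
print(vertex_curvature(cube_corner))                 # 90.0
print(is_quasigeodesic_through(cube_corner, 135.0))  # 135 / 135 split: allowed
print(is_quasigeodesic_through(cube_corner, 60.0))   # 60 / 210 split: not allowed
```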
We revisit the main result of Carmosino et al. \cite{CILM18}, which shows that an $\Omega(n^{\omega/2+\epsilon})$ noncommutative arithmetic circuit size lower bound (where $\omega$ is the matrix multiplication exponent) for a constant-degree $n$-variate polynomial family $(g_n)_n$, where each $g_n$ is a noncommutative polynomial, can be ``lifted'' to an exponential circuit size lower bound for another polynomial family $(f_n)$ obtained from $(g_n)$ by a lifting process. In this paper, we present a simpler and more conceptual automata-theoretic proof of their result.
We study pseudo-polynomial time algorithms for the fundamental \emph{0-1 Knapsack} problem. In terms of $n$ and $w_{\max}$, previous algorithms for 0-1 Knapsack have cubic time complexities: $O(n^2w_{\max})$ (Bellman 1957), $O(nw_{\max}^2)$ (Kellerer and Pferschy 2004), and $O(n + w_{\max}^3)$ (Polak, Rohwedder, and W\k{e}grzycki 2021). On the other hand, fine-grained complexity only rules out $O((n+w_{\max})^{2-\delta})$ running time, and it is an important question in this area whether $\tilde O(n+w_{\max}^2)$ time is achievable. Our main result makes significant progress towards solving this question:
- The 0-1 Knapsack problem has a deterministic algorithm in $\tilde O(n + w_{\max}^{2.5})$ time.
Our techniques also apply to the easier \emph{Subset Sum} problem:
- The Subset Sum problem has a randomized algorithm in $\tilde O(n + w_{\max}^{1.5})$ time. This improves (and simplifies) the previous $\tilde O(n + w_{\max}^{5/3})$-time algorithm by Polak, Rohwedder, and W\k{e}grzycki (2021) (based on Galil and Margalit (1991), and Bringmann and Wellnitz (2021)).
Similar to recent works on Knapsack (and integer programs in general), our algorithms also utilize the \emph{proximity} between optimal integral solutions and fractional solutions. Our new ideas are as follows:
- Previous works used an $O(w_{\max})$ proximity bound in the $\ell_1$-norm. As our main conceptual contribution, we use an additive-combinatorial theorem by Erd\H{o}s and S\'{a}rk\"{o}zy (1990) to derive an $\ell_0$-proximity bound of $\tilde O(\sqrt{w_{\max}})$.
- Then, the main technical component of our Knapsack result is a dynamic programming algorithm that exploits both $\ell_0$- and $\ell_1$-proximity. It is based on a vast extension of the ``witness propagation'' method, originally designed by Deng, Mao, and Zhong (2023) for the easier \emph{unbounded} setting only.
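For reference, the classical baselines that these bounds improve on are easy to state in code. The sketch below is not the paper's algorithm. It shows Bellman's $O(nC)$ dynamic program for 0-1 Knapsack with capacity $C$, which gives $O(n^2 w_{\max})$ because one may assume $C \le n w_{\max}$, and the standard bitset dynamic program for Subset Sum, using Python integers as bitsets. The function names are hypothetical.

```python
def knapsack_01(items, capacity):
    """Bellman-style DP: best[c] = max value with total weight <= c."""
    best = [0] * (capacity + 1)
    for w, v in items:
        # Iterate capacities downwards so each item is used at most once.
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]

def subset_sum(weights, target):
    """Bit s of `reach` is set iff some subset of the weights sums to s."""
    reach = 1                              # the empty subset sums to 0
    mask = (1 << (target + 1)) - 1         # discard sums above the target
    for w in weights:
        reach = (reach | (reach << w)) & mask
    return bool((reach >> target) & 1)

print(knapsack_01([(1, 1), (3, 4), (4, 5), (5, 7)], 7))   # items (weight, value)
print(subset_sum([3, 5, 7], 12), subset_sum([3, 5, 7], 11))
```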
In multi-turn dialog, utterances do not always take the full form of sentences \cite{Carbonell1983DiscoursePA}, which naturally makes understanding the dialog context more difficult. However, it is essential to fully grasp the dialog context to generate a reasonable response. Hence, in this paper, we propose to improve response generation performance by examining the model's ability to answer a reading comprehension question that focuses on the information omitted in the dialog. Inspired by multi-task learning, we propose a joint framework that unifies these two tasks: they share the same encoder, which extracts common, task-invariant features, and use different decoders to learn task-specific features. To better fuse information from the question and the dialog history in the encoder, we propose to augment the Transformer architecture with a memory updater, which is designed to selectively store and update the dialog history information so as to support downstream tasks. For the experiments, we employ human annotators to write and examine a large-scale dialog reading comprehension dataset. Extensive experiments are conducted on this dataset, and the results show that the proposed model brings substantial improvements over several strong baselines on both tasks. In this way, we demonstrate that reasoning can indeed improve response generation and vice versa. We release our large-scale dataset for further research.
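The abstract does not specify the memory updater, so the following is only a generic gated-update sketch in NumPy and not the authors' design. It assumes the memory is a matrix of slots $M$ and that the new encoder states $H$ have the same shape. The update is $M' = G \odot M + (1 - G) \odot H$, with gate $G = \sigma([M; H] W_g + b_g)$. All names and the gating form are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_memory_update(memory, hidden, W_g, b_g):
    """Blend old memory slots with new states via a learned per-feature gate.

    memory, hidden: arrays of shape (slots, d); W_g: (2d, d); b_g: (d,).
    A gate near 1 keeps the stored memory; a gate near 0 overwrites it.
    """
    gate = sigmoid(np.concatenate([memory, hidden], axis=-1) @ W_g + b_g)
    return gate * memory + (1.0 - gate) * hidden

rng = np.random.default_rng(0)
d = 4
memory, hidden = rng.standard_normal((3, d)), rng.standard_normal((3, d))
W_g = 0.1 * rng.standard_normal((2 * d, d))
print(gated_memory_update(memory, hidden, W_g, np.zeros(d)).shape)   # (3, 4)
```

Because the output is an elementwise convex combination of the old memory and the new states, the gate decides per feature what is retained, which is one simple way to "selectively store and update" history.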
While existing work in robust deep learning has focused on small pixel-level $\ell_p$ norm-based perturbations, this may not account for perturbations encountered in many real-world settings. In many such cases, although test data might not be available, broad specifications of the types of perturbations (such as an unknown degree of rotation) may be known. We consider a setup where robustness is expected over an unseen test domain that is not i.i.d. but deviates from the training domain. While this deviation may not be known exactly, its broad characterization is specified a priori in terms of attributes. We propose an adversarial training approach that learns to generate new samples so as to maximize the classifier's exposure to the attribute space, without access to data from the test domain. Our adversarial training solves a min-max optimization problem: the inner maximization generates adversarial perturbations, and the outer minimization finds model parameters by optimizing the loss on the adversarial perturbations generated by the inner maximization. We demonstrate the applicability of our approach on three types of naturally occurring perturbations: object-related shifts, geometric transformations, and common image corruptions. Our approach enables deep neural networks to be robust against a wide range of naturally occurring perturbations. We demonstrate the usefulness of the proposed approach by showing the robustness gains of deep neural networks trained using our adversarial training on MNIST, CIFAR-10, and a new variant of the CLEVR dataset.
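The min-max structure can be illustrated on a toy problem. The sketch below is a simplified stand-in, not the paper's method, and it rests on several assumptions. The attribute is a rotation angle in $[-\theta_{\max}, \theta_{\max}]$. The inner maximization is a grid search over angles rather than a learned sample generator. The model is a linear logistic classifier trained by gradient descent on the worst-case rotated samples. All function names are hypothetical.

```python
import numpy as np

def rotate(X, angle):
    """Rotate 2-D points (rows of X) by `angle` radians about the origin."""
    c, s = np.cos(angle), np.sin(angle)
    return X @ np.array([[c, -s], [s, c]]).T

def logistic_loss(w, X, y):
    # mean log(1 + exp(-y <w, x>)), computed stably with logaddexp
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))

def logistic_grad(w, X, y):
    coef = -y * np.exp(-np.logaddexp(0.0, y * (X @ w)))   # -y / (1 + exp(margin))
    return (coef[:, None] * X).mean(axis=0)

def worst_case_loss(w, X, y, max_angle, n_grid=13):
    angles = np.linspace(-max_angle, max_angle, n_grid)
    return max(logistic_loss(w, rotate(X, a), y) for a in angles)

def adversarial_train(X, y, max_angle, steps=200, lr=0.5, n_grid=13):
    angles = np.linspace(-max_angle, max_angle, n_grid)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        # Inner maximization: the rotation (attribute value) with the largest loss.
        worst = max(angles, key=lambda a: logistic_loss(w, rotate(X, a), y))
        # Outer minimization: one gradient step on the worst-case samples.
        w -= lr * logistic_grad(w, rotate(X, worst), y)
    return w

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([3.0, 0.0], 0.5, (50, 2)), rng.normal([-3.0, 0.0], 0.5, (50, 2))])
y = np.r_[np.ones(50), -np.ones(50)]
w = adversarial_train(X, y, max_angle=np.pi / 6)
print(worst_case_loss(np.zeros(2), X, y, np.pi / 6), worst_case_loss(w, X, y, np.pi / 6))
```

The grid search keeps the inner problem exact for a one-dimensional attribute. For richer attribute spaces, the inner step would be replaced by a gradient-based or learned search, which is closer in spirit to generating new samples.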