The paper concerns the $d$-dimensional stochastic approximation recursion, $$ \theta_{n+1}= \theta_n + \alpha_{n + 1} f(\theta_n, \Phi_{n+1}) $$ where $ \{ \Phi_n \}$ is a stochastic process on a general state space, satisfying a conditional Markov property that allows for parameter-dependent noise. The main results are established under additional conditions on the mean flow and a version of the Donsker-Varadhan Lyapunov drift condition known as (DV3): (i) An appropriate Lyapunov function is constructed that implies convergence of the estimates in $L_4$. (ii) A functional central limit theorem (CLT) is established, as well as the usual one-dimensional CLT for the normalized error. Moment bounds combined with the CLT imply convergence of the normalized covariance $\textsf{E} [ z_n z_n^T ]$ to the asymptotic covariance appearing in the CLT, where $z_n := (\theta_n-\theta^*)/\sqrt{\alpha_n}$. (iii) The CLT holds for the normalized version $z^{\text{PR}}_n := \sqrt{n} [\theta^{\text{PR}}_n -\theta^*]$ of the averaged parameters $\theta^{\text{PR}}_n := n^{-1} \sum_{k=1}^n\theta_k$, subject to standard assumptions on the step-size. Moreover, the covariance in the CLT coincides with the minimal covariance of Polyak and Ruppert. (iv) An example is given in which $f$ and $\bar{f}$ are linear in $\theta$, and $\Phi$ is a geometrically ergodic Markov chain that does not satisfy (DV3). While the algorithm is convergent, the second moment of $\theta_n$ is unbounded and in fact diverges. {\bf This arXiv version 3 represents a major extension of the results in prior versions.} The main results now allow for parameter-dependent noise, as is often the case in applications to reinforcement learning.
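To make the recursion and the Polyak-Ruppert average concrete, here is a minimal Python sketch; the linear choice of $f$, the i.i.d. Gaussian stand-in for $\Phi_{n+1}$, and the step-size exponent are our own illustrative assumptions, not the paper's setting.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
d, n_steps = 2, 50_000
A = np.array([[1.0, 0.2], [0.0, 0.5]])   # toy mean flow: fbar(theta) = -A (theta - theta*)
theta_star = np.array([1.0, -1.0])

theta = np.zeros(d)
theta_sum = np.zeros(d)
for n in range(1, n_steps + 1):
    alpha = n ** -0.8                     # alpha_n = n^{-rho}, rho in (1/2, 1)
    noise = rng.normal(size=d)            # i.i.d. stand-in for the Markovian Phi_{n+1}
    theta = theta + alpha * (-A @ (theta - theta_star) + noise)
    theta_sum += theta

theta_pr = theta_sum / n_steps            # Polyak-Ruppert average theta^PR_n
print(theta_pr)                           # close to theta_star
\end{verbatim}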
In this note we show examples of total Boolean functions that depend on $n$ variables and have spectral sensitivity $\Theta(\sqrt{\log n})$, which is asymptotically minimal.
Let $G$ be a graph representing a network, with vertices $V(G)$ representing physical locations and edges $E(G)$ representing informational connectivity. A \emph{locating-dominating (LD)} set $S \subseteq V(G)$ is a subset of vertices representing detectors, each capable of sensing an ``intruder'' at its own location or somewhere in its open neighborhood; an LD set must be capable of locating an intruder anywhere in the graph. We explore three types of fault-tolerant LD sets: \emph{redundant LD} sets, which allow a detector to be removed, \emph{error-detecting LD} sets, which allow at most one false negative, and \emph{error-correcting LD} sets, which allow at most one error (false positive or negative). In particular, we determine lower and upper bounds for the minimum density of these three types of fault-tolerant locating-dominating sets in the \emph{infinite king grid}.
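As a point of reference, the following Python sketch checks the plain (non-fault-tolerant) LD condition on a finite graph: every vertex outside $S$ must be sensed by a nonempty, unique set of detectors. The adjacency-dictionary encoding is ours, purely for illustration.
\begin{verbatim}
def is_locating_dominating(adj, S):
    """adj: dict mapping each vertex to its open neighborhood (a set)."""
    S = set(S)
    seen = set()
    for v in adj:
        if v in S:
            continue
        code = frozenset(adj[v] & S)   # detectors that sense an intruder at v
        if not code or code in seen:   # empty or duplicated code: v not locatable
            return False
        seen.add(code)
    return True

# 4-cycle 0-1-2-3: vertices 2 and 3 get distinct nonempty codes {1} and {0}
adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(is_locating_dominating(adj, {0, 1}))  # True
\end{verbatim}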
In this paper, we introduce and study the following question. Let $\mathcal G$ be a family of graphs and let $k\geq 3$ be an integer. What is the largest value $f_k(n)$ such that every $n$-vertex graph in $\mathcal G$ has an induced subgraph with degree at most $k$ and with $f_k(n)$ vertices? Similar questions, in which one seeks a large induced forest, a large induced linear forest, or a large induced $d$-degenerate graph, rather than a large induced graph of bounded degree, have been studied for decades and have given rise to some of the most fascinating and elusive conjectures in Graph Theory. We tackle our problem when $\mathcal G$ is the class of outerplanar graphs, the class of planar graphs, or the class of graphs whose degree is bounded by a value $d>k$. In all cases, we provide upper and lower bounds on the value of $f_k(n)$. For example, we prove that every $n$-vertex planar graph has an induced subgraph with degree at most $3$ and with at least $\frac{5n}{13}>0.384n$ vertices, and that there exist $n$-vertex planar graphs whose largest induced subgraph with degree at most $3$ has $\frac{4n}{7}+O(1)<0.572n+O(1)$ vertices.
We propose a method for estimating a log-concave density on $\mathbb R^d$ from samples, under the assumption that there exists an orthogonal transformation that makes the components of the random vector independent. While log-concave density estimation is hard both computationally and statistically, the independent components assumption alleviates both issues, while still maintaining a large non-parametric class. We prove that under mild conditions, at most $\tilde{\mathcal{O}}(\epsilon^{-4})$ samples (suppressing constants and log factors) suffice for our proposed estimator to be within $\epsilon$ of the original density in squared Hellinger distance. On the computational front, while the usual log-concave maximum likelihood estimate can be obtained via a finite-dimensional convex program, it is slow to compute -- especially in higher dimensions. We demonstrate through numerical experiments that our estimator can be computed efficiently, making it more practical to use.
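A minimal sketch of the independent-components pipeline: estimate a transform that (approximately) separates the components, then fit a univariate density to each one. Here scikit-learn's FastICA stands in for the orthogonal-transform estimate and a Gaussian fit is a placeholder for the univariate log-concave MLE; both substitutions are ours, not the paper's estimator.
\begin{verbatim}
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
# toy data: independent Laplace components mixed by a random rotation
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
X = rng.laplace(size=(5000, 3)) @ Q.T

# Step 1: estimate a transform that makes the components ~independent
Z = FastICA(whiten="unit-variance", random_state=0).fit_transform(X)

# Step 2: fit each component separately; a Gaussian fit is used here as a
# placeholder for the univariate log-concave MLE
params = [(z.mean(), z.std()) for z in Z.T]
print(params)
\end{verbatim}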
Given a finite set satisfying condition $\mathcal{A}$, the subset selection problem asks, how large of a subset satisfying condition $\mathcal{B}$ can we find? We make progress on three instances of subset selection problems in planar point sets. Let $n,s\in\mathbb{N}$ with $n\geq s$, and let $P\subseteq\mathbb{R}^2$ be a set of $n$ points, where at most $s$ points lie on the same line. Firstly, we select a general position subset of $P$, i.e., a subset containing no $3$ points on the same line. This problem was proposed by Erd\H{o}s under the regime when $s$ is a constant. For $s$ being non-constant, we give new lower and upper bounds on the maximum size of such a subset. In particular, we show that in the worst case such a set can have size at most $O(n/s)$ when $n^{1/3}\leq s\leq n$ and $O(n^{5/6+o(1)}/\sqrt{s})$ when $3\leq s\leq n^{1/3}$. Secondly, we select a monotone general position subset of $P$, that is, a subset in general position where the points are ordered from left to right and their $y$-coordinates are either non-decreasing or non-increasing. We present bounds on the maximum size of such a subset. In particular, when $s=\Theta(\sqrt{n})$, our upper and lower bounds differ only by a logarithmic factor. Lastly, we select a subset of $P$ with pairwise distinct slopes. This problem was initially studied by Erd\H{o}s, Graham, Ruzsa, and Taylor on the grid. We show that for $s=O(\sqrt{n})$ such a subset of size $\Omega((n/\log{s})^{1/3})$ can always be found in $P$. When $s=\Theta(\sqrt{n})$, this matches a lower bound given by Zhang on the grid. As for the upper bound, we show that in the worst case such a subset has size at most $O(\sqrt{n})$ for $2\leq s\leq n^{3/8}$ and $O((n/s)^{4/5})$ for $n^{3/8}\leq s=O(\sqrt{n})$. The proofs use a wide range of tools such as incidence geometry, probabilistic methods, the hypergraph container method, and additive combinatorics.
We study the discrete quantum walk on a regular graph $X$ that assigns negative identity coins to marked vertices $S$ and Grover coins to the unmarked ones. We find combinatorial bases for the eigenspaces of the transition matrix, and derive a formula for the average vertex mixing matrix $\AMM$. We then find bounds for the entries of $\AMM$, and study when these bounds are tight. In particular, the average probabilities between marked vertices are lower bounded by a matrix determined by the induced subgraph $X[S]$, the vertex-deleted subgraph $X\backslash S$, and the edge-deleted subgraph $X-E(S)$. We show that this bound is achieved if and only if the marked vertices have walk-equitable neighborhoods in the vertex-deleted subgraph. Finally, for quantum walks attaining this bound, we determine when $\AMM[S,S]$ is symmetric, positive semidefinite, or uniform.
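For concreteness, the following Python sketch builds the arc-based walk operator $U = SC$ with Grover coins at unmarked vertices and $-I$ coins at marked ones; the encoding and helper names are ours, and the computation of the average mixing matrix is omitted.
\begin{verbatim}
import numpy as np

def walk_matrix(adj, marked):
    arcs = [(u, v) for u in adj for v in adj[u]]
    idx = {a: i for i, a in enumerate(arcs)}
    m = len(arcs)
    C = np.zeros((m, m))                       # block-diagonal coin operator
    for u in adj:
        out = [idx[(u, v)] for v in adj[u]]
        deg = len(out)
        coin = (-np.eye(deg) if u in marked    # -I at marked vertices
                else 2.0 / deg * np.ones((deg, deg)) - np.eye(deg))  # Grover coin
        C[np.ix_(out, out)] = coin
    S = np.zeros((m, m))                       # flip-flop shift: (u,v) -> (v,u)
    for (u, v), i in idx.items():
        S[idx[(v, u)], i] = 1.0
    return S @ C

adj = {u: [v for v in range(4) if v != u] for u in range(4)}  # K_4
U = walk_matrix(adj, marked={0})
print(np.allclose(U @ U.T, np.eye(len(U))))    # U is real unitary: True
\end{verbatim}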
We consider temporal numeric planning problems $\Pi$ expressed in PDDL2.1 level 3, and show how to produce SMT formulas $(i)$ whose models correspond to valid plans of $\Pi$, and $(ii)$ that extend the recently proposed planning with patterns approach from the numeric to the temporal case. We prove the correctness and completeness of the approach and show that it performs very well on 10 domains with required concurrency.
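As a toy illustration of the flavor of such encodings (not the paper's pattern-based encoding), a single durative numeric action can be expressed in z3's Python API as follows; all names and numbers are invented for the example.
\begin{verbatim}
from z3 import Real, Solver

fuel0, d = Real("fuel0"), Real("d")   # initial fuel level and action duration
s = Solver()
s.add(fuel0 == 3)                     # initial state
s.add(d >= 0, d <= 8)                 # PDDL2.1-style duration constraint
s.add(fuel0 + d >= 10)                # rate-1 continuous effect meets the goal
print(s.check())                      # sat: a valid plan exists
print(s.model())                      # e.g. a model with d = 7
\end{verbatim}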
We present {\em generative clustering} (GC) for clustering a set of documents, $\mathrm{X}$, by using texts $\mathrm{Y}$ generated by large language models (LLMs) instead of the original documents $\mathrm{X}$ themselves. Because LLMs provide probability distributions, the similarity between two documents can be rigorously defined in an information-theoretic manner via the KL divergence. We also propose a natural, novel clustering algorithm based on importance sampling. We show that GC achieves state-of-the-art performance, outperforming previous clustering methods, often by a large margin. Furthermore, we show an application to generative document retrieval, in which documents are indexed via hierarchical clustering and our method improves retrieval accuracy.
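As a simplified illustration, the KL similarity between two documents can be estimated by Monte Carlo from texts sampled from one LLM-induced distribution; the functions logp_i and logp_j below are hypothetical callables returning the LLM log-probability of a generated text, and the paper's importance-sampling scheme is more involved than this plain estimate.
\begin{verbatim}
import numpy as np

def kl_estimate(logp_i, logp_j, samples_from_i):
    """Monte Carlo estimate of KL(p_i || p_j) = E_{y ~ p_i}[log p_i(y) - log p_j(y)].

    logp_i, logp_j: hypothetical LLM log-probability functions for a text y,
    e.g. sums of token log-probabilities under each document's distribution.
    """
    return float(np.mean([logp_i(y) - logp_j(y) for y in samples_from_i]))
\end{verbatim}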
This paper presents an algorithmic method for generating random matrices \(A\) that satisfy the property \(A^t S A = S\), where \(S\) is a fixed real invertible symmetric or skew-symmetric matrix. The method is significant in that it generalizes the procedure for generating orthogonal matrices (the case \(S = I\)) to matrices preserving a general fixed symmetric or skew-symmetric bilinear form. Such matrices include those belonging to the symplectic group, the Lorentz group, the Poincar\'e group, and, more generally, the indefinite orthogonal group, to name a few. These classes of matrices play crucial roles in diverse fields such as theoretical physics, where they describe symmetries and conservation laws, as well as in computational geometry, numerical analysis, and number theory, where they are integral to the study of quadratic forms and modular forms. The implementation of our algorithms can be accomplished using standard linear algebra libraries.
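One standard way to produce such matrices, sketched below under our own assumptions (it samples only from the identity component and is not necessarily the paper's algorithm), is to exponentiate an element of the Lie algebra $\{X : X^t S + S X = 0\}$: taking $X = S^{-1}W$ with $W$ skew-symmetric when $S$ is symmetric, and $W$ symmetric when $S$ is skew-symmetric, guarantees that $e^X$ preserves $S$.
\begin{verbatim}
import numpy as np
from scipy.linalg import expm

def random_S_orthogonal(S, scale=1.0, rng=None):
    """Random A with A^T S A = S, for invertible symmetric or skew-symmetric S."""
    rng = rng or np.random.default_rng()
    n = S.shape[0]
    B = scale * rng.normal(size=(n, n))
    W = B - B.T if np.allclose(S, S.T) else B + B.T   # skew or symmetric part
    return expm(np.linalg.solve(S, W))                # A = exp(S^{-1} W)

S = np.diag([1.0, 1.0, 1.0, -1.0])          # Minkowski form: Lorentz group
A = random_S_orthogonal(S, scale=0.3)
print(np.allclose(A.T @ S @ A, S))          # True
\end{verbatim}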
A matrix $\Phi \in \mathbb{R}^{Q \times N}$ satisfies the restricted isometry property (RIP) if $\|\Phi x\|_2^2$ is approximately equal to $\|x\|_2^2$ for all $k$-sparse vectors $x$. We give a construction of RIP matrices with the optimal $Q = O(k \log(N/k))$ rows using $O(k\log(N/k)\log(k))$ bits of randomness. The main technical ingredient is an extension of the Hanson-Wright inequality to $\epsilon$-biased distributions.
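For contrast with the derandomized construction, the following sketch draws a fully random sign matrix with $Q = O(k\log(N/k))$ rows (using far more randomness than the $O(k\log(N/k)\log k)$ bits above) and empirically checks the near-isometry on random $k$-sparse unit vectors; passing this check is only a sanity test, not a certificate of RIP.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
N, k = 1024, 10
Q = 4 * int(k * np.log(N / k))              # Q = O(k log(N/k)) rows
Phi = rng.choice([-1.0, 1.0], size=(Q, N)) / np.sqrt(Q)

ratios = []
for _ in range(1000):
    x = np.zeros(N)
    x[rng.choice(N, size=k, replace=False)] = rng.normal(size=k)
    x /= np.linalg.norm(x)                  # random k-sparse unit vector
    ratios.append(np.linalg.norm(Phi @ x) ** 2)
print(min(ratios), max(ratios))             # both close to 1
\end{verbatim}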