We show that the minimax sample complexity for estimating the pseudo-spectral gap $\gamma_{\mathsf{ps}}$ of an ergodic Markov chain to within a constant multiplicative error is of the order of $$\tilde{\Theta}\left( \frac{1}{\gamma_{\mathsf{ps}} \pi_{\star}} \right),$$ where $\pi_\star$ is the minimum stationary probability. This recovers the known bound for estimating the absolute spectral gap in the reversible setting [Hsu et al., 2019] and resolves an open problem of Wolfer and Kontorovich [2019]. Furthermore, we strengthen the known empirical procedure by making it fully adaptive to the data, tightening the confidence intervals, and reducing the computational complexity. Along the way, we derive new properties of the pseudo-spectral gap and introduce the notion of a reversible dilation of a stochastic matrix.
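The following is a minimal worked example. It assumes the standard definition of the pseudo-spectral gap, which uses the time reversal $P^{*}(x,y) = \pi(y)P(y,x)/\pi(x)$ and the spectral gap $\gamma(\cdot)$ of a reversible matrix. It evaluates the quantities in the bound for a two-state chain with switching probabilities $a, b \in (0,1]$, not both equal to 1. The parametrization is ours and is chosen only for illustration.

```latex
\gamma_{\mathsf{ps}} = \max_{k \geq 1} \frac{\gamma\big((P^{*})^{k} P^{k}\big)}{k},
\qquad
P = \begin{pmatrix} 1-a & a \\ b & 1-b \end{pmatrix},
\qquad
\pi = \left(\frac{b}{a+b}, \frac{a}{a+b}\right).

% P is reversible, so P^{*} = P and (P^{*})^{k} P^{k} = P^{2k}, with eigenvalues 1 and \lambda^{2k}, \lambda = 1-a-b.
% Since 1 - x^{k} \le k(1 - x) for x = \lambda^{2} \in [0,1], the maximum is attained at k = 1:
\gamma_{\mathsf{ps}} = \max_{k \geq 1} \frac{1-\lambda^{2k}}{k} = 1-(1-a-b)^{2},
\qquad
\pi_{\star} = \frac{\min(a,b)}{a+b}.
```

For this chain, the sample complexity above therefore reads $\tilde{\Theta}\big((a+b) / ((1-(1-a-b)^{2})\min(a,b))\big)$.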
Distributed maximization of a submodular function in the MapReduce (MR) model has received much attention, culminating in two frameworks that allow a centralized algorithm to be run in the MR setting without loss of approximation, provided that the centralized algorithm satisfies a certain consistency property, which had previously been shown to hold only for the standard greedy and continuous greedy algorithms. A separate line of work has studied the parallelizability of submodular maximization in the adaptive complexity model, where each thread may have access to the entire ground set. For size-constrained maximization of a monotone submodular function, we show that several sublinearly adaptive algorithms satisfy the consistency property required to work in the MR setting, which yields highly practical parallelizable and distributed algorithms. We also develop the first linear-time distributed algorithm for this problem with a constant number of MR rounds. Finally, we provide a method to increase the maximum cardinality constraint for MR algorithms at the cost of additional MR rounds.
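To make the MR setting concrete, the sketch below runs the standard greedy algorithm inside a generic two-round partition-and-merge scheme on a toy coverage objective. The standard greedy algorithm is one of the consistent algorithms named above. The scheme, the coverage objective, and all names (`greedy`, `two_round_distributed`, `machines`, `seed`) are illustrative assumptions. The sketch does not implement either of the two frameworks or the sublinearly adaptive algorithms of this work.

```python
import random

def coverage(sets, chosen):
    """Monotone submodular objective: number of items covered by the chosen sets."""
    covered = set()
    for e in chosen:
        covered |= sets[e]
    return len(covered)

def greedy(sets, candidates, k):
    """Standard greedy: add the element with the largest marginal gain, k times."""
    chosen = []
    for _ in range(k):
        remaining = [e for e in candidates if e not in chosen]
        if not remaining:
            break
        base = coverage(sets, chosen)
        best = max(remaining, key=lambda e: coverage(sets, chosen + [e]) - base)
        chosen.append(best)
    return chosen

def two_round_distributed(sets, k, machines, seed=0):
    """Hypothetical two-round scheme: random partition, local greedy, greedy on the pooled results."""
    rng = random.Random(seed)
    parts = [[] for _ in range(machines)]
    for e in sorted(sets):
        parts[rng.randrange(machines)].append(e)   # round 1: each machine sees only its part
    local = [greedy(sets, part, k) for part in parts]
    pooled = [e for sol in local for e in sol]      # round 2: one machine merges the solutions
    merged = greedy(sets, pooled, k)
    return max(local + [merged], key=lambda s: coverage(sets, s))
```

With `machines=1` the scheme reduces to centralized greedy. With more machines, no single machine ever holds the whole ground set during the first round.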
We consider the problem of identifying, from statistics, a distribution of discrete random variables $X_1,\ldots,X_n$ that is a mixture of $k$ product distributions. The best previous sample complexity for $n \in O(k)$ was $(1/\zeta)^{O(k^2 \log k)}$ (under a mild separation assumption parameterized by $\zeta$), and the best known lower bound was $\exp(\Omega(k))$. It is known that $n\geq 2k-1$ is necessary and sufficient for identification. We show, for any $n\geq 2k-1$, how to achieve sample complexity and run-time complexity $(1/\zeta)^{O(k)}$. We also extend the known lower bound of $\exp(\Omega(k))$ to match our upper bound across a broad range of $\zeta$. Our results are obtained by combining (a) a classic method for robust tensor decomposition and (b) a novel way of bounding the condition number of key matrices, called Hadamard extensions, by studying their action only on flattened rank-1 tensors.
We define the error distribution as the limiting distribution of the error between the solution $Y$ of a given stochastic differential equation (SDE) and its numerical approximation $\hat{Y}^{(m)}$, weighted by the rate of convergence between the two. A goal in studying the error distribution is to provide a general method for determining it for any SDE and any numerical scheme that converges to the exact solution. By dividing the error into a main term and a remainder term in a particular way, we show that the remainder term is negligible compared to the main term under suitable conditions. Under these conditions, deriving the error distribution reduces to deriving the limiting distribution of the main term. Even in dimension one, open problems remain concerning the asymptotic behavior of the error when the SDE has a drift term and $0<H\leq 1/3$, but our result in the one-dimensional case can be adapted to any Hurst exponent $H$. The main idea of the proof is to define a stochastic process $Y^{m, \rho}$, with a parameter $\rho$ interpolating between $Y$ and $\hat{Y}^{(m)}$, and to estimate its asymptotic expansion. Using this estimate, we determine the error distribution of the ($k$)-Milstein scheme and of the Crank-Nicolson scheme in previously unsolved cases.
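The reduction from the error to its main term follows from a standard Slutsky-type argument, sketched below. The notation is hypothetical and is not taken from the work itself: $r_m$ is the convergence rate, $\mathcal{M}^{(m)}$ and $\mathcal{R}^{(m)}$ are the main and remainder terms, and $U$ is the limit. The last line additionally assumes that $\rho \mapsto Y^{m,\rho}$ is differentiable with $Y^{m,0} = Y$ and $Y^{m,1} = \hat{Y}^{(m)}$.

```latex
Y - \hat{Y}^{(m)} = \mathcal{M}^{(m)} + \mathcal{R}^{(m)},
\qquad
r_m \mathcal{R}^{(m)} \xrightarrow{\;\mathbb{P}\;} 0,
\qquad
r_m \mathcal{M}^{(m)} \xrightarrow{\;\mathcal{L}\;} U
\;\Longrightarrow\;
r_m \big(Y - \hat{Y}^{(m)}\big) \xrightarrow{\;\mathcal{L}\;} U .

% Interpolation (assumed endpoints Y^{m,0} = Y and Y^{m,1} = \hat{Y}^{(m)}):
Y - \hat{Y}^{(m)} = -\int_0^1 \partial_\rho Y^{m,\rho} \, d\rho .
```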
We address the problem of enumerating all maximal clique-partitions of an undirected graph and present an algorithm based on the observation that every maximal clique-partition can be produced from the maximal clique-cover of the graph by assigning each vertex shared among several maximal cliques to exactly one of them. This simple algorithm has the following drawbacks: (1) the search space is very large; (2) it finds some clique-partitions that are not maximal; and (3) some clique-partitions are found more than once. We propose two criteria to avoid these drawbacks. The outcome is an algorithm that explores a much smaller search space and guarantees that every maximal clique-partition is computed only once. The algorithm can be used in problems such as anti-unification with proximity relations, or in resource allocation tasks where several alternative ways to allocate resources are sought.
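The simple algorithm described above can be sketched in Python as follows. The sketch enumerates the maximal cliques using Bron-Kerbosch, which is our choice of method. It then assigns each vertex, in every possible way, to one maximal clique that contains it, and collects the resulting partitions in a set. This brute-force version exhibits drawbacks (1)-(3) and does not implement the two proposed criteria. The helper `is_maximal` is hypothetical and rests on our assumed reading of maximality: a partition is maximal when no two of its blocks can be merged into a clique.

```python
from itertools import combinations, product

def maximal_cliques(adj):
    """Bron-Kerbosch without pivoting; adj maps each vertex to its neighbour set."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques

def is_clique(vertices, adj):
    return all(b in adj[a] for a, b in combinations(vertices, 2))

def naive_clique_partitions(adj):
    """Assign every vertex to one maximal clique containing it, in all possible ways."""
    cliques = maximal_cliques(adj)
    vertices = sorted(adj)
    options = [[i for i, c in enumerate(cliques) if v in c] for v in vertices]
    found = set()  # the set silently absorbs partitions produced more than once (drawback 3)
    for assignment in product(*options):   # the full search space (drawback 1)
        blocks = {}
        for v, i in zip(vertices, assignment):
            blocks.setdefault(i, set()).add(v)
        found.add(frozenset(frozenset(b) for b in blocks.values()))
    return found

def is_maximal(partition, adj):
    """Hypothetical maximality test: no two blocks can be merged into a clique."""
    return not any(is_clique(a | b, adj) for a, b in combinations(partition, 2))
```

On the path 1-2-3-4, the sketch also returns the partition {{1}, {2}, {3, 4}}, which is not maximal because {1} and {2} can be merged. This illustrates drawback (2).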
We explore the space of matrix-generated (0, m, 2)-nets and (0, 2)-sequences in base 2, also known as digital dyadic nets and sequences. In computer graphics, they are arguably the leading choice for sampling in rendering. We provide a complete characterization of the design space and count the number of possible constructions, with and without considering reorderings of the point set. Based on this analysis, we show that every digital dyadic net can be reordered into a sequence, and we give a corresponding algorithm. Finally, we present a novel family of self-similar digital dyadic sequences, named $\xi$-sequences, that spans a subspace with fewer degrees of freedom. These $\xi$-sequences are extremely efficient to sample and compute, and we demonstrate their advantages over the classic Sobol (0, 2)-sequence.
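To illustrate what "matrix-generated" means, the sketch below builds the first $2^m$ points of the two-dimensional Sobol (0, 2)-sequence from its two generator matrices over $\mathbb{F}_2$. These are the identity matrix, which gives the van der Corput sequence, and the Pascal matrix mod 2. The sketch then checks the (0, m, 2)-net property: every elementary interval of size $2^{-a} \times 2^{-b}$ with $a + b = m$ must contain exactly one point. Coordinates are kept as integers in $[0, 2^m)$ to avoid floating-point issues. The function names are our own.

```python
from math import comb

def generator_columns(entry, m):
    """Pack column j of an m x m matrix over F_2 into an int; row k maps to bit m-1-k."""
    cols = []
    for j in range(m):
        col = 0
        for k in range(m):
            if entry(k, j) % 2:
                col |= 1 << (m - 1 - k)
        cols.append(col)
    return cols

def digital_net(columns, m):
    """Coordinate of point i = C * (binary digits of i) over F_2, as an integer in [0, 2^m)."""
    points = []
    for i in range(1 << m):
        x = 0
        for j in range(m):
            if (i >> j) & 1:       # digit j of the index selects column j
                x ^= columns[j]    # addition over F_2 is XOR
        points.append(x)
    return points

def sobol_2d(m):
    identity = generator_columns(lambda k, j: int(k == j), m)
    pascal = generator_columns(lambda k, j: comb(j, k), m)  # comb(j, k) = 0 for k > j
    return list(zip(digital_net(identity, m), digital_net(pascal, m)))

def is_0m2_net(points, m):
    """True if every 2^a x 2^b grid cell with a + b = m holds exactly one point."""
    for a in range(m + 1):
        b = m - a
        cells = {(x >> (m - a), y >> (m - b)) for x, y in points}
        if len(cells) != 1 << m:
            return False
    return True
```

To obtain points in $[0,1)^2$, divide both coordinates by `2 ** m`. Replacing the Pascal matrix with a second identity matrix puts every point on the diagonal, so the net property fails for $m \geq 2$.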
Join-preserving maps on the discrete time scale $\omega^+$, referred to as time warps, have been proposed as graded modalities that can be used to quantify the growth of information in the course of program execution. The set of time warps forms a simple distributive involutive residuated lattice, called the time warp algebra, which is equipped with residual operations relevant to potential applications. In this paper, we show that although the time warp algebra generates a variety that lacks the finite model property, it nevertheless has a decidable equational theory. We also describe an implementation, written in OCaml, of a procedure for deciding equations in this algebra that makes use of the Z3 theorem prover.
We propose an original approach to investigating the linearity of Gray codes obtained from $\mathbb{Z}_{2^L}$-additive codes by introducing two related binary codes: the associated code and the concatenated code. Once these are defined, the linearity of the corresponding Gray code can be determined by a straightforward analysis of the Schur product of their codewords. This work expands on earlier contributions in the literature, where linearity was established with respect to the kernel of a code and/or operations on $\mathbb{Z}_{2^L}$. The $\mathbb{Z}_{2^L}$-additive codes to which we apply the Gray map and whose linearity we check are the well-known Hadamard, simplex, MacDonald, Kerdock, and Preparata codes. We also present a family of Reed-Muller codes that yield linear Gray codes, and we perform a computational verification of our proposed method on other $\mathbb{Z}_{2^L}$-additive codes.
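For $L = 2$, the Gray map sends $0, 1, 2, 3 \in \mathbb{Z}_4$ to $00, 01, 11, 10$. The brute-force sketch below tests whether the Gray image of a small $\mathbb{Z}_4$-linear code is linear, that is, closed under XOR. It compares the result with the classical $\mathbb{Z}_4$ criterion: the image is linear if and only if $2(u \ast v) \in C$ for all $u, v \in C$, where $\ast$ is the componentwise (Schur) product. The sketch covers only the $\mathbb{Z}_4$ case and does not implement the associated and concatenated codes proposed here. The helper names and example generators are hypothetical.

```python
from itertools import product

# Quaternary Gray map (the L = 2 case).
GRAY = {0: (0, 0), 1: (0, 1), 2: (1, 1), 3: (1, 0)}

def z4_span(generators):
    """All Z4-linear combinations of the generator rows."""
    n = len(generators[0])
    code = set()
    for coeffs in product(range(4), repeat=len(generators)):
        code.add(tuple(sum(c * g[i] for c, g in zip(coeffs, generators)) % 4
                       for i in range(n)))
    return code

def gray_image(code):
    """Apply the Gray map coordinate-wise, doubling the length."""
    return {tuple(bit for x in word for bit in GRAY[x]) for word in code}

def is_binary_linear(binary_code):
    """A nonempty finite binary code is linear iff it is closed under XOR."""
    return all(tuple(a ^ b for a, b in zip(u, v)) in binary_code
               for u in binary_code for v in binary_code)

def z4_schur_criterion(code):
    """Z4 criterion: the Gray image is linear iff 2 (u * v) lies in C for all u, v in C."""
    return all(tuple((2 * a * b) % 4 for a, b in zip(u, v)) in code
               for u in code for v in code)
```

For example, the code generated by $(1,1)$ has a linear Gray image. The code generated by $(1,1,0)$ and $(0,1,1)$ does not, because $2\,((1,1,0) \ast (0,1,1)) = (0,2,0)$ is not a codeword.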
We study the existence of optimal and p-optimal proof systems for classes in the Boolean hierarchy over $\mathrm{NP}$. Our main results concern $\mathrm{DP}$, i.e., the second level of this hierarchy: if all sets in $\mathrm{DP}$ have p-optimal proof systems, then all sets in $\mathrm{coDP}$ have p-optimal proof systems. The analogous implication for optimal proof systems fails relative to an oracle. As a consequence, we clarify such implications for all classes $\mathcal{C}$ and $\mathcal{D}$ in the Boolean hierarchy over $\mathrm{NP}$: we either prove the implication or show that it fails relative to an oracle. Furthermore, we show that the sets $\mathrm{SAT}$ and $\mathrm{TAUT}$ have p-optimal proof systems if and only if all sets in the Boolean hierarchy over $\mathrm{NP}$ have p-optimal proof systems; this is a new characterization of a conjecture studied by Pudl\'ak.
We consider the problem of discovering $K$ related Gaussian directed acyclic graphs (DAGs) whose structures share a consistent causal order and sparse unions of supports. In the multi-task learning setting, we propose an $\ell_1/\ell_2$-regularized maximum likelihood estimator (MLE) for learning $K$ linear structural equation models. We show theoretically that the joint estimator, by leveraging data across related tasks, can achieve a better sample complexity for recovering the causal order (or topological order) than separate estimation. Moreover, the joint estimator is able to recover non-identifiable DAGs by estimating them together with some identifiable DAGs. Our analysis also establishes the consistency of union support recovery. To allow a practical implementation, we design a continuous optimization problem whose optimizer coincides with the joint estimator and can be approximated efficiently by an iterative algorithm. We validate the theoretical analysis and the effectiveness of the joint estimator in experiments.
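The sketch below shows one way the $\ell_1/\ell_2$ penalty and its proximal operator (group soft-thresholding) could look. The proximal operator is a typical building block of iterative solvers. The sketch assumes that the penalty groups the coefficient of each edge $(i, j)$ across the $K$ tasks, which is one natural way to encourage sparse unions of supports. Weights are stored as `B[k][i][j]`. The exact objective and iterative algorithm of this work may differ, and the function names are hypothetical.

```python
import math

def group_norms(B):
    """Euclidean norm of each edge's K coefficients: ||(B[0][i][j], ..., B[K-1][i][j])||_2."""
    K, p = len(B), len(B[0])
    return [[math.sqrt(sum(B[k][i][j] ** 2 for k in range(K))) for j in range(p)]
            for i in range(p)]

def l1_l2_penalty(B):
    """Assumed penalty: sum over edges (i, j) of the group norm (l1 across edges, l2 across tasks)."""
    return sum(sum(row) for row in group_norms(B))

def prox_l1_l2(B, t):
    """Proximal operator of t * l1_l2_penalty: shrink each group, zeroing it if its norm <= t."""
    norms = group_norms(B)
    K, p = len(B), len(B[0])
    out = [[[0.0] * p for _ in range(p)] for _ in range(K)]
    for i in range(p):
        for j in range(p):
            n = norms[i][j]
            scale = max(0.0, 1.0 - t / n) if n > 0 else 0.0
            for k in range(K):
                out[k][i][j] = scale * B[k][i][j]
    return out
```

Because an edge's coefficients are shrunk jointly, the edge is either removed from all $K$ graphs at once or kept in the union support. This coupling is how data from all tasks can inform a shared support.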
In multi-turn dialog, utterances do not always take the full form of sentences \cite{Carbonell1983DiscoursePA}, which naturally makes understanding the dialog context more difficult. However, fully grasping the dialog context is essential for generating a reasonable response. Hence, in this paper, we propose to improve response generation performance by examining the model's ability to answer a reading comprehension question that focuses on the information omitted in the dialog. Inspired by the multi-task learning scheme, we propose a joint framework that unifies these two tasks: a shared encoder extracts common, task-invariant features, while separate decoders learn task-specific features. To better fuse information from the question and the dialog history in the encoder, we augment the Transformer architecture with a memory updater, which is designed to selectively store and update the dialog history so as to support downstream tasks. For our experiments, we employ human annotators to write and examine a large-scale dialog reading comprehension dataset. Extensive experiments on this dataset show that the proposed model brings substantial improvements over several strong baselines on both tasks. In this way, we demonstrate that reasoning can indeed help generate better responses, and vice versa. We release our large-scale dataset for further research.