We study variants of the mean problem under the $p$-Dynamic Time Warping ($p$-DTW) distance, a popular and robust distance measure for sequential data. In our setting we are given a set of finite point sequences over an arbitrary metric space, and we want to compute a mean point sequence of given length that minimizes the sum of $p$-DTW distances, each raised to the $q$th power, between the input sequences and the mean sequence. In general, the problem is $\mathrm{NP}$-hard and known not to be fixed-parameter tractable when parameterized by the number of input sequences. We show that it is even hard to approximate within any constant factor unless $\mathrm{P} = \mathrm{NP}$, and, moreover, that if there exists a $\delta>0$ such that the DTW mean problem admits a $(\log n)^{\delta}$-approximation algorithm, then $\mathrm{NP} \subseteq \mathrm{QP}$. On the positive side, we show that restricting the length of the mean sequence significantly reduces the hardness of the problem. We give an exact algorithm running in polynomial time for means of constant length. We explore various approximation algorithms that provide a trade-off between the approximation factor and the running time; their running times depend only linearly on the number of input sequences. In addition, we use our mean algorithms to obtain clustering algorithms with theoretical guarantees.
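To make the objective concrete, here is a minimal Python sketch of the $p$-DTW distance and the resulting mean objective, assuming a Euclidean ground metric (the paper works over arbitrary metric spaces); the function names and the dynamic-programming formulation are illustrative, not the paper's algorithm.

```python
import numpy as np

def p_dtw(x, y, p=1):
    """p-DTW distance between point sequences x and y via the standard dynamic
    program: minimise (sum of ground distances^p over a warping path)^(1/p).
    Ground metric assumed Euclidean for this sketch."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(x[i - 1] - y[j - 1]) ** p
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m] ** (1.0 / p)

def mean_objective(sequences, candidate, p=1, q=1):
    """Objective minimised by a length-constrained mean: sum over the input
    sequences of the q-th power of their p-DTW distance to the candidate."""
    return sum(p_dtw(s, candidate, p) ** q for s in sequences)
```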
Quantum Annealing (QA) is a computational framework in which the continuous evolution of a quantum system is used to find the global minimum of an objective function over an unstructured search space. It can be seen as a general metaheuristic for optimization problems, including NP-hard ones if we allow an exponentially large running time. While QA is widely studied from a heuristic point of view, little is known about theoretical guarantees on the quality of the solutions obtained in polynomial time. In this paper we use a technique borrowed from theoretical physics, the Lieb-Robinson (LR) bound, and develop new tools proving that short, constant-time quantum annealing guarantees constant-factor approximation ratios for some optimization problems when restricted to bounded-degree graphs. Informally, on bounded-degree graphs the LR bound lets us recover a (relaxed) locality argument, through which the approximation ratio can be deduced by studying subgraphs of bounded radius. We illustrate our tools on the MaxCut and Maximum Independent Set problems for cubic graphs, providing explicit approximation ratios and the runtimes needed to obtain them. Our results are of a similar flavor to the well-known ones obtained in the different but related QAOA (Quantum Approximate Optimization Algorithm) framework. Finally, we discuss theoretical and experimental arguments for further improvements.
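For orientation, the interpolation usually meant by quantum annealing is the following time-dependent Hamiltonian (a standard textbook form; the paper's precise schedule and problem Hamiltonians may differ):
\[
H\!\left(\tfrac{t}{T}\right)=\Bigl(1-\tfrac{t}{T}\Bigr)H_0+\tfrac{t}{T}\,H_P,\qquad H_0=-\sum_{i}\sigma_x^{(i)},\qquad t\in[0,T],
\]
where $H_P$ is a diagonal problem Hamiltonian whose ground states encode the optimal solutions, e.g. $H_P=-\sum_{\{i,j\}\in E}\tfrac{1}{2}\bigl(1-\sigma_z^{(i)}\sigma_z^{(j)}\bigr)$ for MaxCut; constant-time annealing corresponds to taking $T=O(1)$ independently of the instance size.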
This paper considers synchronous discrete-time dynamical systems on graphs based on the threshold model. It is well known that after a finite number of rounds these systems either reach a fixed point or enter a 2-cycle. Finding the fixed points of this type of dynamical system is, in general, NP-hard, and counting them is #P-complete. In this paper we give a surprisingly simple graph-theoretic characterization of fixed points and 2-cycles for the class of finite trees. Thus, trees form the first nontrivial graph class for which a complete characterization of fixed points exists. This characterization enables us to provide bounds on the total number of fixed points and pure 2-cycles. It also leads to an output-sensitive algorithm that efficiently generates these states.
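As an illustration, the following Python sketch runs synchronous threshold dynamics on a graph and detects whether the orbit has reached a fixed point or a 2-cycle; the update convention (a vertex becomes active iff at least threshold[v] of its neighbours were active) is one common variant assumed here, not taken from the paper.

```python
import networkx as nx

def threshold_step(G, state, threshold):
    """One synchronous round: v becomes 1 iff at least threshold[v] of its
    neighbours were 1 in the previous round (assumed convention)."""
    return {v: int(sum(state[u] for u in G.neighbors(v)) >= threshold[v]) for v in G}

def run_until_periodic(G, state, threshold):
    """Iterate the dynamics; by the classical result the orbit ends in a
    configuration of period 1 (fixed point) or 2 (2-cycle)."""
    seen, r = {}, 0
    while True:
        key = tuple(sorted(state.items()))
        if key in seen:
            return state, r - seen[key]   # period: 1 = fixed point, 2 = 2-cycle
        seen[key] = r
        state = threshold_step(G, state, threshold)
        r += 1

# Hypothetical usage on a small tree:
# T = nx.path_graph(5)
# config, period = run_until_periodic(T, {v: v % 2 for v in T}, {v: 1 for v in T})
```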
For $R\triangleq Mat_{m}(\mathbb{F})$, the ring of all $m\times m$ matrices over the finite field $\mathbb{F}$ with $|\mathbb{F}|=q$, and the left $R$-module $A\triangleq Mat_{m,k}(\mathbb{F})$ with $m+1\leqslant k$, Dyshko proved in \cite{3,4}, by deriving the minimal length of solutions of the related isometry equation, that the minimal code length $n$ for which $A^{n}$ fails to satisfy the MacWilliams extension property with respect to the Hamming weight equals $\prod_{i=1}^{m}(q^{i}+1)$. In this paper, using M\"{o}bius functions, we derive the minimal length of nontrivial solutions of the isometry equation with respect to a finite lattice. For the finite vector space $\mathbf{H}\triangleq\prod_{i\in\Omega}\mathbb{F}^{k_{i}}$, a poset $\mathbf{P}=(\Omega,\preccurlyeq_{\mathbf{P}})$ and a map $\omega:\Omega\longrightarrow\mathbb{R}^{+}$ give rise to the $(\mathbf{P},\omega)$-weight on $\mathbf{H}$, which was proposed by Hyun, Kim and Park in \cite{18}. For such a weight, we study the relations between the MacWilliams extension property and other properties, including admitting a MacWilliams identity, Fourier-reflexivity of the involved partitions, and the Unique Decomposition Property defined for $(\mathbf{P},\omega)$. We give necessary and sufficient conditions for $\mathbf{H}$ to satisfy the MacWilliams extension property under the additional assumption that either $\mathbf{P}$ is hierarchical or $\omega$ is identically $1$, i.e., the $(\mathbf{P},\omega)$-weight coincides with the $\mathbf{P}$-weight; these conditions further allow us to partly answer a conjecture proposed by Machado and Firer in \cite{22}.
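For readability, we recall one common formulation of the $(\mathbf{P},\omega)$-weight (our phrasing; notation may differ slightly from \cite{18}): for $x\in\mathbf{H}$ with support $\mathrm{supp}(x)=\{i\in\Omega: x_i\neq 0\}$,
\[
\mathrm{w}_{(\mathbf{P},\omega)}(x)\;=\;\sum_{i\in\langle \mathrm{supp}(x)\rangle_{\mathbf{P}}}\omega(i),
\]
where $\langle \mathrm{supp}(x)\rangle_{\mathbf{P}}$ denotes the order ideal of $\mathbf{P}$ generated by $\mathrm{supp}(x)$; taking $\omega\equiv 1$ recovers the $\mathbf{P}$-weight, and taking $\mathbf{P}$ to be an antichain with $\omega\equiv 1$ and all $k_i=1$ recovers the Hamming weight.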
In this work, we study a random orthogonal projection based least squares estimator for the stable solution of a multivariate nonparametric regression (MNPR) problem. More precisely, given an integer $d\geq 1$ corresponding to the dimension of the MNPR problem, a positive integer $N\geq 1$ and a real parameter $\alpha\geq -\frac{1}{2},$ we show that the functions in a fairly large class of $d$-variate regression functions are well and stably approximated by their random projections over the orthonormal set of tensor product $d$-variate Jacobi polynomials with parameters $(\alpha,\alpha).$ The associated univariate Jacobi polynomials have degree at most $N$, and their tensor products are orthonormal over $\mathcal U=[0,1]^d$ with respect to the associated multivariate Jacobi weight. In particular, if we consider $n$ random sampling points $\mathbf X_i$ following the $d$-variate Beta distribution with parameters $(\alpha+1,\alpha+1),$ then we give a relation among $n, N$ and $\alpha$ ensuring that the resulting $(N+1)^d\times (N+1)^d$ random projection matrix is well conditioned. Moreover, we provide bounds on the squared integrated error as well as the $L^2$-risk of this estimator. Precise estimates of these errors are given in the case where the regression function belongs to an isotropic Sobolev space $H^s(\mathcal U),$ with $s> \frac{d}{2}.$ Also, to handle the general and practical case of an unknown distribution of the $\mathbf X_i,$ we use Shepard's scattered interpolation scheme to generate fairly precise approximations of the observed data at $n$ i.i.d. sampling points $\mathbf X_i$ following a $d$-variate Beta distribution. Finally, we illustrate the performance of our proposed multivariate nonparametric estimator by numerical simulations with synthetic as well as real data.
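A minimal Python sketch of the projection estimator, for illustration only: it builds the tensor-product Jacobi design matrix at sampling points in $[0,1]^d$ and solves the least squares problem; the normalisation constants that make the basis orthonormal (and on which the conditioning analysis relies) are omitted, and all function names are ours.

```python
import numpy as np
from itertools import product
from scipy.special import eval_jacobi

def design_matrix(X, N, alpha):
    """Tensor-product Jacobi design matrix at sample points X of shape (n, d),
    with points in [0, 1]^d mapped to [-1, 1]^d; columns are indexed by all
    degree multi-indices in {0, ..., N}^d.  Orthonormalisation is omitted."""
    n, d = X.shape
    T = 2.0 * X - 1.0
    cols = []
    for deg in product(range(N + 1), repeat=d):
        cols.append(np.prod([eval_jacobi(k, alpha, alpha, T[:, j])
                             for j, k in enumerate(deg)], axis=0))
    return np.column_stack(cols)

def projection_estimator(X, y, N, alpha):
    """Least-squares projection of the responses y onto the tensor Jacobi basis."""
    Phi = design_matrix(X, N, alpha)
    coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return coef

# Hypothetical usage with Beta(alpha+1, alpha+1) sampling, mirroring the paper's setting:
# rng = np.random.default_rng(0); n, d, N, alpha = 500, 2, 5, 0.0
# X = rng.beta(alpha + 1, alpha + 1, size=(n, d))
# y = np.cos(np.pi * X).prod(axis=1) + 0.05 * rng.standard_normal(n)
# coef = projection_estimator(X, y, N, alpha)
```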
This paper studies Quasi-Maximum Likelihood estimation of Dynamic Factor Models for large panels of time series. Specifically, we consider the case in which the autocorrelation of the factors is explicitly accounted for, so that the model has a state-space form. Estimation of the factors and their loadings is implemented through the Expectation Maximization (EM) algorithm, jointly with the Kalman smoother. We prove that as both the dimension of the panel $n$ and the sample size $T$ diverge to infinity, up to logarithmic terms: (i) the estimated loadings are $\sqrt T$-consistent and asymptotically normal if $\sqrt T/n\to 0$; (ii) the estimated factors are $\sqrt n$-consistent and asymptotically normal if $\sqrt n/T\to 0$; (iii) the estimated common component is $\min(\sqrt n,\sqrt T)$-consistent and asymptotically normal regardless of the relative rate of divergence of $n$ and $T$. Although the model is estimated as if the idiosyncratic terms were cross-sectionally and serially uncorrelated and normally distributed, we show that these mis-specifications do not affect consistency. Moreover, the estimated loadings are asymptotically as efficient as those obtained with the Principal Components estimator, while the estimated factors are more efficient if the idiosyncratic covariance is sparse enough. We then propose robust estimators of the asymptotic covariances, which can be used to conduct inference on the loadings and to compute confidence intervals for the factors and common components. Finally, we study the performance of our estimators and compare them with the traditional Principal Components approach through Monte Carlo simulations and an analysis of US macroeconomic data.
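In its simplest form (a VAR(1) law of motion for the factors; the paper may allow richer dynamics), the state-space representation underlying the EM/Kalman-smoother estimation reads
\[
y_t=\Lambda f_t+e_t,\qquad f_t=A f_{t-1}+u_t,\qquad t=1,\dots,T,
\]
where $y_t$ is the $n$-dimensional vector of observables, $f_t$ the $r$-dimensional vector of latent factors, $\Lambda$ the $n\times r$ matrix of loadings, $e_t$ the idiosyncratic component and $u_t$ the factor innovations; the E-step runs the Kalman smoother on this system, and the M-step updates $\Lambda$ and $A$ via regressions on the smoothed factor moments.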
A long line of research on the fixed-parameter tractability of integer programming culminated in showing that integer programs with $n$ variables and a constraint matrix with dual tree-depth $d$ and largest entry $D$ are solvable in time $g(d,D)\,\mathrm{poly}(n)$ for some function $g$. However, the dual tree-depth of a constraint matrix is not preserved by row operations, i.e., a given integer program can be equivalent to another one with smaller dual tree-depth, and thus it does not reflect the geometric structure of the program. We prove that the minimum dual tree-depth over all row-equivalent matrices is equal to the branch-depth of the matroid defined by the columns of the matrix. We design a fixed-parameter algorithm for computing the branch-depth of matroids represented over a finite field and a fixed-parameter algorithm for computing a row-equivalent matrix with minimum dual tree-depth. Finally, we use these results to obtain an algorithm for integer programming running in time $g(d^*,D)\,\mathrm{poly}(n)$, where $d^*$ is the branch-depth of the constraint matrix; the branch-depth cannot be replaced by the more permissive notion of branch-width.
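Concretely, the integer programs in question are typically of the standard form below, and the dual tree-depth of $A$ is the tree-depth of its dual graph, whose vertices are the rows of $A$ with two rows adjacent if they share a column in which both have a non-zero entry (we state this common setup for context; the paper's exact formulation may differ in details):
\[
\min\{\,c^{\top}x \;:\; Ax=b,\ l\le x\le u,\ x\in\mathbb{Z}^{n}\,\}.
\]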
We introduce Monte-Carlo Attention (MCA), a randomized approximation method for reducing the computational cost of self-attention mechanisms in Transformer architectures. MCA exploits the fact that the importance of each token in an input sequence varies with its attention score; thus, some degree of error is tolerable when encoding tokens with low attention. Using approximate matrix multiplication, MCA applies different error bounds when encoding input tokens, so that tokens with low attention scores are computed with relaxed precision, whereas errors for salient tokens are minimized. MCA can operate in parallel with other attention optimization schemes and does not require model modification. We study the theoretical error bounds and demonstrate that MCA reduces attention complexity (in FLOPs) for various Transformer models by up to 11$\times$ on GLUE benchmarks without compromising model accuracy.
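The following Python sketch illustrates the general idea of Monte-Carlo-style attention through sampled (approximate) matrix multiplication: value rows are drawn according to the attention weights, so salient keys are resolved more precisely while low-attention contributions are estimated from fewer samples. The `budget` knob and all function names are ours, and this is not the paper's exact per-token error-allocation scheme.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mc_attention(Q, K, V, budget=0.25, rng=None):
    """Approximate attention output P @ V, where P = softmax(Q K^T / sqrt(d_k)).
    For each query row, the sum over keys is replaced by an unbiased
    importance-sampling estimate using budget * m samples drawn proportionally
    to the attention weights."""
    rng = np.random.default_rng() if rng is None else rng
    P = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))   # exact attention weights (n x m)
    n, m = P.shape
    c = max(1, int(budget * m))                   # sampled value rows per query
    out = np.zeros((n, V.shape[1]))
    for i in range(n):
        probs = P[i] / P[i].sum()
        idx = rng.choice(m, size=c, replace=True, p=probs)
        # unbiased estimator of sum_j P[i, j] * V[j]
        out[i] = (P[i, idx, None] * V[idx] / (c * probs[idx, None])).sum(axis=0)
    return out

# Hypothetical usage:
# rng = np.random.default_rng(0)
# Q, K, V = (rng.standard_normal((64, 32)) for _ in range(3))
# approx = mc_attention(Q, K, V, budget=0.25, rng=rng)
```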
The problem of Approximate Nearest Neighbor (ANN) search is fundamental in computer science and has benefited from significant progress in the past couple of decades. However, most work has been devoted to point sets, whereas complex shapes have not been sufficiently treated. Here, we focus on distance functions between discretized curves in Euclidean space: they appear in a wide range of applications, from road segments to time series in general dimension. For $\ell_p$-products of Euclidean metrics, for any $p$, we design simple and efficient data structures for ANN, based on randomized projections, which are of independent interest. They serve to solve proximity problems under a notion of distance between discretized curves which generalizes both the discrete Fr\'echet and the Dynamic Time Warping distance, the two most popular and practical approaches to comparing such curves. We offer the first data structures and query algorithms for ANN with an arbitrarily good approximation factor, at the expense of increased space usage and preprocessing time over existing methods. The query time of our algorithms is comparable to, or significantly better than, that of existing methods, and our algorithms are especially efficient when the length of the curves is bounded.
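One common way to state the family of curve distances alluded to above (our formulation; the paper's definition may differ in normalisation): for discretized curves $P=(p_1,\dots,p_{m_1})$ and $Q=(q_1,\dots,q_{m_2})$ and a monotone traversal $T$ of the two index sets (a warping path),
\[
d_p(P,Q)\;=\;\min_{T}\Bigl(\sum_{(i,j)\in T}\lVert p_i-q_j\rVert^{p}\Bigr)^{1/p},
\]
so that $p=\infty$ (the maximum over matched pairs) recovers the discrete Fr\'echet distance, while $p=1$ corresponds to Dynamic Time Warping.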
We show that for the problem of testing whether a matrix $A \in F^{n \times n}$ has rank at most $d$, or requires changing an $\epsilon$-fraction of its entries to have rank at most $d$, there is a non-adaptive query algorithm making $\widetilde{O}(d^2/\epsilon)$ queries. Our algorithm works over any field $F$. This improves upon the previous $O(d^2/\epsilon^2)$ bound (SODA'03) and bypasses an $\Omega(d^2/\epsilon^2)$ lower bound (KDD'14) that holds if the algorithm is required to read a submatrix. Our algorithm is the first that does not read a submatrix, and instead reads a carefully selected non-adaptive pattern of entries in the rows and columns of $A$. We complement our algorithm with a matching query-complexity lower bound for non-adaptive testers over any field. We also give tight bounds of $\widetilde{\Theta}(d^2)$ queries in the sensing model, in which query access comes in the form of $\langle X_i, A\rangle:=\mathrm{tr}(X_i^\top A)$; perhaps surprisingly, these bounds do not depend on $\epsilon$. We next develop a novel property testing framework for testing numerical properties of a real-valued matrix $A$ more generally, including the stable rank, Schatten-$p$ norms, and SVD entropy. Specifically, we propose a bounded entry model, where $A$ is required to have entries bounded by $1$ in absolute value. We give upper and lower bounds for a wide range of problems in this model, and discuss connections to the sensing model above.
We consider the task of learning the parameters of a {\em single} component of a mixture model when we are given {\em side information} about that component; we call this the "search problem" in mixture models. We would like to solve this problem with lower computational and sample complexity than solving the overall original problem, where one learns the parameters of all components. Our main contributions are the development of a simple but general model for the notion of side information, and a corresponding simple matrix-based algorithm for solving the search problem in this general setting. We then specialize this model and algorithm to four common scenarios: Gaussian mixture models, LDA topic models, subspace clustering, and mixed linear regression. For each of these we show that if (and only if) the side information is informative, we obtain parameter estimates with greater accuracy, and also improved computational complexity, than existing moment-based mixture model algorithms (e.g., tensor methods). We also illustrate several natural ways one can obtain such side information for specific problem instances. Our experiments on real data sets (NY Times, Yelp, BSDS500) further demonstrate the practicality of our algorithms, showing significant improvements in runtime and accuracy.