Inversion of the two-dimensional discrete Fourier transform (DFT) typically requires all DFT coefficients to be known. When only band-limited DFT coefficients of a matrix are known, the original matrix cannot be uniquely recovered. Using the a priori information that the matrix is binary (all elements are either 0 or 1) can compensate for the missing high-frequency DFT coefficients and restore uniqueness. We theoretically investigate the smallest pass band that can be applied while still guaranteeing unique recovery of an arbitrary binary matrix. The results depend on the dimensions of the matrix. Uniqueness results are proven for the dimensions $p\times q$, $p\times p$, and $p^\alpha\times p^\alpha$, where $p\neq q$ are prime numbers and $\alpha>1$ is an integer. An inversion algorithm is proposed for practically recovering the unique binary matrix. This algorithm is based on integer linear programming methods and significantly outperforms naive implementations. The algorithm efficiently reconstructs $17\times17$ binary matrices using 81 out of the total 289 DFT coefficients.
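The recovery problem above can be phrased directly as an integer feasibility program. The following is a minimal sketch, not the paper's algorithm: it uses SciPy's `milp` on a $4\times4$ example with a hypothetical pass band keeping frequencies within distance 1 of DC. With so few coefficients the binary solution need not be unique, so we only check that the recovered matrix reproduces the kept coefficients.

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

n = 4
rng = np.random.default_rng(0)
X = (rng.random((n, n)) < 0.5).astype(float)   # ground-truth binary matrix

# the 2-D DFT acts on vec(X) as the matrix kron(F, F)
F = np.fft.fft(np.eye(n))
F2 = np.kron(F, F)

# illustrative pass band: frequencies within distance 1 of DC in each axis
keep = [i * n + j for i in range(n) for j in range(n)
        if min(i, n - i) <= 1 and min(j, n - j) <= 1]
A = F2[keep]
b = A @ X.ravel()

# stack real and imaginary parts to get real-valued equality constraints
Ar = np.vstack([A.real, A.imag])
br = np.concatenate([b.real, b.imag])

res = milp(c=np.zeros(n * n),                      # pure feasibility problem
           constraints=LinearConstraint(Ar, br, br),
           integrality=np.ones(n * n),
           bounds=Bounds(0, 1))
X_rec = np.round(res.x).reshape(n, n)              # a binary matrix matching the pass band
```

For the $17\times17$ case studied in the paper, the same formulation has 289 binary variables and 81 kept coefficients; the point of the paper's analysis is to choose the pass band so that the feasible binary matrix is unique.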
A kernel of a directed graph is a subset of vertices that is both independent and absorbing (every vertex not in the kernel has an out-neighbour in the kernel). Not all directed graphs contain kernels, and computing a kernel or deciding that none exists is NP-complete even on low-degree planar digraphs. The existing polynomial-time algorithms for this problem all restrict both the undirected structure and the edge orientations of the input: for example, to chordal graphs without bidirectional edges (Pass-Lanneau, Igarashi and Meunier, Discrete Appl Math 2020) or to permutation graphs where each clique has a sink (Abbas and Saoula, 4OR 2005). By contrast, we count the kernels of a fuzzy circular interval graph in polynomial time, regardless of its edge orientations, and return a kernel when one exists. (Fuzzy circular interval graphs were introduced by Chudnovsky and Seymour in their structure theorem for claw-free graphs.) We also consider kernels on cographs, where we establish NP-hardness in general but linear running times on the subclass of threshold graphs.
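The two defining properties of a kernel can be checked directly. The sketch below is illustrative only (not the counting algorithm from the paper): it verifies independence and absorption for a candidate set, and confirms by brute force that a directed 3-cycle has no kernel.

```python
from itertools import chain, combinations

def is_kernel(n, arcs, S):
    """Check whether S is a kernel of the digraph on vertices 0..n-1."""
    S = set(S)
    # independent: no arc joins two vertices of S
    if any(u in S and v in S for u, v in arcs):
        return False
    # absorbing: every vertex outside S has an out-neighbour in S
    out = {v: set() for v in range(n)}
    for u, v in arcs:
        out[u].add(v)
    return all(out[v] & S for v in range(n) if v not in S)

# a directed 3-cycle is a classical example of a digraph with no kernel
cycle = [(0, 1), (1, 2), (2, 0)]
subsets = chain.from_iterable(combinations(range(3), r) for r in range(4))
has_kernel = any(is_kernel(3, cycle, S) for S in subsets)   # False
```

Brute force over all $2^n$ subsets is of course exponential; the point of the paper is that on fuzzy circular interval graphs the kernels can even be counted in polynomial time.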
In this paper, we study ideals spanned by polynomials or overconvergent series in a Tate algebra. With state-of-the-art algorithms for computing Tate Gr{\"o}bner bases, even when the input consists of polynomials, the size of the output grows with the required precision, both in terms of the size of the coefficients and the size of the support of the series. We prove that ideals which are spanned by polynomials admit a Tate Gr{\"o}bner basis made of polynomials, and we propose an algorithm, leveraging Mora's weak normal form algorithm, for computing it. As a result, the size of the output of this algorithm grows linearly with the precision. Following the same ideas, we propose an algorithm which computes an overconvergent basis for an ideal spanned by overconvergent series. Finally, we prove the existence of a universal analytic Gr{\"o}bner basis for polynomial ideals in Tate algebras, compatible with all convergence radii.
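In the purely polynomial shadow of this setting (no convergence conditions and no $p$-adic precision, which are the paper's actual concern), a Gr{\"o}bner basis computation looks as follows; this illustrative sketch uses SymPy over the rationals.

```python
from sympy import symbols, groebner

x, y = symbols('x y')

# lex Groebner basis of the polynomial ideal <x^2 + y, x*y - 1>
G = groebner([x**2 + y, x*y - 1], x, y, order='lex')

# every generator reduces to zero modulo the basis
_, rem = G.reduce(x**2 + y)
```

In the Tate setting the coefficients live in a complete valued field and carry a finite precision, and the paper's contribution is precisely that, for polynomial input, the basis can be kept polynomial so the output size grows only linearly with that precision.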
Let $N$ be the number of triangles in an Erd\H{o}s-R\'enyi graph $\mathcal{G}(n,p)$ on $n$ vertices with edge density $p=d/n$, where $d>0$ is a fixed constant. It is well known that $N$ weakly converges to the Poisson distribution with mean ${d^3}/{6}$ as $n\rightarrow \infty$. We address the upper tail problem for $N$, namely, we investigate how fast $k$ must grow so that the probability of $\{N\ge k\}$ is no longer well approximated by the tail of the corresponding Poisson variable. Proving that the tail exhibits a sharp phase transition, we essentially show that the upper tail is governed by Poisson behavior only when $k^{1/3} \log k< (\frac{3}{\sqrt{2}})^{2/3} \log n$ (sub-critical regime), and we pin down the tail behavior when $k^{1/3} \log k> (\frac{3}{\sqrt{2}})^{2/3} \log n$ (super-critical regime). We further prove a structure theorem, showing that the sub-critical upper tail behavior is dictated by the appearance of almost $k$ vertex-disjoint triangles, whereas in the super-critical regime the excess triangles arise from a clique-like structure of size approximately $(6k)^{1/3}$. This settles the long-standing upper-tail problem in this sparse regime, answering a question of Aldous and complementing a long sequence of works, spanning multiple decades, that culminated in (Harel, Mousset and Samotij '19), which analyzed the problem only in the regime $p\gg \frac{1}{n}.$ The proofs rely on several novel graph-theoretic results which could have other applications.
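The Poisson limit in the first sentence is easy to see numerically. The sketch below (illustrative, with arbitrary parameter choices) samples $\mathcal{G}(n,p)$ with $d=3$ and compares the empirical mean triangle count to $d^3/6 = 4.5$, using the identity $N = \operatorname{tr}(A^3)/6$ for the adjacency matrix $A$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, trials = 200, 3.0, 300
p = d / n

counts = []
for _ in range(trials):
    U = np.triu(rng.random((n, n)) < p, 1)   # independent edges above the diagonal
    A = (U | U.T).astype(int)                # symmetric adjacency matrix
    counts.append(np.trace(A @ A @ A) // 6)  # each triangle is counted 6 times
emp_mean = np.mean(counts)                   # close to d**3 / 6 = 4.5 for large n
```

The exact mean is $\binom{n}{3}p^3 \approx 4.43$ for $n=200$, so the simulation sits slightly below the limiting value $4.5$, as expected for finite $n$.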
We consider the problem of sparse normal means estimation in a distributed setting with communication constraints. We assume there are $M$ machines, each holding $d$-dimensional observations of a $K$-sparse vector $\mu$ corrupted by additive Gaussian noise. The $M$ machines are connected in a star topology to a fusion center, whose goal is to estimate the vector $\mu$ with a low communication budget. Previous works have shown that to achieve the centralized minimax rate for the $\ell_2$ risk, the total communication must be high: at least linear in the dimension $d$. This phenomenon occurs, however, only at very weak signals. We show that at signal-to-noise ratios (SNRs) that are sufficiently high, but not high enough for recovery by any individual machine, the support of $\mu$ can be correctly recovered with significantly less communication. Specifically, we present two algorithms for distributed estimation of a sparse mean vector corrupted by either Gaussian or sub-Gaussian noise. We then prove that above certain SNR thresholds, with high probability, these algorithms recover the correct support with total communication that is sublinear in the dimension $d$. Furthermore, the communication decreases exponentially as a function of signal strength. If in addition $KM\ll \tfrac{d}{\log d}$, then with an additional round of sublinear communication, our algorithms achieve the centralized rate for the $\ell_2$ risk. Finally, we present simulations that illustrate the performance of our algorithms in different parameter regimes.
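The flavour of the support-recovery stage can be conveyed by a toy simulation (not the paper's exact algorithms or thresholds, and with hypothetical SNR and threshold values): each machine transmits only the indices of coordinates exceeding a threshold, and the fusion center takes a majority vote, so the expected message length is far below $d$.

```python
import numpy as np

rng = np.random.default_rng(1)
d, K, M = 1000, 5, 20
mu = np.zeros(d)
support = rng.choice(d, size=K, replace=False)
mu[support] = 6.0                          # hypothetical signal strength
Y = mu + rng.standard_normal((M, d))       # one noisy observation per machine

tau = 3.0                                  # hypothetical threshold
votes = np.zeros(d, dtype=int)
for m in range(M):
    # short message: roughly K + d * P(|Z| > tau) indices on average
    sent = np.flatnonzero(np.abs(Y[m]) > tau)
    votes[sent] += 1

est_support = np.flatnonzero(votes > M / 2)   # majority vote at the fusion center
```

Each message here has expected length about $K + d\,P(|Z|>\tau) \approx 8$ indices, versus $d=1000$ values for naive communication, which is the sublinear-communication phenomenon the abstract describes.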
Certifying the positivity of trigonometric polynomials is of first importance for design problems in discrete-time signal processing. It is well known from the Riesz-Fej\'er spectral factorization theorem that any trigonometric univariate polynomial positive on the unit circle can be decomposed as a Hermitian square with complex coefficients. Here we focus on the case of polynomials with Gaussian integer coefficients, i.e., with real and imaginary parts being integers. We design, analyze and compare, theoretically and practically, three hybrid numeric-symbolic algorithms computing weighted sums of Hermitian squares decompositions for trigonometric univariate polynomials positive on the unit circle with Gaussian coefficients. The numerical steps on which the first and second algorithms rely are complex root isolation and semidefinite programming, respectively. An exact sum of Hermitian squares decomposition is obtained thanks to compensation techniques. The third algorithm, also based on complex semidefinite programming, is an adaptation of the rounding and projection algorithm by Peyrl and Parrilo. For all three algorithms, we prove bit complexity and output size estimates that are polynomial in the degree of the input and linear in the maximum bitsize of its coefficients. We compare their performance on randomly chosen benchmarks, and further design a certified finite impulse response filter.
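The Riesz-Fej\'er step can be illustrated numerically on a toy example (a sketch of spectral factorization, not of the paper's exact algorithms): the positive trigonometric polynomial $f(\theta)=5+4\cos\theta$ factors as $|q(e^{i\theta})|^2$, where $q$ is built from the roots of the associated algebraic polynomial lying inside the unit disk.

```python
import numpy as np

# f(theta) = 5 + 4 cos(theta) > 0 has Laurent form 2 z^{-1} + 5 + 2 z on |z| = 1.
# Multiplying by z gives p(z) = 2 + 5 z + 2 z^2, whose roots pair up as r, 1/r.
roots = np.roots([2.0, 5.0, 2.0])        # high-to-low coefficients; roots -2 and -1/2
r_in = roots[np.abs(roots) < 1]          # keep the roots inside the unit disk

q = np.poly(r_in)                        # monic factor q0(z) = z + 1/2 (high-to-low)
scale = np.sqrt(9.0) / abs(np.polyval(q, 1.0))   # normalize so |q(1)|^2 = f(0) = 9
q = scale * q

theta = np.linspace(0.0, 2 * np.pi, 64)
z = np.exp(1j * theta)
f_vals = 5 + 4 * np.cos(theta)
q_sq = np.abs(np.polyval(q, z)) ** 2     # matches f on the whole circle
```

Here the factorization happens to be exact in floating point; the paper's contribution is turning such numerical factorizations into exact certificates over the Gaussian integers via compensation and rounding-projection techniques.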
This note proposes an algorithm for identifying the poles and residues of a meromorphic function from its noisy values on the imaginary axis. The algorithm combines a M\"{o}bius transform with Prony's method in the frequency domain. Numerical results are provided to demonstrate the performance of the algorithm.
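A minimal Prony-type recovery (ignoring the M\"{o}bius-transform preprocessing and the noise handling of the actual algorithm) can be sketched as follows: from samples of an exponential sum, the nodes come out as roots of a linear-prediction polynomial, and the coefficients follow from a Vandermonde least-squares solve.

```python
import numpy as np

# noiseless samples f_k = sum_j c_j z_j^k of an exponential sum with r terms
r = 2
z_true = np.array([0.9 * np.exp(0.4j), 0.7 * np.exp(-1.1j)])
c_true = np.array([1.0 + 0.5j, -0.3 + 0.2j])
k = np.arange(2 * r + 4)
f = (c_true * z_true ** k[:, None]).sum(axis=1)

# linear prediction: f[k+r] + a_{r-1} f[k+r-1] + ... + a_0 f[k] = 0
N = len(f)
H = np.column_stack([f[i:i + N - r] for i in range(r)])
a = np.linalg.lstsq(H, -f[r:], rcond=None)[0]

# the nodes z_j are the roots of z^r + a_{r-1} z^{r-1} + ... + a_0
z_est = np.roots(np.concatenate([[1.0], a[::-1]]))

# coefficients from a Vandermonde least-squares fit
V = z_est[None, :] ** k[:, None]
c_est = np.linalg.lstsq(V, f, rcond=None)[0]
```

With noisy samples one would oversample and use more rows in the Hankel system, which is where the least-squares formulation above pays off.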
The Crouzeix-Raviart triangular finite elements are $\inf$-$\sup$ stable for the Stokes equations for any mesh with at least one interior vertex. This result affirms a {\em conjecture of Crouzeix-Falk} from 1989 for $p=3$. Our proof applies to {\em any odd degree} $p\ge 3$ and hence Crouzeix-Raviart triangular finite elements of degree $p$ in two dimensions and the piecewise polynomials of degree $p-1$ with vanishing integral form a stable Stokes pair {\em for all positive integers} $p$.
Transformer-based models are popular for natural language processing (NLP) tasks due to their powerful capacity. As the core component, the self-attention module has attracted widespread interest. Visualizing the attention maps of a pre-trained model is one direct way to understand the self-attention mechanism, and several common patterns have been observed in such visualizations. Based on these patterns, a series of efficient Transformers have been proposed with corresponding sparse attention masks. Beyond these empirical results, the universal approximability of Transformer-based models has also been established from a theoretical perspective. However, all of the above understanding and analysis of self-attention is based on pre-trained models. To rethink importance analysis in self-attention, we study the dynamics of attention-matrix importance during pre-training. One surprising result is that the diagonal elements of the attention map are the least important of all attention positions, and we provide a proof that these elements can be removed without damaging model performance. Furthermore, we propose a Differentiable Attention Mask (DAM) algorithm, which can further guide the design of SparseBERT. Extensive experiments verify these findings and illustrate the effectiveness of the proposed algorithm.
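The claim about diagonal elements is easy to probe in a toy single-head attention layer. The sketch below is illustrative (it is not the DAM algorithm): it masks the diagonal logits so that no token attends to itself, while each row of the attention map still normalizes to one.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V, mask_diagonal=False):
    """Single-head scaled dot-product attention with an optional diagonal mask."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    if mask_diagonal:
        np.fill_diagonal(scores, -1e9)   # drop each token's attention to itself
    A = softmax(scores)
    return A @ V, A

rng = np.random.default_rng(0)
n, d = 6, 8
Q, K, V = rng.standard_normal((3, n, d))
out, A = attention(Q, K, V, mask_diagonal=True)
```

Because the mask is applied to the logits before the softmax, the remaining off-diagonal weights automatically re-normalize, which is also what makes such masks easy to relax into a differentiable form.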
In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm called multi-step primal-dual (MSPD) and its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in $O(1/\sqrt{t})$, the structure of the communication network only impacts a second-order term in $O(1/t)$, where $t$ is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.
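The local smoothing behind DRS can be sketched as follows (a Monte-Carlo illustration with arbitrary parameters, not the DRS algorithm itself): the Gaussian smoothing $f_\gamma(x)=\mathbb{E}[f(x+\gamma Z)]$ is differentiable even when $f$ is not, and its gradient admits a zeroth-order estimator from function values alone.

```python
import numpy as np

def smoothed_grad(f, x, gamma=0.1, n_samples=2000, seed=0):
    """Monte-Carlo gradient of the Gaussian smoothing f_gamma(x) = E[f(x + gamma Z)]."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n_samples, x.size))
    # subtracting f(x) acts as a control variate and reduces the variance
    diffs = np.array([f(x + gamma * z) - f(x) for z in Z])
    return (diffs[:, None] * Z).mean(axis=0) / gamma

f = lambda v: np.abs(v).sum()                # non-smooth convex test function
g = smoothed_grad(f, np.array([2.0, -3.0]))  # close to the subgradient (1, -1)
```

Away from the kink of $|\cdot|$, the smoothed gradient essentially coincides with the subgradient, which is why smoothing only costs the $d^{1/4}$ factor mentioned above rather than changing the rate.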
In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely where the function $F(\mathbf{x}) \triangleq \sum_{i=1}^{m}f_i(\mathbf{x})$ is: both strongly convex and smooth, strongly convex only, smooth only, or just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors), with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions of the proposed setup, such as proximal-friendly functions, time-varying graphs, and improvement of the condition number.
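A stripped-down version of the dual approach can be sketched as follows (plain dual gradient ascent rather than Nesterov's accelerated variant, on a toy 3-node path graph with quadratic local objectives):

```python
import numpy as np

# local objectives f_i(x) = (x - a_i)^2 on a path graph 1 - 2 - 3;
# consensus is encoded as the affine constraint A x = 0 (edge-incidence rows)
a = np.array([1.0, 4.0, 7.0])
A = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])

lam = np.zeros(2)
for _ in range(200):
    # primal minimizer of sum_i (x_i - a_i)^2 + lam^T A x, in closed form
    x = a - A.T @ lam / 2.0
    # gradient ascent on the dual; the admissible step size depends on A A^T
    lam += 0.3 * (A @ x)
# x converges to the consensus value mean(a) = 4 at every node
```

Each iteration only uses neighbour-to-neighbour differences ($A x$ and $A^T \lambda$), so it runs distributedly; replacing the plain ascent step with Nesterov acceleration is what yields the optimal rates discussed above, at a cost governed by the spectral gap of $A A^T$.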