The arithmetic crosscorrelation of binary $m$-sequences with coprime periods $2^{n_1}-1$ and $2^{n_2}-1$\ ($\gcd(n_1,n_2)=1$) is determined. The result shows that the absolute value of arithmetic crosscorrelation of such binary $m$-sequences is not greater than $2^{\min(n_1,n_2)}-1$.
Quantifying the difference between two probability density functions, $p$ and $q$, using available data, is a fundamental problem in Statistics and Machine Learning. A usual approach for addressing this problem is the likelihood-ratio estimation (LRE) between $p$ and $q$, which -- to our best knowledge -- has been investigated mainly for the offline case. This paper contributes by introducing a new framework for online non-parametric LRE (OLRE) for the setting where pairs of iid observations $(x_t \sim p, x'_t \sim q)$ are observed over time. The non-parametric nature of our approach has the advantage of being agnostic to the forms of $p$ and $q$. Moreover, we capitalize on the recent advances in Kernel Methods and functional minimization to develop an estimator that can be efficiently updated online. We provide theoretical guarantees for the performance of the OLRE method along with empirical validation in synthetic experiments.
When the unknown regression function of a single variable is known to have derivatives up to the $(\gamma+1)$th order bounded in absolute values by a common constant everywhere or a.e. (i.e., $(\gamma+1)$th degree of smoothness), the minimax optimal rate of the mean integrated squared error (MISE) is stated as $\left(\frac{1}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}}$ in the literature. This paper shows that: (i) if $n\leq\left(\gamma+1\right)^{2\gamma+3}$, the minimax optimal MISE rate is $\frac{\log n}{n\log(\log n)}$ and the optimal degree of smoothness to exploit is roughly $\max\left\{ \left\lfloor \frac{\log n}{2\log\left(\log n\right)}\right\rfloor ,\,1\right\} $; (ii) if $n>\left(\gamma+1\right)^{2\gamma+3}$, the minimax optimal MISE rate is $\left(\frac{1}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}}$ and the optimal degree of smoothness to exploit is $\gamma+1$. The fundamental contribution of this paper is a set of metric entropy bounds we develop for smooth function classes. Some of our bounds are original, and some of them improve and/or generalize the ones in the literature (e.g., Kolmogorov and Tikhomirov, 1959). Our metric entropy bounds allow us to show phase transitions in the minimax optimal MISE rates associated with some commonly seen smoothness classes as well as non-standard smoothness classes, and can also be of independent interest outside the nonparametric regression problems.
The hierarchical matrix ($\mathcal{H}^{2}$-matrix) formalism provides a way to reinterpret the Fast Multipole Method and related fast summation schemes in linear algebraic terms. The idea is to tessellate a matrix into blocks in such as way that each block is either small or of numerically low rank; this enables the storage of the matrix and the application of it to a vector in linear or close to linear complexity. A key motivation for the reformulation is to extend the range of dense matrices that can be represented. Additionally, $\mathcal{H}^{2}$-matrices in principle also extend the range of operations that can be executed to include matrix inversion and factorization. While such algorithms can be highly efficient for certain specialized formats (such as HBS/HSS matrices based on ``weak admissibility''), inversion algorithms for general $\mathcal{H}^{2}$-matrices tend to be based on nested recursions and recompressions, making them challenging to implement efficiently. An exception is the \textit{strong recursive skeletonization (SRS)} algorithm by Minden, Ho, Damle, and Ying, which involves a simpler algorithmic flow. However, SRS greatly increases the number of blocks of the matrix that need to be stored explicitly, leading to high memory requirements. This manuscript presents the \textit{randomized strong recursive skeletonization (RSRS)} algorithm, which is a reformulation of SRS that incorporates the randomized SVD (RSVD) to simultaneously compress and factorize an $\mathcal{H}^{2}$-matrix. RSRS is a ``black box'' algorithm that interacts with the matrix to be compressed only via its action on vectors; this extends the range of the SRS algorithm (which relied on the ``proxy source'' compression technique) to include dense matrices that arise in sparse direct solvers.
We propose a new method for the construction of layer-adapted meshes for singularly perturbed differential equations (SPDEs), based on mesh partial differential equations (MPDEs) that incorporate \emph{a posteriori} solution information. There are numerous studies on the development of parameter robust numerical methods for SPDEs that depend on the layer-adapted mesh of Bakhvalov. In~\citep{HiMa2021}, a novel MPDE-based approach for constructing a generalisation of these meshes was proposed. Like with most layer-adapted mesh methods, the algorithms in that article depended on detailed derivations of \emph{a priori} bounds on the SPDE's solution and its derivatives. In this work we extend that approach so that it instead uses \emph{a posteriori} computed estimates of the solution. We present detailed algorithms for the efficient implementation of the method, and numerical results for the robust solution of two-parameter reaction-convection-diffusion problems, in one and two dimensions. We also provide full FEniCS code for a one-dimensional example.
We propose to approximate a (possibly discontinuous) multivariate function f (x) on a compact set by the partial minimizer arg miny p(x, y) of an appropriate polynomial p whose construction can be cast in a univariate sum of squares (SOS) framework, resulting in a highly structured convex semidefinite program. In a number of non-trivial cases (e.g. when f is a piecewise polynomial) we prove that the approximation is exact with a low-degree polynomial p. Our approach has three distinguishing features: (i) It is mesh-free and does not require the knowledge of the discontinuity locations. (ii) It is model-free in the sense that we only assume that the function to be approximated is available through samples (point evaluations). (iii) The size of the semidefinite program is independent of the ambient dimension and depends linearly on the number of samples. We also analyze the sample complexity of the approach, proving a generalization error bound in a probabilistic setting. This allows for a comparison with machine learning approaches.
Given a vector dataset $\mathcal{X}$ and a query vector $\vec{x}_q$, graph-based Approximate Nearest Neighbor Search (ANNS) aims to build a graph index $G$ and approximately return vectors with minimum distances to $\vec{x}_q$ by searching over $G$. The main drawback of graph-based ANNS is that a graph index would be too large to fit into the memory especially for a large-scale $\mathcal{X}$. To solve this, a Product Quantization (PQ)-based hybrid method called DiskANN is proposed to store a low-dimensional PQ index in memory and retain a graph index in SSD, thus reducing memory overhead while ensuring a high search accuracy. However, it suffers from two I/O issues that significantly affect the overall efficiency: (1) long routing path from an entry vertex to the query's neighborhood that results in large number of I/O requests and (2) redundant I/O requests during the routing process. We propose an optimized DiskANN++ to overcome above issues. Specifically, for the first issue, we present a query-sensitive entry vertex selection strategy to replace DiskANN's static graph-central entry vertex by a dynamically determined entry vertex that is close to the query. For the second I/O issue, we present an isomorphic mapping on DiskANN's graph index to optimize the SSD layout and propose an asynchronously optimized Pagesearch based on the optimized SSD layout as an alternative to DiskANN's beamsearch. Comprehensive experimental studies on eight real-world datasets demonstrate our DiskANN++'s superiority on efficiency. We achieve a notable 1.5 X to 2.2 X improvement on QPS compared to DiskANN, given the same accuracy constraint.
Two Latin squares of order $n$ are $r$-orthogonal if, when superimposed, there are exactly $r$ distinct ordered pairs. The spectrum of all values of $r$ for Latin squares of order $n$ is known. A Latin square $A$ of order $n$ is $r$-self-orthogonal if $A$ and its transpose are $r$-orthogonal. The spectrum of all values of $r$ is known for all orders $n\ne 14$. We develop randomized algorithms for computing pairs of $r$-orthogonal Latin squares of order $n$ and algorithms for computing $r$-self-orthogonal Latin squares of order $n$.
Power functions with low $c$-differential uniformity have been widely studied not only because of their strong resistance to multiplicative differential attacks, but also low implementation cost in hardware. Furthermore, the $c$-differential spectrum of a function gives a more precise characterization of its $c$-differential properties. Let $f(x)=x^{\frac{p^n+3}{2}}$ be a power function over the finite field $\mathbb{F}_{p^{n}}$, where $p\neq3$ is an odd prime and $n$ is a positive integer. In this paper, for all primes $p\neq3$, by investigating certain character sums with regard to elliptic curves and computing the number of solutions of a system of equations over $\mathbb{F}_{p^{n}}$, we determine explicitly the $(-1)$-differential spectrum of $f$ with a unified approach. We show that if $p^n \equiv 3 \pmod 4$, then $f$ is a differentially $(-1,3)$-uniform function except for $p^n\in\{7,19,23\}$ where $f$ is an APcN function, and if $p^n \equiv 1 \pmod 4$, the $(-1)$-differential uniformity of $f$ is equal to $4$. In addition, an upper bound of the $c$-differential uniformity of $f$ is also given.
We consider the Low Rank Approximation problem, where the input consists of a matrix $A \in \mathbb{R}^{n_R \times n_C}$ and an integer $k$, and the goal is to find a matrix $B$ of rank at most $k$ that minimizes $\| A - B \|_0$, which is the number of entries where $A$ and $B$ differ. For any constant $k$ and $\varepsilon > 0$, we present a polynomial time $(1 + \varepsilon)$-approximation time for this problem, which significantly improves the previous best $poly(k)$-approximation. Our algorithm is obtained by viewing the problem as a Constraint Satisfaction Problem (CSP) where each row and column becomes a variable that can have a value from $\mathbb{R}^k$. In this view, we have a constraint between each row and column, which results in a {\em dense} CSP, a well-studied topic in approximation algorithms. While most of previous algorithms focus on finite-size (or constant-size) domains and involve an exhaustive enumeration over the entire domain, we present a new framework that bypasses such an enumeration in $\mathbb{R}^k$. We also use tools from the rich literature of Low Rank Approximation in different objectives (e.g., $\ell_p$ with $p \in (0, \infty)$) or domains (e.g., finite fields/generalized Boolean). We believe that our techniques might be useful to study other real-valued CSPs and matrix optimization problems. On the hardness side, when $k$ is part of the input, we prove that Low Rank Approximation is NP-hard to approximate within a factor of $\Omega(\log n)$. This is the first superconstant NP-hardness of approximation for any $p \in [0, \infty]$ that does not rely on stronger conjectures (e.g., the Small Set Expansion Hypothesis).
We propose a new variable selection procedure for a functional linear model with multiple scalar responses and multiple functional predictors. This method is based on basis expansions of the involved functional predictors and coefficients that lead to a multivariate linear regression model. Then a criterion by means of which the variable selection problem reduces to that of estimating a suitable set is introduced. Estimation of this set is achieved by using appropriate penalizations of estimates of this criterion, so leading to our proposal. A simulation study that permits to investigate the effectiveness of the proposed approach and to compare it with existing methods is given.