We study the problem of estimating the score function of an unknown probability distribution $\rho^*$ from $n$ independent and identically distributed observations in $d$ dimensions. Assuming that $\rho^*$ is subgaussian and has a Lipschitz-continuous score function $s^*$, we establish the optimal rate of $\tilde \Theta(n^{-\frac{2}{d+4}})$ for this estimation problem under the loss function $\|\hat s - s^*\|^2_{L^2(\rho^*)}$ that is commonly used in the score matching literature, highlighting the curse of dimensionality: the sample complexity for accurate score estimation grows exponentially with the dimension $d$. Leveraging key insights from empirical Bayes theory, as well as a new convergence rate for the smoothed empirical distribution in Hellinger distance, we show that a regularized score estimator based on a Gaussian kernel attains this rate; its optimality is certified by a matching minimax lower bound. We also discuss the implications of our theory for the sample complexity of score-based generative models.
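As a concrete illustration of the estimator's form (a minimal numerical sketch, not the paper's exact construction): the score of the Gaussian-smoothed empirical distribution $\hat\rho_n * \varphi_\sigma$ is a weighted average of $(x_i - x)/\sigma^2$, and the small density floor below is a hypothetical stand-in for the paper's regularization.

\begin{verbatim}
import numpy as np

def smoothed_score(x, samples, sigma, eps=1e-12):
    """Score (gradient of the log-density) of the Gaussian-smoothed
    empirical distribution at a query point x.

    x       : (d,) query point
    samples : (n, d) i.i.d. observations
    sigma   : bandwidth of the Gaussian kernel
    eps     : density floor; a hypothetical stand-in for the
              paper's regularization
    """
    diffs = samples - x                                  # x_i - x, shape (n, d)
    logw = -0.5 * np.sum(diffs ** 2, axis=1) / sigma ** 2
    w = np.exp(logw - logw.max())                        # stabilized kernel weights
    num = (w[:, None] * diffs).sum(axis=0) / sigma ** 2  # gradient of the density
    den = w.sum() + eps                                  # (rescaled) density value
    return num / den
\end{verbatim}

As a sanity check, for a standard Gaussian sample the output can be compared against the true score $s^*(x) = -x$ as $n$ grows and $\sigma \to 0$.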
Recently an algorithm was given in [Garde & Hyv\"onen, SIAM J. Math. Anal., 2024] for exact direct reconstruction of any $L^2$ perturbation from linearised data in the two-dimensional linearised Calder\'on problem. It was a simple forward substitution method based on a 2D Zernike basis. We now consider the three-dimensional linearised Calder\'on problem in a ball, and use a 3D Zernike basis to obtain a method for exact direct reconstruction of any $L^3$ perturbation from linearised data. The method is likewise a forward substitution, making it very efficient to implement numerically. Moreover, the 3D method uses only a relatively small subset of boundary measurements for exact reconstruction, compared to a full $L^2$ basis of current densities.
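Since the computational core claimed here is plain forward substitution on a (truncated) lower-triangular system, a generic sketch conveys why the method is cheap; the actual matrix entries and right-hand side, coming from the 3D Zernike basis and the linearised boundary data, are specific to the paper and are not reproduced.

\begin{verbatim}
import numpy as np

def forward_substitution(L, b):
    """Solve L y = b for a lower-triangular matrix L.

    Generic solver pattern only: in the paper the triangular
    structure arises from the Zernike basis, and b collects the
    linearised boundary measurements.
    """
    n = len(b)
    y = np.zeros(n)
    for i in range(n):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y
\end{verbatim}

Each coefficient costs one inner product, so a solve with $N$ retained basis functions is $O(N^2)$ with no iteration or regularization loop, which is what makes a direct reconstruction feasible.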
Neural integral equations are deep learning models based on the theory of integral equations, where the model consists of an integral operator and the corresponding equation (of the second kind), which is learned through an optimization procedure. This approach makes it possible to leverage the nonlocal properties of integral operators in machine learning, but it is computationally expensive. In this article, we introduce a framework for neural integral equations based on spectral methods that allows us to learn an operator in the spectral domain, resulting in lower computational cost as well as high interpolation accuracy. We study the properties of our methods and establish various theoretical guarantees regarding the approximation capabilities of the model and the convergence of the associated numerical methods to solutions. We provide numerical experiments to demonstrate the practical effectiveness of the resulting model.
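To make the spectral idea concrete, here is a toy example (illustrative only, not the paper's learned architecture): for a linear Fredholm equation of the second kind with a convolution kernel on a periodic grid, the integral operator is diagonal in the Fourier basis, so the equation decouples into one scalar equation per frequency.

\begin{verbatim}
import numpy as np

def solve_convolution_fredholm(f_vals, kernel_vals, lam):
    """Solve y(x) = f(x) + lam * int k(x - t) y(t) dt on [0, 2*pi).

    On a uniform periodic grid the integral is a circular convolution
    (with quadrature weight 2*pi/n), which the FFT diagonalizes:
    Y_hat = F_hat + lam * K_hat * Y_hat, solved frequency by frequency.
    """
    n = len(f_vals)
    F = np.fft.fft(f_vals)
    K = (2 * np.pi / n) * np.fft.fft(kernel_vals)  # operator symbol per frequency
    Y = F / (1.0 - lam * K)                        # coefficient-wise solve
    return np.fft.ifft(Y).real
\end{verbatim}

Roughly, a learned spectral operator replaces the fixed symbol K with trainable coefficients, so applying it costs one FFT, one elementwise product, and one inverse FFT.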
We study differentially private (DP) estimation of a rank-$r$ matrix $M \in \mathbb{R}^{d_1\times d_2}$ under the trace regression model with Gaussian measurement matrices. Theoretically, the sensitivity of non-private spectral initialization is precisely characterized, and the differential-privacy-constrained minimax lower bound for estimating $M$ under the Schatten-$q$ norm is established. Methodologically, the paper introduces a computationally efficient algorithm for DP-initialization with a sample size of $n \geq \tilde{O}(r^2 (d_1\vee d_2))$. Under certain regularity conditions, the DP-initialization falls within a local ball surrounding $M$. We also propose a differentially private algorithm for estimating $M$ based on Riemannian optimization (DP-RGrad), which achieves a near-optimal convergence rate with the DP-initialization and a sample size of $n \geq \tilde{O}(r (d_1 + d_2))$. Finally, the paper discusses the non-trivial gap between the minimax lower bound and the upper bound for low-rank matrix estimation under the trace regression model. It is shown that the estimator given by DP-RGrad attains the optimal convergence rate under a weaker notion of differential privacy. Our technique for analyzing the sensitivity of the initialization requires no eigengap condition between the $r$ non-zero singular values.
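A schematic of a DP initialization in this spirit (a sketch under assumptions, not the paper's exact algorithm): form the non-private spectral estimate $\frac{1}{n}\sum_i y_i X_i$, privatize it with the Gaussian mechanism, and project to rank $r$. The argument sens below is a placeholder for the paper's precise sensitivity characterization.

\begin{verbatim}
import numpy as np

def dp_spectral_init(X, y, r, sens, eps, delta, rng):
    """Schematic DP initialization for trace regression.

    X    : (n, d1, d2) measurement matrices
    y    : (n,) responses, y_i = <X_i, M> + noise
    sens : assumed l2-sensitivity of the non-private estimate
           (placeholder for the paper's characterization)
    """
    n = X.shape[0]
    M_hat = np.einsum('i,ijk->jk', y, X) / n                 # spectral estimate
    sigma = sens * np.sqrt(2 * np.log(1.25 / delta)) / eps   # Gaussian mechanism
    M_priv = M_hat + rng.normal(scale=sigma, size=M_hat.shape)
    U, s, Vt = np.linalg.svd(M_priv, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]                       # best rank-r approx.
\end{verbatim}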
The question of characterizing the (finite) representable relation algebras in a ``nice'' way is open. The class $\mathbf{RRA}$ is known to be not finitely axiomatizable in first-order logic. Nevertheless, it is conjectured that ``almost all'' finite relation algebras are representable. All finite relation algebras with three or fewer atoms are representable, so one may ask: over what cardinalities of sets are they representable? This question was answered completely by Andr\'eka and Maddux (``Representations for small relation algebras,'' \emph{Notre Dame J. Form. Log.}, \textbf{35} (1994)); they determine the spectrum of every finite relation algebra with three or fewer atoms. In the present paper, we restrict attention to cyclic group representations, and completely determine the cyclic group spectrum for all seven symmetric integral relation algebras on three atoms. We find that in some instances the spectrum and cyclic spectrum agree; in others they disagree for finitely many $n$; and in still others they disagree for infinitely many $n$. The proofs employ constructions, SAT solvers, and the probabilistic method.
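To illustrate the kind of search involved (a toy sketch; the actual forbidden and mandatory triangle conditions come from each algebra's composition table, and the witnessing requirements are omitted here): a cyclic group representation over $\mathbb{Z}_n$ assigns each nonzero element one of the two diversity atoms, symmetrically ($c(x) = c(-x)$), so that no triple $x + y = z$ carries a forbidden color pattern.

\begin{verbatim}
from itertools import product

def cyclic_rep_candidate_exists(n, forbidden):
    """Brute-force search for a symmetric 2-coloring of the nonzero
    elements of Z_n avoiding the forbidden triangle patterns
    (necessary conditions for a cyclic group representation;
    mandatory-triangle witnessing is omitted in this toy version).
    """
    orbits, seen = [], set()
    for x in range(1, n):
        if x not in seen:
            orbits.append((x, (-x) % n))
            seen.update({x, (-x) % n})
    for colors in product('ab', repeat=len(orbits)):
        c = {}
        for (x, mx), col in zip(orbits, colors):
            c[x] = c[mx] = col
        if all((c[x], c[y], c[(x + y) % n]) not in forbidden
               for x in range(1, n) for y in range(1, n)
               if (x + y) % n != 0):
            return True
    return False

# Forbidding monochromatic triangles in both atoms succeeds for n = 5
# (the pentagon coloring) and fails for all n >= 6, since R(3,3) = 6:
# cyclic_rep_candidate_exists(5, {('a','a','a'), ('b','b','b')})  # True
\end{verbatim}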
We study the complexity of constructing an optimal parsing $\varphi$ of a string ${\bf s} = s_1 \dots s_n$ under the constraint that, given a position $p$ in the original text and the LZ76-like (Lempel-Ziv 76) encoding of ${\bf s}$ based on $\varphi$, it is possible to identify/decompress the character $s_p$ by performing at most $c$ accesses to the LZ encoding, for a given integer $c$. We refer to such a parsing $\varphi$ as a $c$-bounded access LZ parsing, or $c$-BLZ parsing, of ${\bf s}$. We show that for any constant $c$ the problem of computing the optimal $c$-BLZ parsing of a string, i.e., the one with the minimum number of phrases, is NP-hard and also APX-hard, i.e., no PTAS can exist under the standard complexity assumption $P \neq NP$. We also study the ratio between the sizes of an optimal $c$-BLZ parsing of a string ${\bf s}$ and an optimal LZ76 parsing of ${\bf s}$ (which can be greedily computed in polynomial time).
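For reference, the baseline in the last sentence can indeed be computed greedily in polynomial time; a simple quadratic-time sketch of an LZ76-style parser follows (variants differ on details such as whether a phrase may self-overlap; this version matches only against the already-parsed prefix).

\begin{verbatim}
def lz76_greedy_parse(s):
    """Greedy LZ76-style parsing: each phrase is the longest substring
    starting at the current position that occurs in the prefix parsed
    so far, extended by one fresh character.  Quadratic time for
    clarity; suffix structures give faster implementations.
    """
    phrases, i = [], 0
    while i < len(s):
        l = 0
        while i + l < len(s) and s[i:i + l + 1] in s[:i]:
            l += 1
        j = min(i + l, len(s) - 1)       # extend match by one new character
        phrases.append(s[i:j + 1])
        i = j + 1
    return phrases

# lz76_greedy_parse("aacaacabcabaaac")
#   -> ['a', 'ac', 'aaca', 'b', 'caba', 'aac']
\end{verbatim}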
In 2017, Aharoni proposed the following generalization of the Caccetta-H\"{a}ggkvist conjecture: if $G$ is a simple $n$-vertex edge-colored graph with $n$ color classes of size at least $r$, then $G$ contains a rainbow cycle of length at most $\lceil n/r \rceil$. In this paper, we prove that, for fixed $r$, Aharoni's conjecture holds up to an additive constant. Specifically, we show that for each fixed $r \geq 1$, there exists a constant $c_r$ such that if $G$ is a simple $n$-vertex edge-colored graph with $n$ color classes of size at least $r$, then $G$ contains a rainbow cycle of length at most $n/r + c_r$.
Given a simple $n$-vertex, $m$-edge graph $G$ undergoing edge insertions and deletions, we give two new fully dynamic algorithms for exactly maintaining the edge connectivity of $G$ in $\tilde{O}(n)$ worst-case update time and $\tilde{O}(m^{1-1/31})$ amortized update time, respectively. Prior to our work, all dynamic edge connectivity algorithms either assumed bounded edge connectivity, guaranteed approximate solutions, or were restricted to edge insertions only. Our results provide an affirmative answer to an open question posed by Thorup [Combinatorica'07].
We study a general factor analysis framework where the $n$-by-$p$ data matrix is assumed to follow a general exponential family distribution entry-wise. While this model framework has been proposed before, we further relax its distributional assumption by using a quasi-likelihood setup. By parameterizing the mean-variance relationship on data entries, we additionally introduce a dispersion parameter and entry-wise weights to model large variations and missing values. The resulting model is thus not only robust to distribution misspecification but also more flexible, and able to capture non-Gaussian covariance structures of the data matrix. Our main focus is on efficient computational approaches for performing the factor analysis. Previous modeling frameworks rely on simulated maximum likelihood (SML) to find the factorization solution, but this method was shown to lead to asymptotic bias when the simulated sample size grows slower than the square root of the sample size $n$, precluding its practical application to data matrices with large $n$. Borrowing from expectation-maximization (EM) and stochastic gradient descent (SGD), we investigate three estimation procedures based on iterative factorization updates. Our proposed solutions exhibit no asymptotic bias and scale well to large matrix factorizations, with error $O(1/p)$. To support our findings, we conduct simulation experiments and discuss the method's application in three case studies.
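As a minimal sketch of this style of iterative factorization update (illustrative assumptions: a Poisson-type mean-variance relation with log link, and full-gradient rather than stochastic steps; not the paper's exact procedures):

\begin{verbatim}
import numpy as np

def ql_factorize(Y, W, r, phi=1.0, lr=1e-3, iters=2000, seed=0):
    """Gradient-based factorization under a Poisson-type quasi-likelihood.

    Mean model mu = exp(U V^T); variance taken proportional to phi * mu
    (one possible mean-variance parameterization).  W holds entry-wise
    weights, with W[i, j] = 0 marking a missing entry.
    """
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    U = 0.01 * rng.standard_normal((n, r))
    V = 0.01 * rng.standard_normal((p, r))
    for _ in range(iters):
        mu = np.exp(U @ V.T)
        G = W * (mu - Y) / phi   # gradient of the quasi-deviance w.r.t. U V^T
        U -= lr * (G @ V)
        V -= lr * (G.T @ U)      # true SGD would subsample entries of G
    return U, V
\end{verbatim}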
We study finding and listing $k$-cliques in a graph, for constant $k\geq 3$, a fundamental problem of both theoretical and practical importance. Our main contribution is a new output-sensitive algorithm for listing $k$-cliques in graphs, for arbitrary $k\geq 3$, coupled with lower bounds based on standard fine-grained assumptions showing that our algorithm's running time is tight. Previously, the only known conditionally optimal output-sensitive algorithms were for the case of $3$-cliques, by Bj\"{o}rklund, Pagh, Vassilevska W. and Zwick [ICALP'14]. Typical inputs to subgraph isomorphism or listing problems are measured by the number of nodes $n$ or the number of edges $m$. Our framework is very general in that it gives $k$-clique listing algorithms whose running times are measured in terms of the number of $\ell$-cliques $\Delta_\ell$ in the graph for any $1\leq \ell<k$. This generalizes the typical parameterization in terms of $n$ (the number of $1$-cliques) and $m$ (the number of $2$-cliques). If the matrix multiplication exponent $\omega$ is $2$, and if the size of the output, $\Delta_k$, is sufficiently large, then for every $\ell<k$, the running time of our algorithm for listing $k$-cliques is $$\tilde{O}\left(\Delta_\ell^{\frac{2}{\ell (k - \ell)}}\Delta_k^{1-\frac{2}{k(k-\ell)}}\right).$$ For sufficiently large $\Delta_k$, we prove that this runtime is in fact {\em optimal} for all $1 \leq \ell < k$ under the Exact $k$-Clique hypothesis. In the special cases of $k = 4$ and $5$, our algorithm in terms of $n$ is conditionally optimal for all values of $\Delta_k$ if $\omega = 2$. Moreover, our framework is powerful enough to provide an improvement upon the 19-year-old runtimes for $4$- and $5$-clique detection in $m$-edge graphs, as a function of $m$ [Eisenbrand and Grandoni, TCS'04].
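To make the parameterization concrete, instantiating the displayed bound (with $\omega = 2$ and $\Delta_k$ sufficiently large) at the two most familiar cases, using $\Delta_1 = n$ and $\Delta_2 = m$:
\[
k=3,\ \ell=1:\quad \tilde{O}\bigl(n\,\Delta_3^{2/3}\bigr), \qquad\qquad k=4,\ \ell=2:\quad \tilde{O}\bigl(m^{1/2}\,\Delta_4^{3/4}\bigr).
\]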
For any positive integer $q\geq 2$ and any real number $\delta\in(0,1)$, let $\alpha_q(n,\delta n)$ denote the maximum size of a subset of $\mathbb{Z}_q^n$ with minimum Hamming distance at least $\delta n$, where $\mathbb{Z}_q=\{0,1,\dotsc,q-1\}$ and $n\in\mathbb{N}$. The asymptotic rate function is defined by $ R_q(\delta) = \limsup_{n\rightarrow\infty}\frac{1}{n}\log_q\alpha_q(n,\delta n).$ The famous $q$-ary asymptotic Gilbert-Varshamov bound, obtained in the 1950s, states that \[ R_q(\delta) \geq 1 - \delta\log_q(q-1)-\delta\log_q\frac{1}{\delta}-(1-\delta)\log_q\frac{1}{1-\delta} \stackrel{\mathrm{def}}{=}R_\mathrm{GV}(\delta,q) \] for all positive integers $q\geq 2$ and $0<\delta<1-q^{-1}$. In the case that $q$ is an even power of a prime with $q\geq 49$, the $q$-ary Gilbert-Varshamov bound was first improved by using algebraic geometry codes in the works of Tsfasman, Vladut, and Zink and of Ihara in the 1980s. These algebraic geometry codes have been modified to improve the $q$-ary Gilbert-Varshamov bound $R_\mathrm{GV}(\delta,q)$ at a specific tangent point $\delta=\delta_0\in (0,1)$ of the curve $R_\mathrm{GV}(\delta,q)$ for each given integer $q\geq 46$. However, the $q$-ary Gilbert-Varshamov bound $R_\mathrm{GV}(\delta,q)$ at $\delta=1/2$, i.e., $R_\mathrm{GV}(1/2,q)$, remains the largest known lower bound on $R_q(1/2)$ for infinitely many positive integers $q$, including generic primes and generic non-prime-power integers. In this paper, by using codes from the geometry of numbers introduced by Lenstra in the 1980s, we prove that the $q$-ary Gilbert-Varshamov bound $R_\mathrm{GV}(\delta,q)$ with $\delta\in(0,1)$ can be improved for all but finitely many positive integers $q$. It is shown that the growth rate defined by $\eta(\delta)= \liminf_{q\rightarrow\infty}\frac{1}{\log q}\log[1-\delta-R_q(\delta)]^{-1}$ actually has a nontrivial lower bound for every $\delta\in(0,1)$.
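For quick numerical reference, the bound $R_\mathrm{GV}(\delta,q)$ transcribes directly into a few lines (using $\log_q x = \ln x / \ln q$):

\begin{verbatim}
import math

def R_GV(delta, q):
    """q-ary asymptotic Gilbert-Varshamov bound, 0 < delta < 1 - 1/q."""
    lq = lambda x: math.log(x) / math.log(q)
    return (1 - delta * lq(q - 1)
              + delta * lq(delta)
              + (1 - delta) * lq(1 - delta))

# R_GV(0.5, 4) = 0.1037...: over a 4-ary alphabet there are codes of
# relative distance 1/2 with rate about 0.104.
\end{verbatim}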