After nearly two decades of research, the question of a quantum PCP theorem for quantum Constraint Satisfaction Problems (CSPs) remains wide open. As a result, proving QMA-hardness of approximation for ground state energy estimation has remained elusive. Recently, it was shown [Bittel, Gharibian, Kliesch, CCC 2023] that a natural problem involving variational quantum circuits is QCMA-hard to approximate within ratio N^(1-eps) for any eps > 0 and N the input size. Unfortunately, this problem was not related to quantum CSPs, leaving the question of hardness of approximation for quantum CSPs open. In this work, we show that if instead of focusing on ground state energies, one considers computing properties of the ground space, QCMA-hardness of computing ground space properties can be shown. In particular, we show that it is (1) QCMA-complete within ratio N^(1-eps) to approximate the Ground State Connectivity problem (GSCON), and (2) QCMA-hard within the same ratio to estimate the amount of entanglement of a local Hamiltonian's ground state, denoted Ground State Entanglement (GSE). As a bonus, a simplification of our construction yields NP-completeness of approximation for a natural k-SAT reconfiguration problem, to be contrasted with the recent PCP-based PSPACE hardness of approximation results for a different definition of k-SAT reconfiguration [Karthik C.S. and Manurangsi, 2023, and Hirahara, Ohsaka, STOC 2024].
Measurement-based quantum computation (MBQC) offers a fundamentally unique paradigm to design quantum algorithms. Indeed, due to the inherent randomness of quantum measurements, the natural operations in MBQC are not deterministic and unitary, but are rather augmented with probabilistic byproducts. Yet, the main algorithmic use of MBQC so far has been to completely counteract this probabilistic nature in order to simulate unitary computations expressed in the circuit model. In this work, we propose designing MBQC algorithms that embrace this inherent randomness and treat the random byproducts in MBQC as a resource for computation. As a natural application where randomness can be beneficial, we consider generative modeling, a task in machine learning centered around generating complex probability distributions. To address this task, we propose a variational MBQC algorithm equipped with control parameters that allow one to directly adjust the degree of randomness to be admitted in the computation. Our algebraic and numerical findings indicate that this additional randomness can lead to significant gains in expressivity and learning performance for certain generative modeling tasks, respectively. These results highlight the potential advantages in exploiting the inherent randomness of MBQC and motivate further research into MBQC-based algorithms.
Despite outstanding progress in many tasks, Large Language Models (LLMs) still lack accuracy when dealing with highly technical domains. Telecommunications (telco) is a particularly challenging domain due to its large number of lexical, semantic and conceptual peculiarities. Yet, this domain holds many valuable use cases, directly linked to industrial needs. Hence, this paper studies how LLMs can be adapted to the telco domain. It reports our effort to (i) collect a massive corpus of domain-specific data (800M tokens, 80K instructions), (ii) perform adaptation using various methodologies, and (iii) benchmark the adapted models against larger generalist models in downstream tasks that require extensive knowledge of telecommunications. Our experiments on Llama-2-7b show that domain-adapted models can challenge the large generalist models. They also suggest that adaptation can be restricted to a single instruction-tuning step, discarding the need for any fine-tuning on raw texts beforehand.
We consider estimators obtained by iterates of the conjugate gradient (CG) algorithm applied to the normal equation of prototypical statistical inverse problems. Stopping the CG algorithm early induces regularisation, and optimal convergence rates of prediction and reconstruction error are established in wide generality for an ideal oracle stopping time. Based on this insight, a fully data-driven early stopping rule $\tau$ is constructed, which also attains optimal rates, provided the error in estimating the noise level is not dominant. The error analysis of CG under statistical noise is subtle due to its nonlinear dependence on the observations. We provide an explicit error decomposition and identify two terms in the prediction error, which share important properties of classical bias and variance terms. Together with a continuous interpolation between CG iterates, this paves the way for a comprehensive error analysis of early stopping. In particular, a general oracle-type inequality is proved for the prediction error at $\tau$. For bounding the reconstruction error, a more refined probabilistic analysis, based on concentration of self-normalised Gaussian processes, is developed. The methodology also provides some new insights into early stopping for CG in deterministic inverse problems. A numerical study for standard examples shows good results in practice for early stopping at $\tau$.
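As a rough illustration of early stopping for CG on the normal equation $A^\top A x = A^\top y$, the following minimal sketch stops at the first iterate whose squared residual norm drops below a threshold. The threshold name `kappa`, the helper names and the residual-based (discrepancy-type) rule are assumptions made for illustration; the data-driven rule $\tau$ of the abstract may be defined differently.

```python
def matvec(A, x):
    """Return A x for a matrix given as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A, y):
    """Return A^T y."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cg_early_stopping(A, y, kappa, max_iter=50):
    """CG on the normal equation A^T A x = A^T y, started at x = 0.

    Stops at the first iteration k with ||y - A x_k||^2 <= kappa
    (hypothetical discrepancy-type rule) and returns (x_k, k).
    """
    x = [0.0] * len(A[0])
    r = list(y)            # data-space residual y - A x
    s = rmatvec(A, r)      # normal-equation residual A^T r
    p = list(s)            # search direction
    gamma = dot(s, s)
    for k in range(max_iter + 1):
        if dot(r, r) <= kappa or gamma == 0.0 or k == max_iter:
            return x, k
        q = matvec(A, p)
        alpha = gamma / dot(q, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = rmatvec(A, r)
        gamma_new = dot(s, s)
        p = [si + (gamma_new / gamma) * pi for si, pi in zip(s, p)]
        gamma = gamma_new
    return x, max_iter
```

A larger `kappa` stops earlier and regularises more strongly; in a noisy setting one would typically tie `kappa` to an estimate of the noise level, which is where the estimation error mentioned in the abstract enters.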
We investigate the proof complexity of systems based on positive branching programs, i.e. non-deterministic branching programs (NBPs) where, for any 0-transition between two nodes, there is also a 1-transition. Positive NBPs compute monotone Boolean functions, just like negation-free circuits or formulas, but constitute a positive version of (non-uniform) NL, rather than P or NC1, respectively. The proof complexity of NBPs was investigated in previous work by Buss, Das and Knop, using extension variables to represent the dag-structure, over a language of (non-deterministic) decision trees, yielding the system eLNDT. Our system eLNDT+ is obtained by restricting their system to a positive syntax, similarly to how the 'monotone sequent calculus' MLK is obtained from the usual sequent calculus LK by restricting to negation-free formulas. Our main result is that eLNDT+ polynomially simulates eLNDT over positive sequents. Our proof method is inspired by a similar result for MLK by Atserias, Galesi and Pudl\'ak, that was recently improved to a bona fide polynomial simulation via works of Je\v{r}\'abek and Buss, Kabanets, Kolokolova and Kouck\'y. Along the way we formalise several properties of counting functions within eLNDT+ by polynomial-size proofs and, as a case study, give explicit polynomial-size proofs of the propositional pigeonhole principle.
We establish a general convergence theory of the Rayleigh--Ritz method and the refined Rayleigh--Ritz method for computing some simple eigenpair $(\lambda_{*},x_{*})$ of a given analytic regular nonlinear eigenvalue problem (NEP). In terms of the deviation $\varepsilon$ of $x_{*}$ from a given subspace $\mathcal{W}$, we establish a priori convergence results on the Ritz value, the Ritz vector and the refined Ritz vector. The results show that, as $\varepsilon\rightarrow 0$, there exists a Ritz value that converges unconditionally to $\lambda_*$, and the corresponding refined Ritz vector converges unconditionally as well, whereas the Ritz vector converges only conditionally: it may fail to converge and may not even be unique. We also present an error bound for the approximate eigenvector in terms of the computable residual norm of a given approximate eigenpair, and give lower and upper bounds for the error of the refined Ritz vector and the Ritz vector as well as for that of the corresponding residual norms. These results nontrivially extend some convergence results on these two methods for the linear eigenvalue problem to the NEP. Examples are constructed to illustrate the main results.
We study the behavior of a label propagation algorithm (LPA) on the Erd\H{o}s-R\'enyi random graph $\mathcal{G}(n,p)$. Initially, given a network, each vertex starts with a random label in the interval $[0,1]$. Then, in each round of LPA, every vertex switches its label to the majority label in its neighborhood (including its own label). At the first round, ties are broken towards smaller labels, while at each of the next rounds, ties are broken uniformly at random. The algorithm terminates once all labels stay the same in two consecutive iterations. LPA is successfully used in practice for detecting communities in networks (corresponding to vertex sets with the same label after termination of the algorithm). Perhaps surprisingly, LPA's performance on dense random graphs is hard to analyze, and so far convergence to consensus was known only when $np\ge n^{3/4+\varepsilon}$, where LPA converges in three rounds. By defining an alternative label attribution procedure which converges to the label propagation algorithm after three rounds, a careful multi-stage exposure of the edges allows us to break the $n^{3/4+\varepsilon}$ barrier and show that, when $np \ge n^{5/8+\varepsilon}$, a.a.s.\ the algorithm terminates with a single label. Moreover, we show that, if $np\gg n^{2/3}$, a.a.s.\ this label is the smallest one, whereas if $n^{5/8+\varepsilon}\le np\ll n^{2/3}$, the surviving label is a.a.s.\ not the smallest one. En passant, we show a presumably new monotonicity lemma for Binomial random variables that might be of independent interest.
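The update rule described above can be sketched in a few lines. This is a minimal illustration only: the function names, the synchronous update of all vertices, and the `max_rounds` safeguard are assumptions, and the sketch says nothing about the density regimes analysed in the abstract.

```python
import random
from collections import Counter


def erdos_renyi(n, p, rng):
    """Adjacency lists of a G(n, p) sample."""
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj


def lpa_round(adj, labels, rng, first_round):
    """One synchronous round: each vertex adopts the majority label in its
    closed neighbourhood. Ties go to the smallest label in the first round
    and are broken uniformly at random afterwards."""
    new_labels = []
    for v, neighbours in enumerate(adj):
        counts = Counter(labels[u] for u in neighbours)
        counts[labels[v]] += 1  # the vertex's own label counts too
        best = max(counts.values())
        tied = [lab for lab, c in counts.items() if c == best]
        new_labels.append(min(tied) if first_round else rng.choice(tied))
    return new_labels


def run_lpa(adj, rng, labels=None, max_rounds=100):
    """Run LPA until the labels are unchanged by a round (or max_rounds is
    reached). Returns the final labels and the number of rounds in which
    some label changed."""
    if labels is None:
        labels = [rng.random() for _ in adj]  # uniform labels in [0, 1]
    for t in range(max_rounds):
        new_labels = lpa_round(adj, labels, rng, first_round=(t == 0))
        if new_labels == labels:
            return labels, t
        labels = new_labels
    return labels, max_rounds


rng = random.Random(0)
final_labels, rounds = run_lpa(erdos_renyi(200, 0.5, rng), rng)
print(len(set(final_labels)), "distinct label(s) after", rounds, "round(s)")
```

On a complete graph every vertex sees each label exactly once in the first round, so the tie-breaking rule sends all vertices to the smallest label and the process stops after one round.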
We propose a novel methodology to solve a key eigenvalue optimization problem which arises in the contractivity analysis of neural ODEs. When looking at contractivity properties of a one-layer weight-tied neural ODE $\dot{u}(t)=\sigma(Au(t)+b)$ (where $u,b \in {\mathbb R}^n$, $A$ is a given $n \times n$ matrix, $\sigma : {\mathbb R} \to {\mathbb R}$ denotes an activation function and, for a vector $z \in {\mathbb R}^n$, $\sigma(z) \in {\mathbb R}^n$ is to be interpreted entry-wise), we are led to study the logarithmic norm of a set of products of type $D A$, where $D$ is a diagonal matrix such that ${\mathrm{diag}}(D) \in \sigma'({\mathbb R}^n)$. Specifically, given a real number $c$ (usually $c=0$), the problem consists in finding the largest positive interval $\text{I}\subseteq [0,\infty)$ such that the logarithmic norm $\mu(DA) \le c$ for all diagonal matrices $D$ with $D_{ii}\in \text{I}$. We propose a two-level nested methodology: an inner level where, for a given $\text{I}$, we compute an optimizer $D^\star(\text{I})$ by a gradient system approach, and an outer level where we tune $\text{I}$ so that the value $c$ is reached by $\mu(D^\star(\text{I})A)$. We extend the proposed two-level approach to the general multilayer, and possibly time-dependent, case $\dot{u}(t) = \sigma( A_k(t) \ldots \sigma ( A_{1}(t) u(t) + b_{1}(t) ) \ldots + b_{k}(t) )$ and we present several numerical examples to illustrate its behaviour, including its stabilizing performance on a one-layer neural ODE applied to the classification of the MNIST handwritten digits dataset.
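For very small $n$ the two-level structure can be mimicked by brute force, which may help fix ideas. The sketch below assumes the 2-norm logarithmic norm $\mu_2(M)=\lambda_{\max}((M+M^\top)/2)$ and an interval of the form $\text{I}=[m,1]$; both are assumptions, as are all function names. The inner level enumerates the vertices of the box of diagonal matrices (the maximum of a convex function over a box is attained at a vertex) instead of using the gradient system approach of the abstract, and the outer level is a plain bisection on $m$.

```python
import itertools
import math


def lambda_max_sym(S, iters=500):
    """Largest eigenvalue of a symmetric matrix via shifted power iteration.
    Assumes the start vector is not orthogonal to the top eigenspace."""
    n = len(S)
    # Gershgorin: all eigenvalues lie in [-shift, shift], so S + shift*I is PSD.
    shift = max(sum(abs(x) for x in row) for row in S)
    v = [1.0 / (i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(S[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    num = sum(v[i] * S[i][j] * v[j] for i in range(n) for j in range(n))
    return num / sum(x * x for x in v)


def log_norm_2(M):
    """Logarithmic 2-norm: largest eigenvalue of the symmetric part of M."""
    n = len(M)
    S = [[0.5 * (M[i][j] + M[j][i]) for j in range(n)] for i in range(n)]
    return lambda_max_sym(S)


def worst_case_log_norm(A, lo, hi):
    """Inner level (brute force): max of mu_2(D A) over D_ii in [lo, hi],
    evaluated at the 2^n vertices of the box."""
    n = len(A)
    return max(
        log_norm_2([[d[i] * A[i][j] for j in range(n)] for i in range(n)])
        for d in itertools.product((lo, hi), repeat=n)
    )


def smallest_lower_endpoint(A, c=0.0, upper=1.0, steps=60):
    """Outer level: bisection for the smallest m with mu_2(D A) <= c on [m, upper]."""
    if worst_case_log_norm(A, upper, upper) > c:
        raise ValueError("bound c is violated even for D = upper * I")
    if worst_case_log_norm(A, 0.0, upper) <= c:
        return 0.0
    lo, hi = 0.0, upper
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if worst_case_log_norm(A, mid, upper) <= c:
            hi = mid
        else:
            lo = mid
    return hi


A = [[-2.0, 1.0], [1.0, -2.0]]
print(smallest_lower_endpoint(A))
```

For this $2\times 2$ example the critical vertex is $D=\mathrm{diag}(m,1)$, and the condition $\mu_2(DA)\le 0$ reduces to $m^2-14m+1\le 0$, so the bisection should approach $m = 7-4\sqrt{3}$. The vertex enumeration costs $2^n$ eigenvalue computations and is therefore only viable for tiny matrices.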
We derive a new adaptive leverage score sampling strategy for solving the Column Subset Selection Problem (CSSP). The resulting algorithm, called Adaptive Randomized Pivoting, can be viewed as a randomization of Osinsky's recently proposed deterministic algorithm for CSSP. It guarantees, in expectation, an approximation error that matches the optimal existence result in the Frobenius norm. Although the same guarantee can be achieved with volume sampling, our sampling strategy is much simpler and less expensive. To show the versatility of Adaptive Randomized Pivoting, we apply it to select indices in the Discrete Empirical Interpolation Method, in cross/skeleton approximation of general matrices, and in the Nystroem approximation of symmetric positive semi-definite matrices. In all these cases, the resulting randomized algorithms are new and they enjoy bounds on the expected error that match -- or improve -- the best known deterministic results. A derandomization of the algorithm for the Nystroem approximation results in a new deterministic algorithm with a rather favorable error bound.
The problem of estimating, from a random sample of points, the dimension of a compact subset S of the Euclidean space is considered. The emphasis is put on consistency results in the statistical sense, that is, statements of convergence to the true dimension value as the sample size grows to infinity. Among the many available definitions of dimension, we have focused (on the grounds of their statistical tractability) on three notions: the Minkowski dimension, the correlation dimension and the, perhaps less popular, concept of pointwise dimension. We prove the statistical consistency of some natural estimators of these quantities. Our proofs partially rely on the use of an instrumental estimator formulated in terms of the empirical volume function $V_n(r)$, defined as the Lebesgue measure of the set of points whose distance to the sample is at most r. In particular, we explore the case in which the true volume function $V(r)$ of the target set S is a polynomial on some interval starting at zero. An empirical study is also included. Our study aims to provide some theoretical support, and some practical insights, for the problem of deciding whether or not the set S has a dimension smaller than that of the ambient space. This is a major statistical motivation of the dimension studies, in connection with the so-called Manifold Hypothesis.
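To make the notion of correlation dimension concrete, here is a minimal sketch of a classical two-radius slope estimate: the fraction of sample pairs within distance $r$ is computed for two radii, and the slope on a log-log scale is returned. The function names and the two-radius form are assumptions for illustration; the estimators whose consistency is proved in the paper may be constructed differently.

```python
import math


def correlation_integral(points, r):
    """Fraction of unordered pairs of sample points at distance at most r."""
    n = len(points)
    close = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if math.dist(points[i], points[j]) <= r
    )
    return close / (n * (n - 1) / 2)


def correlation_dimension(points, r1, r2):
    """Slope of log C(r) against log r between the radii r1 < r2."""
    c1 = correlation_integral(points, r1)
    c2 = correlation_integral(points, r2)
    return (math.log(c2) - math.log(c1)) / (math.log(r2) - math.log(r1))


# Points on a line segment embedded in the plane: the estimate should be
# close to 1 even though the ambient dimension is 2.
segment = [(i / 200, 0.0) for i in range(200)]
print(correlation_dimension(segment, 0.0525, 0.1025))
```

The choice of radii matters in practice: they must be small relative to the size of the set but large enough that both counts are well populated, otherwise the logarithms are dominated by sampling noise or are undefined.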
Gradient Descent (GD) and Conjugate Gradient (CG) methods are among the most effective iterative algorithms for solving unconstrained optimization problems, particularly in machine learning and statistical modeling, where they are employed to minimize cost functions. In these algorithms, tunable parameters, such as step sizes or conjugate parameters, play a crucial role in determining key performance metrics, like runtime and solution quality. In this work, we introduce a framework that models algorithm selection as a statistical learning problem, so that the learning complexity can be estimated by the pseudo-dimension of the algorithm group. We first propose a new cost measure for unconstrained optimization algorithms, inspired by the concept of primal-dual integral in mixed-integer linear programming. Based on the new cost measure, we derive an improved upper bound for the pseudo-dimension of the gradient descent algorithm group by discretizing the set of step size configurations. Moreover, we generalize our findings from the gradient descent algorithm group to the conjugate gradient algorithm group for the first time, and prove the existence of a learning algorithm capable of probabilistically identifying the optimal algorithm with a sufficiently large sample size.