Since the seminal works of Strassen and Valiant it has been a central theme in algebraic complexity theory to understand the relative complexity of algebraic problems, that is, to understand which algebraic problems (be it bilinear maps like matrix multiplication in Strassen's work, or the determinant and permanent polynomials in Valiant's) can be reduced to each other (under the appropriate notion of reduction). In this paper we determine precisely how many independent scalar multiplications can be reduced to a given bilinear map (this number is called the subrank, and extends the concept of matrix diagonalization to tensors), for essentially all (i.e., generic) bilinear maps. Namely, we prove for a generic bilinear map $T : V \times V \to V$, where $\dim(V) = n$, that $\Theta(\sqrt{n})$ independent scalar multiplications can be reduced to $T$. Our result significantly improves on the previous upper bound from the work of Strassen (1991) and B\"urgisser (1990), which was $n^{2/3 + o(1)}$. Our full result is much more general and applies not only to bilinear maps and 3-tensors but also to $k$-tensors, for which we find that the generic subrank is $\Theta(n^{1/(k-1)})$. Moreover, as an application we prove that the subrank is not additive under the direct sum. The subrank plays a central role in several areas of complexity theory (matrix multiplication algorithms, barrier results) and combinatorics (e.g., the cap set problem and the sunflower problem). As a consequence of our result we obtain several large separations between the subrank and tensor methods that have received much interest recently, notably the slice rank (Tao, 2016), analytic rank (Gowers--Wolf, 2011; Lovett, 2018; Bhrushundi--Harsha--Hatami--Kopparty--Kumar, 2020), geometric rank (Kopparty--Moshkovitz--Zuiddam, 2020), and G-stable rank (Derksen, 2020).
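In symbols, the quantitative content of the result above can be restated compactly (writing $Q(T)$ for the subrank, the notation commonly used in this literature):
\[
Q(T) = \Theta\!\left(n^{1/(k-1)}\right) \quad \text{for generic } T \in V_1 \otimes \dots \otimes V_k, \quad \dim V_i = n,
\]
which for bilinear maps ($k=3$) gives $Q(T) = \Theta(\sqrt{n})$, improving on the earlier generic upper bound of $n^{2/3+o(1)}$.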
We establish convergence rates that are optimal up to a log-factor for a class of deep neural networks in a classification setting under a condition sometimes referred to as the Tsybakov noise condition. We construct classifiers in a general setting where the boundary of the Bayes rule can be approximated well by neural networks. The corresponding rates of convergence are proven with respect to the misclassification error. It is then shown that these rates are optimal in the minimax sense if the boundary satisfies a smoothness condition. Convergence rates for this setting were already known, but they are not optimal; our main contribution lies in improving these rates and in showing their optimality, which was an open problem. Furthermore, we show almost optimal rates under some additional assumptions which circumvent the curse of dimensionality. For our analysis we require a condition which gives new insight into the noise assumption used; in a sense, it acts as a requirement of the "correct noise exponent" for a class of functions.
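For concreteness, the Tsybakov noise condition mentioned above is commonly stated as follows (this is the standard formulation, with $\eta(x) = P(Y=1 \mid X=x)$ the conditional label probability, and not necessarily the exact variant used in the paper):
\[
P\bigl( 0 < \lvert \eta(X) - \tfrac{1}{2} \rvert \leq t \bigr) \leq C\, t^{\alpha} \quad \text{for all } t > 0,
\]
where the noise exponent $\alpha \geq 0$ controls how much probability mass sits near the decision boundary; larger $\alpha$ means less noise near the boundary and permits faster rates.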
We consider stochastic gradient descent and its averaging variant for binary classification problems in a reproducing kernel Hilbert space. In the traditional analysis using a consistency property of loss functions, it is known that the expected classification error converges more slowly than the expected risk, even when assuming a low-noise condition on the conditional label probabilities; consequently, the resulting rate is sublinear. It is therefore important to consider whether much faster convergence of the expected classification error can be achieved. In recent research, an exponential convergence rate for stochastic gradient descent was shown under a strong low-noise condition, but the theoretical analysis provided was limited to the squared loss function, which is somewhat inadequate for binary classification tasks. In this paper, we show exponential convergence of the expected classification error in the final phase of stochastic gradient descent for a wide class of differentiable convex loss functions under similar assumptions. For averaged stochastic gradient descent, we show that the same convergence rate holds from the early phase of training. In experiments, we verify our analysis on $L_2$-regularized logistic regression.
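As a toy illustration of the averaging variant discussed above, the following sketch runs averaged stochastic gradient descent on $L_2$-regularized logistic regression with a plain linear model; the synthetic data model, step size, and regularization strength are placeholder choices, not the paper's kernel-based setting.

```python
import numpy as np

rng = np.random.default_rng(0)
d, lam = 10, 1e-3
w_star = rng.normal(size=d)      # ground-truth parameter of the toy data model

w = np.zeros(d)                  # current SGD iterate
w_bar = np.zeros(d)              # Polyak-Ruppert average of the iterates

for t in range(1, 5001):
    x = rng.normal(size=d)
    p = 1.0 / (1.0 + np.exp(-x @ w_star))   # conditional label probability
    y = 1.0 if rng.random() < p else -1.0   # labels in {-1, +1}

    # stochastic gradient of the L2-regularized logistic loss at (x, y)
    grad = -y * x / (1.0 + np.exp(y * (x @ w))) + lam * w
    w -= grad / np.sqrt(t)                  # a common decaying step size
    w_bar += (w - w_bar) / t                # running average of the iterates

# the averaged iterate defines the classifier x -> sign(x @ w_bar)
print("alignment with Bayes rule direction:",
      w_bar @ w_star / (np.linalg.norm(w_bar) * np.linalg.norm(w_star)))
```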
Recently, there has been growing interest in efficient numerical algorithms based on tensor networks and low-rank techniques to approximate high-dimensional functions and the numerical solutions of high-dimensional PDEs. In this paper, we propose a new tensor rank reduction method that leverages coordinate flows and can greatly increase the efficiency of high-dimensional tensor approximation algorithms. The idea is very simple: given a multivariate function, determine a coordinate transformation so that the function in the new coordinate system has smaller tensor rank. We restrict our analysis to linear coordinate transformations, which give rise to a new class of functions that we refer to as tensor ridge functions. By leveraging coordinate flows and tensor ridge functions, we develop an optimization method based on Riemannian gradient descent for determining a quasi-optimal linear coordinate transformation for tensor rank reduction. The theoretical results we present for rank reduction via linear coordinate transformations can be generalized to larger classes of nonlinear transformations. We demonstrate the effectiveness of the proposed method on prototype function approximation problems, and in computing the numerical solution of the Liouville equation in dimensions three and five.
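The core idea admits a tiny two-dimensional illustration (a hypothetical example, not one from the paper): $f(x,y) = (x+y)^2 = x^2 + 2xy + y^2$ has separable rank 3, but in the rotated coordinates $u = (x+y)/\sqrt{2}$, $v = (x-y)/\sqrt{2}$ it becomes $g(u,v) = 2u^2$, which has rank 1. The sketch below verifies this numerically on a tensor-product grid.

```python
import numpy as np

grid = np.linspace(-1.0, 1.0, 50)
X, Y = np.meshgrid(grid, grid, indexing="ij")

# samples of f(x, y) = (x + y)^2 on the grid: rank 3 (x^2, xy, y^2 terms)
F = (X + Y) ** 2
print(np.linalg.matrix_rank(F))   # -> 3

# g(u, v) = 2u^2 sampled on the same grid; it no longer depends on v
G = 2.0 * X ** 2
print(np.linalg.matrix_rank(G))   # -> 1
```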
We study the problems of testing isomorphism of polynomials, algebras, and multilinear forms. Our first main results are average-case algorithms for these problems. For example, we develop an algorithm that takes two cubic forms $f, g\in \mathbb{F}_q[x_1,\dots, x_n]$ and decides whether $f$ and $g$ are isomorphic in time $q^{O(n)}$ for most $f$. This average-case setting has direct practical implications, having been studied in multivariate cryptography since the 1990s. Our second result concerns the complexity of testing equivalence of alternating trilinear forms. This problem is of interest in both mathematics and cryptography. We show that this problem is polynomial-time equivalent to testing equivalence of symmetric trilinear forms, by showing that both are Tensor Isomorphism-complete (Grochow--Qiao, ITCS 2021); it is therefore also equivalent to testing isomorphism of cubic forms over most fields.
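For concreteness, the notion of isomorphism in play is equivalence under an invertible linear change of variables: two forms $f, g \in \mathbb{F}_q[x_1, \dots, x_n]$ are isomorphic if
\[
f(Ax) = g(x) \quad \text{for some } A \in \mathrm{GL}_n(\mathbb{F}_q),
\]
and equivalence of (alternating or symmetric) trilinear forms is defined analogously, by applying the same change of basis to all three arguments simultaneously.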
Bi-quadratic programming over unit spheres is a fundamental problem in quantum mechanics introduced in the pioneering work of Einstein, Schr\"odinger, and others. It has been shown to be NP-hard, so it must be solved by efficient heuristic algorithms such as the block improvement method (BIM). This paper focuses on the maximization of bi-quadratic forms, which leads to a rank-one approximation problem that is equivalent to computing the M-spectral radius and its corresponding eigenvectors. Specifically, we provide a tight upper bound on the M-spectral radius for nonnegative fourth-order partially symmetric (PS) tensors, which can be considered an approximation of the M-spectral radius. Furthermore, we show that the proposed upper bound can be obtained more efficiently if the nonnegative fourth-order PS-tensor is a member of certain monoid semigroups. As an extension of the proposed bound, we derive the exact values of the M-spectral radius and its corresponding M-eigenvectors for certain classes of fourth-order PS-tensors. Lastly, as an application of the proposed bound, we obtain a practically testable sufficient condition for nonsingular elasticity M-tensors satisfying the strong ellipticity condition. We conduct several numerical experiments to demonstrate the utility of the proposed results. The results show that: (a) our proposed method attains a tight upper bound on the M-spectral radius with little computational burden, and (b) such tight and efficient upper bounds greatly enhance the convergence speed of the BIM algorithm, making it applicable to large-scale problems in applications.
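As context for the BIM-style iterations that the proposed bound accelerates, here is a minimal alternating sketch for $\max_{\|x\|=\|y\|=1} \sum_{i,j,k,l} a_{ijkl}\, x_i y_j x_k y_l$, whose optimal value is the M-spectral radius; this is an illustrative variant on placeholder random data, not the authors' algorithm or their bound.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.random((n, n, n, n))              # nonnegative entries
A = 0.5 * (A + A.transpose(2, 1, 0, 3))   # partial symmetry in modes 1 and 3
A = 0.5 * (A + A.transpose(0, 3, 2, 1))   # partial symmetry in modes 2 and 4

x = np.ones(n) / np.sqrt(n)
y = np.ones(n) / np.sqrt(n)
for _ in range(100):
    # with y fixed, the x-block reduces to a symmetric matrix eigenproblem
    M = np.einsum("ijkl,j,l->ik", A, y, y)
    x = np.abs(np.linalg.eigh(M)[1][:, -1])   # leading unit eigenvector
    # with x fixed, the y-block reduces to a symmetric matrix eigenproblem
    N = np.einsum("ijkl,i,k->jl", A, x, x)
    y = np.abs(np.linalg.eigh(N)[1][:, -1])

print("M-spectral radius estimate:",
      np.einsum("ijkl,i,j,k,l->", A, x, y, x, y))
```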
Motivated by several delay-bounded mission-critical applications, optimizing the end-to-end reliability of multi-hop networks subject to end-to-end delay constraints on the packets is an important problem. In that direction, Automatic Repeat Request (ARQ) based strategies have recently been proposed, wherein the problem is to distribute a certain total number of ARQs (which capture the end-to-end delay) across the nodes such that the end-to-end reliability is optimized. Although such strategies provide fine control for trading end-to-end delay against end-to-end reliability, their performance degrades in slowly-varying channel conditions. Addressing this drawback, in this work we propose a Chase Combining Hybrid ARQ (CC-HARQ) based multi-hop network and study the same problem of distributing a certain total number of ARQs such that the end-to-end reliability is optimized. Towards solving the problem, we first identify that the objective function of the optimization problem is intractable due to the presence of Marcum Q-functions in it. As a result, we propose an approximation of the objective function and then prove a set of necessary and sufficient conditions on the near-optimal ARQ distribution. Subsequently, we propose a low-complexity algorithm to solve the problem for any network size. We show that CC-HARQ based strategies are particularly appealing in slow-fading channels, wherein existing ARQ strategies fail.
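To make the allocation problem concrete, consider a simplified toy model (i.i.d. per-attempt failures on each hop and no Chase combining, so not the CC-HARQ objective with Marcum Q-functions): hop $i$ with $q_i$ ARQ attempts succeeds with probability $1 - p_i^{q_i}$, and the end-to-end reliability is the product over hops. A greedy distribution of the ARQ budget then looks as follows.

```python
import numpy as np

def greedy_arq_allocation(p, total_arqs):
    """Give each successive ARQ to the hop where it helps reliability most."""
    q = np.ones(len(p), dtype=int)            # every hop gets at least one try
    for _ in range(total_arqs - len(p)):
        # multiplicative gain in end-to-end reliability from one extra attempt
        gain = (1.0 - p ** (q + 1)) / (1.0 - p ** q)
        q[np.argmax(gain)] += 1
    return q

p = np.array([0.3, 0.1, 0.5, 0.2])            # per-attempt failure probabilities
q = greedy_arq_allocation(p, total_arqs=10)
print(q, "end-to-end reliability:", np.prod(1.0 - p ** q))
```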
We consider the NP-hard problem of approximating a tensor with binary entries by a rank-one tensor, referred to as the rank-one Boolean tensor factorization problem. We formulate this problem, in an extended space of variables, as the problem of minimizing a linear function over a highly structured multilinear set. Leveraging our prior results on the facial structure of multilinear polytopes, we propose novel linear programming relaxations for rank-one Boolean tensor factorization. To analyze the performance of the proposed linear programs, we consider a semi-random corruption model for the input tensor. We first consider the original NP-hard problem and establish necessary and sufficient conditions for the recovery of the ground truth with high probability. Next, we obtain sufficient conditions under which the proposed linear programming relaxations recover the ground truth with high probability. Our theoretical results, as well as numerical simulations, indicate that certain facets of the multilinear polytope significantly improve the recovery properties of linear programming relaxations for rank-one Boolean tensor factorization.
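In its third-order form, the underlying problem can be stated as follows (a formulation consistent with the description above, with notation chosen here for illustration): given $\mathcal{T} = (t_{ijk}) \in \{0,1\}^{n_1 \times n_2 \times n_3}$, find binary vectors minimizing the number of mismatched entries,
\[
\min_{x \in \{0,1\}^{n_1},\; y \in \{0,1\}^{n_2},\; z \in \{0,1\}^{n_3}} \; \sum_{i,j,k} \bigl| t_{ijk} - x_i y_j z_k \bigr|;
\]
the extended formulation then linearizes the products $x_i y_j z_k$, and the LP relaxations are taken over relaxations of the resulting multilinear polytope.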
Citations in science are studied from several perspectives. On the one hand, there are approaches such as scientometrics and the science of science, which take a more quantitative perspective. In this chapter I briefly review some of the literature on citations, citation distributions, and models of citations. These citations feature prominently in another part of the literature, which deals with research evaluation and the role of metrics and indicators in that process. Here I briefly review part of the discussion in research evaluation. This also touches on the question of how citations relate to peer review. Finally, I try to integrate the two literatures with the aim of clarifying what I believe the two can learn from each other. The fundamental problem in research evaluation is that research quality is unobservable. This has consequences for the conclusions we can draw from quantitative studies of citations and citation models. The concept of an "indicator" is relevant in this context, and I try to clarify it. Causality is important for properly understanding indicators, especially when indicators are used in practice: when we act on indicators, we enter causal territory. Even when an indicator might have been valid, through its very use, the consequences of its use may invalidate it. By combining citation models with proper causal reasoning, and by acknowledging the fundamental problem of unobservable research quality, we may hope to make progress.
Multiple Tensor-Times-Matrix (Multi-TTM) is a key computation in algorithms for computing and operating with the Tucker tensor decomposition, which is frequently used in multidimensional data analysis. We establish communication lower bounds that determine how much data movement is required to perform the Multi-TTM computation in parallel. The crux of the proof relies on analytically solving a constrained, nonlinear optimization problem. We also present a parallel algorithm to perform this computation that organizes the processors into a logical grid with twice as many modes as the input tensor. We show that with correct choices of grid dimensions, the communication cost of the algorithm attains the lower bounds and is therefore communication optimal. Finally, we show that our algorithm can significantly reduce communication compared to the straightforward approach of expressing the computation as a sequence of tensor-times-matrix operations.
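To fix ideas, here is what a three-way Multi-TTM computes, together with the straightforward sequence of single tensor-times-matrix (TTM) products mentioned above; the dimensions are arbitrary placeholders.

```python
import numpy as np

n1, n2, n3 = 30, 40, 50     # input tensor dimensions
r1, r2, r3 = 5, 6, 7        # output (core) dimensions

X = np.random.rand(n1, n2, n3)
A = np.random.rand(n1, r1)
B = np.random.rand(n2, r2)
C = np.random.rand(n3, r3)

# Multi-TTM: Y[p,q,r] = sum_{i,j,k} X[i,j,k] * A[i,p] * B[j,q] * C[k,r]
Y = np.einsum("ijk,ip,jq,kr->pqr", X, A, B, C)

# the straightforward alternative: one TTM per mode, in sequence
Y_seq = np.einsum("ijk,ip->pjk", X, A)
Y_seq = np.einsum("pjk,jq->pqk", Y_seq, B)
Y_seq = np.einsum("pqk,kr->pqr", Y_seq, C)
assert np.allclose(Y, Y_seq)
```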
Recently it was shown that the so-called guided local Hamiltonian problem -- estimating the smallest eigenvalue of a $k$-local Hamiltonian when provided with a description of a quantum state ('guiding state') that is guaranteed to have substantial overlap with the true groundstate -- is BQP-complete for $k \geq 6$ when the required precision is inverse polynomial in the system size $n$, and remains hard even when the overlap of the guiding state with the groundstate is close to a constant $\left(\frac12 - \Omega\left(\frac{1}{\mathop{poly}(n)}\right)\right)$. We improve upon this result in three ways, showing that the problem remains BQP-complete when i) the Hamiltonian is 2-local, ii) the overlap between the guiding state and the target eigenstate is as large as $1 - \Omega\left(\frac{1}{\mathop{poly}(n)}\right)$, and iii) one is interested in estimating energies of excited states rather than just the groundstate. Interestingly, iii) is only made possible by first showing that ii) holds.
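In one common formulation (paraphrased here; the precise promise and overlap conventions are those of the cited works, not reproduced exactly): given a $k$-local Hamiltonian $H$ on $n$ qubits, thresholds $a < b$, and a classical description of a guiding state $u$ with $\lvert \langle u \vert \psi_0 \rangle \rvert \geq \delta$ for the target eigenstate $\psi_0$, decide whether the corresponding eigenvalue $\lambda$ satisfies
\[
\lambda \leq a \quad \text{or} \quad \lambda \geq b, \qquad b - a \geq \frac{1}{\mathop{poly}(n)},
\]
promised that one of the two holds.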