A monotone Boolean circuit is composed of OR gates, AND gates, and input gates corresponding to the input variables and the Boolean constants. It is $q$-multilinear if, for each of its output gates $o$ and for each prime implicant $s$ of the function computed at $o$, the arithmetic version of the circuit, obtained by replacing OR and AND gates with addition and multiplication gates, respectively, computes at $o$ a polynomial that contains a monomial with the same variables as $s$ in which each variable of $s$ has degree at most $q$. First, we study the complexity of computing semi-disjoint bilinear Boolean forms in terms of the size of monotone $q$-multilinear Boolean circuits. In particular, we show that any monotone $1$-multilinear Boolean circuit computing a semi-disjoint Boolean form with $p$ prime implicants includes at least $p$ AND gates. We also show that any monotone $q$-multilinear Boolean circuit computing a semi-disjoint Boolean form with $p$ prime implicants has size $\Omega(\frac{p}{q^4})$. Next, we study the complexity of the monotone Boolean function $Isol_{k,n}$ that verifies whether a $k$-dimensional Boolean matrix has at least one $1$ in each line (e.g., each row and column when $k=2$), in terms of monotone $q$-multilinear Boolean circuits. We show that any $\Sigma_3$ monotone Boolean circuit for $Isol_{k,n}$ either has size exponential in $n$ or is not $(k-1)$-multilinear.
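As a toy illustration of the $q$-multilinearity condition (our own example, not one from the results above): the monotone circuit computing $x_1 \wedge (x_1 \vee x_2)$ arithmetizes to
\[
x_1\,(x_1 + x_2) \;=\; x_1^2 + x_1 x_2 .
\]
The Boolean function computed is $x_1$, whose only prime implicant is $s = x_1$; the monomial on the same variables as $s$ is $x_1^2$, of degree $2$ in $x_1$, so this circuit is $2$-multilinear but not $1$-multilinear.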
Ferrers diagram rank-metric codes were introduced by Etzion and Silberstein in 2009. In their work, they proposed a conjecture on the largest dimension of a space of matrices over a finite field whose nonzero elements are supported on a given Ferrers diagram and all have rank bounded below by a fixed positive integer $d$. Since it was stated, the Etzion-Silberstein conjecture has been verified in a number of cases, often under additional constraints on the field size or on the minimum rank $d$ depending on the corresponding Ferrers diagram. As of today, the conjecture remains widely open. Using modular methods, we give a constructive proof of the Etzion-Silberstein conjecture for the class of strictly monotone Ferrers diagrams, which does not depend on the minimum rank $d$ and holds over every finite field. In addition, we leverage the latter result to prove the conjecture for the class of MDS-constructible Ferrers diagrams as well, without requiring any restriction on the field size.
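For context, the conjectured value, as commonly stated in the literature (recalled here for the reader, not part of the abstract above): for $0 \le j \le d-1$, let $\nu_j$ denote the number of dots of the Ferrers diagram $\mathcal{F}$ lying neither in its topmost $j$ rows nor in its rightmost $d-1-j$ columns. Etzion and Silberstein showed that the dimension of such a space is at most
\[
\min_{0 \le j \le d-1} \nu_j ,
\]
and the conjecture asserts that this bound is always attained.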
We propose a sampling algorithm that achieves superior complexity bounds in all the classical settings (strongly log-concave, log-concave, logarithmic Sobolev inequality (LSI), Poincar\'e inequality) as well as in more general settings with semi-smooth or composite potentials. Our algorithm is based on the proximal sampler introduced in~\citet{lee2021structured}. The performance of the proximal sampler is determined by that of the restricted Gaussian oracle (RGO), a key step in its implementation. The main contribution of this work is an inexact realization of the RGO based on approximate rejection sampling. To bound the inexactness of the RGO, we establish a new concentration inequality for semi-smooth functions over Gaussian distributions, extending the well-known concentration inequality for Lipschitz functions. Applying our RGO implementation to the proximal sampler, we achieve state-of-the-art complexity bounds in almost all settings. For instance, for strongly log-concave distributions, our method has complexity bound $\tilde{\mathcal{O}}(\kappa d^{1/2})$ without warm start, better than the minimax bound for MALA. For distributions satisfying the LSI, our bound is $\tilde{\mathcal{O}}(\hat{\kappa} d^{1/2})$, where $\hat{\kappa}$ is the ratio between the smoothness constant and the LSI constant, better than all existing bounds.
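Below is a minimal numerical sketch of the generic proximal-sampler recursion with a rejection-sampling RGO, included only to make the two alternating steps concrete. It is our own simplified stand-in, not the paper's algorithm: the proposal center, the number of gradient steps, and the envelope constant are illustrative heuristics.

    import numpy as np

    def proximal_sampler_step(x, f, grad_f, eta, rng, max_tries=1000):
        """One alternating step of a proximal sampler (simplified sketch).

        Step 1: sample y | x ~ N(x, eta * I).
        Step 2 (RGO): sample x' with density proportional to
            exp(-f(x') - ||x' - y||^2 / (2 * eta))
        via rejection sampling from a Gaussian proposal N(z, eta * I), where z
        approximately minimizes f(z) + ||z - y||^2 / (2 * eta).
        """
        d = x.shape[0]
        y = x + np.sqrt(eta) * rng.standard_normal(d)

        # Approximate proximal point of f at y (a few gradient steps; heuristic).
        z = y.copy()
        for _ in range(50):
            z -= 0.5 * eta * (grad_f(z) + (z - y) / eta)

        log_target = lambda u: -f(u) - np.sum((u - y) ** 2) / (2 * eta)
        log_prop = lambda u: -np.sum((u - z) ** 2) / (2 * eta)
        log_envelope = log_target(z) - log_prop(z)  # heuristic envelope constant

        for _ in range(max_tries):
            u = z + np.sqrt(eta) * rng.standard_normal(d)
            if np.log(rng.uniform()) <= log_target(u) - log_prop(u) - log_envelope:
                return u
        return z  # give up after max_tries: this is what makes the RGO "inexact"

    # Example: sample from the potential f(x) = ||x||^2 / 2 (standard Gaussian).
    rng = np.random.default_rng(0)
    f = lambda u: 0.5 * np.sum(u ** 2)
    grad_f = lambda u: u
    x = np.zeros(5)
    for _ in range(100):
        x = proximal_sampler_step(x, f, grad_f, eta=0.1, rng=rng)
    print(x)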
In this paper, we investigate the computational power of threshold circuits and other theoretical models of neural networks in terms of the following four complexity measures: size (the number of gates), depth, weight, and energy. Here the energy complexity of a circuit measures the sparsity of its computation, and is defined as the maximum number of gates outputting non-zero values, taken over all input assignments. As our main result, we prove that any threshold circuit $C$ of size $s$, depth $d$, energy $e$ and weight $w$ satisfies $\log (rk(M_C)) \le ed (\log s + \log w + \log n)$, where $rk(M_C)$ is the rank of the communication matrix $M_C$ of the $2n$-variable Boolean function that $C$ computes. Thus, such a threshold circuit $C$ can compute only Boolean functions whose communication matrices have rank bounded by a product of logarithmic factors in $s,w$ and linear factors in $d,e$. This implies an exponential lower bound on the size of even sublinear-depth threshold circuits if the energy and weight are sufficiently small. For other models of neural networks, such as discretized ReLU circuits and discretized sigmoid circuits, we prove that a similar inequality also holds for a discretized circuit $C$: $\log (rk(M_C)) = O(ed(\log s + \log w + \log n)^3)$.
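To make the implication concrete (a standard instantiation spelled out here for the reader): the $2n$-variable equality function has the $2^n \times 2^n$ identity matrix as its communication matrix, so $\log(rk(M_C)) = n$ and the main inequality forces
\[
n \;\le\; ed\,(\log s + \log w + \log n), \qquad\text{i.e.,}\qquad s \;\ge\; \frac{2^{\,n/(ed)}}{w\,n},
\]
which is superpolynomial in $n$ whenever $ed = o(n/\log n)$ and $w$ is, say, polynomial in $n$.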
We prove the first polynomial separation between the randomized and deterministic time-space tradeoffs of multi-output functions. In particular, we present a total function that, on an input of $n$ elements in $[n]$, outputs $O(n)$ elements, such that: (1) there exists a randomized oblivious algorithm with space $O(\log n)$, time $O(n\log n)$ and one-way access to randomness that computes the function with probability $1-O(1/n)$; (2) any deterministic oblivious branching program with space $S$ and time $T$ that computes the function must satisfy $T^2S\geq\Omega(n^{2.5}/\log n)$. This implies that logspace randomized algorithms for multi-output functions cannot be black-box derandomized without an $\widetilde{\Omega}(n^{1/4})$ overhead in time. Since, previously, all polynomial time-space tradeoffs for multi-output functions were proved via the Borodin-Cook method, a probabilistic method that inherently gives the same lower bound for randomized and deterministic branching programs, our lower bound proof is intrinsically different from previous works. We also examine other natural candidates for proving such separations, and show that any polynomial separation for these problems would resolve the long-standing open problem of proving an $n^{1+\Omega(1)}$ time lower bound for decision problems with $\mathrm{polylog}(n)$ space.
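To see where the $\widetilde{\Omega}(n^{1/4})$ overhead comes from (a routine calculation included here for the reader): at space $S = O(\log n)$, the tradeoff in (2) gives
\[
T \;\ge\; \sqrt{\frac{\Omega(n^{2.5}/\log n)}{S}} \;=\; \widetilde{\Omega}(n^{1.25}),
\]
whereas the randomized algorithm in (1) runs in time $O(n\log n)$, a gap of $\widetilde{\Omega}(n^{1/4})$.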
It is well known that the Euler method for approximating the solutions of a random ordinary differential equation $\mathrm{d}X_t/\mathrm{d}t = f(t, X_t, Y_t)$ driven by a stochastic process $\{Y_t\}_t$ with $\theta$-H\"older sample paths is estimated to be of strong order $\theta$ with respect to the time step, provided $f=f(t, x, y)$ is sufficiently regular and satisfies suitable bounds. Here, it is proved that, in many typical cases, further conditions on the noise can be exploited so that the strong convergence is actually of order 1, regardless of the H\"older regularity of the sample paths. This applies, for instance, to additive or multiplicative It\^o process noises (such as Wiener, Ornstein-Uhlenbeck, and geometric Brownian motion processes); to point-process noises (such as Poisson point processes and Hawkes self-exciting processes, which even have jump-type discontinuities); and to transport-type processes with sample paths of bounded variation. The result is based on a novel approach, estimating the global error as an iterated integral over both large and small mesh scales, and switching the order of integration to move the critical regularity to the large scale. The work is complemented with numerical simulations illustrating the strong order 1 convergence in those cases, and with an example of fractional Brownian motion noise with Hurst parameter $0 < H < 1/2$ for which the order of convergence is $H + 1/2$, hence lower than the order 1 attained in the examples above, but still higher than the order $H$ of convergence expected from previous works.
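As a quick numerical companion (our own toy experiment, not one of the paper's simulations), the sketch below estimates the observed strong order of the Euler method for a random ODE driven by a Wiener process, using a fine-mesh Euler run as the reference solution; the specific drift and parameters are illustrative.

    import numpy as np

    # Toy random ODE: dX_t/dt = -X_t + sin(W_t), X_0 = 1, W a Wiener process.
    # Strong error at time T is averaged over sample paths; observed orders are
    # the log2-ratios of successive errors.
    rng = np.random.default_rng(0)
    T, n_ref, n_paths = 1.0, 2**12, 100
    coarse_levels = [2**k for k in range(3, 8)]

    errors = np.zeros(len(coarse_levels))
    for _ in range(n_paths):
        dW = np.sqrt(T / n_ref) * rng.standard_normal(n_ref)
        W = np.concatenate(([0.0], np.cumsum(dW)))   # Wiener path on the fine mesh

        def euler(n):
            r, dt, x = n_ref // n, T / n, 1.0
            for i in range(n):
                # Euler step; the noise is evaluated at the left endpoint of the step
                x += dt * (-x + np.sin(W[i * r]))
            return x

        x_ref = euler(n_ref)
        for j, n in enumerate(coarse_levels):
            errors[j] += abs(euler(n) - x_ref) / n_paths

    print("strong errors:", errors)
    print("observed orders:", np.log2(errors[:-1] / errors[1:]))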
One of the major open problems in complexity theory is proving super-logarithmic lower bounds on the depth of circuits (i.e., $\mathbf{P}\not\subseteq\mathbf{NC}^{1}$). Karchmer, Raz, and Wigderson (Computational Complexity 5(3/4), 1995) suggested approaching this problem by proving that the depth complexity of a composition of functions $f\diamond g$ is roughly the sum of the depth complexities of $f$ and $g$; this is known as the KRW conjecture. They showed that the validity of this conjecture would imply that $\mathbf{P}\not\subseteq\mathbf{NC}^{1}$. The intuition that underlies the KRW conjecture is that the composition $f\diamond g$ should behave like a "direct-sum problem", in a certain sense, and therefore the depth complexity of $f\diamond g$ should be the sum of the individual depth complexities. Nevertheless, there are two obstacles to turning this intuition into a proof: first, we do not know how to prove that $f\diamond g$ must behave like a direct-sum problem; second, we do not know how to prove that the complexity of the latter direct-sum problem is indeed the sum of the individual complexities. In this work, we focus on the second obstacle. To this end, we study a notion called "strong composition", which is the same as $f\diamond g$ except that it is forced to behave like a direct-sum problem. We prove a variant of the KRW conjecture for strong composition, thus overcoming the second obstacle above. This result demonstrates that the first obstacle is the crucial barrier to resolving the KRW conjecture. Along the way, we develop some general techniques that might be of independent interest.
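For completeness, the standard block-composition in question (not restated in the abstract above): for $f\colon\{0,1\}^m\to\{0,1\}$ and $g\colon\{0,1\}^n\to\{0,1\}$,
\[
(f\diamond g)(x_1,\dots,x_m) \;=\; f\bigl(g(x_1),\dots,g(x_m)\bigr), \qquad x_1,\dots,x_m\in\{0,1\}^n .
\]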
For many applications involving a sequence of linear systems with slowly changing system matrices, subspace recycling, which exploits relationships among the systems and reuses search space information, can greatly reduce the total number of iterations across all linear system solves in the sequence. However, for general (i.e., non-identity) shifted systems with the shift value varying over a wide range, the properties of the linear systems vary widely as well, which makes recycling less effective. If such a sequence of systems is embedded in a nonlinear iteration, the problem is compounded, and special approaches are needed to use recycling effectively. In this paper, we develop new, more efficient Krylov subspace recycling approaches for large-scale image reconstruction and restoration techniques that employ a nonlinear iteration to compute a suitable regularization matrix. For each new regularization matrix, we need to solve regularized linear systems with matrices ${\bf A} + \gamma_\ell {\bf E}_k$ for a sequence of regularization parameters, $\gamma_\ell$, to find the optimally regularized solution that, in turn, will be used to update the regularization matrix. We analyze system and solution characteristics to choose appropriate techniques to solve each system rapidly. Specifically, we use an inner-outer recycling approach with a larger, principal recycle space for each nonlinear step and smaller recycle spaces for each shift. We propose an efficient way to obtain good initial guesses from the principal recycle space and the smaller shift-specific recycle spaces that lead to fast convergence. Our method substantially reduces the total number of matrix-vector products compared with a naive approach. Our approach is more generally applicable to sequences of shifted systems where the matrices in the sum are positive semi-definite.
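The sketch below illustrates only the warm-starting idea for a sequence of shifted SPD systems: project onto a small space spanned by earlier solutions to build the initial guess for each new shift. It is a simplified stand-in for illustration; the matrices, the solver, and the recycle-space update are not those of the paper.

    import numpy as np
    from scipy.sparse.linalg import cg

    rng = np.random.default_rng(1)
    n = 400
    Q = rng.standard_normal((n, n))
    A = Q @ Q.T / n + np.eye(n)              # SPD stand-in for the data-fit matrix
    E = np.diag(rng.uniform(0.1, 1.0, n))    # SPD stand-in for the regularization matrix
    b = rng.standard_normal(n)
    gammas = np.logspace(-3, 1, 8)           # sequence of regularization parameters

    U = np.zeros((n, 0))                     # small recycle space from earlier solves
    for gamma in gammas:
        M = A + gamma * E
        if U.shape[1] > 0:
            # Galerkin initial guess from the recycle space: x0 = U (U^T M U)^{-1} U^T b
            x0 = U @ np.linalg.solve(U.T @ M @ U, U.T @ b)
        else:
            x0 = np.zeros(n)
        x, info = cg(M, b, x0=x0)
        # grow and re-orthonormalize the recycle space with the new solution
        U, _ = np.linalg.qr(np.column_stack([U, x]))

    print("solved", len(gammas), "shifted systems, last residual norm:",
          np.linalg.norm(b - M @ x))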
Over the last several decades, improvements in the fields of analytic combinatorics and computer algebra have made determining the asymptotic behaviour of sequences satisfying linear recurrence relations with polynomial coefficients largely a matter of routine, under assumptions that often hold in practice. The algorithms involved typically take a sequence, encoded by a recurrence relation and initial terms, and return the leading terms in an asymptotic expansion up to a big-O error term. Less studied, however, are effective techniques giving an explicit bound on asymptotic error terms. Among other things, such explicit bounds typically allow the user to automatically prove sequence positivity (an active area of enumerative and algebraic combinatorics) by exhibiting an index beyond which the positive leading asymptotic behaviour dominates any error terms. In this article, we present a practical algorithm for computing such asymptotic approximations with rigorous error bounds, under the assumption that the generating series of the sequence is a solution of a differential equation with regular (Fuchsian) dominant singularities. Our algorithm approximately follows the singularity analysis method of Flajolet and Odlyzko, except that all big-O terms involved in the derivation of the asymptotic expansion are replaced by explicit error terms. The computation of the error terms combines analytic bounds from the literature with effective techniques from rigorous numerics and computer algebra. We implement our algorithm in the SageMath computer algebra system and exhibit its use on a variety of applications (including our original motivating example, solution uniqueness in the Canham model for the shape of genus one biomembranes).
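As a tiny worked example of the kind of statement such methods make rigorous (a classical example of ours, not one from the article): the central binomial coefficients $c_n = \binom{2n}{n}$ satisfy the P-recurrence $(n+1)c_{n+1} = (4n+2)c_n$, their generating series $1/\sqrt{1-4x}$ has a regular dominant singularity at $x=1/4$, and singularity analysis gives $c_n \sim 4^n/\sqrt{\pi n}$. The script below unrolls the recurrence and checks the leading term numerically.

    import math

    def central_binomials(N):
        """Unroll the P-recurrence (n+1) c_{n+1} = (4n+2) c_n with c_0 = 1."""
        c = [1]
        for n in range(N - 1):
            c.append((4 * n + 2) * c[-1] // (n + 1))
        return c

    c = central_binomials(60)
    for n in (10, 30, 50):
        leading = 4.0 ** n / math.sqrt(math.pi * n)
        print(n, c[n], c[n] / leading)   # the ratio approaches 1 as n grows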
Integrated circuit verification has garnered considerable interest in recent times. Since these circuits keep growing in complexity year by year, pre-silicon (pre-Si) verification becomes ever more important in order to ensure proper functionality. Thus, to reduce the time needed for manually verifying ICs, we propose a machine learning (ML) approach that uses fewer simulations. This method relies on an initial evaluation set of operating condition configurations (OCCs) to train Gaussian process (GP) surrogate models. Using the surrogate models, we can propose further, more difficult OCCs. Repeating this procedure for several iterations yields better GP estimates of the circuit's responses, on both synthetic and real circuits, and thus a better chance of finding the worst case, or even failures, for certain circuit responses. Thus, we show that the proposed approach is able to provide OCCs closer to the specifications for all circuits and to identify a failure (specification violation) for one of the responses of a real circuit.
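A minimal sketch of such a surrogate-driven loop is given below, assuming a scikit-learn Gaussian process and a simple lower-confidence-bound rule to propose the next OCC; the toy simulator, bounds, and acquisition rule are illustrative placeholders, not the paper's setup.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern

    rng = np.random.default_rng(0)

    def simulate(occ):                      # stand-in for a circuit simulation
        return np.sin(3 * occ[:, 0]) + 0.5 * occ[:, 1] ** 2

    bounds = np.array([[-1.0, 1.0], [-1.0, 1.0]])              # normalized OCC ranges
    X = rng.uniform(bounds[:, 0], bounds[:, 1], size=(8, 2))   # initial evaluation set
    y = simulate(X)

    for it in range(5):
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
        # propose a "more difficult" OCC: a candidate where the predicted response
        # is extreme and/or uncertain (lower-confidence-bound rule)
        cand = rng.uniform(bounds[:, 0], bounds[:, 1], size=(512, 2))
        mu, sigma = gp.predict(cand, return_std=True)
        pick = cand[np.argmin(mu - 2.0 * sigma)][None, :]
        X = np.vstack([X, pick])
        y = np.append(y, simulate(pick))

    print("worst observed response:", y.min())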
We study the asymptotic generalization of an overparameterized linear model for multiclass classification under the Gaussian covariates bi-level model introduced in Subramanian et al.~'22, where the number of data points, features, and classes all grow together. We fully resolve the conjecture posed in Subramanian et al.~'22, matching the predicted regimes for generalization. Furthermore, our new lower bounds are akin to an information-theoretic strong converse: they establish that the misclassification rate goes to 0 or 1 asymptotically. One surprising consequence of our tight results is that the min-norm interpolating classifier can be asymptotically suboptimal relative to noninterpolating classifiers in the regime where the min-norm interpolating regressor is known to be optimal. The key to our tight analysis is a new variant of the Hanson-Wright inequality which is broadly useful for multiclass problems with sparse labels. As an application, we show that the same type of analysis can be used to analyze the related multilabel classification problem under the same bi-level ensemble.
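For orientation, the classical Hanson-Wright inequality being generalized, in its standard form (the paper's variant is not reproduced here): if $X\in\mathbb{R}^n$ has independent mean-zero entries with sub-Gaussian norm at most $K$ and $A\in\mathbb{R}^{n\times n}$ is fixed, then for all $t\ge 0$,
\[
\Pr\bigl(\lvert X^\top A X - \mathbb{E}[X^\top A X]\rvert > t\bigr) \;\le\; 2\exp\!\left(-c\,\min\!\left(\frac{t^2}{K^4\lVert A\rVert_F^2},\,\frac{t}{K^2\lVert A\rVert}\right)\right)
\]
for an absolute constant $c>0$.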