The Zarankiewicz function gives, for chosen matrix and minor sizes, the maximum number of ones in a binary matrix that contains no all-one minor. Tables of this function for small arguments have been compiled, but they are known to contain errors. We both correct these errors and extend the tables in the case of square minors by expressing the problem of finding the value at a specific point as a series of Boolean satisfiability problems, exploiting permutation symmetries for a significant reduction in the work required. When the ambient matrix is also square, we additionally give all non-isomorphic examples of matrices attaining the maximum, up to the aforementioned symmetries; it turns out that most maximal matrices have some form of symmetry.
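To make the reduction concrete, the following minimal sketch (using the third-party python-sat package, an assumption of ours; the paper's encoding and symmetry breaking are more elaborate) decides whether an $m \times n$ binary matrix with at least $k$ ones and no $s \times s$ all-one submatrix exists; the Zarankiewicz value is then the largest $k$ for which the instance is satisfiable.

```python
# Minimal SAT encoding sketch for the decision problem z(m, n; s, s) >= k.
from itertools import combinations

from pysat.card import CardEnc
from pysat.formula import CNF
from pysat.solvers import Glucose3

def exists_matrix(m, n, s, k):
    var = lambda i, j: i * n + j + 1  # one Boolean variable per cell
    cnf = CNF()
    # For every s x s choice of rows and columns, some cell must be 0.
    for rows in combinations(range(m), s):
        for cols in combinations(range(n), s):
            cnf.append([-var(i, j) for i in rows for j in cols])
    # Cardinality constraint: at least k ones in total.
    cells = [var(i, j) for i in range(m) for j in range(n)]
    cnf.extend(CardEnc.atleast(lits=cells, bound=k, top_id=m * n).clauses)
    with Glucose3(bootstrap_with=cnf) as solver:
        return solver.solve()

# The value of the Zarankiewicz function at (m, n; s, s) is the largest k
# with exists_matrix(m, n, s, k) True, e.g. found by binary search on k.
```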
We describe some pseudorandom properties of binary linear codes achieving capacity on the binary erasure channel under bit-MAP decoding (as shown by Kudekar et al., this includes doubly transitive codes and, in particular, Reed-Muller codes). We show that for every integer $q \ge 2$, the $\ell_q$ norm of the characteristic function of such a 'pseudorandom' code decreases as fast as that of any code of the same rate (and as fast as that of a random code) under the action of the noise operator. In information-theoretic terms, this means that the $q^{th}$ R\'enyi entropy of this code increases as fast as possible over the binary symmetric channel. In particular (taking $q = \infty$), this shows that such codes have the smallest asymptotic undetected-error probability (equal to that of a random code) over the BSC, for a certain range of parameters. We also study the number of times a certain local pattern, a 'rhombic' $4$-tuple of codewords, appears in a linear code, and show that for a certain range of parameters this number for pseudorandom codes is similar to that for a random code.
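For reference, the standard definitions behind these statements (our notation, which may differ from the paper's): the noise operator of the binary symmetric channel with crossover probability $\epsilon$ acts on functions on the Hamming cube by
\[
(T_\epsilon f)(x) = \mathbb{E}\bigl[f(x \oplus Z)\bigr], \qquad Z \sim \mathrm{Bernoulli}(\epsilon)^{\otimes n},
\]
and the R\'enyi entropy of order $q$ of a distribution $P$ is
\[
H_q(P) = \frac{1}{1-q} \log \sum_x P(x)^q,
\]
with $H_\infty(P) = -\log \max_x P(x)$. The norms in question are $\|T_\epsilon 1_C\|_q$ for the characteristic function $1_C$ of the code $C$.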
We develop a spectral method to solve the heat equation in a closed cylinder, achieving a near-optimal $\mathcal{O}(N\log N)$ complexity and high-order, \emph{spectral} accuracy. The algorithm relies on a novel Chebyshev--Chebyshev--Fourier (CCF) discretization of the cylinder, which is easily implemented and decouples the heat equation into a collection of smaller, sparse Sylvester equations. In turn, each of these equations is solved using the alternating direction implicit (ADI) method, which improves the complexity of each solve from cubic in the matrix size (in more traditional methods) to log-linear; overall, this represents an improvement in the heat equation solver from $\mathcal{O}(N^{7/3})$ (in traditional methods) to $\mathcal{O}(N\log N)$. Lastly, we provide numerical simulations demonstrating significant speed-ups over traditional spectral collocation methods and finite difference methods, and we outline a framework by which this heat equation solver could be applied to the incompressible Navier--Stokes equations. For the latter, we decompose the equations using a poloidal--toroidal (PT) decomposition, turning them into heat equations with nonlinear forcing from the advection term; by using implicit--explicit methods to integrate these in time, we can attain the same $\mathcal{O}(N\log N)$ complexity and spectral accuracy achieved here for the heat equation.
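To make the ADI building block concrete, here is a minimal dense-matrix sketch of the classical ADI iteration for a Sylvester equation $AX - XB = F$ with user-supplied shift parameters; the log-linear complexity quoted above additionally relies on the sparsity of the CCF discretization and on carefully chosen shifts, neither of which this illustration attempts.

```python
# Unfactored ADI iteration for the Sylvester equation A X - X B = F.
import numpy as np

def adi_sylvester(A, B, F, p, q, steps):
    n, m = A.shape[0], B.shape[0]
    I_n, I_m = np.eye(n), np.eye(m)
    X = np.zeros((n, m))
    for j in range(steps):
        # Half-step in the "A direction": (A - p_j I) X = F + X (B - p_j I).
        X = np.linalg.solve(A - p[j] * I_n, F + X @ (B - p[j] * I_m))
        # Half-step in the "B direction": X (B - q_j I) = (A - q_j I) X - F.
        rhs = (A - q[j] * I_n) @ X - F
        X = np.linalg.solve((B - q[j] * I_m).T, rhs.T).T
    return X
```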
Loop acceleration can be used to prove safety, reachability, runtime bounds, and (non-)termination of programs. To this end, a variety of acceleration techniques has been proposed. However, so far all of them have been monolithic, i.e., a single loop could not be accelerated using a combination of several different acceleration techniques. In contrast, we present a calculus that allows for combining acceleration techniques in a modular way and we show how to integrate many existing acceleration techniques into our calculus. Moreover, we propose two novel acceleration techniques that can be incorporated into our calculus seamlessly. Some of these acceleration techniques apply only to non-terminating loops. Thus, combining them with our novel calculus results in a new, modular approach for proving non-termination. An empirical evaluation demonstrates the applicability of our approach, both for loop acceleration and for proving non-termination.
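To fix intuition, here is a toy instance of loop acceleration, sketched with sympy; the example and the monotonicity shortcut are ours for illustration and do not reproduce any specific technique from the paper. The loop `while x < n: x = x + 2` is summarized by a closed form over a fresh counter $k$; because its guard is falsified monotonically, requiring the guard only at the last executed iteration suffices.

```python
# Toy acceleration of: while x < n: x = x + 2
import sympy as sp

x, n, k = sp.symbols('x n k', integer=True)
x_after = x + 2 * k                 # closed form of x after k iterations
# The guard must hold at iterations 0..k-1; x + 2*i is increasing in i,
# so checking the last iteration (i = k - 1) implies all earlier checks.
accelerated = sp.And(k >= 1, x + 2 * (k - 1) < n)
print(accelerated)   # accelerated transition relation over x, n, k
```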
The probabilistic satisfiability of a logical expression is a fundamental quantity, known as the partition function in statistical physics and field theory, as an evaluation of a related graph's Tutte polynomial in mathematics, and as the Moore-Shannon network reliability of that graph in engineering. It is a crucial element in decision-making under uncertainty. Not surprisingly, it is provably hard to compute exactly or even to approximate. Many of these applications are concerned only with the subset of problems whose solutions are monotonic functions. Here we extend the weak- and strong-coupling methods of statistical physics to heterogeneous satisfiability problems and introduce a novel approach to constructing lower and upper bounds on the approximation error for monotonic problems. The bounds combine information from both perturbative analyses and are tight in the sense that they are saturated by some problem instance compatible with all the information contained in either approximation.
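As a concrete anchor for these quantities, the following brute-force sketch (an illustrative example of ours) computes the Moore-Shannon two-terminal reliability of a small graph, i.e., the probability that $s$ and $t$ remain connected when each edge operates independently with probability $p$; the exponential cost of such enumeration is precisely why bounds like those above are needed.

```python
# Brute-force Moore-Shannon two-terminal reliability of a small graph.
from itertools import product

edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]  # illustrative graph
p = 0.9                                            # per-edge reliability
s, t = 0, 3

def connected(up):
    # Depth-first search from s using only the operational edges.
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for (a, b), ok in zip(edges, up):
            if ok and u in (a, b):
                v = b if u == a else a
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
    return t in seen

reliability = sum(
    p ** sum(up) * (1 - p) ** (len(edges) - sum(up))
    for up in product((0, 1), repeat=len(edges))
    if connected(up)
)
print(reliability)
```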
We provide sharp path-dependent generalization and excess-risk guarantees for the full-batch Gradient Descent (GD) algorithm on smooth losses (possibly non-Lipschitz, possibly nonconvex) under an interpolation regime. At the heart of our analysis is a new generalization error bound for deterministic symmetric algorithms, which implies that average output stability and a bounded expected optimization error at termination lead to generalization. This result shows that small generalization error occurs along the optimization path and allows us to bypass the Lipschitz or sub-Gaussian assumptions on the loss prevalent in previous works. For nonconvex, Polyak-\L{}ojasiewicz (PL), convex, and strongly convex losses, we show the explicit dependence of the generalization error on the accumulated path-dependent optimization error, the terminal optimization error, the number of samples, and the number of iterations. For nonconvex smooth losses, we prove that full-batch GD efficiently generalizes close to any stationary point at termination, under a proper choice of decreasing step size. Further, if the loss is nonconvex but the objective is PL, we derive quadratically vanishing bounds on the generalization error and the corresponding excess risk, for a choice of large constant step size. For (resp. strongly) convex smooth losses, we prove that full-batch GD also generalizes for large constant step sizes, and achieves (resp. quadratically) small excess risk while training fast. In all cases, we close the generalization error gap by showing matching generalization and optimization error rates. Our full-batch GD generalization error and excess risk bounds are strictly tighter than existing bounds for (stochastic) GD when the loss is smooth (but possibly non-Lipschitz).
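For concreteness, a minimal sketch of the algorithm under study: full-batch GD with a constant step size, run on an interpolating least-squares problem. The step size, horizon, and problem instance are illustrative choices of ours, not taken from the paper.

```python
# Full-batch gradient descent on an overparameterized least-squares loss.
import numpy as np

def full_batch_gd(grad, w0, T, eta):
    w = np.asarray(w0, dtype=float)
    for _ in range(T):
        w = w - eta * grad(w)   # one step on the *full* training loss
    return w

rng = np.random.default_rng(0)
X, y = rng.normal(size=(20, 50)), rng.normal(size=20)   # n=20 < d=50
grad = lambda w: X.T @ (X @ w - y) / len(y)             # least-squares grad
w_hat = full_batch_gd(grad, np.zeros(50), T=1000, eta=0.1)
print(np.linalg.norm(X @ w_hat - y))  # ~0: the interpolation regime
```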
We consider using gradient descent to minimize the nonconvex function $f(X)=\phi(XX^{T})$ over an $n\times r$ factor matrix $X$, where $\phi$ is an underlying smooth convex cost function defined over $n\times n$ matrices. While only a second-order stationary point $X$ can provably be found in reasonable time, if $X$ is additionally rank deficient, then its rank deficiency certifies it as globally optimal. This way of certifying global optimality necessarily requires the search rank $r$ of the current iterate $X$ to be overparameterized with respect to the rank $r^{\star}$ of the global minimizer $X^{\star}$. Unfortunately, overparameterization significantly slows down the convergence of gradient descent, from a linear rate with $r=r^{\star}$ to a sublinear rate when $r>r^{\star}$, even when $\phi$ is strongly convex. In this paper, we propose an inexpensive preconditioner that restores the convergence rate of gradient descent back to linear in the overparameterized case, while also making it agnostic to possible ill-conditioning in the global minimizer $X^{\star}$.
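A minimal sketch of one step of such right-preconditioned gradient descent on $f(X)=\phi(XX^{T})$: the gradient is right-multiplied by the inverse of a damped $r\times r$ Gram matrix, which is inexpensive since $r\ll n$. The damping rule below is an illustrative placeholder, not necessarily the paper's exact choice.

```python
# One step of gradient descent with an r x r right preconditioner.
import numpy as np

def precond_gd_step(X, grad_f, eta, lam):
    G = grad_f(X)                               # Euclidean gradient, n x r
    P = X.T @ X + lam * np.eye(X.shape[1])      # damped Gram matrix, r x r
    return X - eta * np.linalg.solve(P, G.T).T  # step G @ inv(P); P symmetric
```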
Mini-batch optimal transport (m-OT) has been successfully used in practical applications involving probability measures with a very large number of supports. The m-OT solves several smaller optimal transport problems and then returns the average of their costs and transportation plans. Despite its scalability advantage, the m-OT does not consider the relationship between mini-batches, which leads to undesirable estimates. Moreover, the m-OT does not approximate a proper metric between probability measures, since the identity property is not satisfied. To address these problems, we propose a novel mini-batch scheme for optimal transport, named Batch of Mini-batches Optimal Transport (BoMb-OT), which finds the optimal coupling between mini-batches and can be seen as an approximation of a well-defined distance on the space of probability measures. Furthermore, we show that the m-OT is a limit of the entropic regularized version of the BoMb-OT when the regularization parameter goes to infinity. Finally, we carry out experiments on various applications, including deep generative models, deep domain adaptation, approximate Bayesian computation, color transfer, and gradient flow, to show that the BoMb-OT is widely applicable and performs well.
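For reference, a minimal sketch of the baseline m-OT estimator that BoMb-OT improves upon, written with the POT (Python Optimal Transport) library; BoMb-OT additionally computes an optimal coupling between the mini-batches themselves, which this sketch deliberately omits.

```python
# Baseline m-OT: average exact OT costs over random mini-batch pairs.
import numpy as np
import ot  # POT: Python Optimal Transport

def mot_cost(X, Y, batch, n_pairs, seed=0):
    rng = np.random.default_rng(seed)
    w = np.full(batch, 1.0 / batch)       # uniform mini-batch weights
    total = 0.0
    for _ in range(n_pairs):
        xb = X[rng.choice(len(X), batch, replace=False)]
        yb = Y[rng.choice(len(Y), batch, replace=False)]
        M = ot.dist(xb, yb)               # squared Euclidean cost matrix
        total += ot.emd2(w, w, M)         # exact OT cost for this pair
    return total / n_pairs
```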
We develop a new method that improves the efficiency of equation-by-equation algorithms for solving polynomial systems. Our method is based on a novel geometric construction and reduces the total number of homotopy paths that must be numerically continued. These improvements may be applied to the basic algorithms of numerical algebraic geometry in the settings of both projective and multiprojective varieties. Our computational experiments demonstrate significant savings on several benchmark systems. We also present an extended case study on maximum likelihood estimation for rank-constrained symmetric $n\times n$ matrices, in which multiprojective $u$-generation allows us to complete the list of ML degrees for $n\le 6$.
Deep learning algorithms have achieved state-of-the-art performance in image classification and are used even in security-critical applications, such as biometric recognition systems and self-driving cars. However, recent works have shown that these algorithms, which can even surpass human capabilities, are vulnerable to adversarial examples. In computer vision, adversarial examples are images containing subtle perturbations, generated by malicious optimization algorithms in order to fool classifiers. As an attempt to mitigate these vulnerabilities, numerous countermeasures have been proposed in the literature. Nevertheless, devising an efficient defense mechanism has proven to be a difficult task, since many approaches have already been shown to be ineffective against adaptive attackers. Thus, this self-contained paper aims to provide all readers with a review of the latest research progress on adversarial machine learning in image classification, from a defender's perspective. We introduce novel taxonomies for categorizing adversarial attacks and defenses, and discuss the existence of adversarial examples. Further, in contrast to existing surveys, we give relevant guidance that researchers should take into consideration when devising and evaluating defenses. Finally, based on the reviewed literature, we discuss some promising paths for future research.
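As a minimal example of the attack family the survey covers, here is a PyTorch sketch of the classic fast gradient sign method (FGSM); the framework and the $[0, 1]$ pixel range are our illustrative choices.

```python
# FGSM: one signed-gradient step that maximally increases the loss.
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Perturb each pixel by +/- eps, then clamp to the valid range.
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()
```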
High spectral dimensionality and the shortage of annotations make hyperspectral image (HSI) classification a challenging problem. Recent studies suggest that convolutional neural networks can learn discriminative spatial features, which play a paramount role in HSI interpretation. However, most of these methods ignore the distinctive spectral-spatial characteristics of hyperspectral data. In addition, a large amount of unlabeled data remains an unexploited gold mine for efficient data use. We therefore propose an integration of generative adversarial networks (GANs) and probabilistic graphical models for HSI classification. Specifically, we use a spectral-spatial generator and a discriminator to identify the land-cover categories of hyperspectral cubes. Moreover, to take advantage of the large amount of unlabeled data, we adopt a conditional random field to refine the preliminary classification results generated by the GAN. Experimental results obtained on two commonly studied datasets demonstrate that the proposed framework achieves encouraging classification accuracy using a small number of training samples.
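An illustrative PyTorch sketch of a spectral-spatial discriminator over hyperspectral cubes, using 3-D convolutions so that spectral and spatial structure are filtered jointly; the architecture is our assumption, and the paper's exact GAN design and CRF refinement are not reproduced here.

```python
# A 3-D convolutional discriminator/classifier for HSI cubes.
import torch.nn as nn

class SpectralSpatialDiscriminator(nn.Module):
    def __init__(self, n_classes):
        super().__init__()
        self.features = nn.Sequential(
            # Kernels span 7 bands spectrally and 3x3 pixels spatially.
            nn.Conv3d(1, 16, kernel_size=(7, 3, 3), padding=(3, 1, 1)),
            nn.LeakyReLU(0.2),
            nn.Conv3d(16, 32, kernel_size=(7, 3, 3), padding=(3, 1, 1)),
            nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool3d(1),
        )
        # One extra logit serves as the real/fake output for the GAN loss.
        self.head = nn.Linear(32, n_classes + 1)

    def forward(self, x):  # x: (batch, 1, bands, height, width)
        return self.head(self.features(x).flatten(1))
```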