We consider the stochastic gradient method with random reshuffling ($\mathsf{RR}$) for tackling smooth nonconvex optimization problems. $\mathsf{RR}$ finds broad applications in practice, notably in training neural networks. In this work, we first investigate the concentration property of $\mathsf{RR}$'s sampling procedure and establish a new high-probability sample complexity guarantee for driving the gradient norm (without expectation) below $\varepsilon$, which effectively characterizes the efficiency of a single $\mathsf{RR}$ execution. Our derived complexity matches the best existing in-expectation one up to a logarithmic term, while imposing no additional assumptions and leaving $\mathsf{RR}$'s updating rule unchanged. Furthermore, by leveraging our derived high-probability descent property and bound on the stochastic error, we propose a simple and computable stopping criterion for $\mathsf{RR}$ (denoted $\mathsf{RR}$-$\mathsf{sc}$). This criterion is guaranteed to be triggered after a finite number of iterations, at which point $\mathsf{RR}$-$\mathsf{sc}$ returns an iterate whose gradient norm is below $\varepsilon$ with high probability. Moreover, building on the proposed stopping criterion, we design a perturbed random reshuffling method ($\mathsf{p}$-$\mathsf{RR}$) that adds a randomized perturbation step near stationary points. We show that $\mathsf{p}$-$\mathsf{RR}$ provably escapes strict saddle points and efficiently returns a second-order stationary point with high probability, without any sub-Gaussian tail-type assumptions on the stochastic gradient errors. Finally, we conduct numerical experiments on neural network training to support our theoretical findings.
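To make the setup concrete, here is a minimal sketch of $\mathsf{RR}$ with a computable stopping test. The specific test used below (normalized per-epoch movement as a proxy for the full gradient) is our own placeholder assumption, not the paper's exact $\mathsf{RR}$-$\mathsf{sc}$ criterion; all names are illustrative.

```python
import numpy as np

def rr_sc(component_grads, x0, lr, eps, max_epochs, seed=0):
    """Random reshuffling with a computable stopping test.

    NOTE: the stopping test below (normalized per-epoch movement as a
    proxy for the full gradient) is a placeholder assumption, not the
    paper's exact RR-sc criterion."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    n = len(component_grads)
    for epoch in range(max_epochs):
        x_prev = x.copy()
        for i in rng.permutation(n):          # fresh permutation each epoch
            x = x - lr * component_grads[i](x)
        if np.linalg.norm(x - x_prev) / (lr * n) < eps:
            return x, epoch                   # criterion triggered
    return x, max_epochs

# Toy usage: f(x) = (1/2) * sum_i (a_i . x - b_i)^2, split into n components.
rng = np.random.default_rng(1)
x_true = rng.normal(size=3)
A = rng.normal(size=(10, 3))
b = A @ x_true                                # consistent system, so RR converges exactly
grads = [lambda x, a=A[i], bi=b[i]: (a @ x - bi) * a for i in range(10)]
x_star, epochs = rr_sc(grads, np.zeros(3), lr=0.01, eps=1e-6, max_epochs=10000)
```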
We consider the time and space required for quantum computers to solve a wide variety of problems involving matrices, many of which have only been analyzed classically in prior work. Our main results show that for a range of linear algebra problems -- including matrix-vector product, matrix inversion, matrix multiplication and powering -- existing classical time-space tradeoffs, several of which are tight for every space bound, also apply to quantum algorithms. For example, for almost all matrices $A$, including the discrete Fourier transform (DFT) matrix, we prove that quantum circuits with at most $T$ input queries and $S$ qubits of memory require $T=\Omega(n^2/S)$ to compute the matrix-vector product $Ax$ for $x \in \{0,1\}^n$. We similarly prove that matrix multiplication for $n\times n$ binary matrices requires $T=\Omega(n^3 / \sqrt{S})$. Because many of our lower bounds are matched by deterministic algorithms with the same time and space complexity, our results show that quantum computers cannot provide any asymptotic advantage for these problems at any space bound. We obtain matching lower bounds for the stronger notion of quantum cumulative memory complexity -- the sum of the space per layer of a circuit. We also consider Boolean (i.e., AND-OR) matrix multiplication and matrix-vector products, improving the previous quantum time-space tradeoff lower bound for $n\times n$ Boolean matrix multiplication from $T=\Omega(n^{2.5}/S^{1/2})$ to $T=\Omega(n^{2.5}/S^{1/3})$. Our improved lower bound for Boolean matrix multiplication is based on a new coloring argument that extracts more from the strong direct product theorem used in prior work. Our tight lower bounds for linear algebra problems require adding a new bucketing method to the recording-query technique of Zhandry, which lets us apply classical arguments to upper-bound the success probability of quantum circuits.
A minimum sum vertex cover of an $n$-vertex graph $G$ is a bijection $\phi : V(G) \to [n]$ that minimizes the cost $\sum_{\{u,v\} \in E(G)} \min \{\phi(u), \phi(v) \}$. Finding a minimum sum vertex cover of a graph (the MSVC problem) is NP-hard. MSVC is well studied in the realm of approximation algorithms. The best-known polynomial-time approximation factor for the problem is $16/9$ [Bansal, Batra, Farhadi, and Tetali, SODA 2021]. Recently, Stankovic [APPROX/RANDOM 2022] proved that achieving an approximation ratio better than $1.014$ for MSVC is NP-hard, assuming the Unique Games Conjecture. We study the MSVC problem from the perspective of parameterized algorithms, with two parameters: the size of a minimum vertex cover and the size of a minimum clique modulator of the input graph. We obtain the following results. 1. MSVC can be solved in $2^{2^{O(k)}} n^{O(1)}$ time, where $k$ is the size of a minimum vertex cover. 2. MSVC can be solved in $f(k)\cdot n^{O(1)}$ time for some computable function $f$, where $k$ is the size of a minimum clique modulator.
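To make the objective concrete, here is a minimal brute-force evaluation of the MSVC cost on a toy graph (the graph and all names are hypothetical, for illustration only; the enumeration is exponential and only viable for tiny instances):

```python
from itertools import permutations

def msvc_cost(edges, phi):
    """Cost of a bijection phi: vertex -> position in {1, ..., n}."""
    return sum(min(phi[u], phi[v]) for u, v in edges)

# Toy 4-vertex path 0-1-2-3; enumerate all n! orderings.
edges = [(0, 1), (1, 2), (2, 3)]
best = min((dict(enumerate(p)) for p in permutations(range(1, 5))),
           key=lambda phi: msvc_cost(edges, phi))
print(best, msvc_cost(edges, best))   # optimal cost here is 4
```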
We propose and analyze a class of meshfree, super-algebraically convergent methods for partial differential equations (PDEs) on surfaces, based on Fourier extensions that minimize a measure of non-smoothness (such as a Sobolev norm). Current spectral methods for surface PDEs are primarily limited to a small class of surfaces, such as subdomains of spheres, while other high-order methods typically use radial basis functions (RBFs); many of the latter are not well understood analytically for surface PDEs and are highly ill-conditioned. Our methods work by extending a surface PDE into a box-shaped domain so that differential operators applied to the extended function agree with the surface differential operators, as in the Closest Point Method. The methods provably converge super-algebraically for certain well-posed linear PDEs, and spectral convergence to machine precision has been observed numerically for a variety of problems. Our approach works on arbitrary smooth surfaces (closed or non-closed) defined by point clouds, under minimal conditions.
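As a one-dimensional illustration of the core idea (our simplification; the paper works on surfaces), the sketch below computes a Fourier extension of a non-periodic function by selecting, among all series on a larger box that interpolate the samples, the one with minimal $H^1$-type Sobolev weight:

```python
import numpy as np

# Extend f, sampled on [0, 1], to a Fourier series periodic on the larger
# box [0, T], choosing coefficients of minimal H^1-type Sobolev norm among
# all interpolants (underdetermined: more modes than samples).
T, m, n_pts = 2.0, 61, 30
x = np.linspace(0.0, 1.0, n_pts)
f = np.exp(x) * np.sin(3.0 * x)                 # smooth but non-periodic on [0, 1]

ks = np.arange(-(m // 2), m // 2 + 1)
A = np.exp(2j * np.pi * np.outer(x, ks) / T)    # Fourier basis sampled on [0, 1]
w = np.sqrt(1.0 + ks.astype(float) ** 2)        # per-mode H^1 weights (non-smoothness)
# min ||w * c||_2 subject to A c = f: rescale, take the minimal-norm solution.
d = np.linalg.lstsq(A / w, f.astype(complex), rcond=None)[0]
c = d / w
print(np.max(np.abs(A @ c - f)))                # interpolation residual
```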
Existing structural analysis methods may fail to find all hidden constraints of a system of differential-algebraic equations (DAEs) with parameters if the system is structurally unamenable for certain values of the parameters. In this paper, for polynomial systems of differential-algebraic equations, we give numerical methods that handle such cases using numerical real algebraic geometry. First, we propose an embedding method that, for a given real analytic system, constructs an equivalent system with a full-rank Jacobian matrix. Second, we introduce a witness point method that detects degeneration on all components of the constraints of such systems. Third, combining these two methods yields a numerical global structural analysis method for structurally unamenable differential-algebraic equations on all components of their constraints.
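A textbook-style illustration of the phenomenon (our own example, not taken from the paper): consider the parametric DAE

$$\dot{x} = y, \qquad 0 = x + p\,y - q(t).$$

For $p \neq 0$ the system Jacobian with respect to the leading variables $(\dot{x}, y)$,

$$J = \begin{pmatrix} 1 & -1 \\ 0 & p \end{pmatrix},$$

is nonsingular and structural analysis succeeds (the system is index 1). At $p = 0$, however, $J$ is singular: the system degenerates to index 2 with the hidden constraint $y = \dot{q}(t)$, which a structural decomposition computed for generic $p$ fails to reveal.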
We develop a general theory for optimizing the frequentist regret of sequential learning problems, in which efficient bandit and reinforcement learning algorithms are derived from unified Bayesian principles. We propose a novel optimization approach that generates "algorithmic beliefs" at each round and uses Bayesian posteriors to make decisions. The optimization objective that creates these algorithmic beliefs, which we term the "Algorithmic Information Ratio," is an intrinsic complexity measure that effectively characterizes the frequentist regret of any algorithm. To the best of our knowledge, this is the first systematic approach that makes Bayesian-type algorithms prior-free and applicable to adversarial settings in a generic and optimal manner. Moreover, the resulting algorithms are simple and often efficient to implement. As a major application, we present a novel algorithm for multi-armed bandits that achieves "best-of-all-worlds" empirical performance in stochastic, adversarial, and non-stationary environments. We also illustrate how these principles apply to linear bandits, bandit convex optimization, and reinforcement learning.
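For context, the standard Bayesian decision loop that this line of work builds on is sketched below (plain Thompson sampling for Bernoulli bandits). In the proposed method, the prior-based Beta posteriors would be replaced each round by an "algorithmic belief" produced by optimizing the Algorithmic Information Ratio; that optimization step is not sketched here and this is not the paper's algorithm.

```python
import numpy as np

def thompson_bernoulli(means, T, seed=0):
    """Baseline Bayesian decision loop (standard Thompson sampling)."""
    rng = np.random.default_rng(seed)
    K = len(means)
    a, b = np.ones(K), np.ones(K)              # Beta(1, 1) posterior per arm
    regret = 0.0
    for _ in range(T):
        arm = int(np.argmax(rng.beta(a, b)))   # sample a belief, act greedily on it
        r = float(rng.random() < means[arm])   # Bernoulli reward
        a[arm] += r
        b[arm] += 1.0 - r
        regret += max(means) - means[arm]
    return regret

print(thompson_bernoulli([0.3, 0.5, 0.7], T=5000))
```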
The advent of quantum computers, operating on entirely different physical principles and abstractions from those of classical digital computers, sets forth a new computing paradigm that can potentially result in game-changing efficiencies and computational performance. Specifically, the ability to simultaneously evolve the state of an entire quantum system leads to quantum parallelism and interference. Despite these prospects, opportunities to bring quantum computing to bear on problems of computational mechanics remain largely unexplored. In this work, we demonstrate how quantum computing can indeed be used to solve representative volume element (RVE) problems in computational homogenisation with polylogarithmic complexity of~$\mathcal{O}((\log N)^c)$, compared to~$\mathcal{O}(N^c)$ in classical computing. Thus, our quantum RVE solver attains exponential acceleration with respect to classical solvers, bringing concurrent multiscale computing closer to practicality. The proposed quantum RVE solver combines conventional algorithms, such as a fixed-point iteration for a homogeneous reference material and the Fast Fourier Transform (FFT). However, the quantum computing reformulation of these algorithms requires a fundamental paradigm shift and a complete rethinking and overhaul of the classical implementation. We employ or develop several techniques, including the Quantum Fourier Transform (QFT), quantum encoding of polynomials, classical piecewise Chebyshev approximation of functions, and an auxiliary algorithm for implementing the fixed-point iteration, and we show that an efficient implementation of RVE solvers on quantum computers is indeed possible. We additionally provide theoretical proofs and numerical evidence confirming the anticipated~$\mathcal{O}((\log N)^c)$ complexity of the proposed solver.
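For reference, the classical FFT-based fixed-point iteration with a homogeneous reference material that the quantum solver reformulates can be sketched, in a one-dimensional scalar analogue (our simplification, with an arbitrary illustrative coefficient), as:

```python
import numpy as np

# 1D periodic conductivity homogenisation via the classical fixed-point
# (Lippmann-Schwinger) iteration with a homogeneous reference medium k0 and FFTs.
n = 256
xs = np.arange(n) / n
k = 1.0 + 0.5 * np.sin(2 * np.pi * xs) ** 2   # hypothetical material coefficient
k0 = k.mean()                                 # homogeneous reference medium
E = 1.0                                       # prescribed mean gradient

e = np.full(n, E)                             # initial gradient field
for _ in range(500):
    tau_hat = np.fft.fft((k - k0) * e)        # polarization in Fourier space
    tau_hat /= k0                             # 1D Green operator: 1/k0 for xi != 0
    tau_hat[0] = 0.0                          # keep the prescribed mean gradient
    e_next = E - np.fft.ifft(tau_hat).real
    if np.max(np.abs(e_next - e)) < 1e-12:
        break
    e = e_next

# Effective coefficient vs. the exact 1D answer (harmonic mean of k).
print((k * e).mean() / E, 1.0 / np.mean(1.0 / k))
```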
We develop a distributed Block Chebyshev-Davidson algorithm for the large-scale leading eigenvalue problems that arise in spectral clustering. First, the efficiency of the Chebyshev-Davidson algorithm relies on prior knowledge of the eigenvalue spectrum, which can be expensive to estimate. This issue is alleviated by the analytic estimation of the spectrum of the Laplacian or normalized Laplacian matrices in spectral clustering, making the proposed algorithm particularly efficient in this setting. Second, to handle big data, we develop a distributed and parallel version with attractive scalability: the speedup from parallel computing is approximately $\sqrt{p}$, where $p$ denotes the number of processes. Numerical results demonstrate its efficiency in spectral clustering and its scalability advantage over existing eigensolvers used for spectral clustering in parallel computing environments.
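To illustrate why analytic spectrum bounds matter, here is a minimal Chebyshev filtering step (a sketch of the filter only, not the distributed algorithm): the filter damps eigencomponents in an unwanted interval $[a, b]$, which for a normalized graph Laplacian can be placed inside the analytically known range $[0, 2]$ with no estimation at all.

```python
import numpy as np

def cheb_filter(A, V, deg, a, b):
    """Apply a degree-`deg` Chebyshev filter to the block V, damping
    eigencomponents of A inside [a, b] and amplifying those below a."""
    e, c = (b - a) / 2.0, (b + a) / 2.0       # half-width and center of [a, b]
    V0, V1 = V, (A @ V - c * V) / e           # T_0 and T_1 of the shifted operator
    for _ in range(deg - 1):                  # three-term Chebyshev recurrence
        V0, V1 = V1, 2.0 * (A @ V1 - c * V1) / e - V0
    return V1

# Usage: for a normalized Laplacian L (spectrum in [0, 2]), amplify the
# smallest eigenvalues sought in spectral clustering by damping, say, [0.5, 2]:
# V_filtered = cheb_filter(L, V, deg=20, a=0.5, b=2.0)
```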
We present Advancing Front Mapping (AFM), a provably robust algorithm for computing surface mappings to simple base domains. Given an input mesh and a convex or star-shaped target domain, AFM installs a (possibly refined) version of the input connectivity into the target shape, generating a piecewise-linear mapping between them. The algorithm is inspired by the advancing-front meshing paradigm, revisited to operate on two embeddings at once, thus becoming a tool for compatible mesh generation. AFM extends the capabilities of existing robust approaches, such as Tutte or Progressive Embedding, providing the same theoretical guarantees of injectivity while introducing two key advantages: support for a broader set of target domains (star-shaped polygons) and local mesh refinement, which automatically opens the space of solutions when a valid mapping to the target domain does not exist. AFM relies solely on two topological operators (split and flip) and on the computation of segment intersections, making it possible to compute provably injective mappings without solving any numerical problem. This makes the algorithm predictable and easy to implement, debug, and deploy. We validated the capabilities of AFM extensively, executing more than one billion advancing-front moves on 36K mapping tasks, demonstrating that our theoretical guarantees carry over to a robust and practical implementation.
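The segment intersection test such an algorithm needs reduces to sign-of-cross-product orientation queries; a minimal version (our sketch, exact and hence robust when run on integer or rational coordinates, and handling proper crossings only, not collinear or touching cases) looks like:

```python
def orient(a, b, c):
    """Sign of (b - a) x (c - a): > 0 left turn, < 0 right turn, 0 collinear.
    Exact for int or Fraction coordinates, which is what makes it robust."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def segments_cross(p, q, r, s):
    """True iff open segments pq and rs properly intersect."""
    d1, d2 = orient(r, s, p), orient(r, s, q)
    d3, d4 = orient(p, q, r), orient(p, q, s)
    return d1 * d2 < 0 and d3 * d4 < 0

print(segments_cross((0, 0), (2, 2), (0, 2), (2, 0)))  # True: the diagonals cross
```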
Car detection is an important task that serves as a crucial prerequisite for many automated driving functions. Large variations in lighting/weather conditions and vehicle densities make it difficult for existing car detection algorithms to meet the highly accurate perception demanded for safety: color information becomes unstable or limited, which impedes the extraction of meaningful, discriminative features of cars. In this work, we present a novel learning-based car detection method that leverages trichromatic linear polarization as an additional cue to disambiguate such challenging cases. A key observation is that polarization, as a characteristic of the light wave, robustly describes intrinsic physical properties of scene objects under varied imaging conditions and is strongly linked to the materials of cars (e.g., metal and glass) and their surrounding environment (e.g., soil and trees), thereby providing reliable, discriminative features for robust car detection in challenging scenes. To exploit polarization cues, we first construct a pixel-aligned RGB-Polarization car detection dataset, which we then use to train a novel multimodal fusion network. Our car detection network dynamically integrates RGB and polarization features in a request-and-complement manner and explores the intrinsic material properties of cars across all learning samples. We extensively validate our method and demonstrate that it outperforms state-of-the-art detection methods. Experimental results show that polarization is a powerful cue for car detection.
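The paper's request-and-complement fusion is not specified in detail in this abstract; a generic gated two-stream fusion block of the kind such networks commonly use (our sketch, with hypothetical names and structure) could look like:

```python
import torch
import torch.nn as nn

class GatedRGBPolFusion(nn.Module):
    """Sketch of per-pixel gated fusion of RGB and polarization feature maps.
    The gate 'requests' the more reliable modality at each location and
    'complements' it with the other; names and structure are illustrative."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid())

    def forward(self, f_rgb, f_pol):
        g = self.gate(torch.cat([f_rgb, f_pol], dim=1))  # per-pixel, per-channel weights
        return g * f_rgb + (1.0 - g) * f_pol

# fused = GatedRGBPolFusion(64)(rgb_feats, pol_feats)  # both of shape (B, 64, H, W)
```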
Humans perceive the world by concurrently processing and fusing high-dimensional inputs from multiple modalities such as vision and audio. Machine perception models, in stark contrast, are typically modality-specific and optimised for unimodal benchmarks; hence, late-stage fusion of final representations or predictions from each modality (`late fusion') remains the dominant paradigm for multimodal video classification. Instead, we introduce a novel transformer-based architecture that uses `fusion bottlenecks' for modality fusion at multiple layers. Compared to traditional pairwise self-attention, our model forces information between different modalities to pass through a small number of bottleneck latents, requiring the model to collate and condense the most relevant information in each modality and share only what is necessary. We find that this strategy improves fusion performance while also reducing computational cost. We conduct thorough ablation studies and achieve state-of-the-art results on multiple audio-visual classification benchmarks, including AudioSet, Epic-Kitchens, and VGGSound. All code and models will be released.
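A minimal sketch of one fusion-bottleneck layer following the idea described above (our simplification: plain multi-head attention only, no MLPs or layer norms; each modality self-attends jointly with the shared bottleneck tokens, and the two bottleneck updates are averaged):

```python
import torch
import torch.nn as nn

class BottleneckFusionLayer(nn.Module):
    """One fusion layer: audio and video exchange information only through
    a small set of shared bottleneck tokens z."""
    def __init__(self, dim, heads):
        super().__init__()
        self.attn_a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_v = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, audio, video, z):
        a_in = torch.cat([audio, z], dim=1)       # audio tokens + bottleneck tokens
        a_out, _ = self.attn_a(a_in, a_in, a_in)
        audio, z_a = a_out[:, :audio.size(1)], a_out[:, audio.size(1):]
        v_in = torch.cat([video, z], dim=1)       # video tokens + bottleneck tokens
        v_out, _ = self.attn_v(v_in, v_in, v_in)
        video, z_v = v_out[:, :video.size(1)], v_out[:, video.size(1):]
        return audio, video, 0.5 * (z_a + z_v)    # average the bottleneck updates

# layer = BottleneckFusionLayer(dim=256, heads=8)
# audio, video, z = layer(audio, video, z)  # z: (B, 4, 256), a few bottleneck tokens
```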