Naively storing a counter up to value $n$ would require $\Omega(\log n)$ bits of memory. Nelson and Yu [NY22], following work of [Morris78], showed that if the query answers need only be $(1+\epsilon)$-approximate with probability at least $1 - \delta$, then $O(\log\log n + \log\log(1/\delta) + \log(1/\epsilon))$ bits suffice, and in fact this bound is tight. Morris' original motivation for studying this problem though, as well as modern applications, require not only maintaining one counter, but rather $k$ counters for $k$ large. This motivates the following question: for $k$ large, can $k$ counters be simultaneously maintained using asymptotically less memory than $k$ times the cost of an individual counter? That is to say, does this problem benefit from an improved {\it amortized} space complexity bound? We answer this question in the negative. Specifically, we prove a lower bound for nearly the full range of parameters showing that, in terms of memory usage, there is no asymptotic benefit possible via amortization when storing multiple counters. Our main proof utilizes a certain notion of "information cost" recently introduced by Braverman, Garg and Woodruff in FOCS 2020 to prove lower bounds for streaming algorithms.
As the accuracy of machine learning models increases at a fast rate, so does their demand for energy and compute resources. On a low level, the major part of these resources is consumed by data movement between different memory units. Modern hardware architectures contain a form of fast memory (e.g., cache, registers), which is small, and a slow memory (e.g., DRAM), which is larger but expensive to access. We can only process data that is stored in fast memory, which incurs data movement (input/output-operations, or I/Os) between the two units. In this paper, we provide a rigorous theoretical analysis of the I/Os needed in sparse feedforward neural network (FFNN) inference. We establish bounds that determine the optimal number of I/Os up to a factor of 2 and present a method that uses a number of I/Os within that range. Much of the I/O-complexity is determined by a few high-level properties of the FFNN (number of inputs, outputs, neurons, and connections), but if we want to get closer to the exact lower bound, the instance-specific sparsity patterns need to be considered. Departing from the 2-optimal computation strategy, we show how to reduce the number of I/Os further with simulated annealing. Complementing this result, we provide an algorithm that constructively generates networks with maximum I/O-efficiency for inference. We test the algorithms and empirically verify our theoretical and algorithmic contributions. In our experiments on real hardware we observe speedups of up to 45$\times$ relative to the standard way of performing inference.
Latent factor model estimation typically relies on either using domain knowledge to manually pick several observed covariates as factor proxies, or purely conducting multivariate analysis such as principal component analysis. However, the former approach may suffer from the bias while the latter can not incorporate additional information. We propose to bridge these two approaches while allowing the number of factor proxies to diverge, and hence make the latent factor model estimation robust, flexible, and statistically more accurate. As a bonus, the number of factors is also allowed to grow. At the heart of our method is a penalized reduced rank regression to combine information. To further deal with heavy-tailed data, a computationally attractive penalized robust reduced rank regression method is proposed. We establish faster rates of convergence compared with the benchmark. Extensive simulations and real examples are used to illustrate the advantages.
This paper deals with the derivation of Non-Intrusive Reduced Basis (NIRB) techniques for sensitivity analysis, more specifically the direct and adjoint state methods. For highly complex parametric problems, these two approaches may become too costly. To reduce computational times, Proper Orthogonal Decomposition (POD) and Reduced Basis Methods (RBMs) have already been investigated. The majority of these algorithms are however intrusive in the sense that the High-Fidelity (HF) code must be modified. To address this issue, non-intrusive strategies are employed. The NIRB two-grid method uses the HF code solely as a ``black-box'', requiring no code modification. Like other RBMs, it is based on an offline-online decomposition. The offline stage is time-consuming, but it is only executed once, whereas the online stage is significantly less expensive than an HF evaluation. In this paper, we propose new NIRB two-grid algorithms for both the direct and adjoint state methods. On a classical model problem, the heat equation, we prove that HF evaluations of sensitivities reach an optimal convergence rate in $L^{\infty}(0,T;H^1(\Omega))$, and then establish that these rates are recovered by the proposed NIRB approximations. These results are supported by numerical simulations. We then numerically demonstrate that a further deterministic post-treatment can be applied to the direct method. This further reduces computational costs of the online step while only computing a coarse solution of the initial problem. All numerical results are run with the model problem as well as a more complex problem, namely the Brusselator system.
The orthogonality dimension of a graph $G$ over $\mathbb{R}$ is the smallest integer $k$ for which one can assign a nonzero $k$-dimensional real vector to each vertex of $G$, such that every two adjacent vertices receive orthogonal vectors. We prove that for every sufficiently large integer $k$, it is $\mathsf{NP}$-hard to decide whether the orthogonality dimension of a given graph over $\mathbb{R}$ is at most $k$ or at least $2^{(1-o(1)) \cdot k/2}$. We further prove such hardness results for the orthogonality dimension over finite fields as well as for the closely related minrank parameter, which is motivated by the index coding problem in information theory. This in particular implies that it is $\mathsf{NP}$-hard to approximate these graph quantities to within any constant factor. Previously, the hardness of approximation was known to hold either assuming certain variants of the Unique Games Conjecture or for approximation factors smaller than $3/2$. The proofs involve the concept of line digraphs and bounds on their orthogonality dimension and on the minrank of their complement.
The problem of approximating the Pareto front of a multiobjective optimization problem can be reformulated as the problem of finding a set that maximizes the hypervolume indicator. This paper establishes the analytical expression of the Hessian matrix of the mapping from a (fixed size) collection of $n$ points in the $d$-dimensional decision space (or $m$ dimensional objective space) to the scalar hypervolume indicator value. To define the Hessian matrix, the input set is vectorized, and the matrix is derived by analytical differentiation of the mapping from a vectorized set to the hypervolume indicator. The Hessian matrix plays a crucial role in second-order methods, such as the Newton-Raphson optimization method, and it can be used for the verification of local optimal sets. So far, the full analytical expression was only established and analyzed for the relatively simple bi-objective case. This paper will derive the full expression for arbitrary dimensions ($m\geq2$ objective functions). For the practically important three-dimensional case, we also provide an asymptotically efficient algorithm with time complexity in $O(n\log n)$ for the exact computation of the Hessian Matrix' non-zero entries. We establish a sharp bound of $12m-6$ for the number of non-zero entries. Also, for the general $m$-dimensional case, a compact recursive analytical expression is established, and its algorithmic implementation is discussed. Also, for the general case, some sparsity results can be established; these results are implied by the recursive expression. To validate and illustrate the analytically derived algorithms and results, we provide a few numerical examples using Python and Mathematica implementations. Open-source implementations of the algorithms and testing data are made available as a supplement to this paper.
It remains an open problem to find the optimal configuration of phase shifts under the discrete constraint for intelligent reflecting surface (IRS) in polynomial time. The above problem is widely believed to be difficult because it is not linked to any known combinatorial problems that can be solved efficiently. The branch-and-bound algorithms and the approximation algorithms constitute the best results in this area. Nevertheless, this work shows that the global optimum can actually be reached in linear time on average in terms of the number of reflective elements (REs) of IRS. The main idea is to geometrically interpret the discrete beamforming problem as choosing the optimal point on the unit circle. Although the number of possible combinations of phase shifts grows exponentially with the number of REs, it turns out that there are only a linear number of circular arcs that possibly contain the optimal point. Furthermore, the proposed algorithm can be viewed as a novel approach to a special case of the discrete quadratic program (QP).
We give a simple combinatorial algorithm to deterministically approximately count the number of satisfying assignments of general constraint satisfaction problems (CSPs). Suppose that the CSP has domain size $q=O(1)$, each constraint contains at most $k=O(1)$ variables, shares variables with at most $\Delta=O(1)$ constraints, and is violated with probability at most $p$ by a uniform random assignment. The algorithm returns in polynomial time in an improved local lemma regime: \[ q^2\cdot k\cdot p\cdot\Delta^5\le C_0\quad\text{for a suitably small absolute constant }C_0. \] Here the key term $\Delta^5$ improves the previously best known $\Delta^7$ for general CSPs [JPV21b] and $\Delta^{5.714}$ for the special case of $k$-CNF [JPV21a, HSW21] . Our deterministic counting algorithm is a derandomization of the very recent fast sampling algorithm in [HWY22]. It departs substantially from all previous deterministic counting Lov\'{a}sz local lemma algorithms which relied on linear programming, and gives a deterministic approximate counting algorithm that straightforwardly derandomizes a fast sampling algorithm, hence unifying the fast sampling and deterministic approximate counting in the same algorithmic framework. To obtain the improved regime, in our analysis we develop a refinement of the $\{2,3\}$-trees that were used in the previous analyses of counting/sampling LLL. Similar techniques can be applied to the previous LP-based algorithms to obtain the same improved regime and may be of independent interests.
Variance reduction techniques such as SPIDER/SARAH/STORM have been extensively studied to improve the convergence rates of stochastic non-convex optimization, which usually maintain and update a sequence of estimators for a single function across iterations. What if we need to track multiple functional mappings across iterations but only with access to stochastic samples of $\mathcal{O}(1)$ functional mappings at each iteration? There is an important application in solving an emerging family of coupled compositional optimization problems in the form of $\sum_{i=1}^m f_i(g_i(\mathbf{w}))$, where $g_i$ is accessible through a stochastic oracle. The key issue is to track and estimate a sequence of $\mathbf g(\mathbf{w})=(g_1(\mathbf{w}), \ldots, g_m(\mathbf{w}))$ across iterations, where $\mathbf g(\mathbf{w})$ has $m$ blocks and it is only allowed to probe $\mathcal{O}(1)$ blocks to attain their stochastic values and Jacobians. To improve the complexity for solving these problems, we propose a novel stochastic method named Multi-block-Single-probe Variance Reduced (MSVR) estimator to track the sequence of $\mathbf g(\mathbf{w})$. It is inspired by STORM but introduces a customized error correction term to alleviate the noise not only in stochastic samples for the selected blocks but also in those blocks that are not sampled. With the help of the MSVR estimator, we develop several algorithms for solving the aforementioned compositional problems with improved complexities across a spectrum of settings with non-convex/convex/strongly convex/Polyak-{\L}ojasiewicz (PL) objectives. Our results improve upon prior ones in several aspects, including the order of sample complexities and dependence on the strong convexity parameter. Empirical studies on multi-task deep AUC maximization demonstrate the better performance of using the new estimator.
We propose a well-posed Maxwell-type boundary condition for the linear moment system in half-space. As a reduction of the Boltzmann equation, the moment equations are available to model Knudsen layers near a solid wall, where proper boundary conditions should be prescribed. Utilizing an orthogonal decomposition, we separate the part with a damping term from the system and then impose a new class of Maxwell-type boundary conditions on it. Due to the new boundary condition, we show that the half-space boundary value problem admits a unique solution with explicit expressions. Instantly, the well-posedness of the linear moment system is achieved. We apply the procedure to classical flow problems with the Shakhov collision term, such as the velocity slip and temperature jump problems. The model can capture Knudsen layers with very high accuracy using only a few moments.
Several clustering methods (e.g., Normalized Cut and Ratio Cut) divide the Min Cut cost function by a cluster dependent factor (e.g., the size or the degree of the clusters), in order to yield a more balanced partitioning. We, instead, investigate adding such regularizations to the original cost function. We first consider the case where the regularization term is the sum of the squared size of the clusters, and then generalize it to adaptive regularization of the pairwise similarities. This leads to shifting (adaptively) the pairwise similarities which might make some of them negative. We then study the connection of this method to Correlation Clustering and then propose an efficient local search optimization algorithm with fast theoretical convergence rate to solve the new clustering problem. In the following, we investigate the shift of pairwise similarities on some common clustering methods, and finally, we demonstrate the superior performance of the method by extensive experiments on different datasets.