Histograms, i.e., piece-wise constant approximations, are a popular tool used to represent data distributions. Traditionally, the difference between the histogram and the underlying distribution (i.e., the approximation error) is measured using the $L_p$ norm, which sums the differences between the two functions over all items in the domain. Although useful in many applications, the drawback of this error measure is that it treats approximation errors of all items in the same way, irrespective of whether the mass of an item is important for the downstream application that uses the approximation. As a result, even relatively simple distributions cannot be approximated by succinct histograms without incurring large error. In this paper, we address this issue by adapting the definition of approximation so that only the errors of the items that belong to the support of the distribution are considered. Under this definition, we develop efficient 1-pass and 2-pass streaming algorithms that compute near-optimal histograms in sub-linear space. We also present lower bounds on the space complexity of this problem. Surprisingly, under this notion of error, there is an exponential gap in the space complexity of 1-pass and 2-pass streaming algorithms. Finally, we demonstrate the utility of our algorithms on a collection of real and synthetic data sets.
Offline estimators are often inadequate for real-time applications. Nevertheless, many online estimators are found by sacrificing some statistical efficiency. This paper presents a general framework to understand and construct efficient nonparametric estimators for online problems. Statistically, we choose long-run variance as an exemplary estimand and derive the first set of sufficient conditions for O(1)-time or O(1)-space update, which allows methodological generation of estimators. Our asymptotic theory shows that the generated estimators dominate existing alternatives. Computationally, we introduce mini-batch estimation to accelerate online estimators for real-time applications. Implementation issues such as automatic optimal parameters selection are discussed. Practically, we demonstrate how to use our framework with recent development in change point detection, causal inference, and stochastic approximation. We also illustrate the strength of our estimators in some classical problems such as Markov chain Monte Carlo convergence diagnosis and confidence interval construction.
Bilevel optimization, the problem of minimizing a value function which involves the arg-minimum of another function, appears in many areas of machine learning. In a large scale empirical risk minimization setting where the number of samples is huge, it is crucial to develop stochastic methods, which only use a few samples at a time to progress. However, computing the gradient of the value function involves solving a linear system, which makes it difficult to derive unbiased stochastic estimates. To overcome this problem we introduce a novel framework, in which the solution of the inner problem, the solution of the linear system, and the main variable evolve at the same time. These directions are written as a sum, making it straightforward to derive unbiased estimates. The simplicity of our approach allows us to develop global variance reduction algorithms, where the dynamics of all variables is subject to variance reduction. We demonstrate that SABA, an adaptation of the celebrated SAGA algorithm in our framework, has $O(\frac1T)$ convergence rate, and that it achieves linear convergence under Polyak-Lojasciewicz assumption. This is the first stochastic algorithm for bilevel optimization that verifies either of these properties. Numerical experiments validate the usefulness of our method.
In this work, we give a unifying view of locality in four settings: distributed algorithms, sequential greedy algorithms, dynamic algorithms, and online algorithms. We introduce a new model of computing, called the online-LOCAL model: the adversary reveals the nodes of the input graph one by one, in the same way as in classical online algorithms, but for each new node the algorithm can also inspect its radius-$T$ neighborhood before choosing the output. Instead of looking ahead in time, we have the power of looking around in space. We compare the online-LOCAL model with three other models: the LOCAL model of distributed computing, where each node produces its output based on its radius-$T$ neighborhood, its sequential counterpart SLOCAL, and the dynamic-LOCAL model, where changes in the dynamic input graph only influence the radius-$T$ neighborhood of the point of change. SLOCAL and dynamic-LOCAL models are sandwiched between LOCAL and online-LOCAL models, with LOCAL being the weakest and online-LOCAL the strongest model. In this work, we seek to answer the following question: is the online-LOCAL model strictly stronger than the LOCAL model when we look at graph algorithms for solving locally checkable labeling problems (LCLs)? First, we show that for LCL problems in paths, cycles, and rooted trees, all four models are roughly equivalent: the locality of any LCL problem falls in the same broad class - $O(\log^* n)$, $\Theta(\log n)$, or $n^{\Theta(1)}$ - in all four models. In particular, prior work on the LOCAL model directly generalizes to all four models. Second, we show that this equivalence does not hold in two-dimensional grids. We show that the locality of the $3$-coloring problem is $O(\log n)$ in the online-LOCAL model, while it is known to be $\Omega(\sqrt{n})$ in the LOCAL model.
We develop an optimization-based algorithm for parametric model order reduction (PMOR) of linear time-invariant dynamical systems. Our method aims at minimizing the $\mathcal{H}_\infty \otimes \mathcal{L}_\infty$ approximation error in the frequency and parameter domain by an optimization of the reduced order model (ROM) matrices. State-of-the-art PMOR methods often compute several nonparametric ROMs for different parameter samples, which are then combined to a single parametric ROM. However, these parametric ROMs can have a low accuracy between the utilized sample points. In contrast, our optimization-based PMOR method minimizes the approximation error across the entire parameter domain. Moreover, due to our flexible approach of optimizing the system matrices directly, we can enforce favorable features such as a port-Hamiltonian structure in our ROMs across the entire parameter domain. Our method is an extension of the recently developed SOBMOR-algorithm to parametric systems. We extend both the ROM parameterization and the adaptive sampling procedure to the parametric case. Several numerical examples demonstrate the effectiveness and high accuracy of our method in a comparison with other PMOR methods.
The A* algorithm is commonly used to solve NP-hard combinatorial optimization problems. When provided with an accurate heuristic function, A* can solve such problems in time complexity that is polynomial in the solution depth. This fact implies that accurate heuristic approximation for many such problems is also NP-hard. In this context, we examine a line of recent publications that propose the use of deep neural networks for heuristic approximation. We assert that these works suffer from inherent scalability limitations since -- under the assumption that P$\ne$NP -- such approaches result in either (a) network sizes that scale exponentially in the instance sizes or (b) heuristic approximation accuracy that scales inversely with the instance sizes. Our claim is supported by experimental results for three representative NP-hard search problems that show that fitting deep neural networks accurately to heuristic functions necessitates network sizes that scale exponentially with the instance size.
We study a fundamental model of online preference aggregation, where an algorithm maintains an ordered list of $n$ elements. An input is a stream of preferred sets $R_1, R_2, \dots, R_t, \dots$. Upon seeing $R_t$ and without knowledge of any future sets, an algorithm has to rerank elements (change the list ordering), so that at least one element of $R_t$ is found near the list front. The incurred cost is a sum of the list update costs (the number of swaps of neighboring list elements) and access costs (position of the first element of $R_t$ on the list). This scenario occurs naturally in applications such as ordering items in an online shop using aggregated preferences of shop customers. The theoretical underpinning of this problem is known as Min-Sum Set Cover. Unlike previous work (Fotakis et al., ICALP 2020, NIPS 2020) that mostly studied the performance of an online algorithm ALG against the static optimal solution (a single optimal list ordering), in this paper, we study an arguably harder variant where the benchmark is the provably stronger optimal dynamic solution OPT (that may also modify the list ordering). In terms of an online shop, this means that the aggregated preferences of its user base evolve with time. We construct a computationally efficient randomized algorithm whose competitive ratio (ALG-to-OPT cost ratio) is $O(r^2)$ and prove the existence of a deterministic $O(r^4)$-competitive algorithm. Here, $r$ is the maximum cardinality of sets $R_t$. This is the first algorithm whose ratio does not depend on $n$: the previously best algorithm for this problem was $O(r^{3/2} \cdot \sqrt{n})$-competitive and $\Omega(r)$ is a lower bound on the performance of any deterministic online algorithm.
Parallel Markov Chain Monte Carlo (pMCMC) algorithms generate clouds of proposals at each step to efficiently resolve a target probability distribution. We build a rigorous foundational framework for pMCMC algorithms that situates these methods within a unified `extended phase space' measure-theoretic formalism. Drawing on our recent work that provides a comprehensive theory for reversible single proposal methods, we herein derive general criteria for multiproposal acceptance mechanisms which yield unbiased chains on general state spaces. Our formulation encompasses a variety of methodologies, including proposal cloud resampling and Hamiltonian methods, while providing a basis for the derivation of novel algorithms. In particular, we obtain a top-down picture for a class of methods arising from `conditionally independent' proposal structures. As an immediate application, we identify several new algorithms including a multiproposal version of the popular preconditioned Crank-Nicolson (pCN) sampler suitable for high- and infinite-dimensional target measures which are absolutely continuous with respect to a Gaussian base measure. To supplement our theoretical results, we carry out a selection of numerical case studies that evaluate the efficacy of these novel algorithms. First, noting that the true potential of pMCMC algorithms arises from their natural parallelizability, we provide a limited parallelization study using TensorFlow and a graphics processing unit to scale pMCMC algorithms that leverage as many as 100k proposals at each step. Second, we use our multiproposal pCN algorithm (mpCN) to resolve a selection of problems in Bayesian statistical inversion for partial differential equations motivated by fluid measurement. These examples provide preliminary evidence of the efficacy of mpCN for high-dimensional target distributions featuring complex geometries and multimodal structures.
Propositional model counting, or #SAT, is the problem of computing the number of satisfying assignments of a Boolean formula. Many problems from different application areas, including many discrete probabilistic inference problems, can be translated into model counting problems to be solved by #SAT solvers. Exact #SAT solvers, however, are often not scalable to industrial size instances. In this paper, we present Neuro#, an approach for learning branching heuristics to improve the performance of exact #SAT solvers on instances from a given family of problems. We experimentally show that our method reduces the step count on similarly distributed held-out instances and generalizes to much larger instances from the same problem family. It is able to achieve these results on a number of different problem families having very different structures. In addition to step count improvements, Neuro# can also achieve orders of magnitude wall-clock speedups over the vanilla solver on larger instances in some problem families, despite the runtime overhead of querying the model.
We investigate trade-offs in static and dynamic evaluation of hierarchical queries with arbitrary free variables. In the static setting, the trade-off is between the time to partially compute the query result and the delay needed to enumerate its tuples. In the dynamic setting, we additionally consider the time needed to update the query result under single-tuple inserts or deletes to the database. Our approach observes the degree of values in the database and uses different computation and maintenance strategies for high-degree (heavy) and low-degree (light) values. For the latter it partially computes the result, while for the former it computes enough information to allow for on-the-fly enumeration. We define the preprocessing time, the update time, and the enumeration delay as functions of the light/heavy threshold. By appropriately choosing this threshold, our approach recovers a number of prior results when restricted to hierarchical queries. We show that for a restricted class of hierarchical queries, our approach achieves worst-case optimal update time and enumeration delay conditioned on the Online Matrix-Vector Multiplication Conjecture.
In this paper, we present optimal error estimates of the local discontinuous Galerkin method with generalized numerical fluxes for one-dimensional nonlinear convection-diffusion systems. The upwind-biased flux with adjustable numerical viscosity for the convective term is chosen based on the local characteristic decomposition, which is helpful in resolving discontinuities of degenerate parabolic equations without enforcing any limiting procedure. For the diffusive term, a pair of generalized alternating fluxes are considered. By constructing and analyzing generalized Gauss-Radau projections with respect to different convective or diffusive terms, we derive optimal error estimates for nonlinear convection-diffusion systems with the symmetrizable flux Jacobian and fully nonlinear diffusive problems. Numerical experiments including long time simulations, different boundary conditions and degenerate equations with discontinuous initial data are provided to demonstrate the sharpness of theoretical results.