In Euclidean Uniform Facility Location, the input is a set of clients in $\mathbb{R}^d$ and the goal is to place facilities to serve them, so as to minimize the total cost of opening facilities plus connecting the clients. We study the classical setting of dynamic geometric streams, where the clients are presented as a sequence of insertions and deletions of points in the grid $\{1,\ldots,\Delta\}^d$, and we focus on the \emph{high-dimensional regime}, where the algorithm's space complexity must be polynomial (and certainly not exponential) in $d\cdot\log\Delta$. We present a new algorithmic framework, based on importance sampling from the stream, for $O(1)$-approximation of the optimal cost using only $\mathrm{poly}(d\cdot\log\Delta)$ space. This framework is easy to implement in two passes, one for sampling points and the other for estimating their contribution. Over random-order streams, we can extend this to a one-pass algorithm by using the two halves of the stream separately. Our main result, for arbitrary-order streams, computes an $O(d^{1.5})$-approximation in one pass by using the new framework but combining the two passes differently. This improves upon previous algorithms that either need space exponential in $d$ or only guarantee $O(d\cdot\log^2\Delta)$-approximation, and therefore our algorithms for high-dimensional streams are the first to avoid the $O(\log\Delta)$ factor in the approximation that is inherent to the widely-used quadtree decomposition. Our improvement is achieved by employing a geometric hashing scheme that maps points in $\mathbb{R}^d$ into buckets of bounded diameter, with the key property that every point set of small-enough diameter is hashed into at most $\mathrm{poly}(d)$ distinct buckets. We complement our results by showing that $1.085$-approximation requires space exponential in $\mathrm{poly}(d\cdot\log\Delta)$, even for insertion-only streams.
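For intuition about the hashing ingredient, the sketch below shows the naive baseline that the paper's scheme improves upon (our illustration, not the paper's construction): an axis-aligned grid with cells of side $w/\sqrt{d}$ yields buckets of diameter at most $w$, but a point set of diameter much smaller than $w$ can still straddle up to $2^d$ cells near a grid corner, whereas the paper's hashing guarantees at most $\mathrm{poly}(d)$ distinct buckets.

```python
import numpy as np

def grid_hash(points, w):
    """Naive geometric hash: bucket points into axis-aligned grid cells of
    side w / sqrt(d), so that every bucket has diameter at most w. A cluster
    of tiny diameter can still touch 2^d cells near a grid corner, which is
    exactly the blow-up the paper's poly(d) hashing scheme avoids."""
    d = points.shape[1]
    side = w / np.sqrt(d)
    return [tuple(c) for c in np.floor(points / side).astype(int)]
```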
Reinforcement learning (RL) and trajectory optimization (TO) offer strongly complementary advantages. On one hand, RL approaches can learn global control policies directly from data, but generally require large sample sizes to converge properly towards feasible policies. On the other hand, TO methods are able to exploit gradient-based information extracted from simulators to quickly converge towards a locally optimal control trajectory, which is only valid within the vicinity of the solution. Over the past decade, several approaches have aimed to adequately combine the two classes of methods in order to obtain the best of both worlds. Following this line of research, we propose several improvements on top of these approaches to learn global control policies more quickly, notably by leveraging the sensitivity information provided by TO methods via Sobolev learning, and by using augmented Lagrangian techniques to enforce consensus between TO and policy learning. We evaluate the benefits of these improvements on various classical robotics tasks through comparison with existing approaches from the literature.
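To make the Sobolev-learning ingredient concrete, here is a minimal PyTorch sketch of a loss that fits a policy to both the optimal actions and their state-sensitivities (Jacobians) returned by a TO solver. The network architecture, the dimensions, and the weight `lam` are placeholder assumptions of ours, not the paper's implementation.

```python
import torch

# Hypothetical policy network mapping 4-dim states to 2-dim controls.
policy = torch.nn.Sequential(
    torch.nn.Linear(4, 64), torch.nn.Tanh(), torch.nn.Linear(64, 2))

def sobolev_loss(x, u_star, J_star, lam=1.0):
    """Sobolev learning: match TO actions u* and sensitivities J* = du*/dx."""
    x = x.requires_grad_(True)
    u = policy(x)
    # Per-sample Jacobian of the policy output w.r.t. the input states.
    J = torch.stack([
        torch.autograd.grad(u[:, i].sum(), x, create_graph=True)[0]
        for i in range(u.shape[1])
    ], dim=1)                                  # shape (batch, 2, 4)
    return ((u - u_star) ** 2).mean() + lam * ((J - J_star) ** 2).mean()
```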
Recently, high-dimensional vector autoregressive (VAR) models have attracted a lot of interest, due to novel applications in the health, engineering, and social sciences. The presence of temporal dependence poses additional challenges to the theory of penalized estimation techniques widely used in the analysis of their iid counterparts. However, recent work (e.g., [Basu and Michailidis, 2015; Kock and Callot, 2015]) has established optimal consistency of $\ell_1$-LASSO regularized estimates applied to models involving high-dimensional stable, Gaussian processes. The only price paid for temporal dependence is an extra multiplicative factor that equals 1 for independent and identically distributed (iid) data. Further, [Wong et al., 2020] extended these results to heavy-tailed VARs that exhibit "$\beta$-mixing" dependence, but the rates are sub-optimal, while the extra factor is intractable. This paper improves these results in two important directions: (i) We establish optimal consistency rates and corresponding finite-sample bounds for the underlying model parameters that match those for iid data, modulo a price for temporal dependence that is easy to interpret and equals 1 for iid data. (ii) We incorporate more general penalties in estimation (which, unlike the $\ell_1$ norm, need not be decomposable) to induce general sparsity patterns. The key technical tool employed is a novel, easy-to-use concentration bound for heavy-tailed linear processes, which does not rely on "mixing" notions and gives tighter bounds.
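For concreteness, the following is a minimal sketch of the $\ell_1$-regularized VAR(1) estimator that such consistency results concern: each row of the transition matrix is fit by a separate Lasso regression of $X_t$ on $X_{t-1}$. This is our illustration of the basic estimator only; the paper's contribution covers more general, possibly non-decomposable penalties.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_var1(X, lam):
    """l1-penalized VAR(1) estimation: X has shape (T, p); returns the
    estimated p x p transition matrix A_hat, fit row by row."""
    Y, Z = X[1:], X[:-1]                    # regress X_t on X_{t-1}
    p = X.shape[1]
    A_hat = np.zeros((p, p))
    for j in range(p):
        A_hat[j] = Lasso(alpha=lam, fit_intercept=False).fit(Z, Y[:, j]).coef_
    return A_hat
```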
Node clustering is a powerful tool in the analysis of networks. We introduce DIGRAC, a graph neural network framework that obtains node embeddings for directed networks in a self-supervised manner, including a novel probabilistic imbalance loss, which can be used for network clustering. We propose directed flow imbalance measures, which are tightly related to directionality, to reveal clusters in the network even when there is no density difference between clusters. In contrast to standard approaches in the literature, directionality is not treated as a nuisance here, but rather contains the main signal. DIGRAC optimizes directed flow imbalance for clustering without requiring label supervision, unlike existing graph neural network methods, and can naturally incorporate node features, unlike existing spectral methods. Extensive experimental results on synthetic data, in the form of directed stochastic block models, and real-world data at different scales demonstrate that our method attains state-of-the-art results on directed graph clustering when compared against 10 state-of-the-art methods from the literature, for a wide range of noise and sparsity levels, graph structures, and topologies, and even outperforms supervised methods.
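As a rough illustration of the kind of objective involved (our sketch; the paper's probabilistic imbalance loss may be normalized differently), one can score a soft clustering by how asymmetric the flow between each pair of clusters is:

```python
import numpy as np

def pairwise_flow_imbalance(A, P, eps=1e-12):
    """A: directed adjacency matrix (n x n); P: soft cluster assignments
    (n x K). W[k, l] is the expected edge volume flowing from cluster k to
    cluster l; the score averages |W - W^T| / (W + W^T) over cluster pairs,
    so it is large when flows between clusters are strongly one-directional."""
    W = P.T @ A @ P
    imbalance = np.abs(W - W.T) / (W + W.T + eps)
    iu = np.triu_indices(W.shape[0], k=1)
    return imbalance[iu].mean()
```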
We study two problems of private matrix multiplication over a distributed computing system consisting of a master node and multiple servers that collectively store a family of public matrices using Maximum-Distance-Separable (MDS) codes. In the first problem, Private and Secure Matrix Multiplication from Colluding servers (MDS-C-PSMM), the master intends to compute the product of its confidential matrix $\mathbf{A}$ with a target matrix stored on the servers, without revealing any information about $\mathbf{A}$ or the index of the target matrix to a set of colluding servers. In the second problem, Fully Private Matrix Multiplication from Colluding servers (MDS-C-FPMM), the matrix $\mathbf{A}$ is also selected from another family of public matrices stored at the servers in MDS form. In this case, the indices of both target matrices should be kept private from colluding servers. We develop novel strategies for MDS-C-PSMM and MDS-C-FPMM that simultaneously guarantee information-theoretic data/index privacy and computation correctness. The key ingredient is a careful design of secret sharings of the matrix $\mathbf{A}$ and the private indices, tailored to the matrix multiplication task and the MDS storage structure, such that the computation results from the servers can be viewed as evaluations of a polynomial at distinct points, from which the intended result can be obtained through polynomial interpolation. We compare the proposed MDS-C-PSMM strategy with a previous MDS-PSMM strategy that offers a weaker privacy guarantee (non-colluding servers), and demonstrate substantial improvements over the previous strategy in terms of communication and computation performance.
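To illustrate the polynomial structure underlying such schemes (our simplified sketch, not the paper's construction), the master can mask $\mathbf{A}$ with random matrices along a polynomial, have each server multiply its share by a public matrix, and interpolate the answers back at zero. Real schemes operate over a finite field for information-theoretic privacy; reals are used here only to show the interpolation mechanics.

```python
import numpy as np

def share_matrix(A, t, xs, rng):
    """Shamir-style masking of A with threshold t: server j receives
    S(x_j) = A + sum_i R_i * x_j**i for random masks R_1..R_t."""
    masks = [rng.standard_normal(A.shape) for _ in range(t)]
    return [A + sum(R * x ** (i + 1) for i, R in enumerate(masks)) for x in xs]

def interp_at_zero(xs, values):
    """Recover f(0) of a (matrix-valued) polynomial of degree < len(xs)
    from its evaluations, via Lagrange interpolation at 0."""
    xs = np.asarray(xs, dtype=float)
    out = np.zeros_like(values[0])
    for j, xj in enumerate(xs):
        others = np.delete(xs, j)
        out += values[j] * np.prod(-others / (xj - others))
    return out

# Each server multiplies its share by the public matrix B; the answers are
# evaluations of a degree-t polynomial whose constant term is A @ B.
rng = np.random.default_rng(0)
A, B, xs = rng.standard_normal((2, 3)), rng.standard_normal((3, 2)), [1.0, 2.0, 3.0]
answers = [S @ B for S in share_matrix(A, t=2, xs=xs, rng=rng)]
assert np.allclose(interp_at_zero(xs, answers), A @ B)
```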
Bayes factors are an increasingly popular tool for indexing evidence from experiments. For two competing population models, the Bayes factor reflects the relative likelihood of observing some data under one model compared to the other. In general, computing a Bayes factor is difficult, because computing the marginal likelihood of each model requires integrating the product of the likelihood and a prior distribution on the population parameter(s). In this paper, we develop a new analytic formula for computing Bayes factors directly from minimal summary statistics in repeated-measures designs. This work improves on previous methods for computing Bayes factors from summary statistics (e.g., the BIC method), which produce Bayes factors that violate the Sellke upper bound of evidence for smaller sample sizes. The new approach taken in this paper requires knowing only the $F$-statistic and the degrees of freedom, both of which are commonly reported in most empirical work. In addition to providing computational examples, we report a simulation study that benchmarks the new formula against other methods for computing Bayes factors in repeated-measures designs. Our new method provides an easy way for researchers to compute Bayes factors directly from a minimal set of summary statistics, allowing users to index the evidential value of their own data, as well as data reported in published studies.
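For comparison, the baseline the abstract mentions, the BIC method of approximating Bayes factors (Wagenmakers, 2007), can itself be computed from the $F$-statistic, since $SSE_0/SSE_1 = 1 + F\,df_1/df_2$. The sketch below implements that baseline, not the paper's new formula; note that the appropriate $n$ in repeated-measures designs is itself a subtle choice.

```python
import numpy as np

def bf01_bic(F, df1, df2, n):
    """BIC approximation to the Bayes factor BF_01 (null over alternative)
    from an F-test, using BIC_i = n*log(SSE_i / n) + k_i*log(n) together
    with SSE_0 / SSE_1 = 1 + F * df1 / df2."""
    delta_bic_10 = -n * np.log1p(F * df1 / df2) + df1 * np.log(n)
    return np.exp(delta_bic_10 / 2.0)
```

For example, `bf01_bic(F=4.0, df1=1, df2=29, n=30)` returns roughly 0.79, i.e., only weak evidence in either direction despite a "significant" $F$.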
The Stackelberg game model, where a leader commits to a strategy and the follower best responds, has found widespread application, particularly in security settings. There, the goal is for the leader to compute an optimal strategy to commit to, in order to protect some asset. In many of these applications, the parameters of the follower's utility model are not known with certainty. Distributionally-robust optimization addresses this issue by positing a distribution over possible model parameters, where this distribution is itself only known to lie in a set of possible distributions; the goal is to maximize the expected utility with respect to the worst-case distribution in this set. We initiate the study of distributionally-robust models for computing the optimal strategy to commit to. We consider the case of normal-form games with uncertainty about the follower's utility model. Our main theoretical result shows that a distributionally-robust Stackelberg equilibrium always exists across a wide array of uncertainty models. For the case of a finite set of possible follower utility functions, we present two algorithms to compute a distributionally-robust strong Stackelberg equilibrium (DRSSE) using mathematical programs. In the general case, where there are infinitely many possible follower utility functions and the uncertainty is represented by a Wasserstein ball around a finitely-supported nominal distribution, we give an incremental mixed-integer-programming-based algorithm for computing the optimal distributionally-robust strategy. Experiments substantiate the tractability of our algorithm on a classical Stackelberg game, showing that our approach scales to medium-sized games.
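To fix ideas in the finite case, the sketch below evaluates a candidate leader mixed strategy against the worst case over a finite set of follower utility matrices; this corresponds to the degenerate uncertainty set containing all point distributions, a simplification of ours rather than the paper's DRSSE formulation (which also handles Wasserstein balls and favorable tie-breaking).

```python
import numpy as np

def worst_case_leader_value(x, U_L, follower_models):
    """x: leader mixed strategy (m,); U_L: leader utility matrix (m x n);
    follower_models: finite list of candidate follower utility matrices
    (each m x n). Each follower type best-responds to x; return the
    leader's payoff under the worst case over types."""
    expected_leader = x @ U_L                 # leader payoff per follower action
    values = []
    for U_F in follower_models:
        br = int(np.argmax(x @ U_F))          # follower best response to x
        values.append(expected_leader[br])
    return min(values)
```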
Classical results in general equilibrium theory assume divisible goods and convex preferences of market participants. In many real-world markets, however, participants have non-convex preferences and the allocation problem needs to consider complex constraints. Electricity markets are a prime example. In such markets, Walrasian prices generally do not exist, and heuristic pricing rules based on the dual of the relaxed allocation problem are used in practice. However, these rules have been criticized for high side-payments and inadequate congestion signals. We show that existing pricing heuristics optimize specific design goals that can be conflicting. The trade-offs can be substantial, and we establish that the design of pricing rules is fundamentally a multi-objective optimization problem addressing different incentives. In addition to traditional multi-objective optimization techniques that weight the individual objectives, we introduce a novel parameter-free pricing rule that minimizes incentives for market participants to deviate locally. Our findings show how the new pricing rule capitalizes on the upsides of the existing pricing rules under scrutiny today: it leads to prices that incur low make-whole payments while providing adequate congestion signals and low lost opportunity costs. Our suggested pricing rule does not require weighting of objectives, is computationally scalable, and balances trade-offs in a principled manner, addressing an important policy issue in electricity markets.
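One of the quantities these rules trade off, make-whole payments, is easy to state: committed participants are paid the shortfall between their as-bid cost and their market revenue. A single-period, single-node sketch (our simplification of the general networked setting):

```python
def make_whole_payments(price, quantities, as_bid_costs):
    """Make-whole (uplift) side-payments at a uniform clearing price:
    each committed generator g with dispatch quantities[g] and total
    as-bid cost as_bid_costs[g] is topped up so it does not lose money."""
    return [max(0.0, c - price * q) for q, c in zip(quantities, as_bid_costs)]

# A generator paid 30/MWh for 10 MWh but with as-bid cost 400 receives 100;
# one whose revenue already covers its cost receives nothing.
assert make_whole_payments(30.0, [10.0, 5.0], [400.0, 100.0]) == [100.0, 0.0]
```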
Stochastic kriging has been widely employed for simulation metamodeling to predict the response surface of complex simulation models. However, its use is limited to cases where the design space is low-dimensional because, in general, the sample complexity (i.e., the number of design points required for stochastic kriging to produce an accurate prediction) grows exponentially in the dimensionality of the design space. The large sample size results in both a prohibitive sample cost for running the simulation model and a severe computational challenge due to the need to invert large covariance matrices. Based on tensor Markov kernels and sparse grid experimental designs, we develop a novel methodology that dramatically alleviates the curse of dimensionality. We show that the sample complexity of the proposed methodology grows only slightly in the dimensionality, even under model misspecification. We also develop fast algorithms that compute stochastic kriging in its exact form without any approximation schemes. We demonstrate via extensive numerical experiments that our methodology can handle problems with a design space of more than 10,000 dimensions, improving both prediction accuracy and computational efficiency by orders of magnitude relative to typical alternative methods in practice.
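For concreteness, here is a minimal sketch of a stochastic kriging predictor with a tensor (product) exponential kernel, a canonical example of a kernel built from 1-D Markovian Gaussian processes; this illustrates the model class only and omits the paper's sparse-grid designs and fast exact solvers.

```python
import numpy as np

def tensor_markov_kernel(X1, X2, theta=1.0):
    """Product of 1-D exponential (Laplace) kernels:
    k(x, y) = prod_j exp(-theta * |x_j - y_j|) = exp(-theta * ||x - y||_1)."""
    dist1 = np.abs(X1[:, None, :] - X2[None, :, :]).sum(axis=2)
    return np.exp(-theta * dist1)

def sk_posterior_mean(X, y, noise_var, X_new):
    """Stochastic kriging posterior mean at X_new, given noisy simulation
    outputs y at design points X with per-point output variances noise_var."""
    K = tensor_markov_kernel(X, X) + np.diag(noise_var)
    return tensor_markov_kernel(X_new, X) @ np.linalg.solve(K, y)
```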
In 1954, Alston S. Householder published Principles of Numerical Analysis, one of the first modern treatments of matrix decomposition, which favored the (block) LU decomposition: the factorization of a matrix into the product of lower and upper triangular matrices. Matrix decomposition has since become a core technology in machine learning, largely due to the development of the backpropagation algorithm for fitting neural networks. The sole aim of this survey is to give a self-contained introduction to the concepts and mathematical tools of numerical linear algebra and matrix analysis, in order to seamlessly introduce matrix decomposition techniques and their applications in subsequent sections. We realize, however, that we cannot cover all the useful and interesting results concerning matrix decomposition, and that the scope of this discussion is limited; for example, we omit the separate analysis of Euclidean spaces, Hermitian spaces, Hilbert spaces, and results specific to the complex domain. We refer the reader to the literature on linear algebra for a more detailed introduction to these related fields.
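To ground the discussion, here is a minimal sketch of the (unpivoted) Doolittle LU factorization mentioned above; production code would add row pivoting for numerical stability.

```python
import numpy as np

def lu_doolittle(A):
    """Unpivoted LU: returns unit lower-triangular L and upper-triangular U
    with A = L @ U. Assumes all leading principal minors are nonzero."""
    n = A.shape[0]
    L, U = np.eye(n), A.astype(float).copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]        # elimination multiplier
            U[i, k:] -= L[i, k] * U[k, k:]     # zero out column k below pivot
    return L, U

A = np.array([[4.0, 3.0], [6.0, 3.0]])
L, U = lu_doolittle(A)
assert np.allclose(L @ U, A)
```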
This paper presents SimCLR: a simple framework for contrastive learning of visual representations. We simplify recently proposed contrastive self-supervised learning algorithms without requiring specialized architectures or a memory bank. In order to understand what enables the contrastive prediction tasks to learn useful representations, we systematically study the major components of our framework. We show that (1) composition of data augmentations plays a critical role in defining effective predictive tasks, (2) introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and (3) contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning. By combining these findings, we are able to considerably outperform previous methods for self-supervised and semi-supervised learning on ImageNet. A linear classifier trained on self-supervised representations learned by SimCLR achieves 76.5% top-1 accuracy, which is a 7% relative improvement over previous state-of-the-art, matching the performance of a supervised ResNet-50. When fine-tuned on only 1% of the labels, we achieve 85.8% top-5 accuracy, outperforming AlexNet with 100X fewer labels.
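The contrastive objective at the heart of the framework is the NT-Xent loss; below is a compact PyTorch sketch of it (batching and projection-head details simplified relative to the paper's implementation).

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent: z1[i] and z2[i] are projections of two augmented views of the
    same image; each positive pair is contrasted against the other 2N - 2
    examples in the batch via a temperature-scaled cross-entropy."""
    N = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2]), dim=1)       # (2N, d), unit norm
    sim = z @ z.T / tau                               # cosine-similarity logits
    mask = torch.eye(2 * N, dtype=torch.bool)
    sim = sim.masked_fill(mask, float('-inf'))        # exclude self-similarity
    targets = torch.cat([torch.arange(N, 2 * N), torch.arange(N)])
    return F.cross_entropy(sim, targets)
```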