Suppose we are given integer $k \leq n$ and $n$ boxes labeled $1,\ldots, n$ by an adversary, each containing a number chosen from an unknown distribution. We have to choose an order to sequentially open these boxes, and each time we open the next box in this order, we learn its number. If we reject a number in a box, the box cannot be recalled. Our goal is to accept the $k$ largest of these numbers, without necessarily opening all boxes. This is the free order multiple-choice secretary problem. Free order variants were studied extensively for the secretary and prophet problems. Kesselheim, Kleinberg, and Niazadeh KKN (STOC'15) initiated a study of randomness-efficient algorithms (with the cheapest order in terms of used random bits) for the free order secretary problems. We present an algorithm for free order multiple-choice secretary, which is simultaneously optimal for the competitive ratio and used amount of randomness. I.e., we construct a distribution on orders with optimal entropy $\Theta(\log\log n)$ such that a deterministic multiple-threshold algorithm is $1-O(\sqrt{\log k/k})$-competitive. This improves in three ways the previous best construction by KKN, whose competitive ratio is $1 - O(1/k^{1/3}) - o(1)$. Our competitive ratio is (near)optimal for the multiple-choice secretary problem; it works for exponentially larger parameter $k$; and our algorithm is a simple deterministic multiple-threshold algorithm, while that in KKN is randomized. We also prove a corresponding lower bound on the entropy of optimal solutions for the multiple-choice secretary problem, matching entropy of our algorithm, where no such previous lower bound was known. We obtain our algorithmic results with a host of new techniques, and with these techniques we also improve significantly the previous results of KKN about constructing entropy-optimal distributions for the classic free order secretary.
The significant presence of demand charges in electric bills motivates large-load customers to utilize energy storage to reduce the peak procurement from the grid. We herein study the problem of energy storage allocation for peak minimization, under the online setting where irrevocable decisions are sequentially made without knowing future demands. The problem is uniquely challenging due to (i) the coupling of online decisions across time imposed by the inventory constraints and (ii) the noncumulative nature of the peak procurement. We apply the CR-Pursuit framework and address the challenges unique to our minimization problem to design an online algorithm achieving the optimal competitive ratio (CR) among all online algorithms. We show that the optimal CR can be computed in polynomial time by solving a linear number of linear-fractional problems. More importantly, we generalize our approach to develop an \emph{anytime-optimal} online algorithm that achieves the best possible CR at any epoch, given the inputs and online decisions so far. The algorithm retains the optimal worst-case performance and attains adaptive average-case performance. Trace-driven simulations show that our algorithm can decrease the peak demand by an extra 19% compared to baseline alternatives under typical settings.
As a special infinite-order vector autoregressive (VAR) model, the vector autoregressive moving average (VARMA) model can capture much richer temporal patterns than the widely used finite-order VAR model. However, its practicality has long been hindered by its non-identifiability, computational intractability, and relative difficulty of interpretation. This paper introduces a novel infinite-order VAR model which, with only a little sacrifice of generality, inherits the essential temporal patterns of the VARMA model but avoids all of the above drawbacks. As another attractive feature, the temporal and cross-sectional dependence structures of this model can be interpreted separately, since they are characterized by different sets of parameters. For high-dimensional time series, this separation motivates us to impose sparsity on the parameters determining the cross-sectional dependence. As a result, greater statistical efficiency and interpretability can be achieved, while no loss of temporal information is incurred by the imposed sparsity. We introduce an $\ell_1$-regularized estimator for the proposed model and derive the corresponding nonasymptotic error bounds. An efficient block coordinate descent algorithm and a consistent model order selection method are developed. The merit of the proposed approach is supported by simulation studies and a real-world macroeconomic data analysis.
We propose a symbolic execution method for programs that can draw random samples. In contrast to existing work, our method can verify randomized programs with unknown inputs and can prove probabilistic properties that universally quantify over all possible inputs. Our technique augments standard symbolic execution with a new class of \emph{probabilistic symbolic variables}, which represent the results of random draws, and computes symbolic expressions representing the probability of taking individual paths. We implement our method on top of the \textsc{KLEE} symbolic execution engine alongside multiple optimizations and use it to prove properties about probabilities and expected values for a range of challenging case studies written in C++, including Freivalds' algorithm, randomized quicksort, and a randomized property-testing algorithm for monotonicity. We evaluate our method against \textsc{Psi}, an exact probabilistic symbolic inference engine, and \textsc{Storm}, a probabilistic model checker, and show that our method significantly outperforms both tools.
We study a sequential decision problem where the learner faces a sequence of $K$-armed stochastic bandit tasks. An adversary may design the tasks, but the adversary is constrained to choose the optimal arm of each task in a smaller (but unknown) subset of $M$ arms. The task boundaries might be known (the bandit meta-learning setting), or unknown (the non-stationary bandit setting). We design an algorithm based on a reduction to bandit submodular maximization and show that, in the regime of large number of tasks and small number of optimal arms, its regret in both settings is smaller than the simple baseline of $\tilde{O}(\sqrt{KNT})$ that can be obtained by using standard algorithms designed for non-stationary bandit problems. For the bandit meta-learning problem with fixed task length $\tau$, we show that the regret of the algorithm is bounded as $\tilde{O}(NM\sqrt{M \tau}+N^{2/3}M\tau)$. Under additional assumptions on the identifiability of the optimal arms in each task, we show a bandit meta-learning algorithm with an improved $\tilde{O}(N\sqrt{M \tau}+N^{1/2}\sqrt{M K \tau})$ regret.
Connectivity augmentation problems are among the most elementary questions in Network Design. Many of these problems admit natural $2$-approximation algorithms, often through various classic techniques, whereas it remains open whether approximation factors below $2$ can be achieved. One of the most basic examples thereof is the Weighted Connectivity Augmentation Problem (WCAP). In WCAP, one is given an undirected graph together with a set of additional weighted candidate edges, and the task is to find a cheapest set of candidate edges whose addition to the graph increases its edge-connectivity. We present a $(1.5+\varepsilon)$-approximation algorithm for WCAP, showing for the first time that factors below $2$ are achievable. On a high level, we design a well-chosen local search algorithm, inspired by recent advances for Weighted Tree Augmentation. To measure progress, we consider a directed weakening of WCAP and show that it has highly structured planar solutions. Interpreting a solution of the original problem as one of this directed weakening allows us to describe local exchange steps in a clean and algorithmically amenable way. Leveraging these insights, we show that we can efficiently search for good exchange steps within a component class for link sets that is closely related to bounded treewidth subgraphs of circle graphs. Moreover, we prove that an optimum solution can be decomposed into smaller components, at least one of which leads to a good local search step as long as we did not yet achieve the claimed approximation guarantee.
We study online learning problems in which a decision maker has to take a sequence of decisions subject to $m$ long-term constraints. The goal of the decision maker is to maximize their total reward, while at the same time achieving small cumulative constraints violation across the $T$ rounds. We present the first best-of-both-world type algorithm for this general class of problems, with no-regret guarantees both in the case in which rewards and constraints are selected according to an unknown stochastic model, and in the case in which they are selected at each round by an adversary. Our algorithm is the first to provide guarantees in the adversarial setting with respect to the optimal fixed strategy that satisfies the long-term constraints. In particular, it guarantees a $\rho/(1+\rho)$ fraction of the optimal reward and sublinear regret, where $\rho$ is a feasibility parameter related to the existence of strictly feasible solutions. Our framework employs traditional regret minimizers as black-box components. Therefore, by instantiating it with an appropriate choice of regret minimizers it can handle the full-feedback as well as the bandit-feedback setting. Moreover, it allows the decision maker to seamlessly handle scenarios with non-convex rewards and constraints. We show how our framework can be applied in the context of budget-management mechanisms for repeated auctions in order to guarantee long-term constraints that are not packing (e.g., ROI constraints).
Much of the literature on optimal design of bandit algorithms is based on minimization of expected regret. It is well known that designs that are optimal over certain exponential families can achieve expected regret that grows logarithmically in the number of arm plays, at a rate governed by the Lai-Robbins lower bound. In this paper, we show that when one uses such optimized designs, the regret distribution of the associated algorithms necessarily has a very heavy tail, specifically, that of a truncated Cauchy distribution. Furthermore, for $p>1$, the $p$'th moment of the regret distribution grows much faster than poly-logarithmically, in particular as a power of the total number of arm plays. We show that optimized UCB bandit designs are also fragile in an additional sense, namely when the problem is even slightly mis-specified, the regret can grow much faster than the conventional theory suggests. Our arguments are based on standard change-of-measure ideas, and indicate that the most likely way that regret becomes larger than expected is when the optimal arm returns below-average rewards in the first few arm plays, thereby causing the algorithm to believe that the arm is sub-optimal. To alleviate the fragility issues exposed, we show that UCB algorithms can be modified so as to ensure a desired degree of robustness to mis-specification. In doing so, we also provide a sharp trade-off between the amount of UCB exploration and the tail exponent of the resulting regret distribution.
Classical results in general equilibrium theory assume divisible goods and convex preferences of market participants. In many real-world markets, participants have non-convex preferences and the allocation problem needs to consider complex constraints. Electricity markets are a prime example. In such markets, Walrasian prices are impossible, and heuristic pricing rules based on the dual of the relaxed allocation problem are used in practice. However, these rules have been criticized for high side-payments and inadequate congestion signals. We show that existing pricing heuristics optimize specific design goals that can be conflicting. The trade-offs can be substantial, and we establish that the design of pricing rules is fundamentally a multi-objective optimization problem addressing different incentives. In addition to traditional multi-objective optimization techniques using weighing of individual objectives, we introduce a novel parameter-free pricing rule that minimizes incentives for market participants to deviate locally. Our findings show how the new pricing rule capitalizes on the upsides of existing pricing rules under scrutiny today. It leads to prices that incur low make-whole payments while providing adequate congestion signals and low lost opportunity costs. Our suggested pricing rule does not require weighing of objectives, it is computationally scalable, and balances trade-offs in a principled manner, addressing an important policy issue in electricity markets.
In decision-making problems such as the multi-armed bandit, an agent learns sequentially by optimizing a certain feedback. While the mean reward criterion has been extensively studied, other measures that reflect an aversion to adverse outcomes, such as mean-variance or conditional value-at-risk (CVaR), can be of interest for critical applications (healthcare, agriculture). Algorithms have been proposed for such risk-aware measures under bandit feedback without contextual information. In this work, we study contextual bandits where such risk measures can be elicited as linear functions of the contexts through the minimization of a convex loss. A typical example that fits within this framework is the expectile measure, which is obtained as the solution of an asymmetric least-square problem. Using the method of mixtures for supermartingales, we derive confidence sequences for the estimation of such risk measures. We then propose an optimistic UCB algorithm to learn optimal risk-aware actions, with regret guarantees similar to those of generalized linear bandits. This approach requires solving a convex problem at each round of the algorithm, which we can relax by allowing only approximated solution obtained by online gradient descent, at the cost of slightly higher regret. We conclude by evaluating the resulting algorithms on numerical experiments.
This paper considers a natural fault-tolerant shortest paths problem: for some constant integer $f$, given a directed weighted graph with no negative cycles and two fixed vertices $s$ and $t$, compute (either explicitly or implicitly) for every tuple of $f$ edges, the distance from $s$ to $t$ if these edges fail. We call this problem $f$-Fault Replacement Paths ($f$FRP). We first present an $\tilde{O}(n^3)$ time algorithm for $2$FRP in $n$-vertex directed graphs with arbitrary edge weights and no negative cycles. As $2$FRP is a generalization of the well-studied Replacement Paths problem (RP) that asks for the distances between $s$ and $t$ for any single edge failure, $2$FRP is at least as hard as RP. Since RP in graphs with arbitrary weights is equivalent in a fine-grained sense to All-Pairs Shortest Paths (APSP) [Vassilevska Williams and Williams FOCS'10, J.~ACM'18], $2$FRP is at least as hard as APSP, and thus a substantially subcubic time algorithm in the number of vertices for $2$FRP would be a breakthrough. Therefore, our algorithm in $\tilde{O}(n^3)$ time is conditionally nearly optimal. Our algorithm implies an $\tilde{O}(n^{f+1})$ time algorithm for the $f$FRP problem, giving the first improvement over the straightforward $O(n^{f+2})$ time algorithm. Then we focus on the restriction of $2$FRP to graphs with small integer weights bounded by $M$ in absolute values. Using fast rectangular matrix multiplication, we obtain a randomized algorithm that runs in $\tilde{O}(M^{2/3}n^{2.9153})$ time. This implies an improvement over our $\tilde{O}(n^{f+1})$ time arbitrary weight algorithm for all $f>1$. We also present a data structure variant of the algorithm that can trade off pre-processing and query time. In addition to the algebraic algorithms, we also give an $n^{8/3-o(1)}$ conditional lower bound for combinatorial $2$FRP algorithms in directed unweighted graphs.