The recently introduced online Minimum Peak Appointment Scheduling (MPAS) problem is a variant of the online bin-packing problem that allows for deferred decision making. Specifically, the problem is split into an online phase, in which a stream of appointment requests arrives and each must be assigned a scheduled time, followed by an offline phase, in which those appointments are assigned to rooms. As in the bin-packing problem, the aim is to use the minimum number of rooms in the final configuration. This model captures appointment scheduling more accurately than bin packing does: for example, a dialysis patient needs to know what time to arrive for an appointment, but does not need to know the assigned station ahead of time. Previous work developed a randomized algorithm for this problem with an asymptotic competitive ratio of at most 1.5, proving that online MPAS is fundamentally different from the online bin-packing problem. Our main contribution is a new randomized algorithm that achieves an asymptotic competitive ratio under 1.455, indicating the potential for further progress. This improvement is attained by modifying the process for scheduling appointments to increase the density of the packing in the worst case, and by using the dual of the bin-packing linear programming relaxation in the analysis. We also present the first known lower bound of 1.2 on the asymptotic competitive ratio of both deterministic and randomized online MPAS algorithms. These results demonstrate how deferred decision making can be leveraged to yield improved worst-case performance, a phenomenon that should be investigated in a broader class of settings.
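To make the deferred-decision structure concrete, the sketch below (in Python; not the paper's 1.455-competitive algorithm, and all names are illustrative) separates the two phases: the online phase commits only to start times, and the offline phase then packs the resulting fixed intervals into rooms with a standard min-heap greedy, which opens exactly as many rooms as the peak number of simultaneously running appointments.

```python
import heapq

def assign_rooms(intervals):
    """Offline phase: pack committed (start, end) intervals into rooms.
    The greedy reuses an idle room whenever one exists, so the number of
    rooms opened equals the peak number of overlapping intervals."""
    heap = []                  # (time the room becomes free, room id)
    next_room = 0
    assignment = {}
    # process appointments in order of their committed start times
    for idx, (start, end) in sorted(enumerate(intervals), key=lambda t: t[1][0]):
        if heap and heap[0][0] <= start:
            _, room = heapq.heappop(heap)      # reuse an idle room
        else:
            room, next_room = next_room, next_room + 1
        assignment[idx] = room
        heapq.heappush(heap, (end, room))
    return assignment

# the online phase has already committed these times; rooms come later
times = [(0, 4), (1, 3), (4, 6), (2, 5)]
print(assign_rooms(times))     # {0: 0, 1: 1, 3: 2, 2: 1} -- 3 rooms, peak 3
```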
In some preference aggregation scenarios, voters' preferences are highly structured: e.g., the set of candidates may have one-dimensional structure (so that voters' preferences are single-peaked) or be described by a binary decision tree (so that voters' preferences are group-separable). However, sometimes a single axis or a decision tree is insufficient to capture the voters' preferences; rather, there is a small number $k$ of axes or decision trees such that each vote in the profile is consistent with one of these axes (resp., trees). In this work, we study the complexity of deciding whether voters' preferences can be explained in this manner. For $k=2$, we use the technique developed by Yang~[2020] in the context of single-peaked preferences to obtain a polynomial-time algorithm for several domains: value-restricted preferences, group-separable preferences, and a natural subdomain of group-separable preferences, namely, caterpillar group-separable preferences. For $k\ge 3$, the problem is known to be hard for single-peaked preferences; we show that this is also the case for value-restricted and group-separable preferences. Our positive results for $k=2$ make use of forbidden minor characterizations of the respective domains; in particular, we establish that the domain of caterpillar group-separable preferences admits a forbidden minor characterization.
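As a concrete primitive underlying such decompositions, the hedged sketch below (Python, illustrative names) checks whether one vote is consistent with one given axis, using the standard characterization that each next-best candidate must extend the interval of already-ranked candidates by one end; deciding consistency with $k$ axes then asks whether every vote passes this test for at least one of the candidate axes.

```python
def consistent_with_axis(vote, axis):
    """vote: candidates ranked best to worst; axis: left-to-right order.
    The vote is single-peaked w.r.t. the axis iff, for every i, its top-i
    candidates occupy consecutive positions on the axis."""
    pos = {c: i for i, c in enumerate(axis)}
    lo = hi = pos[vote[0]]                    # position of the peak
    for c in vote[1:]:
        p = pos[c]
        if p == lo - 1:                       # extend interval leftwards
            lo = p
        elif p == hi + 1:                     # extend interval rightwards
            hi = p
        else:
            return False
    return True

axis = ['a', 'b', 'c', 'd']
print(consistent_with_axis(['b', 'c', 'a', 'd'], axis))  # True
print(consistent_with_axis(['b', 'd', 'a', 'c'], axis))  # False
```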
Modularity is a central principle throughout the design process for cyber-physical systems: it reduces complexity and increases reuse of behavior. In this paper we pose and answer the following question: how can we identify independent `modules' within the structure of reactive control architectures? To this end, we propose a graph-structured control architecture we call a decision structure, and show how it generalises several reactive control architectures popular in Artificial Intelligence (AI) and robotics, specifically Teleo-Reactive programs (TRs), Decision Trees (DTs), Behavior Trees (BTs) and Generalised Behavior Trees ($k$-BTs). Inspired by the definition of a module in graph theory, we define modules in decision structures and show that each decision structure possesses a canonical decomposition into its modules. We can naturally characterise each of the BTs, $k$-BTs, DTs and TRs by properties of their module decomposition, which allows us to recognise which decision structures are equivalent to each of these architectures in quadratic time. Our proposed concept of modules extends to formal verification under any verification scheme capable of verifying a decision structure. Specifically, we prove that a modification to a module within a decision structure has no greater flow-on effects than a modification to an individual action within that structure. This enables verification of modules to be done locally and hierarchically: structures can be verified and then repeatedly locally modified, with modules replaced by other modules, while preserving correctness. To illustrate the findings, we present an example of a solar-powered drone controlled by a decision structure. We use a Linear Temporal Logic-based verification scheme to verify the correctness of this structure, and then show how one can modify modules while preserving its correctness.
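For readers unfamiliar with the graph-theoretic notion that inspires the definition, the sketch below (Python, plain directed graphs; the paper's decision structures additionally carry labels, so this shows only the underlying idea) checks whether a vertex set is a module: every vertex outside the set must relate to all of its members in exactly the same way.

```python
def is_module(succ, pred, M):
    """succ/pred map each vertex to its successor/predecessor sets in a
    directed graph. M is a module iff no outside vertex can distinguish
    two members of M by its edges to or from them."""
    M = set(M)
    members = list(M)
    ref = members[0]
    for w in set(succ) - M:                   # every outside vertex
        for v in members[1:]:
            if (w in succ[ref]) != (w in succ[v]):  # edges leaving M
                return False
            if (w in pred[ref]) != (w in pred[v]):  # edges entering M
                return False
    return True

succ = {1: {2, 3}, 2: {4}, 3: {4}, 4: set()}
pred = {1: set(), 2: {1}, 3: {1}, 4: {2, 3}}
print(is_module(succ, pred, {2, 3}))   # True: 1 and 4 see 2 and 3 alike
print(is_module(succ, pred, {1, 2}))   # False: 3 distinguishes 1 from 2
```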
We consider the problem of efficiently scheduling jobs with precedence constraints on a set of identical machines in the presence of a uniform communication delay. Such a set of precedence-constrained jobs can be modeled as a directed acyclic graph, $G = (V, E)$. In this setting, if two precedence-constrained jobs $u$ and $v$, with $v$ dependent on $u$ ($u \prec v$), are scheduled on different machines, then $v$ must start at least $\rho$ time units after $u$ completes. The scheduling objective is to minimize makespan, i.e., the total time from when the first job starts to when the last job finishes. The focus of this paper is to provide an efficient approximation algorithm with near-linear running time. We build on the algorithm of Lepere and Rapine [STACS 2002] for this problem to give an $O\left(\frac{\ln \rho}{\ln \ln \rho} \right)$-approximation algorithm that runs in $\tilde{O}(|V| + |E|)$ time.
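To fix the model, the sketch below (Python; illustrative, not the approximation algorithm itself) computes earliest start times for a given machine assignment: a job may start only after each predecessor finishes, plus the delay $\rho$ whenever the two jobs run on different machines. Machine capacity is ignored here, so the result lower-bounds the makespan of any feasible schedule with that assignment.

```python
from collections import deque

def makespan_lb(n, edges, machine, rho, p=1):
    """Earliest start times for jobs 0..n-1 with processing time p,
    precedence edges (u, v) meaning u -> v, a fixed machine assignment,
    and a uniform communication delay rho across machines."""
    succ = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    start = [0] * n
    q = deque(v for v in range(n) if indeg[v] == 0)
    while q:                                   # topological order
        u = q.popleft()
        for v in succ[u]:
            delay = 0 if machine[u] == machine[v] else rho
            start[v] = max(start[v], start[u] + p + delay)
            indeg[v] -= 1
            if indeg[v] == 0:
                q.append(v)
    return max(start[v] + p for v in range(n))

# chain 0 -> 1 -> 2; job 1 sits on a different machine, rho = 3
print(makespan_lb(3, [(0, 1), (1, 2)], machine=[0, 1, 1], rho=3))  # 6
```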
We introduce an online convex optimization algorithm using projected subgradient descent with optimal adaptive learning rates and sequential, efficient first-order updates. Our method provides a subgradient-adaptive, minimax-optimal dynamic regret guarantee for a sequence of general convex functions with no known additional properties such as strong convexity, smoothness, exp-concavity or even Lipschitz continuity. The guarantee holds against any comparator decision sequence with bounded "complexity", defined as the cumulative distance traveled via changes between successive decisions. We show optimality by deriving a lower bound on the worst-case second-order dynamic regret, which incorporates actual subgradient norms and matches our guarantees within a constant factor. We also derive the extension for independent learning in each decision coordinate separately. Additionally, we demonstrate how to best preserve our guarantees when the bound on total successive changes in the dynamic comparator sequence grows over time, or when feedback regarding this bound arrives only partially with time, both in a truly online manner. Then, as a major contribution, we examine the scenario in which we receive no information regarding the successive changes; instead, by a unique re-purposing of the expert mixture framework with novel additions, we eliminate the need for such information, again in a truly online manner. Moreover, we show the ability to compete against all dynamic comparator sequences simultaneously (universally) with minimax optimality, where the guarantees depend on the "complexity" of each comparator separately. We also discuss potential modifications to our approach that address further reductions in time, computation and memory complexity, and we strengthen the universal competitiveness via guarantees that take into account concentrations of a comparator sequence in the decision set.
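A minimal sketch of the base update (Python; the step-size rule is the standard subgradient-adaptive choice, while the paper's full method layers dynamic-regret and expert-mixture machinery on top of it):

```python
import numpy as np

def projected_adaptive_ogd(subgrad, project, x0, T, D):
    """Projected online subgradient descent with adaptive learning rate
    eta_t = D / sqrt(sum of squared subgradient norms so far): no
    Lipschitz constant or other curvature knowledge is required."""
    x = np.asarray(x0, dtype=float)
    g2 = 0.0
    for t in range(T):
        g = subgrad(x, t)                     # first-order feedback only
        g2 += float(np.dot(g, g))
        x = project(x - (D / (np.sqrt(g2) + 1e-12)) * g)
        yield x

# example: losses f_t(x) = |x - c_t| over the decision set [-1, 1]
cs = [0.5, -0.3, 0.8]
for x in projected_adaptive_ogd(
        subgrad=lambda x, t: np.sign(x - cs[t % 3]),
        project=lambda x: np.clip(x, -1.0, 1.0),
        x0=np.zeros(1), T=6, D=2.0):
    print(x)
```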
We consider a subclass of $n$-player stochastic games in which players have their own internal state/action spaces but are coupled through their payoff functions. It is assumed that players' internal chains are driven by independent transition probabilities. Moreover, players can only receive realizations of their payoffs, not the actual functions, nor can they observe each other's states/actions. Under some assumptions on the structure of the payoff functions, we develop efficient learning algorithms based on Dual Averaging and Dual Mirror Descent, which provably converge almost surely or in expectation to the set of $\epsilon$-Nash equilibrium policies. In particular, we derive upper bounds on the number of iterations, polynomial in the game parameters, required to achieve an $\epsilon$-Nash equilibrium policy. Besides Markov potential games and linear-quadratic stochastic games, this work identifies another interesting subclass of $n$-player stochastic games that, under some assumptions, provably admit polynomial-time learning algorithms for finding their $\epsilon$-Nash equilibrium policies.
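A schematic single-player Dual Averaging update over mixed strategies (Python; with the entropic regularizer the dual-to-primal map is a softmax, and the paper runs one such coupled update per player with payoff realizations as feedback):

```python
import numpy as np

def dual_averaging_simplex(payoff_grad, n_actions, T, eta=0.1):
    """Dual averaging on the probability simplex: accumulate payoff
    gradients in a dual vector z and play the entropic mirror image
    softmax(eta * z), which maximizes <z, x> minus a negative-entropy
    regularizer over the simplex."""
    z = np.zeros(n_actions)
    for t in range(1, T + 1):
        s = eta * z
        x = np.exp(s - s.max())
        x /= x.sum()
        z += payoff_grad(x, t)                # realized-payoff feedback
        yield x

# stationary payoff vector: play should concentrate on the best action
u = np.array([1.0, 0.3, -0.5])
for x in dual_averaging_simplex(lambda x, t: u, 3, T=5, eta=1.0):
    print(np.round(x, 3))
```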
Restless and collapsing bandits are often used to model budget-constrained resource allocation in settings where arms have action-dependent transition probabilities, such as the allocation of health interventions among patients. However, state-of-the-art Whittle-index-based approaches to this planning problem either do not consider fairness among arms, or incentivize fairness without guaranteeing it. We thus introduce ProbFair, a probabilistically fair policy that maximizes total expected reward and satisfies the budget constraint while ensuring a strictly positive lower bound on the probability of being pulled at each timestep. We evaluate our algorithm on a real-world application, where interventions support continuous positive airway pressure (CPAP) therapy adherence among patients, as well as on a broader class of synthetic transition matrices. We find that ProbFair preserves utility while providing fairness guarantees.
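The fairness constraint is easy to realize once per-arm pull probabilities are fixed. The sketch below (Python; a generic systematic sampler, not the paper's full ProbFair policy) draws exactly $B$ arms per timestep with prescribed marginals, so flooring every marginal at some $\ell > 0$ yields the strictly positive pull probability ProbFair guarantees while still meeting the budget exactly.

```python
import numpy as np

def fair_pulls(p, rng):
    """Systematic sampling: given inclusion probabilities p_i in (0, 1]
    summing to an integer budget B, select exactly B arms such that arm i
    is selected with probability exactly p_i."""
    cum = np.cumsum(p)
    B = int(round(cum[-1]))
    pointers = rng.uniform() + np.arange(B)   # u, u+1, ..., u+B-1
    return np.searchsorted(cum, pointers, side='left')

# budget of 2 pulls over 4 arms, every arm floored at probability 0.1
p = np.array([0.9, 0.1, 0.2, 0.8])            # sums to 2
print(fair_pulls(p, np.random.default_rng(0)))  # e.g. [0 3]
```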
Purpose: This is an attempt to better bridge the gap between the mathematical and the engineering/physical aspects of the topic. We trace the different sources of non-convexity in topology optimization problems, starting from domain discretization, passing through penalization for discreteness and the effects of filtering methods, and ending with a note on continuation methods. Design/Methodology/Approach: Starting from the global optimum of the compliance minimization problem, we employ analytical tools to investigate how intermediate-density penalization affects the convexity of the problem, the potential penalization-like effects of various filtering techniques, how continuation methods can be used to approach the global optimum, and how the initial guess has some weight in determining the final optimum. Findings: The non-convexifying effect of penalizing intermediate-density elements simply overshadows any other type of non-convexity introduced into the problem, mainly due to its severity and locality. Continuation methods are strongly recommended to overcome the problem of local minima, although their step size and convergence criteria are left to the user, depending on the type of application. Originality/Value: In this article, we present a comprehensive treatment of the sources of non-convexity in density-based topology optimization problems, with a focus on linear elastic compliance minimization. We put special emphasis on the potential penalization-like effects of various filtering techniques through a detailed mathematical treatment.
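As a pointer to the central object of study, the sketch below (Python) shows the standard modified-SIMP interpolation together with a typical continuation loop over the penalization exponent; the schedule parameters are illustrative, since, as noted above, the step and stopping rule are application-dependent.

```python
import numpy as np

def simp_modulus(x, p, E0=1.0, Emin=1e-9):
    """Modified SIMP: Young's modulus of an element with density x in
    [0, 1]. With p = 1 the compliance problem stays convex in x; p > 1
    penalizes intermediate densities, the dominant source of
    non-convexity discussed above."""
    return Emin + x**p * (E0 - Emin)

def continuation(p_max=3.0, step=0.25):
    """Continuation: solve a sequence of problems with slowly increasing
    penalization, warm-starting each solve from the previous optimum."""
    p = 1.0
    while p <= p_max:
        yield p                # run the optimizer at this p, warm-started
        p += step

densities = np.linspace(0.0, 1.0, 5)
for p in continuation(p_max=2.0, step=0.5):
    print(p, np.round(simp_modulus(densities, p), 3))
```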
We propose throughput- and cost-optimal job scheduling algorithms for cloud computing platforms offering Infrastructure as a Service. We first consider online migration and propose job scheduling algorithms that minimize job migration and server running costs. We begin with algorithms that assume knowledge of job sizes upon arrival. We characterize the optimal cost subject to system stability and develop a drift-plus-penalty-based algorithm that approaches the optimal cost arbitrarily closely; specifically, this algorithm yields a trade-off between delay and cost. We then relax the job-size knowledge assumption and give an algorithm that instead uses the amount of service already rendered to each job. We show that this algorithm achieves order-wise the same cost as the job-size-based algorithm. Finally, we consider offline job migration, which incurs migration delays, and again present throughput-optimal algorithms that minimize server running cost. We illustrate the performance of the proposed algorithms and compare them with existing algorithms via simulation.
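A minimal sketch of the drift-plus-penalty decision rule at the heart of such algorithms (Python; queue and rate names are illustrative): each slot, the scheduler picks the action minimizing the penalty-weighted running cost minus the backlog-weighted service, and the weight V mediates the stated cost/delay trade-off.

```python
def dpp_action(Q, mu, cost, V):
    """One drift-plus-penalty decision: choose the action a minimizing
    V * cost[a] - sum_i Q[i] * mu[i][a]. Larger V drives the time-average
    cost closer to optimal at the price of larger backlogs (delay)."""
    def objective(a):
        return V * cost[a] - sum(q * rates[a] for q, rates in zip(Q, mu))
    return min(range(len(cost)), key=objective)

# two queues, three server configurations
Q = [5.0, 1.0]                        # current backlogs
mu = [[2.0, 0.0, 1.0],                # service rates for queue 0
      [0.0, 2.0, 1.0]]                # service rates for queue 1
cost = [3.0, 3.0, 1.0]                # running cost of each configuration
print(dpp_action(Q, mu, cost, V=1.0))  # 0: draining queue 0 dominates
```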
We consider the exploration-exploitation trade-off in reinforcement learning and show that an agent imbued with a risk-seeking utility function is able to explore efficiently, as measured by regret. The parameter that controls how risk-seeking the agent is can be optimized exactly, or annealed according to a schedule. We call the resulting algorithm K-learning and show that the corresponding K-values are optimistic for the expected Q-values at each state-action pair. The K-values induce a natural Boltzmann exploration policy for which the `temperature' parameter is equal to the risk-seeking parameter. This policy achieves an expected regret bound of $\tilde O(L^{3/2} \sqrt{S A T})$, where $L$ is the time horizon, $S$ is the number of states, $A$ is the number of actions, and $T$ is the total number of elapsed time-steps. This bound is only a factor of $L$ larger than the established lower bound. K-learning can be interpreted as mirror descent in the policy space, and it is similar to other well-known methods in the literature, including Q-learning, soft Q-learning, and maximum entropy policy gradient; it is also closely related to optimism and count-based exploration methods. K-learning is simple to implement, as it only requires adding a bonus to the reward at each state-action pair and then solving a Bellman equation. We conclude with a numerical example demonstrating that K-learning is competitive with other state-of-the-art algorithms in practice.
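A schematic tabular version of that recipe (Python; the bonus and temperature are free inputs here, whereas the paper derives them from the risk-seeking utility):

```python
import numpy as np

def k_style_backup(P, r, bonus, tau, H):
    """Soft Bellman backup in the spirit of K-learning: add a bonus to
    each state-action reward, back up with tau * log-sum-exp in place of
    a max, and read off the Boltzmann policy at temperature tau."""
    K = np.zeros_like(r)                      # shape (S, A)
    for _ in range(H):
        V = tau * np.log(np.exp(K / tau).sum(axis=1))   # soft state value
        K = r + bonus + P @ V                 # P has shape (S, A, S)
    pi = np.exp(K / tau)
    return K, pi / pi.sum(axis=1, keepdims=True)        # Boltzmann policy

S, A = 2, 2
P = np.full((S, A, S), 0.5)                   # uniform transitions
r = np.array([[1.0, 0.0], [0.0, 0.5]])
K, pi = k_style_backup(P, r, bonus=0.1 * np.ones((S, A)), tau=1.0, H=5)
print(np.round(pi, 3))
```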
We propose accelerated randomized coordinate descent algorithms for stochastic optimization and online learning. Our algorithms have significantly lower per-iteration complexity than the known accelerated gradient algorithms. The proposed algorithms for online learning achieve better regret performance than the known randomized online coordinate descent algorithms, while the proposed algorithms for stochastic optimization exhibit convergence rates as good as those of the best known randomized coordinate descent algorithms. We also present simulation results demonstrating the performance of the proposed algorithms.
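A hedged sketch of the generic accelerated coordinate scheme on a quadratic (Python; this is the standard momentum recipe with uniform sampling and coordinate-wise Lipschitz constants $L_i = A_{ii}$, not the paper's exact algorithms, and a tuned implementation would avoid the full-vector operations shown here for clarity):

```python
import numpy as np

def acc_coord_descent(A, b, T, rng=None):
    """Accelerated randomized coordinate descent for the quadratic
    f(x) = 0.5 x^T A x - b^T x (A symmetric positive definite). Each
    iteration samples one coordinate and updates it with momentum."""
    rng = rng or np.random.default_rng(0)
    n = len(b)
    L = np.diag(A)                            # coordinate-wise smoothness
    x, z = np.zeros(n), np.zeros(n)
    theta = 1.0 / n
    for _ in range(T):
        y = (1 - theta) * x + theta * z
        i = rng.integers(n)                   # uniform coordinate choice
        g = A[i] @ y - b[i]                   # i-th partial derivative
        z_new = z.copy()
        z_new[i] -= g / (n * theta * L[i])
        x = y + n * theta * (z_new - z)       # momentum-corrected iterate
        z = z_new
        theta = 0.5 * (np.sqrt(theta**4 + 4 * theta**2) - theta**2)
    return x

A = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, 1.0])
print(acc_coord_descent(A, b, T=300))         # approaches solve(A, b)
```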