We consider the problem of dividing limited resources among individuals arriving over $T$ rounds. In each round, a random number of individuals arrive, and individuals can be characterized by their type (i.e., their preferences over the different resources). A standard notion of `fairness' in this setting is that an allocation simultaneously satisfy envy-freeness and efficiency. The former is an individual guarantee, requiring that each agent prefers their own allocation over the allocation of any other agent; in contrast, efficiency is a global property, requiring that the allocations clear the available resources. For divisible resources, when the number of individuals of each type is known upfront, the above desiderata are simultaneously achievable for a large class of utility functions. However, in an online setting, when the number of individuals of each type is only revealed round by round, no policy can guarantee these desiderata simultaneously, and hence the best one can do is to allocate so as to approximately satisfy the two properties. We show that in the online setting the two desired properties (envy-freeness and efficiency) are in direct contention, in that any algorithm achieving additive counterfactual envy-freeness up to a factor of $L_T$ necessarily suffers an efficiency loss of at least $1/L_T$. We complement this uncertainty principle with a simple algorithm, HopeGuardrail, which allocates resources based on an adaptive threshold policy and is able to achieve any fairness-efficiency point on this frontier. In simulations, our algorithm provides allocations close to the optimal fair solution in hindsight, motivating its use in practical applications, as it can adapt to any desired fairness-efficiency trade-off.
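To make the threshold idea concrete, here is a minimal sketch of a guardrail-style allocation rule, assuming a single divisible resource and known expected arrivals per round; the guardrail constants and their dependence on $L_T$ are our illustrative assumptions, not the exact HopeGuardrail specification.

```python
import numpy as np

def guardrail_allocation(budget, expected_arrivals, arrivals, L_T=2.0):
    """Illustrative two-threshold online allocation (a sketch of the
    guardrail idea, NOT the paper's exact algorithm)."""
    T = len(arrivals)
    fair_share = budget / sum(expected_arrivals)   # hindsight per-person share
    lower = fair_share * (1.0 - 1.0 / L_T)         # conservative guardrail
    upper = fair_share * (1.0 + 1.0 / L_T)         # aggressive guardrail
    remaining, per_round = budget, []
    for t in range(T):
        n_t = arrivals[t]
        future = sum(expected_arrivals[t + 1:])    # expected future demand
        # give the upper threshold only if the lower threshold can still be
        # covered for everyone expected to arrive later; else fall back
        if remaining - n_t * upper >= future * lower:
            x_t = upper
        else:
            x_t = min(lower, remaining / max(n_t, 1))
        remaining -= n_t * x_t
        per_round.append(x_t)
    return per_round

print(guardrail_allocation(100.0, [10] * 5, [12, 8, 11, 9, 10]))
```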
Cell-free Massive MIMO systems consist of a large number of geographically distributed access points (APs) that serve users by coherent joint transmission. Downlink power allocation is important in these systems: it determines which APs should transmit to which users and with what power. If the system is implemented correctly, it can deliver more uniform user performance than conventional cellular networks. To this end, previous works have shown how to perform system-wide max-min fairness power allocation when using maximum ratio precoding. In this paper, we first generalize this method to arbitrary precoding, and then train a neural network to perform approximately the same power allocation but with reduced computational complexity. Finally, we train one neural network per AP to mimic the system-wide max-min fairness power allocation using only local information. By learning the structure of the local propagation environment, this method outperforms the state-of-the-art distributed power allocation method from the Cell-free Massive MIMO literature.
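A minimal sketch of the per-AP learning step, in PyTorch: each AP's network maps its own large-scale fading coefficients to per-user power coefficients and is trained to mimic the centralized max-min solution. The architecture, log-scale inputs, and unit power budget are our assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

K = 20  # number of users (assumed for the sketch)

class LocalPowerNet(nn.Module):
    """Per-AP map: local large-scale fading gains -> per-user powers."""
    def __init__(self, n_users=K):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_users, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_users), nn.Softplus(),  # nonnegative powers
        )

    def forward(self, log_beta):
        # log_beta: (batch, K) log-scale fading gains seen by THIS AP
        p = self.net(log_beta)
        # scale down if needed so the AP's unit power budget is respected
        return p / p.sum(dim=1, keepdim=True).clamp(min=1.0)

model = LocalPowerNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(log_beta_batch, target_powers):
    """target_powers: this AP's coefficients from the centralized max-min
    solver (training-data generation not shown)."""
    opt.zero_grad()
    loss = loss_fn(model(log_beta_batch), target_powers)
    loss.backward()
    opt.step()
    return loss.item()
```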
Many popular functional forms for the Lorenz curve do not have a closed-form expression for the Gini index, and no study has utilized the observed Gini index to estimate the parameter(s) of the corresponding parametric functional form. Motivated by this gap, a simple method for estimating the Lorenz curve is introduced. It utilizes three indicators, namely the Gini index and the income shares of the bottom and top groups, to calculate the values of the parameters of a specified functional form that has a closed-form expression for the Gini index. No error-minimization technique is required to estimate the Lorenz curve. Data on the Gini index and the income shares of four countries with different levels of income inequality and different economic, sociological, and regional backgrounds, taken from the United Nations University-World Income Inequality Database, are used to illustrate how the simple method works. The overall results indicate that the estimated Lorenz curves fit the actual observations quite well. This simple method could be useful in situations where data on income distribution are scarce. However, if more data on income distribution are available, this study shows that the specified functional form could be used to estimate the Lorenz curve directly. Moreover, the estimated values of the Gini index calculated from the specified functional form are virtually identical to the actual observations.
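To illustrate the idea, here is a sketch using a hypothetical two-parameter form (not the paper's): $L(p) = wp + (1-w)p^a$, whose Gini index has the closed form $G = (1-w)(a-1)/(a+1)$. With two parameters, two indicators suffice, so we match the observed Gini index and the bottom income share by solving a system of equations exactly, with no error minimization.

```python
from scipy.optimize import fsolve

# Hypothetical form L(p) = w*p + (1-w)*p**a with closed-form Gini index
# G = (1-w)*(a-1)/(a+1); an illustration of the method, not the paper's form.
def lorenz(p, w, a):
    return w * p + (1 - w) * p**a

def gini(w, a):
    return (1 - w) * (a - 1) / (a + 1)

def fit_from_indicators(gini_obs, bottom_share_obs, p_bottom=0.10):
    """Solve the two matching equations exactly (root finding, not error
    minimization): Gini(w, a) = observed Gini, L(p_bottom) = bottom share."""
    def equations(theta):
        w, a = theta
        return [gini(w, a) - gini_obs,
                lorenz(p_bottom, w, a) - bottom_share_obs]
    return fsolve(equations, x0=[0.3, 2.0])

# Example: observed Gini 0.40 and bottom-10% income share 2.5%
w, a = fit_from_indicators(0.40, 0.025)
print(f"w = {w:.3f}, a = {a:.3f}, implied Gini = {gini(w, a):.3f}")
```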
Direct policy search serves as one of the workhorses of modern reinforcement learning (RL), and its applications in continuous control tasks have recently attracted increasing attention. In this work, we investigate the convergence theory of policy gradient (PG) methods for learning linear risk-sensitive and robust controllers. In particular, we develop PG methods that can be implemented in a derivative-free fashion by sampling system trajectories, and establish both global convergence and sample complexity results for the solutions of two fundamental settings in risk-sensitive and robust control: the finite-horizon linear exponential quadratic Gaussian problem and the finite-horizon linear-quadratic disturbance attenuation problem. As a by-product, our results also provide the first sample complexity guarantees for the global convergence of PG methods on zero-sum linear-quadratic dynamic games, a nonconvex-nonconcave minimax optimization problem that serves as a baseline setting in multi-agent reinforcement learning (MARL) with continuous spaces. One feature of our algorithms is that during the learning phase, a certain level of robustness/risk-sensitivity of the controller is preserved, a property we term implicit regularization, which is an essential requirement in safety-critical control systems.
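The derivative-free implementation relies on standard zeroth-order gradient estimation: perturb the policy gain randomly, roll out trajectories, and form a gradient estimate from sampled costs. Below is a minimal sketch for a plain finite-horizon LQR problem; the system matrices, smoothing radius, and step size are our illustrative assumptions, and the risk-sensitive objectives and implicit-regularization safeguards of the paper are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, H = 2, 1, 10                        # state dim, input dim, horizon
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(n), 0.1 * np.eye(m)

def rollout_cost(K, x0, noise=0.01):
    """Sampled trajectory cost under the linear policy u_t = -K x_t."""
    x, cost = x0, 0.0
    for _ in range(H):
        u = -K @ x
        cost += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u + noise * rng.standard_normal(n)
    return cost + x @ Q @ x               # terminal cost

def zeroth_order_grad(K, r=0.05, n_samples=100):
    """Single-point estimate: E[(d/r) * cost(K + r U) * U], U on the sphere."""
    d, grad = K.size, np.zeros_like(K)
    for _ in range(n_samples):
        U = rng.standard_normal(K.shape)
        U /= np.linalg.norm(U)
        grad += (d / r) * rollout_cost(K + r * U, rng.standard_normal(n)) * U
    return grad / n_samples

K = np.zeros((m, n))
for _ in range(200):
    K -= 1e-4 * zeroth_order_grad(K)      # derivative-free PG step
print(K)
```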
This paper presents a control-theoretic framework that stably combines optimal feedback policies with online learning for the control of uncertain nonlinear systems. Given unknown parameters within a bounded range, the resulting adaptive control laws guarantee convergence of the closed-loop system to the state of zero cost. The proposed framework is able to employ the certainty-equivalence principle when designing optimal policies and value functions through online adjustment of the learning rate - a mechanism needed to guarantee stable learning and control. The approach is demonstrated on the familiar mountain car problem, where it is shown to yield near-optimal behavior despite the presence of parametric uncertainty.
We study a fair resource scheduling problem in which a set of interval jobs is to be allocated to heterogeneous machines controlled by agents. Each job is associated with a release time, a deadline, and a processing time, such that it can be processed only if its complete processing period lies between its release time and deadline. The machines gain possibly different utilities from processing different jobs, and all jobs assigned to the same machine must be processed without overlap. We consider two widely studied solution concepts, namely maximin share fairness and envy-freeness. For both criteria, we discuss the extent to which fair allocations exist and present constant-factor approximation algorithms for various settings.
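As a concrete illustration of the scheduling constraint, the following sketch checks (heuristically) whether a bundle of interval jobs fits on one machine without overlap by scheduling jobs in earliest-deadline-first order; this is a simple illustrative test, not an algorithm from the paper, and exact feasibility checking is harder in general.

```python
from dataclasses import dataclass

@dataclass
class Job:
    release: float
    deadline: float
    processing: float

def edf_fits(jobs):
    """Heuristic one-machine feasibility test: process jobs in
    earliest-deadline-first order, starting each as early as allowed, and
    check that every job finishes by its deadline."""
    t = 0.0
    for job in sorted(jobs, key=lambda j: j.deadline):
        start = max(t, job.release)
        finish = start + job.processing
        if finish > job.deadline:   # processing period would leave the window
            return False
        t = finish
    return True

bundle = [Job(0, 4, 2), Job(1, 5, 2), Job(3, 9, 3)]
print(edf_fits(bundle))  # True: schedule at [0,2], [2,4], [4,7]
```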
This paper studies optimal motion planning subject to motion and environment uncertainties. By modeling the system as a probabilistic labeled Markov decision process (PL-MDP), the control objective is to synthesize a finite-memory policy under which the agent satisfies complex high-level tasks, expressed as linear temporal logic (LTL) formulas, with a desired satisfaction probability. In particular, the cost optimization of trajectories satisfying infinite-horizon tasks is considered, and the trade-off between reducing the expected mean cost and maximizing the probability of task satisfaction is analyzed. Instead of using traditional Rabin automata, the LTL formulas are converted to limit-deterministic B\"uchi automata (LDBA) with a reachability acceptance condition and a compact graph structure. The novelty of this work lies in considering cases where the LTL specification can be potentially infeasible, and in developing a relaxed product MDP between the PL-MDP and the LDBA. The relaxed product MDP allows the agent to revise its motion plan whenever the task is not fully feasible, and to quantify the violation measure of the revised plan. A multi-objective optimization problem is then formulated to jointly consider the probability of task satisfaction, the violation of the original task constraints, and the implementation cost of the policy execution. The formulated problem can be solved via coupled linear programs. To the best of our knowledge, this work is the first to bridge the gap between planning revision for potentially infeasible LTL specifications and optimal control synthesis of both the plan prefix and the plan suffix of the trajectory over the infinite horizon. Experimental results are provided to demonstrate the effectiveness of the proposed framework.
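The kind of linear program involved can be illustrated with an occupancy-measure LP on a tiny absorbing MDP: minimize the expected cost to reach a goal subject to flow-conservation constraints. The three-state instance below is purely our illustration and is much simpler than the coupled programs on the relaxed product MDP.

```python
import numpy as np
from scipy.optimize import linprog

# Occupancy-measure LP: min expected cost to reach an absorbing goal state.
S, A, goal = 3, 2, 2                       # states {0, 1, 2}, goal = 2
P = np.zeros((S, A, S))                    # P[s, a, s']
P[0, 0] = [0.8, 0.2, 0.0]                  # action 0 in state 0: slow, cheap
P[0, 1] = [0.0, 0.5, 0.5]                  # action 1 in state 0: direct, costly
P[1, 0] = [0.0, 0.1, 0.9]
P[1, 1] = [0.5, 0.0, 0.5]
c = np.array([[1.0, 4.0], [2.0, 1.0], [0.0, 0.0]])   # c[s, a]
mu0 = np.array([1.0, 0.0])                 # initial distribution on {0, 1}

# Variables x[s, a] (expected visit counts) for non-goal states; flow:
#   sum_a x[s, a] - sum_{s', a} P[s', a, s] * x[s', a] = mu0[s]
non_goal = [0, 1]
n_var = len(non_goal) * A
A_eq = np.zeros((len(non_goal), n_var))
for i, s in enumerate(non_goal):
    for j, sp in enumerate(non_goal):
        for a in range(A):
            A_eq[i, j * A + a] = (1.0 if s == sp else 0.0) - P[sp, a, s]
cost = np.array([c[s, a] for s in non_goal for a in range(A)])

res = linprog(cost, A_eq=A_eq, b_eq=mu0, bounds=[(0, None)] * n_var)
x = res.x.reshape(len(non_goal), A)
print("expected cost:", round(res.fun, 3), "policy:", x.argmax(axis=1))
```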
The one-fifth success rule is one of the best-known and most widely accepted techniques for controlling the parameters of evolutionary algorithms. While it is often applied in the literal sense, a common interpretation sees the one-fifth success rule as a family of success-based update rules that are determined by an update strength $F$ and a success rate. We analyze in this work how the performance of the (1+1) Evolutionary Algorithm on LeadingOnes depends on these two hyper-parameters. Our main result shows that the best performance is obtained for small update strengths $F=1+o(1)$ and success rate $1/e$. We also prove that the running time obtained with this parameter setting is, apart from lower-order terms, the same as that achieved with the best fitness-dependent mutation rate. We show similar results for the resampling variant of the (1+1) Evolutionary Algorithm, which enforces that at least one bit is flipped in each iteration.
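A minimal sketch of the success-based rule on LeadingOnes, under one common parametrization (our assumption): multiply the mutation rate by $F$ after an improving step and divide it by $F^{s/(1-s)}$ otherwise, so the rate is stationary exactly at success rate $s$; with $s=1/5$ this recovers the literal one-fifth rule (multiply by $F$ on success, divide by $F^{1/4}$ on failure).

```python
import math
import random

def leading_ones(x):
    """Number of leading one-bits."""
    lo = 0
    for bit in x:
        if bit == 0:
            break
        lo += 1
    return lo

def success_based_ea(n, F=1.1, s=1 / math.e, seed=0):
    """(1+1) EA with a success-based mutation-rate update (a sketch; the
    rate caps [1/n, 1/2] are our assumptions). Returns the number of
    fitness evaluations until the all-ones string is found."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    p, evals, fx = 1.0 / n, 0, leading_ones(x)
    while fx < n:
        y = [b ^ (rng.random() < p) for b in x]  # standard bit mutation
        fy = leading_ones(y)
        evals += 1
        if fy > fx:                              # success: increase the rate
            p = min(F * p, 0.5)
            x, fx = y, fy
        else:                                    # failure: decrease the rate
            p = max(p / F ** (s / (1 - s)), 1.0 / n)
            if fy == fx:                         # the (1+1) EA accepts ties
                x = y
    return evals

print(success_based_ea(100))
```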
Motivated by interesting real-world applications in logistics and online advertising, we consider an online allocation problem subject to lower and upper resource constraints, where requests arrive sequentially, sampled i.i.d. from an unknown distribution, and we must promptly make decisions given limited resources and lower-bound requirements. First, with knowledge of the measure of feasibility $\alpha$, we propose a new algorithm that obtains a $1-O(\frac{\epsilon}{\alpha-\epsilon})$ competitive ratio relative to the offline benchmark that knows all requests ahead of time. Inspired by previous studies, this algorithm adopts an innovative technique of dynamically updating a threshold price vector for decision making. Moreover, an optimization method for estimating the optimal measure of feasibility, with a theoretical guarantee, is proposed at the end of this paper. Based on this method, if we tolerate a slight violation of the lower-bound constraints with parameter $\eta$, the proposed algorithm naturally extends to settings without the strong feasibility assumption, covering the largely unexplored infeasible scenarios.
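The threshold-price mechanism can be sketched as follows: maintain a dual price per resource, accept a request only when its value exceeds the priced resource consumption, and update the prices by an online subgradient step toward the per-round budget targets. The update rule, step size, and random request model below are illustrative assumptions; the paper's algorithm additionally handles the lower-bound constraints.

```python
import numpy as np

rng = np.random.default_rng(0)
T, m = 1000, 3                           # number of requests and resources
budget = np.full(m, 0.25 * T)            # upper resource constraints
rho = budget / T                         # per-round consumption targets
prices = np.zeros(m)                     # threshold price vector
remaining = budget.copy()
step = 1.0 / np.sqrt(T)
total_value = 0.0

for t in range(T):
    value = rng.uniform(0.0, 1.0)                # request value (illustrative)
    demand = rng.uniform(0.0, 1.0, size=m)       # resource consumption
    accept = value > prices @ demand and np.all(demand <= remaining)
    if accept:
        remaining -= demand
        total_value += value
    used = demand if accept else np.zeros(m)
    # subgradient step: raise the price of resources consumed above target
    prices = np.maximum(prices + step * (used - rho), 0.0)

print(f"collected value {total_value:.1f}; leftover budget {remaining.round(1)}")
```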
We study the problem of learning in the stochastic shortest path (SSP) setting, where an agent seeks to minimize the expected cost accumulated before reaching a goal state. We design a novel model-based algorithm, EB-SSP, that carefully skews the empirical transitions and perturbs the empirical costs with an exploration bonus to guarantee both optimism and convergence of the associated value iteration scheme. We prove that EB-SSP achieves the minimax regret rate $\widetilde{O}(B_{\star} \sqrt{S A K})$, where $K$ is the number of episodes, $S$ is the number of states, $A$ is the number of actions, and $B_{\star}$ bounds the expected cumulative cost of the optimal policy from any state, thus closing the gap with the lower bound. Interestingly, EB-SSP obtains this result while being parameter-free, i.e., it does not require any prior knowledge of $B_{\star}$, nor of $T_{\star}$, which bounds the expected time-to-goal of the optimal policy from any state. Furthermore, we illustrate various cases (e.g., positive costs, or general costs when an order-accurate estimate of $T_{\star}$ is available) where the regret only contains a logarithmic dependence on $T_{\star}$, thus yielding the first horizon-free regret bound beyond the finite-horizon MDP setting.
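A simplified rendering of the two ingredients named above - skewing the empirical transitions so that value iteration converges, and subtracting an exploration bonus from the empirical costs to maintain optimism - is sketched below; the leak parameter, bonus shape, and constants are our assumptions, not EB-SSP's calibrated choices.

```python
import numpy as np

def optimistic_values(P_hat, c_hat, counts, goal, eta=1e-2, iters=2000):
    """P_hat: (S, A, S) empirical transitions; c_hat: (S, A) empirical costs;
    counts: (S, A) visit counts; goal: index of the absorbing goal state."""
    S, A, _ = P_hat.shape
    P_tilde = (1.0 - eta) * P_hat          # skew: leak probability eta ...
    P_tilde[:, :, goal] += eta             # ... toward the goal state
    bonus = np.sqrt(1.0 / np.maximum(counts, 1))   # simple 1/sqrt(n) bonus
    c_tilde = np.maximum(c_hat - bonus, 0.0)       # optimistic costs
    V = np.zeros(S)
    for _ in range(iters):                 # value iteration on the skewed model
        Q = c_tilde + P_tilde @ V          # shape (S, A)
        Q[goal, :] = 0.0                   # goal is absorbing and costless
        V_new = Q.min(axis=1)
        if np.max(np.abs(V_new - V)) < 1e-9:
            V = V_new
            break
        V = V_new
    return V, Q.argmin(axis=1)             # optimistic values, greedy policy
```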
We propose a new method of estimation in topic models that is not a variation on the existing simplex-finding algorithms and that estimates the number of topics $K$ from the observed data. We derive new finite-sample minimax lower bounds for the estimation of the topic matrix $A$, as well as new upper bounds for our proposed estimator. We describe the scenarios in which our estimator is minimax adaptive. Our finite-sample analysis is valid for any number of documents ($n$), individual document lengths ($N_i$), dictionary size ($p$), and number of topics ($K$), and both $p$ and $K$ are allowed to increase with $n$, a situation not handled well by previous analyses. We complement our theoretical results with a detailed simulation study. We illustrate that the new algorithm is faster and more accurate than the current ones, even though it starts with a computational and theoretical disadvantage: it does not know the correct number of topics $K$, while we provide the competing methods with the correct value in our simulations.