We consider the problem of maximizing the Nash social welfare when allocating a set $\mathcal{G}$ of indivisible goods to a set $\mathcal{N}$ of agents. We study instances, in which all agents have 2-value additive valuations: The value of every agent $i \in \mathcal{N}$ for every good $j \in \mathcal{G}$ is $v_{ij} \in \{p,q\}$, for $p,q \in \mathbb{N}$, $p \le q$. Maybe surprisingly, we design an algorithm to compute an optimal allocation in polynomial time if $p$ divides $q$, i.e., when $p=1$ and $q \in \mathbb{N}$ after appropriate scaling. The problem is \classNP-hard whenever $p$ and $q$ are coprime and $p \ge 3$. In terms of approximation, we present positive and negative results for general $p$ and $q$. We show that our algorithm obtains an approximation ratio of at most 1.0345. Moreover, we prove that the problem is \classAPX-hard, with a lower bound of $1.000015$ achieved at $p/q = 4/5$.
In recent years, researchers have made significant progress in devising reinforcement-learning algorithms for optimizing linear temporal logic (LTL) objectives and LTL-like objectives. Despite these advancements, there are fundamental limitations to how well this problem can be solved that previous studies have alluded to but, to our knowledge, have not examined in depth. In this paper, we address theoretically the hardness of learning with general LTL objectives. We formalize the problem under the probably approximately correct learning in Markov decision processes (PAC-MDP) framework, a standard framework for measuring sample complexity in reinforcement learning. In this formalization, we prove that the optimal policy for any LTL formula is PAC-MDP-learnable only if the formula is in the most limited class in the LTL hierarchy, consisting of only finite-horizon-decidable properties. Practically, our result implies that it is impossible for a reinforcement-learning algorithm to obtain a PAC-MDP guarantee on the performance of its learned policy after finitely many interactions with an unconstrained environment for non-finite-horizon-decidable LTL objectives.
The cooperative bandit problem is increasingly becoming relevant due to its applications in large-scale decision-making. However, most research for this problem focuses exclusively on the setting with perfect communication, whereas in most real-world distributed settings, communication is often over stochastic networks, with arbitrary corruptions and delays. In this paper, we study cooperative bandit learning under three typical real-world communication scenarios, namely, (a) message-passing over stochastic time-varying networks, (b) instantaneous reward-sharing over a network with random delays, and (c) message-passing with adversarially corrupted rewards, including byzantine communication. For each of these environments, we propose decentralized algorithms that achieve competitive performance, along with near-optimal guarantees on the incurred group regret as well. Furthermore, in the setting with perfect communication, we present an improved delayed-update algorithm that outperforms the existing state-of-the-art on various network topologies. Finally, we present tight network-dependent minimax lower bounds on the group regret. Our proposed algorithms are straightforward to implement and obtain competitive empirical performance.
Many problems in signal processing and machine learning can be formalized as weak submodular optimization tasks. For such problems, a simple greedy algorithm (\textsc{Greedy}) is guaranteed to find a solution achieving the objective with a value no worse than $1-e^{-1/c}$ of the optimal, where $c$ is the multiplicative weak-submodularity constant. Due to the high cost of querying large-scale systems, the complexity of \textsc{Greedy} becomes prohibitive in contemporary applications. In this work, we study the tradeoff between performance and complexity when one resorts to random sampling strategies to reduce the query complexity of \textsc{Greedy}. Specifically, we quantify the effect of uniform sampling strategies on \textsc{Greedy}'s performance through two metrics: (i) probability of identifying an optimal subset, and (ii) suboptimality with respect to the optimal solution. The latter implies that uniform sampling strategies with a fixed sampling size achieve a non-trivial approximation factor; however, we show that with overwhelming probability, these methods fail to find the optimal subset. Our analysis shows that the failure of uniform sampling strategies with fixed sample size can be circumvented by successively increasing the size of the search space. Building upon this insight, we propose a simple progressive stochastic greedy algorithm and study its approximation guarantees. Moreover, we demonstrate effectiveness of the proposed method in dimensionality reduction applications and feature selection tasks for clustering and object tracking.
Imitation learning (IL) is a framework that learns to imitate expert behavior from demonstrations. Recently, IL shows promising results on high dimensional and control tasks. However, IL typically suffers from sample inefficiency in terms of environment interaction, which severely limits their application to simulated domains. In industrial applications, learner usually have a high interaction cost, the more interactions with environment, the more damage it causes to the environment and the learner itself. In this article, we make an effort to improve sample efficiency by introducing a novel scheme of inverse reinforcement learning. Our method, which we call \textit{Model Reward Function Based Imitation Learning} (MRFIL), uses an ensemble dynamic model as a reward function, what is trained with expert demonstrations. The key idea is to provide the agent with an incentive to match the demonstrations over a long horizon, by providing a positive reward upon encountering states in line with the expert demonstration distribution. In addition, we demonstrate the convergence guarantee for new objective function. Experimental results show that our algorithm reaches the competitive performance and significantly reducing the environment interactions compared to IL methods.
We study the computational complexity of finding fair allocations of indivisible goods in the setting where a social network on the agents is given. Notions of fairness in this context are "localized", that is, agents are only concerned about the bundles allocated to their neighbors, rather than every other agent in the system. We comprehensively address the computational complexity of finding locally envy-free and Pareto efficient allocations in the setting where the agents have binary valuations for the goods and the underlying social network is modeled by an undirected graph. We study the problem in the framework of parameterized complexity. We show that the problem is computationally intractable even in fairly restricted scenarios, for instance, even when the underlying graph is a path. We show NP-hardness for settings where the graph has only two distinct valuations among the agents. We demonstrate W-hardness with respect to the number of goods or the size of the vertex cover of the underlying graph. We also consider notions of proportionality that respect the structure of the underlying graph and show that two natural versions of this notion have different complexities: allocating according to the notion that accounts for locality to the greatest degree turns out to be computationally intractable, while for other notions, the allocation problem can be modeled as a structured ILP which can be solved efficiently.
Rankings, especially those in search and recommendation systems, often determine how people access information and how information is exposed to people. Therefore, how to balance the relevance and fairness of information exposure is considered as one of the key problems for modern IR systems. As conventional ranking frameworks that myopically sorts documents with their relevance will inevitably introduce unfair result exposure, recent studies on ranking fairness mostly focus on dynamic ranking paradigms where result rankings can be adapted in real-time to support fairness in groups (i.e., races, genders, etc.). Existing studies on fairness in dynamic learning to rank, however, often achieve the overall fairness of document exposure in ranked lists by significantly sacrificing the performance of result relevance and fairness on the top results. To address this problem, we propose a fair and unbiased ranking method named Maximal Marginal Fairness (MMF). The algorithm integrates unbiased estimators for both relevance and merit-based fairness while providing an explicit controller that balances the selection of documents to maximize the marginal relevance and fairness in top-k results. Theoretical and empirical analysis shows that, with small compromises on long list fairness, our method achieves superior efficiency and effectiveness comparing to the state-of-the-art algorithms in both relevance and fairness for top-k rankings.
We investigate the problem of fair recommendation in the context of two-sided online platforms, comprising customers on one side and producers on the other. Traditionally, recommendation services in these platforms have focused on maximizing customer satisfaction by tailoring the results according to the personalized preferences of individual customers. However, our investigation reveals that such customer-centric design may lead to unfair distribution of exposure among the producers, which may adversely impact their well-being. On the other hand, a producer-centric design might become unfair to the customers. Thus, we consider fairness issues that span both customers and producers. Our approach involves a novel mapping of the fair recommendation problem to a constrained version of the problem of fairly allocating indivisible goods. Our proposed FairRec algorithm guarantees at least Maximin Share (MMS) of exposure for most of the producers and Envy-Free up to One item (EF1) fairness for every customer. Extensive evaluations over multiple real-world datasets show the effectiveness of FairRec in ensuring two-sided fairness while incurring a marginal loss in the overall recommendation quality.
Personalized recommendation systems (RS) are extensively used in many services. Many of these are based on learning algorithms where the RS uses the recommendation history and the user response to learn an optimal strategy. Further, these algorithms are based on the assumption that the user interests are rigid. Specifically, they do not account for the effect of learning strategy on the evolution of the user interests. In this paper we develop influence models for a learning algorithm that is used to optimally recommend websites to web users. We adapt the model of \cite{Ioannidis10} to include an item-dependent reward to the RS from the suggestions that are accepted by the user. For this we first develop a static optimisation scheme when all the parameters are known. Next we develop a stochastic approximation based learning scheme for the RS to learn the optimal strategy when the user profiles are not known. Finally, we describe several user-influence models for the learning algorithm and analyze their effect on the steady user interests and on the steady state optimal strategy as compared to that when the users are not influenced.
This paper presents a safety-aware learning framework that employs an adaptive model learning method together with barrier certificates for systems with possibly nonstationary agent dynamics. To extract the dynamic structure of the model, we use a sparse optimization technique, and the resulting model will be used in combination with control barrier certificates which constrain feedback controllers only when safety is about to be violated. Under some mild assumptions, solutions to the constrained feedback-controller optimization are guaranteed to be globally optimal, and the monotonic improvement of a feedback controller is thus ensured. In addition, we reformulate the (action-)value function approximation to make any kernel-based nonlinear function estimation method applicable. We then employ a state-of-the-art kernel adaptive filtering technique for the (action-)value function approximation. The resulting framework is verified experimentally on a brushbot, whose dynamics is unknown and highly complex.
In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely: the function $F(\xb) \triangleq \sum_{i=1}^{m}f_i(\xb)$ is strongly convex and smooth, either strongly convex or smooth or just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors) with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions to the proposed setup such as proximal friendly functions, time-varying graphs, improvement of the condition numbers.