A ubiquitous learning problem in today's digital market is, during repeated interactions between a seller and a buyer, how a seller can gradually learn optimal pricing decisions based on the buyer's past purchase responses. A fundamental challenge of learning in such a strategic setup is that the buyer will naturally have incentives to manipulate his responses in order to induce more favorable learning outcomes for him. To understand the limits of the seller's learning when facing such a strategic and possibly manipulative buyer, we study a natural yet powerful buyer manipulation strategy. That is, before the pricing game starts, the buyer simply commits to "imitate" a different value function by pretending to always react optimally according to this imitative value function. We fully characterize the optimal imitative value function that the buyer should imitate as well as the resultant seller revenue and buyer surplus under this optimal buyer manipulation. Our characterizations reveal many useful insights about what happens at equilibrium. For example, a seller with concave production cost will obtain essentially 0 revenue at equilibrium whereas the revenue for a seller with convex production cost is the Bregman divergence of her cost function between no production and certain production. Finally, and importantly, we show that a more powerful class of pricing schemes does not necessarily increase, in fact, may be harmful to, the seller's revenue. Our results not only lead to an effective prescriptive way for buyers to manipulate learning algorithms but also shed lights on the limits of what a seller can really achieve when pricing in the dark.
Automated hyperparameter optimization (HPO) has gained great popularity and is an important ingredient of most automated machine learning frameworks. The process of designing HPO algorithms, however, is still an unsystematic and manual process: Limitations of prior work are identified and the improvements proposed are -- even though guided by expert knowledge -- still somewhat arbitrary. This rarely allows for gaining a holistic understanding of which algorithmic components are driving performance, and carries the risk of overlooking good algorithmic design choices. We present a principled approach to automated benchmark-driven algorithm design applied to multifidelity HPO (MF-HPO): First, we formalize a rich space of MF-HPO candidates that includes, but is not limited to common HPO algorithms, and then present a configurable framework covering this space. To find the best candidate automatically and systematically, we follow a programming-by-optimization approach and search over the space of algorithm candidates via Bayesian optimization. We challenge whether the found design choices are necessary or could be replaced by more naive and simpler ones by performing an ablation analysis. We observe that using a relatively simple configuration, in some ways simpler than established methods, performs very well as long as some critical configuration parameters have the right value.
In iterated games, a player can unilaterally exert influence over the outcome through a careful choice of strategy. A powerful class of such "payoff control" strategies was discovered by Press and Dyson in 2012. Their so-called "zero-determinant" (ZD) strategies allow a player to unilaterally enforce a linear relationship between both players' payoffs. It was subsequently shown that when the slope of this linear relationship is positive, ZD strategies are robustly effective against a selfishly optimizing co-player, in that all adapting paths of the selfish player lead to the maximal payoffs for both players (at least when there are certain restrictions on the game parameters). In this paper, we investigate the efficacy of selfish learning against a fixed player in more general settings, for both ZD and non-ZD strategies. We first prove that in any symmetric 2x2 game, the selfish player's final strategy must be of a certain form and cannot be fully stochastic. We then show that there are prisoner's dilemma interactions for which the robustness result does not hold when one player uses a fixed ZD strategy with positive slope. We give examples of selfish adapting paths that lead to locally but not globally optimal payoffs, undermining the robustness of payoff control strategies. For non-ZD strategies, these pathologies arise regardless of the original restrictions on the game parameters. Our results illuminate the difficulty of implementing robust payoff control and selfish optimization, even in the simplest context of playing against a fixed strategy.
It is well-known that an algorithm exists which approximates the NP-complete problem of Set Cover within a factor of ln(n), and it was recently proven that this approximation ratio is optimal unless P = NP. This optimality result is the product of many advances in characterizations of NP, in terms of interactive proof systems and probabilistically checkable proofs (PCP), and improvements to the analyses thereof. However, as a result, it is difficult to extract the development of Set Cover approximation bounds from the greater scope of proof system analysis. This paper attempts to present a chronological progression of results on lower-bounding the approximation ratio of Set Cover. We analyze a series of proofs of progressively better bounds and unify the results under similar terminologies and frameworks to provide an accurate comparison of proof techniques and their results. We also treat many preliminary results as black-boxes to better focus our analysis on the core reductions to Set Cover instances. The result is alternative versions of several hardness proofs, beginning with initial inapproximability results and culminating in a version of the proof that ln(n) is a tight lower bound.
Human-centered systems of systems such as social networks, Internet of Things, or healthcare systems are growingly becoming major facets of modern life. Realistic models of human behavior in such systems play a significant role in their accurate modeling and prediction. Yet, human behavior under uncertainty often violates the predictions by the conventional probabilistic models. Recently, quantum-like decision theories have shown a considerable potential to explain the contradictions in human behavior by applying quantum probability. But providing a quantum-like decision theory that could predict, rather than describe the current, state of human behavior is still one of the unsolved challenges. The main novelty of our approach is introducing an entangled Bayesian network inspired by the entanglement concept in quantum information theory, in which each human is a part of the entire society. Accordingly, society's effect on the dynamic evolution of the decision-making process, which is less often considered in decision theories, is modeled by the entanglement measures. The proposed predictive entangled quantum-like Bayesian network (PEQBN) is evaluated on 22 experimental tasks. Results confirm that PEQBN provides more realistic predictions of human decisions under uncertainty, when compared with classical Bayesian networks and three recent quantum-like approaches.
We propose a new nonparametric modeling framework for causal inference when outcomes depend on how agents are linked in a social or economic network. Such network interference describes a large literature on treatment spillovers, social interactions, social learning, information diffusion, disease and financial contagion, social capital formation, and more. Our approach works by first characterizing how an agent is linked in the network using the configuration of other agents and connections nearby as measured by path distance. The impact of a policy or treatment assignment is then learned by pooling outcome data across similarly configured agents. We demonstrate the approach by proposing an asymptotically valid test for the hypothesis of policy irrelevance/no treatment effects and bounding the mean-squared error of a k-nearest-neighbor estimator for the average or distributional policy effect/treatment response.
This paper studies optimal motion planning subject to motion and environment uncertainties. By modeling the system as a probabilistic labeled Markov decision process (PL-MDP), the control objective is to synthesize a finite-memory policy, under which the agent satisfies high-level complex tasks expressed as linear temporal logic (LTL) with desired satisfaction probability. In particular, the cost optimization of the trajectory that satisfies infinite-horizon tasks is considered, and the trade-off between reducing the expected mean cost and maximizing the probability of task satisfaction is analyzed. Instead of using traditional Rabin automata, the LTL formulas are converted to limit-deterministic B\"uchi automata (LDBA) with a more straightforward accepting condition and a more compact graph structure. The novelty of this work lies in the consideration of the cases that LTL specifications can be potentially infeasible and the development of a relaxed product MDP between PL-MDP and LDBA. The relaxed product MDP allows the agent to revise its motion plan whenever the task is not fully feasible and to quantify the violation measurement of the revised plan. A multi-objective optimization problem is then formulated to jointly consider the probability of the task satisfaction, the violation with respect to original task constraints, and the implementation cost of the policy execution, which is solved via coupled linear programs. To the best of our knowledge, it is the first work that bridges the gap between planning revision and optimal control synthesis of both plan prefix and plan suffix of the agent trajectory over the infinite horizon. Experimental results are provided to demonstrate the effectiveness of the proposed framework.
Training datasets for machine learning often have some form of missingness. For example, to learn a model for deciding whom to give a loan, the available training data includes individuals who were given a loan in the past, but not those who were not. This missingness, if ignored, nullifies any fairness guarantee of the training procedure when the model is deployed. Using causal graphs, we characterize the missingness mechanisms in different real-world scenarios. We show conditions under which various distributions, used in popular fairness algorithms, can or can not be recovered from the training data. Our theoretical results imply that many of these algorithms can not guarantee fairness in practice. Modeling missingness also helps to identify correct design principles for fair algorithms. For example, in multi-stage settings where decisions are made in multiple screening rounds, we use our framework to derive the minimal distributions required to design a fair algorithm. Our proposed algorithm decentralizes the decision-making process and still achieves similar performance to the optimal algorithm that requires centralization and non-recoverable distributions.
Predictions obtained by, e.g., artificial neural networks have a high accuracy but humans often perceive the models as black boxes. Insights about the decision making are mostly opaque for humans. Particularly understanding the decision making in highly sensitive areas such as healthcare or fifinance, is of paramount importance. The decision-making behind the black boxes requires it to be more transparent, accountable, and understandable for humans. This survey paper provides essential definitions, an overview of the different principles and methodologies of explainable Supervised Machine Learning (SML). We conduct a state-of-the-art survey that reviews past and recent explainable SML approaches and classifies them according to the introduced definitions. Finally, we illustrate principles by means of an explanatory case study and discuss important future directions.
We consider the problem of knowledge transfer when an agent is facing a series of Reinforcement Learning (RL) tasks. We introduce a novel metric between Markov Decision Processes and establish that close MDPs have close optimal value functions. Formally, the optimal value functions are Lipschitz continuous with respect to the tasks space. These theoretical results lead us to a value transfer method for Lifelong RL, which we use to build a PAC-MDP algorithm with improved convergence rate. We illustrate the benefits of the method in Lifelong RL experiments.
To make deliberate progress towards more intelligent and more human-like artificial systems, we need to be following an appropriate feedback signal: we need to be able to define and evaluate intelligence in a way that enables comparisons between two systems, as well as comparisons with humans. Over the past hundred years, there has been an abundance of attempts to define and measure intelligence, across both the fields of psychology and AI. We summarize and critically assess these definitions and evaluation approaches, while making apparent the two historical conceptions of intelligence that have implicitly guided them. We note that in practice, the contemporary AI community still gravitates towards benchmarking intelligence by comparing the skill exhibited by AIs and humans at specific tasks such as board games and video games. We argue that solely measuring skill at any given task falls short of measuring intelligence, because skill is heavily modulated by prior knowledge and experience: unlimited priors or unlimited training data allow experimenters to "buy" arbitrary levels of skills for a system, in a way that masks the system's own generalization power. We then articulate a new formal definition of intelligence based on Algorithmic Information Theory, describing intelligence as skill-acquisition efficiency and highlighting the concepts of scope, generalization difficulty, priors, and experience. Using this definition, we propose a set of guidelines for what a general AI benchmark should look like. Finally, we present a benchmark closely following these guidelines, the Abstraction and Reasoning Corpus (ARC), built upon an explicit set of priors designed to be as close as possible to innate human priors. We argue that ARC can be used to measure a human-like form of general fluid intelligence and that it enables fair general intelligence comparisons between AI systems and humans.