We study how the quality dimension affects the social optimum in a model of spatial differentiation in which two facilities provide a public service. If quality enters linearly into the individuals' utility function, a symmetric configuration, in which both facilities have the same quality and serve groups of individuals of the same size, does not maximize social welfare. This result is surprising because all individuals are identical, sharing the same valuation of quality. We also show that a symmetric configuration of facilities may maximize social welfare if the individuals' marginal utility of quality is decreasing.
We consider a participatory budgeting problem in which each voter submits a proposal for how to divide a single divisible resource (such as money or time) among several possible alternatives (such as public projects or activities), and these proposals must be aggregated into a single division. Under $\ell_1$ preferences -- for which a voter's disutility is given by the $\ell_1$ distance between the aggregate division and the division he or she most prefers -- the social welfare-maximizing mechanism, which minimizes the average $\ell_1$ distance between the outcome and each voter's proposal, is incentive compatible (Goel et al. 2016). However, it fails to satisfy the natural fairness notion of proportionality, placing too much weight on majority preferences. Leveraging a connection between market prices and the generalized median rules of Moulin (1980), we introduce the independent markets mechanism, which is both incentive compatible and proportional. We unify the social welfare-maximizing mechanism and the independent markets mechanism by defining a broad class of moving phantom mechanisms that includes both. We show that every moving phantom mechanism is incentive compatible. Finally, we characterize the social welfare-maximizing mechanism as the unique Pareto-optimal mechanism in this class, suggesting an inherent tradeoff between Pareto optimality and proportionality.
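As a toy illustration of the aggregation setting (not the paper's mechanisms; the voter profile below is invented), the snippet computes a voter's $\ell_1$ disutility and shows why naive coordinate-wise aggregation needs phantoms or normalization: the coordinate-wise median of valid divisions need not itself sum to the full budget.

```python
import numpy as np

def l1_disutility(aggregate, proposal):
    """A voter's disutility: the l1 distance between the chosen
    division and the division the voter proposed."""
    return float(np.abs(np.asarray(aggregate) - np.asarray(proposal)).sum())

# Three voters dividing one unit of budget over three projects,
# each putting everything on a different project (hypothetical profile).
proposals = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

uniform = np.full(3, 1 / 3)
avg_cost = np.mean([l1_disutility(uniform, p) for p in proposals])
print(avg_cost)  # 4/3 for this profile

# Coordinate-wise medians are incentive compatible coordinate by
# coordinate, but the result is not a valid division here:
# it sums to 0 rather than to the full budget of 1.
median = np.median(proposals, axis=0)
print(median, median.sum())
```

Roughly speaking, moving phantom mechanisms address exactly this gap: phantom voters added to each coordinate's median computation move in a coordinated way until the resulting coordinates sum to one.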
Constrained tensor and matrix factorization models make it possible to extract interpretable patterns from multiway data, so identifiability properties and efficient algorithms for constrained low-rank approximations are important research topics. This work deals with low-rank approximations whose factor-matrix columns are sparse in a known, possibly overcomplete basis, a model coined Dictionary-based Low-Rank Approximation (DLRA). While earlier contributions focused on finding factor columns inside a dictionary of candidate columns, i.e. one-sparse approximations, this work is the first to tackle DLRA with sparsity larger than one. I propose to focus on the sparse-coding subproblem, coined Mixed Sparse-Coding (MSC), that emerges when solving DLRA with an alternating optimization strategy. Several algorithms based on sparse-coding heuristics (greedy methods, convex relaxations) are provided to solve MSC, and their performance is evaluated on simulated data. I then show how to adapt an efficient LASSO-based MSC solver to compute Dictionary-based Matrix Factorization and Canonical Polyadic Decomposition in the context of hyperspectral image processing and chemometrics. These experiments suggest that DLRA extends the modeling capabilities of low-rank approximations, helps reduce estimation variance, and enhances the identifiability and interpretability of the estimated factors.
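As a concrete sketch of a LASSO-based sparse-coding heuristic of the kind mentioned above (a minimal ISTA solver on synthetic data; the dictionary, sparsity level, and regularization weight are invented, and this is not the work's exact solver):

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista_lasso(D, y, lam, n_iter=500):
    """Minimize 0.5*||y - D x||^2 + lam*||x||_1 by proximal gradient (ISTA)."""
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the smooth part
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x + D.T @ (y - D @ x) / L, lam / L)
    return x

rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)              # normalize dictionary atoms
x_true = np.zeros(50)
x_true[[3, 17]] = [1.5, -2.0]               # a 2-sparse code
y = D @ x_true
x_hat = ista_lasso(D, y, lam=0.05)
print(np.nonzero(np.abs(x_hat) > 0.1)[0])   # recovered support
```

With a well-conditioned random dictionary the recovered support concentrates on the true atoms; in the alternating DLRA scheme, a solver of this type is applied to each factor column in turn.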
Online health communities provide a knowledge-exchange platform for a wide range of diseases and health conditions. Informational and emotional support helps forum participants orient around health issues beyond in-person doctor visits. So far, little is known about the relation between the level of participation in online health communities and the nature of participants' contributions. To gain insight into this issue, we analyzed 456 posts in 56 threads from the Dermatology sub-forum of an online health community. While low-participation threads (short threads) revolved around solving an individual's health issue through diagnosis suggestions and medical advice, participants in high-participation threads (long threads) built collective knowledge and a sense of community, typically discussing chronic and rare conditions that medical professionals were unfamiliar with or could not treat effectively. Our results suggest that short threads address an individual's health issue, while in long threads sub-communities around specific rare and chronic diseases emerge. This has implications for the user-interface design of health forums, which could be developed to better support community-building elements, even in short threads.
Deep learning models have shown great potential for image-based diagnosis assisting clinical decision making. At the same time, an increasing number of reports raise concerns about the potential risk that machine learning could amplify existing health disparities due to human biases embedded in the training data. If we wish to build fair artificial intelligence systems, it is important to investigate carefully the extent to which such biases may be reproduced or even amplified. Seyyed-Kalantari et al. advance this conversation by analysing the performance of a disease classifier across population subgroups. They raise performance disparities related to underdiagnosis as a point of concern; we identify areas of this analysis that we believe deserve additional attention. Specifically, we wish to highlight some theoretical and practical difficulties associated with assessing model fairness by testing on data drawn from the same biased distribution as the training data, especially when the sources and extent of the biases are unknown.
In the theory of discrete-time linear switching systems, as in other areas of mathematics, one is led to study the growth rate of the norms of all possible matrix products $A_{\sigma_{n}}\cdots A_{\sigma_{0}}$ with factors from a set of matrices $\mathscr{A}$. So far, the sequences of matrices guaranteeing the maximum rate of growth of the corresponding norms have been described accurately for only a relatively small number of classes of matrices $\mathscr{A}$. Moreover, in almost all theoretically studied cases, the index sequences $\{\sigma_{n}\}$ of the norm-maximizing matrix products have been shown to be periodic or so-called Sturmian, which entails a whole set of "good" properties of the sequences $\{A_{\sigma_{n}}\}$, in particular the existence of a limiting frequency of occurrence of each matrix factor $A_{i}\in\mathscr{A}$. In this paper it is shown that this is not always the case: a class of matrices is constructed, consisting of two $2\times 2$ matrices similar to rotations of the plane, for which the sequence $\{A_{\sigma_{n}}\}$ maximizing the growth rate of the norms $\|A_{\sigma_{n}}\cdots A_{\sigma_{0}}\|$ is not Sturmian. All considerations are based on numerical modeling and cannot be regarded as mathematically rigorous; rather, they should be read as a set of questions for further comprehensive theoretical analysis.
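A minimal numerical experiment in this spirit can be set up as follows (the two matrices below are invented stand-ins, not the class constructed in the paper, and greedy norm maximization is only a heuristic for the true extremal sequence):

```python
import numpy as np

def greedy_index_sequence(mats, n_steps=200):
    """At each step, append the factor that maximizes the spectral norm
    of the running product A_{sigma_n} ... A_{sigma_0} (greedy heuristic)."""
    P = np.eye(2)
    seq = []
    for _ in range(n_steps):
        i = int(np.argmax([np.linalg.norm(A @ P, 2) for A in mats]))
        P = mats[i] @ P
        P /= np.linalg.norm(P, 2)  # renormalize to prevent overflow
        seq.append(i)
    return seq

theta = 0.9  # hypothetical rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
mats = [1.02 * R, np.diag([1.05, 0.4])]   # toy pair of 2x2 matrices
seq = greedy_index_sequence(mats)
print("".join(map(str, seq[:60])))        # inspect the index pattern by eye
```

One can then count the frequency of each index over longer and longer prefixes of the sequence to probe, numerically, whether a limiting frequency of occurrence appears to exist.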
Group testing can help maintain a widespread testing program using fewer resources amid a pandemic. In group testing, we are given $n$ samples, one per individual. These samples are arranged into $m < n$ pooled samples, where each pool is obtained by mixing a subset of the $n$ individual samples. Infected individuals are then identified using a group testing algorithm. In this paper, we use side information (SI) collected from contact tracing (CT) within nonadaptive/single-stage group testing algorithms. We generate CT SI data by incorporating characteristics of disease spread between individuals. These data are fed into two signal and measurement models for group testing, and numerical results show that our algorithms provide improved sensitivity and specificity. We also show how to incorporate CT SI into the design of the pooling matrix. That said, our numerical results suggest that using SI to design the pooling matrix, via minimization of a weighted coherence measure, does not yield significant performance gains beyond using SI in the group testing algorithm itself.
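For readers unfamiliar with nonadaptive group testing, here is a minimal baseline (random Bernoulli pooling plus COMP decoding; it uses no side information, unlike the algorithms of the paper, and all parameters are invented):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, k = 100, 30, 3                          # individuals, pools, infected count
A = (rng.random((m, n)) < 0.1).astype(int)    # A[i, j] = 1 if sample j enters pool i
x = np.zeros(n, dtype=int)
x[rng.choice(n, size=k, replace=False)] = 1   # hidden infection status

y = (A @ x > 0).astype(int)  # a pool is positive iff it contains an infected sample

# COMP decoding: anyone who appears in at least one negative pool
# is definitely uninfected; declare everyone else positive.
in_negative_pool = (A[y == 0] > 0).any(axis=0)
x_hat = (~in_negative_pool).astype(int)
print(x_hat.sum(), "declared positive,",
      int((x_hat * (1 - x)).sum()), "false positives")
```

COMP never produces false negatives by construction, so its sensitivity is 1; the false-positive count is what better pooling designs and SI-aware decoding algorithms aim to reduce.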
To make deliberate progress towards more intelligent and more human-like artificial systems, we need to follow an appropriate feedback signal: we need to be able to define and evaluate intelligence in a way that enables comparisons between two systems, as well as comparisons with humans. Over the past hundred years, there has been an abundance of attempts to define and measure intelligence, across both the fields of psychology and AI. We summarize and critically assess these definitions and evaluation approaches, while making apparent the two historical conceptions of intelligence that have implicitly guided them. We note that in practice, the contemporary AI community still gravitates towards benchmarking intelligence by comparing the skill exhibited by AIs and humans at specific tasks such as board games and video games. We argue that solely measuring skill at any given task falls short of measuring intelligence, because skill is heavily modulated by prior knowledge and experience: unlimited priors or unlimited training data allow experimenters to "buy" arbitrary levels of skill for a system, in a way that masks the system's own generalization power. We then articulate a new formal definition of intelligence based on Algorithmic Information Theory, describing intelligence as skill-acquisition efficiency and highlighting the concepts of scope, generalization difficulty, priors, and experience. Using this definition, we propose a set of guidelines for what a general AI benchmark should look like. Finally, we present a benchmark closely following these guidelines, the Abstraction and Reasoning Corpus (ARC), built upon an explicit set of priors designed to be as close as possible to innate human priors. We argue that ARC can be used to measure a human-like form of general fluid intelligence and that it enables fair general intelligence comparisons between AI systems and humans.
BERT-based architectures currently give state-of-the-art performance on many NLP tasks, but little is known about the exact mechanisms that contribute to their success. In this work, we focus on the interpretation of self-attention, one of the fundamental underlying components of BERT. Using a subset of GLUE tasks and a set of handcrafted features-of-interest, we propose a methodology for, and carry out, a qualitative and quantitative analysis of the information encoded by BERT's individual heads. Our findings suggest that a limited set of attention patterns is repeated across different heads, indicating that the overall model is overparametrized. While different heads consistently use the same attention patterns, their impact on performance varies across tasks. We show that manually disabling attention in certain heads leads to a performance improvement over the regular fine-tuned BERT models.
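To make the head-disabling operation concrete, here is a toy numpy sketch (random weights, not BERT's; masking a head simply zeroes its contribution before concatenation, analogous in spirit to the ablation described):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, head_mask):
    """Toy multi-head self-attention; head_mask[h] = 0 disables head h."""
    outputs = []
    for h, (Wq_h, Wk_h, Wv_h) in enumerate(zip(Wq, Wk, Wv)):
        Q, K, V = X @ Wq_h, X @ Wk_h, X @ Wv_h
        attn = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
        outputs.append(head_mask[h] * (attn @ V))  # zeroed if head is disabled
    return np.concatenate(outputs, axis=-1)

rng = np.random.default_rng(0)
d, n_heads, d_head, seq = 16, 4, 4, 5
X = rng.standard_normal((seq, d))
Wq = rng.standard_normal((n_heads, d, d_head))
Wk = rng.standard_normal((n_heads, d, d_head))
Wv = rng.standard_normal((n_heads, d, d_head))

full = multi_head_attention(X, Wq, Wk, Wv, head_mask=np.ones(n_heads))
ablated = multi_head_attention(X, Wq, Wk, Wv, head_mask=np.array([1, 0, 1, 1]))
# The slice produced by head 1 is zeroed; the other heads are unchanged.
```

In a full transformer the masked output would then pass through the output projection, so disabling a head removes its information from all downstream layers.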
In this paper we aim to answer questions about images, given a training dataset of question-answer pairs for a number of images. A number of methods have attacked this problem with image-based attention: while answering a question, the model focuses on a specific part of the image, much as humans do. However, the regions that previous systems focus on are not well correlated with the regions that humans focus on, and this drawback limits their accuracy. We propose instead an exemplar-based method: we obtain one or more supporting and opposing exemplars and use them to compute a differential attention region. This differential attention is closer to human attention than that of other image-based attention methods, and it also yields improved accuracy when answering questions. The method is evaluated on challenging benchmark datasets; we perform better than other image-based attention methods and are competitive with other state-of-the-art methods that attend to both the image and the question.
Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, DP algorithms are usually non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion, using a strongly convex regularizer. This makes it possible to relax both the optimal value and the optimal solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators, and relate them to inference in graphical models. We derive two particular instantiations of our framework, a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on two structured prediction tasks and on structured and sparse attention for neural machine translation.
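With the negative-entropy regularizer, the smoothed max is the log-sum-exp and its gradient is the softmax distribution, which is what makes backpropagation through the DP recursion possible (a one-operator sketch, not the full smoothed Viterbi or DTW; the paper also covers other strongly convex regularizers):

```python
import numpy as np

def smoothed_max(x, gamma=1.0):
    """Entropy-regularized max: gamma * logsumexp(x / gamma).
    Recovers max(x) as gamma -> 0; larger gamma means more smoothing."""
    x = np.asarray(x, dtype=float)
    m = x.max()
    return m + gamma * np.log(np.exp((x - m) / gamma).sum())

def smoothed_max_grad(x, gamma=1.0):
    """Gradient of smoothed_max: the softmax of x / gamma, a probability
    distribution over arguments -- a differentiable relaxation of argmax."""
    x = np.asarray(x, dtype=float)
    e = np.exp((x - x.max()) / gamma)
    return e / e.sum()

x = [1.0, 3.0, 2.0]
print(smoothed_max(x, gamma=0.01))        # ~3.0: close to the hard max
print(smoothed_max_grad(x, gamma=0.01))   # nearly one-hot on the argmax
print(smoothed_max_grad(x, gamma=10.0))   # nearly uniform: heavy smoothing
```

Replacing every max in a DP recursion (e.g. over Viterbi transitions or DTW alignments) with this operator yields a differentiable relaxation of both the optimal value and, through its gradient, the optimal solution.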