There is an extensive literature in social choice theory studying the consequences of weakening the assumptions of Arrow's Impossibility Theorem. Much of this literature suggests that there is no escape from Arrow-style impossibility theorems unless one drastically violates the Independence of Irrelevant Alternatives (IIA). In this paper, we present a more positive outlook. We propose a model of comparing candidates in elections, which we call the Advantage-Standard (AS) model. The requirement that a collective choice rule (CCR) be rationalizable by the AS model is in the spirit of but weaker than IIA; yet it is stronger than what is known in the literature as weak IIA (two profiles alike on x, y cannot have opposite strict social preferences on x and y). In addition to motivating violations of IIA, the AS model makes intelligible violations of another Arrovian assumption: the negative transitivity of the strict social preference relation P. While previous literature shows that only weakening IIA to weak IIA or only weakening negative transitivity of P to acyclicity still leads to impossibility theorems, we show that jointly weakening IIA to AS rationalizability and weakening negative transitivity of P leads to no such impossibility theorems. Indeed, we show that several appealing CCRs are AS rationalizable, including even transitive CCRs.
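To make the contrast between the two independence conditions concrete, they can be stated as follows for a CCR $F$ with strict part $P$ (a standard formalization consistent with the parenthetical gloss above, not a quotation from the paper): if two profiles $\mathbf{R}$ and $\mathbf{R}'$ are alike on $\{x,y\}$ (every voter ranks $x$ against $y$ the same way in both), then
\[
\text{IIA:}\quad F(\mathbf{R})|_{\{x,y\}} = F(\mathbf{R}')|_{\{x,y\}},
\qquad
\text{weak IIA:}\quad \neg\big(x\, P(\mathbf{R})\, y \;\wedge\; y\, P(\mathbf{R}')\, x\big).
\]
AS rationalizability sits strictly between these two conditions.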
In recent years, the Graph Model has become increasingly popular, especially in the application domain of social networks. The model has been semantically augmented with properties and labels attached to the graph elements. Because the model does not require a schema, it is difficult to ensure data quality for the properties and the data structure. In this paper, we propose a schema-bound Typed Graph Model with properties and labels. These enhancements improve not only data quality but also the quality of graph analysis. The power of this model comes from its use of hyper-nodes and hyper-edges, which allow the data structure to be presented at different abstraction levels. We demonstrate by example the superiority of this model over the property graph data model of Hidders and other prevalent data models, namely the relational, object-oriented, and XML models.
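As a rough illustration of what a schema-bound property graph buys over a schema-less one (an invented toy example; the node types and properties below are not taken from the paper, and the actual Typed Graph Model also covers hyper-nodes and hyper-edges):

```python
from dataclasses import dataclass

# Hypothetical schema: each node type declares the properties it must carry and their types.
SCHEMA = {
    "Person": {"name": str, "age": int},
    "Post":   {"text": str},
}

@dataclass
class TypedNode:
    node_type: str
    properties: dict

    def __post_init__(self):
        # Reject nodes whose properties are missing or have the wrong type --
        # the kind of data-quality check a schema-less property graph cannot enforce.
        for prop, prop_type in SCHEMA[self.node_type].items():
            if not isinstance(self.properties.get(prop), prop_type):
                raise TypeError(f"{self.node_type}.{prop} must be a {prop_type.__name__}")

alice = TypedNode("Person", {"name": "Alice", "age": 30})     # accepted
# TypedNode("Person", {"name": "Bob", "age": "30"})           # rejected: age is not an int
```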
We consider iterative semi-supervised learning (SSL) algorithms that iteratively generate pseudo-labels for a large amount of unlabelled data to progressively refine the model parameters. In particular, we seek to understand the behaviour of the {\em generalization error} of iterative SSL algorithms using information-theoretic principles. To obtain bounds that are amenable to numerical evaluation, we first work with a simple model -- namely, the binary Gaussian mixture model. Our theoretical results suggest that when the class conditional variances are not too large, the upper bound on the generalization error decreases monotonically with the number of iterations, but quickly saturates. The theoretical results on the simple model are corroborated by extensive experiments on several benchmark datasets, such as MNIST and CIFAR, in which we observe that the generalization error improves over several pseudo-labelling iterations but saturates afterwards.
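A minimal sketch of the pseudo-labelling loop being analysed, instantiated on the binary Gaussian mixture (the classifier, sample sizes, and parameters below are illustrative assumptions, not the authors' exact setup):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 1.5                          # class-conditional means +/-mu, shared std sigma

# Small labelled set, large unlabelled set drawn from the two-component mixture.
y_lab = rng.integers(0, 2, size=20)
x_lab = rng.normal((2 * y_lab - 1) * mu, sigma)
y_unl = rng.integers(0, 2, size=10000)
x_unl = rng.normal((2 * y_unl - 1) * mu, sigma)

# Model parameter: estimated (signed) class mean; classify by the sign of theta * x.
theta = np.mean(x_lab * (2 * y_lab - 1))      # initial fit on labelled data only

for t in range(10):
    pseudo = (np.sign(theta) * x_unl > 0).astype(int)   # pseudo-label the unlabelled data
    # Refit on labelled plus pseudo-labelled data.
    x_all = np.concatenate([x_lab, x_unl])
    y_all = np.concatenate([y_lab, pseudo])
    theta = np.mean(x_all * (2 * y_all - 1))
    err = np.mean((np.sign(theta) * x_unl > 0).astype(int) != y_unl)
    print(f"iteration {t}: unlabelled error {err:.3f}")
```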
We review the rapidly growing literature on auxiliary information-based (AIB) process monitoring methods. This approach assumes that the auxiliary variable, which is correlated with the quality variable of interest, has a known mean (or some other known parameter) that cannot change over time. We demonstrate that violations of this assumption can have serious adverse effects both when the process is stable and when there has been a process shift; some process shifts can even become undetectable. We also show that the basic AIB approach is a special case of simple linear regression profile monitoring. The AIB charting techniques require strong assumptions. Based on our results, we warn against the use of the AIB approach in quality control applications.
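For context, the basic AIB statistic for monitoring the mean of a quality variable $X$ with an auxiliary variable $Y$ is typically a regression-type estimator of the form below (a common formulation in this literature, stated here as background rather than quoted from the review), where $\mu_Y$ is the assumed-known mean of $Y$ and $\rho$ is the correlation between $X$ and $Y$:
\[
M_X \;=\; \bar{X} \;+\; \rho\,\frac{\sigma_X}{\sigma_Y}\,\big(\mu_Y - \bar{Y}\big).
\]
This is the fitted value at $y=\mu_Y$ of the simple linear regression of $X$ on $Y$, which is the sense in which AIB charts reduce to regression profile monitoring; if the true mean of $Y$ drifts away from $\mu_Y$, the correction term biases $M_X$.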
In software engineering, a great number of new approaches are being actively researched, and many tools are being developed based on them. These tools require both a framework for their creation and a way to reach potential users. Modern IDEs provide both. In this paper, we describe the main capabilities of the IntelliJ Platform that could be useful for researchers who are developing code analysis tools. To illustrate the benefits of using the platform, we describe several use cases that researchers might be interested in: mining software data, running machine learning models on code, recommending refactorings, and visualizing data in the IDE. We provide several examples of existing plugins that implement these use cases. Finally, to make it easier to start working with the platform, we develop and provide simple plugins for each use case that can serve as templates for new projects.
In this paper, I show how neural networks can be used to simultaneously estimate all unknown parameters in a spatial point process model from an observed point pattern. The method can be applied to any point process model from which it is possible to simulate. Through a simulation study, I conclude that the method recovers parameters well and in some situations provides better estimates than the most commonly used methods. I also illustrate the method on a real data example.
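A toy sketch of this simulate-then-regress idea, using a homogeneous Poisson process and quadrat-count summaries (the architecture, summary statistics, and point process models used in the paper will differ; scikit-learn's MLPRegressor stands in for whatever network is actually used):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)

def simulate_poisson(lam):
    """Homogeneous Poisson process on the unit square with intensity lam."""
    n = rng.poisson(lam)
    return rng.uniform(0, 1, size=(n, 2))

def summary(points, bins=4):
    """Fixed-length summary statistic: quadrat counts on a regular grid."""
    h, _, _ = np.histogram2d(points[:, 0], points[:, 1],
                             bins=bins, range=[[0, 1], [0, 1]])
    return h.ravel()

# Training pairs: (summary of a simulated pattern, parameter that generated it).
lams = rng.uniform(50, 500, size=2000)
X = np.stack([summary(simulate_poisson(l)) for l in lams])

net = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000, random_state=0)
net.fit(X, lams)

# Estimate the intensity of a new, "observed" pattern.
observed = simulate_poisson(200.0)
print("estimated intensity:", net.predict(summary(observed)[None, :])[0])
```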
The recovery of signals that are sparse not in a basis but with respect to an over-complete dictionary is one of the most flexible settings in compressed sensing, with numerous applications. As in the standard compressed sensing setting, the signal can be reconstructed efficiently from few linear measurements, for example by the so-called $\ell_1$-synthesis method. However, it has been less well understood which measurement matrices provably work in this setting. Whereas in the standard setting it has been shown that even certain heavy-tailed measurement matrices can be used in the same sample complexity regime as Gaussian matrices, comparable results for the recovery of dictionary-sparse signals via $\ell_1$-synthesis are only available for the restrictive class of sub-Gaussian measurement vectors. In this work, we fill this gap and establish optimal guarantees for the recovery of vectors that are (approximately) sparse with respect to a dictionary via the $\ell_1$-synthesis method from linear, potentially noisy measurements, for a large class of random measurement matrices. In particular, we show that random measurements that fulfill only a small-ball assumption and a weak moment assumption, such as random vectors with i.i.d. Student-$t$ entries with a logarithmic number of degrees of freedom, lead to guarantees comparable to those for (sub-)Gaussian measurements. As a technical tool, we prove a bound on the expectation of the sum of squared order statistics under very general assumptions, which might be of independent interest. As a corollary of our results, we also obtain a slight improvement on the weakest assumption on a measurement matrix with i.i.d. rows sufficient for uniform recovery in standard compressed sensing, improving on results by Lecu\'e and Mendelson and by Dirksen, Lecu\'e, and Rauhut.
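For reference, the $\ell_1$-synthesis method mentioned here estimates the signal via
\[
\hat{z} \in \operatorname*{arg\,min}_{z \in \mathbb{R}^{N}} \|z\|_1 \quad \text{subject to} \quad \|ADz - y\|_2 \le \eta,
\qquad \hat{x} = D\hat{z},
\]
where $D \in \mathbb{R}^{n \times N}$ is the over-complete dictionary, $A \in \mathbb{R}^{m \times n}$ the measurement matrix, $y = Ax + e$ the noisy measurements with $\|e\|_2 \le \eta$, and $x$ the dictionary-sparse signal to be recovered.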
The policy gradient theorem states that the policy should only be updated in states visited by the current policy, which leads to insufficient planning in off-policy states and thus to convergence to suboptimal policies. We tackle this planning issue by extending policy gradient theory to policy updates with respect to any state density. Under these generalized policy updates, we show convergence to optimality under a necessary and sufficient condition on the updates' state densities, thereby solving the aforementioned planning issue. We also prove asymptotic convergence rates that significantly improve on those in the policy gradient literature. To implement the principles prescribed by our theory, we propose an agent, Dr Jekyll & Mr Hyde (JH), with a double personality: Dr Jekyll purely exploits while Mr Hyde purely explores. JH's independent policies allow it to record two separate replay buffers, one on-policy (Dr Jekyll's) and one off-policy (Mr Hyde's), and therefore to update JH's models with a mixture of on-policy and off-policy updates. More than an algorithm, JH defines principles for actor-critic algorithms to satisfy the requirements we identify in our analysis. We test extensively on finite MDPs, where JH demonstrates a superior ability to recover from converging to a suboptimal policy without impairing its speed of convergence. We also implement a deep version of the algorithm and test it on a simple problem, where it shows promising results.
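To fix ideas, the classical theorem weights updates by the state distribution $d^{\pi_\theta}$ induced by the current policy, whereas the generalization studied here allows an arbitrary state density $d$ (notation assumed for illustration, not taken verbatim from the paper):
\[
\nabla_\theta J(\theta) \;\propto\; \sum_{s} d^{\pi_\theta}(s) \sum_{a} \nabla_\theta \pi_\theta(a \mid s)\, q^{\pi_\theta}(s,a)
\quad\longrightarrow\quad
\Delta\theta \;\propto\; \sum_{s} d(s) \sum_{a} \nabla_\theta \pi_\theta(a \mid s)\, q^{\pi_\theta}(s,a),
\]
so that states off the current policy's visitation distribution can also receive planning updates (for example, those stored in Mr Hyde's off-policy buffer).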
Deep models trained in supervised mode have achieved remarkable success on a variety of tasks. When labeled samples are limited, self-supervised learning (SSL) is emerging as a new paradigm for making use of large amounts of unlabeled samples. SSL has achieved promising performance on natural language and image learning tasks. Recently, there has been a trend toward extending such success to graph data using graph neural networks (GNNs). In this survey, we provide a unified review of different ways of training GNNs using SSL. Specifically, we categorize SSL methods into contrastive and predictive models. Within each category, we provide a unified framework for the methods and describe how they differ in each component of the framework. Our unified treatment of SSL methods for GNNs sheds light on the similarities and differences of various methods, setting the stage for developing new methods and algorithms. We also summarize the different SSL settings and the corresponding datasets used in each setting. To facilitate methodological development and empirical comparison, we develop a standardized testbed for SSL in GNNs, including implementations of common baseline methods, datasets, and evaluation metrics.
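As one concrete instance of the contrastive family (a generic InfoNCE-style node-level objective between two augmented views of the same graph; an illustrative sketch rather than any specific method from the survey):

```python
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """Contrastive loss between node embeddings of two views of the same graph.

    z1, z2: (num_nodes, dim) embeddings produced by a GNN encoder from two
    augmentations (e.g. edge dropping, feature masking) of the same graph.
    Node i in view 1 and node i in view 2 form the positive pair; all other
    nodes act as negatives.
    """
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                 # cosine similarities scaled by temperature
    targets = torch.arange(z1.size(0))         # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# Usage: loss = info_nce(gnn(view1), gnn(view2)); loss.backward()
```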
Knowledge distillation is a strategy for training a student network with the guidance of the soft output of a teacher network. It has been a successful method of model compression and knowledge transfer. However, knowledge distillation currently lacks a convincing theoretical understanding. On the other hand, recent findings on the neural tangent kernel enable us to approximate a wide neural network with a linear model of the network's random features. In this paper, we theoretically analyze the knowledge distillation of a wide neural network. First, we provide a transfer risk bound for the linearized model of the network. Then we propose a metric of the task's training difficulty, called data inefficiency. Based on this metric, we show that for a perfect teacher, a high ratio of the teacher's soft labels can be beneficial. Finally, for the case of an imperfect teacher, we find that hard labels can correct the teacher's wrong predictions, which explains the practice of mixing hard and soft labels.
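The hard/soft mixing referred to at the end is commonly implemented as the standard distillation objective below (a generic sketch of that common practice; the paper's own analysis concerns the linearized, NTK-regime network and may weight the terms differently):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, hard_labels,
                      alpha: float = 0.5, T: float = 2.0) -> torch.Tensor:
    """Mix the hard-label cross-entropy with the soft-label (teacher) term.

    alpha = 1 recovers pure hard-label training; alpha = 0 trains only on the
    teacher's temperature-softened output. The hard-label term is what can
    correct the teacher where its prediction is wrong.
    """
    hard = F.cross_entropy(student_logits, hard_labels)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T * T
    return alpha * hard + (1.0 - alpha) * soft
```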
In this paper, we develop the continuous time dynamic topic model (cDTM). The cDTM is a dynamic topic model that uses Brownian motion to model the latent topics through a sequential collection of documents, where a "topic" is a pattern of word use that we expect to evolve over the course of the collection. We derive an efficient variational approximate inference algorithm that takes advantage of the sparsity of observations in text, a property that lets us easily handle many time points. In contrast to the cDTM, the original discrete-time dynamic topic model (dDTM) requires that time be discretized. Moreover, the complexity of variational inference for the dDTM grows quickly as time granularity increases, a drawback which limits fine-grained discretization. We demonstrate the cDTM on two news corpora, reporting both predictive perplexity and the novel task of time stamp prediction.
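Concretely, the Brownian-motion assumption amounts to a state-space model on each topic's natural parameters (notation paraphrased; see the paper for the exact specification): with $\beta_{k,s_i}$ the parameters of topic $k$ at time stamp $s_i$, $\Delta_{s_i} = s_i - s_{i-1}$, and process variance $v$,
\[
\beta_{k,s_i} \mid \beta_{k,s_{i-1}} \;\sim\; \mathcal{N}\!\big(\beta_{k,s_{i-1}},\; v\,\Delta_{s_i} I\big),
\qquad
p(w \mid \beta_{k,s_i}) \;=\; \frac{\exp(\beta_{k,s_i,w})}{\sum_{w'} \exp(\beta_{k,s_i,w'})},
\]
so arbitrary time gaps between documents are handled by scaling the variance with $\Delta_{s_i}$ rather than by discretizing time.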