Instant runoff voting (IRV) is an increasingly popular alternative to traditional plurality voting in which voters rank the candidates rather than casting a single vote. In practice, municipalities often restrict the ballot length, i.e., the number of candidates a voter may rank on their ballot. We analyze, theoretically and empirically, how ballot length can influence the outcome of an election when voter preferences are fixed. We show that there exist preference profiles over $k$ candidates such that up to $k-1$ different candidates win at different ballot lengths. We derive exact lower bounds on the number of voters required for such profiles and provide constructions matching these bounds. Additionally, we fully characterize which sequences of winners are possible over ballot lengths and provide explicit profile constructions achieving any feasible winner sequence. Finally, we analyze a collection of 168 real-world elections, truncating rankings to simulate shorter ballots. We find that shorter ballots could have changed the outcome in one quarter of these elections and that longer ballots can favor particular candidates. Our results highlight ballot length as a consequential degree of freedom in the design of IRV elections.
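To make the role of ballot length concrete, here is a minimal Python sketch of IRV with truncated ballots. The function name `irv_winner`, the alphabetical tie-breaking rule, and the nine-voter profile are illustrative assumptions, not the paper's constructions. Truncating each ranking to its first `ballot_length` entries mirrors how the abstract simulates shorter ballots, and a ballot whose listed candidates have all been eliminated is treated as exhausted.

```python
def irv_winner(ballots, ballot_length):
    """Return the IRV winner when every ballot is truncated to `ballot_length` ranks.

    Assumptions (illustrative only): candidates are strings, and ties for the
    fewest votes are broken by eliminating the alphabetically first candidate.
    """
    truncated = [tuple(b[:ballot_length]) for b in ballots]
    remaining = {c for b in ballots for c in b}
    while len(remaining) > 1:
        # Each ballot counts for its highest-ranked candidate still in the race;
        # ballots with no remaining candidate are exhausted and count for no one.
        tally = {c: 0 for c in remaining}
        for ballot in truncated:
            for c in ballot:
                if c in remaining:
                    tally[c] += 1
                    break
        loser = min(remaining, key=lambda c: (tally[c], c))
        remaining.remove(loser)
    return remaining.pop()


# Hypothetical profile: 4 voters A>B>C, 3 voters C>B>A, 2 voters B>C>A.
profile = 4 * [("A", "B", "C")] + 3 * [("C", "B", "A")] + 2 * [("B", "C", "A")]
winners = {k: irv_winner(profile, k) for k in range(1, 4)}
print(winners)  # {1: 'A', 2: 'C', 3: 'C'}
```

In this toy profile, with single-candidate ballots the supporters of B cannot transfer to C, so A wins; once ballots have length two or more, B's elimination hands C a majority.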
We introduce two new tools to assess the validity of statistical distributions. These tools are based on components derived from a new statistical quantity, the \emph{comparison curve}. The first tool is a graphical representation of these components on a \emph{bar plot} (B plot), which can provide a detailed appraisal of the validity of the statistical model, in particular when supplemented by acceptance regions related to the model. The knowledge gained from this representation can sometimes suggest an existing \emph{goodness-of-fit} test to supplement this visual assessment with control of the type I error. Otherwise, an adaptive test may be preferable; the second tool combines these components into a powerful $\chi^2$-type goodness-of-fit test. Because the number of these components can be large, we introduce a new selection rule to decide, in a data-driven fashion, how many of them to take into account. In a simulation study, our goodness-of-fit tests are competitive in power with the best solutions that have been recommended, both for a fully specified model and when some parameters must be estimated. Practical examples show how to use these tools to derive principled information about where the model departs from the data.
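The comparison-curve components themselves are not specified here, so the sketch below swaps in a classical analogue with the same structure: Neyman's smooth test. Its components $U_j = n^{-1/2}\sum_i \phi_j(u_i)$ use orthonormal shifted Legendre polynomials $\phi_j$ on the probability-integral-transformed data $u_i = F_0(x_i)$, and they are combined into $T_k = \sum_{j\le k} U_j^2$, with the number of components chosen by a Schwarz-type rule $\hat k = \arg\max_k (T_k - k\log n)$. It illustrates a data-driven $\chi^2$-type test built from components. It is not the authors' test, and the names `smooth_components` and `data_driven_smooth_test` are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre


def smooth_components(u, d):
    """U_j = n^{-1/2} sum_i phi_j(u_i), j = 1..d, with phi_j(u) = sqrt(2j+1) P_j(2u - 1).

    The phi_j are orthonormal on [0, 1], so under H0 (u ~ Uniform(0, 1))
    each U_j is approximately N(0, 1) for large n.
    """
    u = np.asarray(u, dtype=float)
    n = u.size
    comps = []
    for j in range(1, d + 1):
        coef = np.zeros(j + 1)
        coef[j] = 1.0  # select the degree-j Legendre polynomial
        phi = np.sqrt(2 * j + 1) * legendre.legval(2 * u - 1, coef)
        comps.append(phi.sum() / np.sqrt(n))
    return np.array(comps)


def data_driven_smooth_test(u, d=10):
    """Return (k_hat, T_{k_hat}) using a Schwarz-type penalty k * log(n)."""
    n = len(u)
    U = smooth_components(u, d)
    T = np.cumsum(U ** 2)  # T_k for k = 1..d
    k_hat = int(np.argmax(T - np.arange(1, d + 1) * np.log(n))) + 1
    return k_hat, float(T[k_hat - 1])


rng = np.random.default_rng(0)
x = rng.exponential(size=300)
u = 1.0 - np.exp(-x)  # probability integral transform under H0: Exp(1)
print(data_driven_smooth_test(u))
```

For a fixed $k$, $T_k$ is asymptotically $\chi^2_k$ under a fully specified null; because $k$ is chosen from the data here, calibration should account for the selection step, for example by Monte Carlo simulation under the null.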
In this work, we study the simple yet universally applicable case of reward shaping in value-based Deep Reinforcement Learning (DRL). We show that reward shifting, in the form of a linear transformation of the reward, is equivalent to changing the initialization of the $Q$-function in function approximation. Based on this equivalence, we bring the key insight that a positive reward shift leads to conservative exploitation, while a negative reward shift leads to curiosity-driven exploration. Accordingly, conservative exploitation improves offline RL value estimation, and optimistic value estimation improves exploration in online RL. We validate our insight on a range of RL tasks and show its improvement over baselines: (1) in offline RL, conservative exploitation improves the performance of off-the-shelf algorithms; (2) in online continuous control, multiple value functions with different shifting constants can be used to tackle the exploration-exploitation dilemma for better sample efficiency; (3) in discrete control tasks, a negative reward shift yields an improvement over a curiosity-based exploration method.
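In the tabular case the claimed equivalence can be checked exactly. If every reward is shifted by a constant $c$, Q-learning started from $Q_0$ produces iterates that differ by exactly $b = c/(1-\gamma)$ from Q-learning on the unshifted rewards started from $Q_0 - b$, because the extra terms in the update cancel: $c + \gamma b - b = 0$. A positive shift with zero initialization therefore behaves like a pessimistic initialization, and a negative shift like an optimistic one. The sketch assumes a random continuing MDP without terminal states (the identity fails on terminal transitions, which have no bootstrap term). The helper names are hypothetical, and the sketch does not reproduce the paper's deep RL experiments.

```python
import numpy as np


def sample_transitions(P, n_steps, rng):
    """Roll out a uniformly random behaviour policy in a tabular MDP with kernel P[s, a, s']."""
    n_states, n_actions, _ = P.shape
    s, out = 0, []
    for _ in range(n_steps):
        a = int(rng.integers(n_actions))
        s_next = int(rng.choice(n_states, p=P[s, a]))
        out.append((s, a, s_next))
        s = s_next
    return out


def q_learning(R, gamma, alpha, q0, transitions):
    """Tabular Q-learning on a fixed transition sequence (continuing task, no terminal states)."""
    Q = q0.copy()
    for s, a, s_next in transitions:
        target = R[s, a] + gamma * Q[s_next].max()
        Q[s, a] += alpha * (target - Q[s, a])
    return Q


rng = np.random.default_rng(0)
n_states, n_actions, gamma, c = 5, 3, 0.9, 1.0
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.normal(size=(n_states, n_actions))
transitions = sample_transitions(P, 5000, rng)
shift = c / (1.0 - gamma)
zeros = np.zeros((n_states, n_actions))

# Positive reward shift r + c, started from zero ...
Q_shifted = q_learning(R + c, gamma, 0.1, zeros, transitions)
# ... tracks the unshifted problem started from the pessimistic value -c / (1 - gamma).
Q_pessimistic = q_learning(R, gamma, 0.1, zeros - shift, transitions)
print(np.abs(Q_shifted - Q_pessimistic - shift).max())  # zero up to rounding
```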
Variational Bayesian posterior inference often requires simplifying approximations, such as a mean-field parametrisation, to ensure tractability. However, prior work has associated the variational mean-field approximation for Bayesian neural networks with underfitting in the case of small datasets or large model sizes. In this work, we show that invariances in the likelihood function of over-parametrised models contribute to this phenomenon: these invariances complicate the structure of the posterior by introducing discrete and/or continuous modes that cannot be approximated well by Gaussian mean-field distributions. In particular, we show that the mean-field approximation has an additional gap in the evidence lower bound compared to a purpose-built posterior that takes the known invariances into account. Importantly, this invariance gap is not constant; it vanishes as the approximation reverts to the prior. We first consider in detail translation invariances in a linear model with a single data point. We show that, while the true posterior can be constructed from a mean-field parametrisation, this is achieved only if the objective function takes the invariance gap into account. We then transfer our analysis of the linear model to neural networks. Our analysis provides a framework for future work to explore solutions to the invariance problem.
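A toy calculation shows how an invariance produces a mean-field gap that vanishes as the posterior reverts to the prior. Assume (hypothetically; the abstract does not give the model) $y = (w_1 + w_2)x + \varepsilon$ with $\varepsilon \sim \mathcal N(0, \sigma^2)$ and prior $w \sim \mathcal N(0, I)$, so the likelihood is invariant to translations $(w_1 + t, w_2 - t)$. The posterior is Gaussian with precision $\Lambda = I + (x^2/\sigma^2)\mathbf 1\mathbf 1^\top$, and the best mean-field Gaussian under reverse KL has variances $1/\Lambda_{ii}$, so the ELBO gap is $\tfrac12\big(\sum_i \log\Lambda_{ii} - \log\det\Lambda\big)$. This is the ordinary mean-field gap for one example, not the paper's invariance gap, which is defined relative to a purpose-built posterior; `mean_field_gap` is a hypothetical name.

```python
import numpy as np


def mean_field_gap(x, noise_var=1.0):
    """ELBO gap KL(q* || posterior) for the hypothetical model y = (w1 + w2) x + noise.

    Prior w ~ N(0, I). The posterior precision is I + (x^2 / noise_var) * 11^T;
    the optimal factorized Gaussian q* (reverse KL) has variances 1 / Lambda_ii
    and the posterior mean, so the gap depends only on the precision matrix.
    """
    a = x ** 2 / noise_var
    precision = np.eye(2) + a * np.ones((2, 2))
    _, logdet = np.linalg.slogdet(precision)
    return 0.5 * (np.sum(np.log(np.diag(precision))) - logdet)


for x in (0.0, 0.1, 1.0, 10.0):
    print(x, mean_field_gap(x))
```

With $a = x^2/\sigma^2$ the gap reduces to $\tfrac12\log\big((1+a)^2/(1+2a)\big)$, which is zero at $a = 0$ (posterior equals prior) and grows with $a$.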
We consider observations $(X,y)$ from single index models with an unknown link function, Gaussian covariates, and a regularized M-estimator $\hat\beta$ constructed from a convex loss function and a convex regularizer. In the regime where the sample size $n$ and the dimension $p$ both increase such that $p/n$ has a finite limit, the behavior of the empirical distribution of $\hat\beta$ and of the predicted values $X\hat\beta$ has previously been characterized in a number of models: the empirical distributions are known to converge to proximal operators of the loss and penalty in a related Gaussian sequence model, which captures the interplay between the ratio $p/n$, the loss, the regularization and the data-generating process. This connection between $(\hat\beta,X\hat\beta)$ and the corresponding proximal operators requires solving fixed-point equations that typically involve unobservable quantities such as the prior distribution on the index or the link function. This paper develops a different theory to describe the empirical distribution of $\hat\beta$ and $X\hat\beta$: approximations of $(\hat\beta,X\hat\beta)$ in terms of proximal operators are provided that involve only observable adjustments. These proposed observable adjustments are data-driven, i.e., they do not require prior knowledge of the index or the link function. These new adjustments yield confidence intervals for individual components of the index, as well as estimators of the correlation of $\hat\beta$ with the index. The interplay between loss, regularization and the model is thus captured in a data-driven manner, without solving the fixed-point equations studied in previous works. The results apply to both strongly convex regularizers and unregularized M-estimation. Simulations are provided for the square and logistic losses in single index models, including logistic regression and 1-bit compressed sensing with 20\% corrupted bits.
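The simulation setting mentioned last can be sketched as follows: 1-bit compressed sensing with Gaussian covariates, 20% of the labels flipped, and a ridge-regularized logistic M-estimator fitted by plain gradient descent. The sizes, penalty level, and function names are assumptions chosen for illustration. The correlation of $\hat\beta$ with the index is computed here with the true $\beta$, which only a simulation knows; the paper's contribution is estimating such quantities from observable data, which this sketch does not attempt.

```python
import numpy as np


def simulate_one_bit_cs(n, p, flip_prob, rng):
    """y_i = sign(<x_i, beta>) with Gaussian x_i, then each label flipped with prob. flip_prob."""
    beta = rng.normal(size=p)
    beta /= np.linalg.norm(beta)
    X = rng.normal(size=(n, p))
    y = np.sign(X @ beta)
    y[rng.random(n) < flip_prob] *= -1.0
    return X, y, beta


def logistic_ridge(X, y, lam, step=1.0, n_iter=500):
    """Minimize mean_i log(1 + exp(-y_i x_i^T b)) + (lam / 2) ||b||^2 by gradient descent."""
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(n_iter):
        margins = y * (X @ b)
        # d/db log(1 + exp(-m_i)) = -y_i x_i / (1 + exp(m_i))
        grad = -(X.T @ (y / (1.0 + np.exp(margins)))) / n + lam * b
        b -= step * grad
    return b


rng = np.random.default_rng(0)
X, y, beta = simulate_one_bit_cs(n=2000, p=200, flip_prob=0.2, rng=rng)
b_hat = logistic_ridge(X, y, lam=0.01)
corr = float(b_hat @ beta / np.linalg.norm(b_hat))  # oracle correlation with the index
print(round(corr, 3))
```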
Parallel-in-time methods for partial differential equations (PDEs) have been the subject of intense development over recent decades, particularly for diffusion-dominated problems. It has been widely reported in the literature, however, that many of these methods perform quite poorly for advection-dominated problems. Here we analyze the iterative parallel-in-time algorithm multigrid reduction-in-time (MGRIT) for discretizations of constant-wave-speed linear advection problems. We focus on common method-of-lines discretizations that employ upwind finite differences in space and Runge-Kutta methods in time. Using a convergence framework we developed in previous work, we prove for a subclass of these discretizations that, if the fine-grid problem is rediscretized on the coarse grid (the standard approach), MGRIT convergence cannot be robust with respect to the CFL number and coarsening factor. This poor convergence and non-robustness are caused, at least in part, by an inadequate coarse-grid correction of smooth Fourier modes known as characteristic components. We propose an alternative coarse-grid operator that provides a better correction of these modes. This coarse-grid operator is related to previous work and uses a semi-Lagrangian discretization combined with an implicitly treated truncation error correction. Theory and numerical experiments show that this coarse-grid operator yields fast MGRIT convergence for many of the method-of-lines discretizations considered, including both implicit and explicit discretizations of high order.
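The coarse-grid difficulty can be seen in Fourier space for the simplest member of this family, assumed here for illustration: first-order upwind in space with forward Euler (a one-stage Runge-Kutta method) for $u_t + a u_x = 0$, $a > 0$. Its symbol is $\lambda(\theta) = 1 - c(1 - e^{-i\theta})$ with CFL number $c$. The ideal coarse propagator is $\lambda^m$, whereas rediscretizing with time step $m\,\Delta t$ gives $\mu = 1 - mc(1 - e^{-i\theta})$. The sketch evaluates $\max_\theta |\lambda^m - \mu|/(1 - |\mu|)$, a standard two-level estimate for MGRIT with F-relaxation over many time steps; it serves only as a heuristic diagnostic, not the authors' convergence framework, and the function names are hypothetical. For smooth modes $1 - |\mu|$ is tiny, so even a small mismatch yields a large ratio.

```python
import numpy as np


def upwind_fe_symbol(theta, cfl):
    """Amplification factor of first-order upwind + forward Euler for u_t + a u_x = 0, a > 0."""
    return 1.0 - cfl * (1.0 - np.exp(-1j * theta))


def two_level_estimate(cfl, m, n_theta=4000):
    """max over theta of |lambda^m - mu| / (1 - |mu|), with mu the rediscretized coarse symbol."""
    if not 0.0 < m * cfl < 1.0:
        raise ValueError("need 0 < m * cfl < 1 so the coarse rediscretization is stable")
    theta = np.linspace(1e-3, np.pi, n_theta)  # theta = 0 excluded (0 / 0)
    ideal = upwind_fe_symbol(theta, cfl) ** m  # m fine time steps
    coarse = upwind_fe_symbol(theta, m * cfl)  # one coarse step of size m * dt
    return float(np.max(np.abs(ideal - coarse) / (1.0 - np.abs(coarse))))


for cfl, m in [(0.1, 2), (0.2, 4)]:
    print(cfl, m, round(two_level_estimate(cfl, m), 2))
```

For small $\theta$, a Taylor expansion gives a ratio of roughly $(m-1)c/(1 - mc)$, which equals 3 for $c = 0.2$, $m = 4$; an estimate above 1 gives no guarantee of convergence.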
We investigate the novel problem of voting-based opinion maximization in a social network: find a given number of seed nodes for a target campaigner, in the presence of other competing campaigns, so as to maximize a voting-based score for the target campaigner at a given time horizon. The bulk of the influence maximization literature assumes that social network users can switch between only two discrete states, inactive and active, and that the choice to switch is frozen upon one-time activation. In reality, even when holding a preferred opinion, a user may not completely dismiss the other opinions, and the preference level may vary over time due to social influence. To this end, we employ models rooted in opinion formation and diffusion, and use several voting-based scores to determine a user's vote for each of the multiple campaigners at a given time horizon. Our problem is NP-hard, and the objective is non-submodular for various scores. We design greedy seed selection algorithms with quality guarantees for our scoring functions via sandwich approximation. To improve efficiency, we develop random walk and sketch-based opinion computation, with quality guarantees. Empirical results validate the effectiveness, efficiency, and scalability of our methods.
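A brute-force sketch of greedy seed selection helps fix ideas. It assumes a simple linear averaging (DeGroot-style) opinion model with stubborn seeds and uses plurality, where each user votes for the campaigner they rate highest, as the voting-based score; the competing campaigns enter only through the initial opinions. The model, the score, and the function names are hypothetical choices, and the sketch omits the paper's sandwich approximation and its random-walk and sketch-based speedups.

```python
import numpy as np


def final_opinions(W, B0, target, seeds, horizon):
    """Run B <- W B for `horizon` steps, holding seed nodes at opinion 1 for `target`.

    W is row-stochastic; B0[v, c] in [0, 1] is user v's initial opinion of campaigner c.
    """
    B = B0.copy()
    seeds = list(seeds)
    for _ in range(horizon):
        if seeds:
            B[seeds, target] = 1.0
        B = W @ B
    if seeds:
        B[seeds, target] = 1.0
    return B


def plurality_score(B, target):
    """Number of users whose highest opinion is for `target` (ties go to the lowest index)."""
    return int(np.sum(np.argmax(B, axis=1) == target))


def greedy_seeds(W, B0, target, k, horizon):
    """Add, k times, the node whose seeding most increases the plurality score."""
    seeds = []
    for _ in range(k):
        candidates = [v for v in range(W.shape[0]) if v not in seeds]
        best = max(candidates, key=lambda v: plurality_score(
            final_opinions(W, B0, target, seeds + [v], horizon), target))
        seeds.append(best)
    return seeds


# Toy path graph on 6 users with self-loops, 3 campaigners, target = campaigner 0.
A = np.zeros((6, 6))
for i in range(5):
    A[i, i + 1] = A[i + 1, i] = 1.0
W = A + np.eye(6)
W /= W.sum(axis=1, keepdims=True)
B0 = np.random.default_rng(0).random((6, 3))
seeds = greedy_seeds(W, B0, target=0, k=2, horizon=5)
print(seeds, plurality_score(final_opinions(W, B0, 0, seeds, 5), 0))
```

Under this particular model, seeding only raises opinions for the target, so the plurality score cannot decrease as seeds are added; non-submodularity is a separate issue that plain greedy does not address.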
Skills play a central role in the job market and many human resources (HR) processes. Following other digital experiences, today's online job market has candidates expecting to see the right opportunities based on their skill set. Similarly, enterprises increasingly need to use data to ensure that the skills within their workforce remain future-proof. However, structured information about skills is often missing, and processes building on self- or manager-assessment have been shown to struggle with the adoption, completeness, and freshness of the resulting data. Extracting skills is a highly challenging task, given the many thousands of possible skill labels, which may be mentioned explicitly or merely described implicitly, and the lack of finely annotated training corpora. Previous work on skill extraction either oversimplifies the task to explicit entity detection or builds on manually annotated training data that would be infeasible to obtain for a complete vocabulary of skills. We propose an end-to-end system for skill extraction, based on distant supervision through literal matching. We propose and evaluate several negative sampling strategies, tuned on a small validation dataset, to improve the generalization of skill extraction towards implicitly mentioned skills, despite the lack of such implicit skills in the distantly supervised data. We observe that using the ESCO taxonomy to select negative examples from related skills yields the largest improvements, and combining three different strategies in one model further increases performance, by up to 8 percentage points in RP@5. We introduce a manually annotated evaluation benchmark for skill extraction based on the ESCO taxonomy, on which we validate our models. We release the benchmark dataset for research purposes to stimulate further research on the task.
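Distant supervision through literal matching, combined with taxonomy-based negative sampling, can be sketched as below. The six-skill taxonomy is a hypothetical stand-in for ESCO, and treating skills that share a parent group as the "related skills" used for negatives is an assumption about how such a strategy might work. The sketch only builds (sentence, skill, label) training triples and does not include the extraction model; `build_examples` and `literal_matches` are hypothetical names.

```python
import random
import re

# Hypothetical mini-taxonomy (skill -> parent group); NOT the real ESCO hierarchy.
TAXONOMY = {
    "python": "programming languages",
    "java": "programming languages",
    "sql": "programming languages",
    "project management": "management",
    "budgeting": "management",
    "team leadership": "management",
}


def literal_matches(sentence, skills):
    """Distant supervision: a skill label is a positive if it appears verbatim in the sentence."""
    text = sentence.lower()
    return [s for s in skills if re.search(r"\b" + re.escape(s) + r"\b", text)]


def build_examples(sentences, taxonomy, k_neg=1, seed=0):
    """(sentence, skill, label) triples; negatives are sibling skills not matched in the sentence."""
    rng = random.Random(seed)
    examples = []
    for sentence in sentences:
        matched = literal_matches(sentence, taxonomy)
        for skill in matched:
            examples.append((sentence, skill, 1))
            siblings = [s for s, group in taxonomy.items()
                        if group == taxonomy[skill] and s not in matched]
            for neg in rng.sample(siblings, min(k_neg, len(siblings))):
                examples.append((sentence, neg, 0))
    return examples


sentences = ["Experience with Python is required.", "You will handle budgeting for the team."]
for triple in build_examples(sentences, TAXONOMY):
    print(triple)
```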
Artificial neural networks have a broad array of applications today due to their high degree of flexibility and ability to model nonlinear functions from data. However, the trustworthiness of neural networks is limited by their black-box nature, their poor ability to generalize from small datasets, and their inconsistent convergence during training. Aluminum electrolysis is a complex nonlinear process with many interrelated sub-processes. Artificial neural networks can potentially be well suited for modeling the aluminum electrolysis process, but the safety-critical nature of this process requires trustworthy models. In this work, sparse neural networks are trained to model the system dynamics of an aluminum electrolysis simulator. The sparse model structure has significantly lower model complexity than a corresponding dense neural network, which we argue makes the model more interpretable. Furthermore, the empirical study shows that the sparse models generalize better from small training sets than dense neural networks. Moreover, training an ensemble of sparse neural networks with different parameter initializations shows that the models converge to similar model structures with similar learned input features.
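The abstract does not say how the sparse networks are obtained. Purely as an illustration, the sketch uses magnitude pruning, one common technique, to zero the smallest weights of a first-layer weight matrix. It then reads off which inputs remain connected and compares those input sets across ensemble members with the Jaccard index. The function names and the toy weights are hypothetical.

```python
import numpy as np


def magnitude_prune(weights, sparsity):
    """Return a copy with the smallest-magnitude fraction `sparsity` of entries set to zero."""
    w = np.asarray(weights, dtype=float)
    k = int(round(sparsity * w.size))
    pruned = w.copy().ravel()
    if k > 0:
        idx = np.argsort(np.abs(pruned), kind="stable")[:k]
        pruned[idx] = 0.0
    return pruned.reshape(w.shape)


def active_inputs(first_layer):
    """Inputs with at least one nonzero outgoing weight; first_layer has shape (n_hidden, n_inputs)."""
    return {int(i) for i in np.flatnonzero(np.any(first_layer != 0, axis=0))}


def jaccard(a, b):
    """Overlap of two input sets, e.g. from two ensemble members (1.0 if both are empty)."""
    return len(a & b) / len(a | b) if (a | b) else 1.0


W1 = np.array([[0.1, 2.0, 0.0],
               [0.05, -3.0, 0.5]])
P1 = magnitude_prune(W1, sparsity=0.5)
print(active_inputs(P1))  # {1, 2}
```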
We study the problem of estimating an unknown parameter in a distributed and online manner. Existing work on distributed online learning typically either focuses on asymptotic analysis or provides bounds on regret. However, these results may not directly translate into bounds on the error of the learned model after a finite number of time steps. In this paper, we propose a distributed online estimation algorithm that enables each agent in a network to improve its estimation accuracy by communicating with its neighbors. We provide non-asymptotic bounds on the estimation error, leveraging the statistical properties of the underlying model. Our analysis demonstrates a trade-off between estimation error and communication cost. Further, our analysis allows us to determine a time at which communication can be stopped (due to the costs associated with communication) while still meeting a desired estimation accuracy. We also provide a numerical example to validate our results.
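One standard way to realize distributed online estimation is a consensus + innovations update, assumed here for illustration and not necessarily the paper's algorithm. Each agent averages its neighbors' estimates with a doubly stochastic matrix $W$ and then corrects with its own new measurement $y_{i,t} = h_i^\top\theta + \text{noise}$, using step size $1/(t+1)$. No single agent's $h_i$ identifies $\theta$, so communication is required; the optional `t_stop` argument (hypothetical, like the function name) switches communication off to illustrate the trade-off.

```python
import numpy as np


def distributed_estimation(W, H, theta, n_steps, noise_std, t_stop=None, seed=0):
    """Consensus + innovations: x_i <- sum_j W_ij x_j + a_t h_i (y_i - h_i^T x_i).

    Row i of H is agent i's regressor h_i; after `t_stop` steps agents stop mixing.
    """
    rng = np.random.default_rng(seed)
    n_agents, d = H.shape
    x = np.zeros((n_agents, d))
    for t in range(n_steps):
        y = H @ theta + noise_std * rng.normal(size=n_agents)  # one new sample per agent
        residual = y - np.sum(H * x, axis=1)
        step = 1.0 / (t + 1)
        mixed = W @ x if (t_stop is None or t < t_stop) else x
        x = mixed + step * residual[:, None] * H
    return x


# Ring of 4 agents with equal (1/3) weights on self and both neighbours.
W = np.array([[1/3, 1/3, 0, 1/3],
              [1/3, 1/3, 1/3, 0],
              [0, 1/3, 1/3, 1/3],
              [1/3, 0, 1/3, 1/3]])
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]])
theta = np.array([1.0, -2.0])

x_comm = distributed_estimation(W, H, theta, n_steps=3000, noise_std=0.1)
x_local = distributed_estimation(W, H, theta, n_steps=3000, noise_std=0.1, t_stop=0)
print(np.linalg.norm(x_comm - theta, axis=1))
print(np.linalg.norm(x_local - theta, axis=1))
```

With `t_stop=0` the agent with $h_1 = (1, 0)$ never learns the second coordinate, whereas communicating throughout drives every agent close to $\theta$.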
Unsupervised domain adaptation has recently emerged as an effective paradigm for generalizing deep neural networks to new target domains. However, there is still substantial room for improvement before fully supervised performance is reached. In this paper, we present a novel active learning strategy to assist knowledge transfer in the target domain, dubbed active domain adaptation. We start from the observation that energy-based models exhibit free energy biases when training (source) and test (target) data come from different distributions. Inspired by this inherent mechanism, we empirically show that a simple yet efficient energy-based sampling strategy selects more valuable target samples than existing approaches that require particular architectures or the computation of distances. Our algorithm, Energy-based Active Domain Adaptation (EADA), queries groups of target data that incorporate both domain characteristics and instance uncertainty in every selection round. Meanwhile, by using a regularization term to align the free energy of target data compactly around the source domain, the domain gap can be implicitly diminished. Through extensive experiments, we show that EADA surpasses state-of-the-art methods on well-known challenging benchmarks with substantial improvements, making it a useful option in the open world. Code is available at https://github.com/BIT-DA/EADA.
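The free energy of a classifier with logits $f_k(x)$ is $F(x) = -T\log\sum_k \exp(f_k(x)/T)$; target samples with high free energy look unlike the source data. The sketch queries samples in two stages: first it keeps the highest-free-energy candidates (domain characteristic), then it picks those with the smallest top-two logit margin (instance uncertainty). The margin criterion, the candidate ratio `first_ratio`, and the function names are assumptions; EADA's exact selection criteria may differ.

```python
import numpy as np


def free_energy(logits, temperature=1.0):
    """F(x) = -T * logsumexp(logits / T), computed row-wise in a numerically stable way."""
    z = logits / temperature
    zmax = z.max(axis=1, keepdims=True)
    lse = zmax[:, 0] + np.log(np.exp(z - zmax).sum(axis=1))
    return -temperature * lse


def select_queries(logits, budget, first_ratio=0.5):
    """Two-stage selection: high free energy first, then smallest top-two logit margin."""
    F = free_energy(logits)
    n_first = max(budget, int(np.ceil(first_ratio * len(logits))))
    candidates = np.argsort(-F, kind="stable")[:n_first]  # highest free energy first
    top2 = np.sort(logits[candidates], axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]
    return candidates[np.argsort(margin, kind="stable")[:budget]]


logits = np.array([[5.0, 0.0, 0.0],     # confident, low free energy
                   [0.1, 0.0, 0.0],     # highest free energy, margin 0.1
                   [3.0, 2.95, 0.0],    # high free energy, margin 0.05
                   [4.0, -1.0, 0.0]])
print(select_queries(logits, budget=1))  # [2]
```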