Economic models may exhibit incompleteness depending on whether or not they admit certain policy-relevant features such as strategic interaction, self-selection, or state dependence. We develop a novel test of model incompleteness and analyze its asymptotic properties. A key observation is that one can identify the least-favorable parametric model that represents the most challenging scenario for detecting local alternatives without knowledge of the selection mechanism. We build a robust test of incompleteness on a score function constructed from such a model. The proposed procedure remains computationally tractable even with nuisance parameters because it suffices to estimate them only under the null hypothesis of model completeness. We illustrate the test by applying it to a market entry model and a triangular model with a set-valued control function.
For a sequence of classification tasks that arrive over time, it is common that tasks are evolving in the sense that consecutive tasks often have a higher similarity. The incremental learning of a growing sequence of tasks holds promise to enable accurate classification even with few samples per task by leveraging information from all the tasks in the sequence (forward and backward learning). However, existing techniques developed for continual learning and concept drift adaptation are either designed for tasks with time-independent similarities or only aim to learn the last task in the sequence. This paper presents incremental minimax risk classifiers (IMRCs) that effectively exploit forward and backward learning and account for evolving tasks. In addition, we analytically characterize the performance improvement provided by forward and backward learning in terms of the tasks' expected quadratic change and the number of tasks. The experimental evaluation shows that IMRCs can result in a significant performance improvement, especially for reduced sample sizes.
Numerous studies have demonstrated the ability of neural language models to learn various linguistic properties without direct supervision. This work takes an initial step towards exploring the less researched topic of how neural models discover linguistic properties of words, such as gender, as well as the rules governing their usage. We propose to use an artificial corpus generated by a PCFG based on French to precisely control the gender distribution in the training data and determine under which conditions a model correctly captures gender information or, on the contrary, appears gender-biased.
We present a fast adaptive method for the evaluation of heat potentials, which plays a key role in the integral equation approach for the solution of the heat equation, especially in a non-stationary domain. The algorithm utilizes a sum-of-exponential based fast Gauss transform that evaluates the convolution of a Gaussian with either discrete or continuous volume distributions. The latest implementation of the algorithm allows for both periodic and free space boundary conditions. The history dependence is overcome by splitting the heat potentials into a smooth history part and a singular local part. We discuss the resolution of the history part on an adaptive volume grid in detail, providing sharp estimates that allow for the construction of an optimal grid, justifying the efficiency of the bootstrapping scheme. While the discussion in this paper is restricted to one spatial dimension, the generalization to two and three dimensions is straightforward. The performance of the algorithm is illustrated via several numerical examples.
We settle the parameterized complexities of several variants of independent set reconfiguration and dominating set reconfiguration, parameterized by the number of tokens. We show that both problems are XL-complete when there is no limit on the number of moves, XNL-complete when a maximum length $\ell$ for the sequence is given in binary in the input, and XNLP-complete when $\ell$ is given in unary. The problems were known to be $\mathrm{W}[1]$- and $\mathrm{W}[2]$-hard respectively when $\ell$ is also a parameter. We complete the picture by showing membership in those classes. Moreover, we show that for all the variants that we consider, token sliding and token jumping are equivalent under pl-reductions. We introduce partitioned variants of token jumping and token sliding, and give pl-reductions between the four variants that have precise control over the number of tokens and the length of the reconfiguration sequence.
Graph neural networks (GNNs) are widely used for modeling complex interactions between entities represented as vertices of a graph. Despite recent efforts to theoretically analyze the expressive power of GNNs, a formal characterization of their ability to model interactions is lacking. The current paper aims to address this gap. Formalizing strength of interactions through an established measure known as separation rank, we quantify the ability of certain GNNs to model interaction between a given subset of vertices and its complement, i.e. between the sides of a given partition of input vertices. Our results reveal that the ability to model interaction is primarily determined by the partition's walk index -- a graph-theoretical characteristic defined by the number of walks originating from the boundary of the partition. Experiments with common GNN architectures corroborate this finding. As a practical application of our theory, we design an edge sparsification algorithm named Walk Index Sparsification (WIS), which preserves the ability of a GNN to model interactions when input edges are removed. WIS is simple, computationally efficient, and in our experiments has markedly outperformed alternative methods in terms of induced prediction accuracy. More broadly, it showcases the potential of improving GNNs by theoretically analyzing the interactions they can model.
We propose a model of treatment interference where the response of a unit depends only on its treatment status and the statuses of units within its K-neighborhood. Current methods for detecting interference include carefully designed randomized experiments and conditional randomization tests on a set of focal units. We give guidance on how to choose focal units under this model of interference. We then conduct a simulation study to evaluate the efficacy of existing methods for detecting network interference. We show that this choice of focal units leads to powerful tests of treatment interference which outperform current experimental methods.
During the pre-training step of natural language models, the main objective is to learn a general representation of the pre-training dataset, usually requiring large amounts of textual data to capture the complexity and diversity of natural language. Contrasting this, in most cases, the size of the data available to solve the specific downstream task is often dwarfed by the aforementioned pre-training dataset, especially in domains where data is scarce. We introduce controlled randomness, i.e. noise, into the training process to improve fine-tuning language models and explore the performance of targeted noise in addition to the parameters of these models. We find that adding such noise can improve the performance in our two downstream tasks of joint named entity recognition and relation extraction and text summarization.
Modern NLP models are often trained over large untrusted datasets, raising the potential for a malicious adversary to compromise model behaviour. For instance, backdoors can be implanted through crafting training instances with a specific textual trigger and a target label. This paper posits that backdoor poisoning attacks exhibit \emph{spurious correlation} between simple text features and classification labels, and accordingly, proposes methods for mitigating spurious correlation as means of defence. Our empirical study reveals that the malicious triggers are highly correlated to their target labels; therefore such correlations are extremely distinguishable compared to those scores of benign features, and can be used to filter out potentially problematic instances. Compared with several existing defences, our defence method significantly reduces attack success rates across backdoor attacks, and in the case of insertion-based attacks, our method provides a near-perfect defence.
Deep neural networks have revolutionized many machine learning tasks in power systems, ranging from pattern recognition to signal processing. The data in these tasks is typically represented in Euclidean domains. Nevertheless, there is an increasing number of applications in power systems, where data are collected from non-Euclidean domains and represented as the graph-structured data with high dimensional features and interdependency among nodes. The complexity of graph-structured data has brought significant challenges to the existing deep neural networks defined in Euclidean domains. Recently, many studies on extending deep neural networks for graph-structured data in power systems have emerged. In this paper, a comprehensive overview of graph neural networks (GNNs) in power systems is proposed. Specifically, several classical paradigms of GNNs structures (e.g., graph convolutional networks, graph recurrent neural networks, graph attention networks, graph generative networks, spatial-temporal graph convolutional networks, and hybrid forms of GNNs) are summarized, and key applications in power systems such as fault diagnosis, power prediction, power flow calculation, and data generation are reviewed in detail. Furthermore, main issues and some research trends about the applications of GNNs in power systems are discussed.
Recent years have seen important advances in the quality of state-of-the-art models, but this has come at the expense of models becoming less interpretable. This survey presents an overview of the current state of Explainable AI (XAI), considered within the domain of Natural Language Processing (NLP). We discuss the main categorization of explanations, as well as the various ways explanations can be arrived at and visualized. We detail the operations and explainability techniques currently available for generating explanations for NLP model predictions, to serve as a resource for model developers in the community. Finally, we point out the current gaps and encourage directions for future work in this important research area.