The collective schedules problem consists in computing a schedule of tasks shared between individuals. Tasks may have different durations, and individuals have preferences over the order of the shared tasks. This problem has numerous applications, since tasks may model public infrastructure projects, events taking place in a shared room, or work done by co-workers. Our aim is, given the preferred schedules of the individuals (voters), to return a consensus schedule. We propose an axiomatic study of the collective schedules problem, using classic axioms from computational social choice as well as new axioms that take the durations of the tasks into account. We show that some of these axioms are incompatible, and we study the axioms fulfilled by three rules: one studied in the seminal paper on collective schedules (Pascual et al. 2018), one that generalizes the Kemeny rule, and one that generalizes Spearman's footrule. From an algorithmic point of view, we show that these rules solve NP-hard problems, but that these problems can be solved optimally for instances of small but realistic size, and we give an efficient heuristic for large instances. We conclude the paper with experiments.
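As a concrete illustration of the kind of objective such rules optimize, the sketch below searches exhaustively for a consensus schedule minimizing a duration-weighted pairwise-disagreement cost. The product-of-durations weighting and the brute-force search are illustrative assumptions, not the exact rules or algorithms studied in the paper; exhaustive search is only viable for small instances, consistent with the NP-hardness results.

```python
from itertools import permutations

def pairwise_disagreement(schedule, preference, durations):
    """Duration-weighted count of task pairs ordered differently in
    `schedule` than in a voter's `preference` (weights are illustrative)."""
    pos_s = {t: i for i, t in enumerate(schedule)}
    pos_p = {t: i for i, t in enumerate(preference)}
    tasks = list(schedule)
    cost = 0.0
    for a in range(len(tasks)):
        for b in range(a + 1, len(tasks)):
            t1, t2 = tasks[a], tasks[b]
            if (pos_s[t1] < pos_s[t2]) != (pos_p[t1] < pos_p[t2]):
                cost += durations[t1] * durations[t2]
    return cost

def consensus_schedule(preferences, durations):
    """Exhaustive search over all orderings; only feasible for small instances."""
    tasks = list(durations)
    best, best_cost = None, float("inf")
    for candidate in permutations(tasks):
        cost = sum(pairwise_disagreement(candidate, p, durations) for p in preferences)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost

# Toy example: three voters, three shared tasks with durations
durations = {"road": 5, "school": 2, "park": 1}
prefs = [("school", "park", "road"),
         ("park", "school", "road"),
         ("road", "school", "park")]
print(consensus_schedule(prefs, durations))
```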
Deep learning has been able to outperform humans in terms of classification accuracy in many tasks. However, to achieve robustness to adversarial perturbations, the best methodologies require adversarial training on a much larger training set, typically augmented using generative models (e.g., diffusion models). Our main objective in this work is to reduce these data requirements while achieving the same or better accuracy-robustness trade-offs. We focus on data pruning, where some training samples are removed based on their distance to the model's classification boundary (i.e., their margin). We find that existing approaches that prune low-margin samples fail to increase robustness when large amounts of synthetic data are added, and we explain this behavior with a perceptron learning task. Moreover, we find that pruning high-margin samples for better accuracy increases the harmful impact of mislabeled perturbed data in adversarial training, hurting both robustness and accuracy. We thus propose PUMA, a new data pruning strategy that computes the margin using DeepFool and prunes the training samples with the highest margin, avoiding a drop in performance by jointly adjusting the training attack norm on the samples with the lowest margin. We show that PUMA can be used on top of the current state-of-the-art methodology in robustness and, unlike existing data pruning strategies, significantly improves model performance. Not only does PUMA achieve similar robustness with less data, it also significantly increases model accuracy, improving the performance trade-off.
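The sketch below illustrates the general margin-based pruning idea in the simplest possible setting, a linear classifier, where the distance to the decision boundary (the quantity DeepFool estimates iteratively for deep networks) has a closed form. The pruning fraction, the base attack budget, and the rule capping the per-sample attack norm by the sample's margin are illustrative assumptions, not PUMA's exact procedure.

```python
import numpy as np

def linear_margin(X, w, b):
    """Distance of each sample to the decision boundary of a linear classifier."""
    return np.abs(X @ w + b) / np.linalg.norm(w)

def margin_based_pruning(X, y, w, b, prune_frac=0.2, base_eps=0.03):
    """Illustrative sketch: drop the highest-margin fraction of samples and
    cap the attack budget on low-margin samples by their own margin, so
    adversarial perturbations are less likely to cross the true boundary."""
    margins = linear_margin(X, w, b)
    order = np.argsort(margins)                      # low margin first
    keep = order[: int(len(X) * (1 - prune_frac))]   # prune highest-margin tail
    eps = np.minimum(base_eps, margins[keep])        # per-sample attack norm
    return X[keep], y[keep], eps

# Toy usage with random data and a random linear classifier
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 32))
y = rng.integers(0, 2, size=1000)
w, b = rng.normal(size=32), 0.1
X_kept, y_kept, eps = margin_based_pruning(X, y, w, b)
print(X_kept.shape, eps.min(), eps.max())
```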
Forman-Ricci curvature (FRC) is a powerful tool for analysing empirical networks, as the distribution of curvature values can identify structural information that is not readily detected by other geometrical methods. Crucially, FRC captures higher-order structural information of clique complexes of a graph or of Vietoris-Rips complexes, which is not readily accessible to alternative methods. However, existing FRC platforms are prohibitively expensive computationally. We therefore develop an efficient set-theoretic formulation for computing such higher-order FRC in simplicial complexes. Significantly, our set-theoretic representation reveals previous computational bottlenecks and accelerates the computation of FRC. Finally, we provide pseudo-code and a software implementation, which we call FastForman, as well as a benchmark comparison with alternative implementations. We envisage that FastForman will be used in topological and geometrical data analysis for high-dimensional, complex data sets. Moreover, our development paves the way for future generalisations towards efficient computation of FRC on cell complexes.
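For context, the snippet below computes the classical Forman-Ricci curvature of graph edges, F(u,v) = 4 - deg(u) - deg(v), together with its triangle-augmented variant. FastForman targets the more general higher-order curvatures of simplices in clique and Vietoris-Rips complexes, so this is only a minimal illustration of the quantity being computed, not of the set-theoretic algorithm itself.

```python
def forman_ricci_edges(adjacency):
    """Forman-Ricci curvature of each edge of an unweighted graph:
    the classical form F(u,v) = 4 - deg(u) - deg(v) and the
    triangle-augmented form F#(u,v) = F(u,v) + 3 * #triangles(u,v).
    `adjacency` maps each vertex to the set of its neighbours."""
    edges = {tuple(sorted((a, b))) for a in adjacency for b in adjacency[a]}
    curvatures = {}
    for u, v in edges:
        base = 4 - len(adjacency[u]) - len(adjacency[v])
        triangles = len(adjacency[u] & adjacency[v])
        curvatures[(u, v)] = (base, base + 3 * triangles)
    return curvatures

# Toy usage: a triangle (0,1,2) with a pendant vertex 3
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(forman_ricci_edges(adj))
```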
Discontinuous Galerkin (DG) methods for solving elliptic equations are gaining popularity in the computational physics community for their high-order spectral convergence and their potential for parallelization on computing clusters. However, problems in numerical relativity with extremely stretched grids, such as initial data problems for binary black holes that impose boundary conditions at large distances from the black holes, have proven challenging for DG methods. To alleviate this problem we have developed a primal DG scheme that is generically applicable to a large class of elliptic equations, including problems on curved and extremely stretched grids. The DG scheme accommodates two widely used initial data formulations in numerical relativity, namely the puncture formulation and the extended conformal thin-sandwich (XCTS) formulation. We find that our DG scheme is able to stretch the grid by a factor of $\sim 10^9$ and hence allows us to impose boundary conditions at large distances. The scheme converges exponentially with resolution both for the smooth XCTS problem and for the non-smooth puncture problem. With this method we are able to generate high-quality initial data for binary black hole problems using a parallelizable DG scheme. The code is publicly available in the open-source SpECTRE numerical relativity code.
Many real-world processes have complex tail dependence structures that cannot be characterized using classical Gaussian processes. More flexible spatial extremes models exhibit appealing extremal dependence properties but are often prohibitively expensive to fit and simulate from in high dimensions. In this paper, we aim to push the boundaries of computation and modeling of high-dimensional spatial extremes by integrating a new spatial extremes model, with flexible and non-stationary dependence properties, into the encoding-decoding structure of a variational autoencoder, which we call the XVAE. The XVAE can emulate spatial observations and produce outputs that have the same statistical properties as the inputs, especially in the tail. Our approach also provides a novel way of performing fast inference with complex extreme-value processes. Through extensive simulation studies, we show that our XVAE is substantially more time-efficient than traditional Bayesian inference while outperforming many spatial extremes models with a stationary dependence structure. Lastly, we analyze a high-resolution satellite-derived dataset of sea surface temperature in the Red Sea, which includes 30 years of daily measurements at 16,703 grid cells. We demonstrate how to use the XVAE to identify regions susceptible to marine heatwaves under climate change and to examine the spatial and temporal variability of the extremal dependence structure.
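For reference, the encoding-decoding structure mentioned above is trained, in a standard variational autoencoder, by maximizing the evidence lower bound shown below; the XVAE's specific likelihood and latent structure for spatial extremes are particular to the paper and not reproduced here.
\[
\mathcal{L}(\theta, \phi; x) \;=\; \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right] \;-\; \mathrm{KL}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right).
\]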
Several branches of computing use a system's physical dynamics to do computation. We show that the dynamics of an underdamped harmonic oscillator can perform multifunctional computation, solving distinct problems at distinct times within a dynamical trajectory. Oscillator computing usually focuses on the oscillator's phase as the information-carrying component. Here we focus on the time-resolved amplitude of an oscillator whose inputs influence its frequency, which has a natural parallel as the activity of a time-dependent neural unit. We call this unit an oscillatron. The activity of an oscillatron at fixed time is a nonmonotonic function of the input, and so it can solve nonlinearly-separable problems such as XOR. The activity of the oscillatron at fixed input is a nonmonotonic function of time, and so it is multifunctional in a temporal sense, able to carry out distinct nonlinear computations at distinct times within the same dynamical trajectory. Time-resolved computing of this nature can be done in or out of equilibrium, with the natural time evolution of the system giving us multiple computations for the price of one.
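A toy realization of this idea, with parameters chosen purely for illustration rather than taken from the paper: let the oscillator's frequency grow linearly with the summed binary inputs, read out the damped activity at a fixed time, and threshold it. At the chosen readout time the activity is positive for input sums 0 and 2 but negative for sum 1, so the readout computes XOR.

```python
import numpy as np

def oscillatron_activity(x1, x2, t, omega0=0.0, k=np.pi, gamma=0.1):
    """Toy time-resolved activity of an underdamped oscillator whose
    frequency depends on the inputs (illustrative parameterization)."""
    omega = omega0 + k * (x1 + x2)
    return np.exp(-gamma * t) * np.cos(omega * t)

def xor_readout(x1, x2, t=1.0, threshold=0.0):
    """At t = 1 the activity is +exp(-gamma) for input sums 0 and 2 but
    -exp(-gamma) for sum 1, so a sign readout computes XOR."""
    return int(oscillatron_activity(x1, x2, t) < threshold)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, "->", xor_readout(x1, x2))
```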
Past algorithms for learning environments represented by automata have had complexity that scales with the number of states in the automaton, which can be exponentially large even for automata recognizing regular expressions with a small description length. We therefore formalize a compositional language that constructs automata as transformations between certain types of categories, representable as string diagrams, which better reflects the description complexity of various automata. We impose complexity constraints on this framework by having these transformations operate on categories enriched over filtered sets, and, using these constraints, we prove elementary results on the runtime and expressivity of the subset of transformations that operate deterministically on finite state spaces. These string diagrams, or "string machines," are themselves morphisms in a category, so string machines can create other string machines at runtime to model computations that require more than constant memory. We prove sufficient conditions for polynomial runtime guarantees on such machine-creating machines, which can help develop complexity constraints on string machines that also encapsulate runtime complexity.
The goal of explainable Artificial Intelligence (XAI) is to generate human-interpretable explanations, but there are no computationally precise theories of how humans interpret AI-generated explanations. The lack of theory means that validation of XAI must be done empirically, on a case-by-case basis, which prevents systematic theory-building in XAI. We propose a psychological theory of how humans draw conclusions from saliency maps, the most common form of XAI explanation, which for the first time allows for precise prediction of explainee inference conditioned on the explanation. Our theory posits that, absent an explanation, humans expect the AI to make decisions similar to their own, and that they interpret an explanation by comparing it to the explanations they themselves would give. Comparison is formalized via Shepard's universal law of generalization in a similarity space, a classic theory from cognitive science. A pre-registered user study on AI image classifications with saliency map explanations demonstrates that our theory quantitatively matches participants' predictions of the AI.
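A minimal sketch of the comparison mechanism, under the assumption (ours, for illustration) that saliency maps and the explainee's own explanations can be embedded as feature vectors with a Euclidean metric: similarity follows Shepard's exponential law, and the explainee predicts the AI label whose own-explanations are most similar in aggregate.

```python
import numpy as np

def shepard_similarity(d, c=1.0):
    """Shepard's universal law of generalization: similarity decays
    exponentially with distance in a psychological similarity space."""
    return np.exp(-c * d)

def predicted_inference(saliency_map, own_explanations, own_labels, c=1.0):
    """Predict the AI label by similarity-weighted comparison between the
    AI's saliency map and the explanations the explainee would give."""
    s = np.asarray(saliency_map, dtype=float).ravel()
    weights = {}
    for expl, label in zip(own_explanations, own_labels):
        d = np.linalg.norm(s - np.asarray(expl, dtype=float).ravel())
        weights[label] = weights.get(label, 0.0) + shepard_similarity(d, c)
    return max(weights, key=weights.get)

# Toy usage with 2x2 "saliency maps"
ai_map = [[0.9, 0.1], [0.0, 0.0]]
own = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
labels = ["cat", "dog"]
print(predicted_inference(ai_map, own, labels))
```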
We derive information-theoretic generalization bounds for supervised learning algorithms based on the information contained in predictions rather than in the output of the training algorithm. These bounds improve over the existing information-theoretic bounds, are applicable to a wider range of algorithms, and solve two key challenges: (a) they give meaningful results for deterministic algorithms and (b) they are significantly easier to estimate. We show experimentally that the proposed bounds closely follow the generalization gap in practical scenarios for deep learning.
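For context, one standard form of the existing output-based bounds: for a loss that is $\sigma$-subgaussian, the expected generalization gap of an algorithm that produces weights $W$ from a training set $S$ of $n$ samples satisfies
\[
\bigl|\mathbb{E}[\mathrm{gen}(W, S)]\bigr| \;\le\; \sqrt{\frac{2\sigma^2}{n}\, I(W; S)},
\]
which can be vacuous for deterministic algorithms, where $I(W; S)$ may be infinite; bounds based on the information contained in predictions avoid this issue.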
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
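A minimal numerical illustration of the implicit-regularization claim in the simplest linear setting (a standard fact for linear least squares, far from the full deep learning story): gradient descent on the quadratic loss, started from zero in an overparametrized linear regression, converges to the minimum-$\ell_2$-norm interpolating solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 200                      # overparametrized: more features than samples
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# Minimum-norm interpolator: theta = X^T (X X^T)^{-1} y
theta_min_norm = X.T @ np.linalg.solve(X @ X.T, y)

# Gradient descent on the squared loss, initialized at zero
theta = np.zeros(p)
lr = 0.01
for _ in range(50_000):
    theta -= lr * X.T @ (X @ theta - y) / n

print(np.max(np.abs(X @ theta_min_norm - y)))   # ~0: the solution interpolates
print(np.linalg.norm(theta - theta_min_norm))   # ~0: GD finds the min-norm solution
```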
When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks. First, we discuss the issue of gradient explosion/vanishing and the more general issue of an undesirable spectrum, and then discuss practical solutions, including careful initialization and normalization methods. Second, we review generic optimization methods used in training neural networks, such as SGD, adaptive gradient methods, and distributed methods, together with theoretical results for these algorithms. Third, we review existing research on the global issues of neural network training, including results on bad local minima, mode connectivity, the lottery ticket hypothesis, and infinite-width analysis.
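For concreteness, the two update rules most commonly contrasted in this context are plain SGD and Adam, shown here in their standard form with stochastic gradient $g_t = \nabla_\theta \ell(\theta_t; \xi_t)$, step size $\alpha$, and moment parameters $\beta_1, \beta_2$:
\[
\text{SGD:} \quad \theta_{t+1} = \theta_t - \alpha\, g_t,
\]
\[
\text{Adam:} \quad
m_t = \beta_1 m_{t-1} + (1-\beta_1) g_t, \quad
v_t = \beta_2 v_{t-1} + (1-\beta_2) g_t^2, \quad
\theta_{t+1} = \theta_t - \alpha\, \frac{m_t / (1-\beta_1^t)}{\sqrt{v_t / (1-\beta_2^t)} + \epsilon}.
\]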