SARSA, a classical on-policy control algorithm for reinforcement learning, is known to chatter when combined with linear function approximation: SARSA does not diverge but oscillates in a bounded region. However, little is know about how fast SARSA converges to that region and how large the region is. In this paper, we make progress towards solving this open problem by showing the convergence rate of projected SARSA to a bounded region. Importantly, the region is much smaller than the ball used for projection provided that the the magnitude of the reward is not too large. Our analysis applies to expected SARSA as well as SARSA($\lambda$). Existing works regarding the convergence of linear SARSA to a fixed point all require the Lipschitz constant of SARSA's policy improvement operator to be sufficiently small; our analysis instead applies to arbitrary Lipschitz constants and thus characterizes the behavior of linear SARSA for a new regime.
The approximate uniform sampling of graph realizations with a given degree sequence is an everyday task in several social science, computer science, engineering etc. projects. One approach is using Markov chains. The best available current result about the well-studied switch Markov chain is that it is rapidly mixing on P-stable degree sequences (see DOI:10.1016/j.ejc.2021.103421). The switch Markov chain does not change any degree sequence. However, there are cases where degree intervals are specified rather than a single degree sequence. (A natural scenario where this problem arises is in hypothesis testing on social networks that are only partially observed.) Rechner, Strowick, and M\"uller-Hannemann introduced in 2018 the notion of degree interval Markov chain which uses three (separately well-studied) local operations (switch, hinge-flip and toggle), and employing on degree sequence realizations where any two sequences under scrutiny have very small coordinate-wise distance. Recently Amanatidis and Kleer published a beautiful paper (arXiv:2110.09068), showing that the degree interval Markov chain is rapidly mixing if the sequences are coming from a system of very thin intervals which are centered not far from a regular degree sequence. In this paper we extend substantially their result, showing that the degree interval Markov chain is rapidly mixing if the intervals are centred at P-stable degree sequences.
We introduce a new distortion measure for point processes called functional-covering distortion. It is inspired by intensity theory and is related to both the covering of point processes and logarithmic loss distortion. We obtain the distortion-rate function with feedforward under this distortion measure for a large class of point processes. For Poisson processes, the rate-distortion function is obtained under a general condition called constrained functional-covering distortion, of which both covering and functional-covering are special cases. Also for Poisson processes, we characterize the rate-distortion region for a two-encoder CEO problem and show that feedforward does not enlarge this region.
A central quest of probing is to uncover how pre-trained models encode a linguistic property within their representations. An encoding, however, might be spurious-i.e., the model might not rely on it when making predictions. In this paper, we try to find encodings that the model actually uses, introducing a usage-based probing setup. We first choose a behavioral task which cannot be solved without using the linguistic property. Then, we attempt to remove the property by intervening on the model's representations. We contend that, if an encoding is used by the model, its removal should harm the performance on the chosen behavioral task. As a case study, we focus on how BERT encodes grammatical number, and on how it uses this encoding to solve the number agreement task. Experimentally, we find that BERT relies on a linear encoding of grammatical number to produce the correct behavioral output. We also find that BERT uses a separate encoding of grammatical number for nouns and verbs. Finally, we identify in which layers information about grammatical number is transferred from a noun to its head verb.
In a sports competition, a team might lose a powerful incentive to exert full effort if its final rank does not depend on the outcome of the matches still to be played. Therefore, the organiser should reduce the probability of such a situation to the extent possible. Our paper provides a classification scheme to identify these weakly (where one team is indifferent) or strongly (where both teams are indifferent) stakeless games. A statistical model is estimated to simulate the UEFA Champions League groups and compare the candidate schedules used in the 2021/22 season according to the competitiveness of the matches played in the last round(s). The option followed in four of the eight groups is found to be optimal under a wide set of parameters. Minimising the number of strongly stakeless matches is verified to be a likely goal in the computer draw of the fixture that remains hidden from the public.
We study online convex optimization with switching costs, a practically important but also extremely challenging problem due to the lack of complete offline information. By tapping into the power of machine learning (ML) based optimizers, ML-augmented online algorithms (also referred to as expert calibration in this paper) have been emerging as state of the art, with provable worst-case performance guarantees. Nonetheless, by using the standard practice of training an ML model as a standalone optimizer and plugging it into an ML-augmented algorithm, the average cost performance can be even worse than purely using ML predictions. In order to address the "how to learn" challenge, we propose EC-L2O (expert-calibrated learning to optimize), which trains an ML-based optimizer by explicitly taking into account the downstream expert calibrator. To accomplish this, we propose a new differentiable expert calibrator that generalizes regularized online balanced descent and offers a provably better competitive ratio than pure ML predictions when the prediction error is large. For training, our loss function is a weighted sum of two different losses -- one minimizing the average ML prediction error for better robustness, and the other one minimizing the post-calibration average cost. We also provide theoretical analysis for EC-L2O, highlighting that expert calibration can be even beneficial for the average cost performance and that the high-percentile tail ratio of the cost achieved by EC-L2O to that of the offline optimal oracle (i.e., tail cost ratio) can be bounded. Finally, we test EC-L2O by running simulations for sustainable datacenter demand response. Our results demonstrate that EC-L2O can empirically achieve a lower average cost as well as a lower competitive ratio than the existing baseline algorithms.
In this work, we introduce a novel approach to formulating an artificial viscosity for shock capturing in nonlinear hyperbolic systems by utilizing the property that the solutions of hyperbolic conservation laws are not reversible in time in the vicinity of shocks. The proposed approach does not require any additional governing equations or a priori knowledge of the hyperbolic system in question, is independent of the mesh and approximation order, and requires the use of only one tunable parameter. The primary novelty is that the resulting artificial viscosity is unique for each component of the conservation law which is advantageous for systems in which some components exhibit discontinuities while others do not. The efficacy of the method is shown in numerical experiments of multi-dimensional hyperbolic conservation laws such as nonlinear transport, Euler equations, and ideal magnetohydrodynamics using a high-order discontinuous spectral element method on unstructured grids.
In variable selection, a selection rule that prescribes the permissible sets of selected variables (called a "selection dictionary") is desirable due to the inherent structural constraints among the candidate variables. The methods that can incorporate such restrictions can improve model interpretability and prediction accuracy. Penalized regression can integrate selection rules by assigning the coefficients to different groups and then applying penalties to the groups. However, no general framework has been proposed to formalize selection rules and their applications. In this work, we establish a framework for structured variable selection that can incorporate universal structural constraints. We develop a mathematical language for constructing arbitrary selection rules, where the selection dictionary is formally defined. We show that all selection rules can be represented as a combination of operations on constructs, which can be used to identify the related selection dictionary. One may then apply some criteria to select the best model. We show that the theoretical framework can help to identify the grouping structure in existing penalized regression methods. In addition, we formulate structured variable selection into mixed-integer optimization problems which can be solved by existing software. Finally, we discuss the significance of the framework in the context of statistics.
In the pooled data problem we are given a set of $n$ agents, each of which holds a hidden state bit, either $0$ or $1$. A querying procedure returns for a query set the sum of the states of the queried agents. The goal is to reconstruct the states using as few queries as possible. In this paper we consider two noise models for the pooled data problem. In the noisy channel model, the result for each agent flips with a certain probability. In the noisy query model, each query result is subject to random Gaussian noise. Our results are twofold. First, we present and analyze for both error models a simple and efficient distributed algorithm that reconstructs the initial states in a greedy fashion. Our novel analysis pins down the range of error probabilities and distributions for which our algorithm reconstructs the exact initial states with high probability. Secondly, we present simulation results of our algorithm and compare its performance with approximate message passing (AMP) algorithms that are conjectured to be optimal in a number of related problems.
Proactive dialogue system is able to lead the conversation to a goal topic and has advantaged potential in bargain, persuasion and negotiation. Current corpus-based learning manner limits its practical application in real-world scenarios. To this end, we contribute to advance the study of the proactive dialogue policy to a more natural and challenging setting, i.e., interacting dynamically with users. Further, we call attention to the non-cooperative user behavior -- the user talks about off-path topics when he/she is not satisfied with the previous topics introduced by the agent. We argue that the targets of reaching the goal topic quickly and maintaining a high user satisfaction are not always converge, because the topics close to the goal and the topics user preferred may not be the same. Towards this issue, we propose a new solution named I-Pro that can learn Proactive policy in the Interactive setting. Specifically, we learn the trade-off via a learned goal weight, which consists of four factors (dialogue turn, goal completion difficulty, user satisfaction estimation, and cooperative degree). The experimental results demonstrate I-Pro significantly outperforms baselines in terms of effectiveness and interpretability.
The conjoining of dynamical systems and deep learning has become a topic of great interest. In particular, neural differential equations (NDEs) demonstrate that neural networks and differential equation are two sides of the same coin. Traditional parameterised differential equations are a special case. Many popular neural network architectures, such as residual networks and recurrent networks, are discretisations. NDEs are suitable for tackling generative problems, dynamical systems, and time series (particularly in physics, finance, ...) and are thus of interest to both modern machine learning and traditional mathematical modelling. NDEs offer high-capacity function approximation, strong priors on model space, the ability to handle irregular data, memory efficiency, and a wealth of available theory on both sides. This doctoral thesis provides an in-depth survey of the field. Topics include: neural ordinary differential equations (e.g. for hybrid neural/mechanistic modelling of physical systems); neural controlled differential equations (e.g. for learning functions of irregular time series); and neural stochastic differential equations (e.g. to produce generative models capable of representing complex stochastic dynamics, or sampling from complex high-dimensional distributions). Further topics include: numerical methods for NDEs (e.g. reversible differential equations solvers, backpropagation through differential equations, Brownian reconstruction); symbolic regression for dynamical systems (e.g. via regularised evolution); and deep implicit models (e.g. deep equilibrium models, differentiable optimisation). We anticipate this thesis will be of interest to anyone interested in the marriage of deep learning with dynamical systems, and hope it will provide a useful reference for the current state of the art.