We develop a statistical framework to evaluate evidence of alleged cheating by illegal signalling in sports from a forensic perspective. We explain why a Bayesian approach, rather than a frequentist procedure, is called for. We apply this framework to cases of alleged cheating in professional bridge and professional baseball. The diversity of these applications illustrates the generality of the method.
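As a hedged illustration of the kind of calculation such a framework involves, the sketch below computes a Bayes factor and posterior odds for a hypothetical signalling scenario; the binomial and beta-binomial models, the counts, and the prior odds are all illustrative assumptions, not the paper's data or likelihood.

```python
# Minimal sketch of a Bayesian evidence calculation for alleged signalling;
# every number and the agreement model itself are hypothetical.
from math import comb, exp, lgamma

def betaln(x, y):
    return lgamma(x) + lgamma(y) - lgamma(x + y)

def beta_binomial(k, n, a, b):
    """Marginal likelihood of k 'suspicious agreements' in n trials when the
    agreement probability has a Beta(a, b) prior (the cheating hypothesis)."""
    return comb(n, k) * exp(betaln(k + a, n - k + b) - betaln(a, b))

n, k = 80, 62                          # hypothetical trials and agreements
p0 = 0.5                               # baseline agreement rate if no signalling
lik_h0 = comb(n, k) * p0**k * (1 - p0)**(n - k)
lik_h1 = beta_binomial(k, n, a=8, b=2)

bayes_factor = lik_h1 / lik_h0
prior_odds = 1 / 1000                  # forensic prior odds, set externally
print(f"Bayes factor {bayes_factor:.1f}, posterior odds {bayes_factor * prior_odds:.3f}")
```

The point of the Bayesian route is visible here: the evidence enters through the Bayes factor, while the (typically very small) prior odds of cheating are kept explicit rather than hidden inside a significance threshold.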
In decision-making, maxitive functions are used for worst-case and best-case evaluations. Maxitivity gives rise to a rich structure that is well-studied in the context of the pointwise order. In this article, we investigate maxitivity with respect to general preorders and provide a representation theorem for such functionals. The results are illustrated for different stochastic orders in the literature, including the usual stochastic order, the increasing convex/concave order, and the dispersive order.
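For the classical pointwise-order case that the abstract takes as its starting point, the following toy check (weights and samples are assumptions for illustration) shows what maxitivity means for a worst-case, Shilkret-type evaluation: evaluating the pointwise maximum of two acts gives the maximum of their evaluations.

```python
# Numerical illustration (not from the paper) of maxitivity under the
# pointwise order for f(X) = max_w pi(w) * X(w) with possibility weights pi.
import numpy as np

rng = np.random.default_rng(0)
n_states = 6
pi = rng.uniform(0.2, 1.0, n_states)          # assumed possibility weights

def f(X):
    return np.max(pi * X)                     # maxitive worst-case evaluation

for _ in range(5):
    X = rng.uniform(0, 1, n_states)
    Y = rng.uniform(0, 1, n_states)
    lhs = f(np.maximum(X, Y))                 # evaluate the pointwise maximum
    rhs = max(f(X), f(Y))
    assert np.isclose(lhs, rhs)
print("maxitivity holds on these samples")
```

The paper's contribution is to replace the pointwise order in this identity by a general preorder, such as the stochastic orders listed above.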
Graph neural networks (GNNs) provide state-of-the-art results in a wide variety of tasks which typically involve predicting features at the vertices of a graph. They are built from layers of graph convolutions which serve as a powerful inductive bias for describing the flow of information among the vertices. Often, however, more than one data modality is available. This work considers a setting in which several graphs have the same vertex set and a common vertex-level learning task. This generalizes standard GNN models to GNNs with several graph operators that do not commute; we call the resulting model a graph-tuple neural network (GtNN). In this work, we develop the mathematical theory to address the stability and transferability of GtNNs using properties of non-commuting non-expansive operators. We develop a limit theory of graphon-tuple neural networks and use it to prove a universal transferability theorem that guarantees that all graph-tuple neural networks are transferable on convergent graph-tuple sequences. In particular, there is no non-transferable energy under the convergence we consider here. Our theoretical results extend well-known transferability theorems for GNNs to the case of several simultaneous graphs (GtNNs) and provide a strict improvement on what is currently known even in the GNN case. We illustrate our theoretical results with simple experiments on synthetic and real-world data. To this end, we derive a training procedure that provably enforces the stability of the resulting model.
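A minimal sketch of what a layer with several non-commuting graph operators on a shared vertex set could look like; the mixing rule, normalization, and nonlinearity below are assumptions for illustration, not the paper's exact GtNN definition.

```python
# Toy "graph-tuple" convolution: mix several graph operators S_1, ..., S_m
# (generally non-commuting) acting on the same vertex set.
import numpy as np

def normalize(A):
    """Symmetric degree normalization so each operator has spectral norm at most one."""
    d = A.sum(1)
    d[d == 0] = 1.0
    D = np.diag(1.0 / np.sqrt(d))
    return D @ A @ D

def gtnn_layer(X, operators, weights):
    """X: (n, f_in) vertex features; operators: list of (n, n) matrices;
    weights: list of (f_in, f_out) matrices, one per operator."""
    Z = sum(S @ X @ W for S, W in zip(operators, weights))
    return np.tanh(Z)                          # pointwise nonlinearity

rng = np.random.default_rng(1)
n, f_in, f_out = 20, 4, 8
A1 = rng.integers(0, 2, (n, n)); A1 = np.triu(A1, 1); A1 = A1 + A1.T
A2 = rng.integers(0, 2, (n, n)); A2 = np.triu(A2, 1); A2 = A2 + A2.T
ops = [normalize(A1.astype(float)), normalize(A2.astype(float))]
Ws = [rng.normal(scale=0.1, size=(f_in, f_out)) for _ in ops]
X = rng.normal(size=(n, f_in))
print(gtnn_layer(X, ops, Ws).shape)            # (20, 8)
```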
We develop both first- and second-order numerical optimization methods to solve non-smooth optimization problems featuring a shared sparsity penalty, constrained by differential equations with uncertainty. To alleviate the curse of dimensionality we use tensor product approximations. To handle the non-smoothness of the objective function we introduce a smoothed version of the shared sparsity objective. We consider both a benchmark elliptic PDE constraint and a more realistic topology optimization problem. We demonstrate that the error converges linearly in the number of iterations and in the smoothing parameter, and faster than algebraically in the number of degrees of freedom, which comprises the number of quadrature points in one variable and the tensor ranks. Moreover, in the topology optimization problem, the smoothed shared sparsity penalty actually reduces the tensor ranks compared to the unpenalised solution. This enables us to find a sparse high-resolution design under high-dimensional uncertainty.
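As an illustration of the non-smoothness issue and its remedy, the sketch below evaluates one plausible smoothed shared-sparsity penalty, a group-L1 norm over realizations of the uncertain parameter regularized by a smoothing parameter eps; the exact functional form is an assumption, not necessarily the one used in the paper.

```python
# Smoothed shared-sparsity penalty (assumed form): a group norm over samples
# of the uncertain parameter at each spatial point, made differentiable by eps.
import numpy as np

def shared_sparsity_smoothed(U, weights, eps=1e-3):
    """U: (n_space, n_samples) control values u(x_i, xi_j);
    weights: (n_samples,) quadrature weights for the uncertainty;
    returns sum_i ( sqrt( sum_j w_j u(x_i, xi_j)^2 + eps^2 ) - eps )."""
    group_norms = np.sqrt((U**2 * weights).sum(axis=1) + eps**2)
    return np.sum(group_norms - eps)

rng = np.random.default_rng(2)
n_space, n_samples = 100, 16
U = rng.normal(size=(n_space, n_samples))
U[30:, :] = 0.0                                # control vanishing on most of the domain
w = np.full(n_samples, 1.0 / n_samples)        # equal quadrature weights (assumption)
print(shared_sparsity_smoothed(U, w, eps=1e-3))
```

The penalty couples all realizations at a given spatial point, so the control is driven to zero on a common ("shared") support across the uncertainty, and eps > 0 removes the kink at zero.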
We investigate first-order notions of correlated equilibria: distributions of actions for smooth, potentially non-concave games such that players do not incur any regret against small modifications to their strategies along a set of continuous vector fields. We define two such notions, based on local deviations and on stationarity of the distribution, and identify the notion of coarseness as the setting where the associated vector fields are in fact gradient fields. For coarse equilibria, we prove that online (projected) gradient descent has a universal approximation property for both variants of equilibrium. In the non-coarse setting, we instead reduce the problem of finding an equilibrium to fixed-point computation via the usual framework of $\Phi$-regret minimisation, and identify tractable instances. Finally, we study the primal-dual framework for our notion of first-order equilibria. For coarse equilibria defined by a family of functions, we find that a dual bound on the worst-case expectation of a performance metric takes the form of a generalised Lyapunov function for the dynamics of the game. Specifically, the usual primal-dual price of anarchy analysis for coarse correlated equilibria, as well as the smoothness framework of Roughgarden, are both equivalent to a problem of general Lyapunov function estimation. For non-coarse equilibria, we instead observe a vector field fit problem for the gradient dynamics of the game. These results follow from containment results in normal form games: the usual notion of a (coarse) correlated equilibrium is equivalent to our first-order local notions of (coarse) correlated equilibria with respect to an appropriately chosen set of vector fields.
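The coarse-equilibrium result concerns the play generated by online projected gradient methods; the toy loop below (game, step size, and horizon are all illustrative assumptions) runs projected gradient ascent on each player's own payoff, and the empirical distribution of the resulting joint play is the object the first-order coarse notions evaluate.

```python
# Projected online gradient ascent in a smooth two-player game on [0, 1];
# the payoffs and step size are assumed for illustration only.
import numpy as np

def grad_u1(x, y):     # player 1's payoff u1(x, y) = -(x - y)**2 (assumed)
    return -2.0 * (x - y)

def grad_u2(x, y):     # player 2's payoff u2(x, y) = -(y - 0.3 * x - 0.5)**2 (assumed)
    return -2.0 * (y - 0.3 * x - 0.5)

x, y, eta, T = 0.9, 0.1, 0.05, 2000
history = []
for _ in range(T):
    x_new = np.clip(x + eta * grad_u1(x, y), 0.0, 1.0)   # project back onto [0, 1]
    y_new = np.clip(y + eta * grad_u2(x, y), 0.0, 1.0)
    x, y = x_new, y_new
    history.append((x, y))

history = np.array(history)
print("average joint play:", history.mean(axis=0))
```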
Selling a perfectly divisible item to potential buyers is a fundamental task with natural applications to pricing communication bandwidth and cloud computing services. Surprisingly, despite the rich literature on single-item auctions, revenue maximization when selling a divisible item is a much less understood objective. We introduce a Bayesian setting, in which the potential buyers have concave valuation functions (defined for each possible item fraction) that are randomly chosen according to known probability distributions. Extending the sequential posted pricing paradigm, we focus on mechanisms that use linear pricing, charging a fixed price for the whole item and proportional prices for fractions of it. Our goal is to understand the power of such mechanisms by bounding the gap between the expected revenue that can be achieved by the best among these mechanisms and the maximum expected revenue that can be achieved by any mechanism, under mild restrictions on the behavior of the buyers. Under regularity assumptions for the probability distributions, we show that this revenue gap depends only logarithmically on a natural parameter characterizing the valuation functions and on the number of agents. Our results follow by bounding the objective value of a mathematical program that maximizes the ex-ante relaxation of the optimal revenue under linear pricing revenue constraints.
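A small sketch of the linear-pricing mechanism being analysed, under assumed valuations and prices: the seller posts a per-unit price, and each buyer in sequence takes the fraction of the remaining item that maximizes concave value minus payment.

```python
# Sequential posted pricing with a linear price for a divisible item;
# the price, the number of buyers, and the valuation family are assumptions.
import numpy as np

def best_fraction(value_fn, price, remaining, grid=1000):
    xs = np.linspace(0.0, remaining, grid)
    utilities = value_fn(xs) - price * xs
    i = np.argmax(utilities)
    return xs[i] if utilities[i] > 0 else 0.0

rng = np.random.default_rng(3)
price = 1.2                                    # per-unit (linear) price, assumed
remaining, revenue = 1.0, 0.0
# hypothetical buyers with concave valuations v(x) = a * sqrt(x), a random
for a in rng.uniform(0.5, 3.0, size=5):
    x = best_fraction(lambda xs, a=a: a * np.sqrt(xs), price, remaining)
    revenue += price * x
    remaining -= x
    if remaining <= 1e-9:
        break
print(f"revenue: {revenue:.3f}, unsold fraction: {remaining:.3f}")
```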
The success of over-parameterized neural networks trained to near-zero training error has caused great interest in the phenomenon of benign overfitting, where estimators are statistically consistent even though they interpolate noisy training data. While benign overfitting in fixed dimension has been established for some learning methods, current literature suggests that for regression with typical kernel methods and wide neural networks, benign overfitting requires a high-dimensional setting where the dimension grows with the sample size. In this paper, we show that the smoothness of the estimators, and not the dimension, is the key: benign overfitting is possible if and only if the estimator's derivatives are large enough. We generalize existing inconsistency results to non-interpolating models and more kernels to show that benign overfitting with moderate derivatives is impossible in fixed dimension. Conversely, we show that rate-optimal benign overfitting is possible for regression with a sequence of spiky-smooth kernels with large derivatives. Using neural tangent kernels, we translate our results to wide neural networks. We prove that while infinite-width networks do not overfit benignly with the ReLU activation, this can be fixed by adding small high-frequency fluctuations to the activation function. Our experiments verify that such neural networks, while overfitting, can indeed generalize well even on low-dimensional data sets.
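The fix for infinite-width ReLU networks amounts to perturbing the activation with a tiny, rapidly oscillating term; the sketch below shows the shape of such an activation, with amplitude and frequency chosen arbitrarily for illustration rather than derived from the spiky-smooth kernels as in the paper.

```python
# ReLU plus a small high-frequency fluctuation; amplitude and frequency are
# illustrative assumptions, not the values the theory prescribes.
import numpy as np

def spiky_smooth_relu(x, amplitude=1e-2, frequency=200.0):
    return np.maximum(x, 0.0) + amplitude * np.sin(frequency * x)

x = np.linspace(-1, 1, 5)
print(np.maximum(x, 0.0))          # plain ReLU
print(spiky_smooth_relu(x))        # ReLU plus small high-frequency wiggle
```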
Recurrent neural networks (RNNs) notoriously struggle to learn long-term memories, primarily due to vanishing and exploding gradients. The recent success of state-space models (SSMs), a subclass of RNNs, in overcoming such difficulties challenges our theoretical understanding. In this paper, we delve into the optimization challenges of RNNs and discover that, as the memory of a network increases, changes in its parameters result in increasingly large output variations, making gradient-based learning highly sensitive, even without exploding gradients. Our analysis further reveals the importance of the element-wise recurrence design pattern combined with careful parametrizations in mitigating this effect. This feature is present in SSMs, as well as in other architectures, such as LSTMs. Overall, our insights provide a new explanation for some of the difficulties in gradient-based learning of RNNs and why some architectures perform better than others.
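The element-wise recurrence design pattern can be contrasted with a dense recurrence in a few lines; the toy recursions below (dimensions, decay rates, and inputs are assumptions) show the diagonal state update used, in spirit, by SSMs and by LSTM-style gating.

```python
# Dense recurrence h_t = tanh(W h_{t-1} + U x_t) versus an element-wise
# (diagonal) recurrence g_t = lambda * g_{t-1} + U x_t; all values are toy.
import numpy as np

rng = np.random.default_rng(4)
d, T = 8, 50
x = rng.normal(size=(T, d))

# dense recurrence
W = rng.normal(scale=0.3, size=(d, d))
U = rng.normal(scale=0.3, size=(d, d))
h = np.zeros(d)
for t in range(T):
    h = np.tanh(W @ h + U @ x[t])

# element-wise recurrence with per-channel decay close to 1 (long memory)
lam = np.full(d, 0.99)
g = np.zeros(d)
for t in range(T):
    g = lam * g + U @ x[t]

print("dense final state:", np.round(h[:3], 3))
print("element-wise final state:", np.round(g[:3], 3))
```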
We propose to condition a generative model on the uncertainty of a given image classifier in order to analyze and explain the classifier's behavior. Preliminary experiments on synthetic data and a corrupted version of the MNIST dataset illustrate the idea.
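One simple choice of conditioning signal, shown here only as an assumed illustration of the setup, is the predictive entropy of the classifier's softmax output, which a conditional generative model could then take as input.

```python
# Predictive entropy of a classifier's softmax output, a possible uncertainty
# signal to condition a generative model on; the logits below are made up.
import numpy as np

def predictive_entropy(logits):
    p = np.exp(logits - logits.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

logits = np.array([[5.0, 0.1, 0.1],      # confident prediction -> low entropy
                   [1.0, 0.9, 0.8]])     # ambiguous prediction -> high entropy
print(predictive_entropy(logits))
```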
We develop a Markov model of curling matches, parametrised by the probability of winning an end and the probability distribution of scoring ends. In practical applications, these end-winning probabilities can be estimated econometrically, and are shown to depend on which team holds the hammer, as well as the offensive and defensive strengths of the respective teams. Using a maximum entropy argument, based on the idea of characteristic scoring patterns in curling, we predict that the points distribution of scoring ends should follow a constrained geometric distribution. We provide analytical results detailing when it is optimal to blank the end in preference to scoring one point and losing possession of the hammer. Statistical and simulation analysis of international curling matches is also performed.
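The sketch below simulates one match from a Markov model of this flavour; all probabilities and the truncation of the geometric score distribution are illustrative assumptions rather than the paper's fitted parameters.

```python
# Toy Markov simulation of a curling match: each end is blanked, won by the
# hammer team, or stolen; points in a scoring end follow a truncated geometric
# distribution; scoring hands the hammer to the opponent, blanking keeps it.
import numpy as np

rng = np.random.default_rng(5)

def play_match(ends=10, p_hammer_scores=0.55, p_blank=0.15, q=0.5, max_points=4):
    score = np.array([0, 0])
    hammer = 0                                  # index of the team holding the hammer
    for _ in range(ends):
        u = rng.random()
        if u < p_blank:
            continue                            # blank end: hammer is retained
        scorer = hammer if u < p_blank + p_hammer_scores else 1 - hammer
        pts = min(rng.geometric(q), max_points) # truncated geometric points, {1, ..., max_points}
        score[scorer] += pts
        hammer = 1 - scorer                     # the scoring team gives up the hammer
    return score

print(play_match())
```

Repeating such simulations with end-winning probabilities estimated from data is one way to connect the model to the statistical analysis of international matches mentioned above.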
Cybercriminal profiling and cyber-attack attribution have been elusive goals worldwide, due to their implications for societal and geopolitical balance and stability. Attributing actions to a group or state is a complex endeavour, with established approaches including cyber threat intelligence and analysis of technical means such as malware analysis, network forensics and geopolitical intelligence. However, we propose an additional component for profiling threat actor groups through analysing cultural aspects of human behaviours and interactions. We utilise a set of variables which determine characteristics of national and organisational culture to create a cultural "footprint" of cybercriminal groups. As a case study, we conduct thematic analysis across the six dimensions of the Hofstede national culture classification and the eight dimensions of the Meyer classification on leaked internal communications of the ransomware group Conti. We propose that a systematic analysis of similar communications can serve as a practical tool for a) understanding the modus operandi of cybercrime and cyberwarfare-related groups, and b) profiling cybercriminal groups and/or nation-state actors. Insights from such applications can, first, assist in combating cybercrime and, second, if combined with additional cyber threat intelligence, provide a level of confidence in nuanced cyber-attack attribution processes.