The success of a football team depends on the individual skills and performances of the selected players as well as on how cohesively they perform. This work proposes a two-stage process for selecting the optimal playing eleven of a football team from its pool of available players. In the first stage, for the reference team, a LASSO-induced modified trinomial logistic regression model is derived to analyze the probabilities of the three possible match outcomes. The model takes into account the strengths of the players in the team as well as those of the opponent, home advantage, and the effects of individual players and player combinations beyond the recorded performances of these players. Careful use of the LASSO technique acts as an appropriate enabler of the player selection exercise while keeping the number of variables at a reasonable level. In the second stage, a GRASP-type meta-heuristic is implemented to select the team that maximizes the probability of a win. The work is illustrated with English Premier League data from 2008/09 to 2015/16. The application demonstrates that the first-stage model furnishes valuable insights into the deciding factors for different teams, whereas the optimization steps can be effectively used to determine the best possible starting lineup under various circumstances. Based on the adopted model and methodology, we propose a measure of the efficiency of team selection by team management and analyze the performance of EPL teams on this front.
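The second-stage search can be illustrated with a generic GRASP sketch: greedy randomized construction of an eleven from a restricted candidate list, followed by swap-based local search. The `win_prob` objective, player names, and parameter values below are toy stand-ins, not the paper's fitted model or exact procedure.

```python
import random

def grasp_select(players, win_prob, squad_size=11, alpha=0.3, iters=50, seed=0):
    """GRASP sketch: greedy randomized construction + swap-based local search."""
    rng = random.Random(seed)
    best_team, best_p = None, -1.0
    for _ in range(iters):
        # construction: repeatedly pick from a restricted candidate list (RCL)
        team = []
        while len(team) < squad_size:
            cands = [p for p in players if p not in team]
            cands.sort(key=lambda p: win_prob(team + [p]), reverse=True)
            rcl = cands[: max(1, int(alpha * len(cands)))]
            team.append(rng.choice(rcl))
        # local search: accept single-player swaps that raise win probability
        improved = True
        while improved:
            improved = False
            for i, _ in enumerate(team):
                for rep in players:
                    if rep in team:
                        continue
                    cand = team[:i] + [rep] + team[i + 1:]
                    if win_prob(cand) > win_prob(team):
                        team, improved = cand, True
                        break
                if improved:
                    break
        p = win_prob(team)
        if p > best_p:
            best_team, best_p = team, p
    return best_team, best_p

# toy objective: win probability rises with total player strength
strengths = {f"p{i}": 0.5 + 0.05 * i for i in range(20)}
prob = lambda team: min(1.0, sum(strengths[p] for p in team) / 15)
team, p = grasp_select(list(strengths), prob)
```

With this additive toy objective the local search provably ends at the eleven strongest players; the paper's objective, a fitted win probability with interaction effects, is what makes the meta-heuristic genuinely necessary.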
Originally introduced as a neural network for ensemble learning, the mixture of experts (MoE) has recently become a fundamental building block of highly successful modern deep neural networks for heterogeneous data analysis in several applications, including machine learning, statistics, bioinformatics, economics, and medicine. Despite its popularity in practice, a satisfactory understanding of the convergence behavior of Gaussian-gated MoE parameter estimation remains far from complete. The underlying reason for this challenge is the inclusion of covariates in the Gaussian gating and expert networks, which leads to their intrinsically complex interactions via partial differential equations with respect to their parameters. We address these issues by designing novel Voronoi loss functions to accurately capture heterogeneity in the maximum likelihood estimator (MLE) and thereby resolve parameter estimation in these models. Our results reveal distinct behaviors of the MLE under two settings: in the first, all the location parameters of the Gaussian gating are non-zero, while in the second, at least one location parameter is zero. Notably, these behaviors can be characterized by the solvability of two different systems of polynomial equations. Finally, we conduct a simulation study to verify our theoretical results.
Deep learning models, including modern systems like large language models, are well known to provide unreliable estimates of the uncertainty of their decisions. To improve a model's confidence levels, also known as its calibration, common approaches add either data-dependent or data-independent regularization terms to the training loss. Data-dependent regularizers have recently been introduced in the context of conventional frequentist learning to penalize deviations between confidence and accuracy. In contrast, data-independent regularizers are at the core of Bayesian learning, enforcing adherence of the variational distribution in the model parameter space to a prior density. The former approach is unable to quantify epistemic uncertainty, while the latter is severely affected by model misspecification. In light of the limitations of both methods, this paper proposes an integrated framework, referred to as calibration-aware Bayesian neural networks (CA-BNNs), that applies both regularizers while optimizing over a variational distribution as in Bayesian learning. Numerical results validate the advantages of the proposed approach in terms of expected calibration error (ECE) and reliability diagrams.
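The evaluation metric mentioned above, the expected calibration error, is commonly estimated by binning predictions by confidence; a minimal sketch of that standard binned estimator follows (the bin count is an arbitrary choice, not tied to this paper's experiments):

```python
import numpy as np

def ece(confidences, correct, n_bins=10):
    """Binned expected calibration error: a weighted average over
    confidence bins of |bin accuracy - bin mean confidence|."""
    conf = np.asarray(confidences, dtype=float)
    corr = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)     # predictions falling in this bin
        if mask.any():
            total += mask.mean() * abs(corr[mask].mean() - conf[mask].mean())
    return total
```

A perfectly calibrated model (95% confidence, 95% accuracy) scores zero, while an overconfident one (95% confidence, 50% accuracy) scores 0.45; the reliability diagrams cited in the abstract plot the same per-bin gaps.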
We demonstrate the relevance of an algorithm called generalized iterative scaling (GIS), or simultaneous multiplicative algebraic reconstruction technique (SMART), and its rescaled block-iterative version (RBI-SMART) in the field of optimal transport (OT). Many OT problems can be tackled through entropic regularization by solving the Schr\"odinger problem, which is an information projection problem, that is, a projection with respect to the Kullback--Leibler divergence. Here we consider problems with several affine constraints. It is well known that cyclic information projections onto the individual affine sets converge to the solution. In practice, however, even these individual projections are not explicitly available in general. In this paper, we replace each of them with one GIS iteration. If this is done for every affine set, we obtain RBI-SMART. We provide a convergence proof using an interpretation of these iterations as two-step affine projections in an equivalent problem. This is done in a slightly more general setting than RBI-SMART, since we use a mix of explicitly known information projections and GIS iterations. We then specialize this algorithm to several OT applications. First, we find the measure that minimizes the regularized OT divergence to a given measure under moment constraints. Second and third, the proposed framework yields algorithms for a regularized martingale OT problem and for a relaxed version of the barycentric weak OT problem. Finally, we show an approach from the literature for unbalanced OT problems.
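In the simplest case of two marginal (affine) constraints, the cyclic information projections are explicitly available and the scheme reduces to the classical Sinkhorn algorithm for entropic OT, which serves as a minimal illustration of the alternating-projection idea (the cost matrix and regularization strength below are arbitrary):

```python
import numpy as np

def sinkhorn(mu, nu, C, eps=0.1, iters=500):
    """Alternating KL (information) projections onto the two marginal
    constraint sets of entropic OT: the classical explicit special case
    of the cyclic projection scheme (Sinkhorn's algorithm)."""
    K = np.exp(-C / eps)                    # Gibbs kernel
    u, v = np.ones_like(mu), np.ones_like(nu)
    for _ in range(iters):
        u = mu / (K @ v)                    # project onto the row-marginal set
        v = nu / (K.T @ u)                  # project onto the column-marginal set
    return u[:, None] * K * v[None, :]      # transport plan

rng = np.random.default_rng(0)
mu, nu = np.full(3, 1 / 3), np.full(4, 1 / 4)
C = rng.random((3, 4))                      # arbitrary cost matrix
P = sinkhorn(mu, nu, C)
```

The paper's setting replaces such closed-form projections, when unavailable, with single GIS/SMART multiplicative updates, recovering RBI-SMART when this is done for every constraint block.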
Boundary value problems based on the convection-diffusion equation arise naturally in models of fluid flow across a variety of engineering applications and design feasibility studies. Naturally, their efficient numerical solution has been an interesting and active topic of research for decades. In the context of finite-element discretizations of these boundary value problems, the Streamline Upwind Petrov-Galerkin (SUPG) technique yields accurate discretizations in the singularly perturbed regime. In this paper, we propose efficient multigrid iterative solution methods for the resulting linear systems. In particular, we show that techniques from standard multigrid for anisotropic problems can be adapted to these discretizations on both tensor-product and semi-structured meshes. The resulting methods are demonstrated to be robust preconditioners for several standard flow benchmarks.
In causal inference, sensitivity analysis is important for assessing the robustness of study conclusions to key assumptions. We perform sensitivity analysis of the assumption that missing outcomes are missing completely at random. We follow a Bayesian approach, which is nonparametric for the outcome distribution and can be combined with an informative prior on the sensitivity parameter. We give insight into the posterior and provide theoretical guarantees in the form of Bernstein-von Mises theorems for estimating the mean outcome. We study different parametrisations of the model involving Dirichlet process priors on the distribution of the outcome and on the distribution of the outcome conditional on the subject being treated. We show that these parametrisations incorporate a prior on the sensitivity parameter in different ways and discuss their relative merits. We also present a simulation study showing the performance of the methods in finite-sample scenarios.
One of the most popular club football tournaments, the UEFA Champions League, will see a fundamental reform from the 2024/25 season: the traditional group stage will be replaced by one league in which each of the 36 teams plays eight matches. To guarantee that the opponents of the clubs are of comparable strength in the new design, it is crucial to forecast the performance of the teams before the tournament as accurately as possible. This paper investigates whether the currently used rating of the teams, the UEFA club coefficient, can be improved by taking the games played in the national leagues into account. According to our logistic regression models, a variant of the Elo method provides higher accuracy in terms of explanatory power in Champions League matches. The Union of European Football Associations (UEFA) is encouraged to follow the example of the FIFA World Ranking and reform the calculation of the club coefficients in order to avoid unbalanced schedules in the novel tournament format of the Champions League.
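A basic Elo update of the kind the paper builds on can be sketched as follows; the K-factor and home-advantage shift are illustrative values, not the specific variant evaluated against the UEFA club coefficient:

```python
def elo_update(r_home, r_away, score_home, k=20.0, home_adv=60.0):
    """One Elo rating update for a single match.

    score_home: 1.0 for a home win, 0.5 for a draw, 0.0 for a home loss.
    home_adv shifts the expected score toward the home side."""
    # logistic expected score on the usual 400-point scale
    exp_home = 1.0 / (1.0 + 10.0 ** (-((r_home + home_adv) - r_away) / 400.0))
    delta = k * (score_home - exp_home)
    return r_home + delta, r_away - delta

# equally rated sides, home win: the home side gains rating points
r1, r2 = elo_update(1500.0, 1500.0, 1.0)
```

The expected scores produced by such a rating are exactly the kind of covariate that can be fed into the logistic regression models used to compare explanatory power.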
Artistic pieces can be studied from several perspectives, one example being their reception among readers over time. In the present work, we approach this topic from the standpoint of literary works, specifically the task of predicting whether a book will become a best seller. Unlike previous approaches, we focused on the full content of books and considered both visualization and classification tasks. We employed visualization for the preliminary exploration of the data structure and properties, involving SemAxis and linear discriminant analyses. Then, to obtain quantitative and more objective results, we employed various classifiers. These approaches were applied to a dataset containing (i) books published from 1895 to 1924 and designated as best sellers by the Publishers Weekly Bestseller Lists and (ii) literary works published in the same period but not mentioned in that list. Our comparison of methods revealed that the best result, obtained by combining a bag-of-words representation with a logistic regression classifier, was an average accuracy of 0.75 under both leave-one-out and 10-fold cross-validation. This outcome suggests that predicting the success of books with high accuracy using only the full content of the texts is unfeasible. Nevertheless, our findings provide insights into the factors behind the relative success of a literary work.
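The best-performing combination, bag-of-words features fed to a logistic regression classifier, can be sketched in a self-contained form; the toy texts, labels, and plain gradient-descent trainer below are stand-ins for the paper's corpus and off-the-shelf classifier:

```python
import numpy as np

def bow_matrix(texts):
    """Bag-of-words count matrix over the corpus vocabulary."""
    vocab = sorted({w for t in texts for w in t.lower().split()})
    idx = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(texts), len(vocab)))
    for r, t in enumerate(texts):
        for w in t.lower().split():
            X[r, idx[w]] += 1.0
    return X, vocab

def train_logreg(X, y, lr=0.5, epochs=500):
    """Logistic regression fitted by plain gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probabilities
        g = p - y                                # gradient of the log loss
        w -= lr * X.T @ g / len(y)
        b -= lr * g.mean()
    return w, b

texts = ["thrilling plot twist", "dull slow prose",
         "thrilling fast plot", "slow dull story"]
y = np.array([1.0, 0.0, 1.0, 0.0])               # toy labels: 1 = best seller
X, vocab = bow_matrix(texts)
w, b = train_logreg(X, y)
pred = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(float)
```

On real book-length texts, the same pipeline would be wrapped in leave-one-out or 10-fold cross-validation rather than evaluated on the training set as in this toy example.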
Technology for open-ended language generation, a key application of artificial intelligence, has advanced considerably in recent years. Large-scale language models, trained on large corpora of text, are used in a wide range of applications, from virtual assistants to conversational bots. While these language models output fluent text, existing research shows that they can and do capture human biases. Many of these biases, especially those that could potentially cause harm, have been well investigated. In contrast, studies that infer and change the human personality traits inherited by these models have been scarce or non-existent. Our work seeks to address this gap by exploring the personality traits of several large-scale language models designed for open-ended text generation and of the datasets used for training them. We build on the popular Big Five factors and develop robust methods that quantify the personality traits of these models and their underlying datasets. In particular, we prompt the models with a questionnaire designed for personality assessment and subsequently classify the text responses into quantifiable traits using a zero-shot classifier. Our estimation scheme sheds light on an important anthropomorphic element of such AI models and can help stakeholders decide how the models should be applied and how society might perceive them. Additionally, we examine approaches to alter these personalities, adding to our understanding of how AI models can be adapted to specific contexts.
Drones have become essential tools in a wide range of industries, including agriculture, surveying, and transportation. However, tracking unmanned aerial vehicles (UAVs) in challenging environments, such as cluttered or GNSS-denied environments, remains a critical issue. Additionally, UAVs are being deployed as part of multi-robot systems, where tracking their position can be essential for relative state estimation. In this paper, we evaluate the performance of a multi-scan integration method for tracking UAVs in GNSS-denied environments using a solid-state LiDAR and a Kalman filter (KF). We evaluate the algorithm's ability to track a UAV in a large open area at various distances and speeds. Our quantitative analysis shows that while "tracking by detection" using a constant velocity model is the only method that consistently tracks the target, integrating multiple scan frequencies using a KF achieves lower position errors and represents a viable option for tracking UAVs in similar scenarios.
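A constant-velocity Kalman filter of the kind underlying "tracking by detection" can be sketched in one dimension; the noise parameters, time step, and noiseless synthetic measurements below are illustrative, not the LiDAR setup evaluated in the paper:

```python
import numpy as np

def kf_step(x, P, z, dt=1.0, q=1.0, r=0.1):
    """One predict+update step of a constant-velocity Kalman filter in 1D.

    State x = [position, velocity]; z is a scalar position measurement."""
    F = np.array([[1.0, dt], [0.0, 1.0]])                        # CV dynamics
    H = np.array([[1.0, 0.0]])                                   # position sensor
    Q = q * np.array([[dt**3 / 3, dt**2 / 2], [dt**2 / 2, dt]])  # process noise
    R = np.array([[r]])                                          # measurement noise
    x = F @ x                                                    # predict
    P = F @ P @ F.T + Q
    S = H @ P @ H.T + R                                          # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)                               # Kalman gain
    x = x + (K @ (z - H @ x)).ravel()                            # update
    P = (np.eye(2) - K @ H) @ P
    return x, P

# feed noiseless measurements of a target moving at 1 m/s
x, P = np.array([0.0, 0.0]), np.eye(2)
for k in range(1, 51):
    x, P = kf_step(x, P, np.array([float(k)]))
```

The multi-scan integration studied in the paper feeds detections from several scan frequencies through such a filter, which is what yields the lower position errors reported.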
Deploying large language models (LLMs) is challenging because they are memory-inefficient and compute-intensive for practical applications. In response, researchers train smaller task-specific models by either finetuning with human labels or distilling with LLM-generated labels. However, finetuning and distillation require large amounts of training data to achieve performance comparable to LLMs. We introduce Distilling step-by-step, a new mechanism that (a) trains smaller models that outperform LLMs, and (b) does so using less training data than finetuning or distillation require. Our method extracts LLM rationales as additional supervision for small models within a multi-task training framework. We present three findings across 4 NLP benchmarks: First, compared to both finetuning and distillation, our mechanism achieves better performance with far fewer labeled/unlabeled training examples. Second, compared to LLMs, we achieve better performance using substantially smaller model sizes. Third, we reduce both the model size and the amount of data required to outperform LLMs; our 770M T5 model outperforms the 540B PaLM model using only 80% of available data on a benchmark task.
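The multi-task framework described above trains the small model to predict both the task label and the LLM-extracted rationale; the objective can be sketched as a weighted sum of two cross-entropy terms. The mixing weight, vocabulary size, and function names below are assumptions for illustration, not the paper's settings:

```python
import numpy as np

def xent(logits, targets):
    """Mean token-level cross-entropy computed from raw logits."""
    logits = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    logp = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -logp[np.arange(len(targets)), targets].mean()

def distill_loss(label_logits, label_ids,
                 rationale_logits, rationale_ids, lam=0.5):
    """Multi-task objective sketch: predict the task label and, as an
    auxiliary task, generate the LLM-extracted rationale, mixed by lam."""
    return xent(label_logits, label_ids) + lam * xent(rationale_logits, rationale_ids)
```

The rationale term acts purely as auxiliary supervision during training; at inference time the small model only needs to produce the label, so no extra serving cost is incurred.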