In a sports competition, a team might lose a powerful incentive to exert full effort if its final rank does not depend on the outcome of the matches still to be played. Therefore, the organiser should reduce the probability of such a situation to the extent possible. Our paper provides a classification scheme to identify these weakly (where one team is indifferent) or strongly (where both teams are indifferent) stakeless games. A statistical model is estimated to simulate the UEFA Champions League groups and compare the candidate schedules used in the 2021/22 season according to the competitiveness of the matches played in the last round(s). The option followed in four of the eight groups is found to be optimal under a wide set of parameters. Minimising the number of strongly stakeless matches is verified to be a likely goal in the computer draw of the fixture that remains hidden from the public.
We study the scaling limits of stochastic gradient descent (SGD) with constant step-size in the high-dimensional regime. We prove limit theorems for the trajectories of summary statistics (i.e., finite-dimensional functions) of SGD as the dimension goes to infinity. Our approach allows one to choose the summary statistics that are tracked, the initialization, and the step-size. It yields both ballistic (ODE) and diffusive (SDE) limits, with the limit depending dramatically on the former choices. Interestingly, we find a critical scaling regime for the step-size: below it, the effective ballistic dynamics matches gradient flow for the population loss, but at it, a new correction term appears that changes the phase diagram. About the fixed points of this effective dynamics, the corresponding diffusive limits can be quite complex and even degenerate. We demonstrate our approach on popular examples including estimation for spiked matrix and tensor models and classification via two-layer networks for binary and XOR-type Gaussian mixture models. These examples exhibit surprising phenomena including multimodal timescales to convergence as well as convergence to sub-optimal solutions with probability bounded away from zero from random (e.g., Gaussian) initializations.
Games played in the last round of a round-robin tournament inspire match-fixing or tacit collusion if the two opposing teams can benefit from a particular result at the expense of other teams. In the case of four teams, the current study identifies all these situations caused by using head-to-head records as the primary tie-breaking principle. Simulations based on the 2016 UEFA European Football Championship reveal that the official tie-breaking policy substantially increases the risk of collusion, but it can be mitigated by choosing an optimal order of matches. Following the proposed schedule improves the competitiveness of the two games played in the last round and raises no problem on any aspect of the competition.
We study the problem of high-dimensional sparse mean estimation in the presence of an $\epsilon$-fraction of adversarial outliers. Prior work obtained sample-efficient and computationally efficient algorithms for this task for identity-covariance subgaussian distributions. In this work, we develop the first efficient algorithms for robust sparse mean estimation without a priori knowledge of the covariance. For distributions on $\mathbb R^d$ with "certifiably bounded" $t$-th moments and sufficiently light tails, our algorithm achieves error of $O(\epsilon^{1-1/t})$ with sample complexity $m = (k\log(d))^{O(t)}/\epsilon^{2-2/t}$. For the special case of the Gaussian distribution, our algorithm achieves near-optimal error of $\tilde O(\epsilon)$ with sample complexity $m = O(k^4 \mathrm{polylog}(d))/\epsilon^2$. Our algorithms follow the Sum-of-Squares-based proofs-to-algorithms approach. We complement our upper bounds with Statistical Query and low-degree polynomial testing lower bounds, providing evidence that the sample-time-error tradeoffs achieved by our algorithms are qualitatively the best possible.
Bilevel optimization has arisen as a powerful tool in modern machine learning. However, due to the nested structure of bilevel optimization, even gradient-based methods require second-order derivative approximations via Jacobian- and/or Hessian-vector computations, which can be costly and unscalable in practice. Recently, Hessian-free bilevel schemes have been proposed to resolve this issue, where the general idea is to use zeroth- or first-order methods to approximate the full hypergradient of the bilevel problem. However, we empirically observe that such an approximation can lead to large variance and unstable training, whereas estimating only the response Jacobian matrix as a partial component of the hypergradient turns out to be extremely effective. To this end, we propose a new Hessian-free method, which adopts a zeroth-order-like method to approximate the response Jacobian matrix by taking the difference between two optimization paths. Theoretically, we provide the convergence rate analysis for the proposed algorithms, where our key challenge is to characterize the approximation and smoothness properties of the trajectory-dependent estimator, which can be of independent interest. This is the first known convergence rate result for this type of Hessian-free bilevel algorithm. Experimentally, we demonstrate that the proposed algorithms outperform baseline bilevel optimizers on various bilevel problems. In particular, in our experiment on few-shot meta-learning with a ResNet-12 network over the miniImageNet dataset, we show that our algorithm outperforms baseline meta-learning algorithms, while other baseline bilevel optimizers do not solve such meta-learning problems within a comparable time frame.
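To make the "difference between two optimization paths" idea concrete, here is a minimal sketch under strong simplifying assumptions that are not taken from the paper: scalar variables, a quadratic inner objective g(x, y) = 0.5 (y - slope * x)^2 whose minimiser is y*(x) = slope * x, and plain gradient descent as the inner solver. The function names, the smoothing parameter `mu`, and all default values are hypothetical.

```python
def inner_path(x, y0, steps, step_size, slope):
    """Run gradient descent in y on the toy inner objective 0.5 * (y - slope * x)**2."""
    y = y0
    for _ in range(steps):
        y -= step_size * (y - slope * x)
    return y


def estimate_response_jacobian(x, directions, mu=1e-3, y0=0.0,
                               steps=50, step_size=0.5, slope=3.0):
    """Finite-difference estimate of dy*/dx from two inner optimization paths.

    For each direction u, one path is run at x and one at x + mu * u from the
    same starting point; the scaled difference of the end points, multiplied
    by u, gives one sample, and the samples are averaged.
    """
    base = inner_path(x, y0, steps, step_size, slope)
    samples = []
    for u in directions:
        perturbed = inner_path(x + mu * u, y0, steps, step_size, slope)
        samples.append((perturbed - base) / mu * u)
    return sum(samples) / len(samples)
```

In this toy problem the true response Jacobian is `slope`, so `estimate_response_jacobian(1.0, [1.0, -1.0])` should be close to 3.0. With random (for example Gaussian) directions the estimate is noisy and only matches on average.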
Sellers in online markets face the challenge of determining the right time to sell in view of uncertain future offers. Classical stopping theory assumes that sellers have full knowledge of the value distributions, and leverage this knowledge to determine stopping rules that maximize expected welfare. In practice, however, stopping rules must often be determined under partial information, based on scarce data or expert predictions. Consider a seller that has one item for sale and receives successive offers drawn from some value distributions. The decision on whether or not to accept an offer is irrevocable, and the value distributions are only partially known. We therefore let the seller adopt a robust maximin strategy, assuming that value distributions are chosen adversarially by nature to minimize the value of the accepted offer. We provide a general maximin solution to this stopping problem that identifies the optimal (threshold-based) stopping rule for the seller for all statistical information structures. We then perform a detailed analysis for when the seller knows the common mean, dispersion (variance or mean absolute deviation) and support of the distributions. We show for this information structure that the seller's stopping rule consists of decreasing thresholds converging to the common mean, and that nature's adversarial response, in the long run, is to always create an all-or-nothing scenario. The maximin solutions also reveal what happens as dispersion or the number of offers grows large.
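For contrast with the robust setting, the classical full-information stopping rule mentioned above can be computed by backward induction. The sketch below assumes offers that are i.i.d. Uniform[0, 1], a choice made purely for illustration; it is the classical baseline, not the maximin rule of the paper, and the function name is hypothetical.

```python
def uniform_thresholds(num_offers):
    """Acceptance thresholds for i.i.d. Uniform[0, 1] offers, first offer first.

    The threshold for an offer equals the expected value of continuing
    optimally with the remaining offers. The last offer is always accepted.
    """
    thresholds = [0.0]  # last offer: accept anything
    for _ in range(num_offers - 1):
        t = thresholds[-1]
        # E[max(X, t)] for X ~ Uniform[0, 1] equals (1 + t**2) / 2
        thresholds.append((1 + t * t) / 2)
    thresholds.reverse()
    return thresholds
```

For example, `uniform_thresholds(4)` returns `[0.6953125, 0.625, 0.5, 0.0]`: the thresholds decrease as fewer offers remain.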
We propose throughput- and cost-optimal job scheduling algorithms in cloud computing platforms offering Infrastructure as a Service. We first consider online migration and propose job scheduling algorithms to minimize job migration and server running costs. We consider algorithms that assume knowledge of job sizes on arrival. We characterize the optimal cost subject to system stability. We develop a drift-plus-penalty-based algorithm that can achieve the optimal cost arbitrarily closely. Specifically, this algorithm yields a trade-off between delay and costs. We then relax the job-size knowledge assumption and give an algorithm that uses the service already offered to the jobs. We show that this algorithm achieves the same order of cost as the job-size-based algorithm. Later, we consider offline job migration that incurs migration delays. We again present throughput-optimal algorithms that minimize server running cost. We illustrate the performance of the proposed algorithms and compare these to the existing algorithms via simulation.
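The delay-cost trade-off of a drift-plus-penalty rule can be illustrated with a generic single-queue sketch. Everything here is an assumption for illustration rather than the paper's algorithm: one queue, identical servers, a linear running cost, no migration cost, and the hypothetical names `choose_servers` and `simulate`. In each slot the rule picks the number of servers that minimises `v * cost - backlog * service`.

```python
def choose_servers(backlog, v, cost_per_server, service_rate, max_servers):
    """Pick the server count minimising penalty minus backlog-weighted service."""
    def objective(servers):
        return v * cost_per_server * servers - backlog * service_rate * servers
    return min(range(max_servers + 1), key=objective)


def simulate(arrivals, v, cost_per_server=1.0, service_rate=1.0, max_servers=3):
    """Return (average cost per slot, average backlog) under the rule above."""
    backlog, total_cost, total_backlog = 0.0, 0.0, 0.0
    for arriving_work in arrivals:
        servers = choose_servers(backlog, v, cost_per_server,
                                 service_rate, max_servers)
        served = min(backlog, servers * service_rate)
        backlog = backlog - served + arriving_work
        total_cost += cost_per_server * servers
        total_backlog += backlog
    return total_cost / len(arrivals), total_backlog / len(arrivals)
```

A larger `v` puts more weight on cost, so servers stay off until the backlog is larger; comparing `simulate([1.0] * 1000, 5.0)` with `simulate([1.0] * 1000, 50.0)` shows the average backlog (a proxy for delay) growing with `v`.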
Solving the time-dependent Schr\"odinger equation is an important application area for quantum algorithms. We consider Schr\"odinger's equation in the semi-classical regime. Here the solutions exhibit strong multiple-scale behavior due to a small parameter $\hbar$, in the sense that the dynamics of the quantum states and the induced observables can occur on different spatial and temporal scales. Such a Schr\"odinger equation finds many applications, including in Born-Oppenheimer molecular dynamics and Ehrenfest dynamics. This paper considers quantum analogues of pseudo-spectral (PS) methods on classical computers. Estimates on the gate counts in terms of $\hbar$ and the precision $\varepsilon$ are obtained. It is found that the number of required qubits, $m$, scales only logarithmically with respect to $\hbar$. When the solution has bounded derivatives up to order $\ell$, the symmetric Trotter splitting method has gate complexity $\mathcal{O}\Big({ (\varepsilon \hbar)^{-\frac12} \mathrm{polylog}(\varepsilon^{-\frac{3}{2\ell}} \hbar^{-1-\frac{1}{2\ell}})}\Big),$ provided that the diagonal unitary operators in the pseudo-spectral methods can be implemented with $\mathrm{poly}(m)$ operations. When physical observables are the desired outcomes, however, the step size in the time integration can be chosen independently of $\hbar$. The gate complexity in this case is reduced to $\mathcal{O}\Big({\varepsilon^{-\frac12} \mathrm{polylog}( \varepsilon^{-\frac3{2\ell}} \hbar^{-1} )}\Big),$ with $\ell$ again indicating the smoothness of the solution.
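As a point of reference for the classical method being quantised, one symmetric (Strang) splitting step of a pseudo-spectral scheme can be sketched as follows. The sketch assumes the one-dimensional equation $i\hbar\,\partial_t\psi = -\tfrac{\hbar^2}{2}\partial_{xx}\psi + V\psi$ on a periodic grid, which is a common semi-classical form but is not spelled out in the abstract. It uses a naive $O(n^2)$ Fourier transform to stay dependency-free, and the function names are hypothetical.

```python
import cmath
import math


def dft(values, sign):
    """Naive discrete Fourier transform; sign=-1 is forward, +1 is inverse (scaled by 1/n)."""
    n = len(values)
    out = []
    for k in range(n):
        total = sum(values[m] * cmath.exp(sign * 2j * math.pi * m * k / n)
                    for m in range(n))
        out.append(total / n if sign > 0 else total)
    return out


def strang_step(psi, potential, hbar, dt, length):
    """Advance psi by dt: half potential step, full kinetic step, half potential step."""
    n = len(psi)
    # Wave numbers in the usual FFT ordering (non-negative first, then negative).
    wave_numbers = [2 * math.pi / length * (k if k < n // 2 else k - n)
                    for k in range(n)]
    half_kick = [cmath.exp(-0.5j * dt * v / hbar) for v in potential]
    drift = [cmath.exp(-0.5j * hbar * dt * k * k) for k in wave_numbers]
    psi = [h * p for h, p in zip(half_kick, psi)]
    psi_hat = [d * p for d, p in zip(drift, dft(psi, -1))]
    psi = dft(psi_hat, +1)
    return [h * p for h, p in zip(half_kick, psi)]
```

Each factor is a diagonal unitary in either position or Fourier space, so the step preserves the discrete norm of `psi`; these diagonal operators are the ones the abstract assumes can be implemented with $\mathrm{poly}(m)$ operations.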
When algorithmic harms emerge, a reasonable response is to stop using the algorithm to resolve concerns related to fairness, accountability, transparency, and ethics (FATE). However, just because an algorithm is removed does not imply its FATE-related issues cease to exist. In this paper, we introduce the notion of the "algorithmic imprint" to illustrate how merely removing an algorithm does not necessarily undo or mitigate its consequences. We operationalize this concept and its implications through the 2020 events surrounding the algorithmic grading of the General Certificate of Education (GCE) Advanced (A) Level exams, an internationally recognized UK-based high school diploma exam administered in over 160 countries. While the algorithmic standardization was ultimately removed due to global protests, we show how the removal failed to undo the algorithmic imprint on the sociotechnical infrastructures that shape students', teachers', and parents' lives. These events provide a rare chance to analyze the state of the world both with and without algorithmic mediation. We situate our case study in Bangladesh to illustrate how algorithms made in the Global North disproportionately impact stakeholders in the Global South. Chronicling more than a year-long community engagement consisting of 47 interviews, we present the first coherent timeline of "what" happened in Bangladesh, contextualizing "why" and "how" they happened through the lenses of the algorithmic imprint and situated algorithmic fairness. Analyzing these events, we highlight how the contours of the algorithmic imprints can be inferred at the infrastructural, social, and individual levels. We share conceptual and practical implications around how imprint-awareness can (a) broaden the boundaries of how we think about algorithmic impact, (b) inform how we design algorithms, and (c) guide us in AI governance.
In federated learning, participating clients typically possess non-i.i.d. data, posing a significant challenge to generalization to unseen distributions. To address this, we propose a Wasserstein distributionally robust optimization scheme called WAFL. Leveraging its duality, we frame WAFL as an empirical surrogate risk minimization problem, and solve it using a local SGD-based algorithm with convergence guarantees. We show that the robustness of WAFL is more general than related approaches, and the generalization bound is robust to all adversarial distributions inside the Wasserstein ball (ambiguity set). Since the center location and radius of the Wasserstein ball can be suitably modified, WAFL shows its applicability not only in robustness but also in domain adaptation. Through empirical evaluation, we demonstrate that WAFL generalizes better than the vanilla FedAvg in non-i.i.d. settings, and is more robust than other related methods in distribution shift settings. Further, using benchmark datasets we show that WAFL is capable of generalizing to unseen target domains.
Aggregating signals from a collection of noisy sources is a fundamental problem in many domains including crowd-sourcing, multi-agent planning, sensor networks, signal processing, voting, ensemble learning, and federated learning. The core question is how to aggregate signals from multiple sources (e.g. experts) in order to reveal an underlying ground truth. While a full answer depends on the type of signal, correlation of signals, and desired output, a problem common to all of these applications is that of differentiating sources based on their quality and weighting them accordingly. It is often assumed that this differentiation and aggregation is done by a single, accurate central mechanism or agent (e.g. judge). We complicate this model in two ways. First, we investigate both the setting with a single judge and the setting with multiple judges. Second, given this multi-agent interaction of judges, we investigate various constraints on the judges' reporting space. We build on known results for the optimal weighting of experts and prove that an ensemble of sub-optimal mechanisms can perform optimally under certain conditions. We then show empirically that the ensemble approximates the performance of the optimal mechanism under a broader range of conditions.
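One well-known instance of optimally weighting experts is the log-odds rule for binary decisions. The sketch below assumes that this is the kind of result being built on, which the abstract does not state. It further assumes independent experts with known accuracies, votes in {+1, -1}, and equally likely ground truths; the function names are hypothetical.

```python
import math


def log_odds_weights(accuracies):
    """Weight each expert by log(p / (1 - p)), where p is its accuracy."""
    return [math.log(p / (1 - p)) for p in accuracies]


def weighted_vote(votes, weights):
    """Aggregate +1/-1 votes by the sign of their weighted sum (ties go to +1)."""
    score = sum(w * v for v, w in zip(votes, weights))
    return 1 if score >= 0 else -1
```

With accuracies `[0.9, 0.6, 0.6]` and votes `[1, -1, -1]`, the single accurate expert outweighs the two weaker ones and `weighted_vote` returns `1`, whereas an unweighted majority would return `-1`.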