In differential evolution (DE) algorithms, a crossover operation that filters which variables are mutated enables flexible search of the feasible region, and has contributed to DE's success in a variety of complicated optimization problems. To investigate whether the crossover operator of DE can improve the performance of evolutionary algorithms (EAs), this paper presents a theoretical analysis of the $(1+1)EA_{C}$ and the $(1+1)EA_{CM}$, two variants of the $(1+1)EA$ that incorporate the binomial crossover operator. In general, the binomial crossover enhances exploration and, under certain conditions, leads to dominance of the transition matrices. As a result, both the $(1+1)EA_{C}$ and the $(1+1)EA_{CM}$ outperform the $(1+1)EA$ on the unimodal OneMax problem, but do not always dominate it on the Deceptive problem. Finally, we perform an exploration analysis by investigating the probabilities of transferring from non-optimal states to the optimal state of the Deceptive problem, and propose adaptive parameter settings to strengthen the beneficial effect of binomial crossover. This suggests that incorporating the binomial crossover is a feasible strategy for improving the performance of EAs.
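For intuition only, the following Python sketch (not the paper's formal model) shows a $(1+1)EA_{C}$-style iteration on OneMax, where binomial crossover with an illustrative rate `cr` filters which mutated bits the offspring inherits; the parameter values are hypothetical choices, not those analyzed in the paper.

```python
import random

def onemax(x):
    """OneMax fitness: number of ones in the bit string."""
    return sum(x)

def one_plus_one_ea_c(n, cr=0.5, p_mut=None, max_iters=10000):
    """Sketch of a (1+1)EA with binomial crossover on OneMax (illustration only)."""
    p_mut = p_mut if p_mut is not None else 1.0 / n
    parent = [random.randint(0, 1) for _ in range(n)]
    for _ in range(max_iters):
        # bitwise mutation of the parent
        mutant = [b ^ (random.random() < p_mut) for b in parent]
        # binomial crossover: keep each mutated bit with probability cr,
        # with one randomly chosen position always taken from the mutant
        forced = random.randrange(n)
        child = [mutant[i] if (random.random() < cr or i == forced) else parent[i]
                 for i in range(n)]
        if onemax(child) >= onemax(parent):  # elitist selection
            parent = child
        if onemax(parent) == n:
            break
    return parent
```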
We study the sketching and communication complexity of deciding whether a binary sequence $x$ of length $n$ contains a binary sequence $y$ of length $k$ as a subsequence. We prove that this problem has a deterministic sketch of size $O(k \log k)$ and that any sketch has size $\Omega(k)$. We also give nearly tight bounds for the communication complexity of this problem, and extend most of our results to larger alphabets. Finally, we leverage ideas from our sketching lower bound to prove a lower bound for the VC dimension of a family of classifiers that are based on subsequence containment.
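As a point of reference for the underlying decision problem (not the sketching or communication scheme itself), subsequence containment can be decided by a greedy linear scan, sketched here in Python:

```python
def contains_subsequence(x: str, y: str) -> bool:
    """Greedy check: does x contain y as a (not necessarily contiguous) subsequence?"""
    it = iter(x)
    # membership tests consume the iterator, so matched positions stay in order
    return all(c in it for c in y)

# example: "0110" contains "010" but not "111"
assert contains_subsequence("0110", "010") and not contains_subsequence("0110", "111")
```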
We develop the theory of a metric, which we call the $\nu$-based Wasserstein metric and denote by $W_\nu$, on the set of probability measures $\mathcal P(X)$ on a domain $X \subseteq \mathbb{R}^m$. This metric is based on a slight refinement of the notion of generalized geodesics with respect to a base measure $\nu$ and is relevant in particular for the case when $\nu$ is singular with respect to $m$-dimensional Lebesgue measure; it is also closely related to the concept of linearized optimal transport. The $\nu$-based Wasserstein metric is defined in terms of an iterated variational problem involving optimal transport to $\nu$; we also characterize it in terms of integrals of the classical Wasserstein distance between the conditional probabilities, and through limits of certain multi-marginal optimal transport problems. As we vary the base measure $\nu$, the $\nu$-based Wasserstein metric interpolates between the usual quadratic Wasserstein distance and a metric associated with the uniquely defined generalized geodesics obtained when $\nu$ is sufficiently regular. When $\nu$ concentrates on a lower-dimensional submanifold of $\mathbb{R}^m$, we prove that the variational problem in the definition of the $\nu$-based Wasserstein distance has a unique solution. We establish geodesic convexity of the usual class of functionals, and of the set of source measures $\mu$ such that optimal transport between $\mu$ and $\nu$ satisfies a strengthening of the generalized nestedness condition introduced in \cite{McCannPass20}. We also present two applications of the ideas introduced here. First, our dual metric is used to prove convergence of an iterative scheme to solve a variational problem arising in game theory. Second, we use the multi-marginal formulation to characterize solutions to the multi-marginal problem by an ordinary differential equation, yielding a new numerical method for it.
We consider a platform's problem of collecting data from privacy-sensitive users to estimate an underlying parameter of interest. We formulate this question as a Bayesian-optimal mechanism design problem, in which an individual can share her (verifiable) data in exchange for a monetary reward or services, but at the same time has a (private) heterogeneous privacy cost which we quantify using differential privacy. We consider two popular differential privacy settings for providing privacy guarantees to the users: central and local. In both settings, we establish minimax lower bounds for the estimation error and derive (near) optimal estimators for given heterogeneous privacy loss levels of the users. Building on this characterization, we pose the mechanism design problem as the optimal selection of an estimator and payments that elicit truthful reporting of the users' privacy sensitivities. Under a regularity condition on the distribution of privacy sensitivities, we develop efficient algorithmic mechanisms to solve this problem in both privacy settings. Our mechanism in the central setting can be implemented in time $\mathcal{O}(n \log n)$, where $n$ is the number of users, and our mechanism in the local setting admits a Polynomial Time Approximation Scheme (PTAS).
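As a loose illustration of the local setting only, each user could privatize her own report with Laplace noise calibrated to her personal privacy level; the inverse-noise-variance weighting below is a heuristic assumption for the sketch, not the paper's near-optimal estimator or payment rule.

```python
import numpy as np

def local_dp_mean(data, epsilons, lo=0.0, hi=1.0, rng=None):
    """Illustrative local-DP mean estimate with heterogeneous privacy levels."""
    rng = np.random.default_rng() if rng is None else rng
    data = np.clip(np.asarray(data, dtype=float), lo, hi)   # bounded data, sensitivity hi - lo
    eps = np.asarray(epsilons, dtype=float)
    scale = (hi - lo) / eps                    # per-user Laplace scale for epsilon_i-LDP
    reports = data + rng.laplace(0.0, scale)   # each user perturbs her own data point
    weights = 1.0 / (2.0 * scale**2)           # heuristic: weight by inverse noise variance
    return float(np.sum(weights * reports) / np.sum(weights))
```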
Policymakers face the broad challenge of how to view AI capabilities today and where society stands in terms of those capabilities. This paper surveys AI capabilities and tackles this very issue, exploring it in the context of political security in digitally networked societies. We extend the ideas of Information Management to better understand contemporary AI systems as part of a larger and more complex information system. Comprehensively reviewing AI capabilities and contemporary man-machine interactions, we undertake conceptual development to suggest that better information management could allow states to more effectively offset the risks of AI-enabled influence and better utilise the emerging capabilities which these systems have to offer to policymakers and political institutions across the world. We hope this long essay will stimulate further debates and discussions over these ideas, and prove to be a useful contribution towards governing the future of AI.
This paper focuses on stochastic saddle point problems with decision-dependent distributions in both the static and time-varying settings. These are problems whose objective is the expected value of a stochastic payoff function, where the random variables are drawn from a distribution induced by a distributional map. For general distributional maps, the problem of finding saddle points is computationally burdensome, even if the distribution is known. To enable a tractable solution approach, we introduce the notion of equilibrium points -- which are saddle points for the stationary stochastic minimax problem that they induce -- and provide conditions for their existence and uniqueness. We demonstrate that the distance between the two classes of solutions is bounded provided that the objective has a strongly-convex-strongly-concave payoff and a Lipschitz continuous distributional map. We develop deterministic and stochastic primal-dual algorithms and demonstrate their convergence to the equilibrium point. In particular, by modeling errors emerging from a stochastic gradient estimator as sub-Weibull random variables, we provide error bounds in expectation and in high probability that hold for each iteration; moreover, we show convergence to a neighborhood in expectation and almost surely. Finally, we investigate a condition on the distributional map -- which we call opposing mixture dominance -- that ensures the objective is strongly-convex-strongly-concave. Under this assumption, we show that primal-dual algorithms converge to the saddle points in a similar fashion.
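A minimal sketch of the stochastic primal-dual idea, assuming user-supplied stochastic payoff gradients and a sampler for the decision-dependent distribution; the function names, step size, and simultaneous update are illustrative choices, not the paper's algorithm or its analysis.

```python
import numpy as np

def stochastic_gda(grad_x, grad_y, sample, x0, y0, step=0.01, iters=1000, rng=None):
    """Stochastic gradient descent-ascent with decision-dependent sampling (sketch).

    At each iteration the random variable is drawn from the distribution induced
    by the current decisions, so a fixed point plays the role of an equilibrium
    point: a saddle point of the stationary problem it induces.
    """
    rng = np.random.default_rng() if rng is None else rng
    x, y = np.asarray(x0, dtype=float), np.asarray(y0, dtype=float)
    for _ in range(iters):
        xi = sample(x, y, rng)                       # decision-dependent sample
        gx, gy = grad_x(x, y, xi), grad_y(x, y, xi)  # stochastic payoff gradients
        x, y = x - step * gx, y + step * gy          # primal descent / dual ascent
    return x, y
```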
What is learning? 20$^{th}$ century formalizations of learning theory -- which precipitated revolutions in artificial intelligence -- focus primarily on $\mathit{in-distribution}$ learning, that is, learning under the assumption that the training data are sampled from the same distribution as the evaluation data. This assumption renders these theories inadequate for characterizing 21$^{st}$ century real-world data problems, which are typically characterized by evaluation distributions that differ from the training data distributions (referred to as out-of-distribution learning). We therefore make a small change to existing formal definitions of learnability by relaxing that assumption. We then introduce $\mathbf{learning\ efficiency}$ (LE) to quantify the amount a learner is able to leverage data for a given problem, regardless of whether it is an in- or out-of-distribution problem. We then define and prove the relationship between generalized notions of learnability, and show how this framework is sufficiently general to characterize transfer, multitask, meta, continual, and lifelong learning. We hope this unification helps bridge the gap between empirical practice and theoretical guidance in real-world problems. Finally, because biological learning continues to outperform machine learning algorithms on certain OOD challenges, we discuss the limitations of this framework vis-\`a-vis its ability to formalize biological learning, suggesting multiple avenues for future research.
We consider zero-sum games on infinite graphs, with objectives specified as sets of infinite words over some alphabet of colors. A well-studied class of objectives is that of $\omega$-regular objectives, due to its relation to many natural problems in theoretical computer science. We focus on the strategy complexity question: given an objective, how much memory does each player require to play as well as possible? A classical result is that finite-memory strategies suffice for both players when the objective is $\omega$-regular. We show a converse of that statement: when both players can play optimally with a chromatic finite-memory structure (i.e., one whose updates can only observe colors) in all infinite game graphs, then the objective must be $\omega$-regular. This provides a game-theoretic characterization of $\omega$-regular objectives, and the characterization can help in obtaining memory bounds. Moreover, a by-product of our characterization is a new one-to-two-player lift: to show that chromatic finite-memory structures suffice to play optimally in two-player games on infinite graphs, it suffices to show it in the simpler case of one-player games on infinite graphs. We illustrate our results with the family of discounted-sum objectives, for which $\omega$-regularity depends on the value of some parameters.
In this paper, we consider downlink low Earth orbit (LEO) satellite communication systems where multiple LEO satellites are uniformly distributed over a sphere at a certain altitude according to a homogeneous binomial point process (BPP). Based on the characteristics of the BPP, we analyze the distance distributions and the distribution cases for the serving satellite. We analytically derive the exact outage probability, and an approximated expression is obtained using the Poisson limit theorem. With these derived expressions, the system throughput maximization problem is formulated under satellite-visibility and outage constraints. To solve this problem, we reformulate it with bounded feasible sets and propose an iterative algorithm to obtain near-optimal solutions. Simulation results perfectly match the derived exact expressions for the outage probability and system throughput, and the analytical results from the approximated expressions are fairly close to the exact ones. It is also shown that the proposed algorithm for throughput maximization performs very close to the optimum obtained by a two-dimensional exhaustive search.
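For illustration only, the serving-satellite distance under a homogeneous BPP can be sampled by Monte Carlo; the user location at the sub-satellite sphere's "north pole", the nearest-satellite association rule, and the altitude value are assumptions of this sketch, not the paper's derivation.

```python
import numpy as np

def serving_satellite_distances(n_sat, altitude_km=550.0, r_earth_km=6371.0,
                                n_trials=10000, rng=None):
    """Monte Carlo sketch: distance to the nearest of n_sat BPP satellites."""
    rng = np.random.default_rng() if rng is None else rng
    r_sat = r_earth_km + altitude_km
    user = np.array([0.0, 0.0, r_earth_km])                 # user on the Earth's surface
    # uniform points on the satellite sphere via normalized Gaussian vectors
    pts = rng.standard_normal((n_trials, n_sat, 3))
    pts *= r_sat / np.linalg.norm(pts, axis=-1, keepdims=True)
    dists = np.linalg.norm(pts - user, axis=-1)
    return dists.min(axis=-1)                               # serving distance per trial
```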
We study the class of first-order locally-balanced Metropolis--Hastings algorithms introduced in Livingstone & Zanella (2021). To choose a specific algorithm within the class the user must select a balancing function $g:\mathbb{R} \to \mathbb{R}$ satisfying $g(t) = tg(1/t)$, and a noise distribution for the proposal increment. Popular choices within the class are the Metropolis-adjusted Langevin algorithm and the recently introduced Barker proposal. We first establish a universal limiting optimal acceptance rate of 57% and scaling of $n^{-1/3}$ as the dimension $n$ tends to infinity among all members of the class under mild smoothness assumptions on $g$ and when the target distribution for the algorithm is of the product form. In particular we obtain an explicit expression for the asymptotic efficiency of an arbitrary algorithm in the class, as measured by expected squared jumping distance. We then consider how to optimise this expression under various constraints. We derive an optimal choice of noise distribution for the Barker proposal, optimal choice of balancing function under a Gaussian noise distribution, and optimal choice of first-order locally-balanced algorithm among the entire class, which turns out to depend on the specific target distribution. Numerical simulations confirm our theoretical findings and in particular show that a bi-modal choice of noise distribution in the Barker proposal gives rise to a practical algorithm that is consistently more efficient than the original Gaussian version.
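A minimal sketch of one Metropolis--Hastings step with the Barker proposal under Gaussian noise, included only to make the class of algorithms concrete; the step scale is a placeholder, and the bimodal noise variant recommended by the paper is not implemented here.

```python
import numpy as np

def barker_step(x, log_pi, grad_log_pi, sigma=1.0, rng=None):
    """One Barker-proposal MH step with Gaussian noise increments (sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    g_x = grad_log_pi(x)
    z = sigma * rng.standard_normal(x.shape)
    # keep each increment's sign with probability sigmoid(z_i * grad_i)
    b = np.where(rng.random(x.shape) < 1.0 / (1.0 + np.exp(-z * g_x)), 1.0, -1.0)
    y = x + b * z
    d = y - x
    # log acceptance ratio; note log(sigmoid(t)) = -logaddexp(0, -t)
    log_alpha = (log_pi(y) - log_pi(x)
                 - np.sum(np.logaddexp(0.0, d * grad_log_pi(y)))
                 + np.sum(np.logaddexp(0.0, -d * g_x)))
    return y if np.log(rng.random()) < log_alpha else x
```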
In this paper, we study optimal convergence rates for distributed convex optimization problems over networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely when the function $F(x) \triangleq \sum_{i=1}^{m} f_i(x)$ is strongly convex and smooth, strongly convex, smooth, or just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and attains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors), with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions of the proposed setup, such as proximal-friendly functions, time-varying graphs, and improvement of the condition numbers.
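For reference, a minimal sketch of the centralized Nesterov accelerated gradient building block for an $L$-smooth, $\mu$-strongly convex objective; the distributed execution on the dual problem analyzed in the paper is not reproduced here, and the momentum rule below is the standard strongly convex choice rather than the paper's scheme.

```python
import numpy as np

def nesterov_agd(grad, x0, L, mu, iters=500):
    """Nesterov accelerated gradient for an L-smooth, mu-strongly convex function (sketch)."""
    kappa = L / mu
    momentum = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)
    x = y = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x_next = y - grad(y) / L               # gradient step at the extrapolated point
        y = x_next + momentum * (x_next - x)   # momentum extrapolation
        x = x_next
    return x
```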