The Non-dominated Sorting Genetic Algorithm II (NSGA-II) is the most prominent multi-objective evolutionary algorithm for real-world applications. While it performs evidently well on bi-objective optimization problems, empirical studies suggest that it is less effective when applied to problems with more than two objectives. A recent mathematical runtime analysis confirmed this observation by proving the NGSA-II for an exponential number of iterations misses a constant factor of the Pareto front of the simple 3-objective OneMinMax problem. In this work, we provide the first mathematical runtime analysis of the NSGA-III, a refinement of the NSGA-II aimed at better handling more than two objectives. We prove that the NSGA-III with sufficiently many reference points -- a small constant factor more than the size of the Pareto front, as suggested for this algorithm -- computes the complete Pareto front of the 3-objective OneMinMax benchmark in an expected number of O(n log n) iterations. This result holds for all population sizes (that are at least the size of the Pareto front). It shows a drastic advantage of the NSGA-III over the NSGA-II on this benchmark. The mathematical arguments used here and in previous work on the NSGA-II suggest that similar findings are likely for other benchmarks with three or more objectives.
\textit{Pursuit-evasion games} have been intensively studied for several decades due to their numerous applications in artificial intelligence, robot motion planning, database theory, distributed computing, and algorithmic theory. \textsc{Cops and Robber} (\CR) is one of the most well-known pursuit-evasion games played on graphs, where multiple \textit{cops} pursue a single \textit{robber}. The aim is to compute the \textit{cop number} of a graph, $k$, which is the minimum number of cops that ensures the \textit{capture} of the robber. From the viewpoint of parameterized complexity, \CR is W[2]-hard parameterized by $k$~[Fomin et al., TCS, 2010]. Thus, we study structural parameters of the input graph. We begin with the \textit{vertex cover number} ($\mathsf{vcn}$). First, we establish that $k \leq \frac{\mathsf{vcn}}{3}+1$. Second, we prove that \CR parameterized by $\mathsf{vcn}$ is \FPT by designing an exponential kernel. We complement this result by showing that it is unlikely for \CR parameterized by $\mathsf{vcn}$ to admit a polynomial compression. We extend our exponential kernels to the parameters \textit{cluster vertex deletion number} and \textit{deletion to stars number}, and design a linear vertex kernel for \textit{neighborhood diversity}. Additionally, we extend all of our results to several well-studied variations of \CR.
The advent of novel 5G services and applications with binding latency requirements and guaranteed Quality of Service (QoS) hastened the need to incorporate autonomous and proactive decision-making in network management procedures. The objective of our study is to provide a thorough analysis of predictive latency within 5G networks by utilizing real-world network data that is accessible to mobile network operators (MNOs). In particular, (i) we present an analytical formulation of the user-plane latency as a Hypoexponential distribution, which is validated by means of a comparative analysis with empirical measurements, and (ii) we conduct experimental results of probabilistic regression, anomaly detection, and predictive forecasting leveraging on emerging domains in Machine Learning (ML), such as Bayesian Learning (BL) and Machine Learning on Graphs (GML). We test our predictive framework using data gathered from scenarios of vehicular mobility, dense-urban traffic, and social gathering events. Our results provide valuable insights into the efficacy of predictive algorithms in practical applications.
This work introduces a novel cause-effect relation in Markov decision processes using the probability-raising principle. Initially, sets of states as causes and effects are considered, which is subsequently extended to regular path properties as effects and then as causes. The paper lays the mathematical foundations and analyzes the algorithmic properties of these cause-effect relations. This includes algorithms for checking cause conditions given an effect and deciding the existence of probability-raising causes. As the definition allows for sub-optimal coverage properties, quality measures for causes inspired by concepts of statistical analysis are studied. These include recall, coverage ratio and f-score. The computational complexity for finding optimal causes with respect to these measures is analyzed.
The Bayesian Context Trees (BCT) framework is a recently introduced, general collection of statistical and algorithmic tools for modelling, analysis and inference with discrete-valued time series. The foundation of this development is built in part on some well-known information-theoretic ideas and techniques, including Rissanen's tree sources and Willems et al.'s context-tree weighting algorithm. This paper presents a collection of theoretical results that provide mathematical justifications and further insight into the BCT modelling framework and the associated practical tools. It is shown that the BCT prior predictive likelihood (the probability of a time series of observations averaged over all models and parameters) is both pointwise and minimax optimal, in agreement with the MDL principle and the BIC criterion. The posterior distribution is shown to be asymptotically consistent with probability one (over both models and parameters), and asymptotically Gaussian (over the parameters). And the posterior predictive distribution is also shown to be asymptotically consistent with probability one.
The non-dominated sorting genetic algorithm II (NSGA-II) is the most intensively used multi-objective evolutionary algorithm (MOEA) in real-world applications. However, in contrast to several simple MOEAs analyzed also via mathematical means, no such study exists for the NSGA-II so far. In this work, we show that mathematical runtime analyses are feasible also for the NSGA-II. As particular results, we prove that with a population size four times larger than the size of the Pareto front, the NSGA-II with two classic mutation operators and four different ways to select the parents satisfies the same asymptotic runtime guarantees as the SEMO and GSEMO algorithms on the basic OneMinMax and LeadingOnesTrailingZeros benchmarks. However, if the population size is only equal to the size of the Pareto front, then the NSGA-II cannot efficiently compute the full Pareto front: for an exponential number of iterations, the population will always miss a constant fraction of the Pareto front. Our experiments confirm the above findings.
We study multiclass classification in the agnostic adversarial online learning setting. As our main result, we prove that any multiclass concept class is agnostically learnable if and only if its Littlestone dimension is finite. This solves an open problem studied by Daniely, Sabato, Ben-David, and Shalev-Shwartz (2011,2015) who handled the case when the number of classes (or labels) is bounded. We also prove a separation between online learnability and online uniform convergence by exhibiting an easy-to-learn class whose sequential Rademacher complexity is unbounded. Our learning algorithm uses the multiplicative weights algorithm, with a set of experts defined by executions of the Standard Optimal Algorithm on subsequences of size Littlestone dimension. We argue that the best expert has regret at most Littlestone dimension relative to the best concept in the class. This differs from the well-known covering technique of Ben-David, P\'{a}l, and Shalev-Shwartz (2009) for binary classification, where the best expert has regret zero.
There is abundant observational data in the software engineering domain, whereas running large-scale controlled experiments is often practically impossible. Thus, most empirical studies can only report statistical correlations -- instead of potentially more insightful and robust causal relations. To support analyzing purely observational data for causal relations, and to assess any differences between purely predictive and causal models of the same data, this paper discusses some novel techniques based on structural causal models (such as directed acyclic graphs of causal Bayesian networks). Using these techniques, one can rigorously express, and partially validate, causal hypotheses; and then use the causal information to guide the construction of a statistical model that captures genuine causal relations -- such that correlation does imply causation. We apply these ideas to analyzing public data about programmer performance in Code Jam, a large world-wide coding contest organized by Google every year. Specifically, we look at the impact of different programming languages on a participant's performance in the contest. While the overall effect associated with programming languages is weak compared to other variables -- regardless of whether we consider correlational or causal links -- we found considerable differences between a purely associational and a causal analysis of the very same data. The takeaway message is that even an imperfect causal analysis of observational data can help answer the salient research questions more precisely and more robustly than with just purely predictive techniques -- where genuine causal effects may be confounded.
We present RETA (Relative Timing Analysis), a differential timing analysis technique to verify the impact of an update on the execution time of embedded software. Timing analysis is computationally expensive and labor intensive. Software updates render repeating the analysis from scratch a waste of resources and time, because their impact is inherently confined. To determine this boundary, in RETA we apply a slicing procedure that identifies all relevant code segments and a statement categorization that determines how to analyze each such line of code. We adapt a subset of RETA for integration into aiT, an industrial timing analysis tool, and also develop a complete implementation in a tool called DELTA. Based on staple benchmarks and realistic code updates from official repositories, we test the accuracy by analyzing the worst-case execution time (WCET) before and after an update, comparing the measures with the use of the unmodified aiT as well as real executions on embedded hardware. DELTA returns WCET information that ranges from exactly the WCET of real hardware to 148% of the new version's measured WCET. With the same benchmarks, the unmodified aiT estimates are 112% and 149% of the actual executions; therefore, even when DELTA is pessimistic, an industry-strength tool such as aiT cannot do better. Crucially, we also show that RETA decreases aiT's analysis time by 45% and its memory consumption by 8.9%, whereas removing RETA from DELTA, effectively rendering it a regular timing analysis tool, increases its analysis time by 27%.
We employ pressure point analysis and roofline modeling to identify performance bottlenecks and determine an upper bound on the performance of the Canonical Polyadic Alternating Poisson Regression Multiplicative Update (CP-APR MU) algorithm in the SparTen software library. Our analyses reveal that a particular matrix computation, $\Phi^{(n)}$, is the critical performance bottleneck in the SparTen CP-APR MU implementation. Moreover, we find that atomic operations are not a critical bottleneck while higher cache reuse can provide a non-trivial performance improvement. We also utilize grid search on the Kokkos library parallel policy parameters to achieve 2.25x average speedup over the SparTen default for $\Phi^{(n)}$ computation on CPU and 1.70x on GPU. We conclude our investigations by comparing Kokkos implementations of the STREAM benchmark and the matricized tensor times Khatri-Rao product (MTTKRP) benchmark from the Parallel Sparse Tensor Algorithm (PASTA) benchmark suite to implementations using vendor libraries. We show that with a single implementation Kokkos achieves performance comparable to hand-tuned code for fundamental operations that make up tensor decomposition kernels on a wide range of CPU and GPU systems. Overall, we conclude that Kokkos demonstrates good performance portability for simple data-intensive operations but requires tuning for algorithms with more complex dependencies and data access patterns.
This paper focuses on the expected difference in borrower's repayment when there is a change in the lender's credit decisions. Classical estimators overlook the confounding effects and hence the estimation error can be magnificent. As such, we propose another approach to construct the estimators such that the error can be greatly reduced. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the power of estimating the causal quantities between the classical estimators and the proposed estimators. The comparison is tested across a wide range of models, including linear regression models, tree-based models, and neural network-based models, under different simulated datasets that exhibit different levels of causality, different degrees of nonlinearity, and different distributional properties. Most importantly, we apply our approaches to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction of estimation error is strikingly substantial if the causal effects are accounted for correctly.