We develop and compare e-variables for testing whether $k$ samples of data are drawn from the same distribution, the alternative being that they come from different elements of an exponential family. We consider the GRO (growth-rate optimal) e-variables for (1) a `small' null inside the same exponential family, and (2) a `large' nonparametric null, as well as (3) an e-variable arrived at by conditioning on the sum of the sufficient statistics. (2) and (3) are efficiently computable, and extend ideas from Turner et al. [2021] and Wald [1947] respectively from Bernoulli to general exponential families. We provide theoretical and simulation-based comparisons of these e-variables in terms of their logarithmic growth rate, and find that for small effects all four e-variables behave surprisingly similarly; for the Gaussian location and Poisson families, e-variables (1) and (3) coincide; for Bernoulli, (1) and (2) coincide; but in general, whether (2) or (3) grows faster under the alternative is family-dependent. We furthermore discuss algorithms for numerically approximating (1).
The field of probabilistic logic programming (PLP) focuses on integrating probabilistic models into programming languages based on logic. Over the past 30 years, numerous languages and frameworks have been developed for modeling, inference and learning in probabilistic logic programs. While originally PLP focused on discrete probability, more recent approaches have incorporated continuous distributions as well as neural networks, effectively yielding neural-symbolic methods. We provide a unified algebraic perspective on PLP, showing that many if not most of the extensions of PLP can be cast within a common algebraic logic programming framework, in which facts are labeled with elements of a semiring and disjunction and conjunction are replaced by addition and multiplication. This does not only hold for the PLP variations itself but also for the underlying execution mechanism that is based on (algebraic) model counting.
Error-correcting codes over the real field are studied which can locate outlying computational errors when performing approximate computing of real vector--matrix multiplication on resistive crossbars. Prior work has concentrated on locating a single outlying error and, in this work, several classes of codes are presented which can handle multiple errors. It is first shown that one of the known constructions, which is based on spherical codes, can in fact handle multiple outlying errors. A second family of codes is then presented with $\zeroone$~parity-check matrices which are sparse and disjunct; such matrices have been used in other applications as well, especially in combinatorial group testing. In addition, a certain class of the codes that are obtained through this construction is shown to be efficiently decodable. As part of the study of sparse disjunct matrices, this work also contains improved lower and upper bounds on the maximum Hamming weight of the rows in such matrices.
Insurance claims processing involves multi-domain entities and multi-source data, along with a number of human-agent interactions. Use of Blockchain technology-based platform can significantly improve scalability and response time for processing of claims which are otherwise manually-intensive and time-consuming. However, the chaincodes involved within the processes that issue claims, approve or deny them as required, need to be formally verified to ensure secure and reliable processing of transactions in Blockchain. In this paper, we use a formal modeling approach to verify various processes and their underlying chaincodes relating to different stages in insurance claims processing viz., issuance, approval, denial, and flagging for fraud investigation by using linear temporal logic (LTL). We simulate the formalism on the chaincodes and analyze the breach of chaincodes via model checking.
We give a simple characterization of which functions can be computed deterministically by anonymous processes in dynamic networks, depending on the number of leaders in the network. In addition, we provide efficient distributed algorithms for computing all such functions assuming minimal or no knowledge about the network. Each of our algorithms comes in two versions: one that terminates with the correct output and a faster one that stabilizes on the correct output without explicit termination. Notably, these are the first deterministic algorithms whose running times scale linearly with both the number of processes and a parameter of the network which we call "dynamic disconnectivity" (meaning that our dynamic networks do not necessarily have to be connected at all times). We also provide matching lower bounds, showing that all our algorithms are asymptotically optimal for any fixed number of leaders. While most of the existing literature on anonymous dynamic networks relies on classical mass-distribution techniques, our work makes use of a recently introduced combinatorial structure called "history tree", also developing its theory in new directions. Among other contributions, our results make definitive progress on two popular fundamental problems for anonymous dynamic networks: leaderless Average Consensus (i.e., computing the mean value of input numbers distributed among the processes) and multi-leader Counting (i.e., determining the exact number of processes in the network). In fact, our approach unifies and improves upon several independent lines of research on anonymous networks, including Nedic et al., IEEE Trans. Automat. Contr. 2009; Olshevsky, SIAM J. Control Optim. 2017; Kowalski-Mosteiro, ICALP 2019, SPAA 2021; Di Luna-Viglietta, FOCS 2022.
Data sets sampled in Lie groups are widespread, and as with multivariate data, it is important for many applications to assess the differences between the sets in terms of their distributions. Indices for this task are usually derived by considering the Lie group as a Riemannian manifold. Then, however, compatibility with the group operation is guaranteed only if a bi-invariant metric exists, which is not the case for most non-compact and non-commutative groups. We show here that if one considers an affine connection structure instead, one obtains bi-invariant generalizations of well-known dissimilarity measures: a Hotelling $T^2$ statistic, Bhattacharyya distance and Hellinger distance. Each of the dissimilarity measures matches its multivariate counterpart for Euclidean data and is translation-invariant, so that biases, e.g., through an arbitrary choice of reference, are avoided. We further derive non-parametric two-sample tests that are bi-invariant and consistent. We demonstrate the potential of these dissimilarity measures by performing group tests on data of knee configurations and epidemiological shape data. Significant differences are revealed in both cases.
Feature selection is an expensive challenging task in machine learning and data mining aimed at removing irrelevant and redundant features. This contributes to an improvement in classification accuracy, as well as the budget and memory requirements for classification, or any other post-processing task conducted after feature selection. In this regard, we define feature selection as a multi-objective binary optimization task with the objectives of maximizing classification accuracy and minimizing the number of selected features. In order to select optimal features, we have proposed a binary Compact NSGA-II (CNSGA-II) algorithm. Compactness represents the population as a probability distribution to enhance evolutionary algorithms not only to be more memory-efficient but also to reduce the number of fitness evaluations. Instead of holding two populations during the optimization process, our proposed method uses several Probability Vectors (PVs) to generate new individuals. Each PV efficiently explores a region of the search space to find non-dominated solutions instead of generating candidate solutions from a small population as is the common approach in most evolutionary algorithms. To the best of our knowledge, this is the first compact multi-objective algorithm proposed for feature selection. The reported results for expensive optimization cases with a limited budget on five datasets show that the CNSGA-II performs more efficiently than the well-known NSGA-II method in terms of the hypervolume (HV) performance metric requiring less memory. The proposed method and experimental results are explained and analyzed in detail.
A supervised feature selection method selects an appropriate but concise set of features to differentiate classes, which is highly expensive for large-scale datasets. Therefore, feature selection should aim at both minimizing the number of selected features and maximizing the accuracy of classification, or any other task. However, this crucial task is computationally highly demanding on many real-world datasets and requires a very efficient algorithm to reach a set of optimal features with a limited number of fitness evaluations. For this purpose, we have proposed the binary multi-objective coordinate search (MOCS) algorithm to solve large-scale feature selection problems. To the best of our knowledge, the proposed algorithm in this paper is the first multi-objective coordinate search algorithm. In this method, we generate new individuals by flipping a variable of the candidate solutions on the Pareto front. This enables us to investigate the effectiveness of each feature in the corresponding subset. In fact, this strategy can play the role of crossover and mutation operators to generate distinct subsets of features. The reported results indicate the significant superiority of our method over NSGA-II, on five real-world large-scale datasets, particularly when the computing budget is limited. Moreover, this simple hyper-parameter-free algorithm can solve feature selection much faster and more efficiently than NSGA-II.
The optimization problem associated to fitting two-layer ReLU networks having $d$~inputs, $k$~neurons, and labels generated by a target network, is considered. Two types of infinite families of spurious minima, giving one minimum per $d$, were recently found. The loss at minima belonging to the first type converges to zero as $d$ increases. In the second type, the loss remains bounded away from zero. That being so, how may one avoid minima belonging to the latter type? Fortunately, such minima are never detected by standard optimization methods. Motivated by questions concerning the nature of this phenomenon, we develop methods to study distinctive analytic properties of hidden minima. By existing analyses, the Hessian spectrum of both types agree modulo $O(d^{-1/2})$-terms -- not promising. Thus, rather, our investigation proceeds by studying curves along which the loss is minimized or maximized, generally referred to as tangency arcs. We prove that apparently far removed group representation-theoretic considerations concerning the arrangement of subspaces invariant to the action of subgroups of $S_d$, the symmetry group over $d$ symbols, relative to ones fixed by the action yield a precise description of all finitely many admissible types of tangency arcs. The general results used for the loss function reveal that arcs emanating from hidden minima differ, characteristically, by their structure and symmetry, precisely on account of the $O(d^{-1/2})$-eigenvalue terms absent in previous work, indicating in particular the subtlety of the analysis. The theoretical results, stated and proved for o-minimal structures, show that the set comprising all tangency arcs is topologically sufficiently tame to enable a numerical construction of tangency arcs and so compare how minima, both types, are positioned relative to adjacent critical points.
We develop a Markovian framework for load balancing that combines classical algorithms such as Power-of-$d$ with auto-scaling mechanisms that allow the net service capacity to scale up or down in response to the current load on the same timescale as job dynamics. Our framework is inspired by serverless platforms, such as Knative, where servers are software functions that can be flexibly instantiated in milliseconds according to scaling rules defined by the users of the serverless platform. The main question is how to design such scaling rules to minimize user-perceived delay performance while ensuring low energy consumption. For the first time, we investigate this problem when the auto-scaling and load balancing processes operate asynchronously (or proactively), as in Knative. In contrast to the synchronous (or reactive) paradigm, asynchronism brings the advantage that jobs do not necessarily need to wait any time a scale-up decision is taken. In our main result, we find a general condition on the structure of scaling rules able to drive mean-field dynamics to delay and relative energy optimality, i.e., a situation where both the user-perceived delay and the relative energy waste induced by idle servers vanish in the limit where the network demand grows to infinity in proportion to the nominal service capacity. The identified condition suggests to scale up the current net capacity if and only if the mean demand exceeds the rate at which servers become idle and active. Finally, we propose a family of scaling rules that satisfy our optimality condition. Numerical simulations demonstrate that these rules provide better delay performance than existing synchronous auto-scaling schemes while inducing almost the same power consumption.
With increasing data requirements of users, cellular operators are finding new ways to fulfil these requirements. These attempts involve the practice of deploying Wi-Fi access points nearer to the user and backhauling it to the nearest eNB (in case of LTE and LTE-A). The paper studies LTE-U, an extension of LTE which works in the unlicensed spectrum, as a potential solution to this problem. It is based on the idea of densification. Network deployments incorporating LTE-U will be able to better cater to the growing data rate demand of voice and video, thus reducing the load on eNB. Further we explore the possibility of LTE-U as an alternative to Wi-Fi or co-existing with Wi-Fi deployments and issues revolving around this idea. We show that LTE-U deployment solves the problem of capacity in both cases.